The
extremely useful British Newspaper
Archive has
just been made available worldwide on FindMyPast -
and needless to say everyone is trying to use
it. As a result it is slow and occasionally
throws you off with an apology, but the rush
will eventually slow down. If you try it out
there are some problems which are not
immediately clear from the instructions.
To illustrate the problem I did a search for
Keyword "Tring" between 1830 and 1840 setting
the county as follows.
- HERTFORDSHIRE - found 350 articles in the Herts
  Mercury & Reformer
- BUCKINGHAMSHIRE - found 634 articles in the Bucks
  Herald
- BEDFORDSHIRE - found 0 articles
- OXFORDSHIRE - found 150 articles in the Oxford
  Journal
- DURHAM - found 411 articles in the Northern
  Echo
The important thing to note is that if you
select a county it is the county the newspaper
is published in. Tring is close to the
Buckinghamshire border and at the time was well
covered by the Aylesbury papers (only one of
which is currently in the archive). It is less
well covered by the only Hertfordshire paper in
the archive. If you search in Bedfordshire the
search still takes time - and a "0" appears without
any warning to tell you that there is no
Bedfordshire paper for this period in the
archive (yet)! The Oxford figure included
references to a proposed railway line from Tring
to Cheltenham which was never built.
The high figure for the Northern
Echo suggests
it was a much larger paper than the rest which
regularly reprinted national news from other
papers. In fact at least half the "Tring"
responses are due to the fact that scanning is
far from perfect and the real word is something
like "string" with a badly printed "s" or
"King". However I spotted plenty of genuine
Tring references.
I will have another look - using it for personal
names - and report sometime in December.
A Detailed Assessment of The Newspaper Archive
on FindMyPast
This looks at the current FindMyPast package and highlights the
things you need to know to get the most out of it. It also
compares it with the official British Newspaper Archive web
site, and some test searches show that given the same question
they come up with the same answer, despite their different
interfaces.
I carried out a number of test searches as
follows.
Search 1 - A rare and unusual surname
Any references to the surname Phipson before
1810. This was chosen because the surname is very uncommon, and
distinctive, and I studied the family in depth. I found 6
references as follows:
1796 Reading Mercury - advert (Advertisement)
1799 Reading Mercury - letter correcting error (Article)
1804 Hampshire Chronicle - accident at Covent Garden (Article)
1804 Salisbury and Winchester Journal - marriage
     announcement (Article)
1805 Morning Chronicle - marriage announcement (Miscellaneous)
1807 Hereford Journal - death listed (Family Notice)
All were relevant and the first three were new to
me. The 1796 advert told me that Joseph Phipson had taken on a
practice as a surgeon in Henley upon Thames and the 1804 story
told me that he had been at a pantomime in Covent Garden when
there was an accident and his services were needed. The 1799
letter was interesting because Joseph Phipson was saying that he
was still alive despite the paper having published that he had
died. I spent some time trying, unsuccessfully, to locate the
offending passage. It showed that FindMyPast has no effective
mechanism for searching a single issue, or even displaying the
pages for a particular paper on a particular date! The two
family marriages and the death should all have been classified
as "Family Notices" demonstrating that the classification system
is unreliable.
I started to look at the larger number of results
between 1810 and 1820, and most if not all, seemed potentially
relevant but I hit a problem with one - and it took so much time
I decided not to check out everything. The problem was that
the search reference took me to a single block of text,
containing many very different stories spread out over several
pages. The system gives no indication of where on the pages your
key word is and I could not spot a single "Phipson". What may
have happened is that a different word was misread as "Phipson"
- or perhaps I just missed it.
Search 2 - Search for a rare place name
See Amwellbury 1800-1850
Search 3 - A known "missing person" problem with a common
surname
My great great grandfather, Francis Reynolds,
abandoned the family and became involved on the shadier side of
horse racing. When I first researched him in the late 1970s we
knew nothing about him between the 1841 census and his tombstone
dated 1874, except for indirect references in family wills that
he was not acceptable to his wife's family. More information
became available later in the form of an obituary in an old
family scrapbook, which revealed that he had been known as "the
Marquis" around East Anglian race tracks, and online censuses
which at least provided information as to where he had been
lodging.
I carried out a number of searches looking for
combinations of "Francis," "Reynolds," "Newton" (where his farm
was), "The Marquis" and several racing terms. While I
found some useful references (see Francis Evered Reynolds)
there were considerable difficulties and I can have no
confidence that all relevant references were found.
Search 4 - Jacob Reynolds and Lawes Patent Manure
Jacob Reynolds was born and grew up in Norfolk,
worked for his uncles, and became manager for their business
selling Lawes patent manure, before moving to St Albans, where
he played a notable role in the town, later becoming an alderman
on the Herts County Council. Considerable problems were
encountered.
A Summary of the Problems.
While some of the searches were very successful
they tended to relate to very rare words or word combinations -
such as the searches for the surname "Phipson" or the place "Amwellbury."
However in other cases there could be severe problems, and while
there can be successful finds it is possible to waste a lot of
time going nowhere.
It is also important to realise that there are
significant differences between the search facilities on the
British Newspaper Archive web site and the FindMyPast website -
the latter having extremely serious limitations compared with
the former - and it would perhaps be appropriate to describe the
FindMyPast user interface as providing a "lucky dip" facility
unsuitable for serious historical research. Basically any search
you can carry out on FindMyPast can be carried out on the main
Archive web site - but there are many searches possible on the
Archive site which cannot be carried out on FindMyPast.
1. The basic limitations of scanned newspapers
The package is limited by the problems associated
with the print quality of old newspapers, and in some cases
their preservation. The automatic scanning software used seems
to have done a very good job bearing in mind the poor quality of
many originals, but clearly does not recognise every word - and
some words may be more difficult than others. There are also
many cases (such as "string" with a poorly printed "s" being
indexed as "Tring") where the wrong word is recorded. While not
everything is perfectly recorded, the archive allows vast amounts
of data to be searched that you could never search manually, and
so one should not grumble that not everything is retrievable.
2. Chunking the Newspaper Articles
Particularly the earlier papers lacked the clear
headlines we expect in modern papers, and a single column will
often include a dozen or more news stories with little more than
a new paragraph to indicate where one finishes and the next one
begins. What has happened here is that the text has been manually
broken down into blocks - which may be a single article or a
whole page of small advertisements. These blocks are flagged up
as "Articles", "Advertisements", "Family Notices" and
"Miscellaneous". If you are looking for two or more words they
might, for example, happen to occur in two different articles in
the same block of text - so the smaller the blocks are the
better. The size of blocks chosen is such that you may get many
"false" returns from your searches because the words you are
looking for occur in different articles in the same block. A
number of the problems occurred because the blocks were, in many
cases, far too large - for instance one block might include
a dozen or more news stories or a hundred or more small
advertisements - and there was no way, when searching for two
words, to say that they must be close together.
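The missing "close together" test is easy to state precisely. The
following is a minimal Python sketch, not something offered by
either web site: it assumes you already have the text of a block
to hand (the sample block here is invented), and the function
name words_within and the five-word limit are illustrative
choices only.

```python
import re


def words_within(text, first, second, max_gap):
    """Return True if `first` and `second` occur within `max_gap` words of each other."""
    # Split the block into lower-case words, ignoring punctuation.
    tokens = re.findall(r"[a-z']+", text.lower())
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) <= max_gap for i in first_positions for j in second_positions
    )


# An invented block holding unrelated stories, like the large blocks described above.
block = (
    "Mr John Smith of Hertford was yesterday fined five shillings. "
    + "Other news of no interest to us. " * 5
    + "A fire broke out at Watford on Tuesday last."
)

print(words_within(block, "Smith", "Hertford", 5))  # both words in the same story
print(words_within(block, "Smith", "Watford", 5))   # words in different stories
```

With a test like this a block would only count as a match when
the two words fall in roughly the same story, rather than
anywhere in the same block.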
3. Coverage
Early local weekly newspapers were often no more
than a single folded broadsheet - and often served to distribute
national news. However more titles came into existence after
1855 (abolition of stamp duty) and they contain much more local
news - perhaps concentrating on a single town - while national
news disappeared from them as trains allowed the wider
distribution of genuine national newspapers. The archive only
has (so far) a small selection of the possible newspapers, and
may not have all years for some papers - and some papers may
only have been published for a few years. How useful the
facility will be to you depends on how well the material
currently in the archive meets your needs.
The papers currently included which relate to
Hertfordshire are:
Bucks Herald (1833-1909)
Hertford Mercury & Reformer (1834-1868)
Herts Advertiser (1925 only)
Herts Guardian ... (1852-1867)
Luton Times & Advertiser (1856-1880)
The main archive site makes it easy to find out
what newspapers are covered and their dates while FindMyPast
does not. However the archive is being updated with more
newspapers and more years for some papers which are already
covered - so if you are going to use the archives you will need
to know what you have searched and what has changed, because
you will only want to search the new material. As the current
archive only represents a small subset of what is planned this
is a serious potential problem.
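One way of keeping track is to note the date range of each paper
every time you search, and to compare those notes with the
coverage list later. A minimal Python sketch follows, assuming
hand-kept notes and one unbroken run of years per title. The
function name new_years and the "latest" figures are invented for
illustration; only the two "searched" entries come from the list
above.

```python
def new_years(previous, current):
    """Return {title: sorted years} for years in `current` that are not in `previous`.

    Both arguments map a newspaper title to an inclusive (first_year, last_year) pair.
    """
    added = {}
    for title, (start, end) in current.items():
        years = set(range(start, end + 1))
        if title in previous:
            old_start, old_end = previous[title]
            years -= set(range(old_start, old_end + 1))
        if years:
            added[title] = sorted(years)
    return added


# Coverage noted at the time of the last search (dates as listed above).
searched = {
    "Bucks Herald": (1833, 1909),
    "Herts Advertiser": (1925, 1925),
}

# Hypothetical later state of the archive, invented purely for illustration.
latest = {
    "Bucks Herald": (1833, 1909),
    "Herts Advertiser": (1923, 1925),
    "Example Gazette": (1850, 1852),
}

print(new_years(searched, latest))
```

Only the titles and years returned would then need searching
again.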
4. Potentially Too Many Matches
The total volume of material in the
archive is already vast, and the collection is only a few percent of
what it eventually will be. In addition the amount of coverage
any individual will get will vary widely - from someone who is
never mentioned to someone who gets extensive coverage (often
duplicated many times over) because they were famous or
advertised weekly in a number of papers. For instance
a search for the rare surname "Phipson" up to 1850 gives 1819
hits while "Reynolds" gives 81705 and "Smith" (which may include
other uses of the word) gives 854656 hits. For the same period "Amwell
Bury" gives 51, "Amwellbury" gives 6, Watford (not such an
important town at this date) gives 19254, Hertford gives 77712,
and London gives 2,224,965 - or if you search all dates
16,255,588. For all dates there are 105 "John Phipson" and
240,423 "John Smith" of which 3144 are in the same block of text
as the word "Watford" but not necessarily in the same article.
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
~
We have a problem which will get at least 10
times worse when the archive is complete, in that many of the
questions we want to ask are going to produce very large numbers
of answers, and we need tools that allow us to narrow the field
down to a manageable number. While there are some tools, in
most cases they are totally inadequate on FindMyPast, and a bit
better, but still limited, on the British Newspaper Archive site.
I list some of the problems I noted.
Searching for someone by name:
FindMyPast has two boxes labelled "First Name(s)" and "Last
Name" but this is misleading. The software has no idea which
words are personal names and which are not. All it means is
that the last word directly follows the first word. The British
Newspaper Archive simply has keywords and a phrase is indicated
by using quotation marks.
Searching by Date: The British Newspaper Archive site
allows you to search by dates - which means that you can search
for a single day, week or month. In FindMyPast you search for a
range of years (which could be just one year) and you cannot
concentrate your search on a short period when, for example,
looking for an obituary when you know the date of death.
Rejecting unwanted terms: The British Newspaper Archive site
allows you to exclude blocks which contain given words, which
allows you to reduce the number of hits you need to examine. The
facility is not available on FindMyPast which is a very serious
omission.
"Alternative Words" - FindMyPast has no facility to allow
you to carry out a single search using alternatives - such as "Amwellbury"
OR "Amwell Bury" - you have to carry out two separate searches.
"Spelling Problems" - Both the British Newspaper Archive
and FindMyPast have a built-in spelling corrector which
you cannot switch off. I wanted to search for advertisements
(which could be anywhere in the country) for Lawes artificial
fertilizer. A search of "Lawes" is interpreted to include any
text which includes the common words "law" and "laws". Another
search for the surname "Locke" automatically gave me texts
containing the word "Lock" (as on a canal or as part of a door)
together with words such as "locked".
"Nearby Words" - there is no way of limiting searches to say that
all the search terms must be within a given number of words of
each other. The absence of such a facility and the very large
block size mean a very large number of irrelevant hits.
Presentation of Matches - this gives the "headline" for the
block (which may be totally irrelevant because of the way the
blocks have been set up) and a limited number of words
surrounding one of the key words. If you are carrying out a search
for say "Phipson" and "Death" it will normally come up with an
excerpt containing the common word - when you need to see an
excerpt which contains the rarer word (i.e. the surname in this
case) before deciding whether it is relevant. This means you can
waste a lot of time looking at the full newspaper page for
matches which are totally irrelevant.
Order of Results - On both the British Newspaper Archive and
FindMyPast results appear to be presented in random date order.
This means that if the same story is published in a large number
of papers in the same week they are scattered at random through
the list. It would be much better if they could be organised in
date order so you can get a feel for what kinds of information
are available at a given date. In the "Amwellbury" example I had
to repeat the searches in 10 year (or less) slices in order to be
able to draw up a dateline of the key entries.
"Have I been here before" - Because of the poor search
facilities and the large number of possible hits you
sometimes need to search through lists which contain the same
items. There is no facility to say you have already downloaded
this newspaper page - so again you download more than is
necessary - wasting more time.
"Where on earth is the reference" - When you download the
newspaper page there is no indication of where the relevant
paragraph is and this can be very time consuming because of the
size of the blocks - and in some cases I have been unable to
find the relevant text.
I could go on ...
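Some of these gaps (spelling, order of results, and "Have I been
here before") can be partly worked around by keeping your own
notes of the hits and tidying them yourself. The following is a
minimal Python sketch under stated assumptions: the records are
typed in by hand from a results list (neither site is assumed to
offer any export), every record shown is invented, and the names
contains_exact_word and tidy_hits are illustrative.

```python
import re
from datetime import date


def contains_exact_word(text, word):
    """True only if `word` appears as a whole word, so "Lawes" does not match "laws"."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def tidy_hits(hits, word, already_seen):
    """Keep hits whose excerpt has the exact word and which are unseen, oldest first."""
    kept = [
        h
        for h in hits
        if contains_exact_word(h["excerpt"], word)
        and (h["paper"], h["date"], h["page"]) not in already_seen
    ]
    return sorted(kept, key=lambda h: h["date"])


# Invented records for illustration; none of these is a real search result.
hits = [
    {"paper": "Paper A", "date": date(1862, 3, 8), "page": 1,
     "excerpt": "Lawes patent manure for sale"},
    {"paper": "Paper B", "date": date(1859, 5, 2), "page": 4,
     "excerpt": "the laws of the land"},
    {"paper": "Paper C", "date": date(1857, 1, 10), "page": 2,
     "excerpt": "agent for Lawes manure"},
    {"paper": "Paper D", "date": date(1860, 7, 21), "page": 3,
     "excerpt": "Mr Lawes spoke"},
]

# Pages already downloaded on an earlier visit.
already_seen = {("Paper D", date(1860, 7, 21), 3)}

for h in tidy_hits(hits, "Lawes", already_seen):
    print(h["date"].isoformat(), h["paper"])
```

This only helps with text you have already copied down, such as
the excerpt, so it reduces the wasted time rather than replacing
proper search tools on the sites themselves.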
Despite my very critical comments the facilities provide a
wonderful lucky dip facility - and if you are lucky you may find
out things about your ancestors you could never have found in
any other way. It could well be that the easy to use (at
the lucky dip level) interface which lacks the tools to ask
serious questions is deliberate - to ensure that any serious
research is carried out on the British Newspaper Archive web
site. However a number of the failings also apply to that site
as well. Factors such as the size of indexed blocks seem to be
more relevant to minimising the amount of manual work the firm
scanning the newspapers has to do - while having the effect of
maximising the time wasted by searchers using the system.