Genealogy Search Tips

Historical Newspapers, Books and Optical Character Recognition (OCR)

The vast majority of historical newspaper and book pages available for online reading are images of previously published pages, and computers cannot read or index the text in an image directly.

OCR Text and the Search Index

The text in these images must be extracted and put into a form that computers can read before keywords can be selected and a search index created. In almost all cases, Optical Character Recognition (OCR) is used to convert the images into text that computers can process. The text generated by Optical Character Recognition is called the OCR text.

Computer programs use the OCR text to create an index of the keywords and phrases found in the OCR text. The index can then be used to find relevant newspaper articles and pages in books in response to search queries.
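The indexing step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by any particular newspaper site: the function names build_index and search, the page ids and the sample OCR text are all made up for the example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase keyword to the set of page ids whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, ocr_text in pages.items():
        for word in re.findall(r"[a-z0-9']+", ocr_text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of the pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR text for two pages (invented for this example).
pages = {
    "page-1": "William Jones of this town was married on Tuesday.",
    "page-2": "The harvest this year was the best in memory.",
}
index = build_index(pages)
print(search(index, "William Jones"))  # {'page-1'}
```

The search only ever looks at the index, so it can find a page only if the words in the query appear in the OCR text for that page.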

When a page in a historical newspaper is displayed online, the OCR text generated for that page is usually shown below the image of the page. In most cases there are many differences between the OCR text and the text printed in the newspaper articles. In general, OCR text generated in the past is not as accurate as OCR text generated with today's technology.

Mistakes in the OCR Text

The text in the images of most old newspaper pages is not as clear as the text in the images of most book pages. As a result, the OCR text generated from most historical newspapers is generally not as accurate as the OCR text generated from most books.

If some of the OCR text does not match the text that appeared in the newspaper article or book, then the index will not be accurate. For example, if an article about "William Jones" was indexed, based on the OCR text, as "William James", then a search for "William Jones" would not find the article even though the actual article contains a reference to "William Jones".

So any mistake in the OCR text creates an inaccurate search index.
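The "William Jones" example can be reproduced with a short Python sketch. The two sample sentences and the helper name keywords are invented for the illustration, and real search sites may match terms differently.

```python
import re


def keywords(text):
    """Return the set of lowercase words found in a piece of text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


# What was printed in the newspaper versus what the OCR step produced
# (both sentences are made up for this example).
printed_text = "William Jones of this town was married on Tuesday."
ocr_text = "William James of this town was married on Tuesday."

# The index is built from the OCR text, not from the printed text.
indexed = keywords(ocr_text)

print(keywords("William Jones") <= indexed)  # False: the article is missed
print(keywords("William James") <= indexed)  # True: only the misread name matches
```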

Searching the OCR Text

When searching historical newspapers and books that have been indexed with OCR text, it is important to remember that the OCR text does not usually contain all of the information that is actually in the articles or on the book pages.

For example, you might search for a newspaper article about an ancestor named "William Jones" and not find anything. It's possible that there is an article about him, but it can't be found by searching for his name because his name is not indexed correctly.

So try some alternative search terms if you don't get any positive results with your initial search terms.
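One way to come up with alternative search terms is to swap letters that look alike in worn or smudged print. The Python sketch below does this for a single name. The LOOKALIKES table and the function name variants are assumptions made for the illustration; the letter pairs are only plausible guesses, not measured OCR error data, so adjust them to the kinds of mistakes you actually see in the OCR text.

```python
# Hypothetical look-alike letters (an assumption for this example,
# not a list taken from any OCR program).
LOOKALIKES = {
    "e": ["c"],
    "n": ["u"],
    "o": ["a"],
    "l": ["i"],
    "m": ["rn"],
}


def variants(term):
    """Return spellings that differ from term by one look-alike substitution."""
    found = set()
    for i, ch in enumerate(term):
        for sub in LOOKALIKES.get(ch.lower(), []):
            found.add(term[:i] + sub + term[i + 1:])
    return sorted(found)


print(variants("Jones"))  # ['Janes', 'Joncs', 'Joues']
```

Each spelling in the list can then be tried as a separate search, for example "William Joues" in place of "William Jones".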

© 2017 genealogysearchtips.com. All Rights Reserved.