When you’re searching in online newspapers, do you seek out alternate sites with the same newspaper? Maybe you should. Different newspaper sites, for example Chronicling America (http://chroniclingamerica.loc.gov/) and the California Digital Newspaper Collection (CDNC) (http://cdnc.ucr.edu/), have some of the same newspapers, including the San Francisco Call. But they don’t necessarily use the same Optical Character Recognition (OCR) software, so one site may read a string of printed newspaper text as one set of characters and another site may read it differently. If you’ve searched for a word or a phrase, one search engine may find it, but another one may not. You’ve got better odds of finding what you’re searching for if you check out both sites.
But there’s an even more important reason than just the straight-up OCR software used. Some newspaper sites allow readers to correct the text. If you search for the phrase “Wife Wants a Divorce” in the San Francisco Call on 10 December 1904 using the California Digital Newspaper Collection, you’ll find an article with the headline “Wife Wants A Divorce from Charles O. Huber.” But if you search for “Wife Wants a Divorce” on the same date in the same paper using Chronicling America, you’re out of luck. Why? Because someone (me) edited the text on CDNC but not on Chronicling America.
I’ve captured a series of images to explain what I’m talking about. This first image is my search and the results of that search on CDNC. I got a hit! The next image is my search for the correct spelling on Chronicling America, and the following image shows the results. Zip. Nada. Zilch.
But the next two images show my search and results for an alternate spelling: “Aviite avants a divorce.” Ah, that pesky OCR! To my ear, it reads a little like Zsa Zsa Gabor ending yet another marriage… “A vife a vants a divorce.” And when Zsa Zsa asks, Chronicling America listens! I got a hit for the article.
The final two images show the text as it now appears on CDNC after my correction, along with the image of the article. Following that is the text as Chronicling America read it.
Admittedly, I set this example up. But can you be certain that the words you’re searching for in a newspaper appear the same way on two sites? What if someone corrected the text to show how your ancestor’s name was spelled in the article, but you didn’t check that site? Instead, you searched the one with the sketchy OCR mistakes. Are you willing to take that chance? I’m not.
On Saturday 24 September 2016, I’ll be presenting “A Nose for the News” at the Kelowna and District Genealogical Society Harvest Your Family Tree Conference (http://kdgsconference2016.blogspot.ca/). I hope to see you there!
Author: Mary Kircher Roddy is a genealogist, writer and lecturer, always looking for the story. Her blog is a combination of the stories she has found and the tools she used to find them.