MARY KIRCHER RODDY
  • Home
  • Coaching and Research
  • Lectures
    • Upcoming and Past Presentations
  • Searching For Stories blog
  • Publications
  • Contact
  • Resources
  • Privacy Policy

Searching for Stories

All OCR Is Not Created Equal (and Even If It Is, It Might Not Stay Equal)

9/20/2016


 
When you’re searching online newspapers, do you seek out other sites that carry the same newspaper?  Maybe you should.

Different newspaper sites, for example Chronicling America (http://chroniclingamerica.loc.gov/) and the California Digital Newspaper Collection (CDNC) (http://cdnc.ucr.edu/), have some of the same newspapers, including the San Francisco Call.  But they don’t necessarily use the same Optical Character Recognition (OCR) software, so one site may read a string of printed newspaper text as one set of characters and another site may read it differently.  If you’ve searched for a word or a phrase, one search engine may find it while another may not.  You’ve got better odds of finding what you’re searching for if you check both sites.

But there’s an even more important reason than just the straight-up OCR software used: some newspaper sites allow readers to correct the text.  If you search for the phrase “Wife Wants a Divorce” in the San Francisco Call on 10 December 1904 using the California Digital Newspaper Collection, you’ll find an article with the headline “Wife Wants A Divorce from Charles O. Huber.”  But if you search for “Wife Wants a Divorce” on the same date in the same paper using Chronicling America, you’re out of luck.

Why?  Because someone (me) edited the text on CDNC but not on Chronicling America.  I’ve captured a series of images to explain what I’m talking about.

This first image is my search and the results of that search on CDNC.  I got a hit!
[Image: search and results on CDNC]

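The next image is my search for the correct spelling on Chronicling America and the following image shows the results.  Zip.  Nada.  Zilch.

[Image: search for “Wife Wants a Divorce” on Chronicling America]
[Image: Chronicling America search results, with no matches]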

But the next two images show my search and results for an alternate spelling.  "Aviite avants a divorce."  Ah, that pesky OCR!  To my ear, it reads a little like Zsa Zsa Gabor ending yet another marriage… “A vife a vants a divorce.”  And when Zsa Zsa asks, Chronicling America listens!  I got a hit for the article.

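[Image: search for “Aviite avants a divorce” on Chronicling America]
[Image: Chronicling America search results showing the article]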

The final two images show the text as it now appears on CDNC after my correction, along with the image of the article.  Following that is the text as Chronicling America read it.

[Images: the corrected text on CDNC with the article image, followed by the text as Chronicling America read it]

Admittedly, I set this example up.  But can you be certain that the words you’re searching for in a newspaper appear the same way on two sites?  What if someone corrected the text to show how your ancestor’s name was spelled in the article but you didn’t check that site?  Instead, you searched in the one with the sketchy OCR mistakes.  Are you willing to take that chance?  I’m not.

On Saturday 24 September 2016, I'll be presenting "A Nose for the News" at the Kelowna and District Genealogical Society Harvest Your Family Tree Conference (http://kdgsconference2016.blogspot.ca/).  I hope to see you there!


    Author

    Mary Kircher Roddy is a genealogist, writer and lecturer, always looking for the story.  Her blog is a combination of the stories she has found and the tools she used to find them.

    Read more of Mary's writings at "Adventures of A Broad Abroad" and at "Letters from Limerick."

