I sent a message to the friendly folks at Google Catalogs last Saturday. It occurred to me that as they’d already scanned and converted to text all of the catalog pages, the next logical thing to do would be to make that text available to people with screen readers (i.e. the visually impaired and those who don’t read).
Today I received the following reply:
Dear Peter,
Thank you for your suggestion. Unfortunately, this type of feature is not in our near-term product plan. While the OCR quality is acceptable for search, it is not good enough for reading, which is why we prefer to show the image of the page rather than making available a text view.
Thanks for using Google’s catalog search!
The Google Catalogs Team
Too bad for now. Perhaps as OCR improves, this will become more of a possibility.
When OCR software first hit the market it came in at around $500.00, as I recall. I don’t recall which companies released it, but I do remember a really good PC Mag review of about a dozen different packages. I’ll illustrate their central point about the failure of OCR software.
The best of them were boasting 95% accuracy. This note has 708 letters. At 95% accuracy, OCR software would make about thirty-five individual letter errors. There are 146 words in this note. If those letter errors are distributed evenly, one to a word, then 35 of the 146 words will contain an error, which leaves the overall accuracy, considered word by word, at about 76%. Current OCR technology is about 99% accurate. Calculated letter by letter, that is about seven errors in this note, which leaves a word accuracy of about 95%. Unimpressive.
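For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes, as the paragraph above does, that every letter error lands in a different word, which is the worst case. The function name `word_accuracy` is made up for illustration; the 708 and 146 figures are the counts quoted above.

```python
def word_accuracy(letters, words, letter_accuracy):
    """Return (letter_errors, word_level_accuracy).

    Assumption: each letter error falls in a different word,
    so the number of damaged words equals the number of errors
    (capped at the total word count).
    """
    letter_errors = round(letters * (1 - letter_accuracy))
    damaged_words = min(letter_errors, words)
    return letter_errors, (words - damaged_words) / words


if __name__ == "__main__":
    for accuracy in (0.95, 0.99):
        errors, by_word = word_accuracy(708, 146, accuracy)
        print(f"{accuracy:.0%} by letter: {errors} errors, {by_word:.0%} by word")
    # 95% by letter: 35 errors, 76% by word
    # 99% by letter: 7 errors, 95% by word
```

If several errors happened to fall in the same word, the word-level figure would come out a little higher, so these numbers are a floor rather than a prediction.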