Sometimes documents arrive in the wrong format. If it’s a Word document and you don’t have Word, then it isn’t such a big deal, but if it’s a fax or printout, then you’re in real trouble. It can be frustrating in our email-based world to have to re-type documents to get them into a useable format. The answer to that particular problem is OCR – and in particular, Readiris Pro 9.
OCR has been around for a long time. In fact, years ago whole machines were built just to scan and process typed (on a typewriter) documents. In those days, when the average PC or Mac had less than a megabyte of RAM, desktop OCR was barely thought of. Now you can pick up a scanner for under £50, and be scanning documents into text with ease.
When you have Readiris installed, you may need to reinstall your scanner drivers so that the plug-ins are available to the application. Any TWAIN or Photoshop plug-in can hook into Readiris, so any scanner should be compatible. Once the software is able to see the scanner, the user tells it how to deal with the scanned documents. Text files can be produced as .txt, .rtf, .html or .pdf files. The application will then trigger your choice of reader to open the document.
In practice, we tested Readiris Pro with a number of increasingly difficult documents, ranging from simple text pages in a single font to mixed-media pages with different fonts, images and columns.
The initial tests were passed with flying colours. Accuracy was excellent, only faltering when confronted with scribbles or smudges. Just as impressive was the speed: pages were processed in seconds, a far cry from when I last looked at OCR software.
Simpler pages were processed in automatic mode, leaving all the judgements and adjustments to the software. When we moved to more-challenging pages, Readiris was able for the most part to recreate the pages in .rtf format. The recreated pages weren’t identical to the originals, but this is more of a limitation of the format than the OCR software. It was slightly better when output was set to HTML or PDF, but these formats are less suitable for editing.
Readiris uses four fonts to recreate pages. It also emboldens and italicizes to match the original page. Four fonts may not seem like many, but it’s enough to have two serif and two sans-serif fonts.
This helps recreate pages without adding dozens of fonts, and gets close enough. Occasionally you may find a sans serif font in a serif paragraph, or point sizes changing in a sentence, so some tweaking may be required when recreating pages. But compared to the challenge of laying out a whole page and typing it in, Readiris Pro is a lifesaver.
Readiris does other clever things that you may not notice at all. For example, older OCR packages were easily baffled by slightly slanting text. This is automatically taken care of by Readiris when the de-skewing feature is turned on. Similarly there is a de-speckle option for dealing with grainy originals. Possibly the best timesaver is the automatic orientation detection. Usually when you put a sheet on a scanner, it isn’t obvious which way up it should be. Upside-down pages usually come out as gobbledegook with OCR packages, but orientation detection takes care of that.
If the results you get aren’t quite what you’re after, it’s simple to go back and tweak things to get better results. One thing we found was a tendency to read some small images or icons as letters. This results in garbled text in the document, so it’s better to go back to Readiris and tell it that the element is an image rather than text. It means the process isn’t entirely automatic, but it gets results that are much more useable, saving time in the long run.
Readiris Pro boasts excellent language support, but unfortunately support for Unicode in OS X differs from OS 9. While the operating system supports Unicode, it relies on individual applications to implement it. In OS 9 Unicode was handled by the operating system, so languages worked well. Language support is now much more patchy, and for the purposes of OCR only Latin alphabet languages work properly in OS X. The boffins at Iris tell me that although Readiris can recognize 104 languages, there are system issues that complicate getting the languages out again in OS X. If you want multilingual support, you may have to use Readiris in OS 9.
If you habitually re-type documents then OCR is a must – and the latest Readiris Pro is about as good as you’ll find. I can’t say that it’s perfect on all documents; nothing is. But on simple documents it rarely falters. It’s only on more-complex documents that results are less acceptable. A little care and attention will keep even the busiest documents close to perfect.
It’s a shame that the language support isn’t as all-encompassing as on the Windows version, but it isn’t down to any failure on the part of Iris. As long as foreign languages use a Latin alphabet, there should be no problem.