Have you considered the quality of your documents?
When using digital documents and automated invoice extraction, the extraction rates may have nothing to do with the effectiveness of the invoice OCR software itself. How critical is invoice image quality to OCR? A simple check is whether the image passes what we call the squint test. If it doesn't, then the old computing principle of Garbage In, Garbage Out still applies. Extraction rates are most influenced by these factors:
Even if you receive documents in emails, they may have been scanned by the supplier. If the image is speckled, streaked, too dark or too light, then the pattern of dots seen by the OCR engine will not be correctly interpreted as characters.
Modern scanners and capture software have features to help create clean scans, even with difficult documents. Getting a clean image affects extraction rates more than anything else so it is worth investing in the right scanners and ensuring that suppliers send you clean images.
Above all DO NOT WRITE ON OR MARK THE INVOICE BEFORE IT IS SCANNED. Ticks and comments, date stamps and highlighter will ruin the OCR results. If you wish to check invoices manually for assurance or apply manual receipting processes, then do it after they have been scanned.
There is also a hidden image quality issue when you use colour documents. OCR engines require a black and white image, so the extraction software performs a conversion that you may not be aware of, and that conversion may not use the sophisticated techniques that scanners use to produce clean images.
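To illustrate why a naive colour conversion can lose information, here is a minimal sketch of a fixed-threshold conversion. It is not taken from any particular extraction product: the function names, the threshold of 128 and the sample colours are assumptions chosen for illustration, and the luminance weights are the common ITU-R BT.601 ones.

```python
def luminance(rgb):
    """Approximate perceived brightness (0-255) using BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_black_and_white(pixels, threshold=128):
    """Map each RGB pixel to 0 (black) or 1 (white) with one fixed threshold.

    The fixed threshold is an assumption for this sketch; scanners
    typically use more sophisticated, adaptive techniques.
    """
    return [
        [0 if luminance(pixel) < threshold else 1 for pixel in row]
        for row in pixels
    ]


# Hypothetical sample pixels: black text, white paper,
# dark blue ink, and pale orange text.
row = [(0, 0, 0), (255, 255, 255), (0, 0, 139), (255, 165, 0)]
print(to_black_and_white([row]))
```

In this sketch the pale orange pixel comes out white, the same as the paper, so text printed in that colour would simply disappear before the OCR engine ever sees it.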
Scan resolution is the number of dots per inch at which the page is scanned and captured. For good OCR, even with clean images, the resolution of the scan must be 300 dots per inch (DPI) as opposed to 200 DPI. Higher resolutions, greater than 300 DPI, are not necessarily better because few OCR engines can make use of them. The DPI is usually set by altering the scan configuration in the document capture software.
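If you only have the image file and not the scanner settings, the resolution can be estimated from the pixel dimensions and the physical page size. The following is a rough sketch under the assumption that the page is A4 (8.27 x 11.69 inches) and was scanned edge to edge; the function names are hypothetical.

```python
A4_INCHES = (8.27, 11.69)  # assumed page size: A4 width and height in inches


def effective_dpi(pixel_width, pixel_height, page_inches=A4_INCHES):
    """Estimate scan resolution as the lower of the two axis resolutions,
    rounded to the nearest whole number of dots per inch."""
    page_width, page_height = page_inches
    return round(min(pixel_width / page_width, pixel_height / page_height))


def meets_ocr_minimum(pixel_width, pixel_height, minimum_dpi=300):
    """True if the estimated resolution reaches the 300 DPI target."""
    return effective_dpi(pixel_width, pixel_height) >= minimum_dpi


print(effective_dpi(2480, 3508), meets_ocr_minimum(2480, 3508))
print(effective_dpi(1654, 2339), meets_ocr_minimum(1654, 2339))
```

An A4 page of 2480 x 3508 pixels works out at about 300 DPI, while 1654 x 2339 pixels is only about 200 DPI and would fall short of the target.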
If you are not using digital documents, then this is a new factor to consider. All invoice recognition systems work on the assumption that one image file is one invoice – and nothing but an invoice. If you are receiving single PDFs containing multiple invoices, then you must use the document capture software to enable staff to split one document into many documents. It is possible to do this automatically, but success rates are not high, and the extra cost is usually not justified unless your volumes are hundreds of thousands of invoices daily.
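As a sketch of the manual splitting step, suppose a member of staff marks the first page of each invoice in a combined PDF. The page ranges for the separate documents then follow directly. The function name and the 1-based, inclusive page numbering are assumptions for illustration, and the actual cutting of the file would be done by your document capture software.

```python
def invoice_page_ranges(total_pages, first_pages):
    """Turn operator-marked first pages into (start, end) page ranges.

    Pages are numbered from 1 and both ends of each range are inclusive.
    """
    starts = sorted(set(first_pages))
    if not starts or starts[0] != 1:
        raise ValueError("the first invoice must start on page 1")
    if starts[-1] > total_pages:
        raise ValueError("a marked page is beyond the end of the file")
    # Each invoice ends on the page before the next one starts;
    # the last invoice runs to the end of the file.
    ends = [start - 1 for start in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


# A 7-page PDF where invoices start on pages 1, 3 and 6.
print(invoice_page_ranges(7, [1, 3, 6]))
```

For the 7-page example this gives three documents covering pages 1 to 2, 3 to 5 and 6 to 7.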
If you are receiving documents other than invoices that support the invoice, e.g. detailed inventories, remittances, timesheets or copies of previous invoices, then those documents must be captured separately. If they are added as pages of the invoice document, the recognition software will be confused by all the extra data and will almost certainly extract the wrong data.
Invoice recognition software is, of necessity, designed to deal with varied layouts, but it makes certain assumptions which help to improve accuracy. It generally expects supplier details to be in the top half of the document, and it expects certain phrases or date formats. If the invoice layout differs from these norms, then certain values may not be extracted. The software itself can be trained to recognise these unusual formats, but there are certain disadvantages to doing so. It is also only cost-effective to use that approach if there are a large number of such invoices; for small volumes it is cheaper to correct the data manually.
All of the above factors are directly within your control if the documents are scanned or generated within your organisation. But if your suppliers are providing poor quality images, conjoined files or abnormal layouts, then it is out of your control. Or is it?
Before implementing automated invoice processing, contact all your suppliers to explain the new system, how they can ensure the best results, and how their invoices will be paid faster if they do so. Review the invoices from your top twenty suppliers to see if there are any layout issues. If so, it is well worth engaging with them directly to see if they can fix any of these issues. The main incentive to suppliers is rapid payment, but it is possible to provide other incentives to encourage best practice. Our experience shows that extraction rates can be improved by 10% just by suppliers making small changes.