Effects of Image Quality on OCR success

The Impact of Invoice Image Quality on OCR Recognition Rates

Have you considered the quality of your documents?                                           

When you use digital documents and automated invoice extraction, the extraction rates may have nothing to do with the effectiveness of the invoice OCR software itself. How critical is invoice image quality to OCR? A simple check is whether the image passes what we call the squint test. If it does not, the old computing principle of Garbage In, Garbage Out still applies. Extraction rates are most influenced by the following factors:

Image Quality

Even if you receive documents by email, they may have been scanned by the supplier. If the image is speckled, streaked, or too dark or too light, the pattern of dots seen by the OCR engine will not be correctly interpreted as characters.
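As a rough illustration of catching "too dark or light" scans before they reach the OCR engine, the sketch below measures how much of a greyscale page is ink. It is a crude automated stand-in for an eyeball check, not how any particular OCR product works. The function names and the cut-off values (`threshold`, `min_ink`, `max_ink`) are assumptions chosen for the example and would need tuning against your own documents.

```python
def ink_coverage(gray_rows, threshold=128):
    """Return the fraction of pixels darker than `threshold`.

    `gray_rows` is a list of rows, each a list of greyscale values
    from 0 (black) to 255 (white). The threshold is an assumed value.
    """
    total = 0
    dark = 0
    for row in gray_rows:
        for value in row:
            total += 1
            if value < threshold:
                dark += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return dark / total


def quality_check(gray_rows, min_ink=0.01, max_ink=0.30):
    """Classify a page as 'too light', 'too dark' or 'ok'.

    The limits are illustrative assumptions: a page with almost no ink
    is probably washed out, and a page that is mostly ink is probably
    too dark or heavily streaked.
    """
    coverage = ink_coverage(gray_rows)
    if coverage < min_ink:
        return "too light"
    if coverage > max_ink:
        return "too dark"
    return "ok"


if __name__ == "__main__":
    # A 10 x 10 white page with five black pixels on the first row.
    page = [[255] * 10 for _ in range(10)]
    page[0][:5] = [0] * 5
    print(quality_check(page))
```

A check like this cannot detect every problem (speckle and streaks need more than a single ratio), but it can flag the worst images for rescanning before any extraction is attempted.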

Modern scanners and capture software have features to help create clean scans, even with difficult documents. Getting a clean image affects extraction rates more than anything else so it is worth investing in the right scanners and ensuring that suppliers send you clean images.

Above all DO NOT WRITE ON OR MARK THE INVOICE BEFORE IT IS SCANNED. Ticks and comments, date stamps and highlighter will ruin the OCR results. If you wish to check invoices manually for assurance or apply manual receipting processes, then do it after they have been scanned.

There is also a hidden image quality issue when you use colour documents. OCR engines require a black and white image, so the extraction software performs a conversion that you may not be aware of, and that conversion may not use the sophisticated techniques that scanners use to produce clean images.
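To make that hidden conversion concrete, here is a minimal sketch of one common way to turn colour pixels into black and white: convert each pixel to grey using the ITU-R BT.601 luma weights, then choose a cut-off with Otsu's method. This is an assumed technique for illustration only; the source does not say which method any extraction software uses, and the function names are hypothetical.

```python
def to_gray(rgb):
    """Convert an (R, G, B) pixel to a grey level using BT.601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def otsu_threshold(gray_values):
    """Pick the grey level that best separates dark from light pixels.

    Otsu's method tries every threshold and keeps the one that
    maximises the between-class variance of the two groups.
    """
    hist = [0] * 256
    for value in gray_values:
        hist[value] += 1
    total = len(gray_values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0
    weight_bg = 0
    best_threshold = 0
    best_variance = -1.0
    for level in range(256):
        weight_bg += hist[level]
        if weight_bg == 0:
            continue  # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the dark group
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = level
    return best_threshold


def binarize(rgb_pixels):
    """Return 0 (black) or 1 (white) for each colour pixel."""
    gray = [to_gray(pixel) for pixel in rgb_pixels]
    threshold = otsu_threshold(gray)
    return [0 if value <= threshold else 1 for value in gray]
```

A single global threshold like this is exactly the kind of simple conversion that can go wrong on coloured backgrounds or highlighter, which is why it matters where in your process the conversion happens.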

Image Resolution 

This is the number of dots per inch at which the page is scanned and captured. For good OCR, even with clean images, the resolution of the scan must be 300 dots per inch (DPI) as opposed to 200 DPI. Resolutions higher than 300 DPI are not necessarily better because few OCR engines can make use of them. The DPI is usually set by altering the scan configuration in the document capture software.
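If you only have the image file and not the scanner settings, the resolution can be estimated from the pixel dimensions and the physical page size. The sketch below assumes a US Letter page (8.5 x 11 inches) by default; the function names and defaults are illustrative assumptions, so substitute your own page size.

```python
def effective_dpi(width_px, height_px, page_width_in, page_height_in):
    """Estimate scan resolution from pixel size and physical page size.

    Returns the lower of the horizontal and vertical values, since the
    weaker axis limits what the OCR engine can resolve.
    """
    if page_width_in <= 0 or page_height_in <= 0:
        raise ValueError("page dimensions must be positive")
    return min(width_px / page_width_in, height_px / page_height_in)


def meets_ocr_resolution(width_px, height_px,
                         page_width_in=8.5, page_height_in=11.0,
                         required_dpi=300):
    """Return True if the image reaches the required DPI for the page size."""
    dpi = effective_dpi(width_px, height_px, page_width_in, page_height_in)
    return dpi >= required_dpi


if __name__ == "__main__":
    # A Letter page scanned at 200 DPI is 1700 x 2200 pixels.
    print(meets_ocr_resolution(1700, 2200))
    # The same page scanned at 300 DPI is 2550 x 3300 pixels.
    print(meets_ocr_resolution(2550, 3300))
```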

Document Boundaries

If you are not using digital documents, then this is a new factor to consider. All invoice recognition systems work on the assumption that one image file is one invoice, and nothing but an invoice. If you are receiving single PDFs containing multiple invoices, then you must use the document capture software to enable staff to split one document into many documents. It is possible to do this automatically, but success rates are not high, and the extra cost is usually not justified unless your volumes are hundreds of thousands of invoices daily.
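The manual split described above comes down to simple bookkeeping: a member of staff marks the page on which each invoice starts, and the software works out the page range for each resulting document. The sketch below shows only that bookkeeping, with a hypothetical function name; it does not read or write PDF files.

```python
def split_page_ranges(total_pages, first_pages):
    """Turn operator-marked start pages into (start, end) page ranges.

    `total_pages` is the page count of the combined file and
    `first_pages` lists the 1-based page on which each invoice begins.
    """
    if total_pages < 1:
        raise ValueError("the file must contain at least one page")
    starts = sorted(set(first_pages))
    if not starts or starts[0] != 1:
        raise ValueError("the first invoice must start on page 1")
    if starts[-1] > total_pages:
        raise ValueError("a start page is beyond the end of the file")

    ranges = []
    for index, start in enumerate(starts):
        # Each invoice ends on the page before the next one starts;
        # the last invoice runs to the end of the file.
        if index + 1 < len(starts):
            end = starts[index + 1] - 1
        else:
            end = total_pages
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # A seven-page PDF with invoices starting on pages 1, 3 and 6.
    print(split_page_ranges(7, [1, 3, 6]))
```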

If you are receiving documents other than invoices that support the invoice, e.g. detailed inventories, remittances, timesheets or copies of previous invoices, then those documents must be captured separately. If they are added as pages of the invoice document, the recognition software will be confused by all the extra data and will almost certainly extract the wrong data.

Invoice Layout

Invoice recognition software is, of necessity, designed to deal with varied layouts, but it makes certain assumptions that help to improve accuracy. It generally expects supplier details to be in the top half of the document, and it expects certain phrases or date formats. If the invoice layout differs from these norms, then certain values may not be extracted. The software itself can be trained to recognise these unusual formats, but there are certain disadvantages to doing so, and that approach is only cost-effective if there are a large number of such invoices. For small volumes it is cheaper to correct the data manually.
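To see why an unusual format can lead to a value not being extracted, consider a toy date finder that only knows a handful of expected formats. This is a simplified sketch and not how any particular recognition product is implemented; the format list and function name are assumptions, and it treats slash-separated dates as day first.

```python
import re
from datetime import datetime

# Assumed list of "expected" date formats: a pattern to find the text
# and the matching strptime format to parse it.
EXPECTED_DATE_FORMATS = [
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "%d/%m/%Y"),            # 14/03/2024
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),            # 2024-03-14
    (re.compile(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b"), "%d %B %Y"),    # 14 March 2024
]


def find_invoice_date(text):
    """Return the first date found in an expected format, or None."""
    for pattern, date_format in EXPECTED_DATE_FORMATS:
        match = pattern.search(text)
        if match is None:
            continue
        try:
            return datetime.strptime(match.group(0), date_format).date()
        except ValueError:
            # Looked like a date but is not a valid one; try the next format.
            continue
    return None


if __name__ == "__main__":
    print(find_invoice_date("Invoice Date: 14/03/2024"))
    # A format outside the expected list is simply not found.
    print(find_invoice_date("Dt 14.III.24"))
```

The second call finds nothing because the format is not on the list. Extending the list is the equivalent of training the software for an unusual layout, which is only worthwhile when enough invoices use it.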

All of the above factors are directly within your control if the documents are scanned or generated within your organisation. If your suppliers are providing poor quality images, conjoined files or abnormal layouts, then it is out of your control. Or is it?

Before automated invoice processing is implemented, contact all your suppliers to explain the new system, how they can ensure the best results, and how their invoices will be paid faster if they do so. Review the invoices from your top twenty suppliers to see if there are any layout issues. If so, it is well worth engaging with them directly to see if they can fix any of these issues. The main incentive to suppliers is rapid payment, but it is possible to provide other incentives to encourage best practice. Our experience shows that extraction rates can be improved by 10% just by suppliers making small changes.

For more information

Certified Oracle Integration with Oracle ERP applications

Seamless Integration

The importance of integration is massively overlooked. Why? Because the whole point of automation is to save time and money, and importing a flat file into your ERP with no integration does the opposite. Our integration gives you the full benefit of validating the data your ERP requires from an invoice to meet your business processes, while making sure there are no errors or additional manual tasks.


Providing OCR Feedback in Mi Invoices

This tips and tricks episode on automated invoice processing shows how the AP team can easily provide OCR feedback using Mi Invoices, transforming and enhancing Oracle ERP Cloud or E-Business Suite Accounts Payable processing and significantly reducing the time and effort required.

