Data Capture from Documents in Business Processes
Business processes typically handle large volumes of documents from which data is captured and keyed into databases for downstream use or analysis. Some typical document examples include:
Depending on the document type and business specification, the data capture effort varies. For example, a bank cheque deposit requiring 8-10 different data fields (Payee Name, Payee Address, Bank Name, Cheque Date, Cheque Amount, Routing Number, Cheque Number, Account Number, Signature) to be captured takes a human agent approximately 3 minutes.
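To make the cheque example concrete, here is a minimal Python sketch of a record holding the nine fields named above, plus a rough effort estimate based on the approximately 3 minutes per cheque quoted in the text. The class name, field types, and helper functions are illustrative assumptions, not part of any particular product.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ChequeDepositRecord:
    """Fields named in the text; names and types are assumptions."""
    payee_name: Optional[str] = None
    payee_address: Optional[str] = None
    bank_name: Optional[str] = None
    cheque_date: Optional[str] = None
    cheque_amount: Optional[str] = None
    routing_number: Optional[str] = None
    cheque_number: Optional[str] = None
    account_number: Optional[str] = None
    signature_present: Optional[bool] = None


def missing_fields(record: ChequeDepositRecord) -> list:
    """Return the names of fields that have not been captured yet."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Approximate handling time per cheque, taken from the text.
MINUTES_PER_CHEQUE = 3


def manual_effort_hours(cheque_count: int,
                        minutes_per_cheque: float = MINUTES_PER_CHEQUE) -> float:
    """Estimate agent hours needed to key the given number of cheques."""
    return cheque_count * minutes_per_cheque / 60


if __name__ == "__main__":
    record = ChequeDepositRecord(payee_name="A. Sample", bank_name="Sample Bank")
    print(missing_fields(record))
    print(manual_effort_hours(1000))
```

At that rate, a batch of 1,000 cheques works out to roughly 50 agent hours before any quality checks are counted.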
Traditionally, this intensive work was done by teams of human agents who viewed each document in one window while keying the data into another system in parallel. One or more cycles of quality checks then followed to certify the data as good for use.
Document processing has many complexities, including but not limited to: structured vs unstructured layouts, text / image / hybrid formats, printed vs handwritten content, hand-printed vs cursive writing, tables, radio buttons, checkboxes, logos, fonts, and languages. Data capture agents had to understand all of these, together with the nuances of their industry and process, to ensure that the final extracted data was both relevant and correct.
A data field on a document may be available in any of the following types:
While data appears in a certain way on the document, it is essential during data capture or extraction to convert it to a standardized form that is readily usable for subsequent processing. A few illustrations are given below:
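As one hedged illustration of such standardization, the Python sketch below normalizes three kinds of captured values: dates to ISO 8601, amounts to a two-decimal number, and free text to a whitespace-collapsed uppercase form. The function names, the accepted date formats, and the day-first reading of ambiguous dates are assumptions chosen for the example; a real process would follow its own business rules.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Assumed input formats; day-first order is an assumption, not a universal rule.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d %b %Y", "%d %B %Y", "%Y-%m-%d")


def normalize_date(raw: str):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_amount(raw: str):
    """Strip currency symbols and thousands separators; return a Decimal or None."""
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    try:
        return Decimal(cleaned).quantize(Decimal("0.01"))
    except InvalidOperation:
        return None


def normalize_text(raw: str) -> str:
    """Collapse repeated whitespace and uppercase the value."""
    return " ".join(raw.split()).upper()


if __name__ == "__main__":
    print(normalize_date("5 Mar 2021"))
    print(normalize_amount("$1,250.5"))
    print(normalize_text("  john   smith "))
```

Returning `None` for values that cannot be parsed, rather than guessing, leaves those fields available for review by a human agent.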
In recent years, automation solutions using smart OCR/ICR engines have transformed data extraction by significantly complementing manual processes. This has helped human agents move higher up the value chain of the business process.
There are many players providing products or solutions capable of handling select types of documents. While some focus on a specific task in the data capture process (e.g. OCR or data extraction from textual documents), others provide a suite of technologies to serve the business line (e.g. ingestion, workflow, allocation and SLA management, data export and reporting).
With improvements in the quality of recognition, businesses are realizing better efficiencies, boosting both productivity and quality. The widespread adoption of these technology solutions across industries is proving to be an important lever for digital transformation programs.