Optical Character Recognition (OCR)

Pruthvidhar Pendyala

Published Aug 26, 2016

Optical Character Recognition (OCR) is the process of electronically extracting text from images. The extracted text can be reused in a variety of ways such as document editing, free-text searches, or compression.

The OCR process involves the mechanical or electronic conversion of scanned images of handwritten or typewritten text into machine-encoded text. It is a common method of digitizing printed texts so that they can be electronically searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, and text mining.

OCR technology can be applied across a wide range of industries to streamline document management. It turns scanned documents from plain image files into fully searchable documents. OCR can also extract relevant information from scanned documents and enter it automatically into a database, which makes data entry more accurate and information processing more efficient.
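As a rough illustration of the "extract and enter into a database" step, the sketch below takes text that an OCR engine has already produced and pulls a few fields out of it with regular expressions before storing them in SQLite. The invoice layout, the field labels, and the table name are all assumptions made up for this example; real documents would need patterns tuned to their own layout.

```python
import re
import sqlite3

# Hypothetical OCR output for a scanned invoice; the labels and layout
# are assumptions made for this sketch.
ocr_text = """
Invoice No: INV-1042
Date: 2016-08-01
Total: 1250.75
"""

# One pattern per field we want to capture from the recognized text.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice No:\s*(\S+)"),
    "invoice_date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*(\d+(?:\.\d+)?)"),
}


def extract_fields(text):
    """Return a dict of field name -> captured string (None if missing)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


def store_invoice(conn, fields):
    """Insert the extracted fields into the (hypothetical) invoices table."""
    total = float(fields["total"]) if fields["total"] else None
    conn.execute(
        "INSERT INTO invoices (invoice_no, invoice_date, total) VALUES (?, ?, ?)",
        (fields["invoice_no"], fields["invoice_date"], total),
    )
    conn.commit()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_no TEXT, invoice_date TEXT, total REAL)")

fields = extract_fields(ocr_text)
store_invoice(conn, fields)
rows = conn.execute("SELECT invoice_no, invoice_date, total FROM invoices").fetchall()
print(rows)
```

A missing field is stored as NULL rather than guessed, so that incomplete records can be flagged for manual review.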

There are three essential elements in OCR technology:

Scanning
Recognition
Reading Text

Initially, a printed document is scanned by a camera. OCR software converts the image into recognized characters and words. The synthesizer in the OCR system then speaks the recognized text. Finally, the information is stored in an electronic form, either in a personal computer (PC) or the memory of the OCR system itself.
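The recognition step can be illustrated with a toy version of matrix (template) matching, one of the classic OCR techniques: each scanned glyph is compared pixel by pixel against stored reference shapes, and the closest one wins. The 5x3 bitmaps and the three-letter alphabet below are invented for this sketch; production OCR engines use far richer features and classifiers.

```python
# Reference glyphs as 5x3 bitmaps ('#' = ink, '.' = background).
# These tiny templates are assumptions made for illustration only.
TEMPLATES = {
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
    "E": ["###", "#..", "###", "#..", "###"],
}


def pixel_distance(glyph, template):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize_glyph(glyph):
    """Return the character whose template differs in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))


def recognize_word(glyphs):
    """Recognize a sequence of already-segmented glyph bitmaps."""
    return "".join(recognize_glyph(g) for g in glyphs)


# A scanned "E" with one pixel lost to noise in the bottom-right corner.
noisy_e = ["###", "#..", "###", "#..", "##."]

scanned = [TEMPLATES["T"], TEMPLATES["H"], noisy_e]
word = recognize_word(scanned)
print(word)  # THE
```

Because the match is based on the nearest template rather than an exact one, a small amount of scanning noise does not change the result.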

The recognition process takes account of the logical structure of the language. An OCR system will deduce that the word "tke" at the beginning of a sentence is a mistake and should be read as the word "the." OCR systems also use a lexicon and apply spell-checking techniques similar to those found in many word processors.
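A minimal sketch of lexicon-based correction, using the "tke" example above, might look like the following. It relies on Python's standard difflib module to find the closest lexicon entry; the tiny lexicon and the helper names are assumptions for this example, not how any particular OCR product works.

```python
import difflib

# A deliberately tiny, hypothetical lexicon; a real system would use a
# full dictionary plus language statistics.
LEXICON = ["the", "cat", "sat", "on", "mat"]


def correct_word(word, lexicon=LEXICON, cutoff=0.6):
    """Return the closest lexicon entry, or the word unchanged if none is close."""
    lowered = word.lower()
    if lowered in lexicon:
        return lowered
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text):
    """Apply word-level correction to whitespace-separated OCR output."""
    return " ".join(correct_word(w) for w in text.split())


corrected = correct_text("tke cat sat on tke mat")
print(corrected)  # the cat sat on the mat
```

String similarity alone is not enough with a larger lexicon: if "take" were included, it would score higher than "the" for the input "tke". This is why the surrounding sentence structure matters when choosing among candidate corrections.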

All OCR systems create temporary files containing the text's characters and page layout. In some OCR systems, these temporary files can be converted into formats readable by commonly used software such as word processors, spreadsheets, and databases.

OCR has many application areas; the most common are listed below:

Banking

OCR is widely used in banking, where it processes cheques without human involvement. A cheque is inserted into a machine, the text on it is scanned, and the correct amount of money is transferred.

This technology has nearly been perfected for printed cheques and is fairly accurate for handwritten cheques as well, though manual verification is occasionally needed for approvals and confirmation. It has reduced both the waiting time in many banks and the human effort needed.

Blind and visually impaired persons

One of the major motivations behind OCR research was to devise a system that could read a book aloud to blind people. As part of this research, the flatbed scanner, now commonly known as the document scanner, was developed.

OCR technology offers blind and visually impaired persons the capacity to scan printed text and then have it spoken back in synthetic speech or saved to a computer. Little technology exists to interpret graphics such as line art, photographs, and graphs into a medium that is easily accessible to blind and visually impaired persons. It is also not yet possible to convert handwriting, whether script or block printing, into an accessible medium.

The blind or visually impaired user can access the scanned text by using adaptive technology devices that magnify the computer screen or provide speech or braille output.

Legal department

There is a significant movement to digitize paper documents in the legal industry. To save space and eliminate the need to sift through boxes of paper files, documents are being scanned and entered into computer databases. OCR further simplifies the process by making documents text-searchable, so that they are easier to locate and work with once in the database. Legal professionals now have fast, easy access to a huge library of documents in electronic format, which they can find simply by typing in a few keywords.
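Keyword lookup over OCR'd documents is often built on an inverted index, which maps each word to the documents that contain it. The sketch below is one simple way to do this; the document IDs and their text are invented for the example, and the text is assumed to have already been produced by OCR.

```python
import re
from collections import defaultdict

# Hypothetical OCR output, keyed by an invented document ID.
documents = {
    "contract_001": "Lease agreement between the landlord and the tenant",
    "brief_017": "Brief on the tenant dispute over the lease deposit",
    "memo_203": "Internal memo about the trademark filing",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(result)


index = build_index(documents)
hits = search(index, "lease tenant")
print(hits)  # ['brief_017', 'contract_001']
```

The index is built once when documents are scanned, so each later search only touches the entries for the words in the query instead of rereading every document.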

Retail Industry

Barcode recognition technology is closely related to OCR. Because barcodes appear on most everyday consumer goods, their use is familiar to most of us.

Other Uses

OCR is widely used in many other fields, including education, finance, and government agencies. OCR has made countless texts available online, saving money for students and allowing knowledge to be shared. Invoice imaging applications are used in many businesses to keep track of financial records and prevent a backlog of payments from piling up. In government agencies and independent organizations, OCR simplifies data collection and analysis, among other processes. As the technology continues to develop, more and more applications are found for OCR technology, including increased use of handwriting recognition.

Taking a few minutes to run OCR on our PDF documents is all it takes to turn them from basic images of paper documents into fully digitized documents that we can search, copy, mark up, and otherwise work with like any normal file.

The following tools are considered among the best for OCR:

gImageReader
Capture2Text
VueScan
