Text Extraction in Python with Neural Networks: Deep Learning for Image Processing
Image capture makes a snapshot in time of a person, place, or object. Cameras are built into many everyday devices, and taking pictures is integrated into daily life. When a picture is taken, the device often recognizes its content and applies automatic corrections. Taking that further, Optical Character Recognition (OCR) can take a picture of text and produce a usable file that matches the original document. Defining what a picture contains, understanding its content, is a complex task. OCR addresses this, and at its core OCR is knowledge extracted from images.
Why AI?
Creating software that translates an image into text is sophisticated, but it has become easier with updates to libraries in common tools such as pytesseract in Python. It is a complicated task: the image must be statistically evaluated so that each portion is assigned the highest-probability match for a recognizable letter. These pieces are then placed together to output a result that matches the original text without error. One approach is deep learning with a recurrent neural network (RNN), specifically Long Short-Term Memory (LSTM), which takes an image as input and outputs the text from the image to a file. This is known as text extraction from an image.
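The "highest-probability match" idea can be illustrated with a toy sketch. This is not tesseract's actual internals; the candidate set, scores, and function names below are invented for illustration. For each slice of the image, a model produces a score per candidate character, and the engine keeps the best-scoring one and concatenates the results:

```python
# Toy illustration only: a real OCR engine scores thousands of
# character classes per image region with a trained LSTM.
CANDIDATES = "AEHLO"

def best_char(scores):
    """Pick the candidate character with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CANDIDATES[best]

def decode(score_rows):
    """Concatenate the best match for each image slice."""
    return "".join(best_char(row) for row in score_rows)

# Hypothetical per-slice scores for five slices of an image
rows = [
    [0.10, 0.00, 0.80, 0.05, 0.05],  # 'H' scores highest
    [0.10, 0.70, 0.10, 0.05, 0.05],  # 'E'
    [0.00, 0.10, 0.10, 0.75, 0.05],  # 'L'
    [0.00, 0.10, 0.10, 0.75, 0.05],  # 'L'
    [0.10, 0.10, 0.10, 0.10, 0.60],  # 'O'
]
print(decode(rows))  # HELLO
```

A real engine adds a language model on top of this per-character step, which is part of why LSTM-based OCR outperforms character-by-character matching.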
Project, Image to Text
For this example, take a picture of a receipt and save it to a local directory. Next, open Python with the pytesseract and cv2 libraries installed. With very little code, the image can be converted to text: layers of learning identify text within the image, and layers of repetition "drop out" everything else, leaving only text. For this project, pytesseract is pretrained to find only letters and numbers from the English language and will exclude any information outside that defined set. Output is written to a file in the local directory.
import cv2
import sys
import pytesseract

if __name__ == '__main__':

    if len(sys.argv) < 2:
        print('Usage: python ocr_receipt.py receipt.jpg')
        sys.exit(1)

    # Read image path from command line
    imPath = sys.argv[1]

    # Uncomment and complete the line below to provide the path to tesseract
    # pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'

    # Parameters: '-l eng' for the English language, '--oem 1' for the LSTM OCR engine
    config = ('-l eng --oem 1 --psm 3')

    # Read image from disk
    im = cv2.imread(imPath, cv2.IMREAD_COLOR)

    # Run tesseract OCR on image
    text = pytesseract.image_to_string(im, config=config)

    # Write recognized text to file
    f = open('receipt_text.txt', 'w')
    f.write(text)
    f.close()
Conclusion
LSTM-based models excel at complex, well-defined tasks that can be learned from a training set. Python offers several approaches to text recognition, and pytesseract is an excellent library for image-to-text processing, providing a way to learn from an image and generate knowledge through analytics.
Note: This was originally published November 15, 2020 on Medium and added to LinkedIn as an article June 10, 2021.
Example based on: https://www.learnopencv.com/deep-learning-based-text-recognition-ocr-using-tesseract-and-opencv/
More information on Python: https://docs.python.org/3/tutorial/index.html