From the course: Intelligent Automation Foundations
Vision technologies
- [Instructor] Now let's dive deeper into the four capabilities I described earlier. This time, I'll go over some of the impressive underlying technologies that make up these intelligent automation capabilities. Vision technologies consist of OCR, ICR, image and video analysis and biometrics. For each technology, I'll provide an overview covering its definition (what the technology is), its usage (how it can be applied) and my opinions on its key advantages and limitations.

Let's begin with OCR, or optical character recognition. OCR works by recognizing letters and numbers within a scanned image, and then compiling and grouping them into words and sentences as they appear. This technology enables a computer to digitize non-digital text, such as the text on a page of a physical book. Before digitization, a scanned document is simply an image file to the computer; afterward, it becomes text that can be edited, copied, extracted and so on. OCR is best suited to straightforward, standard plain-text documents, where it excels at quick and easy digitization. However, when you move into more complex use cases where formats and fonts vary, OCR hits its limitations very quickly. For example, unusual fonts and handwriting will hinder OCR drastically, as will forms that contain tables and fields rather than just plain text.

This is where the second technology comes into play: ICR, or intelligent character recognition, which is closely associated with IDP, or intelligent document processing. ICR combines OCR technology with machine-learning algorithms to digitize documents that would otherwise be too complex for OCR alone. It does this through self-tuning algorithms that require large sets of example data paired with the expected outputs. This enables ICR to interpret handwritten text and handle varying document layouts, such as those commonly seen in contracts or invoices.
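To make the grouping step concrete: after an OCR engine (or ICR, which builds on OCR) has recognized individual characters and where they sit on the page, it still has to assemble them into words and lines. The sketch below is a minimal Python illustration of only that final assembly step. It assumes a hypothetical recognizer has already produced each character with a bounding box; the `Glyph` class, the `group_glyphs` function and the pixel thresholds are made up for illustration and are not the API of any real OCR library. Real engines use far more robust layout analysis.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One recognized character with its position on the page.

    Hypothetical format: stands in for whatever a real OCR engine reports.
    """
    char: str
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int   # box width, in pixels


def group_glyphs(glyphs, word_gap=5, line_tolerance=5):
    """Assemble recognized characters into words and lines of text.

    word_gap: horizontal gap (pixels) above which a new word starts.
    line_tolerance: max vertical offset (pixels) for glyphs on the same line.
    Both defaults are illustrative assumptions, not tuned values.
    """
    # 1. Bucket glyphs into lines: sort top-to-bottom, and start a new line
    #    whenever a glyph sits too far from the first glyph of the current line.
    lines = []
    for g in sorted(glyphs, key=lambda glyph: (glyph.y, glyph.x)):
        if lines and abs(g.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(g)
        else:
            lines.append([g])

    # 2. Within each line, read left-to-right and split words on large gaps.
    text_lines = []
    for line in lines:
        line.sort(key=lambda glyph: glyph.x)
        words = [line[0].char]
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > word_gap:
                words.append(cur.char)   # big gap: start a new word
            else:
                words[-1] += cur.char    # small gap: same word
        text_lines.append(" ".join(words))
    return "\n".join(text_lines)


# Toy input, deliberately out of reading order, as a recognizer might return it.
sample = [
    Glyph("O", 0, 40, 8), Glyph("U", 50, 10, 8), Glyph("H", 0, 10, 8),
    Glyph("K", 10, 41, 8), Glyph("Y", 30, 10, 8), Glyph("I", 10, 11, 8),
    Glyph("O", 40, 9, 8),
]
print(group_glyphs(sample))  # prints "HI YOU" on one line, then "OK"
```

Even this toy version shows why OCR struggles with tables and forms: fixed gap rules that work for a single column of plain text break down once fields and columns sit side by side.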
The limitations of ICR stem from it being much more advanced than OCR. It's more costly, it takes a lot more time to implement and train properly, and it requires processes with a significant volume of documents. Like OCR, ICR is still limited to some degree by the quality of its input: low-quality scanned images will reduce accuracy.

The third technology in the vision category is image and video analysis. This technology allows for the extraction of data from digital images and videos. Computers "see" images and videos through deep learning, a method of machine learning that uses artificial neural networks loosely modeled on the human brain. By training on enough images of the same kind of object, a deep-learning algorithm can learn to identify it accurately. Going back to the self-driving car example, deep-learning algorithms trained on millions of relevant images help the cars' navigation systems identify lane markings, signs and people quickly and accurately. The challenge with image and video analysis is that even high-quality datasets can produce misleading results if the data is not as varied as the real world. This happens often enough to make the rounds on the internet. Look at these blueberry muffins, for example. You can tell they're muffins, right? Well, an algorithm might mistake them for dogs, Chihuahuas to be exact, as seen here, because the visual similarity activates enough artificial neurons for the model to conclude "dog" as the final answer.

The last technology in this category is biometrics. It involves measuring unique physical and behavioral characteristics of people, as in facial or fingerprint recognition. Algorithms then interpret and validate these measurements, enabling identification and authentication.
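One common way to frame biometric validation is to compare a new sample against an enrolled template, produce a similarity score, and accept the identity only if the score reaches a threshold. The sketch below is a minimal Python illustration under that assumption. The `verify` and `error_rates` helpers are hypothetical, and the score lists are invented toy numbers, not measurements from any real system. It shows how false positives (accepting the wrong person) and false negatives (rejecting the right person) trade off against each other as the threshold moves.

```python
def verify(score, threshold):
    """Accept the claimed identity if the similarity score reaches the threshold."""
    return score >= threshold


def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_negative_rate, false_positive_rate) at the given threshold.

    genuine_scores: scores from comparing people against their own template.
    impostor_scores: scores from comparing people against someone else's template.
    """
    false_negatives = sum(not verify(s, threshold) for s in genuine_scores)
    false_positives = sum(verify(s, threshold) for s in impostor_scores)
    return (false_negatives / len(genuine_scores),
            false_positives / len(impostor_scores))


# Invented toy scores (0 = no match, 1 = perfect match), not real measurements.
genuine = [0.91, 0.85, 0.78, 0.95, 0.66]
impostor = [0.20, 0.35, 0.72, 0.10, 0.41]

for t in (0.6, 0.7, 0.8):
    fnr, fpr = error_rates(genuine, impostor, t)
    print(f"threshold {t}: false negatives {fnr:.0%}, false positives {fpr:.0%}")
```

In this toy data, raising the threshold from 0.6 to 0.8 eliminates the one false positive but rejects two of the five genuine users. No single threshold removes both kinds of error, which is why a fallback path is still needed.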
Automating identification adds a layer of security to software applications, and it can also reduce the need for physical verification, which is slow, inconvenient and resource-intensive by comparison. Biometrics can enhance an organization's resilience to security breaches, but it also comes with challenges. One is that automated biometric systems occasionally make errors when processing image data, producing false positives, where the wrong person is accepted, or false negatives, where the right person is rejected. These errors require contingency plans to avoid inconvenience or breaches.

We covered a lot there. In summary, the vision capability of intelligent automation is made up of the many ways a computer can see: scanned images and other visual data, computer screens, three-dimensional objects and even the characteristics of people accessing systems. Those digital eyes are an important part of almost every intelligent automation deployment.