A Machine Learning Primer for Auditors
If you’re an internal auditor and not yet familiar with machine learning (ML), then you’re missing out. ML is likely being piloted or even used at your organization, so you should be familiar with it as it likely influences your organization’s internal control structure. More importantly perhaps, ML is a tool that can help you be a better internal auditor.
Visual Risk IQ helps finance and audit teams get up both the learning curve and the doing curve with data analytics and visual reporting. This post explains some of the basic concepts relating to ML, breaks down its major categories, and offers potential uses for ML analytics for your audit team. In our next post, we’ll dive more deeply into a specific application of machine learning.
First, a warning: This post is a bit jargon- and concept-heavy. If you are only interested in applications of machine learning, then check out our future posts. If you’re an aspiring data nerd and sometimes feel lost in all the vocabulary around machine learning, then you’re in the right place.
What is Machine Learning vs Artificial Intelligence?
Machine learning is a subset of artificial intelligence. If we think of artificial intelligence as machines “thinking” the way humans think, then we can think of machine learning as machines drawing conclusions from information in the same way humans do: by taking in information, identifying relationships and patterns in the data, and developing a model of how the world works. Machine learning happens when a machine produces a predictive model. If you’ve watched any dystopian science-fiction movies (e.g. WarGames, Minority Report, Eagle Eye) then you know the importance of testing these models. Just like with people, the expectations machines have based on these models can be good (i.e. useful, predictive) or bad (i.e. misleading, caused by spurious correlations).
In other words, machine learning means that the computer is taking in data and using one algorithm (a machine learning algorithm) to produce another algorithm (the model). That resulting model is usually used to tell the computer how to handle future, similar information.
Key Concept: Machine learning means the machine is drawing conclusions about how information is related. Machine learning is evidenced by the machine producing a predictive model.
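To make the “one algorithm produces another algorithm” idea concrete, here is a minimal sketch in Python. The function name fit_line and the invoice data are our own illustrative assumptions, not part of any particular tool. The learning algorithm here is ordinary least squares on a single field; what it returns is the model, a small function that can be applied to new data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Learning algorithm: ordinary least squares with one input field."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar

    def model(x):
        """The learned model: predicts y for a new x."""
        return intercept + slope * x

    return model

# Hypothetical training data: invoice line count vs. minutes to process
invoice_lines = [1, 2, 3, 4]
minutes = [3.0, 5.0, 7.0, 9.0]

model = fit_line(invoice_lines, minutes)  # the machine "learns"
prediction = model(10)                    # the model handles new, similar data
print(prediction)
```

Note that fit_line is never told the relationship between the two fields. It infers the relationship from the data and hands back a model that encodes it.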
Types of Machine Learning: Supervised vs Unsupervised
Two main types of machine learning are supervised and unsupervised learning. It is a common mistake to think that unsupervised learning is a more sophisticated method of achieving the same goals as supervised learning, but this is not the case.
Supervised Learning
In supervised learning we give the computer a goal. We tell the computer, “I care about the value in this field and I want you to take all the other information I’ve given you to help me predict the value in this field.” That field is called the target variable. Supervised learning algorithms take in: 1) a data set, and 2) instructions on which field is the target variable. The learning algorithm then produces a model (algorithm / formula) to predict the value of the target variable.
Within supervised learning, there are two sub-types: regression and classification. The distinction relates to whether the target variable is numerical (regression) or categorical (classification).
Regression: If you ask the computer, what does a normal salary look like for a group of employees based on title, department, location, etc., then you are using regression. The target variable in this case is the salary. Regression algorithms are used when the target variable could be any numerical value. Auditors use regression to help uncover biases and outliers.
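As a rough sketch of the salary example, the Python below fits about the simplest regression possible: when job title is the only predictor, the least-squares estimate of a “normal” salary is the average salary for that title. The employee records, the function name fit_expected_salary, and the 25% review threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical payroll extract: (employee id, title, annual salary)
employees = [
    ("E1", "Analyst", 60000),
    ("E2", "Analyst", 62000),
    ("E3", "Analyst", 61000),
    ("E4", "Analyst", 95000),
    ("E5", "Manager", 90000),
    ("E6", "Manager", 94000),
    ("E7", "Manager", 92000),
]

def fit_expected_salary(rows):
    """Learn the expected salary (the target variable) for each title."""
    by_title = defaultdict(list)
    for _, title, salary in rows:
        by_title[title].append(salary)
    expected = {title: mean(salaries) for title, salaries in by_title.items()}

    def model(title):
        return expected[title]

    return model

model = fit_expected_salary(employees)

# Flag employees whose actual salary differs from the prediction by more than 25%
flagged = []
for emp_id, title, salary in employees:
    predicted = model(title)
    if abs(salary - predicted) / predicted > 0.25:
        flagged.append(emp_id)

print(flagged)
```

A real model would use more fields (department, location, tenure) and a more robust fit, since a large outlier pulls the average toward itself. The pattern stays the same: predict the target, then review the records that the prediction explains poorly.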
Classification: If instead you ask the computer, based on what we know about our historical timecards, overtime, and payroll adjustments, is this disbursement likely to be an improper payment, yes or no, then you are using classification. The target variable in this case is the yes/no flag. Classification techniques are helpful for diagnosing problems, automating selections, and stratifying populations. Using the improper disbursement example, an auditor could develop a model that scans disbursements and assigns them to multiple risk categories such as High/Medium/Low or Full Review/Sample Test/No Review Required. Another use of classification algorithms is diagnostic analytics. Instead of using the model to predict future values, the target variable flags problems and the model is used to uncover the cause of those problems.
Key Concept: Regression algorithms are used when the target variable could be any numerical value. Classification algorithms are used when the target variable is categorical or Boolean (true/false).
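Here is a small classification sketch along the same lines. It learns a one-rule classifier (sometimes called a decision stump): the single cutoff on one numeric field that misclassifies the fewest historical records. The overtime figures, the improper-payment labels, and the name fit_stump are assumptions for illustration only.

```python
def fit_stump(values, labels):
    """Find the cutoff on one numeric field that misclassifies the fewest rows.

    Records at or above the cutoff are predicted True (improper).
    """
    best_threshold, best_errors = None, len(values) + 1
    for candidate in sorted(set(values)):
        errors = sum((v >= candidate) != label for v, label in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors

    def model(value):
        return value >= best_threshold

    return model, best_threshold

# Hypothetical history: overtime hours on a timecard, and whether the
# related disbursement was later found to be improper (the target variable)
overtime_hours = [0, 2, 3, 5, 12, 15, 18, 20]
improper = [False, False, False, False, True, True, False, True]

model, threshold = fit_stump(overtime_hours, improper)

print(threshold)   # the learned cutoff
print(model(14))   # classify a new disbursement
print(model(4))
```

Because the learned rule is a single readable cutoff, it also works for the diagnostic use described above: the model itself tells you which value of the field separates problem records from clean ones.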
Unsupervised Learning
Unsupervised machine learning simply means that no target or goal is given to the machine, so there is no field it is trying to predict. Unsupervised learning algorithms generally serve one of two purposes: 1) To group things that are similar, or 2) To reduce the number of fields you need to look at while working with your data. These two purposes are commonly called clustering and dimensionality reduction. These techniques, particularly clustering, can be used by auditors in risk assessments, for example to segment populations of entities (such as store locations, vendors, or customers).
Key Concept: Supervised learning means an objective is provided to the machine by a human. Unsupervised learning means no such objective is set.
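For comparison, here is a minimal clustering sketch using k-means on a single field. No target variable is supplied; the algorithm only groups similar values together. The store refund figures, the function name kmeans_1d, and the simple deterministic starting points are our own assumptions for illustration.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=10):
    """Group numbers into k clusters (k >= 2) by repeatedly moving each
    cluster center to the average of the values nearest to it."""
    ordered = sorted(values)
    # Deterministic start: pick k values spread evenly through the sorted data
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]

    # Final cluster label for each value, in the original order
    labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
    return centers, labels

# Hypothetical data: refunds as a percentage of sales for seven store locations
refund_pct = [2, 3, 4, 3, 20, 22, 21]

centers, labels = kmeans_1d(refund_pct, k=2)
print(centers)
print(labels)
```

The algorithm was never told which stores are “high risk.” It only reports that the stores fall into two groups, and it is up to the auditor to decide what, if anything, the grouping means.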
Conclusion
This post helps explain some of the vocabulary associated with machine learning and offers some ideas of where and how to get started. Unsupervised learning is useful in the right situations, but for auditors just beginning to incorporate machine learning into their work, we recommend supervised learning.
Supervised learning is easier to conceptualize, so auditors are quicker to recognize potential use cases for it. We’ll discuss use cases for both supervised and unsupervised learning in future posts.