Talking about Machine Learning: How A Model is Built and Improved
In our last post, we shared what we have learned about the problem of cloud infrastructure cost planning and prediction, and how we apply data to build ML models that solve it. In this post, we use our ML model to demonstrate an approach to building models, choosing the right methods, and measuring and improving a model's quality.
You have created ML models to analyze how cloud resources are used, in order to predict and control infrastructure costs.
What methods does your model employ?
We apply supervised learning to build our normalcy models of infrastructure resource use. In the model, we treat cost as the variable to be predicted. We build multiple models, each doing the best job of predicting, say, the cost of large servers in a particular hour of the month. Then we compare the actual cost with the prediction, thereby identifying anomalies.
We typically map inputs to predictions with random forests or gradient boosting. Yes, the overall quantity of data is large, but for each individual factor it is not huge. In cases where the data does reach a challenging size, we apply some pre-processing, such as removing nearly identical entries or applying some hashing.
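To make the approach concrete, here is a minimal sketch of the idea (not Yotascale's production code): fit a random forest to predict cost from a few contextual factors, then flag points whose actual cost strays far from the prediction. The file name, column names, and the three-standard-deviation cutoff are all illustrative assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical export of hourly billing data with contextual features.
df = pd.read_csv("hourly_costs.csv")
features = ["hour_of_day", "day_of_week", "day_of_month", "instance_count"]

# Supervised learning: cost is the variable to be predicted.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(df[features], df["cost"])

# Compare actual cost with the prediction to identify anomalies.
predicted = model.predict(df[features])
residual = df["cost"] - predicted

# Assumed rule: flag points more than 3 standard deviations off prediction.
df["anomaly"] = residual.abs() > 3 * residual.std()
print(df.loc[df["anomaly"], ["cost"]].head())
```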
Does the size of the data, and the distribution of data and compute power, constrain how you design your models?
Right now these are not issues. We’re able to fit our solution into the resources of a single cloud service provider.
How do you approach a model-building project?
You identify your chief goal; you develop patterns and examine how the data fit the patterns; and you apply that analysis to improve the patterns.
Models deal with complex data, so an early step is to apply feature-selection methods that identify the most predictive factors and narrow them down to the most useful.
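One common way to do this narrowing, sketched below under the same assumed data set, is to rank candidate factors by a random forest's feature importances and keep only the top handful. The column names are assumptions, and this presumes the candidate columns are numeric.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("hourly_costs.csv")  # hypothetical data set
# Assume every non-target column is a numeric candidate factor.
candidates = [c for c in df.columns if c != "cost"]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(df[candidates], df["cost"])

# Rank factors by how much each contributes to the prediction.
importances = pd.Series(model.feature_importances_, index=candidates)
top_factors = importances.sort_values(ascending=False).head(10)
print(top_factors)  # keep only the most useful factors for later models
```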
Our model’s job is fundamentally anomaly prediction.
For example, our application is cost control, so the model's goal is to justify each cost data point. How does this particular cost fit into previous patterns? Is it in line? How far out of line is it? Does it fall between two different patterns, for example, linear growth and 2x growth?
These questions are answered via statistical analysis, looking at historical averages and standard deviations. These analyses produce insights that are used to increase the complexity and sophistication of the models: how do patterns add up, and how do they respond to infrastructure changes, to cost increases, to time of day or week?
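A first statistical pass along these lines might look like the sketch below: compare each hourly cost to the trailing mean and standard deviation of its recent history. The one-week window, file name, and three-sigma cutoff are assumptions for illustration.

```python
import pandas as pd

costs = pd.read_csv("hourly_costs.csv", parse_dates=["timestamp"])
costs = costs.set_index("timestamp").sort_index()

# Trailing one-week window of historical behavior.
window = costs["cost"].rolling("7D")
mean, std = window.mean(), window.std()

# z-score: how many standard deviations a point sits from recent history.
costs["z"] = (costs["cost"] - mean) / std
print(costs[costs["z"].abs() > 3])  # points far out of line
```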
In our model, all of this information is extracted and exposed at different levels of granularity, for different audiences. The CTO or CIO sees overall cost projections for applications. The application owner sees cost projections for groups of resources, regions, and servers. The DevOps engineer sees, on a per-instance basis, what is going on.
Why did you choose supervised learning for your cost-optimizing model?
We have created many, many learning models, and tend to lean toward supervised learning. Supervised methods are intuitively attractive because this is how we humans learn, but also because at some level of the problem there is an answer we can learn from. In our cost model, we do not get explicit labels for the data. There are too many data points (millions); you can't assign an answer to each one. So we transform the problem into supervised learning, because we can make a decision after the fact. This approach is well suited to our problem. Unlike, say, stock trading: if you want to predict the market, you need the decision before the fact. In anomaly detection, which is our problem, you can decide after the fact and use that decision to improve the model and predict what future costs will be.
What methods did you consider, and why did you choose random forest and boosting methods?
Random forest and boosting methods lend themselves to this type of problem. Intuitively, a human would think, “This has been my cost in the past; how do I predict today’s cost? Here are my facts: it’s Wednesday, mid-month, so it should be kind of average.” You are going down a decision tree in your head, and ML models learn to do the same.
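You can see that "decision tree in your head" quite literally by fitting a small tree on calendar features and printing the rules it learns. This is only a sketch; the data file and feature names are assumptions.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

df = pd.read_csv("hourly_costs.csv")
features = ["day_of_week", "day_of_month"]  # e.g. Wednesday, mid-month

# A shallow tree keeps the learned rules human-readable.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(df[features], df["cost"])

# Prints rules that read like the reasoning above, e.g.
# "if day_of_week <= 4.5 and day_of_month <= 15.5 then predict ..."
print(export_text(tree, feature_names=features))
```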
Another approach, Principal Component Analysis, assumes that a typical data point is made up of overlapping factors. If Netflix is trying to decide whether its traffic is anomalous, analyzing components makes sense. That is not applicable to the problem we are modeling: we are looking at individual cost producers, not aggregates.
You described the model as getting more complex. How does that happen?
A model gets smarter through feedback. Let's say you start with one million data points containing one thousand anomalies. The model makes its predictions; if it is a good model, it will not fit the anomalies. Then you exclude the anomalies, and the model improves. The next step is to figure out which apparent anomalies are actually not anomalies. This is a second layer to the analysis: is a data point far enough out of line to show to the owner and ask for a human assessment of its validity? You can't swamp humans with data to be evaluated. You need to prioritize the apparent anomalies according to what people should spend their time on, with an eye to how much resource is available for this type of assessment.
So you could describe this type of model as having two layers: the first layer is classification (is it normal?), and the second layer is ordering (what to fix first).
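A minimal sketch of those two layers, under the same assumed data set as the earlier examples: classify points as anomalous via the prediction residual, then rank the anomalies so a human review budget is spent on the worst offenders first. The budget size and cutoff are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

df = pd.read_csv("hourly_costs.csv")
features = ["hour_of_day", "day_of_week", "day_of_month"]

model = GradientBoostingRegressor().fit(df[features], df["cost"])
residual = df["cost"].to_numpy() - model.predict(df[features])
score = np.abs(residual) / residual.std()

# Layer 1: classification. Which points fall outside the normal pattern?
anomalies = df[score > 3].copy()
anomalies["score"] = score[score > 3]

# Layer 2: ordering. Surface only what humans have time to review.
REVIEW_BUDGET = 20  # assumed reviewer capacity
print(anomalies.sort_values("score", ascending=False).head(REVIEW_BUDGET))
```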
What is the quality measure for the model?
Mean squared error or similar metrics are the most common way to score quality. In our case, we wanted to flag anomalies without too many false positives.
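In code, those two checks might look like the sketch below: mean squared error for the fit itself, and precision on a small human-reviewed set to keep false positives in check. The feedback file and its column names are assumptions.

```python
import pandas as pd
from sklearn.metrics import mean_squared_error, precision_score

# Hypothetical: predicted vs. actual cost, model flags, human verdicts (0/1).
feedback = pd.read_csv("reviewed_anomalies.csv")

# How well does the model fit the cost data overall?
mse = mean_squared_error(feedback["actual_cost"], feedback["predicted_cost"])

# Of the points the model flagged, how many did humans confirm as real?
precision = precision_score(feedback["is_real_anomaly"], feedback["flagged"])

print(f"MSE: {mse:.2f}")
print(f"Precision (flags that were real): {precision:.2%}")
```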
How does learning occur once the model is built?
You build a feedback loop into the system. For example, there is a big spike in costs on Thursday; the application owner indicates a new version was released, and so the spike is explained. The system then revises the pattern.
Evaluating the data points that the model identifies as anomalies takes some work from subject matter experts, who decide whether a specific data point is or is not an anomaly. We do not have a labeled data set for anomalies, but we do have a small data set of feedback, which helps us prioritize the apparent anomalies.
The tricky bit is the false negatives: anomalies you don't pick up. Stakeholders aren't shown them, so you don't get feedback. You have to constantly balance the quantity of false negatives (actual anomalies that end up being ignored) against swamping people with positives (apparent anomalies that must be reviewed).
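One way to reason about that balance, sketched below with an assumed file of anomaly scores: lowering the flagging threshold catches more real anomalies (fewer false negatives) but sends more items to human review. The threshold values are illustrative.

```python
import pandas as pd

# Hypothetical: one row per data point, with an anomaly score column.
scored = pd.read_csv("scored_points.csv")

for threshold in (2.0, 2.5, 3.0, 3.5):
    flagged = (scored["score"] > threshold).sum()
    print(f"threshold {threshold}: {flagged} items sent for review")

# Pick the threshold whose review load matches the team's capacity,
# accepting that anomalies below it will be silently ignored.
```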
There is huge demand right now for ML expertise. How does the industry deal with the supply shortage?
I believe the advent of deep learning will make significant inroads into the talent problem, because ML and cloud computing companies are creating platforms that can build models without experts on staff. For example, Azure has a designer view where you can drag and drop objects to create a churn-analysis system. It provides some common use cases that you can adapt to your own situation. IBM Watson and Google are doing great work on NLP.
What is the future of cloud cost models?
Within the next two years or so, the industry will be automatically optimizing for cost and performance. This level of modeling and prediction will likely apply deep learning to discern more granular patterns than we are looking at today. I think service providers like Google and AWS will begin to automatically take the recommended action. The service provider already controls the infrastructure, so this is not such a big step technically. It is a big step emotionally, for some clients, but this is definitely the direction.
Of course, a more holistic approach is needed for the increasingly common situation where an application system spans vendors, clouds and possibly the data center as well. Now the data and the potential actions become vastly more complicated, and we’ll be building out our model to address that environment.
If you have questions or suggestions for topics, please contact me at abbas.yousafzai@yotascale.com.