Make optimized decision with Deep Learning

As humans, we have practiced making good decisions since we were young. As we pass that skill on to machines, more and more AI applications can now make genuinely good decisions on their own. AI is used to drive cars, trade stocks, and manage data centers, and systems like AlphaGo have even outperformed the best human players. Curious how AI does it? In this article, I will talk about a common way to make optimized decisions using deep learning.

Before I throw out the theory and everything, let's pretend to be wizards and think about how a wizard would make a good decision:

An easy way would be:

  1. Grab a crystal ball.
  2. Put magic in, so it can predict the future.
  3. Explore the predictions of all your choices, and pick the best one.

For muggles like us, the brain is the crystal ball, and our experience and knowledge are the magic.

How would data scientists or machine learning engineers do it? The same three steps still apply, except that a prediction model is our crystal ball and a large amount of training data is our magic. To summarize the theory, the three steps can be represented mathematically as:
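In one possible notation (mine, not necessarily the original figure's): let X be the controls, S the states, f_w the prediction model with weights w, L a loss on predictions, and cost a score for how good a predicted outcome is. The three steps are then roughly:

```latex
\text{1. Model:}\quad \hat{Y} = f_w(X, S) \\
\text{2. Train:}\quad w^* = \arg\min_w \sum_i L\!\left(f_w(X_i, S_i),\, Y_i\right) \\
\text{3. Decide:}\quad X^* = \arg\min_X \mathrm{cost}\!\left(f_{w^*}(X, S)\right)
```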

Now, I will show an example of how to carry out these three steps using a simple feedforward deep neural network.


Prediction model - Grab your crystal ball

The first step is to select a good prediction model. The model should be able to take controls and states as its input and output a prediction.

In this article, I use a simple feedforward, fully connected deep neural network as the prediction model. We feed both our controls X and states S together as the input and expect the DNN to output the predicted future.
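As a rough sketch (the layer sizes, ReLU activation, and toy inputs here are my own assumptions, not a reference implementation), the forward pass of such a network could look like this:

```python
# A minimal feedforward fully connected network in NumPy.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def init_params(sizes, seed=0):
    """Random weights and zero biases for layers sized e.g. [4, 16, 16, 1]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def predict(params, X, S):
    """Concatenate controls X and states S, then run the forward pass."""
    a = np.concatenate([X, S], axis=1)
    for W, b in params[:-1]:
        a = relu(a @ W + b)        # hidden layers
    W, b = params[-1]
    return a @ W + b               # linear output layer for regression

params = init_params([4, 16, 16, 1])
X = np.array([[0.5, -0.2]])        # controls
S = np.array([[1.0, 0.3]])         # states
y_hat = predict(params, X, S)      # predicted "future", shape (1, 1)
```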

A Deep Neural Network (DNN) is a neural network with multiple hidden layers between the input and output layers. It handles complex non-linear relationships well: the hidden layers extract and preserve inconspicuous features, which the output layer aggregates to produce the final result.

In more complicated cases, we need more sophisticated DNN models. Convolutional Neural Networks (CNNs) work well with image data, and Recurrent Neural Networks (RNNs) such as LSTMs work well with sequential or time-series data. Both CNNs and RNNs are widely used for decision making in cognitive services.


Training the model - Put magic in

Defining a model is not enough. The model is best thought of as a set of functions with tunable weights, more a function structure than one particular function. Until it is trained with data, it usually won't give a good prediction. The training process can be represented mathematically as:
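As a sketch (the notation is mine): with a model f_w, training pairs of inputs (X_i, S_i) and observed outcomes Y_i, and a loss L, training searches the function set for the best weights:

```latex
w^* = \arg\min_w \frac{1}{N} \sum_{i=1}^{N} L\!\left(f_w(X_i, S_i),\, Y_i\right)
```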

For example, if you would like to predict how many hot dogs to buy for a coming party, you could first define a function set such as Y = a * adults + b * children. You can then use past experience to find the best weights (a and b) in this set. After fitting a lot of past data, you might end up with a function like Y = 1.7 * adults + 0.8 * children.
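A quick sketch of that fit with least squares in NumPy (the party data below is synthetic, generated from the very weights we hope to recover):

```python
# Fit Y = a*adults + b*children from past parties via least squares.
import numpy as np

# Each row: (adults, children); y: hot dogs actually eaten.
A = np.array([[10, 4], [6, 10], [20, 0], [8, 8]], dtype=float)
y = A @ np.array([1.7, 0.8])       # synthetic "past experience"

(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(a, 2), round(b, 2))    # → 1.7 0.8
```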

In a DNN, Stochastic Gradient Descent (SGD) is usually used to train the model, finding the best weights and biases.

It is similar to skiing or snowboarding: heading downhill, against the gradient, is usually the quickest way to reach the lowest point.

For each batch of data, we use a forward pass to calculate the loss of the current prediction, then use backpropagation (BP) to calculate the gradient and update each weight.
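A minimal sketch of this loop in NumPy (the one-hidden-layer shape, learning rate, and toy data are illustrative assumptions):

```python
# One SGD step with backpropagation on a one-hidden-layer network
# trained with squared loss.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.5, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)
lr = 0.1

def sgd_step(x, y):
    global W1, b1, W2, b2
    # forward pass: compute the current prediction and its loss
    h = np.maximum(0.0, x @ W1 + b1)       # ReLU hidden layer
    y_hat = h @ W2 + b2
    loss = 0.5 * np.mean((y_hat - y) ** 2)
    # backward pass: chain rule from the loss back to each weight
    d_out = (y_hat - y) / len(x)
    dW2, db2 = h.T @ d_out, d_out.sum(0)
    d_h = (d_out @ W2.T) * (h > 0)
    dW1, db1 = x.T @ d_h, d_h.sum(0)
    # gradient step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return loss

x = np.array([[0.0, 1.0], [1.0, 0.0]])
y = np.array([[1.0], [0.0]])
losses = [sgd_step(x, y) for _ in range(200)]
```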

You can also add momentum to the update to give the motion a more consistent direction, which helps it traverse long, shallow ravines and steep walls.

Many other optimization methods, such as Adaptive Moment Estimation (Adam), are also widely used, as they are believed to be much faster than plain SGD. They adapt the weight-update rate based on recent and long-run gradient history, controlling the step size accordingly.

Although these methods look fancy, all SGD-based methods only try to find a local minimum rather than the global minimum. Depending on the starting point and the learning rate, the search may land in a nearby local minimum and stop rather than finding the global one. Luckily, in the world of DNNs, local minima are usually about as good as the global minimum, so in most cases a local minimum is good enough for training a DNN model.

Another thing we need to be really careful about is overfitting. The usual symptom is a trained model that matches the training data perfectly but shows a large error on testing data.

This usually happens because the model is too complicated or there is far less data than needed. We can reduce the model's complexity or add more data when facing an overfitting issue. Techniques like cross-validation, normalization, weight decay, and dropout are also usually needed to build and train an accurate DNN model while avoiding overfitting.
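For instance, inverted dropout (a sketch; the drop probability and shapes here are arbitrary) zeroes random units during training and rescales the survivors so the expected activation is unchanged at inference time:

```python
import numpy as np

def dropout(h, p, rng):
    """Zero each unit with probability p; scale the rest by 1/(1-p)."""
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask

rng = np.random.default_rng(0)
h = np.ones((4, 8))
h_train = dropout(h, p=0.5, rng=rng)   # each entry is 0.0 or 2.0
h_infer = h                            # at inference, dropout is a no-op
```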


Making Decision - Explore and pick the best prediction

With a well-trained prediction model from the steps above, it's like having a magic crystal ball, and making the optimized decision is relatively easy. You just need to explore your choices and pick the best one!

The process can be mathematically represented as:
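In one possible notation (mine): with the trained weights w*, the current state S, and a cost on the predicted outcome, the decision step picks the controls whose prediction scores best:

```latex
X^* = \arg\min_X \mathrm{cost}\!\left(f_{w^*}(X, S)\right)
```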

Doesn't it look familiar? Yes, you can use gradient descent to find a local minimum instead of exploring the entire space for the global minimum. When you are making optimized decisions in real time in a dynamic state or environment, a local minimum is usually good enough. In special cases where the dimension of the control vector is very low and the states never change, it can also make sense to loop over all possibilities to find the global minimum.

Newton-type (second-order) methods can also be very handy for this problem. They are usually faster than plain gradient descent and are worth using whenever the function is twice differentiable. One example is the L-BFGS-B method, which uses limited memory and lets you set boundary constraints on the controls, something normally needed in reality.
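A sketch of this with SciPy's L-BFGS-B (the quadratic stands in for a trained model's loss, and the bound is a made-up control constraint):

```python
# Choose a bounded control that minimizes a stand-in decision loss.
import numpy as np
from scipy.optimize import minimize

def decision_loss(x):
    """Stand-in for loss(model(x, state)); best unconstrained x is 3."""
    return (x[0] - 3.0) ** 2

res = minimize(decision_loss, x0=np.array([0.5]),
               method="L-BFGS-B", bounds=[(0.0, 2.0)])
best_control = res.x[0]   # pinned to the upper bound, 2.0
```

Because the unconstrained optimum (3) lies outside the allowed range, the solver returns the boundary value, which is exactly the behavior you want for real-world control limits.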


Summary

We discussed a common way for AI to make decisions through three simple steps:

  1. Make a prediction model.
  2. Train the model with data.
  3. Explore the predictions, and find the best choice.

I believe this approach works in most cases when you want to build an AI service that makes decisions. I hope it gives you some inspiration.

A DNN was used as an example to demonstrate each step. SGD and BP are normally used to train the DNN model, and Newton-type methods are usually faster at finding the best choice.

Finally, I'd like to point out that this is a common way, but definitely not the only way. Depending on the problem and scenario, a simple threshold-based decision rule can be much faster and easier, especially if you do not have enough data for training.

