Data Analytics in Short
This is a short, very high-level explanation of data analytics. The word analytics does not only mean prediction. Many people take data straight from their source systems and try to make predictions, which can be misleading. Just because we have some data doesn’t mean we can run it through an algorithm and trust the prediction. A more accurate and meaningful way to describe data analytics is as the practice of understanding data about the past and the present, its likely future impact (positive or negative), and finally how to avoid that impact if it is negative or make it happen if it is positive.
Before performing analytics on data, we must make sure the data is ready to be analyzed. What does that mean? We clearly can’t drive a car without its tires, can we? The data available to us comes from various source systems, so it is neither practical nor right to use every single piece of it to do our job. Step one is therefore Data Preparation. This is where we use our common sense to choose the data that we think will be helpful. Example: we don’t need the names and addresses of people to understand company sales.
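To make the idea of Data Preparation concrete, here is a minimal sketch in Python. The records, the column names and the helper select_columns are all made up for illustration; real source systems will look different.

```python
# Hypothetical raw records pulled from a source system (illustrative data only).
raw_records = [
    {"name": "A. Perera", "address": "12 Main St", "region": "North", "product": "Widget", "amount": 120.0},
    {"name": "B. Silva", "address": "7 Lake Rd", "region": "South", "product": "Gadget", "amount": 80.5},
]

# Columns we judge relevant to a sales question; names and addresses are left out.
RELEVANT_COLUMNS = ("region", "product", "amount")


def select_columns(records, columns):
    """Return new records that keep only the requested columns."""
    return [{col: rec[col] for col in columns} for rec in records]


prepared = select_columns(raw_records, RELEVANT_COLUMNS)
print(prepared)
```

Which columns count as relevant is a judgment call that depends on the question being asked; the tuple above is only one possible choice.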
Step two is called Data Pre-Processing, where we fine-tune the data. Some of the data may be missing, and we might also find that the data is not in the form we require. Example: the age of a person is usually stored as a number, but if we need it to fall into age buckets there is some extra work involved. The process involved in this conversion is called Data Transformation. In this step we mainly fill in the missing values and transform the data to fit the purpose.
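The two pre-processing tasks described above, filling missing values and turning ages into buckets, can be sketched roughly as follows. The ages, the choice of the median as the fill value and the bucket boundaries are assumptions made for this example, not a rule.

```python
from statistics import median

# Illustrative ages; None marks a missing value.
ages = [23, None, 41, 35, None, 67, 18]


def fill_missing(values):
    """Replace None with the median of the known values (one common choice)."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]


def age_bucket(age):
    """Map a numeric age to a bucket label (boundaries are hypothetical)."""
    if age < 25:
        return "under 25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"


filled = fill_missing(ages)
buckets = [age_bucket(a) for a in filled]
print(filled)
print(buckets)
```

Filling with the median is only one option; depending on the data it may be better to use the mean, a separate "unknown" bucket, or to drop the incomplete rows.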
We can classify data analytics into three broad categories: understanding the hindsight, insight and foresight of data. The first and foremost task in data analytics is to understand what has happened so far; the technical term for that is Descriptive Analytics. We perform statistical operations to understand the data flows, data distributions, outliers, extreme values and the like. If you don’t know these terms I suggest you do a quick search on them. For a good analysis we usually want the data distribution to be close to normal, with skewness reduced, and outliers and extreme values handled (often removed), because leaving them in makes the process harder, which will be explained in the coming steps.
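As a small illustration of Descriptive Analytics, the sketch below summarizes a made-up list of sales figures and flags outliers with the common 1.5 x IQR rule. The data, the helper name iqr_outliers and the factor 1.5 are assumptions for the example; other outlier rules exist.

```python
from statistics import mean, median, quantiles

# Illustrative daily sales; the last value is deliberately extreme.
sales = [12, 15, 14, 13, 16, 15, 14, 95]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first/third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print("mean:", mean(sales))
print("median:", median(sales))
print("outliers:", iqr_outliers(sales))
```

A mean that sits far from the median, as it does here, is a quick hint that the distribution is skewed, in this case by a single extreme value.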
The next form of analytics is referred to as Diagnostic Analytics, where we try to understand why what has happened did happen. The name explains a lot, I believe: we look at our data in the form of charts, graphs and other pictorial representations and diagnose why so and so is happening. We can also use mathematical models to understand the relationships between various attributes or factors in the data.
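One simple mathematical tool for looking at the relationship between two attributes is the Pearson correlation coefficient. The sketch below computes it by hand on made-up figures; the attribute names ad_spend and sales are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Illustrative data: advertising spend and sales for five periods.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 101]

r = pearson(ad_spend, sales)
print(round(r, 3))
```

A value close to 1 or -1 suggests a strong linear relationship, but on its own it does not prove that one factor causes the other; that still needs the kind of diagnosis described above.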
After diagnosing why things have been happening this way, we can go ahead and predict a given factor. This is the most challenging step, as it involves a lot of thought and understanding of what we are dealing with, so we must make sure that we’ve got our fundamentals right. The attribute we wish to predict is given a name: the class variable (or class attribute). A proper and valid model must be chosen to proceed; it can be a clustering, classification or associative modeling technique. The process of predicting the class variable is called Predictive Analytics. This is not as easy as it sounds.
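To show what predicting a class variable can look like, here is a minimal sketch of one classification technique, k-nearest neighbours, written with the standard library only. The training data, the two features (visits per month, average spend) and the class variable (repeat buyer: yes or no) are invented for the example, and feature scaling is left out to keep it short.

```python
from collections import Counter
from math import dist

# Illustrative training rows: ((visits_per_month, average_spend), class_label).
# The class variable here is a hypothetical "repeat buyer" flag.
training = [
    ((1, 10), "no"),
    ((2, 12), "no"),
    ((1, 15), "no"),
    ((8, 40), "yes"),
    ((9, 45), "yes"),
    ((7, 38), "yes"),
]


def predict(features, rows, k=3):
    """Predict the class label by majority vote among the k closest rows."""
    nearest = sorted(rows, key=lambda row: dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict((8, 42), training))
print(predict((2, 11), training))
```

This is only one of many possible models. In practice the choice of technique, the value of k and the quality of the prepared data all affect how far such a prediction can be trusted.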
The last step is all about acting upon our predictions from Predictive Analytics. The big question is: how do we make the prediction happen, or avoid it if it is negative? So far we know what will happen, when it will happen and why it will happen, so what remains is to act upon it and identify how we can benefit from these predictions. This is called Prescriptive Analytics.
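One very simple way to picture Prescriptive Analytics is as choosing the action with the best expected gain given a prediction. The sketch below assumes we have a predicted probability that a customer will leave; the actions, their costs and their estimated effects are entirely hypothetical numbers.

```python
# Hypothetical actions: (name, cost, estimated reduction in leave probability).
actions = [
    ("do nothing", 0.0, 0.0),
    ("discount voucher", 5.0, 0.20),
    ("personal call", 12.0, 0.30),
]


def best_action(leave_probability, customer_value, options):
    """Pick the action whose expected benefit minus cost is highest."""

    def expected_gain(option):
        _, cost, reduction = option
        # An action cannot remove more risk than the prediction contains.
        avoided = min(leave_probability, reduction)
        return avoided * customer_value - cost

    return max(options, key=expected_gain)[0]


print(best_action(0.60, 100.0, actions))
print(best_action(0.02, 100.0, actions))
```

With a high predicted risk the costlier action pays off, while with a low risk doing nothing is the better choice. Real decisions would of course need better estimates than these made-up figures.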