Do not confuse Big Data and Fat Data

I don't really understand why, but Big Data experts often like to compare the size of their data sets.
Within the last 2 years, I am sure that most professionals heard about peta-, exa- or zettabytes for the first time.
I attended a big data conference in Turkey a few years ago, and I was amazed to see analysts from Facebook, Amazon and all the usual suspects proudly presenting the huge volume of data they had collected so far.
The most common objective of this bulimia for data is to produce accurate predictive models that customize the business at the customer level across all channels.
In simple words: delivering the right message to the right people at the right time, and anticipating anything that might happen in order to drive strategy and innovation. I agree, you need a lot of data to do so.

So I asked myself: does size really matter? Are we not pushing professionals in the wrong direction by insinuating that bigger means better?
Well, on the one hand there is the law of large numbers (statistics): the more data you have on a specific behavior within a population, the more likely it is that what you observe reflects the behavior of the entire population.
But on the other hand there is the variety of the data. My personal conviction is that there is a limit between enough and too much for anything, and big data can’t be an exception.
Of course, this limit varies depending on technical capabilities, experience, budget, business strategy, etc. But in my view, excellence is reached when you hit the 4 “I’ve got”:

- I’ve got only relevant data. (quality/variety)
- I’ve got enough data for each business objective to make predictions over my audience. (volume)
- I’ve got the right processing power to produce insights aligned with the big data objectives. (velocity and scalability)
- I’ve got the knowledge and the seniority to define priorities and follow precise objectives for big analytics. (experience)
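
The law-of-large-numbers point made before this list can be illustrated with a small simulation. This is only a sketch under assumptions: the 30% "true" behavior rate, the fixed seed and the function name `estimated_rate` are made up for the illustration and are not taken from any real dataset.

```python
import random


def estimated_rate(sample_size, true_rate=0.3, seed=42):
    """Estimate how often a behavior occurs from a simulated sample.

    true_rate is a hypothetical population rate chosen for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
    return hits / sample_size


# Larger samples tend to land closer to the true rate of 0.3.
for n in (10, 1_000, 100_000):
    estimate = estimated_rate(n)
    print(f"n={n:>7}: estimate={estimate:.4f}, error={abs(estimate - 0.3):.4f}")
```

Note what the simulation does not show: adding more variables. Volume helps the estimate of one behavior; it says nothing about whether extra kinds of data are relevant.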

Because we understand what Big Data is, it is our role as analysts to give the industry a definition of this red zone where analytics works against the business by collecting and processing irrelevant data.
This is what I call FAT DATA: when the volume and the variety of the data affect the velocity of the system and the decision-making process.

As I see it, the fat in a dataset represents data that do not have an immediate influence on the objectives set for the analytics.

For example, some companies collect local weather, sports results and variations in different economic indicators because they think these have a potential impact on the customer's decision-making process. They might be right, but most of those companies are unable to define simple patterns based on online/cross-channel behavior, age and gender.

So why waste time setting up a collection system, waste processing power on the size of the dataset, and lose lucidity because of the number of variables that now enter the equation? (That said, if this is part of a long-term objective, collecting the data into a separate dead dataset that can easily be merged with the live dataset might make sense for later investigations.)
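
As a rough illustration of the "separate dead dataset" idea, here is a minimal sketch. It assumes both datasets share a key, here a hypothetical `customer_id`; the field names, the sample rows and the function `merge_for_investigation` are invented for the example.

```python
# Live dataset: only the variables tied to the current objectives.
live = [
    {"customer_id": 1, "age": 34, "gender": "F", "channel": "web"},
    {"customer_id": 2, "age": 51, "gender": "M", "channel": "store"},
]

# Dead dataset: collected on the side, stored apart, keyed by the same id.
dead = {
    1: {"local_weather": "rain"},
    2: {"local_weather": "sun"},
}


def merge_for_investigation(live_rows, dead_by_id):
    """Return new rows that combine live data with the side data.

    The live rows themselves are left untouched, so day-to-day
    processing keeps running on the lean dataset.
    """
    merged = []
    for row in live_rows:
        extra = dead_by_id.get(row["customer_id"], {})
        merged.append({**row, **extra})
    return merged


for row in merge_for_investigation(live, dead):
    print(row)
```

The merge only happens when a later investigation asks for it, so the extra variables cost storage but not daily processing time.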

So here is the first definition of Fat Data I have shaped. Please feel free to contribute in the comments; I would really like to have as much feedback as possible:

Fat Data: Describes a state of big data applied to business strategy where the variety of the data slows down the processing and misleads the decision making. A Fat Data situation works against the principle of big data, as the main outcomes are late or inaccurate insights, which can end up hurting the business and lowering the value of analytics.

This is why DATA collection, organizing and applying should be managed by the new breed of Marketers that knows. In advertising, setting up a system of 1st party DATA in DMPs will help them not only to improve communication to their own users but, more importantly, to find lookalikes and generate more revenue. Thanks for the article!

tl;dr: Just because you can collect that data doesn't mean you should :)

Good points! The classical dilemma will probably follow: Should companies go on a diet at data collection level ("eat less") or should they leave fat behind at data integration level ("do more exercise")? :)


More articles by Benjamin Mercier
