Big Data Simplified

Big data is a phrase that's been around long enough that we feel we should understand it, and yet most of us are too embarrassed to admit how little we really know.

Ultimately, most popular definitions involve a few words that start with the letter 'v'. Usually this includes volume, variety and velocity. Others sometimes add value, validity and visualisation. A word on the first three.

Volume is just what you typically think of when you think of big data from customer transactions, say. Lots of data, from multiple sources, all ready to yield some kind of insight if only it could be managed appropriately.

Variety refers to data that doesn't fit nicely into rows and columns on a spreadsheet.  This could be anything from video to social network content.

Next we have velocity, which usually means some combination of data coming in at a higher rate and data coming in persistently over time. Smart devices in large manufacturing operations often spit out this kind of data as the machines monitor themselves for faults and optimal performance.
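To make the self-monitoring idea concrete, here is a toy sketch (the function name, window size and threshold are all invented for illustration) of how a machine might watch its own high-velocity sensor stream and flag readings that jump away from the recent average:

```python
from collections import deque

def fault_monitor(readings, window=5, threshold=2.0):
    """Flag readings that deviate sharply from the recent rolling average.

    A toy stand-in for the kind of self-monitoring a smart machine
    might do on its own sensor stream.
    """
    recent = deque(maxlen=window)  # keep only the last `window` readings
    faults = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            avg = sum(recent) / window
            if abs(value - avg) > threshold:
                faults.append(i)  # this reading strays too far from recent history
        recent.append(value)
    return faults

# A steady temperature stream with one sudden spike at index 7.
stream = [20.0, 20.1, 19.9, 20.2, 20.0, 20.1, 19.8, 25.0, 20.0]
print(fault_monitor(stream))  # → [7]
```

Real streaming systems do far more than this, of course, but the core pattern (a small rolling window over an unbounded stream) is the same.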

Now the confusing thing about big data is that each of these three aspects presents quite different problems for organisations and therefore lends itself to very different solutions. And that's why "big data" is probably too broad a catch-all phrase to be truly useful.

For example, when it comes to volume there are different ways to store the data such that it's still relatively accessible in a timely fashion but also takes up less space, and therefore costs less. When you hear about Hadoop clusters, that's simply a technology that allows this to happen.

In contrast, with variety you might need to use machine learning to interpret what certain kinds of images are showing you, and then use advanced analytics, like clustering, to tell you how those images correlate with other information to drive insights.
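Clustering itself is simpler than it sounds. Here is a bare-bones k-means sketch (all names and the toy "image feature" data are invented for illustration) that groups points around k centres, the same basic move used to group images by their extracted features:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """A minimal k-means: repeatedly assign points to the nearest
    centroid, then move each centroid to the mean of its group."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points to start
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for x, y in points:
            j = min(range(k),
                    key=lambda i: (x - centroids[i][0])**2 + (y - centroids[i][1])**2)
            groups[j].append((x, y))
        # Move each centroid to the mean of its group.
        for i, g in enumerate(groups):
            if g:
                centroids[i] = (sum(p[0] for p in g) / len(g),
                                sum(p[1] for p in g) / len(g))
    return groups

# Two obvious blobs of "image features": one near (0, 0), one near (10, 10).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
a, b = kmeans(data, k=2)
print(sorted(map(len, (a, b))))  # → [3, 3]
```

In practice you'd reach for a library implementation, but the loop above is essentially all that's happening under the hood.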

Ultimately there are about three different things you can get out of a big data solution. First, lower cost: Hadoop implementations, for example, can bring the cost of storing large volumes down to roughly 1/20 of the normal cost. Second, faster and better decisions: machine learning algorithms already spot anomalies in X-rays for doctors, flagging issues and prioritising the most important scans for extra scrutiny. Third, value-added services: a commonly cited example is LinkedIn's "people you may know" function, which infers other people you might know without any explicit information proving a connection.
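The intuition behind "people you may know" is easy to sketch. One common approach (this is a toy illustration with invented names, not LinkedIn's actual algorithm) is to suggest friends-of-friends, ranked by how many mutual connections you share:

```python
def people_you_may_know(connections, user):
    """Suggest friends-of-friends, ranked by number of mutual connections."""
    direct = connections[user]
    scores = {}
    for friend in direct:
        for candidate in connections.get(friend, set()):
            # Skip the user themself and people they already know.
            if candidate != user and candidate not in direct:
                scores[candidate] = scores.get(candidate, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)

# A toy network: Ann has never connected to Dan, but shares two contacts with him.
network = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
}
print(people_you_may_know(network, "ann"))  # → ['dan']
```

No one has told the system that Ann and Dan know each other; the suggestion emerges purely from the shape of the network.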

Once you know which of those you want to achieve (lower-cost storage, faster/better decisions or value-added services), that can help you figure out what kind of experts to look for. More on that next time.

Note: this post was largely inspired by the book Big Data @ Work.

More articles by Ian Hill
