A few thoughts on Data Science

A lot has been said, and a lot more will be said, about the number one trend in the modern IT revolution: Data Science. Wow, that sounds intimidating from the get-go. I have been asked many times what I think about it, so today I have decided to put together a few thoughts by way of an answer. Let's start with the answer itself:

"To me, Data Science is primarily about enabling a company to fully exploit and understand its data at rest assets and to act upon the analysis of data in motion. That is achieved by analysing and visualizing data of good quality, where conjectures and hypotheses are made about data trends, patterns or pockets, where algorithms and mathematical or analytical models are applied to back up those very same conjectures, and where the whole process is fully transparent."

There you go, that should keep you busy for a few seconds.

Let's read between the lines now. I would start with the use of 'enabling'. That rhymes with business value; well, technically it doesn't, but you know what I mean. It is imperative that every Data Science initiative eventually translates into value for the company investing in it. No-brainer. Then there are 'data at rest' and 'data in motion'. I strongly believe there is not much competitive edge in data at rest. Those assets are like the diary of a company and yes, they can lead to predictive analytics at best, but the real perk of Data Science is the capability to make decisions and act based on prescriptive analytics. Data at rest cannot help you achieve that. Data in motion, on the other hand, contains all the trends and patterns that will enable your company to get that competitive edge.

Let's move to visualization. It's a must, and a medium that not only allows a data scientist to find those data pockets and patterns that are so important in any Data Science endeavour, but also makes it possible to communicate back to the business and ultimately display those actionable items we have worked so hard to find. I would also stop at 'good quality data' for a second. If your data scientist first needs to format, transform or mess with the data at any level, I think you might have a serious problem. Any data engineer or administrator should be able to provide the data science team with good data sets, which then need to be mined, visualized and analysed to find good answers. The reality is that a lot of companies move that responsibility to the data science team, and instead of spending time on good analytics, that team then spends most of its time on data quality.
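To make the 'good quality data' point a bit more concrete, here is a minimal sketch of the kind of check a data engineer could run before handing a data set over to the data science team. It is only an illustration under my own assumptions: the names QualityReport and check_quality, the idea of representing records as plain dictionaries, and the two rules (no missing values, no duplicate records) are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QualityReport:
    """Collects the problems found in a data set before it is handed over."""
    row_count: int = 0
    issues: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # The data set passes only if no issue was recorded.
        return not self.issues


def check_quality(rows, required_fields):
    """Flag missing values and duplicate records in a list of dict rows.

    Hypothetical helper: a record counts as a duplicate when all of its
    required fields match those of an earlier record.
    """
    report = QualityReport(row_count=len(rows))
    seen = set()
    for index, row in enumerate(rows):
        for name in required_fields:
            if row.get(name) in (None, ""):
                report.issues.append(f"row {index}: missing '{name}'")
        key = tuple(row.get(name) for name in required_fields)
        if key in seen:
            report.issues.append(f"row {index}: duplicate record")
        seen.add(key)
    return report


if __name__ == "__main__":
    sample = [
        {"customer_id": 1, "amount": 10.0},
        {"customer_id": 2, "amount": None},
        {"customer_id": 1, "amount": 10.0},
    ]
    report = check_quality(sample, ["customer_id", "amount"])
    for issue in report.issues:
        print(issue)
```

A check like this belongs upstream of the data science team; if the report is not clean, the data set goes back to its owner rather than into the analysis.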

My last thought goes to 'transparent'. In my opinion every algorithm, every portion of code and every step along the way needs to be properly logged, not only for retrospectives or good housekeeping, but also to create opportunities for new findings and new conjectures. Monitoring also helps with visibility and, in a way, with accountability. Today we have Zeppelin, which helps with all of the above, so to me there is no more excuse when it comes to the transparency of a data scientist's work, and above all there is the tempting opportunity to find new possibilities and hypotheses to scout.
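As a rough illustration of what "every step properly logged" could look like in practice, here is a small sketch that uses only Python's standard logging module. It is not how Zeppelin works and it is not a recommendation of a specific design; the decorator name logged_step and the two example steps are hypothetical, chosen just to show the idea of recording what each step received, what it produced and how long it took.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
log = logging.getLogger("analysis")


def logged_step(func):
    """Log the name, input size, output size and duration of an analysis step."""
    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        start = time.perf_counter()
        result = func(data, *args, **kwargs)
        elapsed = time.perf_counter() - start
        log.info(
            "step=%s rows_in=%d rows_out=%d seconds=%.4f",
            func.__name__, len(data), len(result), elapsed,
        )
        return result
    return wrapper


@logged_step
def drop_negative(values):
    # Hypothetical cleaning step: discard negative readings.
    return [v for v in values if v >= 0]


@logged_step
def above_mean(values):
    # Hypothetical analysis step: keep values above the mean.
    mean = sum(values) / len(values)
    return [v for v in values if v > mean]


result = above_mean(drop_negative([4, -1, 10, 7, -3, 2]))
```

Each call leaves one log line behind, so the trail of steps can be reviewed later, either for accountability or to spot a pattern worth a new conjecture.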

Everything else in my attempt to give a two-line definition of Data Science is what belongs in a data scientist's day-to-day: formulating ideas and conjectures based on data finds, visualizations and discoveries, and extracting value from them using different models and techniques to ultimately deliver actionable scenarios, circumstances and new opportunities to the business.

More articles by Max Cottica
