What is a Data Lab?

The industrial world gave rise to industrial research labs where innovation and quality control took place. In today’s data-driven century, the rise of the Data Lab is evident.

Remember Bell Labs or the NatLab of Philips? They were the innovative workspaces that attracted the great minds of the last century, and where orbit-shifting innovations took place.

In today’s data-savvy century, companies and governments alike are seeking to explore, find and execute data-driven improvements, ever since Thomas H. Davenport took analytics to the management level as the new science of winning.

Data Scientists became the new rockstars and their scarcity is evident.

So why scatter their talent over various single laptops? Why waste the track record of every data experiment when a Data Scientist decides to leave for a new horizon? Why not have reproducibility and version control in place to answer to regulatory requirements, or to answer "how did you do this" questions from the board? And why does it take ages to put your new predictive model into production?

There’s a Data Science platform rising and its name is Data Lab

As a concept, the Data Lab can be regarded from both a technical and an organizational perspective. As usual, the process level gives the most clarity on what it should entail, so let’s have a look at the elementary process steps that every Data Scientist takes today:

Created over 20 years ago to popularize the then nerdy niche of Data Mining, the six elementary steps of the CRISP-DM diagram are still at the core of what every Data Scientist does every day.

The dialogue between Business Understanding and Data Understanding is at the core of understanding every Data Science challenge and defining a meaningful scope. The next steps of Data Preparation, Modeling and Evaluation are the essence of every Data Scientist's job. Deployment is usually the area where a Data Engineer steps in.
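For readers who prefer code to diagrams, the six CRISP-DM steps described above can be sketched as a small data structure. This is a simplified illustration, not part of any Data Lab product: the names `Phase`, `OWNER` and `allowed_next` are made up for this example, the owner per step only reflects the typical split described above, and the only feedback loop modeled is the dialogue between Business Understanding and Data Understanding (the full CRISP-DM diagram has more).

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM steps, in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Typical owner per step as described in the text (an illustration, not a rule).
OWNER = {phase: "Data Scientist" for phase in Phase}
OWNER[Phase.DEPLOYMENT] = "Data Engineer"


def allowed_next(phase):
    """Return the steps that may follow `phase` in this simplified model."""
    steps = []
    if phase is not Phase.DEPLOYMENT:
        # Default: move forward to the next step.
        steps.append(Phase(phase.value + 1))
    if phase is Phase.DATA_UNDERSTANDING:
        # The dialogue: data insights can send you back to the business question.
        steps.append(Phase.BUSINESS_UNDERSTANDING)
    return steps


for phase in Phase:
    names = ", ".join(p.name for p in allowed_next(phase)) or "(end)"
    print(f"{phase.name:<24} {OWNER[phase]:<15} -> {names}")
```

Writing the steps down this way makes it easy to ask, for each one, which part of a platform supports it and who is expected to work there.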

So how to envision a new platform that supports Data Science in every step? At Xomnia we took a good look at all the functionality that we wanted to support every step of this Data Science process and came up with this schema:

A Data Lab should support every step of the Data Science process

So for starters, a Data Lab should be much richer than just a Hadoop cluster. It should at least contain functionality to collaborate on, test and evaluate predictive models, all on a secured platform. And not least, it should include a production facility to deploy the predictive model and deliver its business value.

And wouldn’t it be nice if these functional building blocks are integrated in a cohesive way to provide an almost seamless user experience?

This is exactly what a good Data Lab platform should do. Learn which platform can support your Data Driven Innovation, and reach out to me or follow the Executive Masterclass of TIAS.


Hello there, do you need version control of data too? I am just trying to picture a particular scenario. I would love to hear a use case from you.

Hi Martijn Imrich, thanks for the write-up. I love your way of thinking, but I think you need to take it one step further. In the analogy with the industrial labs, like Bell Labs or the NatLab, it was not so much about having all the materials nicely organized together but, more importantly, about the people: the true engineers with bright ideas. The success of these industrial labs came from the minds working there plus the freedom to explore the possibilities. So I would argue that the data scientists (the true ones) need to get the freedom to explore the possibilities of the data and extend the limits of its use for the business. Maybe we are not looking for Data Scientists for our Data Labs but for Data Visionaries. And of course we need organizations that are open to a different way of working with data itself. The Data Lab can play an important role in this, but it's just the data (the material) that needs to be handled.
