Demystification of Data Science

by Todd Lohvinenko

Is it just me, or are we constantly bombarded with the buzzwords "Data Science" and "Data Scientist" at conferences, workshops and webinars? It seems 2018 has been the year of data and science. Why should we be surprised by this? We have been talking about this topic for more than a decade, and about its more recent attribute, machine learning, which has been on the rise since Watson won Jeopardy back in 2011. When we look at the hype-cycle over the past couple of years, Deep Learning, IoT, Virtual Assistants, Big Data and AI PaaS have all moved on from the "innovation trigger" and may have peaked, or be close to peaking, in the area of "inflated expectations". We are promised that everyone needs a data scientist and, if you can’t find one, you can always contract one through the many services available to you. However, as we are sent on our merry way along the hype-cycle, I wonder how deep the trough of disillusionment is going to be before we reach enlightenment and eventually the plateau of productivity. After all, Biotechnology and Flying Autonomous Vehicles have been sitting at the beginning of the hype-cycle for more than a decade and show no signs of movement.

As our inflated expectations start to sway, what are we looking for in a Data Scientist, or maybe more specifically a Data Technologist? Database professionals to start, with the ability to Extract, Transform and Load (ETL) data as well as Business Intelligence (BI) experience, topped with coding, methodical data analysis, critical thinking, collaboration, and let us throw in excellent presentation skills, just because. As businesses wrap their minds around the expectations of a data scientist, we look for successful candidates with a Ph.D., or at the very least a Master’s, in Statistical or Machine Learning, Computational Statistics, High Dimensional Data Analysis, Applied Statistics, Artificial Intelligence, Databases and Data Mining techniques and skills. That’s a big demand on resources, and I have to ask the question: what should the expectations be for your business moving forward? What exactly is a data scientist, and what do we want them to do for us as a business?

Over the past year I read a paper by Chikio Hayashi of The Institute of Statistical Mathematics in Sakuragaoka, Shibuya-ku, Tokyo, Japan. In his 1998 paper "What is Data Science? Fundamental Concepts and a Heuristic Example" he defined data science as a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena". This definition meets all the requirements of science and in fact presents itself as a reasonable definition of what Data Science is. Looking back two decades after Chikio Hayashi’s definition, has the way we perceive the discipline changed? Looking at the two words separately, we see that data is simply information: a quantity of something known or assumed to be a fact. Using this information we can make a reasonable calculation or inference about something we value as important at the time. Science, on the other hand, is the systematic approach to understanding this information. Science organizes knowledge by applying a method that formulates a coherent body of ideas or principles, so that the structure and behaviour of the data set can be studied through observation and experimentation.

With that definition in mind, what is the makeup of a data scientist? First and foremost, it would be a person with the ability to understand numbers, a good data nerd if you will. Someone who likes to delve into data and to look at and understand the statistical approaches, using numerical and logical analysis skills pertaining to prediction, scoring and ranking of information. Secondly, this person has a strong sense of programming and is comfortable and proficient with programming languages such as R, Python, SQL, Scala, Julia, Java, and whatever else comes along. Thirdly, they must have the ability to understand the business, by understanding the product and/or service that customers and users require, and be able to create ideas and solutions that are economically viable from both the customer and business perspective. Finally, they must be able to communicate these outcomes in a clear and concise manner to the business and customers alike.

As a business, we expect the data scientist to deliver, and deliver they must, in the areas of:

  • Prediction (predict a value based on inputs)
  • Classification (Is the data restricted, public, or controlled information)
  • Recommendations (What should the business do next?)
  • Pattern detection and grouping (Classification without known classes)
  • Anomaly detection (Fraud)
  • Recognition (Data patterns)
  • Actionable insights (via dashboards, reports, visualizations)
  • Automated processes and decision-making (Approval Processes)
  • Scoring and ranking (Statistical likelihoods of success and failure)
  • Segmentation (Demographic-based marketing)
  • Optimization (Risk management and mitigation)
  • Forecasting (Sales and revenue)

Yes, we are in the age of big data, and the need to architect the computing, storage and data flow of our organization is something we must seriously consider, and something I will likely speak to at some other point in time. For now, let us look more closely at the above expectations and ask ourselves the question: are these expectations anything new to our business? I would suspect that the answer to that question is no, not really. Any business worth its salt has these bullets well marked and, for the most part, understood. They are in fact important to the successful operation of any business.

If we take a serious look at these bullets, the term “data science” has simply replaced business analytics. Perhaps a new, sexier term? If we look carefully, we would recognize that business analytics depends on much the same things as what is now called data science. Business analytics requires sufficient volumes of high quality data. As with data science, business analytics has difficulty ensuring that data is integrated and reconciled across different systems, and then difficulty deciding what subsets of data to make available. Regardless of the difficulties, the scientific approach to resolving these issues has been, and should be, considered, and it will serve us well to understand the method of:

  • Taking known information and making an observation through experience, thought and reading
  • Asking questions about why a certain event is occurring
  • Formulating a hypothesis about why something is doing what it does
  • Developing testable, repeatable predictions
  • Gathering relevant data to test and predict
  • Refining, altering, expanding or possibly rejecting your hypothesis
  • Finally, developing general theories and conclusions that are consistent with the information you have and with other current theories, and making decisions based upon what you have discovered
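To make the method above a little more concrete, here is a minimal sketch in Python of the "hypothesis, testable prediction, gather data, refine or reject" loop. Everything in it is an assumption made for illustration: the weekly sales figures are invented, the promotion is hypothetical, and the permutation test is just one simple way to check a hypothesis, not the only one.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    The null hypothesis is that the group labels do not matter, so we
    shuffle the pooled values and count how often a random split gives
    a difference at least as large as the one we actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Invented weekly sales before and after a hypothetical promotion.
before = [102, 98, 110, 95, 105, 99, 101, 97]
after = [118, 121, 109, 125, 117, 120, 115, 123]

# Hypothesis: the promotion changed average weekly sales.
p_value = permutation_test(before, after)
decision = "reject" if p_value < 0.05 else "keep"
print(f"p-value: {p_value:.4f} -> {decision} the 'no effect' assumption")
```

The point is not the particular test. It is that the question, the prediction and the decision rule are written down before the answer is known, which is what separates the scientific approach from simply staring at a dashboard.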

The question I have for you is this. Do you have the perfect data scientist, and do you really require one? Depending upon the size of the business, you may have an IT department: more than one or two individuals who manage a network and help the user community with the odd computer glitch. This IT department is structured with a head of IT capable of communicating IT requirements to the business community and vice versa. You have that “IT person”, programmer, hacker who is well rounded in hardware and software. Hopefully you have more than one! On the business side, you have the number cruncher and business analyst as well as the obvious accountant. If you are lucky, the head of IT is a bit of a data nerd, or has someone in the department with that skill set. Given that dynamic, you have a good team that is capable of understanding what the business is all about and will be able to see what is required for the business to be successful. If the business is progressive, it will be able to develop projects based upon sales, marketing and product delivery. Depending on the size or complexity of the project, you may require a good consultant to come on board once in a while. Depending on the project requirements, this consultant may have good statistical, communication and programming skills. They may have a good understanding of cloud based solutions, hardware requirements or accounting practices. However that looks, you have a team that, if you look at it very closely, becomes your perfect data scientist.

In conclusion, I believe the role of the data scientist is not yet fully realized. It will be, and more likely sooner rather than later. With the onset of big data and tools such as machine learning, manifolds and higher mathematics to assimilate, postulate and understand what all this information is about, the role of the data scientist will become integris scribebat. Until then, for your business to be productive and future driven, take a good look at the team you currently have. Train them and support them in such a way as to ensure that you have the dynamism and resiliency to see a project through to completion. Take a good look around you. You are in good hands.

Yes, I have no doubt that data scientists are needed for AI, machine learning, statistics etc. They will complement existing skills, usually within business analysis, analytics, integration and data warehousing. They will not be a business panacea enabling leapfrogging into the data lake and bypassing the treatment plant: 'it's about the use-case, stupid', you may croak. In fact every demo I see is about narrow use cases. I ain't seen Deep Blue pick up Sudoku yet, and it will never solve an enterprise-scale puzzle in my time.

Is data science really a science? Isn’t it a branch of mathematics? Arguably mathematics is not a science, since experimental proof is absent; for that you need Physics. Similarly, is software engineering really engineering? Isn’t it creative writing?
