From Engineering (not computer engineering) to Data Science

In recent years, it has become increasingly common to see professionals from backgrounds such as engineering, economics, medicine, or biology start building data science skills or even switch careers entirely. In the following lines, based on my humble experience, I describe the main aspects an engineer should take into account if he or she is thinking about exploring this new world but does not yet know where to start.

First, it is important to know the different roles that exist in the process of turning raw data into valuable insights or sophisticated models that automate specific tasks. Over time, the growing amount of data generated by a wide variety of sources forced a natural division of the data field into two main roles: Data Engineering and Data Science per se.

Data Engineering: This role exists because data is generated in different ways and reaches us in different formats. The data that will feed predictive models must be free of errors and continually delivered in useful formats. That is the mission of a data engineer. The principle of “Garbage In, Garbage Out” means that if data does not pass through a process that ensures its quality, the Data Science process cannot start. A data engineer is in charge of developing, building, and maintaining architectures such as databases and systems for processing and distribution. For this reason, data reliability and data quality are key concepts in the data engineering field.

Companies usually decide which tools their Data Engineers will use to do their job. Most rely on some of the following: Oracle, MySQL, SAP, PostgreSQL, MongoDB, Scala, and so on. Sometimes, depending on the size and budget of the company, the data engineering responsibilities fall on the Data Scientist, so it is important to be aware of that.

The role of a data engineer is strongly based on technical skills, and data engineers tend to have backgrounds in computer science or computer engineering. However, this does not mean that an engineer with another background cannot move into a data engineering role.

Data Science: On the other side, Data Scientists receive data that has already gone through an initial pre-processing and manipulation stage. They use this data to feed analytics programs, machine learning models, and statistical models, which they usually build themselves. These models can serve different purposes, such as analyzing a failure mode of a power distribution grid or using historical data to find the compressor configuration that saves the most energy in a production plant. These tasks involve exploring and examining data to find patterns that can help predict future situations.
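To make the idea of "finding patterns to predict future situations" a bit more concrete, here is a minimal sketch in plain Python. It fits a straight line to a handful of historical readings and uses it to estimate a new value. The compressor data, the variable names, and the helper functions `fit_line` and `predict_energy` are all invented for illustration; a real project would normally use a library such as scikit-learn or statsmodels and far more data.

```python
# Hypothetical historical readings: compressor load (%) vs. energy use (kWh).
# The numbers are invented for illustration only.
load_pct = [40, 50, 60, 70, 80, 90]
energy_kwh = [21.0, 25.5, 30.0, 35.5, 40.0, 45.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(load_pct, energy_kwh)


def predict_energy(load):
    """Estimate energy use for a load value using the fitted line."""
    return slope * load + intercept


print(f"Estimated energy at 75% load: {predict_energy(75):.1f} kWh")
```

The same pattern (historical data in, fitted model out, prediction for a new case) is what the more sophisticated machine learning libraries automate.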

Once the analyses are done and the predictions or insights are ready, data scientists must be able to present a clear story to the stakeholders, who can be managers or engineers in other areas such as production, maintenance, or sales. This demands strong storytelling skills and the ability to build visualizations that express the findings in a simple way.

Data scientists usually use open-source tools, but companies sometimes provide specific commercial programs for the job. The most popular and useful tools in this field are Python and R (both open source), but others are also important, such as SPSS, SAS, MATLAB, and even Excel (not my favorite). Doing data science with Python, which is a programming language, requires numerous packages or libraries designed specifically for this purpose, such as scikit-learn, NumPy, Matplotlib, and statsmodels.

Although most data scientists come from computing-related careers, it is common for them to have a different prior background, especially engineering. Having deep knowledge of another subject, such as petroleum, chemical, or industrial engineering, can be a big plus for a data scientist. The better they understand the business, the better the models they can develop.

Here is some useful advice for those who are planning to start their journey to become Data Scientists:

  • Begin by enrolling in one of the many online courses on data science basics. DataCamp and edX, for example, are good choices. If you have no previous programming experience, it may be better to learn the basics of Python before taking data science courses.
  • Practice as much as you can. While you are taking online courses, it can be useful to start doing the engineering calculations from your daily work in Python or R instead of Excel.
  • Make statistics part of your routine. Data science applies statistics all the time, so you must become familiar with the main concepts and start including them in your analyses.
  • Get familiar with your data sources and practice gathering data from them in different ways. Python and R can read data in many formats, so you need to know which formats your data sources use.
  • Once in a while, take a moment to see what other data scientists are doing in different industries. You might find ideas that can be applied in your field.
  • Do not forget to write down how your programs work. Documenting your projects will save your future self a lot of time. Start using the Markdown tools included in Python and R environments.
  • Always include visualizations in your results. A picture is worth a thousand words. Matplotlib, for example, is a powerful library for building polished graphs.
  • Start using a version control tool like Git. It will surely be one of the best decisions you can make. Git lets you synchronize, share, and work across different teams and computers. To use Git in the cloud, you can use GitHub or GitLab.
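As a small starting point for the tips on practicing, statistics, and data sources, the sketch below reads a column from CSV data and computes a few descriptive statistics. It uses only Python's standard library, so it runs without installing anything. The sensor log, the column name `pressure_bar`, and the helper functions `load_column` and `summarize` are hypothetical and exist only for this example.

```python
import csv
import io
import statistics

# Hypothetical sensor log. In practice this would be a file exported from
# your own data source, opened with open("readings.csv", newline="").
raw_csv = io.StringIO(
    "timestamp,pressure_bar\n"
    "2024-01-01 08:00,6.1\n"
    "2024-01-01 09:00,6.4\n"
    "2024-01-01 10:00,5.9\n"
    "2024-01-01 11:00,6.6\n"
    "2024-01-01 12:00,6.0\n"
)


def load_column(handle, column):
    """Read one numeric column from a CSV file-like object."""
    reader = csv.DictReader(handle)
    return [float(row[column]) for row in reader]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


pressures = load_column(raw_csv, "pressure_bar")
summary = summarize(pressures)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Once this feels comfortable, the natural next step is to repeat the exercise with the libraries mentioned earlier, for example NumPy for the calculations and Matplotlib for a plot of the readings.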

More articles by Jerson Rodas