Demand for Data Engineers Exceeds That for Data Scientists: An Analysis
Multiple recent recruitment surveys have revealed that the demand for data engineers has overtaken the demand for data scientists. The Dice 2020 Tech Job Report said data engineer was the fastest-growing job in technology, with 50% year-over-year growth in the number of open positions. Many mentees have asked me how I perceive this shift in demand and which area they should pursue.
Actually, I see a course correction happening in the industry. Most organisations that hired data scientists to gain a competitive advantage from data and advanced analytics took the role out of context, swayed by the hype created by the media. Many that hired data scientists handed them the work of data engineers. In many other organisations, data scientists faced the challenge of having no data foundations on which to build analytics.
Data scientists spend months cleaning data and generating insights. However, to reproduce the same insight month after month, they have to repeat the same data cleaning and analysis on newly arrived data. The insight then often arrives later than the business decision it was meant to inform, which renders it useless. As a result, businesses have realised the importance of the role of data engineers.
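To make the point about repeated cleaning concrete, here is a minimal sketch of what it looks like when the cleaning step is written once and reused for every month's data. The record layout (a "region" and an "amount" field), the function names, and the sample rows are all hypothetical and chosen only for illustration; a real pipeline would read from the organisation's actual sources.

```python
from statistics import mean


def clean_records(rows):
    """Normalise region names and drop rows that cannot be used.

    The "region" and "amount" fields are assumptions made for this sketch.
    """
    cleaned = []
    for row in rows:
        raw_amount = (row.get("amount") or "").strip()
        try:
            amount = float(raw_amount)
        except ValueError:
            continue  # skip rows whose amount cannot be parsed as a number
        region = (row.get("region") or "").strip().lower()
        if not region:
            continue  # skip rows with no region
        cleaned.append({"region": region, "amount": amount})
    return cleaned


def monthly_insight(rows):
    """Return the average amount per region, computed on cleaned rows."""
    by_region = {}
    for row in clean_records(rows):
        by_region.setdefault(row["region"], []).append(row["amount"])
    return {region: mean(values) for region, values in by_region.items()}


# Hypothetical raw data for two consecutive months.
january = [
    {"region": " North ", "amount": "100"},
    {"region": "north", "amount": "300"},
    {"region": "South", "amount": "n/a"},
    {"region": "south", "amount": "50"},
]
february = [
    {"region": "North", "amount": "80"},
    {"region": "", "amount": "10"},
]

# The same code runs unchanged on each month's newly arrived data.
print(monthly_insight(january))
print(monthly_insight(february))
```

Because the cleaning rules live in one function, next month's insight costs a function call rather than another round of manual work. This is the kind of foundation the paragraph above argues data scientists should not have to rebuild by hand.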
For many years, data scientists were thought to be the only role dealing with data. Data management and governance, data modelling, and data engineering were all thought to fall within the data scientist’s remit, which they do not. Fortunately for data scientists, this misconception is now clearing. To understand the difference between the work of data engineers, data scientists, and other associated roles, it is necessary to understand the data science value chain.
There are numerous blog posts out there that claim these roles have overlapping responsibilities, but that is far from the truth. Data engineers focus on building the infrastructure and architecture for data generation, whereas data scientists focus on applying advanced mathematics and statistical analysis to that generated data to discover actionable insights. This positions data engineers as an IT role, while data scientists are predominantly a business role. And like all business roles that depend on IT to enable the technologies they need to do their job, data scientists depend on data engineers to enable the data and technology needed to generate insights.
Given the growing demand for data engineers, it is clear that organisations are now looking to recruit data engineers directly.
Modern data engineers shouldn’t be writing ETLs
If data engineers are asked to build pipelines, they will think their job is to build pipelines and will see tested off-the-shelf tools as threats to their existence instead of tremendous force multipliers. They’ll find reasons why off-the-shelf pipelines won’t actually suit the organisation’s very custom data needs, and reasons why analysts shouldn’t actually be building their own data transformations. They’ll write code that is fragile, hard to maintain, and non-performant. And the organisation will come to rely on this code because it sits underneath everything else the data team does.
Avoid this situation like the plague. The pace of innovation on the data team will plummet, and the organisation will spend all of its time thinking about infrastructure issues that aren’t actually revenue-generating for the business.
Data engineers are still a critical part of any high-functioning data team. Instead of building ingestion pipelines that are available off-the-shelf and implementing SQL-based data transformations, here’s what data engineers should really be focused on:
- managing and optimising data infrastructure,
- building and maintaining custom ingestion pipelines,
- supporting data team resources with design and performance optimisation, and
- building non-SQL transformation pipelines.
Organisations need more engineers than data scientists
In his 2016 blog post, Jeff Magnusson described some fundamental friction between data scientists and data engineers.
Data scientists (the thinkers) are often frustrated that engineers are slow to put their ideas into production and that work cycles, road maps, and motivations are not aligned. By the time version 1 of their ideas are put into an A/B test, they already have versions 2 and 3 queued up. Their frustration is completely justified.
Data engineers (the doers) are often frustrated that data scientists produce inefficient and poorly written code, have little consideration for the maintenance cost of productionising ideas, demand unrealistic features that skew implementation effort for little gain… The list goes on, but you get the point.
Infrastructure engineers (the plumbers) get frustrated with everyone for overloading the clusters and filling up disk space. They are kept at arm’s length from the scientists and engineers, which means they never gain a solid context into how the infrastructure is being used, or the business and technical problems that it needs to be used to solve. This makes them feel powerless to improve the situation. Instead, they react by making the infrastructure more restrictive. In turn, everyone becomes frustrated with them.
Data scientists need time and space to think about novel solutions, so they need to be freed from monotonous engineering work. High-value insight cannot be generated without solid data foundations. That’s why organisations need more data engineers than data scientists to mine value from their data.
Going forward
The future doesn’t stop here. Along with data engineers, demand for machine learning engineers and automation engineers will rise too. Machine learning engineers will be in greater demand, if they are not already, both to address data engineers’ frustration with data scientists and to implement the ideas data scientists develop. Just as in the early days there were many misconceptions around the definition and responsibilities of data scientists, the responsibilities of machine learning engineers are also quite clouded. While some think they are a hybrid of data engineer and data scientist, I place them at a different end of the data science value chain and consider them to have a very different skillset from both.
Although automation engineers have been around for a long time, their skills will be in high demand with the rise of data products. They will need to learn how to integrate data products into their solutions, but their core skills will remain mostly the same.
Concluding remarks
Even though multiple engineering roles will be in higher demand, data scientists and data storytellers will stay at the centre of the data revolution, as they are positioned closer to the business and relate more directly to business needs.
When recruiting for data-related positions, it is important for all parties to establish what the position actually needs and where the candidate’s skills and interests lie. If a machine learning engineer or automation engineer role is filled with a data engineer, or vice versa, it is highly likely that people will keep leaving, and our recruiters will always be busy looking for the right candidates.
Oliver Mannion
Well said. A data foundation is the key to moving towards analytics. Management often needs people who can give them a roadmap for moving towards analytics and for converting legacy systems into strategic ones. Putting the proper infrastructure in place to address these needs requires data engineers, architects, and others who can set up the infrastructure and automate regular data cleaning, thus reducing turnaround time. Once the proper infrastructure is in place, it can be used for machine learning, AI, and so on. Management has realized that with proper data and analytics they can improve profits, make better decisions, and more. Companies are looking for people who can guide them on this journey.