There are (broadly) two types of people called 'data scientist' working today. 1. Those that perform analysis or run models on data, using many languages. 2. Those that try to get LLMs to deliver responses in the way they need them to, mostly using Python. Number 2 is starting to look a lot like 'Software Engineer'. Meanwhile, a lot of what I hear from Number 1 is that Generative AI has ruined the fun of their work. #analytics #rstats #python #datascience #peopleanalytics #ai #technology
This has been somewhat true for years - the trend of software engineering merging with or absorbing data science was there prior to LLMs - it’s only gained momentum with the proliferation of GenAI. The unicorns exist for those that can do both well and combine them in creative ways.
Can’t agree that the fun in 1 is spoilt. I can think more about how to approach a problem and then get help coding up some of the more boring parts of it. It’s a challenge to find the balance between trusting the generated code too much (because it sometimes messes up) and not using it enough (because then you waste time), but overall it’s been a great boost for me.
It was shocking at first to see how well generative AI can finish and alter SQL queries so easily, but since embracing it my life has become much easier. Frees up a lot of time to actually analyze the data, strategize about new projects, and provide proactive insights.
The power comes from 1 + 2. LLMs are incredibly useful and amazing for prototyping, helping to wire up analysis tools from other languages you might not be an expert in, to say nothing of unstructured data processing. Handmade models and analysis are still incredibly important for bespoke workflows or datasets that are not easily managed by an LLM. A great data scientist is someone who can jack of all trades a data pipeline from ingestion to deployed product. LLMs are just another tool on the belt.
I actually think LLMs have opened up a whole new door for prototyping production ML models. For example, if I'm working on a time-to-fill model, I can quickly spin it up as a simple executable and hand it off for others to use like a calculator. It's really expanded how freely I can explore the back end and even opened up new opportunities. I can now 'vibe code' functional programs into production that I might otherwise have had to purchase from a vendor, and I keep the freedom of design control. In my experience, hallucinations are rarely an issue if you know how to read and debug the code.
Applied ML Researcher | Time Series & Physics-Informed Modeling | End to End Systems | xAI
From my experience working on operational atmospheric ML models, I sometimes feel there’s a “third type” that isn’t mentioned here: the people who have to model real-world systems (weather, aviation, energy, biology), where data alone is not enough and the key challenge is understanding the underlying physics or dynamics of the process. LLMs are incredibly useful as accelerators, but they don’t replace the need to frame the problem properly or to translate a physical system into a mathematical one. In that sense, the fun hasn’t disappeared. If anything, it’s increasing, because tools are faster but the reasoning still matters.