If your analysts are using Python, you may have a data problem.

John Thompson

Published Jan 6, 2026

Python is one of the most popular programming languages right now, primarily because (a) it is more user-friendly than languages like Java or C#, and (b) it has extensive data manipulation functionality, both built in and through widely used libraries such as pandas. Python is a great resource that relatively low-skilled users can use to acquire, combine, clean, sort, filter and transform data, i.e. to perform 'data wrangling'.

Data Wrangling: The informal, ungoverned, and largely manual preparation of data for analysis and reporting to meet the specific needs of individuals and/or their immediate colleagues.

These tasks are commonly taken on by data scientists working with new data to determine if and how it will be useful to the broader analyst community in the longer term, or by data engineers building data integration pipelines to present clean, reliable data to that community.

Data Integration: The formal, governed, and automated preparation of data to make it available for a range of uses, including analysis and reporting by a broad range of enterprise users.

Alarm Bells

However, alarm bells should be ringing if your ordinary 'rank and file' data analysts are heavy Python users. It implies that the data they access is frequently in a rough, raw state and needs significant manual effort from each individual before it is useful for analysis. In an enterprise setting, where you may have dozens or even hundreds of analysts, this is BAD (capital letters and emphasis intentional).

When large numbers of individuals need to do this kind of processing before using data, we can assume that (a) similar effort is being expended repeatedly by many individuals to get the data ready, and (b) it is highly unlikely that they are all 'wrangling' the data in exactly the same way.

The first point indicates that expensive analyst time and effort is being spent doing mundane, repetitive tasks. Analysts should be spending their time analyzing data, not wrangling it.

The second point all but guarantees that idiosyncrasies, errors and discrepancies will creep in, causing confusion and disagreement both amongst analysts and amongst the consumers of their analysis, i.e. business leaders.
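As a minimal, hypothetical illustration of how this happens, the Python sketch below has two analysts compute "North region sales" from the same raw extract, each applying cleaning rules that look perfectly reasonable in isolation. The records, field names and both functions are invented for this example and do not come from any real system.

```python
# Hypothetical raw extract: one order appears twice, one row has no region,
# and region casing is inconsistent.
RAW_ORDERS = [
    {"order_id": "A1", "region": "North", "amount": "100.00"},
    {"order_id": "A1", "region": "North", "amount": "100.00"},  # duplicate row
    {"order_id": "A2", "region": "", "amount": "250"},          # missing region
    {"order_id": "A3", "region": "north", "amount": "75.50"},   # lowercase region
]


def analyst_a_north_total(rows):
    """Analyst A: keeps every row and matches the region name exactly."""
    return sum(float(r["amount"]) for r in rows if r["region"] == "North")


def analyst_b_north_total(rows):
    """Analyst B: drops repeated order_ids and ignores region casing."""
    seen = set()
    total = 0.0
    for r in rows:
        if r["order_id"] in seen:
            continue  # treat a repeated order_id as a duplicate
        seen.add(r["order_id"])
        if r["region"].strip().lower() == "north":
            total += float(r["amount"])
    return total


if __name__ == "__main__":
    print(analyst_a_north_total(RAW_ORDERS))  # 200.0
    print(analyst_b_north_total(RAW_ORDERS))  # 175.5
```

Neither analyst has made an obvious mistake, yet their figures disagree, and anyone comparing the two reports would have to reverse-engineer both scripts to find out why.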

Shadow IT

The widespread and persistent use of Python, or of other tools like Power Query, for data wrangling by business data analysts is a form of 'shadow IT'. Shadow IT develops when end users stop asking IT to deliver solutions for them and decide to implement their own instead. This may be because getting IT to meet their requirements is slow or expensive, or because there is no capacity (technical or otherwise) to implement such requests.

The result is ad hoc solutions, with tasks and processes that really should be centrally planned and controlled being executed freely 'in the wild'.

In a good enterprise data architecture, the data your analysts are using on a regular basis should already be clean, consistent, reliable and transformed, and aside from occasional proofs-of-concept, the use of Python or other tools for data wrangling should be largely unnecessary.

Where analysts need a new data wrangling task done regularly, they should assume that others need it too, or will need it in the future, and should ask for it to be implemented in the data layer so that it is readily available to all users. Making this request may reveal that the task has already been done in a way they were unaware of, or that there is a reason it shouldn't be done that way, avoiding wasted time and confusion.
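One way to picture "implemented in the data layer" is a single, centrally maintained view that every analyst queries instead of re-cleaning the raw table themselves. The sketch below assumes a SQL-based data layer and uses Python's built-in sqlite3 module purely as a stand-in for a real warehouse; the table, view and column names, and the cleaning rules themselves, are hypothetical.

```python
import sqlite3

# Hypothetical raw extract loaded as-is (duplicates, blanks, mixed casing),
# plus ONE governed view that encodes the agreed cleaning rules.
SETUP_SQL = """
CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT);
INSERT INTO raw_orders VALUES
    ('A1', 'North', '100.00'),
    ('A1', 'North', '100.00'),
    ('A2', '',      '250'),
    ('A3', 'north', '75.50');

-- Assumption: exact duplicate rows are safe to collapse with DISTINCT.
CREATE VIEW clean_orders AS
SELECT DISTINCT
    order_id,
    CASE
        WHEN TRIM(region) = '' THEN 'Unknown'
        ELSE UPPER(SUBSTR(TRIM(region), 1, 1)) || LOWER(SUBSTR(TRIM(region), 2))
    END AS region,
    CAST(amount AS REAL) AS amount
FROM raw_orders;
"""


def build_data_layer():
    """Create an in-memory database holding the raw table and the shared view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    return conn


def sales_by_region(conn):
    """Any analyst's query: reads the clean view and applies no cleaning of its own."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM clean_orders GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    conn = build_data_layer()
    print(sales_by_region(conn))  # [('North', 175.5), ('Unknown', 250.0)]
```

Because the rules are defined once, every analyst who queries clean_orders gets the same North total, and a change to a rule (for example, how blank regions are labelled) is made and reviewed in one place rather than in dozens of personal scripts.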

Conclusion

The data that most organizations use regularly (sales, CRM, production, operations, etc.) is stable and predictable. Even if the business wants to investigate different aspects of the data more closely, or in different ways over time, the data itself doesn't really change.

End-user data wrangling in situations like this, like the development of 'spreadmarts' and other Shadow IT, is typically a symptom of dysfunction in data management rather than a deliberate strategy. Where you see it developing, ask why, and no doubt the can of worms will soon be exposed.

John Thompson is a Director with EY's Technology Consulting practice. His primary focus for many years has been the effective design and management of enterprise data systems.
