“I will be working with Big Data.” “You’re given four Excel files as a data source.” That gap is normal. Most real-world data science doesn’t start clean, scalable, or even connected. It starts exactly like this: fragmented files, inconsistent schemas, and unclear definitions. The value isn’t just in modeling. It’s in turning messy inputs into something structured, reliable, and actually usable. That’s where the work happens. #DataScience #DataEngineering #Analytics #ETL #BigData #SQL #Python #DataCleaning #BusinessIntelligence
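A minimal sketch of the "inconsistent schemas" step the post describes: two sources name the same columns differently, and a small alias map folds them into one canonical schema. The column names and alias table here are hypothetical, stand-ins for whatever profiling the real files would reveal.

```python
import csv
import io

# Hypothetical canonical schema and per-source header aliases; real
# mappings would come from inspecting the actual files.
CANONICAL = ["customer_id", "order_date", "amount"]
ALIASES = {
    "cust id": "customer_id", "customerid": "customer_id",
    "date": "order_date", "order dt": "order_date",
    "amt": "amount", "total": "amount",
}

def normalize_header(name):
    key = name.strip().lower()
    return ALIASES.get(key, key)

def load_rows(csv_text):
    """Read one source and emit rows keyed by the canonical schema."""
    reader = csv.DictReader(io.StringIO(csv_text))
    reader.fieldnames = [normalize_header(h) for h in reader.fieldnames]
    for row in reader:
        # Columns a source lacks become None, so every source
        # fits the same downstream structure.
        yield {col: row.get(col) for col in CANONICAL}

# Two "files" with different headers, unified into one structure.
file_a = "Cust ID,Order Dt,Amt\n1,2024-01-05,9.99\n"
file_b = "CustomerID,Date,Total\n2,2024-01-06,12.50\n"
combined = [r for src in (file_a, file_b) for r in load_rows(src)]
```

In practice the same idea scales up with pandas or a staging layer in SQL, but the core move is identical: define the target schema once, then map every source into it before any analysis starts.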
Weeks/months/years of work only to hear “Can I get that in an Excel file?” 😅🤦 The truth is, Excel has played and will continue to play a role in the data world. Better to accept it and enable its use where appropriate than to fight it across the board. I find it particularly useful for mapping/override tables that customers can manage through Box/OneDrive.
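The mapping/override-table pattern mentioned above can be sketched in a few lines: system defaults are merged with a small customer-managed table (e.g. a CSV or Excel export kept on Box/OneDrive), and the customer's values win where present. The account names and columns here are hypothetical.

```python
import csv
import io

# System defaults, owned by the pipeline.
defaults = {
    "ACME": "Standard Tier",
    "GLOBEX": "Standard Tier",
}

# Customer-maintained override file: only the rows they want to change.
# In practice this would be read from the shared drive instead of a string.
override_csv = "account,tier\nGLOBEX,Premium Tier\n"

overrides = {
    row["account"]: row["tier"]
    for row in csv.DictReader(io.StringIO(override_csv))
}

# Override wins when present; otherwise fall back to the default.
effective = {**defaults, **overrides}
```

The appeal of the pattern is exactly the point of the comment: the customer edits a file in a tool they already know, and the pipeline treats it as just another input.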
Completely agree. Data engineering and ETL play a critical role in standardizing disparate sources and ensuring data quality before any meaningful analytics can happen.
Agreed. The ability to transform disparate data into something usable for analysis and model evaluation is consistently undervalued. In practice, this isn’t just a technical gap. It is a governance failure. Poorly structured or unvetted data doesn’t just produce weak models; it creates downstream risk, wasted resources, and false confidence in decision-making. “Garbage in, garbage out” is still undefeated. Data quality is not ancillary. It is foundational. And Excel is not a data storage system.