Alan Oliveira’s Post

🐍 Stop Writing "Spaghetti" Data Science Code

We’ve all been there: a Jupyter Notebook with 47 cells, variables named df2, df_final, and df_final_v2_FIXED, and a loop that takes three hours to run.

Data analysis is about insights, but your code quality determines how fast (and how reliably) you get them. Here are 4 Python best practices to move from "it works on my machine" to "production-ready."

1. Embrace Vectorization (Forget the for loops)
If you’re iterating over a pandas DataFrame row by row, there is usually a faster way. NumPy and pandas run their core operations in compiled code (largely C), so let them do the heavy lifting.
Bad: using .iterrows() to calculate a new column.
Good: using vectorized operations like df['new_col'] = df['a'] * df['b']. On large DataFrames this is often orders of magnitude faster.

2. The Magic of Method Chaining
Clean code is readable code. Instead of creating five intermediate DataFrames, chain your operations. It keeps your namespace clean and your logic linear.

```python
import pandas as pd

# Toy data for illustration; note the inconsistent 'north' / 'North' labels
df = pd.DataFrame({'age': [30, 25, 40], 'region': ['north', 'North', 'south'],
                   'salary': [1000.0, 3000.0, 5000.0]})

# Instead of multiple assignments, try this:
df_clean = (df
    .query('age > 18')
    .assign(region=lambda x: x['region'].str.upper())  # normalize labels so groupby sees one 'NORTH'
    .groupby('region')
    .agg({'salary': 'mean'})
)
```

3. Type Hinting & Docstrings
Python is dynamically typed, which is a blessing and a curse. Use type hints to tell your team exactly what a function expects, and a docstring to explain what it does and returns. For example:
def process_data(df: pd.DataFrame) -> pd.DataFrame:
Type hints are not enforced at runtime, but a static checker such as mypy can use them to catch mistakes like passing a list into a function that expects a DataFrame before the code ever runs. That can save hours of debugging.

4. Memory Management Matters
Working with "big-ish" data? Downcast your numerics (e.g., float64 to float32) when you don't need the extra precision, and convert low-cardinality object columns (few distinct values, such as country or status) to the category dtype. Your RAM (and your IT department) will thank you.

The Bottom Line: Great data analysis isn't just about model accuracy; it's about the maintainability of the pipeline.

Which Python habit changed your workflow the most? Let’s swap tips in the comments! 👇

#Python #DataScience #Pandas #DataAnalysis #CodingBestPractices #MachineLearning
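To make tip 1 concrete, here is a minimal sketch comparing the two approaches on a hypothetical DataFrame with numeric columns a and b. The column names, the row count, and the helper names product_with_iterrows and product_vectorized are illustrative assumptions. Exact timings depend on your machine and pandas version, so no numbers are claimed here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical example data: two numeric columns, "a" and "b"
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"a": rng.random(10_000), "b": rng.random(10_000)})


def product_with_iterrows(frame: pd.DataFrame) -> pd.Series:
    """Slow version: a Python-level loop that builds one value per row."""
    values = []
    for _, row in frame.iterrows():
        values.append(row["a"] * row["b"])
    return pd.Series(values, index=frame.index)


def product_vectorized(frame: pd.DataFrame) -> pd.Series:
    """Fast version: one element-wise multiplication over whole columns."""
    return frame["a"] * frame["b"]


start = time.perf_counter()
slow = product_with_iterrows(df)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
df["new_col"] = product_vectorized(df)
vectorized_seconds = time.perf_counter() - start

print(f"iterrows: {loop_seconds:.4f}s  vectorized: {vectorized_seconds:.6f}s")
```

Both functions produce the same values; the difference is that the vectorized version hands the whole loop to NumPy instead of creating a Series object for every row.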
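For tip 3, a fuller version of the process_data signature might look like the sketch below. The min_age parameter, the column names, and the function body are hypothetical; the point is how type hints and a docstring work together. Because hints are not checked at runtime, the sketch adds an optional isinstance guard.

```python
import pandas as pd


def process_data(df: pd.DataFrame, min_age: int = 18) -> pd.DataFrame:
    """Return the mean salary per region for rows with age above ``min_age``.

    Args:
        df: Input table with "age", "region" and "salary" columns.
        min_age: Rows with age strictly greater than this value are kept.

    Returns:
        A DataFrame indexed by region with a single "salary" column.

    Raises:
        TypeError: If ``df`` is not a pandas DataFrame.
    """
    # Type hints are not checked at runtime, so guard explicitly if you need to
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"expected a pandas DataFrame, got {type(df).__name__}")
    return df[df["age"] > min_age].groupby("region").agg({"salary": "mean"})


# Hypothetical sample data
sample = pd.DataFrame({
    "age": [17, 25, 40],
    "region": ["north", "north", "south"],
    "salary": [1000.0, 3000.0, 5000.0],
})
result = process_data(sample)
print(result["salary"].to_dict())  # {'north': 3000.0, 'south': 5000.0}
```

With a static checker such as mypy (typically together with the pandas-stubs package, so that pandas types are known), a call like process_data([1, 2, 3]) would be reported as an incompatible argument type before the code runs; the isinstance guard covers the runtime case.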
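Tip 4 can be sketched as a small helper that downcasts floats and integers and turns repetitive string columns into categories. The function name shrink, the max_unique_ratio threshold, and the sample data are hypothetical choices for illustration; tune the cardinality rule to your own data.

```python
import numpy as np
import pandas as pd


def shrink(frame: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of ``frame`` that uses smaller dtypes where possible.

    max_unique_ratio is a hypothetical threshold: an object column counts as
    low-cardinality if its distinct values are fewer than this share of rows.
    """
    out = frame.copy()
    for col in out.columns:
        if out[col].dtype == "float64":
            # float32 keeps roughly 7 significant digits; check that is enough for you
            out[col] = out[col].astype("float32")
        elif pd.api.types.is_integer_dtype(out[col]):
            # pick the smallest signed integer type that holds every value
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif out[col].dtype == object and out[col].nunique() < max_unique_ratio * len(out):
            # few distinct strings: store each once plus small integer codes
            out[col] = out[col].astype("category")
    return out


# Hypothetical data: 90,000 rows with a float column, small integers, and 3 city names
n = 90_000
df = pd.DataFrame({
    "price": np.linspace(0.0, 1000.0, n),
    "count": np.arange(n, dtype="int64") % 100,
    "city": pd.Series(["Lisbon", "Porto", "Faro"] * (n // 3), dtype="object"),
})

small = shrink(df)
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(small.dtypes)
print(f"{before:,} bytes -> {after:,} bytes")
```

memory_usage(deep=True) is used because the default estimate does not count the actual string contents of object columns, which is where category conversion saves the most.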
