23 Python ML tips to save hours from a 4-year veteran

I have been building ML systems in Python for over 4 years. Here are 23 time-saving tips I wish I had known in my early days:

↳ Pin package versions to avoid “works on my machine” surprises.
↳ Keep feature definitions in one place and version them like code.
↳ Prefer vectorized pandas or polars operations over .apply loops for speed.
↳ Use categorical dtypes for low-cardinality strings (few distinct values, many repeats) to shrink RAM.
↳ Cache expensive steps to parquet or feather and read them everywhere.
↳ Use a Makefile or tox tasks for one-command setup, test, and train.
↳ Format code with black and lint with ruff using a pre-commit hook.
↳ Use logging instead of prints and write logs to a run-specific file.
↳ Structure repos with src/ modules and keep notebooks in notebooks/.
↳ Add lightweight types with typing to catch shape and None bugs early.
↳ Use pyarrow dtypes in pandas to reduce memory use and avoid surprising NaN behavior.
↳ Profile hot spots with cProfile or line_profiler before optimizing.
↳ Keep data paths in a single config and never hardcode local directories.
↳ Track runs with a simple MLflow setup and log params, metrics, and tags.
↳ Load configs from environment variables so secrets never touch notebooks.
↳ Turn stable notebook cells into functions and import them like a library.
↳ Plot a quick learning curve and a calibration curve before chasing models.
↳ Persist models and artifacts with clear names that include metric and date.
↳ Add unit tests for data contracts like column presence, dtypes, and ranges.
↳ Seed Python, NumPy, and any framework once in a shared utils.seed() function.
↳ Validate splits with a time-aware split or group-aware split to prevent leakage.
↳ Schedule error analysis notebooks and keep a running “bug zoo” of failure modes.
↳ Use a project env (venv or conda) and freeze with requirements.txt or pyproject.toml.

Extra: Python Machine Learning notes by Michael Brothers.
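Three of the tips above fit naturally into one small module: config from environment variables, a shared seed function, and a run-specific log file. The sketch below is one possible shape, not a standard layout. The names `Settings`, `load_settings`, `seed_everything`, `make_run_logger` and the `ML_*` environment variables are all made up for illustration.

```python
from __future__ import annotations

import logging
import os
import random
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Hypothetical project settings, read once from the environment."""
    data_dir: Path
    run_dir: Path
    seed: int


def load_settings() -> Settings:
    # ML_DATA_DIR, ML_RUN_DIR and ML_SEED are illustrative names, not a convention.
    return Settings(
        data_dir=Path(os.environ.get("ML_DATA_DIR", "data")),
        run_dir=Path(os.environ.get("ML_RUN_DIR", "runs")),
        seed=int(os.environ.get("ML_SEED", "42")),
    )


def seed_everything(seed: int) -> None:
    """Seed Python's RNG and, if installed, NumPy's legacy global RNG."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return
    np.random.seed(seed)
    # Add the seeding call for your ML framework here.


def make_run_logger(run_dir: Path, name: str = "train") -> tuple[logging.Logger, Path]:
    """Return a logger that writes to a file unique to this run."""
    run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    run_dir.mkdir(parents=True, exist_ok=True)
    log_path = run_dir / f"{name}_{run_id}.log"
    logger = logging.getLogger(f"{name}.{run_id}")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, log_path
```

A training script would call `load_settings()` once, pass `settings.seed` to `seed_everything`, and log through the logger returned by `make_run_logger(settings.run_dir)`, so nothing in the script hardcodes a local path.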
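The pandas tips above (vectorize instead of .apply, use categoricals for repeated strings, test data contracts) can be illustrated on a toy frame. This is a minimal sketch on synthetic data; the column names, value ranges, and the `check_contract` helper are assumptions for the example, and it needs pandas and NumPy installed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "country": rng.choice(["DE", "FR", "US", "JP"], size=n),  # only 4 distinct values
    "price": rng.uniform(1.0, 100.0, size=n),
    "qty": rng.integers(1, 10, size=n),
})

# Vectorized column arithmetic instead of df.apply(lambda row: ..., axis=1).
df["revenue"] = df["price"] * df["qty"]

# Categorical pays off here because "country" repeats a handful of values.
as_strings = df["country"].memory_usage(deep=True)
as_category = df["country"].astype("category").memory_usage(deep=True)


def check_contract(frame: pd.DataFrame) -> None:
    """Raise AssertionError if the frame breaks the (made-up) data contract."""
    required = {"country", "price", "qty"}
    missing = required - set(frame.columns)
    assert not missing, f"missing columns: {sorted(missing)}"
    assert pd.api.types.is_float_dtype(frame["price"]), "price must be float"
    assert pd.api.types.is_integer_dtype(frame["qty"]), "qty must be integer"
    assert frame["price"].between(0, 1_000).all(), "price out of range"
    assert (frame["qty"] > 0).all(), "qty must be positive"


check_contract(df)
```

Comparing `as_strings` with `as_category` shows the memory saving directly. A function like `check_contract` can be called from a pytest test on a small sample of each new data drop.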
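For the group-aware split tip above, one dependency-free option is to hash the group key, so every row from the same user always lands in the same split. This is a sketch of that hashing approach, not the scikit-learn route (which offers `GroupKFold` for the same purpose); `assign_split`, `user_id`, and the toy rows are illustrative.

```python
import hashlib


def assign_split(group_id: str, test_fraction: float = 0.2) -> str:
    """Map a group id to 'train' or 'test' deterministically."""
    digest = hashlib.sha256(group_id.encode("utf-8")).digest()
    # Scale the first 8 bytes of the hash to a number between 0 and 1.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


# Toy rows: several events per user. Splitting row by row would leak
# a user's behavior from train into test.
rows = [
    {"user_id": f"u{i % 50}", "x": float(i)}
    for i in range(500)
]

train = [r for r in rows if assign_split(r["user_id"]) == "train"]
test = [r for r in rows if assign_split(r["user_id"]) == "test"]

train_users = {r["user_id"] for r in train}
test_users = {r["user_id"] for r in test}
```

Because the assignment depends only on the id, the split stays stable when new rows arrive, and `train_users` and `test_users` never overlap.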
