Pandas 3.0 is here! 🎉 https://lnkd.in/dfAUP2bH
- Copy-on-Write (CoW) fully implemented: SettingWithCopyWarning is gone ✅. No more debugging mysterious copies - chained assignments just work.
- pd.col() syntax: clean column references in assign() and loc without messy lambdas, e.g. df.assign(c=pd.col('a') + pd.col('b')).
- Faster UDFs 🚀: no more "slow as molasses" user-defined functions - major performance boosts via better optimization (the full Arrow backend didn't land, but it's solid).
I made a Kaggle notebook to try it out: https://lnkd.in/d-SsfryV
#Pandas #DataScience #Python #DataAnalysis #MachineLearning
Pandas 3.0 Released: CoW, Improved UDFs, and Enhanced Performance
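A minimal sketch of the CoW and pd.col() behaviour described above, assuming the pandas 3.0 API works the way the post and release notes state (the DataFrame and column names are invented for illustration):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Under Copy-on-Write, a filtered subset owns its own data; writing to it
# no longer risks silently touching (or not touching) the parent frame.
subset = df[df["a"] > 1]
subset["b"] = 0  # 'df' stays untouched, no SettingWithCopyWarning

# pd.col("a") is a lightweight column reference evaluated against the
# calling DataFrame, replacing lambda d: d["a"] + d["b"] in assign().
out = df.assign(c=pd.col("a") + pd.col("b"))
print(out)
#    a   b   c
# 0  1  10  11
# 1  2  20  22
# 2  3  30  33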
More Relevant Posts
-
150 flowers. 4 measurements. 3 species. 1 algorithm that just... gets it.
I built a KNN classifier on the Iris dataset - and while the dataset is classic, the process taught me something that no tutorial spells out: the model doesn't "think." It just remembers.
K-Nearest Neighbors works by asking "who are your closest neighbors?" and classifying based on a majority vote. No equations being solved. No weights being learned. Just proximity. And yet it achieves high accuracy on a real classification task. That gap between simplicity and power is what keeps pulling me deeper into ML.
What I built:
→ Loaded & explored the Iris dataset with pandas
→ Trained a KNN classifier (k=3) using scikit-learn
→ Evaluated performance with an accuracy score + confusion matrix
→ Built predictions for new, unseen flower samples
Another project in the books. Each one teaches me something the last one didn't.
🔗 GitHub: https://lnkd.in/eybDDsdY
#MachineLearning #Python #ScikitLearn #KNN #DataScience #BuildingInPublic
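Not the author's notebook (that lives at the GitHub link), but a minimal sketch of the steps listed above using scikit-learn; the train/test split ratio and random_state are arbitrary choices:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=3)  # classify by majority vote of the 3 nearest points
knn.fit(X_train, y_train)

pred = knn.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))

# Predict a new, unseen flower (sepal/petal measurements in cm)
print(knn.predict([[5.1, 3.5, 1.4, 0.2]]))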
-
⚠️ Pandas trap: groupby() silently drops NaN keys by default. With dropna=True (the default), groupby() excludes rows whose grouping columns contain NaN. This means:
• Your training population may shrink
• Group sizes may be biased
• Downstream thresholds may fail
Always define explicitly 💪:
• Which rows you learn from
• Whether NaN groups should be included (dropna=False)
• Your data quality assumptions before aggregation 🙅♀️
Silent defaults create silent bias.
#Python #Pandas #DataScience #DataEngineering #DataQuality
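A small illustration of the default vs. explicit behaviour (the 'segment'/'amount' columns are made up):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "segment": ["A", "A", None, "B", np.nan],
    "amount":  [10,   20,  30,   40,  50],
})

# Default (dropna=True): rows with NaN in the grouping column vanish from the result.
print(df.groupby("segment")["amount"].sum())
# A    30
# B    40

# Explicit: keep a NaN group so no rows are silently dropped.
print(df.groupby("segment", dropna=False)["amount"].sum())
# A      30
# B      40
# NaN    80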
-
Why aren’t my Matplotlib tick labels behaving? Let’s ask Cameron Riddell!
In this week’s Cameron’s Corner, Cameron digs into Matplotlib’s ticker system and shows how small choices can make your charts much clearer (or much more confusing).
Learn:
✅ How major and minor tickers work
✅ When to use AutoLocator, MultipleLocator, and custom formatters
✅ Tips for clean, readable axes that communicate your message
Read here: https://lnkd.in/g5hkw8ua
Ever wrestled with cluttered tick labels? Drop your best Matplotlib tip below 👇
#Python #Matplotlib #DataViz #CameronsCorner #DontUseThisCode
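A quick sketch of the locator/formatter knobs mentioned above (not taken from the linked article; the plotted curve and tick spacing are arbitrary):

import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FuncFormatter

fig, ax = plt.subplots()
ax.plot(range(101), [x ** 0.5 for x in range(101)])

ax.xaxis.set_major_locator(MultipleLocator(20))  # major tick every 20 units
ax.xaxis.set_minor_locator(MultipleLocator(5))   # minor tick every 5 units
ax.xaxis.set_major_formatter(FuncFormatter(lambda value, pos: f"{value:.0f} s"))  # custom label text

plt.show()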
-
What if you could write multi-condition logic without nested function calls?
pandas pushes you toward np.where() for conditional columns, which breaks method chaining and nests quickly. The apply() alternative is slow and also breaks the DataFrame workflow.
Polars replaces nested np.where() with readable when().then().otherwise() chains that scale cleanly to any number of conditions. Better yet, you can combine them with any other Polars expression, like string or date operations.
🚀 Article comparing pandas, Polars, and DuckDB: https://bit.ly/4qfdtDd
☕️ Run this code: https://bit.ly/4qKPn3H
#Python #Polars #DataScience #pandas
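An illustrative when/then/otherwise chain in Polars (the 'score'/'grade' columns and thresholds are invented; the linked article's example may differ):

import polars as pl

df = pl.DataFrame({"score": [35, 62, 88, 95]})

out = df.with_columns(
    grade=pl.when(pl.col("score") >= 90).then(pl.lit("A"))
            .when(pl.col("score") >= 75).then(pl.lit("B"))
            .when(pl.col("score") >= 60).then(pl.lit("C"))
            .otherwise(pl.lit("F"))  # falls through when no condition matched
)
print(out)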
-
sum() vs NumPy vs math.fsum(): Which One Is Faster?
Simulation script is available here: https://lnkd.in/ec9ecZxx
I benchmarked four ways to sum 1,000,000 floats stored in a Python list:
- sum()
- np.sum()
- np.add.reduce()
- math.fsum()
Each function was executed 1000 times (after warm-up), and I compared the mean execution time.
Result:
- math.fsum() - fastest
- sum() - slightly slower
- np.add.reduce() - slower
- np.sum() - slowest
Surprising? A bit.
Why NumPy lost here: the data is a Python list. When calling np.sum(list), NumPy first converts the list into an array, and that conversion overhead dominates the runtime. Meanwhile:
> sum() works directly with the list
> math.fsum() is a C-optimized implementation with better numerical stability
The takeaway: NumPy is extremely fast - when working with NumPy arrays. But if your data is already a list and you just need a single aggregation, plain Python may be faster. Performance always depends on context:
- Data structure
- Memory layout
- Conversion cost
Benchmark in your real setup - not in theory.
#python #numpy #sum #math #fsum
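A minimal way to reproduce this kind of comparison with timeit (not the author's linked script; the repetition count is arbitrary and absolute numbers will vary by machine):

import math
import timeit
import numpy as np

data = np.random.default_rng(0).random(1_000_000).tolist()  # plain Python list of floats
arr = np.asarray(data)                                       # pre-converted array, for contrast

candidates = {
    "sum(list)":           lambda: sum(data),
    "math.fsum(list)":     lambda: math.fsum(data),
    "np.sum(list)":        lambda: np.sum(data),         # pays the list -> array conversion
    "np.add.reduce(list)": lambda: np.add.reduce(data),  # also converts first
    "np.sum(array)":       lambda: np.sum(arr),          # conversion cost already paid
}

for name, fn in candidates.items():
    total = timeit.timeit(fn, number=50)
    print(f"{name:22s} {total / 50 * 1e3:8.3f} ms per call")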
-
Weekly challenge 5: Recursion
To understand recursion, you must first understand recursion.
For Week 5 of my algorithm challenge, I decided to tackle a concept that trips up many beginners: recursive functions, using the classic factorial problem.
What is recursion? Instead of using a standard for or while loop, a recursive function calls itself to solve a smaller piece of the problem. It keeps digging deeper until it hits a base case (the bottom), and then it passes the answers back up the chain. Think of it like a set of Russian nesting dolls.
The trade-off: while recursive code is extremely clean and mathematical, it uses more memory. Every time the function calls itself, it adds a new layer to the computer's call stack. If you forget your base case, your program crashes with a stack overflow (in Python, a RecursionError)!
I added a visual trace to my Python script so you can literally see the call stack growing and shrinking in the console.
Check the full code and console output on GitHub: https://lnkd.in/es5TzCUg
#Python #Recursion #Algorithms #CodingChallenge #SoftwareEngineering #DataScience
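A small sketch of a traced recursive factorial, similar in spirit to the script described above (the version on GitHub may differ):

def factorial(n: int, depth: int = 0) -> int:
    indent = "  " * depth
    print(f"{indent}factorial({n}) called")            # call stack growing
    if n <= 1:                                         # base case: stop recursing
        print(f"{indent}base case reached, returning 1")
        return 1
    result = n * factorial(n - 1, depth + 1)           # solve the smaller subproblem
    print(f"{indent}factorial({n}) returns {result}")  # stack unwinding
    return result

print(factorial(5))  # 120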
-
While working with datasets in Pandas, one small thing that made a big difference for me was understanding vectorization. In the beginning, I used apply() for many transformations. It worked — but as datasets got bigger, I noticed things slowing down. Then I started using column-wise operations instead of row-wise logic, and my code became both simpler and faster. Now, apply() is something I use only when there’s no easier alternative. Still learning something new with every dataset I work on. What’s one Pandas habit or trick that improved your workflow? #Pandas #Python #DataEngineering #DataAnalysis
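A sketch of the switch described above: a row-wise apply() versus one column-wise (vectorized) operation; the DataFrame and column names are invented for illustration:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"price": rng.random(100_000), "qty": rng.integers(1, 10, 100_000)})

# Row-wise: calls a Python function once per row - slow on large frames.
df["total_slow"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Column-wise: one vectorized multiplication over entire columns - simpler and much faster.
df["total_fast"] = df["price"] * df["qty"]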
-
📅 Day 21/30 – Matplotlib (Data Visualization)
Today I learned Matplotlib, a powerful Python library used for data visualization.
What I covered:
• Introduction to Matplotlib
• Line plots
• Bar charts
• Pie charts
• Labels, titles, and legends
• Customizing graphs
It was exciting to turn raw data into meaningful visual insights 📊
📚 Learning resource: HackerBytez – https://lnkd.in/gzKTANVt
Visualization makes data easier to understand and analyze 🚀
#Day21 #PythonChallenge #30DaysOfPython #Matplotlib #DataVisualization #Python #LearningInPublic #CodingJourney
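A tiny example covering the chart types listed above (the monthly sales figures are made up):

import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 150, 90, 180]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].plot(months, sales, marker="o", label="sales")  # line plot
axes[0].set_title("Line")
axes[0].legend()                                        # legend built from the label above

axes[1].bar(months, sales)                              # bar chart
axes[1].set_title("Bar")

axes[2].pie(sales, labels=months, autopct="%1.0f%%")    # pie chart with percentage labels
axes[2].set_title("Pie")

fig.suptitle("Monthly sales (example data)")
plt.tight_layout()
plt.show()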
-
🚀 Day-56 of #100DaysOfCode
📊 NumPy Practice – Finding Unique Values & Frequency
Today I practiced identifying unique elements and counting their occurrences using NumPy.
🔹 Concepts Practiced:
✔ np.unique()
✔ Frequency counting
✔ Handling duplicate values
✔ Efficient array analysis
🔹 Key Learning: using return_counts=True makes frequency analysis simple and efficient without loops - very useful in data preprocessing.
Slowly stepping into data analysis concepts using NumPy 💡🔥
#Python #NumPy #DataAnalysis #ArrayOperations #100DaysOfCode #LearnPython #CodingPractice #PythonDeveloper
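A short example of np.unique with return_counts=True (the sample array is arbitrary):

import numpy as np

arr = np.array([3, 1, 2, 3, 3, 1, 2, 2, 2])

values, counts = np.unique(arr, return_counts=True)
print(values)   # [1 2 3]
print(counts)   # [2 4 3]

# Pair value -> frequency without any explicit loop over the raw array.
print(dict(zip(values.tolist(), counts.tolist())))  # {1: 2, 2: 4, 3: 3}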
-
📊 Day 13/90 – Creating Powerful Visuals with Seaborn
Yesterday we learned basic visualization. Today we level up using Seaborn, a Python library that helps create more professional and insightful charts.
✅ Today’s Focus:
• What is Seaborn & why analysts use it
• Creating attractive statistical charts
• Visualizing relationships between variables
• Understanding distributions & trends
🎯 Why this matters: Seaborn makes it easier to discover patterns and present insights in a professional, presentation-ready format.
📌 Practice Tip - try this in Python:

import seaborn as sns
import matplotlib.pyplot as plt

data = [12, 15, 14, 10, 18, 20, 17]
sns.histplot(data)
plt.show()

Better visuals → clearer insights → stronger impact.
💬 Comment “DAY 13” if you’re learning with me.
#DataAnalytics #Seaborn #DataVisualization #Python #LearningInPublic #90DaysChallenge