I just published c5tree — the first sklearn-compatible C5.0 Decision Tree for Python!

Here's how it started: I noticed C5.0 wasn't part of scikit-learn. R has had the C50 package for years; Python had nothing equivalent. So I built it.

pip install c5tree

---

I benchmarked c5tree against sklearn's DecisionTreeClassifier (CART) across 3 classic datasets using 5-fold cross-validation:

Dataset | CART | C5.0
Iris | 95.3% | 96.0%
Breast Cancer | 91.0% | 92.1%
Wine | 89.3% | 90.5%

C5.0 wins on accuracy across every dataset. Next, I'll experiment with combining it with advanced ensemble methods.

---

Why does C5.0 outperform CART? CART uses Gini/entropy and always makes binary splits. C5.0 uses Gain Ratio (which corrects the bias toward high-cardinality features) and supports multi-way splits on categorical data. C5.0 also handles missing values natively (no imputation needed) and uses Pessimistic Error Pruning to produce smaller, more interpretable trees.

---

Key features of c5tree:
- Gain Ratio splitting (less biased than Gini)
- Multi-way categorical splits
- Native missing-value handling
- Pessimistic Error Pruning
- Full sklearn compatibility (Pipelines, GridSearchCV, cross_val_score)
- Human-readable tree output

---

Quick start:

from c5tree import C5Classifier

clf = C5Classifier(pruning=True, cf=0.25)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

---

This is my first open-source Python package and it fills a genuine gap in the Python ML ecosystem. If you find it useful, a ⭐ on GitHub goes a long way!

🔗 PyPI: https://lnkd.in/e-sjUdSG
🔗 GitHub: https://lnkd.in/eecNYs_z

#Python #MachineLearning #OpenSource #DataScience #ScikitLearn #DecisionTree
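To make the Gain Ratio point concrete, here is a minimal pure-Python sketch of the criterion (my own illustration, not c5tree's internal code): information gain normalised by split info, which is exactly what penalises an ID-like, high-cardinality feature that raw gain would overrate.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """C4.5/C5.0-style gain ratio for a categorical feature:
    information gain divided by the entropy of the partition itself."""
    n = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    # Information gain of splitting on this feature
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    gain = entropy(labels) - remainder
    # Split info: entropy of the partition sizes (high for many tiny groups)
    split_info = -sum((len(ys) / n) * math.log2(len(ys) / n)
                      for ys in by_value.values())
    return gain / split_info if split_info > 0 else 0.0
```

On four labelled rows, a feature that matches the labels scores a ratio of 1.0, while a unique-per-row ID feature, which raw information gain would rank just as high, scores only 0.5.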
c5tree: Sklearn Compatible C5.0 Decision Tree for Python
📘 Python for PySpark Series – Day 15
🎭 Polymorphism in Python

✨ What is Polymorphism?
Polymorphism means "many forms". It allows the same method or function to behave differently based on the object.
➡️ Promotes flexibility and dynamic behavior in code

🔹 Why Polymorphism?
✔ Improves code flexibility
✔ Reduces complexity
✔ Enhances readability
✔ Supports method reusability

🔹 Types of Polymorphism
✔ Method Overriding (runtime)
✔ Method Overloading (compile-time – limited in Python)
✔ Operator Overloading

🔹 Syntax (Method Overriding)

class Parent:
    def show(self):
        print("Parent method")

class Child(Parent):
    def show(self):
        print("Child method")

🔹 Example

class Animal:
    def sound(self):
        print("Some sound")

class Dog(Animal):
    def sound(self):
        print("Bark")

class Cat(Animal):
    def sound(self):
        print("Meow")

animals = [Dog(), Cat()]
for a in animals:
    a.sound()

➡️ Same method sound() → different outputs

🔗 Why Polymorphism in PySpark?
✔ Same operations work on different data types (RDD, DataFrame)
✔ Functions behave differently based on input
✔ Helps in writing generic and reusable code

🏫 Real-Life Analogy (Remote Control 📺)
One remote → multiple devices
➡️ Same button (action) → different behavior (TV, AC, etc.)

🧠 Interview Key Points
✔ Polymorphism = one interface, multiple implementations
✔ Achieved using method overriding
✔ Supports dynamic method dispatch
✔ Increases flexibility and scalability

🧠 Key Takeaway
Polymorphism allows writing flexible and reusable code where the same operation behaves differently for different objects.

🔖 Hashtags
#python #pyspark #dataengineering #oop #polymorphism #pythonbasics #learningjourney #coding
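The post lists operator overloading but doesn't demonstrate it, so here is a minimal sketch (the Vector class is my own illustration): the same + operator behaves differently depending on the objects involved.

```python
class Vector:
    """Minimal 2-D vector used only to illustrate operator overloading."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # '+' on two Vectors means component-wise addition
        return Vector(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"


v = Vector(1, 2) + Vector(3, 4)
print(v)  # Vector(4, 6)
```

Same + symbol, different behavior: ints add arithmetically, strings concatenate, and Vector now adds component-wise — one interface, multiple implementations.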
How to "Slice the Cake" in Python? 🎂🐍 (Slicing & Indexing)

Once you've learned how to store strings, the big question is: Do we always have to use the entire text? 🧐

The Answer: Absolutely not! Python gives us precision tools (Indexing & Slicing) that allow us to manipulate text data and extract exactly what we need.

At Data Hub, we use this constantly during Data Cleaning. Whether you're extracting specific "Product Codes" from a long string or separating "Dates" to generate accurate reports, these tools are your best friends. 📊

1️⃣ Indexing (Finding the Address):
Remember, Python starts counting from 0, not 1. If we have:

word = "Python"

Letter P is at index 0
Letter y is at index 1
Letter n is at index 5 (or -1 if you count from the end)

💡 Pro Tip: Negative indexing is a lifesaver when dealing with long strings where you only need the last few characters!

2️⃣ Slicing (Cutting the Data):
To extract a specific "portion" of text, we use the slice operator [start : stop].

word[0:4] ➡️ Starts at index 0 and stops "before" index 4. Result: Pyth.
word[:] ➡️ Leaving it empty selects the entire string from start to finish.
word[-3:-1] ➡️ Starts 3 characters from the end and stops before the last one. Result: ho.

🧠 The Bottom Line: Index is the "Address" of the character, while Slicing is the "Scissors" that separates the data. Mastering these is your first step toward becoming a Data Analyst who handles data with speed and intelligence! 👌

💬 Weekly Challenge: If you have the variable:

name = "DataHub"

What should we write between the brackets [ : ] to extract only the word "Data"? Show me your answers in the comments! 👇

#Python #DataAnalysis #DataHub #PythonBasics #DataScience #LinkedInLearning #Programming #DataCleaning
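A quick runnable sketch of the data-cleaning use case mentioned above, pulling a product code and a date out of one string. The record layout ("PROD-12345|2024-06-01") is a hypothetical example, not a real Data Hub schema:

```python
# Hypothetical record layout: positions 0-4 hold "PROD-",
# positions 5-9 hold the code, and the last 10 characters hold the date.
record = "PROD-12345|2024-06-01"

product_code = record[5:10]   # slice indices 5..9        -> "12345"
report_date = record[-10:]    # negative slice: last 10   -> "2024-06-01"

print(product_code, report_date)
```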
Python's Pro Tips 🚀 What I learned building my first real Python dashboard (no one talks about this)

Everyone talks about:
👉 Pandas
👉 Plotly
👉 AI tools

But the real breakthrough for me was NOT libraries… it was understanding how Python actually thinks:

💡 1. Indentation = Logic
Not formatting. Not style. 👉 One wrong space = broken program.
Python doesn't use { } like JS 👉 it uses INDENTATION = STRUCTURE. Each nesting level adds 4 spaces: code inside if file: is indented 4 spaces, code inside a nested if len(...) is indented 8 spaces, and so on.

🔒 BRACKETS
Every ( must close with ), every [ with ], and every { with }.

🔥 MY 3 if STATEMENTS (EXPLAINED CLEAN)

✅ LEVEL 1 (0 spaces) — if file:
👉 This is TOP LEVEL. Means: "only run everything below IF a file is uploaded"

✅ LEVEL 2 (4 spaces) — if categorical_cols:
👉 This is INSIDE if file. Means: "only run this IF the file exists AND categorical columns exist"

✅ LEVEL 3 (8 spaces) — if selected_click != "None":
👉 This is INSIDE both of the above. Means: "only run if the file exists, categorical columns exist, AND the user selected something"

🎯 SIMPLE WAY TO SEE IT
IF file exists:
    DO A
    IF categorical exists:
        DO B
        IF user clicked:
            DO C

📊 VISUAL TREE (VERY IMPORTANT)
if file:                              ← 0 spaces
    df = ...
    if categorical_cols:              ← 4 spaces
        selected_click = ...
        if selected_click != "None":  ← 8 spaces
            filtered_df = ...

💡 2. Execution order matters
👉 You can't use a variable before it exists.
👉 "Create → then use" (simple, but critical)

💡 3. Scope is everything
If a variable is only assigned inside if file:
👉 it only exists when that branch actually ran.

💡 4. Structure > Syntax
You can know all the libraries, but without structure, nothing works.

💡 5. Build → Break → Fix → Repeat
That's the real learning loop. Not tutorials.

🔥 After struggling with these, I built:
✔ Dynamic dashboard (multi-KPI)
✔ Power BI–style filtering
✔ Multi-chart engine
✔ RCA (insight layer coming next)

📌 Biggest takeaway: Stop chasing tools. Learn how things flow.

Everyone learns in a different way. For me, it's outside-in: first understand the whole picture (structure), then see which section goes where, what each section contains, and how it's indented (spacing). That's when everything clicks. If you have difficulty understanding it, I can explain my mental model: "Python Rooms & Floors" (the first part of it, getting the spacing right, kept me confused for a while :-))

Next step: 📡 Turning this into 💎 a commercial-grade telecom intelligence platform. Let's see where this goes 🚀

Thanks for your impressions, feedback & comments
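The three-level structure described above can be run end to end. The variable names mirror the post; the values are hypothetical stand-ins for the real dashboard state (an uploaded file, detected categorical columns, a user click):

```python
# Hypothetical stand-ins for the dashboard's real state
file = "uploads/data.csv"       # truthy -> "a file was uploaded"
categorical_cols = ["region"]   # truthy -> categorical columns detected
selected_click = "region"       # the user's selection

ran = []
if file:                              # level 1: 0 spaces
    ran.append("A")                   # inside `if file`: 4 spaces
    if categorical_cols:              # level 2: indented 4 spaces
        ran.append("B")               # inside both: 8 spaces
        if selected_click != "None":  # level 3: indented 8 spaces
            ran.append("C")           # inside all three: 12 spaces

print(ran)  # ['A', 'B', 'C']
```

Set file = "" (or selected_click = "None") and the corresponding inner blocks simply never run — the indentation alone decides what belongs to which condition.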
ANATOMY OF NUMPY

NumPy is often described as "fast arrays for Python," but architecturally it is a typed, contiguous memory layout with vectorized C operations and broadcasting rules that let you compute on millions of elements without writing a single loop. Understanding its anatomy means knowing why np.dot beats a Python for-loop by 100×, and when it silently returns the wrong answer.

1. THE NDARRAY
The core object: a contiguous block of memory plus metadata.
- dtype: fixed data type (float32, int64, bool). Chosen once, applied to every element. The wrong dtype silently loses precision.
- shape: tuple of dimensions. A (3, 4) array has 12 elements. Operations must agree on shape or broadcast.
- strides: bytes to jump when moving along each axis. Transposes and slices just update strides, never copy data.

2. BROADCASTING
NumPy aligns shapes from the right. Size-1 dimensions stretch to match.
- Rule 1: If shapes have different numbers of dimensions, prepend 1s to the shorter one.
- Rule 2: Size 1 stretches to match. Anything else must agree, or broadcasting fails.
Why it matters: it avoids allocating huge intermediate arrays. x[:, None] - y[None, :] computes pairwise diffs with no loop.

3. VECTORIZATION
Every operation is implemented in C with SIMD. Python loops are 50-200× slower.
- Ufuncs: np.sin, np.exp, np.add. Elementwise with no Python overhead.
- Reductions: sum, mean, argmax. Accept an axis argument to reduce along specific dimensions.
- Linear algebra: np.dot, np.linalg.solve, the @ operator. Calls BLAS/LAPACK under the hood.

4. THE TRAPS
Four foot-guns that bite every data scientist at least once.
- Views vs copies: slicing returns a view. Modifying a slice modifies the original. Use .copy() when you need independence.
- Integer overflow: int32 silently wraps at ±2.1 billion. Sum large integer arrays as int64 or float64.
- Shape mismatch: (n,) is not the same as (n, 1). The first is 1D, the second is 2D. Many bugs live in this gap.
- Hidden copies: fancy indexing with a boolean or integer array always returns a copy, even for a read.

🔥 THE BOTTOM LINE: The anatomy of NumPy is a balance between structure (dtype, shape, strides), math (vectorized C and BLAS calls), and iteration (broadcasting to avoid explicit loops). Master these three, and every Python data library built on NumPy (pandas, scikit-learn, PyTorch) starts to make sense at a lower level.

#Python #NumPy #DataScience #MachineLearning #MLEngineering #AI
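A quick demo of two of the traps above — views vs copies, and the hidden copy from boolean indexing (assuming NumPy is installed):

```python
import numpy as np

a = np.arange(6)          # [0 1 2 3 4 5]

view = a[2:5]             # slicing returns a VIEW: shares memory with `a`
view[0] = 99              # ...so this write goes through to the original
assert a[2] == 99

safe = a[2:5].copy()      # .copy() gives independent memory
safe[0] = -1              # `a` is unaffected
assert a[2] == 99

picked = a[a > 3]         # boolean (fancy) indexing returns a COPY
picked[0] = 0             # `a` is unaffected again
assert a.tolist() == [0, 1, 99, 3, 4, 5]
```

Same bracket syntax, opposite memory semantics — which is exactly why this trap is so easy to hit.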
𝗣𝘆𝘁𝗵𝗼𝗻 𝗦𝗲𝗿𝗶𝗲𝘀 — 𝗗𝗮𝘆 𝟮

Python dictionaries are often called "O(1)". That's only half the truth.

Under the hood, a dictionary is a hash table.

𝗪𝗵𝗲𝗻 𝘆𝗼𝘂 𝗶𝗻𝘀𝗲𝗿𝘁 𝗮 𝗸𝗲𝘆:
* Python computes a hash
* Maps it to an index
* Stores the value

That's why lookups are *usually* O(1).

But here's what most developers ignore: **Hash collisions exist.** Two different keys can produce the same hash → same index.

𝗪𝗵𝗲𝗻 𝘁𝗵𝗮𝘁 𝗵𝗮𝗽𝗽𝗲𝗻𝘀:
* Python has to resolve the collision
* It probes for the next available slot

More collisions = more probing = slower lookups. In the worst case, O(1) degrades to O(n).

Example (simplified):

```python
class BadHash:
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 1  # forcing collision

    def __eq__(self, other):
        return self.value == other.value

d = {}
for i in range(10000):
    d[BadHash(i)] = i
```

This dictionary will perform poorly because every key collides.

𝗪𝗵𝗲𝗿𝗲 𝘁𝗵𝗶𝘀 𝗺𝗮𝘁𝘁𝗲𝗿𝘀:
* Large-scale caching systems
* High-frequency lookups
* Custom object keys in dicts

𝗛𝗮𝗿𝗱 𝘁𝗿𝘂𝘁𝗵: "Dictionaries are always O(1)" is a myth. They are *optimized for O(1)* — not guaranteed. If you ignore how hashing works, you'll hit performance issues you won't understand.

---

𝗧𝗼𝗺𝗼𝗿𝗿𝗼𝘄: 𝗪𝗵𝘆 𝘀𝗲𝘁𝘀 𝗮𝗿𝗲 𝗳𝗮𝘀𝘁𝗲𝗿 𝘁𝗵𝗮𝗻 𝗹𝗶𝘀𝘁𝘀 (𝗮𝗻𝗱 𝘄𝗵𝗲𝗻 𝘁𝗵𝗲𝘆'𝗿𝗲 𝗻𝗼𝘁)
🚀 Python Series – Day 10: Strings in Python (Text Handling Basics)

Till now, we've worked with numbers and collections. But what about text data? 🤔
👉 That's where Strings come in!

🧠 What is a String?
A string is a sequence of characters enclosed in quotes.
✔️ Can use single ' ' or double " " quotes

🔧 Example:
name = "Mustaqeem"
print(name)

🔁 Access Characters
text = "Python"
print(text[0])   # P
print(text[-1])  # n

✂️ String Slicing
text = "Python"
print(text[0:3])  # Pyt
print(text[2:])   # thon

🔄 String Methods
msg = "hello world"
print(msg.upper())  # HELLO WORLD
print(msg.lower())  # hello world
print(msg.title())  # Hello World

❌ Strings are Immutable
Strings are immutable — meaning you cannot change them in place.
text = "Python"
text[0] = "J"  # ❌ TypeError
👉 This raises an error because strings cannot be modified.

✅ Correct Way (Create a New String)
text = "Python"
new_text = "J" + text[1:]
print(new_text)  # Jython

🎯 Why are Strings Important?
✔️ Used in almost every program
✔️ Help with user input & output
✔️ Important for data processing

🔥 Pro Tip: Whenever you want to modify a string 👉 create a new one instead of changing the original.

⚡ Quick Challenge: What will be the output?
text = "Python"
print(text[1:4])
👇 Comment your answer!

📌 Tomorrow: Dictionaries & Sets (Advanced Data Structures)
Follow me to learn Python step-by-step from basics to advanced 🚀

#Python #DataScience #Coding #Programming #LearnPython #Beginners #Tech #MustaqeemSiddiqui
Day 39: The "Main" Gatekeeper — if __name__ == "__main__": 🚪

To understand this line, you first have to understand how Python treats files when it loads them.

1. What is __name__?
Every time you run a Python file, Python automatically creates a few "special" variables behind the scenes. One of those is __name__.
Scenario A: If you run the file directly (e.g., python script.py), Python sets the variable __name__ to the string "__main__".
Scenario B: If you import that file into another script (e.g., import script), Python sets __name__ to the module's filename (e.g., "script").

2. Why do we need this check?
Imagine you wrote a script with some useful functions, but also some code at the bottom that prints a "Welcome" message and runs a test. If another developer wants to use your functions and types import your_script, Python will execute every line of code in your file. Suddenly, their program is printing your welcome messages and running your tests!

The Fix:

def calculate_tax(price):
    return price * 0.1

# This code ONLY runs if I run the file directly.
# It WON'T run if someone else imports this file.
if __name__ == "__main__":
    print("Testing the tax function...")
    print(calculate_tax(100))

3. The "Execution Flow" (How it works)
Python starts reading your file from the top. It records your functions and classes into memory. Then it reaches the if statement.
If you clicked "Run": the condition is True. The code inside the block executes.
If another script imported this: the condition is False. The code inside is skipped. Your functions are available for use, but no "messy" output is generated.

4. Professional Best Practice: The main() function
In senior-level engineering, we don't just put logic directly under the if statement. We bundle our starting logic into a function called main().

def main():
    # Start the app here
    print("App is starting...")

if __name__ == "__main__":
    main()

💡 The Engineering Lens: This makes your code cleaner and allows other developers to manually call your main() function if they ever need to "reset" or "restart" your script from their own code.

#Python #SoftwareEngineering #CleanCode #ProgrammingTips #PythonDevelopment #LearnToCode #TechCommunity #PythonMain #BackendDevelopment
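Both scenarios above can be observed directly: write a tiny script to a temp directory, run it once as a file and once via an import. The script name and the temp path here are just for the demo:

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "script.py")
    with open(path, "w") as f:
        f.write("print(__name__)\n")

    # Scenario A: run the file directly -> __name__ is "__main__"
    direct = subprocess.run([sys.executable, path],
                            capture_output=True, text=True).stdout.strip()

    # Scenario B: import it from another program -> __name__ is "script"
    imported = subprocess.run([sys.executable, "-c", "import script"],
                              capture_output=True, text=True,
                              cwd=d).stdout.strip()

print(direct, imported)
```

Same file, two different values of __name__ — which is all the gatekeeper line checks.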
🔧 Building AI Agents from Scratch – Part 10: AI Agent Python Library Packaging is live! In this post, I explore how agents can be packaged and shared like any other Python library: ✨ From Scripts to Libraries – agents move beyond ad‑hoc scripts into structured, reusable packages. ✨ Packaging with setup.py / pyproject.toml – standard Python packaging ensures agents can be installed via pip. ✨ Wheel Files (.whl) – agents are compiled into distributable wheels, making installation fast and dependency‑safe. ✨ Distribution via Git – teams can version, share, and collaborate on agents across repositories. ✨ FastAPI Discovery Integration – packaged agents can register themselves automatically, enabling plug‑and‑play orchestration. This series continues to be based entirely on my work experience. It’s not about frameworks—it’s about learning the fundamentals and understanding what they’re built on. 👉 Read Part 10: https://lnkd.in/gAsxewjw If you’re curious about how packaging transforms agents into modular, reusable components, I’d love for you to follow along. #AI #Agents #Python #Packaging #AgenticAI #LearningByDoing
Day 12 of My Data Science Journey — Python Lists: Methods, Comprehension & Shallow vs Deep Copy

Today's focus was on one of the most essential data structures in Python — Lists. From data storage to manipulation, lists are used everywhere in real-world applications and data science workflows.

𝐖𝐡𝐚𝐭 𝐈 𝐋𝐞𝐚𝐫𝐧𝐞𝐝:

List Properties – Ordered, mutable, allows duplicates, and supports mixed data types

Accessing Elements – Used indexing, negative indexing, slicing, and stride for flexible data access

List Methods
– append(), extend(), insert() for adding elements
– remove(), pop() for deletion
– sort(), reverse() for ordering
– count(), index() for searching and analysis

Shallow vs Deep Copy
– Understood that direct assignment does not create a new copy
– Used copy(), list(), and slicing for safe (shallow) duplication
– Learned why nested data needs a deep copy: shallow copies still share the inner objects

List Comprehension
– Wrote concise and efficient code using list comprehension
– Combined loops and conditions in a single readable line

Built-in Functions – Used sum(), len(), min(), max() for quick data insights

Additional Useful Methods – clear(), sorted(), zip(), filter(), map(), any(), all()

𝐊𝐞𝐲 𝐈𝐧𝐬𝐢𝐠𝐡𝐭: Understanding how lists work — especially copying and comprehension — is critical for writing efficient and bug-free Python code. Lists are not just a data structure; they are a core tool for solving real-world problems.

Read the full breakdown with examples on Medium 👇
https://lnkd.in/gFp-nHzd

#DataScienceJourney #Python #Lists #Programming
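A short sketch of the copying behaviour described above, showing why nested data is exactly where shallow copies bite:

```python
import copy

nested = [[1, 2], [3, 4]]

alias = nested                 # assignment: same object, no copy at all
shallow = nested.copy()        # new outer list, but inner lists are SHARED
deep = copy.deepcopy(nested)   # fully independent structure

nested[0][0] = 99              # mutate an inner list

print(alias[0][0])    # 99 -> alias is the very same list
print(shallow[0][0])  # 99 -> shallow copy shares the inner list
print(deep[0][0])     # 1  -> deep copy is untouched
```

For flat lists, copy(), list(), and slicing are enough; the moment lists contain other mutable objects, copy.deepcopy() is the safe option.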
R has C50, Python has sklearn, TensorFlow has its own thing, but none of them are C5.0 AND sklearn-compatible at the same time. You literally own that spot alone 🙌