Name: Automate Feature Engineering with Featurewiz-pro | Fares Ashraf posted on the topic | LinkedIn
Uploaded: 2026-03-26T23:17:59.605Z
Duration: 1 min
Channel: Fares Ashraf

Fares Ashraf

1mo

I spent weeks doing the same feature engineering steps manually on every project. Missing value maps. Outlier detection. Linearity checks. Cramér's V. VIF. RFECV.

So I built a Python package that does all of it automatically.

Introducing featurewiz-pro, a 7-phase feature engineering pipeline I designed, built, and published to PyPI from scratch. One command. 47 seconds. Clean, model-ready data.

What it does:
→ Profiles your data and drops useless columns automatically
→ Detects which features are linear vs non-linear
→ Expands non-linear features with splines
→ Screens for multicollinearity and interactions
→ Selects the best features using RFECV + permutation importance
→ Audits for data leakage before you ever touch a model

Tested across Python 3.9–3.12. 55 tests. 0 failures.

Live on PyPI now → pip install featurewiz-pro
Walkthrough in the video
Git: https://lnkd.in/dHPV7xNK

#Python #MachineLearning #DataScience #OpenSource #FeatureEngineering #PyPI
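For readers unfamiliar with one of the manual checks the post lists, here is a minimal sketch of Cramér's V, a 0-to-1 measure of association between two categorical columns. It is plain standard-library Python written for illustration; the function name `cramers_v` is an assumption and this is not featurewiz-pro's code or API.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two equal-length sequences of categorical labels."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    row_counts = Counter(xs)
    col_counts = Counter(ys)
    joint_counts = Counter(zip(xs, ys))
    k = min(len(row_counts), len(col_counts))
    if n == 0 or k < 2:
        # A constant column carries no association information.
        return 0.0
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_count in row_counts.items():
        for c, c_count in col_counts.items():
            expected = r_count * c_count / n
            observed = joint_counts.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
```

Values near 1 suggest two categorical features are largely redundant; values near 0 suggest they are unrelated.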

Transcript

Every machine learning project lives or dies on one thing: the quality of your features. But feature engineering is painful. It's repetitive, it's time-consuming, and if you skip steps, your model pays the price. So I built a tool to solve that. featurewiz-pro is an open-source Python package I designed, built, and published to PyPI from scratch. It automates the full feature engineering workflow, from raw data all the way to a clean, model-ready dataset, across 7 structured phases. You install it with a single pip command, point it at your DataFrame and your target column, and it does the rest. Before this tool, a typical feature engineering workflow meant dozens of scattered notebook cells, no consistent structure, and easy-to-miss steps like leakage detection or multicollinearity checks. featurewiz-pro packages all of that into one reproducible, configurable pipeline, so you never skip a critical step again.
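The transcript names leakage detection as an easy-to-miss step. One common heuristic, sketched below under stated assumptions, is to flag any numeric feature that is almost perfectly correlated with the target. The names `pearson` and `suspect_leaks` and the 0.98 threshold are illustrative choices, not featurewiz-pro's implementation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treat it as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def suspect_leaks(features, target, threshold=0.98):
    """Return names of features whose |correlation| with the target is suspiciously high."""
    return sorted(
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    )


# Hypothetical toy data: "target_copy" is the label leaked in as a feature.
features = {
    "age": [20, 35, 50, 41, 29],
    "target_copy": [0, 1, 1, 0, 1],
}
target = [0, 1, 1, 0, 1]
flagged = suspect_leaks(features, target)
```

A correlation screen like this only catches blatant leaks; features derived from post-outcome information usually need a domain review as well.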


More Relevant Posts

Aarush Aggarwal
1w
Spent 5 days chasing ghosts: DLL hell and ABI mismatches. I followed the agentic debugger down the wrong path as it hallucinated at the wrong layer, because it misread WinError 1114 as a load-path issue rather than a missing export.

The actual fix was two lines. I used TORCH_LIBRARY when I needed PYBIND11_MODULE.

The Architecture Gap:
- Use TORCH_LIBRARY to register ops into the PyTorch C++ Dispatcher (accessed via torch.ops). It fires static C++ constructors at DLL load time but does not create a PyInit_* function. Python can't "see" it as a module.
- Use PYBIND11_MODULE to generate the standard Python C Extension entry point. This generates the PyInit_{name} entry point Python needs to "see" the module.

The error was literal: "dynamic module does not define module export function." No PyInit_* existed because TORCH_LIBRARY isn't meant to be imported directly.

{just correcting the record}

#CPP #PyTorch #SystemsProgramming #MachineLearning #barebones #3D
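To make the missing-export point concrete, the sketch below computes the symbol name CPython looks for when importing a C extension module, following the naming rule in PEP 489. The helper `expected_init_symbol` and the module name `my_ops` are hypothetical; checking whether a built binary actually exports the symbol requires a platform tool and is not shown.

```python
def expected_init_symbol(module_name):
    """Name of the export hook CPython looks up for an extension module."""
    # Only the last component of a dotted name is used for the symbol.
    leaf = module_name.rpartition(".")[2]
    try:
        leaf.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII names use punycode with hyphens replaced by underscores.
        encoded = leaf.encode("punycode").replace(b"-", b"_").decode("ascii")
        return "PyInitU_" + encoded
    return "PyInit_" + leaf


symbol = expected_init_symbol("my_ops")  # hypothetical extension name
```

If that symbol is absent from the compiled library, a plain `import` fails with the "does not define module export function" error quoted in the post, regardless of whether the load path is correct.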
Rohit Sul
2w
🚀 Another LeetCode Problem Solved: Palindrome Number!
🔗 Check out my solution: https://lnkd.in/dwDMqXXn

💡 Problem Overview
Given an integer x, determine whether it is a palindrome, meaning it reads the same forward and backward. (LeetCode)

Examples:
✔ 121 → Palindrome
❌ -121 → Not a palindrome
❌ 10 → Not a palindrome

🧠 My Approach (Digit Reversal)
Instead of converting the number to a string, I used a mathematical approach:
• Extract digits using % 10
• Reverse the number step by step
• Compare reversed number with original

⚙️ Key Learnings
✔ Strong understanding of number manipulation
✔ Importance of handling edge cases (negative numbers, trailing zeros) (leet-solution.com)
✔ Practicing clean and efficient logic

⏱️ Complexity
• Time Complexity: O(log n)
• Space Complexity: O(1)

🔥 Why this problem matters
Even though it's an "easy" problem, it builds:
• Logical thinking
• Problem-solving fundamentals
• Confidence for bigger challenges

#LeetCode #DSA #Python #CodingJourney #ProblemSolving #100DaysOfCode
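The digit-reversal approach described above can be sketched as follows. This is one straightforward reading of the steps in the post, not the author's linked solution; the function name `is_palindrome` is an assumption.

```python
def is_palindrome(x: int) -> bool:
    """Check whether an integer reads the same forward and backward, without strings."""
    # Negative numbers and non-zero numbers ending in 0 can never be palindromes.
    if x < 0 or (x % 10 == 0 and x != 0):
        return False
    original = x
    reversed_num = 0
    while x > 0:
        reversed_num = reversed_num * 10 + x % 10  # append the last digit
        x //= 10                                   # drop the last digit
    return reversed_num == original


results = [is_palindrome(n) for n in (121, -121, 10)]
```

The loop runs once per digit, which is where the O(log n) time bound in the post comes from.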
Jan Bremec
2w
I've shared requirements.txt files generated with pip freeze and watched them fail on every machine that wasn't mine.

So I built envcore. Because waiting for the Python ecosystem to fix a 15-year-old problem seemed optimistic.

It hooks directly into Python's import system and records what actually loads while your code runs. Not what's installed on your machine. Not what a static scanner thinks might be imported. What. Actually. Runs.

envcore trace train.py → env_manifest.json → envcore restore

Clean, pinned, minimal manifest. Exact environment rebuilt anywhere. No 200-package soup, no missing runtime imports, no "works on my machine" as if that's a valid thing to say to another human.

It also resolves import aliases correctly (PIL to Pillow, cv2 to opencv-python, sklearn to scikit-learn) because the gap between what you type and what you install has existed since forever and apparently needed one person to care.

pip freeze has been lying to you for 15 years. Everyone accepted it. I got tired of it.

30 seconds to try: pip install envcore

If it's useful, a GitHub star helps a new project get noticed. https://lnkd.in/dz3MFTbD

#Python #OpenSource #DevTools
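The idea of recording imports at run time can be illustrated with the standard library alone. The sketch below temporarily wraps `builtins.__import__` and collects the top-level names requested by `import` statements; it is an assumption about one possible mechanism, not envcore's implementation. The names `record_imports`, `ALIASES`, and `to_distribution` are hypothetical, and the alias table contains only the three pairs mentioned in the post.

```python
import builtins
from contextlib import contextmanager


@contextmanager
def record_imports():
    """Collect top-level module names requested by import statements in the block."""
    seen = set()
    original_import = builtins.__import__

    def recording_import(name, globals=None, locals=None, fromlist=(), level=0):
        if level == 0 and name:
            # Absolute import: keep only the top-level package name.
            seen.add(name.partition(".")[0])
        return original_import(name, globals, locals, fromlist, level)

    builtins.__import__ = recording_import
    try:
        yield seen
    finally:
        # Always restore the real import function.
        builtins.__import__ = original_import


# Import name -> PyPI distribution name, for the cases named in the post.
ALIASES = {"PIL": "Pillow", "cv2": "opencv-python", "sklearn": "scikit-learn"}


def to_distribution(module_name):
    """Map an import name to the name you would pip install (identity if unknown)."""
    return ALIASES.get(module_name, module_name)


with record_imports() as seen:
    import colorsys  # stands in for "your code running"
```

This simple wrapper does not see modules loaded through `importlib.import_module`, and it does not pin versions; a real tool would need to handle both.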
Zhi Xuan Chong
3w
Worked on an Event Scheduler project in today’s CPSC 335 (Algorithms) class, implementing it in Python using a combination of data structures. I used a min-heap for efficient priority handling, a hash table for O(1) lookups, and a sorted list for time-based queries. This exercise was a great way to see how different data structures can be combined to balance performance and functionality, and how trade-offs play a key role in algorithm design.
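A minimal sketch of how the three structures named in the post might fit together is shown below. The class layout, method names, and the assumption of integer timestamps are illustrative; the original class project is not available.

```python
import bisect
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    priority: int                      # lower number = more urgent
    start: int                         # integer timestamp (assumption)
    name: str = field(compare=False)   # not used for ordering


class EventScheduler:
    def __init__(self):
        self._heap = []       # min-heap ordered by (priority, start)
        self._by_name = {}    # hash table: name -> Event, O(1) average lookup
        self._by_start = []   # sorted list of (start, name) for range queries

    def add(self, name, start, priority):
        if name in self._by_name:
            raise ValueError(f"duplicate event name: {name}")
        event = Event(priority, start, name)
        heapq.heappush(self._heap, event)
        self._by_name[name] = event
        bisect.insort(self._by_start, (start, name))

    def get(self, name):
        return self._by_name.get(name)

    def pop_most_urgent(self):
        event = heapq.heappop(self._heap)
        del self._by_name[event.name]
        # Trade-off: removal from the sorted list is O(n).
        self._by_start.remove((event.start, event.name))
        return event

    def between(self, lo, hi):
        """Names of events with lo <= start <= hi, in time order."""
        left = bisect.bisect_left(self._by_start, (lo,))
        right = bisect.bisect_left(self._by_start, (hi + 1,))
        return [name for _, name in self._by_start[left:right]]


scheduler = EventScheduler()
scheduler.add("standup", start=9, priority=2)
scheduler.add("deploy", start=14, priority=1)
scheduler.add("lunch", start=12, priority=3)
```

Keeping three structures in sync costs extra memory and bookkeeping on every insert and removal, which is the kind of trade-off the post refers to.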
Kalyan KS
3w
Detect Hallucination in RAG using LettuceDetect

LettuceDetect is a lightweight open-source tool for detecting hallucinations in RAG. It identifies unsupported parts of an answer by comparing it to the provided context. LettuceDetect uses a ModernBERT model trained on the RAGTruth dataset.

Key Features
- Token-level precision: detect exact hallucinated spans
- Optimized for inference: smaller model size and faster inference
- 4K context window via ModernBERT
- MIT-licensed models & code
- HF Integration: one-line model loading
- Easy-to-use Python API: install it with pip and integrate it into your RAG system with a few lines of code
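To show what "token-level precision" over "exact hallucinated spans" can mean in practice, the toy sketch below merges consecutive flagged tokens into character spans of the answer. The per-token flags here are written by hand; in a real system they would come from a trained classifier. None of these function names belong to LettuceDetect's API.

```python
import re


def whitespace_offsets(text):
    """(start, end) character offsets of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def flagged_spans(answer, offsets, flags):
    """Merge runs of consecutive flagged tokens into character spans."""
    spans = []
    current = None
    for (start, end), flagged in zip(offsets, flags):
        if flagged:
            if current is None:
                current = [start, end]   # open a new span
            else:
                current[1] = end         # extend the open span
        elif current is not None:
            spans.append(current)        # close the span at the first clean token
            current = None
    if current is not None:
        spans.append(current)
    return [{"start": s, "end": e, "text": answer[s:e]} for s, e in spans]


answer = "Paris has 9 million people"
offsets = whitespace_offsets(answer)
# Hand-written flags: pretend a model marked "9 million" as unsupported by the context.
flags = [False, False, True, True, False]
spans = flagged_spans(answer, offsets, flags)
```

Returning character offsets rather than a single yes/no score lets a RAG application highlight or strip only the unsupported part of an answer.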
Ádám Kovács
3w
Always great to see LettuceDetect gaining more adoption! I think the biggest challenge in the hallucination detection space right now is dataset diversity. Most open benchmarks are plain text only, while real pipelines are full of tables, markdown, and generated code. Stay tuned, because we have several releases in progress: structured data benchmarks and hallucination detection for code generation agents (Claude, Copilot, etc.). So expect LettuceDetect to get big updates 😊 Reach out if you'd like to collaborate or learn more!
Abhishek Arya
2w
🧠✨ Reducing Hallucinations in RAG with LettuceDetect

One of the biggest challenges in Retrieval-Augmented Generation (RAG) isn't retrieval… It's hallucination.

Even with the right context, LLMs can still:
❌ Generate confident but incorrect answers
❌ Mix facts with assumptions
❌ Drift away from source truth

So how do we fix this?

🚀 Enter: LettuceDetect
Instead of blindly trusting generated responses, LettuceDetect adds a validation layer that ensures:
✔️ Responses stay grounded in retrieved context
✔️ Inconsistencies are detected early
✔️ Output aligns with factual evidence

💡 Think of it as: Turning RAG from a storyteller into a truth-teller.

⚙️ What changes with this approach?
• Better factual accuracy
• More reliable AI systems
• Production-ready trust (not just demo magic)

🔥 In a world where LLMs can "sound right," we need systems that are actually right.

👉 RAG + LettuceDetect = Reliability > Creativity

#AI #GenerativeAI #RAG #LLM #MachineLearning #AIEngineering #DataScience #Innovation
Botsz HUANG
3w Edited
Happy to share 🛠️ my_mlir_track#6 Structured Control Flow (SCF) - for (Repo link: https://lnkd.in/gPE4Mrmk)!

✅ Induction Variables & Bounds: How to manage index types for loop control.
✅ Loop-Carried Variables: Using iter_args and scf.yield to maintain state across iterations.
✅ Python Integration: Compiling the logic into a shared library to bridge the gap between high-level orchestration and hardware-level execution.

By utilizing this pipeline, we can generate highly optimized logic that remains easily accessible via a clean Python interface.

(NOTE) Since I'll be moving toward an assembly-like style in the next stage, I'm concluding the SCF series with this brief for-loop entry. The next post will also be short; as the code becomes less "human-readable," keeping the content minimal should make it easier to digest.

#LearningInPublic #Compiler #MLIR #SCF #for #C++ #HighPerformanceComputing
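For readers who have not used loop-carried variables, the semantics of scf.for with iter_args and scf.yield can be modeled in a few lines of Python: the loop body receives the induction variable plus the current carried values and returns (yields) the values for the next iteration. This is a conceptual model only, with the hypothetical helper name `scf_for`; it is not MLIR's Python bindings or the code from the linked repo.

```python
def scf_for(lower, upper, step, iter_args, body):
    """Model of an scf.for loop: returns the final loop-carried values."""
    if step <= 0:
        raise ValueError("step must be positive")
    carried = tuple(iter_args)
    # The upper bound is exclusive, as in scf.for.
    for iv in range(lower, upper, step):
        # body plays the role of the loop region; its return value is the scf.yield.
        carried = tuple(body(iv, *carried))
    return carried


# Sum the integers 0..9 with one loop-carried accumulator.
(total,) = scf_for(0, 10, 1, (0,), lambda i, acc: (acc + i,))

# Two carried values: iterate the Fibonacci recurrence five times.
fib_pair = scf_for(0, 5, 1, (0, 1), lambda i, a, b: (b, a + b))
```

When the loop runs zero times, the initial iter_args are returned unchanged, which mirrors how the results of a zero-trip scf.for are its initial values.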
Ryan Codrai
2w Edited
Turbovec is now available on PyPi 🐍 and Crates.io 📦 Turbovec is a vector index built on Google's TurboQuant algorithm, written in Rust with Python bindings. Turbovec has identical or better speed, compression and recall compared to Faiss while also being data-oblivious. Because adding new vectors doesn't require re-indexing, Turbovec is dramatically simpler to operate in production. → pip install turbovec → cargo add turbovec Check out the open-source repo: https://lnkd.in/e5M4dVRk #RAG #LLM #OpenSource #Gemma4
