🎉 New DPK Release: Version 1.1.7

We're excited to share the latest release of Data Prep Kit, packed with fresh transform capabilities, performance boosts, and expanded compatibility. Here's a look at what's new in v1.1.7:

⚙️ Enhancements
- Python 3.13 Compatibility - Expanded version compatibility to support Python 3.13.
- Faster Installation with uv - Migrated the repo to use uv, significantly speeding up environment setup and dependency installation.
- Rich Logging - A new Rich-based log handler offers cleaner, colorized, and more structured console output.

🔁 Transform Updates
- Folder-to-Parquet Transform - A brand new transform that converts an entire folder of files into a unified Parquet dataset, making it easier to batch-process large document collections.
- Text Encoder Upgrade - The Text Encoder now uses LanceDB for improved vector storage and retrieval performance.
- Spark Support for docling2parquet and doc_quality - Both transforms now support Spark execution, enabling scalable distributed processing.

📄 Explore the full release notes: 👉 https://lnkd.in/eZufxzv4
⭐ Support the project by starring the repo and following our updates!

#DataPrepKit #OpenSource #Python #MLOps #RAG #LLM #DataEngineering #AItools
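For readers who haven't used Rich-style logging before, here is a minimal, generic sketch built on Rich's own `RichHandler`. It illustrates the kind of colorized, structured console output involved; it is not DPK's actual handler.

```python
# Generic Rich logging setup (illustrative only, not DPK's handler).
import logging
from rich.logging import RichHandler

logging.basicConfig(
    level="INFO",
    format="%(message)s",
    datefmt="[%X]",
    handlers=[RichHandler()],  # colorized levels, timestamps, and tracebacks
)

logging.getLogger("demo").info("Hello from a Rich-formatted logger")
```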
More Relevant Posts
-
Today I built and deployed a basic ML API using Python and FastAPI. The focus was not just training a model, but understanding how ML works inside real backend systems.

I implemented:
- Request–response flow
- Input validation
- Model loading at startup
- Error handling
- SQLite database logging
- Clean architecture (API → Service → DB → Model)
- Deployment to a public server

This helped me understand that ML in production is more about system design and integration than just model accuracy.

You can check out the project here: https://lnkd.in/gn_S46VY

Small step, but meaningful progress.

#MachineLearning #Backend #FastAPI #LearningInPublic
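A minimal sketch of the pattern the post describes, assuming a scikit-learn-style model saved as "model.joblib"; the file names and table schema are placeholders, not the author's actual code.

```python
# Sketch: load the model once at startup, validate input with Pydantic,
# and log each prediction to SQLite. All names here are illustrative.
import sqlite3
from contextlib import asynccontextmanager

import joblib
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

state = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model once at startup instead of on every request.
    state["model"] = joblib.load("model.joblib")
    with sqlite3.connect("predictions.db") as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS logs (input TEXT, output TEXT)")
    yield
    state.clear()

app = FastAPI(lifespan=lifespan)

class PredictRequest(BaseModel):
    features: list[float]  # input validation happens at the boundary

@app.post("/predict")
def predict(req: PredictRequest):
    model = state.get("model")
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")
    prediction = float(model.predict([req.features])[0])  # assumes numeric output
    with sqlite3.connect("predictions.db") as conn:  # log the call
        conn.execute("INSERT INTO logs VALUES (?, ?)",
                     (str(req.features), str(prediction)))
    return {"prediction": prediction}
```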
-
🚀 Day 5/100 — Working with Persistent Storage 🧠

“Persistence transforms execution into continuity.” Systems become meaningful when they retain and retrieve information reliably.

Today, I learned how Python interacts with files to store and retrieve persistent data. ⚙️

🔧 Today’s focus areas:
📂 File Reading — Accessing stored data
📝 File Writing — Persisting new information
🔄 File Modes — Managing read and write operations
🎯 Data Persistence — Ensuring continuity across executions

The objective was to enable programs to maintain state beyond runtime.

✅ Day 5 complete: Persistent data handling established.
▶️ Day 6: Strengthening reliability through exception handling.

Step by step. The system evolves. 🏗️

#Python #BackendDevelopment #100DaysOfCode #SoftwareEngineering
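A small illustration of the file modes covered above; the file name and keys are made up for the example.

```python
# "w" creates or overwrites, "a" appends, "r" reads; the context manager
# guarantees the file is closed, so state survives across runs.
with open("state.txt", "w") as f:   # write: persist fresh state
    f.write("run_count=1\n")

with open("state.txt", "a") as f:   # append: add new information
    f.write("last_status=ok\n")

with open("state.txt", "r") as f:   # read: retrieve stored data
    for line in f:
        key, value = line.strip().split("=")
        print(key, value)
```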
-
Polars is quietly becoming one of the most exciting tools in the modern Python data stack. Most of us have hit the limits of traditional DataFrame workflows: slow group‑bys, memory issues with medium‑large datasets, and complex pipelines that are hard to optimize. Polars tackles all of that head‑on with a fresh design. Docs: https://docs.pola.rs/
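A quick taste of the lazy, optimizer-driven style Polars is known for; "sales.csv" and the column names are hypothetical.

```python
import polars as pl

result = (
    pl.scan_csv("sales.csv")                  # lazy: nothing is read yet
    .filter(pl.col("amount") > 0)
    .group_by("region")
    .agg(pl.col("amount").sum().alias("total"))
    .collect()                                # optimizer runs the whole plan at once
)
print(result)
```

Because the pipeline is lazy, Polars can push the filter down and parallelize the group-by before any data is materialized, which is where much of the speedup over eager DataFrame workflows comes from.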
-
🔥 Day 4 – Pandas Selection & Production-Style Filtering

Today I focused on strengthening my data selection and filtering skills using Pandas — but doing it the right way. Instead of just filtering rows, I practiced production-style defensive programming.

Here’s what I worked on:
✅ Column & row selection using .loc and .iloc
✅ Boolean filtering with multiple conditions
✅ Cleaning messy CSV column names
✅ Safe numeric conversion using pd.to_numeric()
✅ Writing a custom function to parse "HH:MM" delay values into proper Timedelta objects
✅ Handling invalid values using pd.NaT
✅ Preventing runtime errors with defensive filtering logic

Built a workflow that:
• Filters orders with Miles ≤ 30
• Converts delay strings into real time objects
• Filters delays ≤ 30 minutes
• Ensures no invalid comparisons occur

Real-world data is messy. Learning how to clean, validate, and safely filter it is what turns simple analysis into production-ready logic.

📂 GitHub Repository: https://lnkd.in/gNWeQ5KE

On to Day 5 🚀

#Python #Pandas #DataEngineering #Analytics #LearningInPublic #100DaysOfCode
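A sketch of the defensive parsing and filtering steps listed above; the column names ("Miles", "Delay") and file name are assumptions based on the post, not the repository's actual code.

```python
import pandas as pd

df = pd.read_csv("orders.csv")
df.columns = df.columns.str.strip()                        # clean messy headers

df["Miles"] = pd.to_numeric(df["Miles"], errors="coerce")  # invalid -> NaN

def parse_delay(value):
    """Turn an 'HH:MM' string into a Timedelta, or NaT if malformed."""
    try:
        hours, minutes = str(value).split(":")
        return pd.Timedelta(hours=int(hours), minutes=int(minutes))
    except (ValueError, AttributeError):
        return pd.NaT

df["Delay"] = df["Delay"].apply(parse_delay)

# Defensive filter: NaN/NaT rows simply evaluate False instead of raising.
mask = (df["Miles"] <= 30) & (df["Delay"] <= pd.Timedelta(minutes=30))
filtered = df[mask]
```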
-
Real-world data is messy. Your models shouldn't be.

CSV files, external APIs, user input - they all deliver junk like "N/A", "unknown", or empty strings. Technically valid. Logically useless.

Instead of scattering cleanup logic across your codebase, normalize once - at the model level. Downstream code stops worrying about edge cases. Business logic gets simpler.

That's the real value of structured models: a hard boundary between messy input and reliable internal state.

This pattern - and dozens like it - is covered in Practical Pydantic. https://lnkd.in/eaASBPzP

Clean code starts with clean data.

#Python #Pydantic #CleanData
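A minimal sketch of the "normalize once at the model boundary" idea, using a Pydantic v2 before-validator; the field names and junk values are illustrative, not taken from the book.

```python
from pydantic import BaseModel, field_validator

JUNK = {"", "n/a", "na", "unknown", "null", "none"}

class Customer(BaseModel):
    name: str
    phone: str | None = None

    @field_validator("phone", mode="before")
    @classmethod
    def junk_to_none(cls, v):
        # Map "technically valid, logically useless" strings to a real None.
        if isinstance(v, str) and v.strip().lower() in JUNK:
            return None
        return v

print(Customer(name="Ada", phone="N/A"))  # phone=None everywhere downstream
```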
-
Unpopular opinion: Is Jupyter *really* the best tool for *everything* in your data science workflow? 🤔

While notebooks are great for exploration, let's talk about building robust, maintainable projects. I'm advocating for a move towards:

* Modular Code (.py files): For better organization and reusability.
* Git Versioning: Because "final_version_v2_FINAL.ipynb" gives me nightmares.
* Unit Testing: Catching bugs before they become full-blown crises.

Are we over-relying on notebooks? What are your thoughts on moving towards more structured approaches in data science? Share your experiences in the comments! 👇

#DataScience #MachineLearning #Python #SoftwareEngineering #CodeQuality
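As a hypothetical illustration of the structure being advocated: logic pulled out of a notebook cell into a plain module, with a pytest test alongside it. The function and file names are made up.

```python
# src/cleaning.py
def drop_outliers(values, limit):
    """Pure function extracted from a notebook cell so it can be tested."""
    return [v for v in values if abs(v) <= limit]


# tests/test_cleaning.py (run with: pytest)
def test_drop_outliers():
    assert drop_outliers([1, 100, -3], limit=10) == [1, -3]
```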
-
Built a real-time network traffic dashboard called NetAnlyzer pro.

The dashboard monitors live internet traffic on a network and breaks it down visually — showing what type of data is moving, which devices are the most active, how traffic behaves over time, and automatically flagging anything that looks suspicious or unusual. It's essentially a live window into what's happening inside a network at any given second. The kind of tool that data and security teams use daily to keep systems running clean.

Tools used and what each one did:
🐍 Python — the core language that runs everything
🐼 Pandas — organised and processed the live network data into clean tables
📊 Plotly — turned that data into the interactive charts and graphs
⚡ Dash — built the live web dashboard that updates every second
🖥️ psutil — pulled real-time network stats directly from the system

Still learning.

#DataAnalytics #Python #NetworkSecurity
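A minimal sketch of the live-update loop such a dashboard needs (not the NetAnlyzer pro source): psutil samples the system's network counters and a Dash Interval refreshes the chart every second.

```python
import psutil
import plotly.graph_objects as go
from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)
samples = []  # bytes received, one sample per tick

app.layout = html.Div([
    dcc.Graph(id="traffic"),
    dcc.Interval(id="tick", interval=1000),  # fire every second
])

@app.callback(Output("traffic", "figure"), Input("tick", "n_intervals"))
def update(_):
    counters = psutil.net_io_counters()      # real-time stats from the system
    samples.append(counters.bytes_recv)
    return go.Figure(go.Scatter(y=samples, mode="lines", name="bytes received"))

if __name__ == "__main__":
    app.run(debug=True)
```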
-
Knwler is now a proper Python package and available via pipx. No more cloning the repo to get started: you can generate documents with just two CLI commands.

In addition:
- Complete refactor: the monolithic script became a well-structured package with a clean CLI, making it easier to extend and integrate.
- Graph database integrations: import scripts for Neo4j, SurrealDB, and HelixDB are now included out of the box. Your extracted graph can land directly in your database of choice.
- Stability fixes: template rendering issues resolved, packaging corrected to ensure all assets ship with the wheel.

If you're working with unstructured text and want to turn it into structured knowledge — entities, relationships, communities — knwler does that in a few lines of Python.

https://knwler.com

Next, I will create exports for RDF (Neptune and Qlever).

#knowledgegraph #graphdb
-
HoloViz MCP now ships a CLI.

The same tools that power AI assistants through the Model Context Protocol — semantic documentation search, component introspection, best-practice skills — are now available directly in your terminal.

The namespaces mirror Python imports: `pn`, `hv`, `hvplot`. If you know `import panel as pn`, you already know the CLI.

Every command supports three output formats:
- `--output pretty` — Rich tables for terminal use (default)
- `--output markdown` — for piping into LLMs or documentation
- `--output json` — for scripting and automation

$ pip install holoviz-mcp