DPK Release 1.1.7: Python 3.13 Support & Performance Boosts

🎉 New DPK Release: Version 1.1.7

We're excited to share the latest release of Data Prep Kit, with new transform capabilities, performance improvements, and expanded compatibility. Here's a look at what's new in v1.1.7:

⚙️ Enhancements

Python 3.13 Compatibility - DPK now supports Python 3.13.

Faster Installation with uv - The repo now uses uv, significantly speeding up environment setup and dependency installation.

Rich Logging - A new Rich-based log handler offers cleaner, colorized, and more structured console output.

🔁 Transform Updates

Folder-to-Parquet Transform - A brand-new transform that converts an entire folder of files into a unified Parquet dataset, making it easier to batch-process large document collections.

Text Encoder Upgrade - The Text Encoder now uses LanceDB for improved vector storage and retrieval performance.

Spark Support for docling2parquet and doc_quality - Both the doc_quality and docling2parquet transforms now support Spark execution, enabling scalable distributed processing.

📄 Explore the full release notes: 👉 https://lnkd.in/eZufxzv4

⭐ Support the project by starring the repo and following our updates!

#DataPrepKit #OpenSource #Python #MLOps #RAG #LLM #DataEngineering #AItools
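For readers curious what a Rich-based log handler looks like in practice, here is a minimal sketch. It is not DPK's actual handler: the helper name `build_logger` and the logger name `dpk_demo` are hypothetical, and the only library call assumed is `RichHandler` from the third-party `rich` package, with a plain standard-library fallback if `rich` is not installed.

```python
import logging

try:
    # RichHandler ships with the third-party `rich` package.
    from rich.logging import RichHandler
except ImportError:  # rich not installed: fall back to plain stdlib output
    RichHandler = None


def build_logger(name: str = "dpk_demo", level: int = logging.INFO) -> logging.Logger:
    """Return a logger with exactly one console handler (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep records from also reaching the root logger

    if RichHandler is not None:
        # Rich renders the time, level, and colors itself, so the
        # formatter only needs to pass the message through.
        handler = RichHandler(rich_tracebacks=True, show_path=False)
        handler.setFormatter(logging.Formatter("%(message)s"))
    else:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )

    logger.addHandler(handler)
    return logger


log = build_logger()
log.info("transform started")
```

Clearing existing handlers and disabling propagation keeps each message from printing twice when the helper is called more than once, which is a common surprise when swapping console handlers.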
