Just came across something interesting — Google dropped a new library called LangExtract. It's a Python tool that takes unstructured documents and turns them into structured data with just a few lines of code. No complicated setup.

What I found genuinely useful:

- It maps every extracted piece back to where it came from in the source document
- Keeps outputs consistent with defined schemas
- Handles long documents using parallel processing
- Generates HTML visualizations so you can actually see what's happening
- Works with Gemini, Ollama, and even open-source models
- Doesn't feel tied to one specific use case — pretty flexible

Also, it's open source: no licensing fees or usage limits on the library itself (hosted models like Gemini still need their own API key; local models via Ollama don't). Feels like something that could simplify a lot of LLM and document processing workflows.

Here's the link if you want to check it out: https://lnkd.in/gNKBKNwx

#AI #Python #OpenSource #LLM #GenAI
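For a sense of what "a few lines of code" looks like, here's a minimal sketch adapted from the project's README; the medication task and model ID are illustrative, and parameter names may shift between releases:

```python
# pip install langextract
# Minimal sketch adapted from the LangExtract README; task and model ID
# are illustrative. Check the repo for the current API.
import textwrap
import langextract as lx

prompt = textwrap.dedent("""\
    Extract medication names and dosages in order of appearance.
    Use exact text from the document; do not paraphrase.""")

# One worked example teaches the schema and grounds extractions in source text.
examples = [
    lx.data.ExampleData(
        text="Patient was given 250 mg amoxicillin twice daily.",
        extractions=[
            lx.data.Extraction(
                extraction_class="medication",
                extraction_text="amoxicillin",
                attributes={"dosage": "250 mg"},
            ),
        ],
    ),
]

result = lx.extract(
    text_or_documents="Take 500 mg ibuprofen as needed for pain.",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",  # needs a Gemini API key; Ollama models don't
)

for extraction in result.extractions:
    print(extraction.extraction_class, extraction.extraction_text)
```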
Shipped: Python SDK for tag-graph agent memory.

For a year I've been chasing one problem — how do you give an LLM agent memory that's bounded, predictable, and doesn't blow your token bill?

- Vector DBs → fuzzy, impossible to budget.
- Raw history → 5-turn context overflow.
- Summarize-and-re-inject → silently drops facts the agent needs three turns later.

So we built MME — a bounded tag-graph memory engine. Every memory carries tags; retrieval starts from the current scope, propagates to neighbors with bounded fanout, and ranks by graph proximity. Deterministic, token-budgeted, sub-50ms at 100k items.

Today the Python SDK is live:
→ pip install railtech-mme
→ Native LangChain + LangGraph tool integrations
→ Online learning via feedback loops
→ Open source

Wrote up the full design rationale, tradeoffs vs. vector search, and the SDK surface area here: https://lnkd.in/eNR5n_iq

Honest beat — this is launch day. If you're building LLM agents in Python and "my agent doesn't remember things well" feels familiar, I'd love to hear what's clunky about the API.

#AI #Python #LangChain #LLM #AgentMemory #BuildInPublic #OpenSource
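The write-up has the real API. As a quick illustration of the tag-graph idea itself (scope tags, one-hop expansion with capped fanout, proximity ranking), here's a toy sketch; all names are hypothetical and this is not the railtech-mme SDK:

```python
# Hypothetical illustration of bounded tag-graph retrieval -- NOT the
# railtech-mme API. Memories are indexed by tags; tags link to neighbor tags.
memories = {
    "m1": {"text": "User prefers metric units", "tags": {"user", "prefs"}},
    "m2": {"text": "Order #88 shipped Tuesday", "tags": {"orders"}},
    "m3": {"text": "User is in Berlin", "tags": {"user", "location"}},
}
tag_neighbors = {"user": {"prefs", "location"}, "prefs": {"user"},
                 "location": {"user"}, "orders": set()}

def retrieve(scope_tags, max_fanout=2, budget=2):
    """Expand scope tags one hop (bounded fanout), rank memories by
    graph proximity (hop distance), return at most `budget` items."""
    distance = {tag: 0 for tag in scope_tags}
    for tag in list(scope_tags):
        for neighbor in sorted(tag_neighbors.get(tag, ()))[:max_fanout]:
            distance.setdefault(neighbor, 1)  # one hop from the scope

    scored = []
    for mem_id, mem in memories.items():
        hops = [distance[t] for t in mem["tags"] if t in distance]
        if hops:
            scored.append((min(hops), mem_id))  # closer tags rank higher
    return [memories[m]["text"] for _, m in sorted(scored)[:budget]]

print(retrieve({"user"}))  # deterministic, token-budgetable result set
```

The point of the structure: retrieval cost and result size are capped by `max_fanout` and `budget`, so the context you re-inject is predictable, unlike top-k vector search over an unbounded store.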

---
AI Beyond the Hype | Part 8: Vector Databases

"What is Python used for?"
"Is python dangerous?"

Same word. Completely different meaning.
👉 In one case → Python = programming language 🧑💻
👉 In another → python = reptile 🐍

We can't store every possible variation or phrasing, and traditional search fails here because it works on exact match, not meaning. This is where semantic search (search based on meaning) comes in — and that's where vector databases play a key role.

## 🧠 What is a Vector Database?

A vector DB stores data as embeddings (vectors of numbers) instead of plain text, so it can search based on meaning.

## 🔢 How data is generated and stored

Text → tokens → embeddings. Example:

"Python is used for backend development" → [0.12, -0.45, 0.78, …]
"Python is a dangerous reptile" → [-0.33, 0.91, -0.12, …]

These numbers capture meaning, not just words.

## 🔍 How search happens

The user query is also turned into an embedding:

"Python coding" → vector
"Is python poisonous" → vector

The system then finds stored vectors that are closest in meaning, not exact matches. This is semantic search.

## ⚡ How search is optimized

Searching millions of vectors directly is slow, so vector DBs use indexing (ANN – Approximate Nearest Neighbor search) and sometimes hashing/partitioning to find the nearest vectors quickly.

## 🧩 How prompt-based retrieval works

1. Query → embedding
2. Retrieve relevant chunks
3. Add to prompt
4. LLM generates answer

→ This is how RAG works internally.

## 🚨 Reality check

A vector DB doesn't understand meaning. It just finds patterns that are mathematically close.

## ⚠️ Challenges

- Similar ≠ correct
- Bad embeddings → bad retrieval
- Needs tuning (top-k, thresholds)
- Scaling & latency trade-offs

## 💡 Takeaway

👉 "A vector DB doesn't search words — it searches meaning."

Funny how things work — what felt pointless in school is now the backbone of AI systems.
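To make the flow concrete, here's a toy semantic-search sketch using the sentence-transformers library; the model name is an illustrative choice, and the two documents echo the Python example above:

```python
# pip install sentence-transformers
# Toy semantic search: embed documents and queries, rank by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

docs = [
    "Python is used for backend development",
    "Python is a dangerous reptile",
]
doc_vectors = model.encode(docs)  # one embedding per document

for query in ["Python coding", "Is python poisonous"]:
    scores = util.cos_sim(model.encode(query), doc_vectors)[0]  # cosine similarity
    best = int(scores.argmax())  # nearest neighbor by meaning, not exact words
    print(f"{query!r} -> {docs[best]!r} (similarity {float(scores[best]):.2f})")
```

Each query lands on the document with the matching sense of "python", even though neither query shares keywords with its match. A production vector DB adds ANN indexing on top of exactly this similarity computation.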

---
Workflow Experiment Tracking using pycaret
#machinelearning #datascience #workflowexperimenttracking #pycaret

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle and makes you more productive: compared with other open-source machine learning libraries, PyCaret can replace hundreds of lines of code with just a few, aiming to reduce the hypothesis-to-insight cycle time in an ML experiment. It is essentially a Python wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, and Ray.

PyCaret for Citizen Data Scientists

The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are "power users" who can perform both simple and moderately sophisticated analytical tasks that would previously have required more expertise. Seasoned data scientists are often difficult to find and expensive to hire, but citizen data scientists can be an effective way to mitigate this gap and address data science challenges in a business setting.

PyCaret deployment capabilities

PyCaret is a deployment-ready library: all the steps performed in an ML experiment can be reproduced through a pipeline that stays consistent from experiment to production. A pipeline can be saved in a binary file format that is transferable across environments. PyCaret also integrates seamlessly with environments supporting Python, such as Microsoft Power BI, Tableau, Alteryx, and KNIME, letting users of these BI platforms add a layer of machine learning to their existing workflows with ease.

Ideal for:
- Experienced data scientists who want to increase productivity.
- Citizen data scientists who prefer a low-code machine learning solution.
- Data science professionals who want to build rapid prototypes.
- Data science and machine learning students and enthusiasts.

https://lnkd.in/g2b_5wTd
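To see the low-code claim in action, here's a minimal sketch following PyCaret's classification quickstart; the sample dataset is illustrative, and exact defaults vary by version:

```python
# pip install pycaret
# Minimal sketch of PyCaret's low-code workflow, per its quickstart docs.
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models, save_model

data = get_data("juice")                             # built-in sample dataset
s = setup(data, target="Purchase", session_id=123)   # one call preps the pipeline
best = compare_models()                              # trains and ranks many models
save_model(best, "best_pipeline")                    # portable, reproducible pipeline
```

Those four calls cover ingestion, preprocessing, model selection, and export; the saved pipeline is the binary artifact the post mentions as transferable across environments.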

---
At ECIR 2026, we presented #OmniRec, a new open-source Python library for unified RecSys experiments with popular libraries like #Lenskit, #RecBole, #RecPack, and #Elliot. Proudly hosted by Recommender-Systems.com (RS_c): https://lnkd.in/eSSmvUEb, joint work of Lukas Wegmeth, Moritz Baumgart, Philipp Meister, Bela Gipp, Joeran Beel.

OmniRec acts as a central hub for the entire #recsys experimentation pipeline, from loading data to final evaluation. Its modular architecture is designed for transparency and ease of use. It offers:

-- Access to 230+ datasets: The library provides standardized, registered access to a vast collection of datasets through a single interface.
-- Write once, run anywhere: Users define their preprocessing pipeline, including subsampling, filtering, and splitting strategies, just once. This pipeline then executes consistently across multiple integrated frameworks.
-- Seamless integration: The initial release features custom adapters for leading frameworks: RecPack, RecBole, Lenskit, and Elliot.
-- No more dependency hell: OmniRec automatically manages isolated virtual environments for each library using the uv tool, preventing version conflicts between research frameworks.
-- Standardized evaluation: A centralized Evaluator module ensures that metrics like nDCG, Recall, and RMSE are computed identically, regardless of which underlying framework trained the model.

---
𝗡𝘂𝗺𝗣𝘆 𝗔𝗿𝗿𝗮𝘆𝘀 𝗩𝘀 𝗣𝘆𝘁𝗵𝗼𝗻 𝗟𝗶𝘀𝘁𝘀

You use NumPy arrays often, and you might wonder why you need them. Python lists hold numbers and support indexing too.

Speed is the main reason. Multiplying 5 million numbers shows a huge gap: the Python list takes 0.83 seconds, the NumPy array takes 0.0089 seconds — roughly 93 times faster. This gap grows with more data.

Memory layout is the secret. Python lists store references to objects, and those objects are scattered across memory; to multiply a list, Python visits each object one by one. NumPy arrays store raw numbers of a single type in one contiguous block, so NumPy can process them in tight, optimized C loops.

Fixed types also save memory:
- int8 uses 1 byte per number.
- int64 uses 8 bytes per number.

Using int8 means one-eighth the memory of int64, which helps you fit large datasets into RAM. Deep learning models use float32 for the same reason — to save GPU memory.

Useful NumPy tools:
- linspace: creates evenly spaced numbers.
- Fancy indexing: picks specific rows without loops.
- Boolean masking: filters data in one line.
- Broadcasting: adds arrays of different shapes.

Essential functions:
- sum, mean, and std: fast statistics.
- argsort: finds the rank of items.
- vstack and hstack: combine data matrices.

Now you know NumPy. Next is Pandas, which handles labels and messy real-world data. (A quick timing sketch follows below.)

Source: https://lnkd.in/gVMVwUyC
Optional learning community: https://t.me/GyaanSetuAi
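Here's the timing sketch referenced above; absolute numbers depend on hardware and Python version, so treat the ~93x figure as indicative:

```python
# Quick sketch of the list-vs-array timing claim. Absolute numbers vary
# by machine; the ratio should land in the same ballpark.
import time
import numpy as np

n = 5_000_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

t0 = time.perf_counter()
doubled_list = [x * 2 for x in py_list]   # Python touches every object
t1 = time.perf_counter()
doubled_array = np_array * 2              # one vectorized C loop
t2 = time.perf_counter()

print(f"list:  {t1 - t0:.4f}s")
print(f"numpy: {t2 - t1:.4f}s")

# The dtype point: same 100 numbers, 8x memory difference.
print(np.arange(100, dtype=np.int8).nbytes,   # 100 bytes
      np.arange(100, dtype=np.int64).nbytes)  # 800 bytes
```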

---
Day 36: Polymorphism — One Interface, Many Forms 🎭

Polymorphism allows us to write code that doesn't care exactly what object it is talking to, as long as that object knows how to perform the requested action.

1. Function Polymorphism

In Python, many built-in functions are polymorphic. They work on different data types because those types all follow a specific "protocol."

The len() example:
- len("Hello") returns 5 (counts characters).
- len([1, 2, 3]) returns 3 (counts items).
- len({"a": 1, "b": 2}) returns 2 (counts keys).

💡 The Engineering Lens: You don't need len_string(), len_list(), and len_dict(). One function handles them all, which makes your code much cleaner and easier to maintain.

2. Operator Polymorphism

The same operator can behave differently depending on the objects it is acting upon. This is also called operator overloading.

The + operator:
- 5 + 5 results in 10 (addition).
- "Hello " + "World" results in "Hello World" (concatenation).
- [1, 2] + [3, 4] results in [1, 2, 3, 4] (list merging).

💡 The Engineering Lens: Python looks at the "dunder methods" (like __add__) inside the class to decide what + should do. You can even make the + operator work for your own custom classes — see the sketch after this post.

3. Class Polymorphism (Duck Typing)

This is the most powerful version. In Python, we follow the rule: "If it walks like a duck and quacks like a duck, it's a duck." If two different classes have a method with the same name, you can loop through them and call that method without checking their type.

```python
class Cat:
    def speak(self):
        return "Meow"

class Dog:
    def speak(self):
        return "Woof"

# A polymorphic loop — Python doesn't care if it's a Cat or a Dog!
for animal in [Cat(), Dog()]:
    print(animal.speak())
```

4. Polymorphism with Inheritance

Often, a parent class defines a standard and the child classes provide their own version. Example: a Shape parent class has a draw() method; Circle, Square, and Triangle all inherit from Shape, but each codes its draw() method differently.

💡 The Engineering Lens: This allows you to create a list of shapes and tell them all to draw(). You don't need to know which is which; each handles its own logic.

#Python #OOP #Polymorphism #SoftwareEngineering #CleanCode #DuckTyping #ProgrammingTips #LearnToCode #TechCommunity #PythonDev
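As promised above, a minimal sketch of making + work for a custom class; the Money class is purely illustrative:

```python
# Hypothetical Money class showing operator overloading via __add__.
class Money:
    def __init__(self, amount, currency="USD"):
        self.amount = amount
        self.currency = currency

    def __add__(self, other):
        # Python translates `a + b` into `a.__add__(b)`.
        if self.currency != other.currency:
            raise ValueError("Cannot add different currencies")
        return Money(self.amount + other.amount, self.currency)

    def __repr__(self):
        return f"Money({self.amount}, {self.currency!r})"

print(Money(5) + Money(7))  # Money(12, 'USD')
```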

---
🐍 If FastAPI changed how you build Python APIs, PydanticAI is doing the same thing for AI agents.

Built by the Pydantic team — the library with 10 billion downloads across Python projects — **PydanticAI** reached stable 1.x in late 2025 and has since hit 16,000+ GitHub stars. The design philosophy is the same one that made FastAPI dominant: type safety as the default, not an afterthought.

In practice, this means every agent is generic over its **dependency type** and **output type**:

```python
from pydantic import BaseModel
from pydantic_ai import Agent

class OrderSummary(BaseModel):
    order_id: str
    total: float
    items: list[str]

agent = Agent(
    'anthropic:claude-sonnet-4-6',
    output_type=OrderSummary,  # structured, validated output (1.x parameter name)
    system_prompt='Summarize the order from the message.',
)

# run_sync avoids needing an event loop here; use `await agent.run(...)` in async code.
result = agent.run_sync("Order #4421: 2x shirt, 1x shoes, total $148")
print(result.output.total)  # 148.0 — fully typed, no parsing, no guessing
```

Runtime errors from malformed LLM output move to **write-time**, with your IDE catching them before you deploy. That alone saves hours of debugging in production.

What makes PydanticAI stand out architecturally in 2026:

- **MCP-native**: expose your agents as MCP servers or consume external tools — same protocol as Claude, NVIDIA NemoClaw, and the broader ecosystem
- **Streaming structured outputs**: validate progressively as the model generates, not just at the end
- **Graph-based workflows**: durable execution across failures, built-in human-in-the-loop
- **Logfire integration**: OpenTelemetry-based observability out of the box

And the timing is right: Python 3.14 just landed on AWS Lambda, bringing **free-threaded execution** (PEP 779 — the GIL is officially optional). For I/O-bound agent workloads running parallel tool calls, this is the concurrency upgrade the ecosystem has waited years for.

Are you building AI agents in Python? What's blocking you from using PydanticAI in production? 👇

Source(s):
https://ai.pydantic.dev/
https://lnkd.in/dfHvWJFf
https://lnkd.in/d27iyycj
https://lnkd.in/dTiG-WmY
https://lnkd.in/di-Dk3Xw

#Python #PydanticAI #AIAgents #LLM #TypeSafety #SoftwareEngineering #AIEngineering #WebDev

---
🔧 Building AI Agents from Scratch – Part 10: AI Agent Python Library Packaging is live!

In this post, I explore how agents can be packaged and shared like any other Python library:

✨ From Scripts to Libraries – agents move beyond ad-hoc scripts into structured, reusable packages.
✨ Packaging with setup.py / pyproject.toml – standard Python packaging ensures agents can be installed via pip (a minimal example follows below).
✨ Wheel Files (.whl) – agents are compiled into distributable wheels, making installation fast and dependency-safe.
✨ Distribution via Git – teams can version, share, and collaborate on agents across repositories.
✨ FastAPI Discovery Integration – packaged agents can register themselves automatically, enabling plug-and-play orchestration.

This series continues to be based entirely on my work experience. It's not about frameworks — it's about learning the fundamentals and understanding what they're built on.

👉 Read Part 10: https://lnkd.in/gAsxewjw

If you're curious about how packaging transforms agents into modular, reusable components, I'd love for you to follow along.

#AI #Agents #Python #Packaging #AgenticAI #LearningByDoing
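For reference, a minimal pyproject.toml along the lines the post describes; the package name and dependencies are hypothetical placeholders, not taken from the linked article:

```toml
# Hypothetical minimal pyproject.toml for packaging an agent as a
# pip-installable library; names and dependencies are placeholders.
[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "my-agent"        # hypothetical package name
version = "0.1.0"
description = "A reusable AI agent packaged as a Python library"
requires-python = ">=3.10"
dependencies = [
    "fastapi",           # for the discovery/registration endpoint
    "pydantic",
]
```

With that in place, `python -m build` produces a wheel (e.g. dist/my_agent-0.1.0-py3-none-any.whl) that pip can install directly, and `pip install git+https://...` covers the Git-distribution path the post mentions.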

---
Hyperparameter Optimization Machine Learning using opytimizer
#machinelearning #datascience #hyperparameteroptimization #opytimizer

Opytimizer is a nature-inspired Python library of meta-heuristic optimization algorithms.

Welcome to Opytimizer. Did you ever reach a bottleneck in your computational experiments? Are you tired of selecting suitable parameters for a chosen technique? If yes, Opytimizer is the real deal! This package provides an easy-to-go implementation of meta-heuristic optimizations, supporting both single- and multi-objective problems. From agents to search space, from internal functions to external communication, from single to multiple objectives, we will foster all research related to optimizing stuff.

Use Opytimizer if you need a library or wish to:
- Create your own optimization algorithm;
- Design or use pre-loaded optimization tasks;
- Mix-and-match different strategies to solve your problem;
- Because it is fun to optimize things.

Opytimizer is compatible with Python 3.6+.

https://lnkd.in/gAimXdZu
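A minimal sketch adapted from the project's README quickstart (PSO minimizing a sphere function); agent count, bounds, and iteration budget are illustrative:

```python
# pip install opytimizer
# Sketch adapted from Opytimizer's README: Particle Swarm Optimization
# minimizing a sphere function. Hyperparameters are illustrative.
import numpy as np
from opytimizer import Opytimizer
from opytimizer.core import Function
from opytimizer.optimizers.swarm import PSO
from opytimizer.spaces import SearchSpace

def sphere(x):
    # Objective to minimize: sum of squares, optimum at the origin.
    return np.sum(x ** 2)

space = SearchSpace(n_agents=20, n_variables=2,
                    lower_bound=[-10, -10], upper_bound=[10, 10])
optimizer = PSO()
function = Function(sphere)

opt = Opytimizer(space, optimizer, function)
opt.start(n_iterations=100)
```

For hyperparameter tuning, the objective function would instead train a model with the candidate parameters and return its validation loss.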