Nutrient's Python SDK for Document Processing and AI

A lot of AI work around documents is already happening in Python. That’s a big reason Nutrient’s new Python SDK matters. If you’re building an app where a model needs to interact with documents, the model is only part of the equation. OCR, data extraction, conversion, and structured output are what make those workflows usable in production. This release gives teams a stronger foundation for building document-heavy AI applications in Python. #Python #AI #DocumentProcessing #OCR #DataExtraction

Nutrient Python SDK: Production-grade document processing for Python nutrient.io


More Relevant Posts

Nutrient

3,594 followers
4w
Python developers have been duct-taping together PyPDF2, Tesseract, Pillow, and three other libraries to process documents. There's a better way. Nutrient Python SDK brings production-grade document processing to Python in a single, Pythonic API — conversion, OCR in 100+ languages, template-based generation, redaction, digital signatures, and async support for Django, Flask, and FastAPI. Built to handle multi-GB documents with disk streaming, no cobbled-together dependencies required. https://twp.ai/9PbA5x


Aish Singh
3w
If you've ever struggled with document processing in Python, this new SDK is designed to replace that mess of different libraries with one clean API.

Pascal Biese
1w
Most agent frameworks tightly couple workflow logic with Python code. AgentSPEX is a dedicated specification language for LLM-agent workflows.

Instead of burying control flow inside Python scripts, AgentSPEX makes it explicit. Typed steps, branching, loops, parallel execution, reusable submodules, and state management all live in a readable spec, separate from the execution layer. The agent harness underneath handles tool access, sandboxed environments, checkpointing, and verification. It's the difference between editing a blueprint and rewiring a building.

The team evaluated AgentSPEX across 7 benchmarks and ran a user study comparing it against a popular existing agent framework. Users found AgentSPEX workflows significantly more interpretable and accessible to author. The project also ships with ready-to-use agents for deep research and scientific research tasks, plus a visual editor that synchronizes graph and workflow views in real time.

The practical upside here is maintainability. Current orchestration tools like LangGraph, DSPy, and CrewAI give you structure, but modifying a workflow still means modifying code. A dedicated spec language means non-engineers can inspect, edit, and verify agent behavior without touching the runtime.

The real question: will teams adopt a new language when Python already works? If the interpretability gains hold up in production, the answer might be yes, especially when debugging a failing 15-step agent pipeline at 2 AM.

↓ Want to keep up? Join my newsletter with 50k+ readers and be the first to learn about the latest AI research: llmwatch.com 💡
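The general idea of keeping a workflow description separate from the code that executes it can be sketched in a few lines. This is not AgentSPEX syntax or its harness; `WORKFLOW`, `TOOLS`, and `run` are hypothetical names, and the "spec" here is just an ordered list of step names.

```python
from typing import Callable

# Hypothetical tool registry: the execution layer owns the actual behavior.
TOOLS: dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "upper": str.upper,
    "exclaim": lambda s: s + "!",
}

# Hypothetical "spec": plain data that can be read or edited without
# touching the runner below.
WORKFLOW = ["strip", "upper", "exclaim"]


def run(workflow: list[str], tools: dict[str, Callable[[str], str]], text: str):
    """Apply each named step in order and record a trace for inspection."""
    trace = []
    for step in workflow:
        text = tools[step](text)
        trace.append((step, text))
    return text, trace


result, trace = run(WORKFLOW, TOOLS, "  hi ")
```

Reordering or removing a step means editing `WORKFLOW` only; the trace gives one line per step to inspect when something fails.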

Anupam Dutta
5d
Understanding Asyncio Internals: How Python Manages State Without Threads

A question I keep hearing from devs new to async Python: "When an async function hits await, how does it pick up right where it left off later with all its variables intact?" Let's pop the hood. No fluff, just how it actually works.

The short answer: calling an async function in Python doesn't run it like a regular function; it returns a stateful coroutine object. When you await, you don't lose anything. You just pause, stash your state, and hand control back to the event loop.

What gets saved under the hood? Each coroutine keeps:
1. Local variables (like x, y, data)
2. Current instruction pointer (where you stopped)
3. Its frame object (its part of the call stack)
4. The future or task it's waiting on

This is managed via a frame object, the same mechanism as generators, but turbocharged for async.

Let's walk through a real example:

    import asyncio

    async def fetch_data():
        await asyncio.sleep(1)  # simulate I/O
        return 42

    async def compute():
        a = 10
        b = await fetch_data()
        return a + b

Step-by-step runtime:
1. compute() starts, a = 10
2. Hits await fetch_data()
3. Coroutine captures its state (a=10, instruction pointer)
4. Control goes back to the event loop
5. The event loop runs other tasks while I/O happens
6. When fetch_data() completes, its future resolves
7. compute() resumes from the exact same line; b gets the result (42)
8. Returns 52

No threads. No magic. Just a resumable state machine.

Execution flow: imagine a simple loop: pause → other work → resume on completion.

Components you should know:
Coroutine: holds your paused state
Task: wraps a coroutine for scheduling
Future: represents a result that isn't ready yet
Event loop: the traffic cop that decides who runs next

Why this matters for real systems
This design is why you can build high-concurrency APIs, microservices, or data pipelines without thread overhead. Frameworks like FastAPI, aiohttp, and async DB drivers rely on this every single day.

Real-world benefit: One event loop can handle thousands of idle connections while barely touching the CPU.

A common mix-up
"Async means parallel execution." Not quite. Asyncio gives you concurrency (many tasks making progress), not parallelism (multiple things at the exact same time). It's cooperative, single-threaded, and preemption-free.

Take it with you
Python async functions = resumable state machines. Every await is a checkpoint. You pause, but you never lose the plot.

#AsyncIO #PythonInternals #EventLoop #Concurrency #BackendEngineering #SystemDesign #NonBlockingIO #Coroutines #HighPerformance #ScalableSystems #FastAPI #Aiohttp #SoftwareArchitecture #TechDeepDive
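The pause-and-resume behavior described in this post can be observed with the standard library alone. A minimal sketch: `worker`, `log`, and the delay values are illustrative names and numbers, not part of the original post.

```python
import asyncio

log = []


async def worker(name: str, delay: float) -> str:
    local_state = f"{name}-state"   # local variable kept in the coroutine's frame
    log.append(f"{name} paused")
    await asyncio.sleep(delay)      # checkpoint: control returns to the event loop
    # Execution continues here later, with local_state still intact.
    log.append(f"{name} resumed with {local_state}")
    return local_state


async def main() -> list[str]:
    # gather() wraps both coroutines in tasks; they share one thread.
    return await asyncio.gather(worker("a", 0.05), worker("b", 0.01))


results = asyncio.run(main())
```

Both workers reach their `await` before either resumes, and the one with the shorter delay resumes first, which is the concurrency-without-parallelism point made above. `gather` still returns results in the order the coroutines were passed in.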
Obiageli Innocent
3w
Day 15/30 - for Loops in Python

What is a for Loop?
A for loop is used to iterate: it goes through every item in a sequence one by one and executes a block of code for each item. Instead of writing the same code 10 times, you write it once and let the loop repeat it automatically. The loop stops when it has gone through every item.

The Golden Rule: A for loop works on any iterable, that is, any object Python can step through one item at a time. This includes lists, tuples, strings, dictionaries, sets, and ranges.

Syntax Breakdown
    for item in iterable:
item -> a temporary variable holding the current item on each pass; you can name it anything
in -> the keyword that connects the variable to the iterable; always required
iterable -> the collection being looped over: list, tuple, string, range, dict, set

How It Works, Step by Step
1. Python looks at the iterable and picks the first item
2. It assigns that item to your loop variable
3. It runs the indented block of code using that item
4. It moves to the next item and repeats steps 2–3
5. When there are no more items, the loop ends automatically

The range() Function
range() generates a sequence of numbers for looping. The stop value is always excluded:
range(5) -> 0, 1, 2, 3, 4
range(2, 6) -> 2, 3, 4, 5
range(0, 10, 2) -> 0, 2, 4, 6, 8
range(10, 0, -1) -> 10, 9, 8 ... 1

What You Can Loop Over
List → loops through each item
String → loops through each character one by one
Tuple → same as list, goes item by item
Dictionary → loops through keys by default; use .items() for key and value
Range → loops through a sequence of generated numbers
Set → loops through unique items (order not guaranteed)

Tip: Use a name that makes the code readable: for fruit in fruits, for name in names, for i in range(10). i is the convention for index-style loops.

Key Learnings
☑ A for loop iterates through every item in a sequence, running the same block for each one
☑ range(start, stop, step) generates numbers. Stop is always excluded
☑ You can loop over lists, strings, tuples, dicts, sets, and ranges
☑ The loop variable is temporary; it holds the current item on each pass
☑ Indentation matters; only the indented block runs inside the loop

Why It Matters: Loops are what turn Python from a calculator into an automation tool. Processing 10,000 sales records, sending emails to every customer, checking every row in a database - all of it uses loops. Writing code once and letting it repeat is one of the most powerful ideas in programming.

My Takeaway
Before loops, I was writing the same thing over and over. Now I write it once and Python handles the rest. That's what automation actually means - not robots, just smart repetition.

#30DaysOfPython #Python #LearnToCode #CodingJourney #WomenInTech
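The rules above can be checked with a short script. A small sketch; the variable names and sample data (`prices`, `letters`, and so on) are made up for illustration.

```python
# range(): the stop value is excluded, and the step can be negative.
evens = []
for i in range(0, 10, 2):
    evens.append(i)

countdown = []
for i in range(10, 0, -1):
    countdown.append(i)

# A string yields one character per pass.
letters = []
for ch in "abc":
    letters.append(ch)

# A dict yields its keys by default; .items() yields (key, value) pairs.
prices = {"apple": 3, "banana": 1}
keys = []
for key in prices:
    keys.append(key)

total = 0
for name, price in prices.items():
    total += price
```

After it runs, `evens` stops at 8 and `countdown` stops at 1, because the stop value is never produced in either direction.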
Namaste FPGA Technologies

12,890 followers
3w Edited
COCOTB 2.0 → a Python co-simulator that is more Pythonic than 1.0

cocotb 2.0, released in Sep 2025, removes most deprecated features to make the API cleaner and more consistent with Python 3+.

Python has the yield keyword, which returns values on demand. cocotb 1.0 exploited this behavior to pause coroutine execution and resume it whenever an event occurred in the HDL simulator. However, this approach did not align well with native Python IDEs and tooling. cocotb 2.0 removed generator-based coroutines completely and switched to native async/await, which naturally supports the pause and resume capabilities required for RTL verification, where we wait for simulator events and then perform actions.

The second major change is replacing fork with start_soon. With fork, if the scheduler was executing a coroutine when fork was called, the new coroutine could start executing immediately without waiting for the current coroutine to finish. This could lead to inconsistent trigger ordering. start_soon instead queues the coroutine and lets the current coroutine run until it yields before the new coroutine begins execution.

cocotb 2.0 also allows awaiting tasks directly. A task can be awaited to wait for its completion, and tasks can be cancelled using cancel(), which raises a CancelledError exception inside the coroutine so that it can perform proper cleanup.

BinaryValue was the default data type used when accessing signals of the DUT. BinaryValue stored values as binary strings, which meant the HDL bit order and Python indexing were effectively reversed, so we had to convert the value to a string and reverse it before use. Accessing individual bits also required converting to a string, reversing, and then indexing. This process was error-prone. cocotb 2.0 removed BinaryValue and introduced the Logic and LogicArray types. These provide indexing consistent with HDL and allow direct access to individual bits without string conversion.

BinaryValue supported only binary strings and therefore did not fully support all 9 logic levels used in HDL. Values other than 0 and 1 were silently converted to 0, which sometimes caused mismatches between DUT and testbench values. Logic and LogicArray support all 9 logic levels.

The Clock class now supports period_high, allowing variable duty cycles.

In cocotb 1.0, TestFactory was used to generate multiple tests, and failures were reported by raising special cocotb exceptions. cocotb 2.0 introduces decorators similar to pytest and uses normal Python assertions for test failures, aligning test writing with standard Python testing practices.

IPC mechanisms have also been simplified, and Event objects no longer require name or data attributes.

With these changes, cocotb 2.0 becomes more powerful and easier to use than cocotb 1.0. We are happy to launch the COCOTB 2.0 foundation course to help you get familiar with cocotb 2.0 and learn how to write cleaner, more Pythonic testbenches. Explore here: https://lnkd.in/dFvsAM_n
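The bit-order mismatch described above can be illustrated in plain Python, without cocotb or a simulator. This sketch assumes a signal declared with a descending range such as [3:0] and a value held as an MSB-first string; `hdl_bit` is a hypothetical helper, not part of cocotb's API.

```python
def hdl_bit(binstr: str, index: int) -> str:
    """Return bit `index` of an [N-1:0] signal given its MSB-first string."""
    # The first character is the MSB (bit N-1), so HDL bit 0 is the last
    # character. Reversing the string makes Python and HDL indices agree.
    return binstr[::-1][index]


value = "1000"                  # 4-bit signal [3:0]; only bit 3 is set

naive_bit0 = value[0]           # Python index 0 is actually HDL bit 3
correct_bit0 = hdl_bit(value, 0)
correct_bit3 = hdl_bit(value, 3)
```

Here `naive_bit0` is "1" while the real bit 0 is "0", which is the kind of slip the string-reversal step invited and that HDL-consistent indexing avoids.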
SELVASUNDAR RAJAN
3w
✅ Core Python Interview Questions With Answers 🐍

1. What is Python?
- Interpreted, high-level programming language
- Created by Guido van Rossum; first released in 1991
- Used for web dev, data analysis, automation, AI

2. What is an interpreter?
- Runs source code directly with no separate build step (CPython first compiles it to bytecode, then executes that)
- CPython is Python's default interpreter
- Faster for development, slower at runtime than compiled languages

3. What are variables?
- Named storage for data values
- Dynamically typed: type determined at runtime
- Example: age = 30 (int), name = "Bonus" (str)

4. What are data types?
- Built-in types: int, float, str, bool, list, tuple, dict, set
- Mutable: list, dict, set (can change contents)
- Immutable: int, str, tuple (cannot change after creation)

5. What is a list?
- Ordered, mutable collection of items
- Allows duplicates, indexed from 0
- Example: customers = ["A", "B", "A"]

6. What is a dictionary?
- Key-value pairs; insertion order is preserved since Python 3.7
- Keys unique, values any type
- Example: user = {"id": 1, "name": "Bonus"}

7. Difference between list and tuple
- List mutable [], tuple immutable ()
- Tuples are slightly faster and, when all their elements are hashable, can be used as dict keys; lists cannot
- Use tuple for fixed data like coordinates

8. What are loops?
- For: iterate sequences (for i in range(5))
- While: condition-based (while x < 10)
- Used for repeating tasks efficiently

9. What are functions?
- Reusable code blocks defined with def
- Can take parameters, return values
- Example: def greet(name): return f"Hello {name}"

10. Interview tip you must remember
- Always explain with a code example
- Discuss time complexity (O(1), O(n))
- Practice on LeetCode for data roles
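Since the tip is to explain with code, here is a short sketch covering questions 4 to 7: mutability, hashability, and dictionary ordering. The variable names and sample values are illustrative.

```python
# Tuples are immutable: item assignment raises TypeError.
point = (3, 4)
try:
    point[0] = 10
    tuple_mutated = True
except TypeError:
    tuple_mutated = False

# A tuple of hashable items can be a dict key.
grid = {point: "start"}

# Lists are mutable, and therefore not hashable.
coords = [3, 4]
coords[0] = 10
try:
    hash(coords)
    list_hashable = True
except TypeError:
    list_hashable = False

# Dicts keep insertion order (guaranteed since Python 3.7).
user = {"id": 1, "name": "Bonus"}
user["role"] = "admin"
key_order = list(user)
```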
GAYATHRI TWISHA VUDDAGIRI
1w
🚀 Day 8 to 10 - Python Full Stack Training | Conditional statements 🐍

Conditional statements in Python are fundamental constructs used to control the flow of execution in a program. They enable decision-making by executing specific blocks of code based on whether given conditions evaluate to True.

1️⃣ if Statement
The simplest form, used to execute a block of code only when a condition is satisfied.
Example:
    x = 10
    if x > 5:
        print("x is greater than 5")

2️⃣ if-else Statement
Provides an alternative path of execution when the condition is not satisfied.
Example:
    x = 2
    if x > 5:
        print("Condition is True")
    else:
        print("Condition is False")

3️⃣ if-elif-else Structure
Used when multiple conditions need to be evaluated in sequence. Python executes only the first block whose condition is True.
Example:
    score = 78
    if score >= 90:
        print("Excellent")
    elif score >= 75:
        print("Good")
    elif score >= 50:
        print("Average")
    else:
        print("Needs Improvement")

4️⃣ Multiple if Statements
In some scenarios, conditions need to be evaluated independently rather than exclusively. Using multiple if statements ensures each condition is checked regardless of the others.
Example:
    x = 15
    if x > 10:
        print("Greater than 10")
    if x % 5 == 0:
        print("Divisible by 5")

5️⃣ Nested if Statements
A nested structure allows you to place one condition inside another, enabling more granular decision-making.
Example:
    x = 18
    if x > 10:
        if x < 20:
            print("x is between 10 and 20")
        else:
            print("x is 20 or more")
    else:
        print("x is 10 or less")
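The difference between an if-elif-else chain (first match wins) and independent if statements (every condition checked) is easier to see when both return values that can be compared. A small sketch; the function names `grade` and `describe` are illustrative.

```python
def grade(score: int) -> str:
    """if-elif-else: only the first matching branch runs."""
    if score >= 90:
        return "Excellent"
    elif score >= 75:
        return "Good"
    elif score >= 50:
        return "Average"
    else:
        return "Needs Improvement"


def describe(x: int) -> list[str]:
    """Independent ifs: every condition is checked, so several can match."""
    messages = []
    if x > 10:
        messages.append("Greater than 10")
    if x % 5 == 0:
        messages.append("Divisible by 5")
    return messages
```

A score of 95 satisfies every condition in `grade` but yields only "Excellent", whereas `describe(15)` collects both messages.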
Sandeep Prajapati
1mo
Tutorial Python and production Python look nothing alike. And that's why most junior engineers struggle to ship real systems.

Here's the difference:
❌ Tutorial code: "Make it work"
✅ Production code: "Make it readable, maintainable, reliable"

I used to write clever one-liners. Then I opened a file I wrote 6 months ago and thought: "What is this?"

That's when I learned the Three Pillars of Production Python:
📖 Readability → Can someone understand this without asking?
🔧 Maintainability → Can someone safely change this?
🛡️ Reliability → Does this handle edge cases gracefully?

Example. Instead of this:

    # ❌ Tutorial style
    def process(d):
        r = []
        for x in d:
            if x[2]:
                r.append(x[0] * x[1])
        return r

write this:

    # ✅ Production style
    def calculate_line_totals(items: list[dict]) -> list[float]:
        """Return total price for each active line item."""
        return [
            item["price"] * item["quantity"]
            for item in items
            if item.get("is_active")
        ]

Same logic. Different outcome: one survives a team handoff. The other doesn't.

Why this matters for AI engineering:
1. Model endpoints need type hints for clear contracts
2. Data pipelines need maintainable transformations
3. Every AI product is Python + infrastructure + impact

Master production patterns. Ship systems that last.

👇 What's a "boring" pattern you rely on in production?

#Python #BackendEngineering #CleanCode #AIEngineering #LearnInPublic #CyberSecurity
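To see how the production-style function treats edge cases, it helps to call it on a few rows. The function body is copied from the post; the sample `items` data is invented for illustration.

```python
def calculate_line_totals(items: list[dict]) -> list[float]:
    """Return total price for each active line item."""
    return [
        item["price"] * item["quantity"]
        for item in items
        if item.get("is_active")
    ]


items = [
    {"price": 2.5, "quantity": 4, "is_active": True},
    {"price": 10.0, "quantity": 1, "is_active": False},
    {"price": 1.0, "quantity": 3},  # no "is_active" key at all
]

totals = calculate_line_totals(items)
empty_totals = calculate_line_totals([])
```

Because the filter uses `item.get("is_active")`, a row with no flag is skipped rather than raising KeyError. A row missing "price" or "quantity" would still raise, so whether that is the desired contract is a separate design decision.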
Pamela Fox
1w
In this week's Python + AI Office Hours, we spent most of the session exploring how to extract content from PDFs using Python. We tried a few approaches:

📄 MarkItDown, PyMuPDF: Two free, open-source Python packages that work well for straightforward documents. We tested both on a complex PDF and they each struggled in different ways. MarkItDown also has an OCR plugin for image descriptions, which we got working with an Azure OpenAI GPT-5.4 model. The entity-extraction repo is a great place to try out those packages: https://lnkd.in/gNWm4DUt

📄 Azure Document Intelligence: This cloud service takes things further by extracting figures, tables, and text separately. In our RAG repo, we also added an LLM description step to generate alt-text-style descriptions for each extracted figure. For the complex PDF, this combination of structured extraction + LLM descriptions gave the best output. In the attached screenshot, you can see a page from the PDF next to its extracted chunk. That code is in the RAG repo, in pdfparser.py and mediadescriber.py: https://lnkd.in/gPd8A8rv Or, just fork the repo and let the ingestion pipeline do its thing.

Remember: Always set up evaluations for your data! Document extraction quality can vary widely depending on the structure and content of your documents.

See the recording and questions here: https://lnkd.in/guKt8P2E
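One simple way to start evaluating extraction quality is to compare each extractor's output against a short hand-written reference for a page. The sketch below is an assumption about what such a check could look like, not the method used in the session or in either repo; `tokens` and `token_recall` are hypothetical helpers built on the standard library only.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_recall(expected: str, extracted: str) -> float:
    """Fraction of expected tokens that appear anywhere in the extracted text."""
    expected_tokens = tokens(expected)
    if not expected_tokens:
        return 1.0  # nothing was expected, so nothing is missing
    extracted_set = set(tokens(extracted))
    found = sum(1 for token in expected_tokens if token in extracted_set)
    return found / len(expected_tokens)


full = token_recall("Total revenue 2024", "total revenue: 2024 (USD)")
partial = token_recall("a b c d", "a b")
```

Token recall ignores ordering and layout, so it would miss a scrambled table; it is only a first signal that lets two extractors be compared on the same pages.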
