Symparse is a self-optimizing Unix pipeline tool that routes data between an AI Path (using local LLMs via litellm) and a Fast Path (using cached, sandboxed re2-based Python extraction scripts), with a strict neurosymbolic JSON validation gate between them.

You get the magical, unstructured data extraction of large language models, with the raw performance and ReDoS safety of sandboxed Python scripts wrapping re2 on 95% of subsequent matched traffic.

https://lnkd.in/gkk-jaPw

#Symparse #Neurosymbolic #NeurosymbolicAI #LLM #LocalLLM #litellm #Unix #UnixPipeline #DataExtraction #UnstructuredData #Python #re2 #RE2 #Sandbox #JSONValidation #OpenSource #GitHub #DevTools #HybridAI #FastPath #AIPipeline
Symparse: Neurosymbolic AI Data Extraction Tool
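The routing idea can be sketched in a few lines. This is an illustrative toy, not Symparse's implementation: the pattern, `validate`, and `extract` names are hypothetical, stdlib `re` stands in for the sandboxed re2 scripts, and the local-LLM fallback is omitted.

```python
import json
import re
from typing import Optional

# Assumed cached Fast Path pattern (hypothetical; Symparse generates and
# caches its own sandboxed re2-based extraction scripts).
FAST_PATH_PATTERN = re.compile(r"(?P<name>\w+)=(?P<value>\d+)")

def validate(record: dict) -> bool:
    # Neurosymbolic gate: only JSON-serializable records with the
    # expected keys are allowed through, whichever path produced them.
    try:
        json.dumps(record)
    except (TypeError, ValueError):
        return False
    return {"name", "value"} <= record.keys()

def extract(line: str) -> Optional[dict]:
    # Fast Path first: the cached pattern handles previously seen shapes.
    m = FAST_PATH_PATTERN.search(line)
    if m:
        record = {"name": m.group("name"), "value": int(m.group("value"))}
        if validate(record):
            return record
    # AI Path fallback would call a local LLM via litellm here (omitted).
    return None
```

The point of the gate is that both paths must emit the same validated JSON shape, so the cached script can transparently replace the LLM once a shape has been seen.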
🌶️ 💪 Modern API workloads aren’t “one user, one request” anymore—they’re bursts of concurrent traffic, mixed fast/slow calls, and unforgiving tail-latency expectations. That’s why I’m excited to share our new post on Select AI for Python 1.3 and a major step forward for production-grade concurrency: connection pooling.

https://lnkd.in/eze4sUCb

With 1.3, developers can now pool connections using:
- select_ai.create_pool()
- select_ai.create_pool_async()

In the blog, learn what changed from standalone connections, what we measured by integrating pooling into a FastAPI service, and how to think about choosing a pool size that fits your workload. The results: better throughput, improved p95/p99 latency, and more predictable behavior under load—exactly what matters in real-world services.

If you’re running (or planning) concurrent Python services with Select AI, this is one of the simplest, highest-impact upgrades you can make.

#Oracle #Database #SelectAI #OracleAI #Python #FastAPI #Concurrency #ConnectionPooling
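For readers new to pooling, here is the general pattern in miniature: a fixed set of connections is created once and reused, so requests stop paying connect/teardown cost. This is a generic stdlib sketch of the idea, not the `select_ai.create_pool()` API from the post; `ConnectionPool` and `fake_connect` are illustrative names.

```python
import queue

class ConnectionPool:
    """Generic illustration of connection pooling: build N connections
    up front, hand them out, and return them for reuse."""

    def __init__(self, factory, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=None):
        # Blocks when all connections are checked out, which naturally
        # caps concurrency at the pool size.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# A hypothetical "connection" factory that counts how often it runs.
built = []
def fake_connect():
    built.append(1)
    return object()

pool = ConnectionPool(fake_connect, size=2)

# 100 requests reuse the same 2 connections: no new connects after startup.
for _ in range(100):
    conn = pool.acquire()
    pool.release(conn)
```

Choosing the pool size is the same tradeoff the blog describes: large enough to cover your concurrent request peak, small enough not to exhaust database-side resources.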
🚀 Boosting AI app concurrency with smarter database connections

The latest update to Oracle Autonomous Database Select AI for Python shows how connection pooling can significantly improve concurrency and throughput for AI-powered applications. Instead of opening a new database connection for every AI request, connection pools reuse a small set of connections, reducing overhead and enabling many concurrent AI calls from Python apps.

Why it matters:
⚡ Higher concurrency for AI workloads
🔁 Reused connections reduce latency and overhead
🧠 Better performance for NL2SQL, RAG, and generative AI apps built on Autonomous Database

For developers building AI-driven data apps in Python, this means more scalable, responsive AI pipelines with minimal code changes.

#AI #Python #Databases #GenAI #AutonomousDatabase
Microsoft Agent Framework now supports Agent Skills for both .NET and Python! 🧩

Your agents can now discover and load portable skill packages on demand - gaining domain expertise without bloating their context window. A skill is as simple as a folder with a SKILL.md file. No changes to your agent's core instructions needed.

Learn more: https://lnkd.in/dW_-Tpqf

#AI #AgentFramework #AgentSkills
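To make "a folder with a SKILL.md file" concrete, a skill package might look like the fragment below. This is an illustrative example only: the skill name, frontmatter fields, and body are hypothetical, so check the linked docs for the exact SKILL.md schema the framework expects.

```markdown
---
name: invoice-processing
description: Extracts vendor, total, and due date from uploaded invoices.
---

# Invoice Processing Skill

When the user uploads an invoice, extract the vendor name, invoice
total, and due date, and return them as a single JSON object.
```

Because the skill is loaded on demand, its instructions only enter the context window when the agent actually needs that expertise.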
A regex that fits on one line stalled 30 CI jobs for hours.

We got a support report — dozens of Testing Farm jobs stuck in "running" state with no signs of progress. VMs were provisioned, SSH connections established, but nothing was happening.

The culprit? A regex pattern in tmt's test framework that scans test output for lines containing "error" or "fail". Works fine on normal logs. But the test output from a container build pipeline contained base64-encoded in-toto attestation payloads — single lines over 1,000,000 characters long. On those lines, the greedy wildcard anchors on both sides of the pattern cause the Python regex engine to backtrack through every position before giving up. Even 10,000 characters took 5.4 seconds. The full line would take hours.

The fix was straightforward — process line by line with a simpler search instead of running the greedy pattern against the entire file. Same results, completes in 1.1 seconds instead of never.

Lessons:
- Greedy wildcards on both sides of a regex pattern are a backtracking time bomb waiting for the right input
- py-spy is invaluable for diagnosing stuck Python processes in production
- The bug that takes down 30 jobs can be a single line of code

Full write-up with the debugging steps: https://lnkd.in/gENBz8ZA

#Python #Regex #Debugging #CI #OpenSource #SoftwareEngineering
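The shape of the bug and the fix can be shown in a few lines. The patterns below are illustrative stand-ins, not tmt's exact code: `SLOW` has the problematic greedy-wildcards-on-both-sides shape, and `find_failures` applies the line-by-line fix described in the post.

```python
import re

# The problematic shape: on a megabyte-long line that never matches,
# the engine retries ".*" from every start position before giving up.
SLOW = re.compile(r".*(error|fail).*")

def find_failures(text):
    # The fix: scan line by line with a simple search, so one
    # pathological line can't dominate the whole file.
    return [line for line in text.splitlines()
            if re.search(r"error|fail", line)]

log = "\n".join([
    "step 1 ok",
    "A" * 10_000,           # stand-in for a long base64 attestation line
    "build failed: exit 1",
])
```

On the long non-matching line, `re.search(r"error|fail", line)` is a single linear scan, while `SLOW` degrades quadratically, which is exactly the "5.4 seconds for 10,000 characters" behavior the write-up measured.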
Processing 1.54 billion pixels with Python multiprocessing taught me why distributed systems are hard. Talked about serialization overhead, data skew, the straggler problem, and when parallelism actually helps here: https://lnkd.in/edufD9GW #Python #DistributedSystems #Multiprocessing
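The chunking tradeoff mentioned here (serialization overhead vs. stragglers) can be sketched with a minimal multiprocessing example. This is a generic illustration, not the linked post's code; the data and chunk count are made up, and the `fork` start method is used so the snippet runs without a `__main__` guard (POSIX only — use spawn plus a guard on Windows/macOS).

```python
import multiprocessing as mp

def partial_sum(chunk):
    # Each chunk is pickled to a child process: too few chunks risks
    # stragglers (one slow worker holds up the job), too many means
    # serialization overhead dominates the actual work.
    return sum(chunk)

pixels = list(range(100_000))  # stand-in for rows of pixel data
size = len(pixels) // 8
chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]

ctx = mp.get_context("fork")
with ctx.Pool(4) as pool:
    total = sum(pool.map(partial_sum, chunks))
```

For work this cheap per element, the pickling cost can easily exceed the computation — one of the cases where parallelism doesn't actually help.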
Just published a deep-dive on Hybrid RAG: why combining vector search with a knowledge graph produces better, hallucination-resistant answers than standard RAG alone. Full Python implementation with FAISS, Neo4j, FastAPI, and LLMOps included. 👇 https://lnkd.in/d7uqbmMT
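The core hybrid idea — dense retrieval picks seed documents, then a knowledge graph pulls in explicitly linked facts the vectors might miss — fits in a toy sketch. Plain-Python stand-ins are used here for FAISS and Neo4j; the documents, vectors, and graph edges are invented for illustration.

```python
from math import sqrt

# Toy corpus: id -> (embedding, text), plus explicit graph edges.
docs = {
    "d1": ([1.0, 0.0], "Neo4j stores entities and relationships."),
    "d2": ([0.9, 0.1], "FAISS indexes dense embeddings."),
    "d3": ([0.0, 1.0], "FastAPI serves the retrieval endpoint."),
}
graph = {"d1": ["d3"], "d2": [], "d3": []}  # d1 --linked_to--> d3

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def hybrid_retrieve(query_vec, k=1):
    # Stage 1: vector search (FAISS's role) ranks by similarity.
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d][0]),
                    reverse=True)
    seeds = ranked[:k]
    # Stage 2: graph expansion (Neo4j's role) adds linked documents,
    # grounding the answer in explicit relationships, not just similarity.
    return seeds + [n for s in seeds for n in graph[s] if n not in seeds]
```

The hallucination resistance comes from stage 2: the graph contributes facts that are connected by construction, even when their embeddings are far from the query.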
Apache Iceberg Python (PyIceberg) shipped its 0.11.0 release a few days ago 🎉

PyIceberg is increasingly becoming the entry point for programmatic access to Iceberg tables from data applications, orchestration layers, and emerging AI/agent workflows. And the project is growing on all fronts!

If you missed the 0.11.0 release, here are a few highlights.

✅ DeleteFileIndex for faster delete-file lookups
In Iceberg tables with deletes, efficiently locating the relevant delete files matters a lot for scan performance. This release introduces a DeleteFileIndex implementation to accelerate delete-file lookup during scans.

✅ Generator-based writes to reduce memory pressure
Generator-based writes mean PyIceberg can handle writes in a more streaming-oriented way instead of materializing everything eagerly in memory. For Python workloads, where memory pressure quickly becomes a real bottleneck, this is huge.

✅ Snapshot management improvements
You can now roll back to a specific snapshot ID, roll back to a point in time, and set the current snapshot directly.

✅ Full ORC read support in the PyArrow I/O layer
Full ORC read support broadens the kinds of Iceberg tables and files PyIceberg can interact with, which matters in mixed-engine environments where ORC still shows up in production.

✅ Sort order evolution on existing tables
Sort order can now be updated on existing tables without recreating them. That is an important table evolution capability, because sort order can materially affect layout and read efficiency.

✅ REST scan planning
PyIceberg can now use server-side scan planning through REST catalogs: the client sends a scan request and the server returns file scan tasks. That is a big step toward thinner clients and more catalog-driven execution patterns.

Read the complete release notes in the comments.

#dataengineering #softwareengineering
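Why generator-based writes reduce memory pressure can be shown abstractly: the writer pulls one batch at a time, so only the batch currently being written exists in memory. This is a conceptual sketch of the pattern, not PyIceberg's API — its real write path operates on Arrow data, and `record_batches`/`write_all` are invented names.

```python
def record_batches(n_batches, batch_size):
    # A generator: each batch is produced lazily and can be garbage
    # collected as soon as the writer moves past it.
    for b in range(n_batches):
        yield [b * batch_size + i for i in range(batch_size)]

def write_all(batches):
    written = 0
    for batch in batches:       # pulls batches one at a time
        written += len(batch)   # stand-in for "append batch to a data file"
    return written

# 100,000 rows pass through, but at most one 100-row batch is live at once.
total_rows = write_all(record_batches(n_batches=1000, batch_size=100))
```

With an eager list of batches, peak memory scales with the whole dataset; with the generator it scales with one batch.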
Hi! Mastering Asynchronous Worker Patterns in Python for High-Performance Data Processing Pipelines

Modern data-intensive applications—real-time analytics, ETL pipelines, machine-learning feature extraction, and event-driven microservices—must move massive volumes of data through a series of transformations while keeping latency low and resource utilization high. In Python, the traditional "one-thread-one-task" model quickly becomes a bottleneck, especially when a pipeline mixes I/O-bound work (network calls, disk reads/writes) with CPU-bound transformations (parsing, feature engineering).

Enter asynchronous worker patterns. By decoupling the production of work items from their consumption, and by leveraging Python's `asyncio` event loop together with thread- or process-based executors, developers can build pipelines that scale across cores without the overhead of heavyweight processes.

Read the full guide: https://lnkd.in/dhj64Aut

#python #asynchronous #dataprocessing #performance #concurrency
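The producer/consumer decoupling described above can be sketched with `asyncio.Queue`. This is a minimal illustration of the pattern, not code from the linked guide; the work (`item * 2`) and worker count are invented, and real I/O-bound work would replace the `asyncio.sleep(0)` placeholder.

```python
import asyncio

N_WORKERS = 3

async def producer(q, items):
    for item in items:
        await q.put(item)          # blocks when the queue is full
    for _ in range(N_WORKERS):
        await q.put(None)          # one shutdown sentinel per worker

async def worker(q, results):
    while True:
        item = await q.get()
        if item is None:
            return
        await asyncio.sleep(0)     # stand-in for I/O-bound work
        results.append(item * 2)   # stand-in for a transformation step

async def main():
    # A bounded queue applies backpressure: the producer stalls rather
    # than buffering the whole dataset in memory.
    q = asyncio.Queue(maxsize=10)
    results = []
    await asyncio.gather(producer(q, range(50)),
                         *[worker(q, results) for _ in range(N_WORKERS)])
    return sorted(results)

processed = asyncio.run(main())
```

For CPU-bound stages, the same structure works with the worker handing items to `loop.run_in_executor` backed by a process pool, which is the executor combination the post alludes to.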
Machine Learning Data Visualization using t-SNE

#machinelearning #datascience #datavisualization #opentsne

openTSNE is a modular Python implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE), a popular dimensionality-reduction algorithm for visualizing high-dimensional data sets, with extensible, parallel implementations of the algorithm. openTSNE incorporates the latest improvements to t-SNE, including the ability to add new data points to existing embeddings, massive speed improvements that let t-SNE scale to millions of data points, and various tricks to improve the global alignment of the resulting visualizations.

https://lnkd.in/g-G-nmhn
To all of #LangChain fans, I am happy to say Cockroach Labs’ integration is now GA. The integration provides out-of-the-box support for #CockroachDB as a vector source for any LangChain user using LangChain Python. This means you get all of the simplicity and orchestration provided by LangChain and the horizontal scale and never-down availability provided by CockroachDB. Enjoy! https://lnkd.in/gXGsTKB2