🚀 Built a production-grade Agentic Search Service from scratch using Spring Boot 3 + LangChain4j

What started as a simple CRUD API evolved into an intelligent search system that decides HOW to search based on what you ask.

𝗪𝗵𝗮𝘁 𝗶𝘀 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗦𝗲𝗮𝗿𝗰𝗵?
Instead of always running the same query, the system classifies your intent first — then picks the right strategy automatically.
"laptop" → keyword search
"something portable for work" → semantic vector search
"laptops under 500 with 16GB" → LLM extracts filters → structured query
"good stuff" → asks for clarification

𝗧𝗲𝗰𝗵 𝗦𝘁𝗮𝗰𝗸
→ Spring Boot 3 + Java 17
→ LangChain4j + Groq (llama-3.3-70b) for intent classification
→ AllMiniLmL6V2 local embedding model (zero API cost)
→ pgvector on PostgreSQL for semantic similarity search
→ Redis for distributed caching
→ Apache Kafka for async write pipeline
→ HikariCP with primary/replica DB routing
→ Docker Compose for local infrastructure

𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀
→ @Transactional(readOnly=true) routes reads to the replica automatically via LazyConnectionDataSourceProxy
→ Redis cache with a toggle flag — on/off without code changes
→ Kafka async writes with 202 Accepted — DB pressure decoupled from API latency
→ Paginated reads with configurable sort
→ Input validation with field-level 400 error responses

𝗞𝗲𝘆 𝗗𝗲𝘀𝗶𝗴𝗻 𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻𝘀
→ LazyConnectionDataSourceProxy — without this, read/write routing silently breaks
→ AOP proxy ordering — @Transactional must wrap before @Cacheable fires
→ Embeddings generated at write time, not search time — queries hit precomputed vectors instead of embedding products on the fly
→ Kafka/cache toggleable via properties — same codebase, different behaviour per environment

𝗪𝗵𝗮𝘁 𝗜 𝗟𝗲𝗮𝗿𝗻𝗲𝗱
Building this end-to-end showed me that the gap between a working API and a production-ready service is filled with decisions most tutorials skip — connection pool tuning, proxy ordering, embedding lifecycle, broker networking in Docker. The agentic layer on top made it clear how LangChain4j's AiServices turns an LLM into a typed Java method — no boilerplate, no JSON parsing, just an interface and annotations (see the sketch below).

#Java #SpringBoot #LangChain4j #AI #Kafka #Redis #PostgreSQL #pgvector #SystemDesign #BackendEngineering
Building Agentic Search Service with Spring Boot 3 and LangChain4j
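For a sense of how the AiServices piece fits together, here is a minimal, illustrative sketch of an intent classifier. The interface, enum, prompt, and Groq wiring below are assumptions for illustration, not the author's actual code; it assumes a recent LangChain4j release and uses the OpenAI-compatible client, since Groq exposes an OpenAI-style API.

    // Hypothetical names throughout; shown only to illustrate the "typed Java method" idea.
    import dev.langchain4j.model.openai.OpenAiChatModel;
    import dev.langchain4j.service.AiServices;
    import dev.langchain4j.service.UserMessage;
    import dev.langchain4j.service.V;

    enum SearchIntent { KEYWORD, SEMANTIC, STRUCTURED, CLARIFY }

    interface IntentClassifier {
        @UserMessage("Classify the e-commerce search query into one intent: {{query}}")
        SearchIntent classify(@V("query") String query);
    }

    public class IntentDemo {
        public static void main(String[] args) {
            // Assumption: reaching Groq through the OpenAI-compatible client with a custom baseUrl.
            OpenAiChatModel model = OpenAiChatModel.builder()
                    .baseUrl("https://api.groq.com/openai/v1")
                    .apiKey(System.getenv("GROQ_API_KEY"))
                    .modelName("llama-3.3-70b-versatile")
                    .build();

            // AiServices generates the implementation: prompt templating, the LLM call,
            // and parsing the response back into the SearchIntent enum.
            IntentClassifier classifier = AiServices.create(IntentClassifier.class, model);
            System.out.println(classifier.classify("laptops under 500 with 16GB")); // expected: STRUCTURED
        }
    }

The returned enum can then drive the routing described in the post: KEYWORD runs a plain keyword query, SEMANTIC goes to the pgvector similarity search, STRUCTURED triggers filter extraction, and CLARIFY asks the user a follow-up question.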
More Relevant Posts
Written by Luke Thompson, MongoDB Champion, and published on Friends of OpenJDK (Foojay.io): learn how to build a Java faceted full-text search API! In the tutorial, he walks through an interesting dataset that showcases how you can effectively pair machine learning/AI-generated data with more traditional search to produce fast, cheap, repeatable, and intuitive search engines.

Dive in here 👉 https://lnkd.in/gm6a2Y77

#mongodb #java #nosql #database #atlas
🚀 Solving a Hidden Tech Debt Problem in MongoDB-backed Microservices

If you’ve worked with MongoDB aggregation pipelines in microservices, you’ve probably seen this pattern: complex, multi-stage queries hardcoded as raw strings inside Java code. It works… until it becomes painful to maintain.

Here’s what we started running into:
❌ Pipeline stages built by manually concatenating strings with dynamic values
❌ Repeated boilerplate across multiple services
❌ Fragile string-based injection (special characters breaking queries silently)
❌ No clear visibility into what queries were actually running
❌ Onboarding pain — new developers had to trace Java code just to understand the database logic

So we made a small shift. We built a lightweight utility to externalize MongoDB aggregation pipelines into versioned JSON files (one per module), with support for typed runtime parameters using a simple {{placeholder}} syntax.

Here’s what improved:
✅ Pipelines became data, not code — stored as JSON, easy to read and reason about
✅ Type-safe parameter injection — integers stay integers, lists stay lists (no manual escaping)
✅ Auto-discovery at startup — drop a new JSON file in the right place and it’s picked up automatically
✅ Cleaner DAO layer — just call getPipeline("query_key", params) and execute
✅ Better code reviews — query changes show up as clean JSON diffs, not escaped Java strings

The biggest win? The people who understand the business logic can now review and reason about queries directly — without digging through Java code.

Sometimes small architectural changes remove a surprising amount of friction. This one took a few hours to build and is already paying off in maintainability and developer productivity.

Curious — how are you managing complex database queries in your services?

#Java #SpringBoot #MongoDB #SoftwareEngineering #Microservices #BackendArchitecture #CleanCode #TechDebt #DeveloperProductivity
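A rough sketch of the idea, not the author's actual utility: the class name, file layout, and {{placeholder}} handling below are illustrative assumptions, using Jackson for typed substitution and the MongoDB driver's Document.parse for the stages.

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.bson.Document;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    public class PipelineRegistry {
        private final ObjectMapper mapper = new ObjectMapper();

        // Loads pipelines/<key>.json from the classpath. The template writes every placeholder
        // as a quoted token, e.g. {"$match": {"total": {"$gte": "{{minTotal}}"}}}.
        public List<Document> getPipeline(String key, Map<String, Object> params) throws Exception {
            String template = new String(
                    PipelineRegistry.class.getResourceAsStream("/pipelines/" + key + ".json").readAllBytes(),
                    StandardCharsets.UTF_8);

            for (Map.Entry<String, Object> entry : params.entrySet()) {
                // Serializing the value as JSON keeps types intact: 42 stays a number,
                // List.of("A", "B") becomes ["A","B"], strings stay quoted and escaped safely.
                String json = mapper.writeValueAsString(entry.getValue());
                template = template.replace("\"{{" + entry.getKey() + "}}\"", json);
            }

            // The file holds a JSON array of stages; parse each stage into a BSON Document.
            List<Document> stages = new ArrayList<>();
            for (JsonNode stage : mapper.readTree(template)) {
                stages.add(Document.parse(stage.toString()));
            }
            return stages;
        }
    }

A DAO could then run collection.aggregate(registry.getPipeline("orders_by_status", Map.of("status", "SHIPPED", "minTotal", 100))) and keep the pipeline itself out of Java entirely.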
Java 11 standard support ends later this year. If your Flink jobs are still running on 1.x, you’re heading toward an unsupported runtime while your competitors are already running ML inference natively in SQL.

Francisco Morillo from #AWS just published the step-by-step migration guide for Flink 2.2 on Amazon Managed Service for Apache Flink.

The upgrade is in-place. You don’t blow away your application — you update the runtime, point to a new JAR, and the service handles the rest. Auto-rollback kicks in automatically if binary incompatibilities are detected at startup.

The part most teams will get burned by: Kryo. The serializer upgraded from 2.24 to 5.6, which breaks state compatibility for POJOs using Java collections (HashMap, ArrayList, HashSet). If your app uses those and you’re relying on Kryo fallback, your upgrade “succeeds” — then you enter restart loops. Check your logs for "Class class <className> cannot be used as a POJO type" before you touch production.

The upside of getting through this: RocksDB 8.10.0 gives you measurably faster checkpoints and recovery. And ML_PREDICT + CREATE MODEL means you can call ML models directly from SQL — no separate inference layer to maintain.

What’s your current Flink version? Still on 1.x or already evaluating 2.x?

#ApacheFlink #Flink https://lnkd.in/e2gWJPwv
🚀 DevOps Day 24 | Multi Container Deployment | Java + MySQL | Part 2

After learning Docker Compose basics, I moved into real production architecture.

Today I deployed:
• Java Backend Application
• MySQL Database
• Multi Container Setup

This is where DevOps becomes real-world engineering.

My application.properties:

    spring.datasource.url=jdbc:mysql://mysqldb:3306/bank
    spring.datasource.username=bankuser
    spring.datasource.password=bankpass123

Notice something interesting? Instead of localhost, I used: mysqldb

Why? Because Docker Compose automatically creates internal networking between containers.

Now my Docker Compose file:

    version: '3.8'
    services:
      mysqldb:
        image: mysql:8
        container_name: mysql
        environment:
          MYSQL_ROOT_PASSWORD: admin@123
          MYSQL_DATABASE: bank
          MYSQL_USER: bankuser
          MYSQL_PASSWORD: bankpass123
        volumes:
          - ./mysql_data:/var/lib/mysql
      backend:
        image: bank
        container_name: bankapp
        build:
          context: ./backend
        environment:
          SPRING_DATASOURCE_URL: jdbc:mysql://mysqldb:3306/bank   # must match MYSQL_DATABASE above
          SPRING_DATASOURCE_USERNAME: bankuser
          SPRING_DATASOURCE_PASSWORD: bankpass123
        ports:
          - "8080:8080"
        depends_on:
          - mysqldb

Key learnings:
- mysqldb: database container
- backend: Java application container
- depends_on: ensures MySQL starts before Java
- environment: injects runtime variables
- ports: exposes the service externally

This setup allowed my Java application to communicate with MySQL automatically. This is exactly how Microservices Architecture works.

And honestly… this felt like real DevOps work.

Part 3 coming next: Volumes + production-level data persistence

GitHub Repo: https://lnkd.in/gjw9Fuxe

#DevOps #DockerCompose #Java #MySQL #Microservices #CloudEngineering
New on Foojay: Java Faceted Full-Text Search API Using MongoDB Atlas Search.

Luke Thompson shows you how to build a faceted full-text search API with Java and MongoDB Atlas Search.

The article covers:
• Setting up MongoDB Atlas Search indexes
• Implementing faceted search functionality
• Building a REST API with Java
• Handling search queries and filtering results

Perfect for developers looking to add powerful search capabilities to their Java applications without the complexity of managing separate search infrastructure.

Read the full article here: https://lnkd.in/emJPYwep

#Java #MongoDB #SearchAPI #FullTextSearch #foojay
🚀 I built a production-grade Workflow Execution Engine from scratch — a mini GitHub Actions clone.

Built to understand what no tutorial ever explains. A real, end-to-end distributed system taking a codebase from webhooks to isolated Docker execution, streaming the results live to the browser.

Here is exactly how it works under the hood:
1️⃣ You run `git push` on any linked repository.
2️⃣ GitHub fires an HMAC-secured webhook to my Node.js engine.
3️⃣ A background job is queued in Redis (via Bull) to prevent server blocking.
4️⃣ A separate Worker process picks up the job, clones the repo, reads the `.pipeline.json` config, and executes each pipeline step inside a fresh, ephemeral Docker container.
5️⃣ Every standard output log ([stdout] and [stderr]) streams live to the React dashboard using a WebSocket connected to a Redis Pub/Sub channel.
6️⃣ Run history, statistics, and pipeline metrics are persisted in PostgreSQL.
7️⃣ On pipeline failure, an automated Slack Block Kit notification fires with a direct link to the failed logs.

⚙️ The Tech Stack:
• Backend: Node.js, Express, WebSockets, Bull Queue
• Infrastructure: Docker, Redis, PostgreSQL (orchestrated via Docker Compose)
• Frontend: React 18, Vite, Tailwind CSS, Recharts
• Security: JWT Auth + RBAC, HMAC Webhook verification

This project taught me more about distributed systems than any course ever could. I had to solve real engineering problems like:
👉 How do you safely stream live terminal output across 4 network hops without blocking the main event loop?
👉 How do you handle crash recovery if the worker dies mid-execution? (Built recoverStuckJobs() to re-queue stuck runs.)
👉 How do you support testing code in any language? (Pipelines dynamically pull specific Docker images like python:3.11-alpine or node:20-alpine per step.)

If you are a developer who has ever wondered how GitHub Actions or Jenkins actually works under the hood — try building one. It will completely change how you view CI/CD.

This is FlowForge. Check out the GitHub repository link in the comments below! 👇

#SystemDesign #NodeJS #Docker #Redis #PostgreSQL #WebSockets #BackendDevelopment #SoftwareEngineering #DevOps #CI_CD #BuildInPublic #OpenSource #React
Zerodha processes 15 million+ trades daily. 15-20% of India's entire stock market volume. Built by a tech team of just 33 engineers.

Here's exactly how they do it:

1. Everything performance-critical is written in Go
- Not Java. Not Python. Go.
- Zerodha's CTO evaluated Python, C++, Java, NodeJS, and Erlang before choosing Go specifically for handling thousands of concurrent WebSocket connections.
- Why Go? Lightweight goroutines handle thousands of simultaneous connections without expensive thread overhead.

2. They abuse PostgreSQL in ways nobody else does
- No fancy distributed databases. No Cassandra. No MongoDB.
- Just PostgreSQL pushed to its absolute limits.
- Their Console DBs store hundreds of billions of rows across four sharded nodes, close to 20TB of financial data.
- They sliced data by financial year. Each year in its own PostgreSQL instance. Linked together using PostgreSQL's Foreign Data Wrapper.
- Same schema. Same queries. Just pointing to different backends.
- No rewrite. No migration. No distributed magic.

3. They use PostgreSQL as a cache, not Redis
- They considered Redis for caching reports.
- Too complex to implement filtering and sorting across dozens of report types. Solution? Another dedicated PostgreSQL instance as a hot cache.
- 7 million tables created daily. Just as a cache.
- Unconventional. But it works at scale.

4. They set a hard latency ceiling and engineer backwards
- 40 milliseconds. That's their upper limit for mean user latency.
- They don't optimise randomly.
- They pick a number. Then reverse-engineer every system to hit it.

5. Their biggest philosophy: simplicity over hype
- No microservices for the sake of it.
- No Kafka where a simple queue works. No distributed NoSQL where PostgreSQL is sufficient.
- Right tool. Right job. Always.

What most developers miss:
- Zerodha writes performance-critical systems in Go.
- But the same concurrency principles behind Go (thread management, connection pooling, async processing) are core Java concepts too.

ExecutorService. CompletableFuture. Thread pools. These are not just interview topics. They are how real systems like Zerodha think at scale (see the sketch below).

Master these in Java first. Every other language and system becomes easier to understand.

This is exactly what I cover in my Java Guide: not just syntax, but how production systems actually work. https://lnkd.in/d6u_ZD5u

Stay Hungry, Stay FoolisH!
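A minimal, runnable sketch of those Java primitives, fanning out a few lookups over a bounded thread pool; the symbols and values are made up for illustration.

    import java.util.List;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class QuoteFanOut {
        // A fixed pool keeps concurrency bounded: work is multiplexed over a known
        // number of threads instead of spawning one thread per request.
        private static final ExecutorService POOL = Executors.newFixedThreadPool(16);

        static CompletableFuture<String> fetchQuote(String symbol) {
            // Stand-in for a network or database call running off the caller's thread.
            return CompletableFuture.supplyAsync(() -> symbol + " -> 100.0", POOL);
        }

        public static void main(String[] args) {
            List<CompletableFuture<String>> futures = List.of("INFY", "TCS", "HDFCBANK").stream()
                    .map(QuoteFanOut::fetchQuote)
                    .toList();

            // Wait for all lookups to finish, then collect results without blocking per request.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            futures.forEach(f -> System.out.println(f.join()));

            POOL.shutdown();
        }
    }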
Keeping cache consistent with the database is one of the most practical challenges when building scalable systems with Java and Spring Boot.

When designing high-performance applications using Spring Boot (with tools like Spring Cache, Redis, or Caffeine), choosing the right caching strategy directly impacts data consistency, latency, and reliability.

Here are the most common approaches:

1) Cache Aside (Lazy Loading)
The application first checks the cache. If data is missing, it fetches from the database and updates the cache. On updates, the cache is invalidated.
➡️ In Spring Boot: commonly implemented using @Cacheable and @CacheEvict
➡️ Why it works: simple, flexible, and widely adopted in real-world systems

2) Write Through
Data is written to both the cache and database at the same time.
➡️ Ensures strong consistency between cache and DB
➡️ Trade-off: increased write latency due to dual writes

3) Write Behind (Write Back)
Data is written to the cache first and persisted to the database asynchronously.
➡️ Great for high-throughput systems
➡️ Risk: potential data loss if cache crashes before DB sync

4) TTL (Time-To-Live)
Each cache entry expires automatically after a defined duration.
➡️ Easy to implement using Redis TTL configuration
➡️ Trade-off: stale data may be served before expiration

Key takeaway: There is no one-size-fits-all strategy. In Spring Boot systems, the choice depends on your consistency requirements, traffic patterns, and failure tolerance. Often, a hybrid approach (Cache Aside + TTL) provides a good balance between performance and data freshness.

#SystemDesign #Java #SpringBoot #Caching #Redis #BackendDevelopment #Scalability #SoftwareEngineering #Microservices #PerformanceOptimization
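A minimal sketch of strategy 1 (Cache Aside) in Spring Boot, assuming @EnableCaching is present on a configuration class; the Product entity and ProductRepository are illustrative placeholders, not real library types.

    import org.springframework.cache.annotation.CacheEvict;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.stereotype.Service;

    @Service
    public class ProductService {

        private final ProductRepository repository;   // illustrative Spring Data repository

        public ProductService(ProductRepository repository) {
            this.repository = repository;
        }

        // Cache Aside read: look in the "products" cache first; on a miss, load from the DB
        // and let Spring store the result in the cache.
        @Cacheable(cacheNames = "products", key = "#id")
        public Product findById(Long id) {
            return repository.findById(id).orElseThrow();
        }

        // Invalidate on write so the next read repopulates the cache with fresh data.
        @CacheEvict(cacheNames = "products", key = "#product.id")
        public Product update(Product product) {
            return repository.save(product);
        }
    }

Combining this with a Redis TTL (for example, spring.cache.redis.time-to-live=10m in application.properties) gives the hybrid Cache Aside + TTL setup described above.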
🔍 Just explored Apache Lucene — and it's a game changer for search!

At my work, we were facing serious performance issues with search in one of our projects. Database queries were getting slower as data grew, so I started digging for a better solution, and Apache Lucene was the game changer that solved it all.

What is Apache Lucene?
Apache Lucene is an open-source, high-performance full-text search engine library written in Java. It's not a standalone application; it's a powerful toolkit you embed directly into your Java project to build fast and scalable search features.

How does it work?
Lucene works by creating an inverted index: a data structure that maps words to the documents they appear in. This makes searching millions of records incredibly fast compared to scanning a traditional database.

When should you use it?
✅ You need full-text search in your app
✅ You're building a document or product search feature
✅ Database LIKE queries are too slow for your scale
✅ You need features like fuzzy matching, ranking, and filters

Key Benefits:
⚡ Extremely fast search and indexing
🎯 Relevance scoring out of the box
🔤 Powerful query DSL (wildcard, phrase, fuzzy, range)
📦 Lightweight — just a JAR, no server needed
🔗 Foundation of Elasticsearch & Apache Solr

If you're building a Java app that needs more than basic queries, Lucene is absolutely worth learning.

Have you used Lucene or built on top of it? Would love to hear your experience! 👇

#Java #ApacheLucene #FullTextSearch #SoftwareEngineering #BackendDevelopment #OpenSource
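A small, self-contained sketch of the embed-it-in-your-app idea, assuming a recent Lucene 9.x (lucene-core plus lucene-queryparser) on the classpath: build an in-memory inverted index over two documents, then run a fuzzy query against it.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.ByteBuffersDirectory;

    public class LuceneDemo {
        public static void main(String[] args) throws Exception {
            var dir = new ByteBuffersDirectory();   // in-memory index, enough for a demo
            try (var writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                for (String text : new String[]{"fast full-text search engine", "slow database LIKE query"}) {
                    Document doc = new Document();
                    doc.add(new TextField("content", text, Field.Store.YES));
                    writer.addDocument(doc);         // builds the inverted index as documents are added
                }
            }
            try (var reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // Fuzzy query: the typo "serch" still matches "search" within edit distance.
                Query query = new QueryParser("content", new StandardAnalyzer()).parse("serch~");
                for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                    System.out.println(hit.score + " -> "
                            + searcher.storedFields().document(hit.doc).get("content"));
                }
            }
        }
    }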
🚨 Real Problem I Solved: Fixing a Slow System Using Microservices (Java + Spring Boot)

Recently, I worked on a system where users were facing serious performance issues.
👉 Dashboard APIs were taking 8–12 seconds
👉 Frequent timeouts during peak traffic
👉 CPU usage was constantly high

At first glance, it looked like a database issue… but the real problem was deeper.

💥 Root Cause
The application was a monolith (Spring Boot) where:
- Every API request was doing too much work
- Even a simple dashboard load was triggering heavy report generation logic
- No separation between fast reads and heavy background processing
👉 So when traffic increased, the system choked.

🛠️ What I Did (Microservices Solution)
I redesigned the flow using a microservices-based approach:
✔️ Separated services based on responsibility: Dashboard Service (fast, read-heavy APIs) and Report Service (CPU-intensive processing)
✔️ Introduced async processing using Kafka: instead of generating reports during API calls, requests were pushed to a queue and processed in the background
✔️ Added Redis caching: frequently accessed data served instantly
✔️ Applied API Gateway + Rate Limiting: prevented system overload

⚙️ New Flow
Before ❌ API → Generate Report → Return Response (slow + blocking)
After ✅ API → Fetch cached/precomputed data → Return instantly
Background → Kafka → Report Service → Store results

📈 Results
🚀 Response time improved from 10s → <500ms
🚀 System handled 5x more traffic
🚀 Zero timeouts during peak usage

🧠 Key Takeaway
Microservices are not about splitting code. They are about:
👉 Designing for scalability
👉 Separating workloads (read vs heavy compute)
👉 Using async processing effectively

💼 Why This Matters
If you're building high-traffic web apps, data-heavy dashboards, or scalable backend systems, these patterns make a huge difference.

I work on building scalable Java full-stack systems using:
👉 Spring Boot
👉 Microservices
👉 Kafka / Async Processing
👉 Redis / Caching
👉 React (for frontend)

If you're facing performance or scaling issues in your application, let’s connect 🤝

#Java #SpringBoot #Microservices #Kafka #Redis #FullStackDeveloper #FreelanceDeveloper #SystemDesign #BackendDevelopment
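A minimal sketch of the "return precomputed data, queue the heavy work" shape in Spring Boot; the topic name, DashboardCache type, and endpoints are illustrative placeholders, not the actual project code.

    import java.util.Map;
    import org.springframework.http.ResponseEntity;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    @RequestMapping("/reports")
    public class ReportController {

        // Illustrative placeholder for whatever serves precomputed data (Redis, a read DB, ...).
        public interface DashboardCache {
            Map<String, Object> precomputedFor(String userId);
        }

        private final KafkaTemplate<String, String> kafka;
        private final DashboardCache cache;

        public ReportController(KafkaTemplate<String, String> kafka, DashboardCache cache) {
            this.kafka = kafka;
            this.cache = cache;
        }

        // Fast read path: return precomputed data instead of generating the report inline.
        @GetMapping("/{userId}")
        public ResponseEntity<Map<String, Object>> dashboard(@PathVariable String userId) {
            return ResponseEntity.ok(cache.precomputedFor(userId));
        }

        // Heavy work is decoupled: the API only enqueues the request and acknowledges it.
        @PostMapping("/{userId}")
        public ResponseEntity<Void> requestReport(@PathVariable String userId) {
            kafka.send("report-requests", userId, "{\"userId\":\"" + userId + "\"}");
            return ResponseEntity.accepted().build();
        }
    }

A @KafkaListener on the report-requests topic in the Report Service would then do the expensive generation in the background and write the result where the cache reads from.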