Easy Data Pipeline Management for Engineers
By Joab Jackson, featuring Sriharsha Yayi (Intuit)

Numaflow allows engineers without Kubernetes experience to easily build and manage data processing pipelines for high-throughput workloads.
More Relevant Posts
🚀 Running RAG Systems in Production with Kubernetes

Retrieval-Augmented Generation (RAG) works great in demos — production is where things get real. The real challenge isn't building RAG, but scaling, securing, and operating it reliably. This is where Kubernetes plays a critical role.

⚙️ Why Kubernetes for RAG? A production RAG system is a set of distributed components:
- Data ingestion pipelines (batch / streaming)
- Embedding generation services
- Vector databases (semantic search)
- LLM inference APIs
- Monitoring and evaluation services
Kubernetes allows each component to run as an independent, scalable microservice.

🔧 Key technical advantages
- Horizontal Pod Autoscaling (HPA): scale embedding and inference services based on CPU, memory, QPS, or queue depth — essential for bursty RAG workloads.
- Resource isolation: CPU workloads for ingestion, GPU workloads for embeddings/LLMs, with namespace quotas to control cost.
- Rolling & canary deployments: safely release new prompt logic, retrievers, or embedding models without downtime.
- Fault tolerance: pod restarts, self-healing, and service discovery ensure high availability.

📊 Production observability
With Kubernetes-native tooling:
- Track retrieval latency vs inference latency
- Monitor index freshness and embedding drift
- Log failed retrievals and hallucination-prone queries
This turns RAG into a measurable, auditable system, not a black box.

💡 Takeaway
RAG is not just an LLM problem — it's a data + platform engineering problem. Kubernetes provides the foundation to run RAG systems at enterprise scale with reliability and cost control.

Happy to connect with folks working on RAG, LLM infra, Kubernetes, or data platforms.

#RAG #Kubernetes #LLMOps #DataEngineering #MLOps #GenAI #CloudNative
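The queue-depth scaling described above follows the standard Kubernetes HPA rule: desired replicas = ceil(current replicas × current metric / target metric), clamped to configured bounds. A minimal Python sketch of that rule (function name and replica bounds are illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    """Kubernetes HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to the [min, max] replica bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# A burst pushes average queue depth per pod to 300 against a target of 100:
print(desired_replicas(current_replicas=4, current_metric=300, target_metric=100))  # 12
```

The same formula works for any metric (QPS, memory, queue depth) as long as "current" and "target" are expressed per pod, which is why queue-based scaling slots cleanly into HPA.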
The more I work with data processing frameworks like Flink and Spark, the harder it is for me to understand why some teams still build complex data pipelines using only microservices and queues.

1) Is it just inertia, i.e. doing what we've always done?
2) Is it resume-driven design, where we use various technologies to stay "marketable"?
3) Or is it simply resistance to learning, even when better solutions already exist?

Flink and Spark can help build lower-latency, higher-scale pipelines with less code. IMHO, the hardest part of design isn't technology but mindset.

#DataEngineering #BigData #Flink #Spark #SoftwareArchitecture #TechLeadership #DistributedSystems #ETL #DataPipelines
🚨 Data pipelines crashing at scale? Container orchestration saved my sanity in data engineering chaos! 🔥

Data engineering teams drown in manual scaling nightmares: managing hundreds of containers for ETL jobs means endless network woes, failures, and security gaps without automation. In my last project, our Spark jobs on bare Docker exploded during peak loads: containers died, data lagged, and downtime cost hours debugging secrets and load balancing. 😩

Solution: switched to Kubernetes. YAML configs defined the desired state for auto-scaling, high availability, and secure networking. We gained reproducibility, resource management, and 10x faster deploys!

Key lesson: start small, and invest in observability to tame complexity.

What's your go-to orchestrator for data workloads? K8s, ECS, or something else? 👇

#ContainerOrchestration #DataEngineering #Kubernetes #DevOps #ETL
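The "YAML defines desired state" idea boils down to a reconciliation loop: compare the declared state against what is actually running and act on the difference. A toy Python sketch of that control-loop shape (names and actions are illustrative, not real controller logic):

```python
def reconcile(desired: dict, actual: dict) -> list:
    """One pass of a desired-state control loop: compare declared
    replica counts against observed ones and emit scaling actions."""
    actions = []
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actions.append(f"scale-up {name}: {have} -> {want}")
        elif have > want:
            actions.append(f"scale-down {name}: {have} -> {want}")
    return actions

# Declared state (what the YAML says) vs. observed state:
desired = {"spark-etl": 5, "ingest": 2}
actual = {"spark-etl": 3, "ingest": 2}
print(reconcile(desired, actual))  # ['scale-up spark-etl: 3 -> 5']
```

Kubernetes controllers run this compare-and-correct loop continuously, which is what delivers the self-healing and reproducibility mentioned above: the system converges back to the declared state after any crash.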
🧪 𝐌𝐋𝐟𝐥𝐨𝐰 + 𝐊𝐒𝐞𝐫𝐯𝐞 + 𝐯𝐋𝐋𝐌 𝐨𝐧 𝐊𝐮𝐛𝐞𝐫𝐧𝐞𝐭𝐞𝐬 — 𝐏𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧 𝐋𝐋𝐌 𝐬𝐞𝐫𝐯𝐢𝐧𝐠 𝐟𝐫𝐨𝐦 𝐩𝐫𝐨𝐭𝐨𝐭𝐲𝐩𝐞 𝐭𝐨 𝐬𝐜𝐚𝐥𝐞

→ Experiment tracking, standardized serving, and high-throughput inference in one battle-tested stack.

》 𝐖𝐡𝐚𝐭 𝐞𝐚𝐜𝐡 𝐩𝐢𝐞𝐜𝐞 𝐝𝐨𝐞𝐬
✸ MLflow 🧪: tracks experiments, versions models, and manages the registry with full lineage.
✸ KServe ☸️: Kubernetes-native serving with autoscaling, rollouts, and a standard inference protocol.
✸ vLLM ⚡: LLM engine with continuous batching + paged KV-cache for max throughput.
→ Clean separation: ML lifecycle → infra → token performance.

》 𝐓𝐡𝐞 𝐟𝐥𝐨𝐰 (𝐞𝐧𝐝-𝐭𝐨-𝐞𝐧𝐝)
✸ Train/log → MLflow registry → KServe InferenceService YAML → vLLM pods spin up.
✸ Clients hit unified HTTP/gRPC endpoints; KServe handles traffic, scaling, and health.
→ New versions = update the YAML; canary/rollback happen automatically.

》 𝐖𝐡𝐲 𝐯𝐋𝐋𝐌 + 𝐊𝐒𝐞𝐫𝐯𝐞 𝐬𝐡𝐢𝐧𝐞𝐬
✸ vLLM plugs directly into KServe for open-source, cloud-agnostic LLM serving.
✸ Get Kubernetes ops + vLLM perf (no custom servers or SaaS lock-in).
→ Perfect for teams wanting control + speed on their own infra.

》 𝐐𝐮𝐢𝐜𝐤 𝐝𝐞𝐩𝐥𝐨𝐲𝐦𝐞𝐧𝐭 𝐬𝐭𝐞𝐩𝐬
1️⃣ Set up MLflow (SQL backend + S3/MinIO) and log your models/LLM configs.
2️⃣ Install KServe on K8s and point the InferenceService to MLflow artifacts.
3️⃣ Expose via ingress/gateway; watch autoscaling kick in under load.
→ From notebook to prod endpoint in hours, not weeks.

》 𝐖𝐡𝐞𝐧 𝐭𝐡𝐢𝐬 𝐬𝐭𝐚𝐜𝐤 𝐰𝐢𝐧𝐬
✸ Kubernetes shops needing multi-framework serving (classic ML + LLMs).
✸ Teams where observability/reproducibility matter as much as raw capability.
→ Bridges prototype chaos to production-grade autoscaled inference.

Full guide: ku.bz/0YrLFC47h

#MLOps #KServe #vLLM #MLflow #Kubernetes #LLM #Inference #AIInfrastructure
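Once exposed, clients typically talk to a KServe InferenceService over the Open Inference Protocol (v2): a POST to `/v2/models/{name}/infer` with a JSON body of named input tensors. A small sketch of building such a request; the model name and the input tensor name are hypothetical and depend on your deployment:

```python
import json

def v2_infer_request(model_name: str, prompt: str) -> tuple:
    """Build an Open Inference Protocol (v2) request path and JSON body
    for a KServe endpoint. Host and model name are deployment-specific."""
    path = f"/v2/models/{model_name}/infer"
    body = {
        "inputs": [{
            "name": "prompt",     # input tensor name (model-specific assumption)
            "shape": [1],
            "datatype": "BYTES",  # string inputs travel as BYTES in v2
            "data": [prompt],
        }]
    }
    return path, json.dumps(body)

path, payload = v2_infer_request("llama-chat", "Summarize Kubernetes in one line.")
print(path)  # /v2/models/llama-chat/infer
```

Because the protocol is standardized, the same client code works whether the backend is vLLM, a classic sklearn model, or anything else KServe fronts.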
https://lnkd.in/ee5mgJqZ

Introduction: Shipping RAG Without 2 AM Production Fires

Imagine this: you push a single line of code and watch your AI-powered Retrieval-Augmented Generation (RAG) application automatically build, test, validate data pipelines, and deploy to AWS — all without touching the terminal. No manual ssh. No git pull && docker-compose up. No "please don't break prod" prayers.

That is the power of a well-designed CI/CD pipeline. For RAG systems, this is not a "nice to have." It is survival. RAG apps combine language models, vector databases, APIs, and data pipelines. A single mistake in prompt logic, vector indexing, or infrastructure can cascade into:
- Wrong retrieval results and hallucinations
- Stale embeddings and information rot
- API timeouts under real traffic
- Downtime during model or embedding updates
If you're deploying all this manually, you are one bad deploy away from an outage.
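One concrete CI gate against "stale embeddings and information rot" is a freshness check: fail the pipeline whenever a source document changed after its last embedding run. A minimal sketch under assumed data shapes (per-document timestamps for last index and last change):

```python
from datetime import datetime, timezone

def stale_documents(indexed_at: dict, changed_at: dict) -> list:
    """Return IDs of documents whose source changed after (or without)
    their last embedding run -- candidates for re-embedding, and a
    natural condition to fail a CI/CD stage on."""
    stale = []
    for doc_id, changed in changed_at.items():
        indexed = indexed_at.get(doc_id)
        if indexed is None or indexed < changed:
            stale.append(doc_id)
    return sorted(stale)

t = lambda d: datetime(2025, 1, d, tzinfo=timezone.utc)
indexed = {"faq": t(10), "pricing": t(1)}
changed = {"faq": t(9), "pricing": t(5), "tos": t(2)}
print(stale_documents(indexed, changed))  # ['pricing', 'tos']
```

In a pipeline this would run after data validation: a non-empty result either triggers an automatic re-embedding job or blocks the deploy, so index rot never ships silently.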
You might have a really great machine learning model that performs well in notebooks. But unless it's deployed to production, helping people solve real problems, it's not nearly as valuable. In this detailed tutorial, Kuriko helps you build and deploy an ML system on serverless architecture. https://lnkd.in/gKkz3Hs3
✅ Day 22/365 -- Consistency > Motivation

Today was about reinforcing fundamentals and sharpening how I think about problems, both in code and system design.
- Solved Binary Search with optimal time complexity
- Achieved 100% runtime performance
- Revisited why binary search is more about invariants and boundaries than just mid calculations
- Derived State Principle → don't store what you can compute
- Read-only Data Usage → analytics & stats should never mutate core data
- Analytics Mindset → many backend APIs are built around these exact patterns
- Continued with BST & Balanced Trees
- Focused on comparing multiple approaches instead of memorizing one solution

Big takeaway of the day: strong software engineers don't just write code — they reason about state, correctness, and scalability.

Taking it one day at a time, building depth over speed. Onward to Day 23 🔥

#Day22 #DSA #BinarySearch #SoftwareEngineering #LeetCode #GATE #LearningInPublic #Consistency #ProblemSolving #DeveloperJourney #AI #AWS #Learning #DailyPost
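The "invariants and boundaries" point can be made concrete: binary search is correct because every iteration preserves one statement about the interval, and the mid arithmetic merely shrinks it. A sketch with the invariant written out:

```python
def binary_search(a: list, target) -> int:
    """Binary search over a sorted list, written around an explicit
    invariant rather than the mid arithmetic alone.
    Returns the index of target, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    # Invariant: if target is in a, its index lies within [lo, hi].
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1   # everything at or below mid is too small
        else:
            hi = mid - 1   # everything at or above mid is too large
    return -1  # interval emptied without violating the invariant

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
```

Off-by-one bugs in binary search are almost always a broken invariant (e.g. `lo = mid` instead of `mid + 1`), not a wrong mid formula, which is why reasoning about the interval beats memorizing the template.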
What if running inference on streaming data didn't require building (and babysitting) a whole mini-platform around Kafka consumers, scaling rules, and boilerplate glue code?

In their KubeCrash talk, Krithika Vijayakumar and Sriharsha Yayi from Intuit introduced Numaflow, a Kubernetes-native, open source platform that lets teams do stream processing + ML inference in the same pipelines on Kubernetes, with built-in integrations for Kafka, Pulsar, SQS, HTTP, and more.

Here are some key takeaways:

💥 Stream inference needs different scaling than "normal apps." CPU-based HPA isn't always enough: you may want autoscaling driven by backlog + processing rate + downstream capacity to avoid consumer lag.

💥 Decouple ML logic from sources/sinks. ML engineers shouldn't repeatedly implement Kafka/HTTP plumbing. Numaflow provides built-in sources/sinks so teams can focus mostly on the processing and inference itself.

💥 Pipelines as DAGs = real operational clarity. A declarative YAML DAG + a UI that exposes pods, CPU/memory/logs makes debugging and operating stream inference easy.

Want to learn more? Watch the full talk 👉 https://lnkd.in/gUTjzv2Y

#KubeCrash #Numaflow #Kubernetes #Streaming #MLOps #PlatformEngineering #EventDriven #CloudNative
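Backlog-driven autoscaling of the kind described boils down to "enough pods to drain the pending backlog within a target window at the observed per-pod rate." An illustrative Python formula (a sketch of the idea, not Numaflow's actual algorithm):

```python
import math

def replicas_for_backlog(pending: int, rate_per_pod: float,
                         target_seconds: float,
                         min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Size a consumer group so `pending` messages can be drained
    within `target_seconds`, given the observed per-pod throughput."""
    if rate_per_pod <= 0:
        return max_replicas  # no progress being made: scale out and investigate
    needed = math.ceil(pending / (rate_per_pod * target_seconds))
    return max(min_replicas, min(max_replicas, needed))

# 90k pending messages, each pod drains 150 msg/s, drain target of 60 s:
print(replicas_for_backlog(90_000, 150.0, 60.0))  # 10
```

Unlike CPU-based HPA, this reacts to lag directly: a pod can sit at low CPU while the backlog grows (e.g. blocked on a slow downstream), and only a backlog signal catches that.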
Project Complete: End-to-End Housing Price Regression (FastAPI + MLflow + Docker + AWS)

Over the past months, I built and deployed a production-ready ML regression system for US housing prices (2012–2023), implementing the full pipeline myself.

What I built
• Collected and organized housing and metro datasets from Kaggle and SimpleMaps, and designed a clear project structure for raw, processed, and model artifacts.
• Performed exploratory analysis, time-based train/eval/holdout splits, data cleaning, outlier handling, and feature engineering (date features, latitude/longitude encoding, zip code encoding).
• Trained and fine-tuned an XGBoost regression model with Optuna, tracking all runs, metrics, and artifacts in MLflow.

MLOps & deployment steps
• Modularized the work from notebooks into reusable pipelines for preprocessing, training, and inference, with unit and smoke tests to validate each stage.
• Exposed the model via a FastAPI service and built a Streamlit UI so users can select year, month, and region to visualize actual vs predicted prices.
• Containerized the application with Docker, set up CI/CD with GitHub Actions, and deployed to AWS using S3, ECR, ECS Fargate, and an Application Load Balancer.

Final touches
• Pushed the full project (code, configs, and documentation) to GitHub and validated model performance on a hold-out set with low MAE/RMSE and stable percentage error across multiple metro areas.

GitHub Repo: https://lnkd.in/egqkjAJf
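The time-based split mentioned above is the key leakage guard for price forecasting: training rows must strictly precede evaluation and holdout rows. A minimal sketch, with years standing in for full dates (row shapes are illustrative):

```python
def time_split(rows, eval_start, holdout_start):
    """Time-based train/eval/holdout split: no future rows leak into
    training. Each row is (date, features); dates are comparable."""
    train = [r for r in rows if r[0] < eval_start]
    eval_ = [r for r in rows if eval_start <= r[0] < holdout_start]
    holdout = [r for r in rows if r[0] >= holdout_start]
    return train, eval_, holdout

rows = [(2018, "a"), (2020, "b"), (2022, "c"), (2023, "d")]
train, ev, hold = time_split(rows, eval_start=2020, holdout_start=2022)
print(len(train), len(ev), len(hold))  # 1 1 2
```

A random split would let the model see 2023 prices while "predicting" 2019, inflating every metric; the chronological cut is what makes the reported MAE/RMSE honest.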
Check out our latest: a production-grade recommendation system orchestration pipeline using Apache Airflow, aligned with 2025 best practices (KubernetesExecutor via Helm, idempotent DAGs, and event-driven scalability from Airflow 3.0).

𝗞𝗲𝘆 𝗳𝗲𝗮𝘁𝘂𝗿𝗲𝘀:
• Real-world data ingestion (e.g., MovieLens/Amazon reviews)
• Parallel training of 11 advanced models: SVD, KNN baselines + neural architectures (GMF, MLP, NeuMF, LightGCN with PyTorch)
• Hybrid ensemble for superior accuracy
• Comprehensive evaluation (RMSE/MAE) and automated reporting

Critically, it emphasizes robustness: data validation, monitoring hooks for drift detection, MLflow integration for experiment tracking/model registry, and CI/CD readiness. This scalable foundation is ready for real production deployment, handling cold starts, bias mitigation, and real-time retraining.

𝗡𝗲𝘅𝘁: 𝗜𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗲 𝗟𝗟𝗠 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴𝘀 𝗮𝗻𝗱 𝗱𝗲𝗽𝗹𝗼𝘆 𝗼𝗻 𝗺𝗮𝗻𝗮𝗴𝗲𝗱 𝗞𝘂𝗯𝗲𝗿𝗻𝗲𝘁𝗲𝘀.

#MLOps #RecommendationSystems #ApacheAirflow #PyTorch #RecSys #DataEngineering
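A hybrid ensemble of the kind listed can be as simple as a weighted blend of per-model rating predictions, scored with RMSE. A toy sketch (model names, weights, and predictions are made up for illustration):

```python
import math

def ensemble(preds: dict, weights: dict) -> list:
    """Weighted hybrid ensemble: blend per-model rating predictions
    into one score per item. Weights are assumed to sum to 1."""
    n_items = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) for i in range(n_items)]

def rmse(y_true: list, y_pred: list) -> float:
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Two of the trained models' predictions for three items, blended 50/50:
preds = {"svd": [4.0, 3.0, 5.0], "neumf": [4.4, 2.6, 4.8]}
blend = ensemble(preds, {"svd": 0.5, "neumf": 0.5})
print([round(x, 2) for x in blend])  # [4.2, 2.8, 4.9]
print(round(rmse([4, 3, 5], blend), 3))  # 0.173
```

In an Airflow DAG this blend would be the fan-in task after the 11 parallel training tasks, with the weights themselves tunable on the eval split and logged to MLflow.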