LLM Observability Stack v2.0: Why Infrastructure Testing Changed Everything
From Reactive Firefighting to Proactive Validation
The Problem We All Face
You've deployed your LLM inference server. The YAML looks perfect. Kubernetes says the pods are running. But then...
Sound familiar?
After deploying dozens of LLM workloads on GPU clusters, I realized something critical: we were always reacting to failures instead of preventing them.
That's why I built LLM Observability Stack v2.0 - and why the two key enhancements changed everything.
What's New in Version 2.0?
Enhancement #1: Infrastructure Testing with Pytest
The game-changer: Validate your infrastructure BEFORE it breaks.
In v1.0, we had beautiful Grafana dashboards showing real-time GPU metrics. But dashboards only tell you something is wrong after it happens.
v2.0 introduces a complete pytest infrastructure testing framework that runs validation checks before deployment:
============================================================
LLM OBSERVABILITY STACK v2.0 - INFRASTRUCTURE TESTS
============================================================
--- Cluster Health Tests ---
PASSED: vLLM Health Endpoint - HTTP 200 OK
PASSED: vLLM Models Endpoint - Models available
PASSED: vLLM Inference Completion - Inference working
--- GPU Tests ---
PASSED: NVIDIA-SMI Available - nvidia-smi accessible
PASSED: CUDA Available - torch.cuda.is_available()
PASSED: GPU Memory > 20GB - NVIDIA A10 24GB
--- Network Tests ---
PASSED: Kubernetes DNS Resolution - kubernetes.default.svc resolved
PASSED: Elasticsearch Connectivity - HTTP 200 on port 9200
============================================================
RESULTS: 11 passed, 1 failed, 0 skipped
============================================================
Why this matters: a dashboard can only show you a failure after it has happened; these tests catch it before a single request is served.
The test suite validates:
- vLLM health, model availability, and end-to-end inference
- GPU access: nvidia-smi, CUDA visibility, and available memory
- Networking: Kubernetes DNS resolution and Elasticsearch connectivity
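As a taste of how lightweight these checks are, here is a minimal sketch of the GPU tests (the 20 GB threshold mirrors the A10 check in the output above; tune it for your hardware):

import pytest
import torch

def test_cuda_available():
    """PyTorch must see a CUDA device before we schedule any inference."""
    assert torch.cuda.is_available(), "No CUDA device detected"

def test_gpu_memory_above_20gb():
    """An A10 exposes ~24 GB; fail fast if the node offers less than 20 GB."""
    if not torch.cuda.is_available():
        pytest.skip("CUDA unavailable; already covered by test_cuda_available")
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    assert total_gib > 20, f"Expected > 20 GiB of GPU memory, found {total_gib:.1f} GiB"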
Enhancement #2: ELK Stack for Centralized Logging
Metrics tell you WHAT happened. Logs tell you WHY.
Grafana and Prometheus are excellent for metrics. But when you're debugging why inference failed for a specific request, you need logs.
v2.0 adds a complete ELK stack:
Filebeat (Collection) --> Logstash (Processing) --> Elasticsearch (Storage) --> Kibana (Visualization)
The results speak for themselves:
Metric                   Value
Log Documents Indexed    248,805+
Unique Pods Monitored    50
Containers Tracked       37
Namespaces Covered       5
Query Response Time      < 100ms
Real debugging scenario solved: when inference fails for a specific request, you can now pull that request's logs from every pod with a single query, as in the sketch below.
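A minimal sketch of that query, assuming Filebeat's default index pattern and Kubernetes metadata fields, and a hypothetical pod name vllm-0:

import requests

# Pull the 20 most recent error lines from the vLLM pod's logs.
# 'filebeat-*' and 'kubernetes.pod.name' are Filebeat defaults;
# 'vllm-0' is a placeholder for your actual pod name.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"kubernetes.pod.name": "vllm-0"}},
                {"match": {"message": "error"}},
            ]
        }
    },
    "sort": [{"@timestamp": {"order": "desc"}}],
    "size": 20,
}

response = requests.post(
    "http://elasticsearch:9200/filebeat-*/_search",
    json=query,
    timeout=5,
)
for hit in response.json()["hits"]["hits"]:
    print(hit["_source"]["message"])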
The Architecture
+-------------------------------------------------------------------+
|                        KUBERNETES CLUSTER                         |
|                                                                   |
|  +-------------------------------------------------------------+  |
|  |                 llm-observability namespace                 |  |
|  |                                                             |  |
|  |  [Elasticsearch] --> [Kibana]     (Logs & Dashboards)       |  |
|  |  [Filebeat] --> [Logstash]        (Log Pipeline)            |  |
|  |  [Prometheus] --> [Grafana]       (Metrics & Alerts)        |  |
|  +-------------------------------------------------------------+  |
|                                                                   |
|  +-------------------------------------------------------------+  |
|  |   [vLLM + Mistral-7B]   [DCGM Exporter]   [GPU Operator]    |  |
|  |        GPU Node 1                  GPU Node 2                |  |
|  |      [A10 24GB x 2]              [A10 24GB x 2]              |  |
|  +-------------------------------------------------------------+  |
+-------------------------------------------------------------------+
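The same pytest suite from Enhancement #1 can probe each service in this diagram. As one more sketch, assuming the DCGM exporter runs at its default port (9400) under the service name dcgm-exporter:

import requests

def test_dcgm_exporter_scrape():
    """The DCGM exporter should expose GPU metrics for Prometheus to scrape."""
    # 'dcgm-exporter:9400' assumes the default service name and port;
    # adjust to match your llm-observability namespace.
    response = requests.get('http://dcgm-exporter:9400/metrics', timeout=5)
    assert response.status_code == 200
    # DCGM metric names carry the DCGM_FI_ prefix; GPU utilisation is a good canary.
    assert 'DCGM_FI_DEV_GPU_UTIL' in response.text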
Why Both? The Observability Triangle
               PREVENTION
                   /\
                  /  \
                 /    \
                /Pytest\
               / Tests  \
              /__________\
             /            \
            /              \
      METRICS              LOGS
   (Prometheus)        (ELK Stack)
    "What is           "Why did it
    happening?"          happen?"
Pytest = Prevent issues before deployment
Prometheus/Grafana = Monitor what's happening now
ELK Stack = Understand why things happened
Together, they create a complete observability story.
Real-World Impact
Before v2.0: problems surfaced only after deployment, and debugging meant hunting through scattered pod logs by hand.
After v2.0: the pytest suite flags broken GPU, network, or inference setups before rollout, and centralised logs answer the "why" in a single Kibana query.
Time saved: ~4 hours per deployment
It's a Framework, Not Just a Stack
Component            Customisation For
Pytest Tests         Your specific validation needs
Kibana Dashboards    Your metrics and KPIs
Logstash Pipelines   Your log formats
Alert Rules          Your SLAs
Add tests for your use case:
import requests

def test_my_model_loaded():
    """Verify a specific model is loaded and served by vLLM."""
    response = requests.get('http://vllm:8000/v1/models', timeout=5)
    assert response.status_code == 200
    models = response.json()['data']
    assert any(m['id'] == 'my-custom-model' for m in models)
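Drop a test like this into the suite and it runs alongside the built-in checks on your next pytest run, failing the deployment gate before a misconfigured model ever serves traffic.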
Technical Stack
- Kubernetes with the NVIDIA GPU Operator and DCGM Exporter
- vLLM serving Mistral-7B on NVIDIA A10 24GB GPUs (2 per node)
- Prometheus + Grafana for metrics and alerting
- Elasticsearch, Logstash, Kibana, and Filebeat for centralised logging
- Pytest for pre-deployment infrastructure tests
This isn't a one-size-fits-all solution. It's a template you can customise.
Key Takeaways
- Validate infrastructure before deployment; dashboards alone only tell you about failures after the fact.
- Metrics tell you what happened; logs tell you why. You need both.
- Prevention (tests), metrics, and logs together form a complete observability story.
Get Started
The complete stack is open source and available on GitHub:
GitHub Repository: https://github.com/deepaksatna/LLM-Observability-Stack-v2.0
What's Included: everything described above, including the pytest infrastructure test suite, the ELK logging pipeline, and the Prometheus/Grafana monitoring setup.
Test Environment: a Kubernetes cluster with two GPU nodes (2x NVIDIA A10 24GB each) running vLLM with Mistral-7B.
What's Next?
In future versions, I'm exploring further enhancements to the stack.
Let's Connect
If you're running LLM workloads on Kubernetes and struggling with observability, I'd love to hear about your challenges.
What's the hardest part of monitoring your LLM infrastructure?
Drop a comment below or reach out directly.