Building a secure LLM gateway on AWS using API Gateway, Lambda, and Amazon Bedrock
Large language models are easy to call. Most examples look like this:
Application → Bedrock → Response
That’s fine for testing.
But the moment you try to use this in a real system, things start breaking — not technically, but operationally.
The problem nobody talks about early
Once multiple services or teams start using LLMs, the questions change:
- Who is allowed to call which models?
- How do you enforce guardrails before a prompt reaches the model?
- How do you know what is actually being sent and returned?
- How do you keep usage consistent across teams?
This is where most “GenAI tutorials” stop.
Because calling a model is easy.
Running it like a platform is not.
What actually happens in production
You don’t allow direct access like:
Application → Bedrock
You introduce a control layer:
Application → Gateway → Bedrock
This layer becomes responsible for:
- authentication and access control
- request validation
- guardrails before inference
- logging and metrics
- model selection and routing

This is not over-engineering.
This is the minimum required once you move beyond a single demo.
What we are building in this post
A simple, practical LLM inference gateway on AWS.
No unnecessary complexity.
Just the right components to:
- expose a single inference endpoint
- validate requests and apply guardrails before inference
- invoke a Bedrock model
- log every request

Services used
- Amazon API Gateway (HTTP API)
- AWS Lambda
- Amazon Bedrock
- Amazon CloudWatch Logs
Solution architecture

Client → API Gateway → Lambda → Bedrock, with CloudWatch Logs capturing every request.
Request flow (what actually happens)
Let’s walk through the flow the way it runs in real systems:
1. The client sends POST /invoke to API Gateway.
2. API Gateway forwards the request to Lambda.
3. Lambda validates the payload and applies guardrails.
4. Lambda invokes the Bedrock model.
5. Lambda logs request metadata and returns the response through API Gateway.
Nothing fancy.
But this small layer gives you control.
Infrastructure components
Let’s break this down one by one.
Amazon API Gateway
This is your public entry point.
POST /invoke
Terraform (quick-create form; the resource names here are illustrative):

resource "aws_apigatewayv2_api" "gateway" {
  name          = "secure-bedrock-gateway"
  protocol_type = "HTTP"
  route_key     = "POST /invoke"
  target        = aws_lambda_function.gateway.arn # assumed Lambda resource name
}
AWS Lambda
This is where all the real logic lives.
Lambda acts as your control layer.
What it does:
- validates the incoming request
- applies guardrails before inference
- invokes the Bedrock model
- logs request metadata
- returns a consistent response shape

Config:
Environment variables (for example, the Bedrock model ID) keep model selection and guardrail settings out of the code.
This is where you enforce consistency across all requests.
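
To make that concrete, here is a minimal handler sketch in Python. The MODEL_ID and BLOCKED_TERMS environment variables and the naive keyword guardrail are illustrative assumptions, not the exact production code:

import json
import os
import time

import boto3

bedrock = boto3.client("bedrock-runtime")

# Assumed environment variables; your deployment may name these differently.
MODEL_ID = os.environ.get("MODEL_ID", "anthropic.claude-3-haiku-20240307-v1:0")
BLOCKED_TERMS = [t for t in os.environ.get("BLOCKED_TERMS", "").split(",") if t]

def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "")

    # 1. Validate: reject malformed requests before they reach the model.
    if not prompt:
        return respond(400, {"error": "prompt is required"})

    # 2. Guardrail: a naive pre-inference keyword filter, standing in for a real policy.
    if any(term.lower() in prompt.lower() for term in BLOCKED_TERMS):
        return respond(403, {"guardrail_status": "blocked"})

    # 3. Invoke Bedrock (Anthropic Messages format; see the next section).
    start = time.time()
    raw = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    result = json.loads(raw["body"].read())
    latency_ms = int((time.time() - start) * 1000)

    # 4. Log one structured line per request; this lands in CloudWatch.
    print(json.dumps({"request_id": context.aws_request_id,
                      "model_id": MODEL_ID, "latency_ms": latency_ms}))

    return respond(200, {
        "request_id": context.aws_request_id,
        "model_id": MODEL_ID,
        "latency_ms": latency_ms,
        "response": result["content"][0]["text"],
        "guardrail_status": "allowed",
    })

def respond(status, payload):
    return {"statusCode": status, "body": json.dumps(payload)}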
Amazon Bedrock
This is your model layer.
You don’t manage infra. You don’t manage GPUs.
You just invoke.
Example model:
anthropic.claude-3-haiku-20240307-v1:0
Lambda sends prompt → Bedrock returns response.
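
Under the hood this is a single boto3 call. A minimal sketch (the request body follows the Anthropic Messages format that Bedrock expects for Claude models; max_tokens is an arbitrary choice here):

import json

import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Summarize the benefits of infrastructure as code."}],
    }),
)

# The response body is a stream; parse it once and pull the generated text.
result = json.loads(response["body"].read())
print(result["content"][0]["text"])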
Amazon CloudWatch Logs
This is where most teams start realizing what’s really happening.
Every request logs:
- request ID
- model ID
- latency in milliseconds
- guardrail status
Log group:
/aws/lambda/secure-bedrock-gateway
This is your foundation for:
- auditing who sent what, and when
- debugging latency and failures
- tracking usage per model
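
Because the handler emits one JSON line per request, CloudWatch Logs Insights can aggregate those fields directly. A sketch of querying average latency per model with boto3 (the query string assumes the structured log line from the handler sketch above):

import time

import boto3

logs = boto3.client("logs")

# Logs Insights auto-discovers fields like latency_ms and model_id
# from JSON log events, so we can aggregate on them directly.
query = logs.start_query(
    logGroupName="/aws/lambda/secure-bedrock-gateway",
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString="stats avg(latency_ms) by model_id",
)

# Poll until the query completes.
while True:
    results = logs.get_query_results(queryId=query["queryId"])
    if results["status"] == "Complete":
        break
    time.sleep(1)

print(results["results"])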
Deployment using Terraform
Everything is provisioned using Terraform.
Example plan:
Plan: 9 to add, 0 to change, 0 to destroy
Resources created:
- the HTTP API, route, stage, and Lambda integration
- the Lambda function, its IAM role, and role policy
- the CloudWatch log group
- the permission letting API Gateway invoke the Lambda

After deployment, you get:
https://<api-id>.execute-api.<region>.amazonaws.com/prod/invoke
This becomes your single entry point for inference.
Testing the gateway
Simple curl request:
curl -X POST \
https://<api-id>.execute-api.<region>.amazonaws.com/prod/invoke \
-H "Content-Type: application/json" \
-d '{"prompt":"Summarize the benefits of infrastructure as code."}'
Example response
{
  "request_id": "aRxh-jLtoAMEV6A=",
  "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
  "latency_ms": 5291,
  "response": "Infrastructure as Code (IaC) is a powerful approach...",
  "guardrail_status": "allowed"
}
What this actually proves
Your full flow is working:
Client → API Gateway → Lambda → Bedrock → Lambda → API Gateway → Client
And more importantly:
👉 You now have control + visibility
What this pattern teaches (real takeaway)
This is not just a lab.
This is how AI platforms actually start.
Controlled model access
No direct calls from apps.
Guardrails before inference
You filter before things go wrong.
Observability-first design
You log everything that matters.
Infra as Code for AI
You treat this like any production system.
Where this shows up in real systems
You’ll see this pattern in:
- internal AI platforms that front Bedrock for many teams
- enterprise environments with compliance and audit requirements
- any system that needs one controlled entry point for inference
This is usually the first layer teams build before anything advanced.
Try it yourself
From here you can:
- swap in a different Bedrock model ID
- add authentication or API keys at the gateway
- extend the guardrail logic beyond simple filtering
- build usage and cost tracking on top of the logs
Once you run it yourself, the architecture becomes very clear.
Final thought
Most people are still focused on:

“Which model should we use?”

But in real systems, the bigger problem is:

“How do we control and operate model usage?”

That’s where DevOps actually comes in.
And honestly, that’s where most of the real engineering work is going to happen.