Building a secure LLM gateway on AWS using API Gateway, Lambda, and Amazon Bedrock
Large language models are easy to call. Most examples look like this:
Application → Bedrock → Response
That’s fine for testing.
But the moment you try to use this in a real system, things start breaking — not technically, but operationally.
The problem nobody talks about early
Once multiple services or teams start using LLMs, the questions change:
- Who is allowed to call which models?
- How do you enforce guardrails before a prompt reaches the model?
- How do you know what is actually being sent and returned?
- How do you keep usage consistent across teams?
This is where most “GenAI tutorials” stop.
Because calling a model is easy.
Running it like a platform is not.
What actually happens in production
You don’t allow direct access like:
Application → Bedrock
You introduce a control layer:
Application → Gateway → Bedrock
This layer becomes responsible for:
- authentication and access control
- request validation
- guardrails before inference
- logging and metrics
- model selection and routing

This is not over-engineering.
This is the minimum required once you move beyond a single demo.
What we are building in this post
A simple, practical LLM inference gateway on AWS.
No unnecessary complexity.
Just the right components to:
- expose a single inference endpoint
- validate requests and apply guardrails before inference
- invoke a Bedrock model
- log every request

Services used
- Amazon API Gateway (HTTP API)
- AWS Lambda
- Amazon Bedrock
- Amazon CloudWatch Logs
Solution architecture

Client → API Gateway → Lambda → Bedrock, with CloudWatch Logs capturing every request.
Request flow (what actually happens)
Let’s walk through the flow the way it runs in real systems:
1. The client sends POST /invoke to API Gateway.
2. API Gateway forwards the request to Lambda.
3. Lambda validates the payload and applies guardrails.
4. Lambda invokes the Bedrock model.
5. Lambda logs request metadata and returns the response through API Gateway.
Nothing fancy.
But this small layer gives you control.
Infrastructure components
Let’s break this down one by one.
Amazon API Gateway
This is your public entry point.
POST /invoke
Terraform (quick-create form; the resource names here are illustrative):

resource "aws_apigatewayv2_api" "gateway" {
  name          = "secure-bedrock-gateway"
  protocol_type = "HTTP"
  route_key     = "POST /invoke"
  target        = aws_lambda_function.gateway.arn # assumed Lambda resource name
}
AWS Lambda
This is where all the real logic lives.
Lambda acts as your control layer.
What it does:
- validates the incoming request
- applies guardrails before inference
- invokes the Bedrock model
- logs request metadata
- returns a consistent response shape

Config:
Environment variables (for example, the Bedrock model ID) keep model selection and guardrail settings out of the code.
This is where you enforce consistency across all requests.
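
To make that concrete, here is a minimal handler sketch in Python. The MODEL_ID and BLOCKED_TERMS environment variables and the naive keyword guardrail are illustrative assumptions, not the exact production code:

import json
import os
import time

import boto3

bedrock = boto3.client("bedrock-runtime")

# Assumed environment variables; your deployment may name these differently.
MODEL_ID = os.environ.get("MODEL_ID", "anthropic.claude-3-haiku-20240307-v1:0")
BLOCKED_TERMS = [t for t in os.environ.get("BLOCKED_TERMS", "").split(",") if t]

def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "")

    # 1. Validate: reject malformed requests before they reach the model.
    if not prompt:
        return respond(400, {"error": "prompt is required"})

    # 2. Guardrail: a naive pre-inference keyword filter, standing in for a real policy.
    if any(term.lower() in prompt.lower() for term in BLOCKED_TERMS):
        return respond(403, {"guardrail_status": "blocked"})

    # 3. Invoke Bedrock (Anthropic Messages format; see the next section).
    start = time.time()
    raw = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    result = json.loads(raw["body"].read())
    latency_ms = int((time.time() - start) * 1000)

    # 4. Log one structured line per request; this lands in CloudWatch.
    print(json.dumps({"request_id": context.aws_request_id,
                      "model_id": MODEL_ID, "latency_ms": latency_ms}))

    return respond(200, {
        "request_id": context.aws_request_id,
        "model_id": MODEL_ID,
        "latency_ms": latency_ms,
        "response": result["content"][0]["text"],
        "guardrail_status": "allowed",
    })

def respond(status, payload):
    return {"statusCode": status, "body": json.dumps(payload)}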
Amazon Bedrock
This is your model layer.
You don’t manage infra. You don’t manage GPUs.
You just invoke.
Example model:
anthropic.claude-3-haiku-20240307-v1:0
Lambda sends prompt → Bedrock returns response.
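
Under the hood this is a single boto3 call. A minimal sketch (the request body follows the Anthropic Messages format that Bedrock expects for Claude models; max_tokens is an arbitrary choice here):

import json

import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": "Summarize the benefits of infrastructure as code."}],
    }),
)

# The response body is a stream; parse it once and pull the generated text.
result = json.loads(response["body"].read())
print(result["content"][0]["text"])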
Amazon CloudWatch Logs
This is where most teams start realizing what’s really happening.
Every request logs:
- request ID
- model ID
- latency in milliseconds
- guardrail status
Log group:
/aws/lambda/secure-bedrock-gateway
This is your foundation for:
- auditing who sent what, and when
- debugging latency and failures
- tracking usage per model
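
Because the handler emits one JSON line per request, CloudWatch Logs Insights can aggregate those fields directly. A sketch of querying average latency per model with boto3 (the query string assumes the structured log line from the handler sketch above):

import time

import boto3

logs = boto3.client("logs")

# Logs Insights auto-discovers fields like latency_ms and model_id
# from JSON log events, so we can aggregate on them directly.
query = logs.start_query(
    logGroupName="/aws/lambda/secure-bedrock-gateway",
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString="stats avg(latency_ms) by model_id",
)

# Poll until the query completes.
while True:
    results = logs.get_query_results(queryId=query["queryId"])
    if results["status"] == "Complete":
        break
    time.sleep(1)

print(results["results"])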
Deployment using Terraform
Everything is provisioned using Terraform.
Example plan:
Plan: 9 to add, 0 to change, 0 to destroy
Resources created:
- the HTTP API, route, stage, and Lambda integration
- the Lambda function, its IAM role, and role policy
- the CloudWatch log group
- the permission letting API Gateway invoke the Lambda

After deployment, you get:
https://<api-id>.execute-api.<region>.amazonaws.com/prod/invoke
This becomes your single entry point for inference.
Testing the gateway
Simple curl request:
curl -X POST \
https://<api-id>.execute-api.<region>.amazonaws.com/prod/invoke \
-H "Content-Type: application/json" \
-d '{"prompt":"Summarize the benefits of infrastructure as code."}'
Example response
{
  "request_id": "aRxh-jLtoAMEV6A=",
  "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
  "latency_ms": 5291,
  "response": "Infrastructure as Code (IaC) is a powerful approach...",
  "guardrail_status": "allowed"
}
What this actually proves
Your full flow is working:
Client → API Gateway → Lambda → Bedrock → Lambda → API Gateway → Client
And more importantly:
👉 You now have control + visibility
What this pattern teaches (real takeaway)
This is not just a lab.
This is how AI platforms actually start.
Controlled model access
No direct calls from apps.
Guardrails before inference
You filter before things go wrong.
Observability-first design
You log everything that matters.
Infra as Code for AI
You treat this like any production system.
Where this shows up in real systems
You’ll see this pattern in:
- internal AI platforms that front Bedrock for many teams
- enterprise environments with compliance and audit requirements
- any system that needs one controlled entry point for inference
This is usually the first layer teams build before anything advanced.
Try it yourself
From here you can:
- swap in a different Bedrock model ID
- add authentication or API keys at the gateway
- extend the guardrail logic beyond simple filtering
- build usage and cost tracking on top of the logs
Once you run it yourself, the architecture becomes very clear.
Final thought
Most people are still focused on:

“Which model should we use?”

But in real systems, the bigger problem is:

“How do we control and operate model usage?”

That’s where DevOps actually comes in.
And honestly, that’s where most of the real engineering work is going to happen.