Building a secure LLM gateway on AWS using API Gateway, Lambda, and Amazon Bedrock

Large language models are easy to call. Most examples look like this:

Application → Bedrock → Response

That’s fine for testing.

But the moment you try to use this in a real system, things start breaking — not technically, but operationally.


The problem nobody talks about early

Once multiple services or teams start using LLMs, the questions change:

  • Who is calling the model?
  • What prompts are being sent?
  • Are we leaking sensitive data?
  • Why is latency inconsistent?
  • Why did cost suddenly spike?

This is where most “GenAI tutorials” stop.

Because calling a model is easy.

Running it like a platform is not.


What actually happens in production

You don’t allow direct access like:

Application → Bedrock

You introduce a control layer:

Application → Gateway → Bedrock

This layer becomes responsible for:

  • Validating inputs
  • Applying prompt guardrails
  • Logging every request
  • Enforcing policies
  • Keeping things observable

This is not over-engineering.

This is the minimum required once you move beyond a single demo.


What we are building in this post

A simple, practical LLM inference gateway on AWS.

No unnecessary complexity.

Just the right components to:

  • control access
  • add guardrails
  • capture telemetry
  • keep the system observable

Services used

  • Amazon API Gateway
  • AWS Lambda
  • Amazon Bedrock
  • Amazon CloudWatch Logs
  • Terraform


Solution architecture

[Figure: secure gateway flow (TheOpskart)]

Request flow (what actually happens)

Let’s walk through the flow the way it runs in real systems:

  1. Client sends request → API Gateway
  2. API Gateway forwards → Lambda
  3. Lambda validates input + checks guardrails
  4. Lambda invokes Bedrock
  5. Logs are pushed to CloudWatch
  6. Response goes back to client

Nothing fancy.

But this small layer gives you control.


Infrastructure components

Let’s break this down one by one.

Amazon API Gateway

This is your public entry point.

POST /invoke        

  • Accepts HTTPS requests
  • Routes them to Lambda
  • Standardizes how clients call your AI service

Terraform (sketch; the API name follows the log group naming used later, and the Lambda proxy integration resource is omitted for brevity):

resource "aws_apigatewayv2_api" "gateway" {
  name          = "secure-bedrock-gateway"
  protocol_type = "HTTP"
}

resource "aws_apigatewayv2_route" "invoke" {
  api_id    = aws_apigatewayv2_api.gateway.id
  route_key = "POST /invoke"
  target    = "integrations/${aws_apigatewayv2_integration.lambda.id}"
}

AWS Lambda

This is where all the real logic lives.

Lambda acts as your control layer.

What it does:

  • Parses request
  • Validates prompt
  • Applies guardrails
  • Calls Bedrock
  • Logs metadata
  • Returns response

Config:

  • Runtime: Python 3.11
  • Memory: 512 MB
  • Timeout: 30 seconds

Environment variables:

  • DEFAULT_MODEL_ID
  • MAX_PROMPT_LENGTH
  • LOG_LEVEL

This is where you enforce consistency across all requests.
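A minimal handler sketch of that control layer, assuming the environment variables above; the blocked-terms list and helper names are illustrative, not taken from the repo:

```python
import json
import os

MAX_PROMPT_LENGTH = int(os.environ.get("MAX_PROMPT_LENGTH", "4000"))
BLOCKED_TERMS = ("password", "api_key")  # illustrative guardrail list

def check_guardrails(prompt: str) -> str:
    """Return 'allowed' or 'blocked' for a prompt (illustrative rules)."""
    if len(prompt) > MAX_PROMPT_LENGTH:
        return "blocked"
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "blocked"
    return "allowed"

def handler(event, context):
    # API Gateway delivers the client payload as a JSON string in event["body"]
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "")
    status = check_guardrails(prompt)
    if status == "blocked":
        return {"statusCode": 400,
                "body": json.dumps({"guardrail_status": status})}
    # invoke Bedrock here, then log metadata, then return the model output
    return {"statusCode": 200,
            "body": json.dumps({"guardrail_status": status})}
```

Because the guardrail check is a pure function, it can be unit tested without any AWS calls.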

Amazon Bedrock

This is your model layer.

You don’t manage infra. You don’t manage GPUs.

You just invoke.

Example model:

anthropic.claude-3-haiku-20240307-v1:0        

Lambda sends prompt → Bedrock returns response.
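A sketch of that invocation using boto3's bedrock-runtime client and the Anthropic Messages request format; max_tokens and the helper names are assumptions, not values from the repo:

```python
import json

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_claude_body(prompt: str, max_tokens: int = 512) -> str:
    """Serialize a prompt into the Anthropic Messages body Bedrock expects."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def invoke_bedrock(prompt: str) -> str:
    import boto3  # imported lazily so the module loads without the SDK
    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId=MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=build_claude_body(prompt),
    )
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]  # first content block holds the text
```

Keeping body construction separate from the network call makes the request shape easy to test offline.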

Amazon CloudWatch Logs

This is where most teams start realizing what’s really happening.

Every request logs:

  • request_id
  • model_id
  • latency
  • prompt length
  • response size
  • guardrail status

Log group:

/aws/lambda/secure-bedrock-gateway        

This is your foundation for:

  • debugging
  • cost tracking
  • performance tuning
  • detecting misuse


Deployment using Terraform

Everything is provisioned using Terraform.

Example plan:

Plan: 9 to add, 0 to change, 0 to destroy        

Resources created:

  • API Gateway
  • Lambda
  • IAM roles
  • CloudWatch logs
  • Routes and stage

After deployment, you get:

https://<api-id>.execute-api.<region>.amazonaws.com/prod/invoke        

This becomes your single entry point for inference.


Testing the gateway

Simple curl request:

curl -X POST \
https://<api-id>.execute-api.<region>.amazonaws.com/prod/invoke \
-H "Content-Type: application/json" \
-d '{"prompt":"Summarize the benefits of infrastructure as code."}'        

Example response

{
 "request_id": "aRxh-jLtoAMEV6A=",
 "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
 "latency_ms": 5291,
 "response": "Infrastructure as Code (IaC) is a powerful approach...",
 "guardrail_status": "allowed"
}        

What this actually proves

Your full flow is working:

Client → API Gateway → Lambda → Bedrock → Lambda → API Gateway → Client

And more importantly:

👉 You now have control + visibility


What this pattern teaches (real takeaway)

This is not just a lab.

This is how AI platforms actually start.

Controlled model access

No direct calls from apps.

Guardrails before inference

You filter before things go wrong.

Observability-first design

You log everything that matters.

Infra as Code for AI

You treat this like any production system.


Where this shows up in real systems

You’ll see this pattern in:

  • Internal LLM gateways
  • AI platform teams
  • Prompt filtering layers
  • Model routing services
  • Cost and token tracking systems

This is usually the first layer teams build before anything advanced.


Try it yourself

[GitHub Repo ai-devops-labs]


From here you can:

  • Clone the repo
  • Deploy using Terraform
  • Call the API
  • Inspect logs


Once you run it yourself, the architecture becomes very clear.




Final thought

Most people are still focused on:

“Which model should we use?”

But in real systems, the bigger problem is:

“How do we control and operate model usage?”

That’s where DevOps actually comes in.

And honestly, that’s where most of the real engineering work is going to happen.

