Building a Self-Configuring, Observable Reverse Proxy on AWS: EC2, Nginx, and Slack Alerts
If you are managing backend services, whether that’s a monolithic .NET Core API or a fleet of Node.js microservices, you need a reliable gatekeeper. A reverse proxy handles incoming HTTP traffic, manages SSL termination, and routes requests to the appropriate backend.
But in modern cloud environments, manually SSHing into a server to install Nginx and tail logs is a thing of the past. We need infrastructure that is immutable, automated, and observable.
In this post, we will walk through a real-world use case: deploying a self-configuring Nginx reverse proxy on an AWS EC2 instance, and wiring it up with Amazon CloudWatch, SNS, and Slack for real-time alerting.
The Architecture at a Glance
Instead of treating our server like a pet, we are treating it like cattle. If the server dies, we simply spin up a new one that configures itself perfectly on boot. Furthermore, if something goes wrong, the server tells us about it directly in our team’s Slack channel.
Here is what the setup looks like: a single EC2 instance bootstraps Nginx and the CloudWatch agent from its user data, the agent streams JSON access logs and system metrics to CloudWatch, and CloudWatch alarms publish to an SNS topic that fans out to Slack.
Why This Approach Matters
1. True Immutable Infrastructure
By relying entirely on an EC2 user_data script, you eliminate configuration drift. You never log in to patch or tweak the Nginx config. If you need to make a change to the proxy routing, you update the user data script or the centralized config file it pulls from (like an S3 bucket), terminate the old instance, and let an Auto Scaling Group spin up a fresh, perfectly configured replacement.
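Taken to its conclusion, that pattern is a launch template managed by an Auto Scaling Group. Here is a minimal sketch of that shape; it is not part of the deployed code below, and the resource names, the extracted user_data.sh file, and var.subnet_ids are illustrative assumptions:

# Hypothetical sketch: the same proxy managed by an ASG instead of a single
# aws_instance. Names, user_data.sh, and var.subnet_ids are assumptions.
resource "aws_launch_template" "nginx_proxy" {
  name_prefix   = "nginx-proxy-"
  image_id      = data.aws_ami.amazon_linux.id
  instance_type = var.instance_type
  # Launch templates expect user data base64-encoded
  user_data = base64encode(file("${path.module}/user_data.sh"))
}

resource "aws_autoscaling_group" "nginx_proxy" {
  name                = "nginx-proxy-asg"
  min_size            = 1
  max_size            = 2
  desired_capacity    = 1
  vpc_zone_identifier = var.subnet_ids # assumed variable

  launch_template {
    id      = aws_launch_template.nginx_proxy.id
    version = "$Latest"
  }

  # A user_data change becomes a new template version plus a rolling replacement
  instance_refresh {
    strategy = "Rolling"
  }
}

The rest of this post keeps a single aws_instance for clarity, but nothing about the bootstrap script would change.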
2. Proactive Incident Response
A reverse proxy is a single point of failure for incoming traffic. If it goes down, your users are offline. By integrating CloudWatch alarms, you don’t have to wait for a customer to complain.
If CPU utilization climbs above the 75% threshold we configure below, or Nginx starts throwing a high volume of 502 Bad Gateway errors (indicating your backend .NET or Node.js services might be unreachable), CloudWatch raises an alarm after two consecutive evaluation periods.
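The CPU alarm is defined later in the Terraform; for the hard-down case, an EC2 status-check alarm makes a natural companion. This is a hedged sketch, not part of the deployed module, reusing the SNS topic and instance defined later in this post:

# Sketch: page the team if the instance fails its EC2 status checks.
# Reuses aws_sns_topic.alerts and aws_instance.nginx_proxy from this post.
resource "aws_cloudwatch_metric_alarm" "status_check" {
  alarm_name          = "nginx-status-check-alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "StatusCheckFailed"
  namespace           = "AWS/EC2"
  period              = 60
  statistic           = "Maximum"
  threshold           = 0 # any failed check in the period trips the alarm
  alarm_actions       = [aws_sns_topic.alerts.arn]
  ok_actions          = [aws_sns_topic.alerts.arn]

  dimensions = {
    InstanceId = aws_instance.nginx_proxy.id
  }
}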
3. Reduced Mean Time to Resolution (MTTR)
Alert fatigue is real when notifications are buried in an email inbox. By routing CloudWatch alarms through an SNS topic directly into a dedicated Slack channel, the engineering team gets immediate, contextual visibility where they already collaborate.
How the Flow Works in Practice
Let’s trace a scenario where your backend application gets overwhelmed:
1. Requests pile up behind Nginx: CPU on the proxy climbs, and 502 responses start appearing in the JSON access log as upstream calls time out.
2. The CloudWatch agent ships those metrics and logs to CloudWatch, where the CPU alarm breaches its threshold for two consecutive periods.
3. The alarm publishes to the SNS topic, which pushes the notification into the team’s Slack channel.
The team acknowledges the alert in Slack, scales up the backend application, and resolves the issue before it causes widespread customer churn.
Terraform Implementation
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["al2023-ami-2023.*-x86_64"]
}
}
# --- SSH Key Generation ---
# This generates a secure private key and creates an AWS Key Pair
resource "tls_private_key" "rsa_key" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "aws_key_pair" "nginx_key_pair" {
  key_name   = "nginx-server-key"
  public_key = tls_private_key.rsa_key.public_key_openssh
}

# Save the private key to our local machine so we can actually SSH in!
resource "local_file" "private_key" {
  content         = tls_private_key.rsa_key.private_key_pem
  filename        = "${path.module}/nginx-server-key.pem"
  file_permission = "0400"
}
resource "aws_security_group" "web_sg" {
name = "nginx-sg"
description = "Security group for Nginx web server"
ingress {
description = "HTTP from anywhere"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "SSH from allowed IP"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [var.allowed_ssh_ip]
}
egress {
description = "Allow all outbound traffic for package downloads"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "nginx-sg"
Environment = "dev"
}
}
resource "aws_instance" "nginx_proxy" {
ami = data.aws_ami.amazon_linux.id
instance_type = var.instance_type
key_name = aws_key_pair.nginx_key_pair.key_name
iam_instance_profile = aws_iam_instance_profile.cloudwatch_agent_profile.name
vpc_security_group_ids = [aws_security_group.web_sg.id]
user_data = <<-EOF
#!/bin/bash
set -e
# ── 1. Install & start Nginx ─────────────────────────────────
yum update -y
yum install -y nginx
# Configure Nginx to output JSON logs
cat << 'EOF_NGINX' > /etc/nginx/conf.d/json_logging.conf
log_format json_analytics escape=json '{'
'"time_local": "$time_local", '
'"remote_addr": "$remote_addr", '
'"remote_user": "$remote_user", '
'"request": "$request", '
'"status": "$status", '
'"body_bytes_sent": "$body_bytes_sent", '
'"request_time": "$request_time", '
'"http_referrer": "$http_referer", '
'"http_user_agent": "$http_user_agent", '
'"http_x_forwarded_for": "$http_x_forwarded_for"'
'}';
access_log /var/log/nginx/access.json.log json_analytics;
EOF_NGINX
systemctl enable nginx
systemctl start nginx
echo '<h1>Deployed via Terraform Nginx Reverse Proxy with JSON Logging</h1>' > /usr/share/nginx/html/index.html
# ── 2. Install the CloudWatch Agent ──────────────────────────
yum install -y amazon-cloudwatch-agent
# ── 3. Fetch agent config from SSM Parameter Store & start ──
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
-a fetch-config \
-m ec2 \
-s \
-c ssm:/cloudwatch-agent/config/nginx
EOF
# Ensure the instance gets a public IP
associate_public_ip_address = true
# Depend on SSM param so it exists before the instance tries to fetch it
depends_on = [aws_ssm_parameter.cloudwatch_agent_config]
tags = {
Name = "nginx-reverse-proxy"
}
}
# Allocate an Elastic IP (EIP) to the instance so the public IP remains static across restarts
resource "aws_eip" "nginx_eip" {
  instance = aws_instance.nginx_proxy.id
  domain   = "vpc"

  tags = {
    Name        = "nginx-eip"
    Environment = "dev"
  }
}
# SNS Topic for Alarms
resource "aws_sns_topic" "alerts" {
  name = "nginx-alerts-topic"
}

# SNS Subscription for Slack (requires an HTTPS webhook)
# Note: AWS SNS doesn't natively format for Slack. Slack expects a specific
# JSON payload, so a raw SNS message pushed at a webhook is generally rejected
# rather than rendered; translating it cleanly usually requires a Lambda.
# We keep this simple subscription for illustration only – the Chatbot
# configuration below is the production-ready path.
resource "aws_sns_topic_subscription" "slack_sub" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "https"
  endpoint  = var.slack_webhook_url
}
# AWS Chatbot Slack Configuration (modern alternative to webhooks)
# Requires a one-time Slack workspace authorization in the AWS Console first.
resource "aws_chatbot_slack_channel_configuration" "slack_alerts" {
  count = var.slack_team_id != "" && var.slack_channel_id != "" ? 1 : 0

  configuration_name = "nginx-alerts-chatbot"
  iam_role_arn       = aws_iam_role.chatbot_role[0].arn
  slack_channel_id   = var.slack_channel_id
  slack_team_id      = var.slack_team_id
  sns_topic_arns     = [aws_sns_topic.alerts.arn]
}

resource "aws_iam_role" "chatbot_role" {
  count = var.slack_team_id != "" && var.slack_channel_id != "" ? 1 : 0
  name  = "aws-chatbot-slack-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "chatbot.amazonaws.com"
        }
      },
    ]
  })
}

resource "aws_iam_role_policy_attachment" "chatbot_notifications" {
  count      = var.slack_team_id != "" && var.slack_channel_id != "" ? 1 : 0
  role       = aws_iam_role.chatbot_role[0].name
  policy_arn = "arn:aws:iam::aws:policy/AWSResourceExplorerReadOnlyAccess" # Example policy, adjust based on needs
}
# CloudWatch Alarm to monitor CPU Utilization
resource "aws_cloudwatch_metric_alarm" "cpu_alarm" {
  alarm_name          = "nginx-cpu-utilization-alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 120 # 2 minutes
  statistic           = "Average"
  threshold           = 75
  alarm_description   = "This alarm triggers if the EC2 instance CPU utilization exceeds 75% for 4 consecutive minutes."
  alarm_actions       = [aws_sns_topic.alerts.arn]
  ok_actions          = [aws_sns_topic.alerts.arn]

  dimensions = {
    InstanceId = aws_instance.nginx_proxy.id
  }

  tags = {
    Name        = "nginx-cpu-alarm"
    Environment = "dev"
  }
}
Resources Provisioned
The Immutable Core: EC2 and User Data
The magic of this deployment lies in the aws_instance resource, specifically the user_data script. This bash script runs automatically as root on the first boot cycle, transforming a vanilla Linux box into a fully functioning Nginx proxy.
user_data = <<-EOF
  #!/bin/bash
  set -e
  # ── 1. Install & start Nginx ─────────────────────────────────
  yum update -y
  yum install -y nginx
  ...
JSON Logging for Better Observability
Inside the user_data script, we do not just start Nginx; we add a logging configuration that emits a second, JSON-formatted access log alongside the default one.
log_format json_analytics escape=json '{'
  '"time_local": "$time_local", '
  '"status": "$status", '
  ...
'}';
Parsing standard text logs is tedious. By forcing Nginx to emit structured JSON logs, log aggregators (like CloudWatch Logs) can easily parse, filter, and trigger metric alarms based on specific fields, like HTTP status codes or request times.
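As a concrete (if hypothetical) example of that payoff, here is how the 502 scenario from earlier could become an alarm: a metric filter counts 502 responses in the shipped JSON access log and feeds a standard metric alarm. This is a sketch, not part of the deployed module; the metric name, namespace, and threshold are illustrative choices, and it reuses the log group and SNS topic defined later in this post.

# Sketch: count 502 responses in the shipped JSON access log and alarm on them.
# Metric name, namespace, and threshold are illustrative.
resource "aws_cloudwatch_log_metric_filter" "nginx_502" {
  name           = "nginx-502-count"
  log_group_name = aws_cloudwatch_log_group.app_logs.name
  # JSON filter syntax: matches any log line whose status field is "502"
  pattern = "{ $.status = \"502\" }"

  metric_transformation {
    name          = "Nginx502Count"
    namespace     = "Custom/Nginx"
    value         = "1"
    default_value = "0"
  }
}

resource "aws_cloudwatch_metric_alarm" "nginx_502_alarm" {
  alarm_name          = "nginx-502-rate-alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "Nginx502Count"
  namespace           = "Custom/Nginx"
  period              = 300
  statistic           = "Sum"
  threshold           = 20 # tune to your traffic
  alarm_actions       = [aws_sns_topic.alerts.arn]
  treat_missing_data  = "notBreaching" # no 502 lines means healthy
}

Because escape=json keeps the status code as a clean field, the filter is a one-line JSON match instead of a fragile regex over text logs.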
To ensure our proxy maintains a static entry point even if the underlying instance is replaced, we attach an Elastic IP (aws_eip) to the EC2 resource.
Observability: CloudWatch Integration
A proxy is only as good as the metrics it emits. Our user_data script also installs the Amazon CloudWatch Agent.
It starts the agent by fetching a configuration file securely stored in AWS Systems Manager (SSM) Parameter Store (ssm:/cloudwatch-agent/config/nginx). This configuration dictates exactly which system metrics (memory, disk) and log files (our Nginx JSON logs) should be pushed to CloudWatch.
Setting the Alarm
We define an aws_cloudwatch_metric_alarm to monitor CPU utilization.
resource "aws_cloudwatch_metric_alarm" "cpu_alarm" {
alarm_name = "nginx-cpu-utilization-alarm"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
...
threshold = "75"
alarm_actions = [aws_sns_topic.alerts.arn]
}
Alerting: SNS to Slack
The final piece of the puzzle is routing that alarm directly to the engineering team.
We create an Amazon SNS Topic (nginx-alerts-topic). This acts as a pub/sub router. When CloudWatch fires the alarm, it publishes a message here.
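Because it is pub/sub, nothing limits the topic to a single consumer. As a quick hedged sketch (not in the module itself), you could fan the same alerts out to an email fallback alongside Slack; the address is a placeholder, and the recipient must click the confirmation email SNS sends:

# Sketch: fan the same alerts out to email as a fallback channel.
# The address is a placeholder; the subscription must be confirmed by the recipient.
resource "aws_sns_topic_subscription" "email_fallback" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "oncall@example.com"
}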
The Slack Connection
Our Terraform code handles two methods for pushing to Slack:
1. A plain HTTPS subscription that posts the raw SNS payload at a Slack incoming webhook (quick to wire up, but unformatted and fragile).
2. An AWS Chatbot channel configuration that understands CloudWatch alarms natively (the recommended path, covered in depth below).
The Payoff: Immediate Usability with Terraform Outputs
output "public_ip" {
description = "The public IP address of the Nginx web server"
value = aws_eip.nginx_eip.public_ip
}
output "public_dns" {
description = "The public DNS of the Nginx web server"
value = aws_eip.nginx_eip.public_dns
}
output "website_url" {
description = "The URL to access the Nginx web server"
value = "http://${aws_eip.nginx_eip.public_ip}"
}
output "cpu_alarm_arn" {
description = "The ARN of the CloudWatch CPU utilization alarm"
value = aws_cloudwatch_metric_alarm.cpu_alarm.arn
}
output "cloudwatch_log_group_name" {
description = "The CloudWatch Log Group where application logs are shipped"
value = aws_cloudwatch_log_group.app_logs.name
}
output "cloudwatch_log_group_url" {
description = "Direct AWS Console link to the log group"
value = "https://console.aws.amazon.com/cloudwatch/home#logsV2:log-groups/log-group/${replace(aws_cloudwatch_log_group.app_logs.name, "/", "$252F")}"
}
Infrastructure as Code isn’t just about spinning up resources; it’s about making those resources immediately usable for your engineering team or CI/CD pipeline. Hunting through the AWS Console to find the IP address of a newly provisioned proxy or searching for the correct log group wastes valuable time.
By defining an outputs.tf file, Terraform acts as a well-mannered assistant, printing exactly what you need to the terminal the moment the terraform apply finishes.
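Those outputs also compose. As one hedged sketch of "immediately usable": a separate workspace, say a DNS module, could read them through remote state instead of copy-pasting the IP. The backend details, var.zone_id, and the record name here are all placeholders:

# Sketch: a separate workspace reads this module's outputs from remote state.
# Bucket, key, region, zone, and record name are placeholders.
data "terraform_remote_state" "proxy" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "nginx-proxy/terraform.tfstate"
    region = "us-east-1"
  }
}

# Point a DNS record at the proxy's static Elastic IP.
resource "aws_route53_record" "proxy" {
  zone_id = var.zone_id # assumed variable
  name    = "proxy.example.com"
  type    = "A"
  ttl     = 300
  records = [data.terraform_remote_state.proxy.outputs.public_ip]
}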
Here is how our outputs break down:
Instant Access to the Proxy
Instead of clicking through the EC2 dashboard to find where your server was deployed, Terraform hands you the keys right away: public_ip, public_dns, and a ready-to-click website_url.
Streamlined Incident Response
When things go wrong, every second counts. Our outputs are specifically designed to reduce Mean Time to Resolution (MTTR): cpu_alarm_arn identifies exactly which alarm fired, while cloudwatch_log_group_name and cloudwatch_log_group_url drop responders straight into the relevant logs.
Making It Reusable: The Power of Variables
variable "instance_type" {
description = "The EC2 instance type"
type = string
default = "t2.medium"
}
variable "allowed_ssh_ip" {
description = "The IP address allowed to SSH into the EC2 instance"
type = string
default = "0.0.0.0/0" # Consider restricting to your exact IP in production!
}
variable "slack_webhook_url" {
description = "The Slack Incoming Webhook URL to send notifications to"
type = string
sensitive = true
}
variable "slack_team_id" {
description = "The Slack Team ID (Workspace ID) for AWS Chatbot (must be authorized in console first)"
type = string
default = ""
}
variable "slack_channel_id" {
description = "The Slack Channel ID where notifications will be sent"
type = string
default = ""
}
variable "log_group_name" {
description = "The CloudWatch Log Group name where application logs will be shipped"
type = string
default = "/ec2/nginx-app-logs"
}
variable "log_retention_days" {
description = "Number of days to retain logs in CloudWatch (0 = never expire)"
type = number
default = 30
}
Hardcoding values into your main.tf file is a quick way to build a proof of concept, but it is a terrible way to build production infrastructure. To make our Nginx reverse proxy truly reusable across different environments (like Dev, Staging, and Prod), we need to extract the configurable pieces into a variables.tf file.
By parameterizing our setup, anyone on the team can deploy this architecture with their own specific needs without touching the core logic. Here is a breakdown of how we make our module flexible:
Compute & Security Configurations
We start by defining the size of the server (instance_type) and who is allowed to access it (allowed_ssh_ip, which defaults to 0.0.0.0/0 and should be tightened to your own IP outside of demos).
Alerting Routing & Secrets Management
Because we are piping CloudWatch alarms into Slack, we need to pass in our workspace identifiers; the webhook URL is marked sensitive = true so Terraform redacts it from plan and apply output.
Cost-Conscious Log Retention
Finally, we manage our observability footprint: log_retention_days keeps CloudWatch Logs storage costs bounded instead of retaining everything forever.
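Putting it together, each environment can then carry its own terraform.tfvars. A sketch with placeholder values:

# terraform.tfvars (placeholder values – one file per environment)
instance_type      = "t3.small"
allowed_ssh_ip     = "203.0.113.10/32" # your office/VPN IP
slack_team_id      = "T0123456789"     # placeholder workspace ID
slack_channel_id   = "C0123456789"     # placeholder channel ID
log_group_name     = "/ec2/nginx-app-logs-dev"
log_retention_days = 14

The sensitive slack_webhook_url is deliberately absent: supplying it through the TF_VAR_slack_webhook_url environment variable keeps it out of version control.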
The Nervous System: IAM, SSM, and Centralized Logging
# ──────────────────────────────────────────────────────────────────────────────
# IAM Role – allows the EC2 instance to write logs to CloudWatch
# ──────────────────────────────────────────────────────────────────────────────
resource "aws_iam_role" "cloudwatch_agent_role" {
  name = "nginx-cloudwatch-agent-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Action    = "sts:AssumeRole"
        Principal = { Service = "ec2.amazonaws.com" }
      }
    ]
  })

  tags = {
    Name        = "nginx-cloudwatch-agent-role"
    Environment = "dev"
  }
}

# AWS-managed policy that grants the CloudWatch agent all permissions it needs
resource "aws_iam_role_policy_attachment" "cloudwatch_agent_policy" {
  role       = aws_iam_role.cloudwatch_agent_role.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}

# Also attach SSM read access so the agent can pull its config from Parameter Store
resource "aws_iam_role_policy_attachment" "ssm_read_policy" {
  role       = aws_iam_role.cloudwatch_agent_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Instance profile wraps the IAM role so EC2 can use it
resource "aws_iam_instance_profile" "cloudwatch_agent_profile" {
  name = "nginx-cloudwatch-agent-profile"
  role = aws_iam_role.cloudwatch_agent_role.name
}

# ──────────────────────────────────────────────────────────────────────────────
# CloudWatch Log Group – where your application logs will live
# ──────────────────────────────────────────────────────────────────────────────
resource "aws_cloudwatch_log_group" "app_logs" {
  name              = var.log_group_name
  retention_in_days = var.log_retention_days

  tags = {
    Name        = var.log_group_name
    Environment = "dev"
  }
}
# ──────────────────────────────────────────────────────────────────────────────
# SSM Parameter – stores the CloudWatch agent JSON config
# The agent running on the EC2 instance pulls this on startup via fetch-config.
# Add / modify collect_list entries below to ship additional log files.
# ──────────────────────────────────────────────────────────────────────────────
resource "aws_ssm_parameter" "cloudwatch_agent_config" {
  name = "/cloudwatch-agent/config/nginx"
  type = "String"

  value = jsonencode({
    agent = {
      metrics_collection_interval = 60
      run_as_user                 = "root"
    }
    logs = {
      logs_collected = {
        files = {
          collect_list = [
            # Nginx access log (JSON structured)
            {
              file_path       = "/var/log/nginx/access.json.log"
              log_group_name  = var.log_group_name
              log_stream_name = "{instance_id}/nginx-access-json"
              timezone        = "UTC"
            },
            # Nginx error log
            {
              file_path       = "/var/log/nginx/error.log"
              log_group_name  = var.log_group_name
              log_stream_name = "{instance_id}/nginx-error"
              timezone        = "UTC"
            },
            # ── Add YOUR application log path here ──────────────────────────
            # {
            #   file_path       = "/var/log/myapp/app.log"
            #   log_group_name  = var.log_group_name
            #   log_stream_name = "{instance_id}/app"
            #   timezone        = "UTC"
            # },
            # ────────────────────────────────────────────────────────────────
          ]
        }
      }
      log_stream_name = "{instance_id}/default"
    }
  })

  tags = {
    Name        = "cloudwatch-agent-config"
    Environment = "dev"
  }
}
An observable system is only as good as its ability to securely transmit data. Our EC2 instance needs to ship its Nginx logs and system metrics to CloudWatch, but by default, an AWS EC2 instance has zero permissions to do so.
In our cloudwatch.tf file, we wire up the "nervous system" of our infrastructure, securely granting our reverse proxy the exact permissions it needs and defining how the CloudWatch agent should behave.
Here is how we break down the security and logging configuration:
Secure Permissions via IAM
Instead of hardcoding AWS access keys onto the server (a massive security risk), we use an IAM Role and an Instance Profile.
The Central Log Hub
Next, we define the destination for our data: a single aws_cloudwatch_log_group whose name and retention period come from variables, so every Nginx access, error, and application log stream lands in one predictable place.
Centralized Agent Configuration via SSM
This is where the true power of configuration management shines. Instead of baking the CloudWatch Agent’s JSON configuration directly into the EC2 user_data script, we store it centrally in AWS Systems Manager (SSM) Parameter Store.
resource "aws_ssm_parameter" "cloudwatch_agent_config" {
name = "/cloudwatch-agent/config/nginx"
type = "String"
value = jsonencode({
# ... configuration details ...
})
}
Why do it this way?
- Logging behavior changes by updating one parameter, with no AMI rebuild and no user_data edit.
- One config can be shared across an entire fleet of instances.
- The config lives in Parameter Store, versioned and auditable, rather than buried in a bootstrap script.
By structuring our logging this way, our reverse proxy boots up, securely identifies itself to AWS, asks SSM what logs it needs to track, and immediately starts streaming that data to CloudWatch.
The 2026 Standard: AWS Chatbot vs. Custom Lambdas
If you look at older AWS tutorials, you will often see a very different approach to Slack integration. Historically, connecting a CloudWatch SNS topic to Slack required writing custom “glue code”: deploying an AWS Lambda function (usually written in Python or Node.js) to intercept the raw, ugly JSON payload from SNS, parse it, and format it into readable Slack blocks before posting it via a webhook.
In 2026, maintaining custom Lambda functions just to forward alerts is an anti-pattern. Here is why we explicitly chose the native aws_chatbot_slack_channel_configuration resource in our Terraform deployment:
Zero Glue Code to Maintain
Every Lambda function you deploy is a piece of software you have to own. You have to monitor its execution logs, manage its IAM permissions, and regularly update its runtime (like moving from Node 20 to Node 22) to avoid security deprecations. AWS Chatbot eliminates this toil entirely. It is a fully managed service that natively understands CloudWatch alarm schemas and formats them beautifully out of the box.
Native ChatOps Capabilities
AWS Chatbot is not just a one-way pager; it is bidirectional. By granting our Chatbot configuration an IAM role (aws_iam_role.chatbot_role in our code), we aren’t just sending alerts to Slack; we are enabling our team to interact with AWS from Slack.
When that Nginx CPU alarm fires in your Slack channel, an engineer can run an AWS CLI command directly in the Slack thread to describe the EC2 instance or check the Auto Scaling group status, without ever logging into the AWS Management Console.
Infrastructure as Code Simplicity
As you can see in our main.tf, deploying Chatbot takes only a few lines of HCL. You authorize the Slack workspace once in the AWS console, and from then on, wiring up any new SNS topic to a Slack channel is as simple as passing the slack_team_id and slack_channel_id variables.
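For instance, routing a second, hypothetical topic into the same channel is just one more ARN in the list:

# Sketch: the same channel configuration fanning in a second topic.
# aws_sns_topic.deploy_events is a hypothetical additional topic.
resource "aws_chatbot_slack_channel_configuration" "slack_alerts" {
  configuration_name = "nginx-alerts-chatbot"
  iam_role_arn       = aws_iam_role.chatbot_role[0].arn
  slack_channel_id   = var.slack_channel_id
  slack_team_id      = var.slack_team_id
  sns_topic_arns = [
    aws_sns_topic.alerts.arn,
    aws_sns_topic.deploy_events.arn, # hypothetical second topic
  ]
}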
By skipping the custom Lambda, we keep our Terraform module lightweight, our architecture highly reliable, and our engineering focus on actual business value rather than maintaining notification scripts.