Building a Self-Configuring, Observable Reverse Proxy on AWS: EC2, Nginx, and Slack Alerts

If you are managing backend services, whether that’s a monolithic .NET Core API or a fleet of Node.js microservices, you need a reliable gatekeeper. A reverse proxy handles incoming HTTP traffic, manages SSL termination, and routes requests to the appropriate backend.

But in modern cloud environments, manually SSHing into a server to install Nginx and tail logs is a thing of the past. We need infrastructure that is immutable, automated, and observable.

In this post, we will walk through a real-world use case: deploying a self-configuring Nginx reverse proxy on an AWS EC2 instance, and wiring it up with Amazon CloudWatch, SNS, and Slack for real-time alerting.


The Architecture at a Glance

Instead of treating our server like a pet, we are treating it like cattle. If the server dies, we simply spin up a new one that configures itself perfectly on boot. Furthermore, if something goes wrong, the server tells us about it directly in our team’s Slack channel.

Here is what the setup looks like:

  1. Amazon EC2 (The Host): A lightweight Linux instance.
  2. User Data (The Automation): A startup shell script that automatically installs Nginx, pulls the correct configuration files, and starts the service upon boot.
  3. Nginx (The Proxy): Routes incoming traffic to our backend application servers.
  4. Amazon CloudWatch (The Watchdog): Monitors the EC2 instance (CPU, Network) and Nginx-specific metrics (like 5xx error rates).
  5. Amazon SNS (The Router): A Simple Notification Service topic that receives alarm state changes from CloudWatch.
  6. Slack Integration (The Alert): AWS Chatbot (or a custom Lambda webhook) subscribed to the SNS topic, pushing critical alerts directly to your development team.

Why This Approach Matters

1. True Immutable Infrastructure

By relying entirely on an EC2 user_data script, you eliminate configuration drift. You never log in to patch or tweak the Nginx config. If you need to make a change to the proxy routing, you update the user data script or the centralized config file it pulls from (like an S3 bucket), terminate the old instance, and let an Auto Scaling Group spin up a fresh, perfectly configured replacement.
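As a sketch of that pattern (the bucket name and path are placeholders, and the instance role would also need s3:GetObject on the bucket), the boot script could pull its routing config like this:

# Pull the centrally managed proxy config on boot (hypothetical bucket/path)
aws s3 cp s3://my-config-bucket/nginx/reverse_proxy.conf /etc/nginx/conf.d/reverse_proxy.conf

# Validate before (re)loading so a bad config never takes the proxy down
nginx -t && systemctl reload nginx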

2. Proactive Incident Response

A reverse proxy is a single point of failure for incoming traffic. If it goes down, your users are offline. By integrating CloudWatch alarms, you don’t have to wait for a customer to complain.

If CPU utilization climbs past the 75% threshold we define below, or Nginx starts throwing a high volume of 502 Bad Gateway errors (indicating your backend .NET or Node.js services might be unreachable), CloudWatch triggers an alarm.

3. Reduced Mean Time to Resolution (MTTR)

Alert fatigue is real when notifications are buried in an email inbox. By routing CloudWatch alarms through an SNS topic directly into a dedicated Slack channel, the engineering team gets immediate, contextual visibility where they already collaborate.

How the Flow Works in Practice

Let’s trace a scenario where your backend application gets overwhelmed:

  1. Traffic Spike: Your Node.js service struggles to keep up with a sudden burst of requests, causing timeouts.
  2. Nginx Reports Errors: The Nginx reverse proxy on your EC2 instance begins returning 504 Gateway Timeout errors to the client.
  3. CloudWatch Detects the Anomaly: A CloudWatch Logs metric filter, which is actively parsing the Nginx access logs, detects the spike in 5xx errors and breaches the predefined alarm threshold.
  4. SNS Topic Triggered: CloudWatch publishes a message to the High-5xx-Errors SNS topic.
  5. Slack Notification: AWS Chatbot (acting as a subscriber to the SNS topic) instantly formats the alarm and drops a red-alert notification into your Slack channel, complete with a link directly to the failing metric.

The team acknowledges the alert in Slack, scales up the backend application, and resolves the issue before it causes widespread customer churn.
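The alarm defined later in this post watches CPU, but the 5xx path in steps 3 and 4 can be captured in Terraform as well. A minimal sketch, assuming the JSON access logs land in the log group created in cloudwatch.tf (the filter pattern and threshold here are illustrative, not part of the deployed code):

resource "aws_cloudwatch_log_metric_filter" "nginx_5xx" {
  name           = "nginx-5xx-errors"
  log_group_name = var.log_group_name

  # Match JSON log lines whose "status" field starts with 5
  pattern = "{ $.status = \"5*\" }"

  metric_transformation {
    name      = "Nginx5xxCount"
    namespace = "Custom/Nginx"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "nginx_5xx_alarm" {
  alarm_name          = "nginx-high-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "Nginx5xxCount"
  namespace           = "Custom/Nginx"
  period              = 60
  statistic           = "Sum"
  threshold           = 10 # illustrative: more than 10 errors per minute
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.alerts.arn]
}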


Terraform Implementation

main.tf

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

# --- SSH Key Generation ---
# This generates a secure private key and creates an AWS Key Pair
resource "tls_private_key" "rsa_key" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "aws_key_pair" "nginx_key_pair" {
  key_name   = "nginx-server-key"
  public_key = tls_private_key.rsa_key.public_key_openssh
}

# Save the private key to our local machine so we can actually SSH in!
resource "local_file" "private_key" {
  content         = tls_private_key.rsa_key.private_key_pem
  filename        = "${path.module}/nginx-server-key.pem"
  file_permission = "0400"
}

resource "aws_security_group" "web_sg" {
  name        = "nginx-sg"
  description = "Security group for Nginx web server"

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "SSH from allowed IP"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.allowed_ssh_ip]
  }

  egress {
    description = "Allow all outbound traffic for package downloads"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name        = "nginx-sg"
    Environment = "dev"
  }
}

resource "aws_instance" "nginx_proxy" {
  ami                  = data.aws_ami.amazon_linux.id
  instance_type        = var.instance_type
  key_name             = aws_key_pair.nginx_key_pair.key_name
  iam_instance_profile = aws_iam_instance_profile.cloudwatch_agent_profile.name

  vpc_security_group_ids = [aws_security_group.web_sg.id]

  user_data = <<-EOF
              #!/bin/bash
              set -e

              # ── 1. Install & start Nginx ─────────────────────────────────
              yum update -y
              yum install -y nginx

              # Configure Nginx to output JSON logs
              cat << 'EOF_NGINX' > /etc/nginx/conf.d/json_logging.conf
log_format json_analytics escape=json '{'
  '"time_local": "$time_local", '
  '"remote_addr": "$remote_addr", '
  '"remote_user": "$remote_user", '
  '"request": "$request", '
  '"status": "$status", '
  '"body_bytes_sent": "$body_bytes_sent", '
  '"request_time": "$request_time", '
  '"http_referrer": "$http_referer", '
  '"http_user_agent": "$http_user_agent", '
  '"http_x_forwarded_for": "$http_x_forwarded_for"'
'}';

access_log /var/log/nginx/access.json.log json_analytics;
EOF_NGINX

              systemctl enable nginx
              systemctl start nginx
              echo '<h1>Deployed via Terraform Nginx Reverse Proxy with JSON Logging</h1>' > /usr/share/nginx/html/index.html

              # ── 2. Install the CloudWatch Agent ──────────────────────────
              yum install -y amazon-cloudwatch-agent

              # ── 3. Fetch agent config from SSM Parameter Store & start ──
              /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
                -a fetch-config \
                -m ec2 \
                -s \
                -c ssm:/cloudwatch-agent/config/nginx
              EOF

  # Ensure the instance gets a public IP
  associate_public_ip_address = true

  # Depend on SSM param so it exists before the instance tries to fetch it
  depends_on = [aws_ssm_parameter.cloudwatch_agent_config]

  tags = {
    Name = "nginx-reverse-proxy"
  }
}

# Allocate an Elastic IP (EIP) to the instance so the public IP remains static across restarts
resource "aws_eip" "nginx_eip" {
  instance = aws_instance.nginx_proxy.id
  domain   = "vpc"

  tags = {
    Name        = "nginx-eip"
    Environment = "dev"
  }
}
# SNS Topic for Alarms
resource "aws_sns_topic" "alerts" {
  name = "nginx-alerts-topic"
}

# SNS Subscription for Slack (requires an HTTPS webhook)
# Note: AWS SNS doesn't natively format for Slack, so we use the HTTPS protocol
# Slack expects a specific JSON format, which usually requires a Lambda to translate.
# For simplicity, we are pushing the raw SNS message to the webhook.
resource "aws_sns_topic_subscription" "slack_sub" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "https"
  endpoint  = var.slack_webhook_url
}

# AWS Chatbot Slack Configuration (Modern alternative to webhooks)
# Requires manual Slack Workspace authorization in the AWS Console first.
resource "aws_chatbot_slack_channel_configuration" "slack_alerts" {
  count = var.slack_team_id != "" && var.slack_channel_id != "" ? 1 : 0

  configuration_name = "nginx-alerts-chatbot"
  iam_role_arn       = aws_iam_role.chatbot_role[0].arn
  slack_channel_id   = var.slack_channel_id
  slack_team_id      = var.slack_team_id

  sns_topic_arns = [aws_sns_topic.alerts.arn]
}

resource "aws_iam_role" "chatbot_role" {
  count = var.slack_team_id != "" && var.slack_channel_id != "" ? 1 : 0
  name  = "aws-chatbot-slack-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "chatbot.amazonaws.com"
        }
      },
    ]
  })
}

resource "aws_iam_role_policy_attachment" "chatbot_notifications" {
  count      = var.slack_team_id != "" && var.slack_channel_id != "" ? 1 : 0
  role       = aws_iam_role.chatbot_role[0].name
  policy_arn = "arn:aws:iam::aws:policy/AWSResourceExplorerReadOnlyAccess" # Example policy, adjust based on needs
}

# CloudWatch Alarm to monitor CPU Utilization
resource "aws_cloudwatch_metric_alarm" "cpu_alarm" {
  alarm_name          = "nginx-cpu-utilization-alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = "120" # 2 minutes
  statistic           = "Average"
  threshold           = "75"
  alarm_description   = "This alarm triggers if the EC2 instance CPU utilization exceeds 75% for 4 consecutive minutes."
  
  alarm_actions = [aws_sns_topic.alerts.arn]
  ok_actions    = [aws_sns_topic.alerts.arn]

  dimensions = {
    InstanceId = aws_instance.nginx_proxy.id
  }

  tags = {
    Name        = "nginx-cpu-alarm"
    Environment = "dev"
  }
}

Resources Provisioned

  • EC2 Instance: Runs Amazon Linux 2023. Includes a bootstrap script to automatically install, enable, and start Nginx, while serving a custom HTML landing page.
  • Security Group: Allows inbound HTTP traffic from anywhere (0.0.0.0/0) and strictly limits SSH access to the provided IP address variable. Allows all outbound traffic for installing packages.
  • Dynamic SSH Key Pair: Terraform generates a secure 4096-bit RSA key automatically. The private key is saved locally to the module directory as nginx-server-key.pem for remote access, and .gitignore ignores *.pem to prevent accidental credential leakage into Git.
  • Elastic IP (EIP): A static, reserved public IP address allocated from Amazon’s pool is attached to the web server, ensuring the public DNS name and the website_url output never change across instance restarts.
  • CloudWatch Alarm: Monitors CPU Utilization and triggers if it exceeds 75% for a sustained period of 4 minutes (2 evaluation periods of 2 minutes each). This provides automated visibility into the health and load of your web server.
  • CloudWatch Agent & Logging: Automatically ships Nginx access and error logs to CloudWatch.
  • IAM Role: Granted CloudWatchAgentServerPolicy and AmazonSSMManagedInstanceCore permissions.
  • SSM Parameter Store: Stores the agent’s JSON configuration, allowing for centralized management and updates without modifying the instance directly.
  • Log Group: Logs are centralized in /ec2/nginx-app-logs with a configurable retention policy (default 30 days).

The Immutable Core: EC2 and User Data

The magic of this deployment lies in the aws_instance resource, specifically the user_data script. This bash script runs automatically as root on the first boot cycle, transforming a vanilla Linux box into a fully functioning Nginx proxy.

user_data = <<-EOF
              #!/bin/bash
              set -e

              # ── 1. Install & start Nginx ─────────────────────────────────
              yum update -y
              yum install -y nginx
              ...        
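As shipped, the script only serves a static index page so you can verify the box is alive. To make the instance an actual reverse proxy, you would write a routing config into conf.d the same way the logging config is written; a minimal sketch (the backend address is an assumption) could be appended to the script:

cat << 'EOF_PROXY' > /etc/nginx/conf.d/reverse_proxy.conf
server {
    listen 80;

    location / {
        # Forward to your backend service - the address is illustrative
        proxy_pass http://10.0.1.50:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
EOF_PROXY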

JSON Logging for Better Observability

Inside the user_data script, we do not just start Nginx; we also define a JSON log format so access logs are written as structured JSON.

log_format json_analytics escape=json '{'
  '"time_local": "$time_local", '
  '"status": "$status", '
  ...
'}';        

Parsing standard text logs is tedious. By forcing Nginx to emit structured JSON logs, log aggregators (like CloudWatch Logs) can easily parse, filter, and trigger metric alarms based on specific fields, like HTTP status codes or request times.
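This pays off immediately in CloudWatch Logs Insights, where the JSON fields are discovered automatically. A query along these lines (field names match the log_format above) surfaces recent server errors without any regex wrangling over raw text:

fields @timestamp, request, status, request_time
| filter status like /^5/
| sort @timestamp desc
| limit 20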

To ensure our proxy maintains a static entry point even if the underlying instance is replaced, we attach an Elastic IP (aws_eip) to the EC2 resource.

Observability: CloudWatch Integration

A proxy is only as good as the metrics it emits. Our user_data script also installs the Amazon CloudWatch Agent.

It starts the agent by fetching a configuration file securely stored in AWS Systems Manager (SSM) Parameter Store (ssm:/cloudwatch-agent/config/nginx). This configuration dictates exactly which system metrics (memory, disk) and log files (our Nginx JSON logs) should be pushed to CloudWatch.
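If metrics or logs fail to show up, the agent's own status command (bundled with the agent) is the quickest first check; it prints a small JSON blob with the agent's state and version:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status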

Setting the Alarm

We define an aws_cloudwatch_metric_alarm to monitor CPU utilization.

resource "aws_cloudwatch_metric_alarm" "cpu_alarm" {
  alarm_name          = "nginx-cpu-utilization-alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "CPUUtilization"
  ...
  threshold           = "75"
  alarm_actions       = [aws_sns_topic.alerts.arn]
}

Alerting: SNS to Slack

The final piece of the puzzle is routing that alarm directly to the engineering team.

We create an Amazon SNS Topic (nginx-alerts-topic). This acts as a pub/sub router. When CloudWatch fires the alarm, it publishes a message here.

The Slack Connection

Our Terraform code handles two methods for pushing to Slack:

  • Direct Webhook: A simple HTTPS subscription to the SNS topic. (Note: raw SNS JSON generally will not render as a readable Slack message without a Lambda function to reshape it.)
  • AWS Chatbot: The modern, native approach. By defining an aws_chatbot_slack_channel_configuration, AWS handles the formatting and drops a clean, actionable, color-coded alert directly into your specified Slack channel.

The Payoff: Immediate Usability with Terraform Outputs

outputs.tf

output "public_ip" {
  description = "The public IP address of the Nginx web server"
  value       = aws_eip.nginx_eip.public_ip
}

output "public_dns" {
  description = "The public DNS of the Nginx web server"
  value       = aws_eip.nginx_eip.public_dns
}

output "website_url" {
  description = "The URL to access the Nginx web server"
  value       = "http://${aws_eip.nginx_eip.public_ip}"
}

output "cpu_alarm_arn" {
  description = "The ARN of the CloudWatch CPU utilization alarm"
  value       = aws_cloudwatch_metric_alarm.cpu_alarm.arn
}

output "cloudwatch_log_group_name" {
  description = "The CloudWatch Log Group where application logs are shipped"
  value       = aws_cloudwatch_log_group.app_logs.name
}

output "cloudwatch_log_group_url" {
  description = "Direct AWS Console link to the log group"
  value       = "https://console.aws.amazon.com/cloudwatch/home#logsV2:log-groups/log-group/${replace(aws_cloudwatch_log_group.app_logs.name, "/", "$252F")}"
}

Infrastructure as Code isn’t just about spinning up resources; it’s about making those resources immediately usable for your engineering team or CI/CD pipeline. Hunting through the AWS Console to find the IP address of a newly provisioned proxy or searching for the correct log group wastes valuable time.

By defining an outputs.tf file, Terraform acts as a well-mannered assistant, printing exactly what you need to the terminal the moment the terraform apply finishes.

Here is how our outputs break down:

Instant Access to the Proxy

Instead of clicking through the EC2 dashboard to find where your server was deployed, Terraform hands you the keys right away.

  • public_ip & public_dns: Grabs the static Elastic IP details associated with the instance.
  • website_url: Formats that IP into a ready-to-click HTTP link, allowing you to instantly verify that Nginx is up and routing traffic.
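Once terraform apply finishes, these outputs become one-liners in your shell or CI pipeline:

# Print the URL, or smoke-test the proxy directly
terraform output -raw website_url
curl -I "$(terraform output -raw website_url)"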

Streamlined Incident Response

When things go wrong, every second counts. Our outputs are specifically designed to reduce Mean Time to Resolution (MTTR):

  • cpu_alarm_arn: Outputs the Amazon Resource Name for the CloudWatch alarm, which is highly useful if you need to reference this alarm in other Terraform modules or dashboards.
  • cloudwatch_log_group_url: This is the ultimate time-saver. By dynamically constructing the AWS Console URL using the replace function, Terraform provides a direct, one-click link to your Nginx JSON logs. If your Slack alert fires, developers can use this URL to jump straight to the logs without navigating the AWS interface.

Making It Reusable: The Power of Variables

variable "instance_type" {
  description = "The EC2 instance type"
  type        = string
  default     = "t2.medium"
}

variable "allowed_ssh_ip" {
  description = "The IP address allowed to SSH into the EC2 instance"
  type        = string
  default     = "0.0.0.0/0" # Consider restricting to your exact IP in production!
}

variable "slack_webhook_url" {
  description = "The Slack Incoming Webhook URL to send notifications to"
  type        = string
  sensitive   = true
}

variable "slack_team_id" {
  description = "The Slack Team ID (Workspace ID) for AWS Chatbot (must be authorized in console first)"
  type        = string
  default     = ""
}

variable "slack_channel_id" {
  description = "The Slack Channel ID where notifications will be sent"
  type        = string
  default     = ""
}

variable "log_group_name" {
  description = "The CloudWatch Log Group name where application logs will be shipped"
  type        = string
  default     = "/ec2/nginx-app-logs"
}

variable "log_retention_days" {
  description = "Number of days to retain logs in CloudWatch (0 = never expire)"
  type        = number
  default     = 30
}

Hardcoding values into your main.tf file is a quick way to build a proof of concept, but it is a terrible way to build production infrastructure. To make our Nginx reverse proxy truly reusable across different environments (like Dev, Staging, and Prod), we need to extract the configurable pieces into a variables.tf file.

By parameterizing our setup, anyone on the team can deploy this architecture with their own specific needs without touching the core logic. Here is a breakdown of how we make our module flexible:

Compute & Security Configurations

We start by defining the size of the server and who is allowed to access it.

  • instance_type: We default to a t2.medium, which provides a solid baseline of CPU and memory for a standard proxy handling moderate traffic.
  • allowed_ssh_ip: By default, this is wide open (0.0.0.0/0) for testing, but in a real-world scenario, you would override this variable during deployment to restrict SSH access to your office IP or corporate VPN.
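In practice that override is a single flag at deploy time (the values below are placeholders):

terraform apply \
  -var="instance_type=t3.small" \
  -var="allowed_ssh_ip=203.0.113.10/32"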

Alerting Routing & Secrets Management

Because we are piping CloudWatch alarms into Slack, we need to pass in our workspace credentials.

  • slack_webhook_url: Notice that we flagged this variable with sensitive = true. This is a crucial Terraform security feature. It ensures that when you run a terraform plan or apply, your plain-text webhook URL is redacted from the console output, keeping your secrets out of your CI/CD logs.
  • slack_team_id & slack_channel_id: These allow us to optionally route alerts through the native AWS Chatbot instead of a raw webhook, provided the workspace has been authorized in the AWS Console.

Cost-Conscious Log Retention

Finally, we manage our observability footprint.

  • log_group_name: Defines exactly where in CloudWatch our Nginx JSON logs will live.
  • log_retention_days: CloudWatch storage costs can spiral out of control if you leave logs sitting around forever. By defaulting this to 30 days, we ensure our infrastructure cleans up after itself, balancing our need for historical debugging data with our AWS monthly bill.

The Nervous System: IAM, SSM, and Centralized Logging

cloudwatch.tf

# ──────────────────────────────────────────────────────────────────────────────
# IAM Role – allows the EC2 instance to write logs to CloudWatch
# ──────────────────────────────────────────────────────────────────────────────
resource "aws_iam_role" "cloudwatch_agent_role" {
  name = "nginx-cloudwatch-agent-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Action    = "sts:AssumeRole"
        Principal = { Service = "ec2.amazonaws.com" }
      }
    ]
  })

  tags = {
    Name        = "nginx-cloudwatch-agent-role"
    Environment = "dev"
  }
}

# AWS-managed policy that grants the CloudWatch agent all permissions it needs
resource "aws_iam_role_policy_attachment" "cloudwatch_agent_policy" {
  role       = aws_iam_role.cloudwatch_agent_role.name
  policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}

# Also attach SSM read access so the agent can pull its config from Parameter Store
resource "aws_iam_role_policy_attachment" "ssm_read_policy" {
  role       = aws_iam_role.cloudwatch_agent_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Instance profile wraps the IAM role so EC2 can use it
resource "aws_iam_instance_profile" "cloudwatch_agent_profile" {
  name = "nginx-cloudwatch-agent-profile"
  role = aws_iam_role.cloudwatch_agent_role.name
}

# ──────────────────────────────────────────────────────────────────────────────
# CloudWatch Log Group – where your application logs will live
# ──────────────────────────────────────────────────────────────────────────────
resource "aws_cloudwatch_log_group" "app_logs" {
  name              = var.log_group_name
  retention_in_days = var.log_retention_days

  tags = {
    Name        = var.log_group_name
    Environment = "dev"
  }
}

# ──────────────────────────────────────────────────────────────────────────────
# SSM Parameter – stores the CloudWatch agent JSON config
# The agent running on the EC2 will pull this on startup via fetch-config.
# Add / modify [[log_files]] entries below to ship additional log files.
# ──────────────────────────────────────────────────────────────────────────────
resource "aws_ssm_parameter" "cloudwatch_agent_config" {
  name  = "/cloudwatch-agent/config/nginx"
  type  = "String"
  value = jsonencode({
    agent = {
      metrics_collection_interval = 60
      run_as_user                 = "root"
    }
    logs = {
      logs_collected = {
        files = {
          collect_list = [
            # Nginx access log (JSON structured)
            {
              file_path        = "/var/log/nginx/access.json.log"
              log_group_name   = var.log_group_name
              log_stream_name  = "{instance_id}/nginx-access-json"
              timezone         = "UTC"
            },
            # Nginx error log
            {
              file_path        = "/var/log/nginx/error.log"
              log_group_name   = var.log_group_name
              log_stream_name  = "{instance_id}/nginx-error"
              timezone         = "UTC"
            },
            # ── Add YOUR application log path here ──────────────────────────
            # {
            #   file_path       = "/var/log/myapp/app.log"
            #   log_group_name  = var.log_group_name
            #   log_stream_name = "{instance_id}/app"
            #   timezone        = "UTC"
            # },
            # ────────────────────────────────────────────────────────────────
          ]
        }
      }
      log_stream_name = "{instance_id}/default"
    }
  })

  tags = {
    Name        = "cloudwatch-agent-config"
    Environment = "dev"
  }
}

An observable system is only as good as its ability to securely transmit data. Our EC2 instance needs to ship its Nginx logs and system metrics to CloudWatch, but by default, an AWS EC2 instance has zero permissions to do so.

In our cloudwatch.tf file, we wire up the "nervous system" of our infrastructure, securely granting our reverse proxy the exact permissions it needs and defining how the CloudWatch agent should behave.

Here is how we break down the security and logging configuration:

Secure Permissions via IAM

Instead of hardcoding AWS access keys onto the server (a massive security risk), we use an IAM Role and an Instance Profile.

  • aws_iam_role & Policies: We create a dedicated role (nginx-cloudwatch-agent-role) and attach two AWS-managed policies: CloudWatchAgentServerPolicy (to allow log/metric uploading) and AmazonSSMManagedInstanceCore (to allow the instance to securely read configurations from Systems Manager).

The Central Log Hub

Next, we define the destination for our data:

  • aws_cloudwatch_log_group: This creates the log group in CloudWatch where our logs will live. Notice that it dynamically pulls the name and retention days from the variables we set up earlier, ensuring our logs are automatically purged after 30 days to save costs.

Centralized Agent Configuration via SSM

This is where the true power of configuration management shines. Instead of baking the CloudWatch Agent’s JSON configuration directly into the EC2 user_data script, we store it centrally in AWS Systems Manager (SSM) Parameter Store.

resource "aws_ssm_parameter" "cloudwatch_agent_config" {
  name  = "/cloudwatch-agent/config/nginx"
  type  = "String"
  value = jsonencode({
    # ... configuration details ...
  })
}        

Why do it this way?

  • Decoupling: By keeping the configuration in SSM, you separate the agent’s behavior from the server’s boot script.
  • Flexibility: Inside this JSON block, we instruct the agent to grab both the access.json.log and the standard error.log. If you decide later that you want to add a backend application log (like a .NET or Node.js log file), you simply update this SSM parameter. The next time an instance boots, it will automatically pull the updated tracking requirements without you needing to rewrite your core EC2 logic.

By structuring our logging this way, our reverse proxy boots up, securely identifies itself to AWS, asks SSM what logs it needs to track, and immediately starts streaming that data to CloudWatch.
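Concretely, adding a new log file is a two-step loop, sketched here: change the collect_list in cloudwatch.tf and apply, then have the instance re-fetch its config (or simply wait for the next boot):

# 1. After editing the collect_list in cloudwatch.tf:
terraform apply

# 2. On the instance, pull and apply the updated config:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -s -c ssm:/cloudwatch-agent/config/nginx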

The 2026 Standard: AWS Chatbot vs. Custom Lambdas

If you look at older AWS tutorials, you will often see a very different approach to Slack integration. Historically, connecting a CloudWatch SNS topic to Slack required writing custom “glue code”: deploying an AWS Lambda function (usually written in Python or Node.js) to intercept the raw JSON payload from SNS, parse it, and format it into readable Slack blocks before posting it via a webhook.

In 2026, maintaining custom Lambda functions just to forward alerts is an anti-pattern. Here is why we explicitly chose the native aws_chatbot_slack_channel_configuration resource in our Terraform deployment:

Zero Glue Code to Maintain

Every Lambda function you deploy is a piece of software you have to own. You have to monitor its execution logs, manage its IAM permissions, and regularly update its runtime (like moving from Node 20 to Node 22) to avoid security deprecations. AWS Chatbot eliminates this toil entirely. It is a fully managed service that natively understands CloudWatch alarm schemas and formats them beautifully out of the box.

Native ChatOps Capabilities

AWS Chatbot is not just a one-way pager; it is bidirectional. By granting our Chatbot configuration an IAM role (aws_iam_role.chatbot_role in main.tf), we aren't just sending alerts to Slack; we are enabling our team to interact with AWS from Slack.

When that Nginx CPU alarm fires in your Slack channel, an engineer can type an AWS CLI command directly into the Slack thread to retrieve the EC2 instance's top processes or check the auto-scaling group status, without ever logging into the AWS Management Console.
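For example, a read-only check can be run straight from the alert thread (which commands are available depends on the permissions attached to the Chatbot role):

@aws cloudwatch describe-alarms --alarm-names nginx-cpu-utilization-alarm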

Infrastructure as Code Simplicity

As you can see in our main.tf, deploying Chatbot takes only a few lines of HCL. You authorize the Slack workspace once in the AWS console, and from then on, wiring up any new SNS topic to a Slack channel is as simple as passing the slack_team_id and slack_channel_id variables.

By skipping the custom Lambda, we keep our Terraform module lightweight, our architecture highly reliable, and our engineering focus on actual business value rather than maintaining notification scripts.

