Automating File Processing with AWS Lambda, S3, DynamoDB, and SNS: A Serverless Solution

In modern cloud architectures, serverless solutions let you build highly scalable, cost-efficient applications. AWS Lambda is a core component of serverless computing: it runs your code in response to events without requiring you to manage servers.

In this article, we will explore how to use AWS Lambda in conjunction with Amazon S3, DynamoDB, and Simple Notification Service (SNS) to automatically process files that are uploaded to an S3 bucket. The entire flow will be:

  1. Upload a file to an S3 bucket.
  2. Trigger a Lambda function to process the file.
  3. Store file details in DynamoDB.
  4. Send an SNS notification about the file being processed.
  5. Delete the file from the S3 bucket.

We will use Node.js for the Lambda function and Terraform to provision the infrastructure.

Prerequisites

  • AWS Account: You'll need an AWS account with the necessary permissions to create Lambda functions, S3 buckets, DynamoDB tables, and SNS topics.
  • Terraform: We will use Terraform for provisioning the infrastructure.
  • Node.js: Our Lambda function will be written in Node.js.

Step 1: Set up the Infrastructure Using Terraform

Terraform will allow us to easily provision and manage AWS resources.

  1. Terraform Configuration File (main.tf)

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "file_upload_bucket" {
  bucket = "my-file-upload-bucket"
}

resource "aws_dynamodb_table" "file_details" {
  name           = "file-details-table"
  billing_mode   = "PROVISIONED"
  hash_key       = "file_id"
  read_capacity  = 5
  write_capacity = 5

  # Only key attributes are declared here. DynamoDB is schemaless for
  # non-key attributes, so fields like upload_time and status are simply
  # written by the Lambda function; declaring them here without an index
  # would cause a Terraform validation error.
  attribute {
    name = "file_id"
    type = "S"
  }
}

resource "aws_sns_topic" "file_processed_topic" {
  name = "file-processed-topic"
}

resource "aws_lambda_function" "process_file_lambda" {
  function_name = "process-file-lambda"
  runtime       = "nodejs18.x" # nodejs14.x is deprecated; nodejs18.x and later bundle the AWS SDK for JavaScript v3 used by the function code
  handler       = "index.handler"
  role          = aws_iam_role.lambda_exec_role.arn
  s3_bucket     = aws_s3_bucket.lambda_code_bucket.bucket
  s3_key        = "lambda_code.zip"

  environment {
    variables = {
      DYNAMODB_TABLE = aws_dynamodb_table.file_details.name
      SNS_TOPIC_ARN  = aws_sns_topic.file_processed_topic.arn
    }
  }
}

resource "aws_lambda_permission" "allow_s3_trigger" {
  statement_id  = "AllowS3Trigger"
  action        = "lambda:InvokeFunction"
  principal     = "s3.amazonaws.com"
  function_name = aws_lambda_function.process_file_lambda.function_name
  source_arn    = aws_s3_bucket.file_upload_bucket.arn
}

resource "aws_s3_object" "lambda_code" {
  # aws_s3_bucket_object is deprecated (and removed in AWS provider v5);
  # aws_s3_object is the current resource name.
  bucket = aws_s3_bucket.lambda_code_bucket.bucket
  key    = "lambda_code.zip"
  source = "path_to_your_lambda_code/lambda_code.zip"
}

resource "aws_s3_bucket" "lambda_code_bucket" {
  bucket = "my-lambda-code-bucket"
}

resource "aws_s3_bucket_notification" "s3_to_lambda" {
  bucket = aws_s3_bucket.file_upload_bucket.id

  lambda_function {
    lambda_function_arn = aws_lambda_function.process_file_lambda.arn
    events              = ["s3:ObjectCreated:*"]
    filter_prefix       = "uploads/"
  }

  # Make sure the invoke permission exists before S3 tries to register
  # the notification, otherwise the apply can fail intermittently.
  depends_on = [aws_lambda_permission.allow_s3_trigger]
}
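One more gap worth closing: the topic above has no subscribers, so nothing would actually receive the notifications. A minimal sketch of an email subscription (the address is a placeholder; email subscriptions must be confirmed from the inbox before SNS delivers messages):

```terraform
resource "aws_sns_topic_subscription" "file_processed_email" {
  topic_arn = aws_sns_topic.file_processed_topic.arn
  protocol  = "email"
  endpoint  = "you@example.com" # replace with your own address
}
```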

Step 2: Write the Lambda Function in Node.js

We will create a Lambda function in Node.js that processes files, stores the file details in DynamoDB, sends an SNS notification, and deletes the file from S3.

Lambda Function Code (index.js)

// AWS SDK for JavaScript v3 (bundled with the nodejs18.x Lambda runtime and later)
const { S3Client, DeleteObjectCommand } = require('@aws-sdk/client-s3');
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, PutCommand } = require('@aws-sdk/lib-dynamodb');
const { SNSClient, PublishCommand } = require('@aws-sdk/client-sns');

const s3 = new S3Client({});
const dynamoDB = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const sns = new SNSClient({});

// DynamoDB table and SNS topic, injected via the Terraform environment variables
const TABLE_NAME = process.env.DYNAMODB_TABLE;
const SNS_TOPIC_ARN = process.env.SNS_TOPIC_ARN;

exports.handler = async (event) => {
  console.log('Received event:', JSON.stringify(event, null, 2));

  const s3Event = event.Records[0].s3;
  const bucketName = s3Event.bucket.name;
  // S3 URL-encodes object keys in event payloads (spaces arrive as "+"), so decode first
  const fileName = decodeURIComponent(s3Event.object.key.replace(/\+/g, ' '));

  // Extract file details, assuming the key convention "uploads/{fileId}/{fileName}"
  const fileId = fileName.split('/')[1];
  const uploadTime = new Date().toISOString();

  try {
    // Step 1: Process the file
    console.log(`Processing file: ${fileName}`);

    // In this example, "processing" is just logging the details.
    // Add your own logic here (parsing, transforming, analyzing, etc.).

    // Step 2: Store the file details in DynamoDB
    await dynamoDB.send(new PutCommand({
      TableName: TABLE_NAME,
      Item: {
        file_id: fileId,
        file_name: fileName,
        upload_time: uploadTime,
        status: 'Processed'
      }
    }));

    // Step 3: Send a notification via SNS
    await sns.send(new PublishCommand({
      Message: `File ${fileName} has been processed and stored successfully.`,
      TopicArn: SNS_TOPIC_ARN
    }));

    // Step 4: Delete the file from S3
    await s3.send(new DeleteObjectCommand({
      Bucket: bucketName,
      Key: fileName
    }));
    console.log(`File ${fileName} has been deleted from S3.`);

    return {
      statusCode: 200,
      body: JSON.stringify('File processed successfully!')
    };
  } catch (error) {
    console.error('Error processing file:', error);
    return {
      statusCode: 500,
      body: JSON.stringify('Error processing file')
    };
  }
};

Step 3: Explanation of the Lambda Code

  1. S3 Event Trigger: The Lambda function is triggered by an S3 event, specifically when a new object is created under the uploads/ prefix in the S3 bucket.
  2. File Processing: The file's metadata (name, upload time) is extracted, and in this example, the file is processed by logging the file's details. This is where you would add your specific file processing logic (e.g., parsing, analyzing, or converting the file content).
  3. DynamoDB Integration: The file details (file ID, file name, upload time, and status) are stored in DynamoDB.
  4. SNS Notification: After processing the file, an SNS notification is sent out with the message indicating that the file has been processed.
  5. S3 File Deletion: After processing the file, the Lambda function deletes the file from the S3 bucket to avoid unnecessary storage costs.
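Before deploying, you can sanity-check the event-parsing logic locally with a hand-built S3 event. This is a minimal sketch: the event shape matches the records S3 delivers to Lambda, the key decoding mirrors what S3's URL-encoding requires, and the uploads/{fileId}/{fileName} convention is the one assumed throughout this article:

```javascript
// Minimal S3 put-event skeleton, shaped like the records Lambda receives.
const sampleEvent = {
  Records: [
    {
      s3: {
        bucket: { name: 'my-file-upload-bucket' },
        object: { key: 'uploads/1234/report.csv' }
      }
    }
  ]
};

// Mirror the handler's extraction logic so it can be verified without AWS.
const s3Event = sampleEvent.Records[0].s3;
const bucketName = s3Event.bucket.name;
// S3 URL-encodes keys in event payloads (spaces arrive as "+")
const fileName = decodeURIComponent(s3Event.object.key.replace(/\+/g, ' '));
const fileId = fileName.split('/')[1];

console.log(bucketName); // my-file-upload-bucket
console.log(fileId);     // 1234
```

Running this with `node` lets you confirm the fileId extraction before wiring up real uploads.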

Step 4: Deploy and Test the Solution

  1. Provision the Infrastructure: Deploy the Terraform configuration by running `terraform init`, then `terraform plan`, then `terraform apply`.
  2. Package the Lambda Code: Zip your function code as lambda_code.zip and make sure the `source` path in the `lambda_code` resource points to it; Terraform uploads the archive to the code bucket during `apply`.
  3. Upload a File to S3: Once the infrastructure is deployed, upload a file under the uploads/ prefix of your S3 bucket to trigger the Lambda function, for example with the AWS CLI: `aws s3 cp sample.txt s3://my-file-upload-bucket/uploads/1234/sample.txt`.
  4. Check DynamoDB: After the Lambda function processes the file, verify that the file details have been stored, e.g. with `aws dynamodb scan --table-name file-details-table`.
  5. Check SNS Notifications: Verify that an SNS notification has been sent out about the file processing.
  6. File Deletion: Finally, check that the uploaded file has been deleted from the S3 bucket after processing.

Conclusion

In this article, we demonstrated how to use AWS Lambda in combination with Amazon S3, DynamoDB, and SNS to create a fully serverless solution that automatically processes files uploaded to an S3 bucket. The process includes storing metadata in DynamoDB, sending notifications via SNS, and cleaning up files from S3.

By using AWS Lambda for event-driven computing, this solution scales automatically and efficiently. It also highlights the ease of using Terraform to provision and manage the infrastructure.

This architecture is ideal for various use cases, such as processing images, analyzing documents, or performing data transformations on uploaded files.

More articles by SHAMAIL ABBAS
