Serverless S3 File Processor using Shared Lambda Layer | AWS S3 Lambda Project

By Sowjanyashree N R

Published: July 01, 2025


Introduction

In cloud-native applications, processing files stored in Amazon S3 is a common requirement across a range of industries—from data analytics and document management to content pipelines and image processing. However, many real-world systems deal with more than just one file format. CSV logs, PDF documents, and image uploads often arrive at the same entry point, requiring format-specific processing.

Traditionally, developers either write monolithic code to handle all formats or duplicate file-handling logic across multiple services. This leads to brittle systems, poor maintainability, and difficulty scaling.

This project introduces a modular, event-driven file processing architecture designed to dynamically handle multiple file types in a serverless environment using AWS Lambda, Java 17, AWS SAM, and a shared Lambda layer. It centralizes reusable logic, avoids deployment pitfalls like circular dependencies, and ensures clean separation between components.


Project Objective

The primary objective of this system is to enable automated processing of different file types uploaded to S3 by delegating the handling of each format to a dedicated Lambda function. To avoid duplication and promote reuse, all file-processing Lambdas share a centralized utility layer for downloading and caching files.

A key architectural challenge arises when trying to configure S3 event notifications that target multiple Lambda functions using infrastructure-as-code tools like AWS SAM. This project resolves this by introducing a custom resource Lambda function that dynamically sets up S3 triggers during deployment, bypassing the circular dependency that often occurs between S3 and Lambda resources in a single stack.
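As a rough illustration, the notification document such a custom resource assembles could look like the sketch below. This is a plain-Java, standard-library-only sketch: the class name, method names, and function ARNs are illustrative placeholders, and a real handler would submit the resulting document through the S3 `PutBucketNotificationConfiguration` API rather than printing it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Sketch of the S3 notification configuration a custom-resource Lambda
 * could assemble at deploy time, mapping each file suffix to a processor.
 * ARNs below are placeholders; a real handler would send this document
 * via the S3 PutBucketNotificationConfiguration API.
 */
public class NotificationSketch {

    /** Builds one LambdaFunctionConfiguration entry per file suffix. */
    public static String buildNotificationJson(Map<String, String> suffixToArn) {
        String entries = suffixToArn.entrySet().stream()
            .map(e -> """
                {"LambdaFunctionArn":"%s",
                 "Events":["s3:ObjectCreated:*"],
                 "Filter":{"Key":{"FilterRules":[{"Name":"suffix","Value":"%s"}]}}}"""
                .formatted(e.getValue(), e.getKey()))
            .collect(Collectors.joining(","));
        return "{\"LambdaFunctionConfigurations\":[" + entries + "]}";
    }

    public static void main(String[] args) {
        Map<String, String> routes = new LinkedHashMap<>();
        routes.put(".csv", "arn:aws:lambda:us-east-1:111111111111:function:CSVFileProcessor");
        routes.put(".pdf", "arn:aws:lambda:us-east-1:111111111111:function:PDFFileProcessor");
        routes.put(".jpg", "arn:aws:lambda:us-east-1:111111111111:function:ImageFileProcessor");
        System.out.println(buildNotificationJson(routes));
    }
}
```

Because the custom resource applies this configuration after both the bucket and the functions exist, neither resource needs to reference the other in the template, which is what breaks the circular dependency.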

Why This Approach?

  • It avoids code duplication by using a shared Lambda layer for common logic.
  • It enables separation of concerns, with each file type having its own processing logic.
  • It solves the SAM deployment circular dependency issue using a dynamic configuration approach.
  • It provides a clean foundation that can be extended to support additional file types or downstream workflows.


Architecture Diagram

[Figure: Architecture diagram — Serverless S3 File Processor using Shared Lambda Layer]

Prerequisites

To deploy and run this project, ensure the following prerequisites are met:

An AWS account with permissions to create and manage:

  • Lambda functions
  • S3 buckets and event notifications
  • IAM roles and policies
  • CloudFormation custom resources

Installed tools:

  • Java Development Kit (JDK) 17 or later
  • Apache Maven 3.8.x or later
  • AWS CLI configured with appropriate credentials
  • AWS SAM CLI (for building and deploying the application)


Services Used

This project uses the following AWS services:

  • Amazon S3 – For file storage and event notification source
  • AWS Lambda – To process files and manage configuration logic
  • AWS Lambda Layers – To share file-handling utilities across functions
  • AWS SAM (Serverless Application Model) – For infrastructure-as-code and deployment
  • Amazon CloudWatch Logs – For monitoring Lambda function execution
  • IAM – For defining scoped access between services


Working Flow Overview

  1. Files of different types (e.g., .csv, .pdf, .jpg) are uploaded to the designated S3 bucket.
  2. Based on the file extension, an S3 event notification triggers one of three Lambda functions: CSVFileProcessor, PDFFileProcessor, or ImageFileProcessor.
  3. The triggered processor invokes a common method in the Shared Lambda Layer, which downloads the file and caches it in /tmp.
  4. The processor then performs format-specific logic (CSV row parsing, PDF text extraction, or image validation).
  5. Processing logs are emitted to Amazon CloudWatch, and temporary files are cleaned up when processing completes.
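The runtime steps above can be condensed into a single handler lifecycle. The sketch below is simplified and AWS-free: the S3 download is stubbed with a `Function` so the flow stays self-contained, and the class and parameter names are illustrative, not the project's actual API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/**
 * Simplified sketch of the lifecycle every processor Lambda follows:
 * download via the shared layer, run format-specific logic, then clean up.
 * The download step is a Function stand-in for an S3 GetObject call.
 */
public class ProcessorLifecycle {

    public static String process(String key,
                                 Function<String, byte[]> download,    // shared-layer stand-in
                                 Function<Path, String> formatLogic) throws Exception {
        // 1. Shared layer: fetch the object and cache it locally (in Lambda, under /tmp).
        Path cached = Files.createTempFile("s3cache-", "-" + key.replace('/', '_'));
        Files.write(cached, download.apply(key));
        try {
            // 2. Format-specific logic (CSV parsing, PDF extraction, image checks).
            return formatLogic.apply(cached);
        } finally {
            // 3. Cleanup: remove the temporary copy before the invocation ends.
            Files.deleteIfExists(cached);
        }
    }
}
```

The sections that follow walk through each of these stages as they appear in the deployed system.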

File Upload to S3 Bucket

The user uploads a file (e.g., .csv, .pdf, .jpg) to a designated folder (/input/) in the S3 bucket.

[Figure: Uploaded test files in the file-processor-s3-bkt bucket]

Automatic Trigger Based on File Type

S3 automatically triggers the appropriate processor Lambda function based on the file extension.

[Figure: S3 event notifications configured with suffix-based triggers for CSVFileProcessor, PDFFileProcessor, and ImageFileProcessor]

Processor Lambda Invokes Shared Lambda Layer

The triggered Lambda function uses the Shared Lambda Layer to download the file from S3 and cache it in /tmp.
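A minimal sketch of such a download-and-cache helper is shown below. Because a warm Lambda container can keep its /tmp contents between invocations, the fetch is skipped when the cached file already exists. The class name is hypothetical, and the `Supplier` stands in for an S3 SDK GetObject call so the sketch needs only the standard library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

/**
 * Sketch of the shared layer's download-and-cache helper. On a warm Lambda
 * container the /tmp copy may survive between invocations, so the fetch is
 * skipped when the cached file is already present. The fetcher is a stand-in
 * for an S3 GetObject call.
 */
public class S3FileCache {

    public static Path fetchCached(Path cacheDir, String key, Supplier<byte[]> fetcher) throws Exception {
        Path target = cacheDir.resolve(key.replace('/', '_'));
        if (Files.exists(target)) {
            return target;                        // warm container: reuse the cached copy
        }
        Files.createDirectories(cacheDir);
        Files.write(target, fetcher.get());       // cold path: download from S3
        return target;
    }
}
```

Inside the Lambda itself, `cacheDir` would be `Path.of("/tmp")` and the fetcher a real S3 client call.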

[Figure: Lambda functions with the Shared Lambda Layer attached]

Format-Specific File Processing

Each processor performs logic specific to the file type (e.g., CSV row parsing, PDF text extraction, or image verification).
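To make the CSV branch concrete, here is a minimal, standard-library sketch of the kind of row parsing whose output the CloudWatch logs below show. The article does not publish the processor's actual code, so this is only an illustrative stand-in; real-world CSV (quoted fields, embedded commas) would need a proper parser library.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Illustrative sketch of the CSV branch: split each non-blank line into
 * fields, producing the row counts a processor might log to CloudWatch.
 * Naive splitting only; quoted fields would need a real CSV parser.
 */
public class CsvRowParser {

    public static List<String[]> parseRows(String csv) {
        return csv.lines()
                  .filter(line -> !line.isBlank())
                  .map(line -> line.split(",", -1))   // -1 keeps trailing empty fields
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> rows = parseRows("id,name\n1,alice\n2,bob");
        System.out.println("Parsed " + (rows.size() - 1) + " data rows");  // header excluded
    }
}
```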

[Figure: CloudWatch Logs output showing CSV rows parsed]

[Figure: CloudWatch Logs output showing PDF text extracted]

[Figure: CloudWatch Logs output showing image type validated (e.g., a .png file)]

Execution Logging and Cleanup

After processing, temporary files in the /tmp directory are deleted, marking the completion of execution.
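A cleanup step of this kind could be sketched as below. The class name is hypothetical; in the Lambda the directory argument would be `Path.of("/tmp")`. Cleaning up matters because a warm container can reuse its filesystem, and by default /tmp offers 512 MB shared across successive invocations.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch of the post-processing cleanup step: delete every regular file
 * the invocation left in the cache directory (in Lambda, "/tmp"), since
 * a warm container reuses the filesystem and /tmp space is limited.
 */
public class TmpCleaner {

    /** Deletes regular files directly under dir; returns how many were removed. */
    public static int cleanDirectory(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) return 0;
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    Files.delete(entry);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```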

[Figure: CSVFileProcessor log confirming cleanup of the /tmp directory after successful file processing]

[Figure: PDFFileProcessor log confirming cleanup of the /tmp directory after successful file processing]

[Figure: ImageFileProcessor log confirming cleanup of the /tmp directory after successful file processing]

Benefits of the Architecture

  • Modularity and Maintainability: Each file type has a dedicated Lambda processor, avoiding complex conditional logic and enhancing maintainability.
  • Code Reuse via Shared Layer: All file download, caching, and cleanup logic resides in a shared layer, reducing duplication and improving consistency.
  • Dynamic Event Configuration: By using a Lambda-backed custom resource, the project avoids circular dependencies during deployment and enables dynamic trigger configuration.
  • Cost-Efficient and Scalable: Built entirely from managed, serverless components, the system scales automatically with incoming files and incurs charges only while processing occurs.
  • Stateless Execution with Temporary Storage: The Lambda /tmp directory provides ephemeral storage that is isolated per execution environment; because it can persist across warm invocations, the processors clean it up explicitly after each run.


Conclusion

This project demonstrates a clean, extensible, and production-ready architecture for handling multi-format S3 file uploads in a serverless environment. It emphasizes separation of concerns, code reuse, and deployability through dynamic resource configuration and shared logic.

Designed using Java 17 and AWS best practices, it serves as a foundational blueprint for any cloud-native system that processes incoming files with format-specific logic.

Whether you are building a document ingestion pipeline, image classifier, or data ingestion framework, this architecture is a proven and reliable pattern worth adopting.


🔗 Explore the full project here: Serverless-S3-File-Processor-Using-Shared-Lambda-Layer


