Supersonic Performance with SQS and Lambda
I've been having some fun this weekend with SQS and AWS Lambda. In case you missed the announcement, SQS is now available as an event source for AWS Lambda.
The use case I had was trying to build a bulk downloader - put something in a queue, Lambda downloads it and saves it to S3.
The SQS visibility timeout has to be at least as long as the Lambda function timeout - this makes sense: a message that isn't processed and deleted before the visibility timeout expires becomes visible again and can be picked up by another invocation reading from the queue.
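If you create the queue yourself, you can set the visibility timeout up front from the CLI. A sketch of what that looks like (the queue URL and the 300-second value are placeholders for your own setup):

```shell
# Give in-flight messages at least as long as the function timeout (here 300s)
aws sqs set-queue-attributes \
    --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/bulk-download-queue \
    --attributes VisibilityTimeout=300
```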
My code was written in C#. I'm always pleasantly surprised by how fast code runs on Lambda, but this surprised me again - 10,000 images downloaded and saved into S3 in 3 minutes. There's something satisfying about watching your queue drain to zero as the work finishes. My AWS account has the default Lambda scaling limits, so this is performance that any customer can expect to get.
A couple of things I found helped:
- When using Parallel.ForEach/Parallel.For/Parallel.Invoke, increasing the amount of memory (and therefore CPU) available to your function increases performance substantially. Lambda scales out, but don't forget about the performance gains from parallelism within the function itself.
- There are currently no official POCOs for deserializing the SQS event. In C#, unless your function's input and output parameters are of type System.IO.Stream, they need to be serializable, so here are POCOs for the SQS event that you can use in your Lambda function arguments. I'm sure official ones are coming soon.
// Requires System.Runtime.Serialization for [DataContract]/[DataMember]
[DataContract]
public class SqsEvents
{
    [DataMember]
    public SqsEvent[] Records { get; set; }
}

[DataContract]
public class SqsEvent
{
    [DataMember]
    public string messageId { get; set; }

    [DataMember]
    public string md5OfBody { get; set; }

    [DataMember]
    public string eventSource { get; set; }

    [DataMember]
    public string eventSourceARN { get; set; }

    [DataMember]
    public string awsRegion { get; set; }

    [DataMember]
    public string body { get; set; }

    [DataMember]
    public SqsEventAttribute attributes { get; set; }
}

[DataContract]
public class SqsEventAttribute
{
    [DataMember]
    public int ApproximateReceiveCount { get; set; }

    [DataMember]
    public long SentTimestamp { get; set; }

    [DataMember]
    public string SenderId { get; set; }

    [DataMember]
    public long ApproximateFirstReceiveTimestamp { get; set; }
}
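For reference, the raw event these classes deserialize looks roughly like this (the values below are illustrative, not from a real invocation):

```json
{
  "Records": [
    {
      "messageId": "059f36b4-87a3-44ab-83d2-661975830a7d",
      "body": "{\"ImageUrl\":\"https://example.com/image.jpg\"}",
      "md5OfBody": "e4e68fb7bd0e697a0ae8f1bb342846b3",
      "eventSource": "aws:sqs",
      "eventSourceARN": "arn:aws:sqs:us-east-1:123456789012:bulk-download-queue",
      "awsRegion": "us-east-1",
      "attributes": {
        "ApproximateReceiveCount": "1",
        "SentTimestamp": "1529104986221",
        "SenderId": "AIDAIENQZJOLO23YVJ4VO",
        "ApproximateFirstReceiveTimestamp": "1529104986230"
      }
    }
  ]
}
```

Note that SQS reports the attribute values as strings; Json.NET converts them to the numeric types on SqsEventAttribute during deserialization.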
All I had to do was call JsonConvert.DeserializeObject after reading the input stream to get the full contents of the message from SQS. Remember, depending on your batch size, SQS may send your Lambda function multiple records in the same invocation, which you can easily handle in your code.
public void FunctionHandler(Stream inputStream, ILambdaContext context)
{
    string strInput;
    using (TextReader textReader = new StreamReader(inputStream))
    {
        strInput = textReader.ReadToEnd();
    }
    LambdaLogger.Log($"Received input {strInput}");
    var message = JsonConvert.DeserializeObject<SqsEvents>(strInput);
    // Could have multiple records in the event from SQS
    if (message.Records == null || message.Records.Length == 0)
        return;
    LambdaLogger.Log($"Received request to download {message.Records.Length} files.");
    var options = new ParallelOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount * 2
    };
    Parallel.ForEach(message.Records, options, record =>
    {
        var image = JsonConvert.DeserializeObject<ImageLocationObject>(Regex.Unescape(record.body));
        DownloadAndSaveFile(image.ImageUrl);
    });
}
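The batch size itself is set on the event source mapping. If you're wiring the queue to the function from the CLI rather than the console, it lives roughly here (the function name and queue ARN are placeholders):

```shell
# Deliver up to 10 SQS messages per Lambda invocation
aws lambda create-event-source-mapping \
    --function-name BulkDownloader \
    --event-source-arn arn:aws:sqs:us-east-1:123456789012:bulk-download-queue \
    --batch-size 10
```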
void DownloadAndSaveFile(string fileUrl)
{
    LambdaLogger.Log($"Received request to download {fileUrl}");
    var strS3BucketName = Environment.GetEnvironmentVariable("S3BucketName");
    var strS3Prefix = "RandomImages/";
    var s3Client = new Amazon.S3.AmazonS3Client();
    var putObjectRequest = new PutObjectRequest
    {
        BucketName = strS3BucketName,
        Key = strS3Prefix + Guid.NewGuid().ToString() + ".jpg"
    };
    // Download the image and save it to S3
    HttpWebRequest webRequest = WebRequest.CreateHttp(fileUrl);
    webRequest.Method = "GET";
    using (WebResponse response = webRequest.GetResponse())
    using (var stream = response.GetResponseStream())
    // S3 needs a seekable stream with a known length, so copy the response
    // into a MemoryStream to get around the content-length problem
    using (MemoryStream msCopy = new MemoryStream())
    {
        stream.CopyTo(msCopy);
        msCopy.Seek(0, SeekOrigin.Begin);
        putObjectRequest.InputStream = msCopy;
        var putResult = s3Client.PutObjectAsync(putObjectRequest).Result;
        LambdaLogger.Log($"Successfully put object {putObjectRequest.Key} in S3.");
    }
}
SQS event sources for Lambda have been a long-awaited feature. I'm really happy to see that the performance stacks up too.
Nice!