Video Thumbnailer with AWS Lambda and Fargate using AWS CDK
Author: Binh Bui (@bvbinh)
We’ll build an application that extracts a thumbnail from a video using AWS Lambda and Fargate. The application will be able to process multiple videos in parallel and store the thumbnails in S3.
We’ll use the following services and tools:
- Lambda
- Fargate
- S3
- Docker
Prerequisites
- An AWS account and AWS CLI credentials configured locally.
- Node.js and the AWS CDK CLI installed.
- Docker installed and running (the container image is built locally during deployment).
Create and deploy the project
Create the project
- Make sure Docker is installed and running.
- Create a new directory and initialize a CDK project:
mkdir video-thumbnailer
cd video-thumbnailer
cdk init app --language typescript
Cloud infrastructure
- Replace the contents of lib/video-thumbnailer-stack.ts with the following:
import * as path from 'path';
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as logs from 'aws-cdk-lib/aws-logs';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as s3n from 'aws-cdk-lib/aws-s3-notifications';
import * as assets from 'aws-cdk-lib/aws-ecr-assets';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';

export class VideoThumbnailerStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // A simple cluster to run our tasks in. We import the default VPC and use the default cluster.
    // You can also create your own VPC and/or cluster.
    const cluster = new ecs.Cluster(this, 'Cluster', {
      vpc: ec2.Vpc.fromLookup(this, 'VPC', { isDefault: true })
    });

    // A bucket to store videos and thumbnails.
    const bucket = new s3.Bucket(this, 'Bucket', {
      // following settings for demo purposes only. DON'T USE IN PRODUCTION!
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true
    });

    // A task role that allows the task to access the bucket
    const taskRole = new iam.Role(this, 'TaskRole', {
      assumedBy: new iam.ServicePrincipal('ecs-tasks.amazonaws.com'),
      description: 'Allows the task to access the bucket'
    });
    bucket.grantReadWrite(taskRole);

    // A task definition that will run FFMPEG on Fargate
    const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef', {
      memoryLimitMiB: 512,
      cpu: 256,
      taskRole: taskRole,
      family: 'video-thumbnailer'
    });

    // Add the FFMPEG container to the task definition
    const ffmpegContainer = taskDefinition.addContainer('ffmpeg', {
      image: ecs.ContainerImage.fromAsset(
        path.join(__dirname, './worker-docker-ffmpeg-thumb'),
        {
          file: 'Dockerfile',
          // because jrottenberg/ffmpeg doesn't support ARM64, we need to use X86_64
          platform: assets.Platform.LINUX_AMD64 // <--- this is the important part
        }
      ),
      logging: new ecs.AwsLogDriver({
        streamPrefix: 'ffmpeg',
        logRetention: logs.RetentionDays.ONE_WEEK
      })
    });

    // === lambda ===
    // A lambda function that will be triggered by the S3 object creation event
    const onNewVideoFn = new NodejsFunction(this, 'OnNewVideoLambda', {
      runtime: lambda.Runtime.NODEJS_18_X,
      entry: path.join(__dirname, './lambda/index.ts'),
      handler: 'onNewVideoHandler',
      architecture: lambda.Architecture.ARM_64,
      description: 'A lambda function that triggers on new video upload',
      environment: {
        FARGATE_TASK_DEFINITION: taskDefinition.taskDefinitionArn,
        FARGATE_CONTAINER_NAME: ffmpegContainer.containerName,
        FARGATE_CLUSTER: cluster.clusterName,
        FARGATE_SUBNET_IDS: cluster.vpc.publicSubnets
          .map((s) => s.subnetId)
          .join(',')
      }
    });

    // set permissions to allow the lambda to run the task
    taskDefinition.grantRun(onNewVideoFn);

    // trigger the lambda on new video upload
    bucket.addEventNotification(
      s3.EventType.OBJECT_CREATED,
      new s3n.LambdaDestination(onNewVideoFn),
      { suffix: '.mp4' }
    );

    // A lambda function that triggers when a new thumbnail is created and uploaded to the bucket
    const onNewThumbnailFn = new NodejsFunction(this, 'OnNewThumbnailLambda', {
      runtime: lambda.Runtime.NODEJS_18_X,
      entry: path.join(__dirname, './lambda/index.ts'),
      handler: 'onNewThumbnailHandler',
      architecture: lambda.Architecture.ARM_64,
      description: 'A lambda function that triggers on new thumbnail upload'
    });

    // allow the lambda to read from the bucket
    bucket.grantRead(onNewThumbnailFn);

    // trigger the lambda on new thumbnail upload
    bucket.addEventNotification(
      s3.EventType.OBJECT_CREATED,
      new s3n.LambdaDestination(onNewThumbnailFn),
      { suffix: '.jpg' }
    );
  }
}
The code above creates a cluster, a bucket, a task definition, and two Lambda functions. The Lambda functions are triggered by S3 events: the first one starts a Fargate task for each uploaded video, and the second one fires when the task finishes and a thumbnail is uploaded to the bucket. Note that Vpc.fromLookup only works if the stack is given an explicit account and region, as shown in the sketch below.
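Because Vpc.fromLookup has to query your account during synthesis, the stack must be instantiated with a concrete env. A minimal sketch of the app entry point, assuming the bin/video-thumbnailer.ts file that cdk init generates (the exact file name depends on your project name):
// bin/video-thumbnailer.ts (name assumed from `cdk init`)
import * as cdk from 'aws-cdk-lib';
import { VideoThumbnailerStack } from '../lib/video-thumbnailer-stack';

const app = new cdk.App();
new VideoThumbnailerStack(app, 'VideoThumbnailerStack', {
  // Vpc.fromLookup needs a concrete account and region at synth time;
  // CDK_DEFAULT_* are populated from your current AWS CLI credentials.
  env: {
    account: process.env.CDK_DEFAULT_ACCOUNT,
    region: process.env.CDK_DEFAULT_REGION
  }
});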
Fargate worker
- Create a new directory lib/worker-docker-ffmpeg-thumb (next to the stack file, so the __dirname-relative asset path in the stack resolves) and add the following files:
mkdir lib/worker-docker-ffmpeg-thumb
touch lib/worker-docker-ffmpeg-thumb/Dockerfile
touch lib/worker-docker-ffmpeg-thumb/entrypoint.sh
# add the following code to lib/worker-docker-ffmpeg-thumb/Dockerfile
FROM jrottenberg/ffmpeg
RUN apt-get update --yes && \
apt-get install curl unzip --yes && \
apt-get clean --yes
# Install AWS v2 cli
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
unzip awscliv2.zip && \
./aws/install && \
rm -rf aws
WORKDIR /tmp/workdir
# Copy entrypoint.sh to workdir
COPY entrypoint.sh entrypoint.sh
# make entrypoint.sh executable by default
RUN chmod +x entrypoint.sh
ENTRYPOINT ["./entrypoint.sh"]
# add the following code to lib/worker-docker-ffmpeg-thumb/entrypoint.sh
#!/bin/bash
# test aws cli installation
echo "Testing aws cli installation..."
aws --version

# Optionally, the video duration could be read first and used to calculate a relative offset:
# DURATION=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 ./${INPUT_VIDEO})
# Here we simply use the TIME_OFFSET passed in by the Lambda function.

echo "Starting ffmpeg task..." && \
echo "Copying video from s3://${S3_BUCKET}/${INPUT_VIDEO} to ${INPUT_VIDEO}..." && \
aws s3 cp s3://${S3_BUCKET}/${INPUT_VIDEO} ./${INPUT_VIDEO} && \
echo "Extracting a single frame at ${TIME_OFFSET} with ffmpeg..." && \
ffmpeg -v error -i ./${INPUT_VIDEO} -ss ${TIME_OFFSET} -vframes 1 -f image2 -an -y ${OUTPUT_FILE} && \
echo "Copying thumbnail to s3://${S3_BUCKET}/${OUTPUT_FILE} ..." && \
aws s3 cp ./${OUTPUT_FILE} s3://${S3_BUCKET}/${OUTPUT_FILE}
This Dockerfile uses the jrottenberg/ffmpeg image as the base image. It installs the AWS CLI and copies the entrypoint.sh script into the container. The entrypoint.sh script is executed when the container starts: it downloads the video file from S3, runs the FFMPEG command, and uploads the thumbnail to S3.
Lambda code
- Create a new directory lib/lambda and add the following file:
mkdir lib/lambda
touch lib/lambda/index.ts
# add the following code to lib/lambda/index.ts
import { S3Event } from 'aws-lambda';
// FargateTask is a small helper around the ECS RunTask API; a sketch is shown
// after this listing. The file name ./fargate-task is an assumption.
import { FargateTask } from './fargate-task';

// Configuration passed in from the CDK stack as environment variables.
const FARGATE_CLUSTER = process.env.FARGATE_CLUSTER ?? '';
const FARGATE_TASK_DEFINITION = process.env.FARGATE_TASK_DEFINITION ?? '';
const FARGATE_CONTAINER_NAME = process.env.FARGATE_CONTAINER_NAME ?? '';
const FARGATE_SUBNET_IDS = process.env.FARGATE_SUBNET_IDS ?? '';

/**
 * handle the event fired when a new video is uploaded to the bucket
 * @param event
 */
export const onNewVideoHandler = async (event: S3Event) => {
console.log('onNewVideoHandler', event);
// When a new video is uploaded, run the FFMPEG task on the video file.
// Use the time index specified in the filename (e.g. cat_00-01.mp4 uses timestamp 00:01)
if (!event.Records || event.Records.length === 0) {
console.log('*** New video: no records found.');
return;
}
console.log('environment variables', {
FARGATE_CLUSTER,
FARGATE_TASK_DEFINITION,
FARGATE_CONTAINER_NAME,
FARGATE_SUBNET_IDS
});
// loop through all the records
for (const record of event.Records) {
console.log(
`*** New video: file ${record.s3.object.key} was uploaded at ${record.eventTime}.`
);
const file = record.s3.object.key;
const bucket = record.s3.bucket.name;
// strip the time suffix to build the thumbnail name, e.g. cat_00-01.mp4 -> cat.jpg
const thumbnailFile = file.substring(0, file.lastIndexOf('_')) + '.jpg';
// the video must be uploaded with a time suffix in its name:
// if the file name is cat_00-01.mp4, then framePos is 00:01
let framePos = file
.substring(file.lastIndexOf('_') + 1, file.lastIndexOf('.'))
.replace('-', ':');
// if framePos is not a valid mm:ss time index, fall back to 00:00
if (!framePos.match(/^\d\d:\d\d$/)) {
framePos = '00:00';
}
console.log(
`*** New video: thumbnail file ${thumbnailFile} at ${framePos}.`
);
// run ffmpegThumbnailTask on the video file
const task = new FargateTask({
taskDefinition: FARGATE_TASK_DEFINITION,
cluster: FARGATE_CLUSTER,
networkConfiguration: {
awsvpcConfiguration: {
subnets: FARGATE_SUBNET_IDS.split(','),
assignPublicIp: 'ENABLED'
}
}
});
await task.run({
overrides: {
containerOverrides: [
{
name: FARGATE_CONTAINER_NAME,
environment: [
{
name: 'S3_BUCKET',
value: bucket
},
{
name: 'INPUT_VIDEO',
value: file
},
{
name: 'TIME_OFFSET',
value: framePos
},
{
name: 'OUTPUT_FILE',
value: thumbnailFile
}
]
}
]
}
});
console.log(`*** New video: thumbnail task started for ${file}.`);
}
return {
statusCode: 200
};
};
/**
* handle the event fired when a new thumbnail is created for a video and uploaded to the bucket
* @param event
*/
export const onNewThumbnailHandler = async (event: S3Event) => {
// When a new thumbnail is created, log a message.
console.log('onNewThumbnailHandler', event);
// loop through all the records
for (const record of event.Records) {
console.log(
`*** New thumbnail: file ${record.s3.object.key} was uploaded at ${record.eventTime}.`
);
}
return {
statusCode: 200
};
};
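The handlers above use a FargateTask helper that is not part of the AWS SDK and is not shown in the listing. Here is a minimal sketch of such a helper, assuming the AWS SDK v3 ECS client available in the Node.js 18 runtime; the file name lib/lambda/fargate-task.ts and the exact interface are assumptions, not the author's original implementation:
// lib/lambda/fargate-task.ts (hypothetical helper)
import {
  ECSClient,
  RunTaskCommand,
  NetworkConfiguration,
  TaskOverride
} from '@aws-sdk/client-ecs';

interface FargateTaskProps {
  cluster: string;
  taskDefinition: string;
  networkConfiguration: NetworkConfiguration;
}

export class FargateTask {
  private readonly client = new ECSClient({});

  constructor(private readonly props: FargateTaskProps) {}

  /**
   * Start a one-off Fargate task and return the RunTask response.
   */
  async run(options: { overrides?: TaskOverride } = {}) {
    return this.client.send(
      new RunTaskCommand({
        cluster: this.props.cluster,
        taskDefinition: this.props.taskDefinition,
        launchType: 'FARGATE',
        count: 1,
        networkConfiguration: this.props.networkConfiguration,
        overrides: options.overrides
      })
    );
  }
}
If the NodejsFunction bundler cannot resolve the module, add @aws-sdk/client-ecs to the project's dependencies.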
The Lambda function onNewVideoHandler is triggered whenever a new .mp4 video file is uploaded to the S3 bucket. It extracts the time index encoded in the video filename (in the form file_mm-ss.mp4) and launches the container task, passing the bucket, file name, time offset, and output name as environment variables. The Lambda function onNewThumbnailHandler is triggered when a new .jpg thumbnail file is uploaded to the S3 bucket and simply logs a message. The filename convention is illustrated by the short sketch below.
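To make the naming convention concrete, here is a small standalone sketch that mirrors the parsing done in onNewVideoHandler (the function name parseVideoKey is only for illustration):
// Illustrative only: mirrors the key parsing in onNewVideoHandler.
function parseVideoKey(key: string): { thumbnail: string; timeOffset: string } {
  const thumbnail = key.substring(0, key.lastIndexOf('_')) + '.jpg';
  let timeOffset = key
    .substring(key.lastIndexOf('_') + 1, key.lastIndexOf('.'))
    .replace('-', ':');
  if (!timeOffset.match(/^\d\d:\d\d$/)) {
    timeOffset = '00:00';
  }
  return { thumbnail, timeOffset };
}

// { thumbnail: 'big_buck_bunny.jpg', timeOffset: '00:01' }
console.log(parseVideoKey('big_buck_bunny_00-01.mp4'));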
Deploy the application
Preview and deploy the stack with cdk deploy, which will take a few minutes because the Docker image is built and pushed during deployment. If this is the first time you deploy a CDK app in this account and region, run cdk bootstrap once beforehand.
cdk deploy
Test the application
- Download a sample video file (for example, Big Buck Bunny), rename it so the file name ends with the desired time offset (e.g. big_buck_bunny_00-01.mp4), and upload it to the S3 bucket. You can find the generated bucket name in the S3 console.
aws s3 cp ./images/big_buck_bunny_00-01.mp4 s3://<your-bucket-name>/big_buck_bunny_00-01.mp4
- Check the CloudWatch logs for the Lambda functions onNewVideoHandler and onNewThumbnailHandler. The actual function names are generated by CDK and prefixed with the stack name; look them up in the Lambda console and substitute them in the commands below.
# check the log for the Lambda function that handles new videos
aws logs tail /aws/lambda/<generated-OnNewVideoLambda-name> --follow
# check the log for the Lambda function that handles new thumbnails
aws logs tail /aws/lambda/<generated-OnNewThumbnailLambda-name> --follow
After a few minutes, you should see a new thumbnail big_buck_bunny.jpg in the S3 bucket.
Clean up
That's it! To clean up the resources created by this example, run the following command:
cdk destroy
The source code for this example can be found here