Automated Monthly EC2 Snapshot Backup with AWS Lambda + CloudWatch Events



In cloud architecture, backups are critical to data security. AWS EC2 snapshots provide an easy way to back up the volumes of an EC2 instance, but creating them by hand is time-consuming and error-prone. Using AWS Lambda with CloudWatch Events (now Amazon EventBridge) to run the backups on a schedule removes that manual step.

This article will guide you step-by-step on how to:

  • Create a Lambda function to execute snapshot operations
  • Configure CloudWatch Events to schedule monthly backups
  • Verify that the snapshots were created successfully

Why Automate EC2 Backups?

Use Cases:

  • Disaster Recovery: Quickly restore EC2 instances from snapshots when failures occur
  • Compliance Requirements: Many industry regulations require regular data backups
  • Data Migration: Migrate EC2 instances to different regions or accounts
  • Version Management: Maintain system states at different points in time

Why Choose Lambda + CloudWatch Events?

  • Serverless Architecture: No need to manage backup execution servers
  • Cost-Effective: Pay only for execution time, no cost when idle
  • High Reliability: Lambda and EventBridge are AWS-managed services designed for high availability
  • Flexible Scheduling: Configure monthly, weekly, daily, or custom frequencies

Architecture Overview

System Components:

  • Lambda Function: Core logic for snapshot operations
  • CloudWatch Events (EventBridge): Define schedule (e.g., monthly execution)
  • IAM Role: Provide Lambda with necessary EC2 permissions
  • CloudWatch Logs: Record execution logs and error messages
  • EC2 Snapshots: Store backup data

Workflow:

  1. CloudWatch Events triggers Lambda function based on schedule
  2. Lambda function obtains EC2 permissions through IAM role
  3. Lambda reads all volumes from specified EC2 instance
  4. Creates snapshot for each volume
  5. Verifies snapshot status until completion
  6. Records execution results to CloudWatch Logs

Step 1: Create Lambda Function

  1. Log in to AWS Lambda Console
  2. Click Create function
  3. Configure the following parameters:
    • Function name: MonthlyEC2Backup
    • Runtime: Python 3.12 (or latest version)
    • Permissions: Select Create a new role with basic Lambda permissions
  4. Click Create function

Step 2: Write Lambda Code

In the Function code section, add the following code:

import boto3
import datetime
import time

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')

    # Set EC2 instance IDs to backup
    instances = ['<INSTANCE_ID>']  # Replace with your EC2 instance ID

    # Snapshots that ended in the 'error' state
    failed = []

    for instance_id in instances:
        # Create snapshot description
        description = f"Backup of {instance_id} - {datetime.datetime.now().strftime('%Y-%m-%d')}"

        # Get all volumes associated with the instance
        volumes = ec2.describe_volumes(Filters=[
            {'Name': 'attachment.instance-id', 'Values': [instance_id]}
        ])

        # Create snapshots for each volume and check status
        for volume in volumes['Volumes']:
            volume_id = volume['VolumeId']

            print(f"Creating snapshot for volume: {volume_id}")

            # Create snapshot
            response = ec2.create_snapshot(
                VolumeId=volume_id,
                Description=description
            )

            snapshot_id = response['SnapshotId']
            print(f"Snapshot {snapshot_id} created for volume {volume_id}")

            # Verify snapshot status (polls until the snapshot reaches a final state)
            print(f"Verifying snapshot {snapshot_id} status...")
            while True:
                snapshot_status = ec2.describe_snapshots(
                    SnapshotIds=[snapshot_id]
                )['Snapshots'][0]['State']

                if snapshot_status == 'completed':
                    print(f"Snapshot {snapshot_id} for volume {volume_id} completed successfully!")
                    break
                elif snapshot_status == 'error':
                    print(f"Snapshot {snapshot_id} for volume {volume_id} failed!")
                    failed.append(snapshot_id)
                    break
                else:
                    print(f"Snapshot {snapshot_id} is in progress...")
                    time.sleep(5)

    # Report failures instead of always claiming success
    if failed:
        return {
            'statusCode': 500,
            'body': f"EC2 backup finished with failed snapshots: {', '.join(failed)}"
        }

    return {
        'statusCode': 200,
        'body': 'EC2 backup completed successfully'
    }

Code Explanation:

  • boto3.client('ec2'): Create EC2 client
  • describe_volumes(): Get all volumes from EC2 instance
  • create_snapshot(): Create snapshot
  • describe_snapshots(): Check snapshot status
  • time.sleep(5): Check status every 5 seconds
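
The polling loop in the function above keeps running until the snapshot reaches a final state, so a slow snapshot can outlast the function's timeout. One possible variation, sketched here, stops polling shortly before the Lambda deadline. The helper name poll_snapshot, the safety_ms margin, and the injectable sleep parameter are illustrative assumptions, not part of the original function. remaining_ms is expected to be a callable such as context.get_remaining_time_in_millis, and ec2 a client from boto3.client('ec2').

```python
import time


def poll_snapshot(ec2, snapshot_id, remaining_ms, interval=5,
                  safety_ms=10_000, sleep=time.sleep):
    """Poll a snapshot until it finishes or the time budget runs low.

    ec2          -- an EC2 client, e.g. boto3.client('ec2')
    remaining_ms -- callable returning the milliseconds left, e.g.
                    context.get_remaining_time_in_millis in Lambda
    Returns 'completed', 'error', or 'pending' if time ran out first.
    """
    while True:
        state = ec2.describe_snapshots(
            SnapshotIds=[snapshot_id]
        )['Snapshots'][0]['State']

        if state in ('completed', 'error'):
            return state

        # Stop early if another sleep would eat into the safety margin
        if remaining_ms() < safety_ms + interval * 1000:
            return 'pending'

        sleep(interval)
```

Inside lambda_handler this could be called as poll_snapshot(ec2, snapshot_id, context.get_remaining_time_in_millis). A 'pending' result means the snapshot is still being created in the background, not that it failed.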

Important: Replace <INSTANCE_ID> with your actual instance ID (e.g., i-0abcd1234efgh5678), then click Deploy.

Step 3: Configure IAM Permissions

  1. Go to the Lambda function’s Configuration → Permissions tab
  2. Click the execution role link
  3. In the IAM Console, click Add permissions → Create inline policy
  4. Use the following JSON policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:CreateSnapshot",
                "ec2:DescribeSnapshots",
                "ec2:CreateTags"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}

Permission Explanation:

  • ec2:DescribeVolumes: Read volume information
  • ec2:CreateSnapshot: Create snapshots
  • ec2:DescribeSnapshots: Check snapshot status
  • ec2:CreateTags: Add tags to snapshots (optional)
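  • logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents: Write to CloudWatch Logs (the basic execution role created in Step 1 already grants these, so this statement is optional)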
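
The policy above allows the logs actions on every log group in every region. As a rough sketch of a narrower variant, the helper below builds the same policy as a Python dict but scopes log writes to the function's own log group. The helper name build_backup_policy and the region and account ID in the usage lines are hypothetical; the EC2 statement keeps Resource "*" because the EC2 Describe actions do not support resource-level permissions.

```python
import json


def build_backup_policy(region, account_id, function_name):
    """Return the inline policy as a dict, with the logs statements narrowed."""
    account_logs_arn = f"arn:aws:logs:{region}:{account_id}:*"
    log_group_arn = (
        f"arn:aws:logs:{region}:{account_id}:"
        f"log-group:/aws/lambda/{function_name}:*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Same EC2 actions as the policy in the article
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeVolumes",
                    "ec2:CreateSnapshot",
                    "ec2:DescribeSnapshots",
                    "ec2:CreateTags",
                ],
                "Resource": "*",
            },
            {
                # Creating the log group is allowed within this account/region
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": account_logs_arn,
            },
            {
                # Writing log events is limited to this function's log group
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": log_group_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical region and account ID, for illustration only
    policy = build_backup_policy("us-east-1", "123456789012", "MonthlyEC2Backup")
    print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the inline policy editor in place of the broader version.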

Step 4: Create CloudWatch Events Schedule

  1. Go to Amazon EventBridge Console → Rules
  2. Click Create rule
  3. Configure rule:
    • Name: MonthlyEC2BackupSchedule
    • Rule type: Schedule
  4. Set schedule pattern:
    • Select Cron-based schedule
    • Enter schedule expression: cron(0 0 1 * ? *)
  5. Select target:
    • Target: Lambda function
    • Function: MonthlyEC2Backup
  6. Click Create rule

Cron Expression Explanation:

cron(0 0 1 * ? *)
     │ │ │ │ │ │
     │ │ │ │ │ └─ Year (* means any year)
     │ │ │ │ └─── Day of week (? means not specified)
     │ │ │ └───── Month (* means every month)
     │ │ └─────── Day (1 means 1st of month)
     │ └───────── Hour (0 means midnight, in UTC)
     └─────────── Minute (0 means on the hour)

Other Common Schedule Examples (all times are UTC):

  • Every Sunday at 2 AM: cron(0 2 ? * SUN *)
  • Every day at 3 AM: cron(0 3 * * ? *)
  • 15th of every month at 1 AM: cron(0 1 15 * ? *)
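
The console steps in this section can also be scripted. The sketch below assumes events is boto3.client('events') and lambda_client is boto3.client('lambda'); the function name create_monthly_schedule, the target Id, and the StatementId are illustrative choices. Unlike the console, the API does not grant EventBridge permission to invoke the function automatically, so the sketch adds that permission explicitly.

```python
def create_monthly_schedule(events, lambda_client, function_arn,
                            rule_name='MonthlyEC2BackupSchedule',
                            schedule='cron(0 0 1 * ? *)'):
    """Create the schedule rule and point it at the Lambda function.

    events        -- an EventBridge client, e.g. boto3.client('events')
    lambda_client -- a Lambda client, e.g. boto3.client('lambda')
    Returns the ARN of the rule.
    """
    # Create (or update) the scheduled rule; the expression is evaluated in UTC
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State='ENABLED',
    )['RuleArn']

    # Allow EventBridge to invoke the function from this rule
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f'{rule_name}-invoke',
        Action='lambda:InvokeFunction',
        Principal='events.amazonaws.com',
        SourceArn=rule_arn,
    )

    # Register the function as the rule's target
    events.put_targets(
        Rule=rule_name,
        Targets=[{'Id': 'backup-lambda', 'Arn': function_arn}],
    )
    return rule_arn
```

Running this a second time with the same rule name is expected to fail at add_permission, because the statement ID already exists on the function; a production script would need to handle that case.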

Step 5: Testing and Verification

Test Lambda Function

  1. In Lambda Console, click Test
  2. Create test event (can use empty JSON: {})
  3. Execute test and check output

Check CloudWatch Logs

  1. Go to CloudWatch Console → Log groups
  2. Find /aws/lambda/MonthlyEC2Backup
  3. Check execution logs for error messages

Verify Snapshots

  1. Go to EC2 Console → Snapshots
  2. Confirm snapshot status is Completed
  3. Check snapshot description and creation time
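
The same check can be done from code. This is a minimal sketch, assuming ec2 is boto3.client('ec2'); the function name latest_snapshot_is_fresh and the 35-day threshold (a little over one monthly cycle) are illustrative. It looks only at the first page of results, which is an assumption that holds for volumes with a modest number of snapshots.

```python
import datetime


def latest_snapshot_is_fresh(ec2, volume_id, max_age_days=35, now=None):
    """Return True if the newest snapshot of a volume is completed and recent."""
    now = now or datetime.datetime.now(datetime.timezone.utc)

    # List snapshots owned by this account for the given volume
    snapshots = ec2.describe_snapshots(
        OwnerIds=['self'],
        Filters=[{'Name': 'volume-id', 'Values': [volume_id]}],
    )['Snapshots']
    if not snapshots:
        return False

    # StartTime is a timezone-aware datetime, so it can be compared directly
    latest = max(snapshots, key=lambda snap: snap['StartTime'])
    age = now - latest['StartTime']
    return (latest['State'] == 'completed'
            and age <= datetime.timedelta(days=max_age_days))
```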

Common Issues and Solutions

Issue 1: Lambda Timeout

Cause:

  • Default Lambda timeout is 3 seconds
  • Creating large snapshots requires more time

Solution:

  • Increase Timeout under Lambda Configuration → General configuration to 5-10 minutes (the maximum is 15 minutes)
  • Or remove the snapshot status verification loop and let the snapshots complete asynchronously
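
The asynchronous option could look like the following sketch: it starts a snapshot for every attached volume and returns the snapshot IDs without waiting. create_snapshot returns as soon as the snapshot is registered in the pending state, and AWS finishes it in the background. The function name start_snapshots is illustrative, and ec2 is assumed to be boto3.client('ec2').

```python
def start_snapshots(ec2, instance_id, description):
    """Start one snapshot per attached volume and return the snapshot IDs."""
    volumes = ec2.describe_volumes(Filters=[
        {'Name': 'attachment.instance-id', 'Values': [instance_id]}
    ])['Volumes']

    snapshot_ids = []
    for volume in volumes:
        # Returns immediately; the snapshot is still 'pending' at this point
        response = ec2.create_snapshot(
            VolumeId=volume['VolumeId'],
            Description=description,
        )
        snapshot_ids.append(response['SnapshotId'])
    return snapshot_ids
```

With this approach the function usually finishes in a few seconds, but success then has to be confirmed separately, for example with the console check from Step 5.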

Issue 2: Permission Denied Error

Error Message:

An error occurred (UnauthorizedOperation) when calling the CreateSnapshot operation

Solution:

  • Confirm IAM role includes ec2:CreateSnapshot permission
  • Check for SCP (Service Control Policy) restrictions
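
One way to test the permission without creating anything is the DryRun flag that EC2 API calls accept: a permitted call fails with the error code DryRunOperation, and a denied one with UnauthorizedOperation. The sketch below relies on that behavior. The function name can_create_snapshot is illustrative, and ec2 is assumed to be boto3.client('ec2'). boto3 raises botocore.exceptions.ClientError here; the sketch reads the error code from the exception's response attribute rather than importing botocore, which is a simplification for this example.

```python
def can_create_snapshot(ec2, volume_id):
    """Return True if the current role may call CreateSnapshot on the volume."""
    try:
        ec2.create_snapshot(VolumeId=volume_id, DryRun=True)
    except Exception as exc:
        # ClientError carries the parsed error under .response['Error']['Code']
        code = getattr(exc, 'response', {}).get('Error', {}).get('Code')
        if code == 'DryRunOperation':
            return True   # the request would have succeeded
        if code == 'UnauthorizedOperation':
            return False  # IAM or an SCP denies the action
        raise             # anything else is an unrelated failure
    # A dry run is expected to raise, so reaching this line is unexpected
    raise RuntimeError('DryRun call returned without an error')
```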

Issue 3: How to Backup Multiple EC2 Instances?

Solution:

Add the additional instance IDs to the instances list:

instances = [
    'i-0abcd1234efgh5678',
    'i-0ijkl5678mnop1234',
    'i-0qrst9012uvwx3456'
]

Issue 4: How to Automatically Clean Up Old Snapshots?

Solution:

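Add cleanup logic to the Lambda function. The execution role also needs the ec2:DeleteSnapshot permission, which the Step 3 policy does not include. The filter limits deletion to snapshots whose description starts with "Backup of", the prefix this function uses, so unrelated snapshots in the account are left alone:

# Delete this function's snapshots older than 30 days
# (with monthly backups, choose a retention longer than one cycle if you want more than one copy)
retention_days = 30
cutoff_date = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=retention_days)

# Page through all matching snapshots owned by this account
paginator = ec2.get_paginator('describe_snapshots')
pages = paginator.paginate(
    OwnerIds=['self'],
    Filters=[{'Name': 'description', 'Values': ['Backup of *']}]
)
for page in pages:
    for snapshot in page['Snapshots']:
        # StartTime is timezone-aware (UTC), so compare it with an aware cutoff
        if snapshot['StartTime'] < cutoff_date:
            print(f"Deleting old snapshot: {snapshot['SnapshotId']}")
            ec2.delete_snapshot(SnapshotId=snapshot['SnapshotId'])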

Cost Considerations

Cost Components:

  • Lambda Execution: Free tier includes 1 million requests + 400,000 GB-seconds compute time per month
  • EBS Snapshot Storage: Approximately $0.05 per GB-month of stored snapshot data (varies by region)
  • CloudWatch Logs: Approximately $0.50 per GB ingested (varies by region)

Cost Estimation Example:

Assuming backup of 1 EC2 instance with 100GB, executed once per month:

  • Lambda execution cost: Nearly $0 (within free tier)
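  • Snapshot storage cost: 100GB × $0.05 = $5.00/month (an upper bound for a single snapshot; snapshots are incremental, so later ones store only changed blocks)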
  • CloudWatch Logs: Negligible (< $0.10)
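
The arithmetic above can be wrapped in a small helper to try other volume sizes and retention counts. This is a rough model, not AWS's billing formula: it assumes the first snapshot stores the full volume size and each additional retained snapshot stores a fixed fraction of changed data. The function name, the change_rate parameter, and the default price of $0.05 per GB-month are assumptions taken from or added to the figures in this section.

```python
def estimate_snapshot_cost(volume_gb, retained=1, change_rate=0.0,
                           price_per_gb_month=0.05):
    """Rough monthly storage cost (USD) for a chain of incremental snapshots.

    Assumes the first snapshot stores volume_gb in full and every further
    retained snapshot stores change_rate * volume_gb of changed blocks.
    """
    if retained < 1:
        return 0.0
    stored_gb = volume_gb + (retained - 1) * volume_gb * change_rate
    return round(stored_gb * price_per_gb_month, 2)
```

For the example in this section, estimate_snapshot_cost(100) gives 5.0. Keeping three snapshots with 10% of the data changing between them, estimate_snapshot_cost(100, retained=3, change_rate=0.1), gives 6.0 under this model.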

Cost Optimization Recommendations:

  • Regularly clean up old snapshots (e.g., retain last 3 months)
  • Use EBS Snapshot Lifecycle Management (Data Lifecycle Manager)
  • Evaluate need for cross-region snapshot replication (adds transfer costs)

Advanced Optimizations

1. Add SNS Notifications

Send notification after snapshot completion:

sns = boto3.client('sns')
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:EC2BackupNotification',
    Subject='EC2 Backup Completed',
    Message=f'Snapshot {snapshot_id} created successfully'
)
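
When several volumes are backed up in one run, a single summary message may be more useful than one message per snapshot. The sketch below assumes sns is boto3.client('sns') and that results is a dict mapping each snapshot ID to its final state, which the handler would have to collect; the function name notify_backup_result is illustrative. The execution role also needs sns:Publish on the topic, which the Step 3 policy does not grant.

```python
def notify_backup_result(sns, topic_arn, results):
    """Publish one summary message for a backup run and return the subject.

    results -- dict mapping snapshot ID to its final state,
               e.g. {'snap-0123': 'completed'}
    """
    failed = [snap for snap, state in results.items() if state != 'completed']
    subject = 'EC2 Backup Failed' if failed else 'EC2 Backup Completed'

    # One line per snapshot, sorted for stable output
    lines = [f'{snap}: {state}' for snap, state in sorted(results.items())]
    sns.publish(
        TopicArn=topic_arn,
        Subject=subject,
        Message='\n'.join(lines),
    )
    return subject
```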

2. Add Tags to Snapshots

ec2.create_tags(
    Resources=[snapshot_id],
    Tags=[
        {'Key': 'Name', 'Value': f'Backup-{instance_id}'},
        {'Key': 'CreatedBy', 'Value': 'Lambda'},
        {'Key': 'BackupDate', 'Value': datetime.datetime.now().strftime('%Y-%m-%d')}
    ]
)
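
As an alternative to a separate create_tags call, create_snapshot accepts a TagSpecifications parameter that applies the tags in the same request, so a snapshot never exists untagged. A minimal sketch, assuming ec2 is boto3.client('ec2'); the function name create_tagged_snapshot is illustrative, and the tag keys mirror the ones above. The ec2:CreateTags permission is still required.

```python
import datetime


def create_tagged_snapshot(ec2, volume_id, instance_id, description):
    """Create a snapshot and tag it in the same API call."""
    today = datetime.datetime.now(datetime.timezone.utc).strftime('%Y-%m-%d')
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=description,
        TagSpecifications=[{
            'ResourceType': 'snapshot',
            'Tags': [
                {'Key': 'Name', 'Value': f'Backup-{instance_id}'},
                {'Key': 'CreatedBy', 'Value': 'Lambda'},
                {'Key': 'BackupDate', 'Value': today},
            ],
        }],
    )
    return response['SnapshotId']
```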

3. Use Environment Variables

Set instance IDs as environment variables to avoid hardcoding:

import os
instances = os.environ['INSTANCE_IDS'].split(',')
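
A plain split(',') keeps stray spaces and empty entries, and os.environ[...] raises a bare KeyError if the variable is missing. A slightly more defensive sketch is shown here; the function names are illustrative, and INSTANCE_IDS is the variable name used above.

```python
import os


def parse_instance_ids(raw):
    """Split a comma-separated string into clean instance IDs."""
    parts = [part.strip() for part in raw.split(',')]
    return [part for part in parts if part]


def instance_ids_from_env(name='INSTANCE_IDS'):
    """Read instance IDs from an environment variable, failing clearly if unset."""
    ids = parse_instance_ids(os.environ.get(name, ''))
    if not ids:
        raise ValueError(f'Environment variable {name} is missing or empty')
    return ids
```

In the handler, instances = instance_ids_from_env() would replace the hardcoded list.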

Conclusion

Through this implementation, you have learned to:

  • Build automated EC2 backup system: Using Lambda + CloudWatch Events
  • Configure appropriate IAM permissions: Grant only the EC2 and CloudWatch Logs actions the function needs
  • Verify snapshot status: Ensuring backup completion
  • Handle common issues: Resolve permissions, timeouts, etc.
  • Optimize costs: Reduce expenses through automatic cleanup of old snapshots

Next Steps:

  • Implement cross-region snapshot replication for enhanced disaster recovery
  • Integrate AWS Backup service for centralized management
  • Configure CloudWatch alarms to monitor backup failures
  • Establish snapshot restore testing procedures to verify backup usability
