Delete Files from S3 Quickly with Python (asyncio)

Managing large amounts of data on AWS S3 can be challenging, especially when you need to delete a massive number of files quickly. In my case, I had 13 TB of files to delete and needed to speed the process up. Python, together with the asyncio library, provided an efficient solution. Here's a guide to using Python and asyncio to delete files from S3 quickly.

What is Python?

Python is a high-level, interpreted programming language known for its readability and versatility. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Its extensive standard library and community-driven packages make it popular for applications ranging from web development to data science and automation.

What is Asyncio?

asyncio is a module in Python's standard library that provides support for asynchronous programming, allowing you to write concurrent code using the async/await syntax. Asynchronous programming is ideal for I/O-bound tasks, such as network requests, where one task can wait for a response without blocking the execution of other tasks. This can lead to significant performance improvements in scenarios involving many I/O operations, like our use case of deleting files from S3.
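
To see the idea in isolation, here is a minimal, standalone sketch (unrelated to S3; names like fetch are purely illustrative) in which three simulated network calls wait concurrently instead of back to back:

import asyncio

async def fetch(name, delay):
    # asyncio.sleep stands in for a network request; while one task
    # waits, the event loop is free to run the others.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # All three simulated requests wait at the same time, so this
    # finishes in about 1 second rather than 3.
    results = await asyncio.gather(fetch("a", 1), fetch("b", 1), fetch("c", 1))
    print(results)

asyncio.run(main())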

Setting Up Your Environment

To get started, ensure you have Python installed; you can download it from the official website. Additionally, you'll need the aioboto3 library, an asynchronous wrapper around the boto3 SDK for AWS services.

You can install aioboto3 using pip:

pip install aioboto3

The Code

Here’s the complete Python script to delete files from S3 using asyncio:

import asyncio
import getpass

import aioboto3
from botocore.exceptions import ClientError

# Buckets to empty; replace these with your own bucket names.
bucket_names = [
    "my-bucket-1",
    "my-bucket-2"
]

async def list_objects(s3_client, bucket_name, prefix):
    # Collect every key under the prefix, one page at a time.
    objects = []
    try:
        paginator = s3_client.get_paginator('list_objects_v2')
        async for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
            for obj in page.get('Contents', []):
                objects.append(obj['Key'])
    except ClientError as e:
        print(f"{bucket_name} {prefix}: error listing objects: {e}")
    return objects

async def delete_objects(s3_client, bucket_name, objects):
    try:
        # delete_objects accepts up to 1,000 keys per request;
        # this script sends smaller batches of 10 at a time.
        delete_requests = [{'Key': obj} for obj in objects]
        await s3_client.delete_objects(
            Bucket=bucket_name,
            Delete={'Objects': delete_requests}
        )
        print(f"{bucket_name}: deleted {len(objects)} objects")
    except ClientError as e:
        print(f"{bucket_name}: error deleting objects: {e}")

async def batch_delete(bucket_name, prefix, aws_access_key_id, aws_secret_access_key, aws_region):
    session = aioboto3.Session(
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        region_name=aws_region
    )
    async with session.client('s3') as s3_client:
        while True:
            # Relist the prefix on each pass and stop once it is empty.
            objects = await list_objects(s3_client, bucket_name, prefix)
            if not objects:
                print(f"{bucket_name} {prefix}: No more objects to delete.")
                break
            # Split the keys into batches of 10 and delete them concurrently.
            tasks = []
            for i in range(0, len(objects), 10):
                batch = objects[i:i + 10]
                tasks.append(delete_objects(s3_client, bucket_name, batch))
            await asyncio.gather(*tasks)
            await asyncio.sleep(1)  # optional delay between passes

if __name__ == "__main__":
    aws_region = input("Enter the AWS region: ")
    aws_access_key_id = input("Enter your AWS Access Key ID: ")
    # getpass keeps the secret key from being echoed to the terminal.
    aws_secret_access_key = getpass.getpass("Enter your AWS Secret Access Key: ")
    prefix = ""  # an empty prefix matches every object in the bucket

    for bucket_name in bucket_names:
        asyncio.run(batch_delete(bucket_name, prefix, aws_access_key_id, aws_secret_access_key, aws_region))

How to Run the Script

  1. Save the code to a file, e.g., s3_delete.py.
  2. If you haven't already, install the required library using pip: pip3 install aioboto3.
  3. Run the script: python3 s3_delete.py. It will prompt you for your AWS region, access key ID, and secret access key. Those credentials need permission to list and delete objects (s3:ListBucket and s3:DeleteObject) in the specified buckets. If you'd rather not type credentials at all, see the sketch below.
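
Prompting for keys keeps the example simple, but it isn't the only option. If the Session is created without explicit keys, aioboto3 falls back to boto3's default credential chain (environment variables, ~/.aws/credentials, instance roles, and so on). A minimal sketch, assuming that chain is already configured; "us-east-1" is just a placeholder region:

import asyncio
import aioboto3

async def main():
    # No keys passed in: aioboto3 resolves credentials through
    # boto3's default credential chain instead.
    session = aioboto3.Session()
    async with session.client('s3', region_name='us-east-1') as s3_client:
        response = await s3_client.list_buckets()
        print([bucket['Name'] for bucket in response.get('Buckets', [])])

asyncio.run(main())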

Explanation

  1. Listing Objects: The list_objects function uses aioboto3 to asynchronously paginate through the objects under the specified S3 bucket and prefix, collecting their keys.
  2. Deleting Objects: The delete_objects function sends a batch delete request to S3, removing up to 10 objects per request. (The API itself accepts up to 1,000 keys per call; see the sketch after this list.)
  3. Batch Deletion: The batch_delete function manages the asynchronous listing and deletion process. It creates a session with your AWS credentials, lists the objects, splits them into batches, and deletes the batches concurrently, repeating until the prefix is empty.
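
The batch size of 10 used above is conservative: a single delete_objects call accepts up to 1,000 keys, so fewer, larger requests can reduce per-request overhead. A minimal sketch of that variation (delete_in_batches is a hypothetical helper name, reusing delete_objects from the script above):

async def delete_in_batches(s3_client, bucket_name, objects, batch_size=1000):
    # 1,000 keys is the documented per-request maximum for
    # delete_objects; smaller values trade throughput for smaller,
    # retry-friendlier requests.
    tasks = [
        delete_objects(s3_client, bucket_name, objects[i:i + batch_size])
        for i in range(0, len(objects), batch_size)
    ]
    await asyncio.gather(*tasks)

Inside batch_delete, the inner for loop could then be replaced with a single await delete_in_batches(s3_client, bucket_name, objects) call.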

By using asyncio and aioboto3, this script efficiently handles the deletion of large numbers of files from S3, significantly speeding up the process compared to a synchronous approach.

Wrapping Up

Python and asyncio provide powerful tools for managing and automating large-scale data operations. This script demonstrates how to leverage them to delete files from AWS S3 quickly and efficiently. Whether you're clearing terabytes of data or a smaller dataset, this approach can help you save time and resources.

Feel free to adapt and expand this script to suit your specific needs. Happy coding!
