CtrlB 5TB Benchmark Logs Dataset

This is a public benchmark logs dataset hosted on Amazon S3. It is intended for testing log ingestion, search, compression, storage, and observability pipeline performance.

Bucket URL:
https://ctrlb-5tb-benchmark-logs-public.s3.ap-south-1.amazonaws.com

Browse the dataset:

A web file browser lets you explore the bucket like the AWS Console: navigate folders and click any file to download it.

Dataset Structure

logs-benchmark/YYYY/MM/DD/HH/*.log.gz

manifests/
  all-files.txt
  all-files.txt.gz
  samples-1gb.txt
  samples-5gb.txt

samples/
  1GB/
  5GB/

Quick Start: Download Samples

1 GB sample

aws s3 cp \
  s3://ctrlb-5tb-benchmark-logs-public/samples/1GB/ \
  ./ctrlb-logs-sample-1gb \
  --recursive \
  --no-sign-request

5 GB sample

aws s3 cp \
  s3://ctrlb-5tb-benchmark-logs-public/samples/5GB/ \
  ./ctrlb-logs-sample-5gb \
  --recursive \
  --no-sign-request

Download Using Manifests

Manifest files contain direct HTTPS URLs to log files. Downloading from a manifest works even if public S3 listing is disabled, because each file is fetched by its full URL.

curl -O https://ctrlb-5tb-benchmark-logs-public.s3.ap-south-1.amazonaws.com/manifests/all-files.txt.gz

gunzip all-files.txt.gz

# Read one URL per line; IFS= and -r keep each line exactly as written
while IFS= read -r url; do
  wget -c "$url"
done < all-files.txt
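
Downloading the whole manifest one file at a time can take a long time. A possible variation, sketched below, narrows the manifest to one day and downloads several files in parallel. It assumes the manifest URLs follow the `logs-benchmark/YYYY/MM/DD/HH/` layout described earlier; the function names, the date, and the destination directory are hypothetical.

```shell
# Print only the manifest URLs that fall under one day's partition.
# Usage: filter_manifest_by_day MANIFEST YYYY MM DD
filter_manifest_by_day() {
  grep -F "/logs-benchmark/$2/$3/$4/" "$1"
}

# Download URLs read from stdin into DEST, running up to JOBS wget processes.
# -x -nH recreates the remote path under DEST, so files with the same name
# in different hourly folders do not overwrite each other.
# Usage: ... | download_urls DEST [JOBS]
download_urls() {
  xargs -n 1 -P "${2:-4}" wget -c -x -nH -P "$1"
}

# Example (hypothetical date and destination):
# filter_manifest_by_day all-files.txt 2024 01 15 | download_urls ./ctrlb-logs 4
```

As in the loop above, `wget -c` resumes partial downloads, so the pipeline can be re-run after an interruption.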

Download Full Dataset

If public bucket listing is enabled, you can use:

aws s3 sync \
  s3://ctrlb-5tb-benchmark-logs-public/logs-benchmark/ \
  ./ctrlb-5tb-benchmark-logs \
  --no-sign-request

If listing is not enabled, use the manifest-based download method above.

File Format

Log files are gzip-compressed:

*.log.gz

To inspect a file:

gunzip -c file.log.gz | head
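
After a large download it can be worth confirming that every archive is intact before benchmarking against it. The following is a minimal sketch using `gzip -t`, which tests an archive without writing the decompressed data to disk. The function name `check_logs` and the output format are illustrative choices, not part of the dataset tooling.

```shell
# Report each .log.gz under DIR as "OK <line count> <path>" or "CORRUPT <path>".
# Usage: check_logs DIR
check_logs() {
  find "$1" -type f -name '*.log.gz' | sort | while IFS= read -r f; do
    if gzip -t "$f" 2>/dev/null; then
      # Count decompressed lines; tr strips the padding some wc versions add
      printf 'OK %s %s\n' "$(gunzip -c "$f" | wc -l | tr -d ' ')" "$f"
    else
      printf 'CORRUPT %s\n' "$f"
    fi
  done
}

# Example (hypothetical directory):
# check_logs ./ctrlb-logs-sample-1gb
```

Files reported as CORRUPT can be deleted and fetched again with the download commands shown earlier.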

Recommended Usage