diff --git a/README.md b/README.md
index ee41e9bf..4bf2675c 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,30 @@ Here is an illustration showing how the `StreamingDataset` works.
 
 ![An illustration showing how the Streaming Dataset works.](https://pl-flash-data.s3.amazonaws.com/streaming_dataset.gif)
 
+
+# 🚀 Benchmarks
+
+[ImageNet-1.2M](https://www.image-net.org/) is a widely used dataset for comparing computer vision deep learning models. Its training set contains 1,281,167 images. See the [full benchmark details](https://lightning.ai/lightning-ai/studios/benchmark-cloud-data-loading-libraries).
+
+### ImageNet-1.2M Streaming from AWS S3
+
+| Framework | Images / sec 1st Epoch (float32) | Images / sec 2nd Epoch (float32) | Images / sec 1st Epoch (torch16) | Images / sec 2nd Epoch (torch16) |
+|---|---|---|---|---|
+| PL Data | 5800.34 | 6589.98 | 6282.17 | 7221.88 |
+| Web Dataset | 3134.42 | 3924.95 | 3343.40 | 4424.62 |
+| Mosaic ML | 2898.61 | 5099.93 | 2809.69 | 5158.98 |
+
+
+### ImageNet-1.2M Preparation
+
+The dataset's underlying format must be converted to one optimized for streaming from cloud storage. We measure how long each framework takes to convert the 1.2 million images.
+
+| Framework | Train Preparation Time | Val Preparation Time | Dataset Size | # Files |
+|---|---|---|---|---|
+| PL Data | 10:05 min | 00:30 min | 143.1 GB | 2,339 |
+| Web Dataset | 32:36 min | 01:22 min | 147.8 GB | 1,144 |
+| Mosaic ML | 49:49 min | 01:04 min | 143.1 GB | 2,298 |
+
 # 🎬 Getting Started
 
 ## 💾 Installation