Commit

update
tchaton committed Feb 21, 2024
1 parent 014d75a commit 0d18e1b
Showing 1 changed file with 24 additions and 0 deletions.
24 changes: 24 additions & 0 deletions README.md
@@ -23,6 +23,30 @@ Here is an illustration showing how the `StreamingDataset` works.

![An illustration showing how the Streaming Dataset works.](https://pl-flash-data.s3.amazonaws.com/streaming_dataset.gif)


# 🚀 Benchmarks

[ImageNet-1.2M](https://www.image-net.org/) is a widely used dataset for comparing computer vision deep learning models. Its training set contains 1,281,167 images. The full benchmark details are available [here](https://lightning.ai/lightning-ai/studios/benchmark-cloud-data-loading-libraries).

### ImageNet-1.2M Streaming from AWS S3

| Framework | Images / sec 1st Epoch (float32) | Images / sec 2nd Epoch (float32) | Images / sec 1st Epoch (float16) | Images / sec 2nd Epoch (float16) |
|---|---|---|---|---|
| PL Data | 5800.34 | 6589.98 | 6282.17 | 7221.88 |
| Web Dataset | 3134.42 | 3924.95 | 3343.40 | 4424.62 |
| Mosaic ML | 2898.61 | 5099.93 | 2809.69 | 5158.98 |
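
To make the table easier to compare, the following minimal sketch computes how much faster one framework is than another in each column. The figures are copied from the table above; the names `THROUGHPUT` and `relative_throughput` are hypothetical and used only for illustration, and the script is not part of the benchmark itself.

```python
# Images/sec copied from the streaming benchmark table, in column order:
# (32-bit epoch 1, 32-bit epoch 2, 16-bit epoch 1, 16-bit epoch 2).
THROUGHPUT = {
    "PL Data": (5800.34, 6589.98, 6282.17, 7221.88),
    "Web Dataset": (3134.42, 3924.95, 3343.40, 4424.62),
    "Mosaic ML": (2898.61, 5099.93, 2809.69, 5158.98),
}


def relative_throughput(candidate: str, baseline: str) -> tuple:
    """Return candidate/baseline throughput ratios, one per table column."""
    return tuple(
        c / b for c, b in zip(THROUGHPUT[candidate], THROUGHPUT[baseline])
    )


if __name__ == "__main__":
    # Print the per-column ratio of PL Data against each other framework.
    for baseline in ("Web Dataset", "Mosaic ML"):
        ratios = relative_throughput("PL Data", baseline)
        print(baseline, [f"{r:.2f}x" for r in ratios])
```

A ratio above 1 means the candidate streamed more images per second than the baseline in that column.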


### ImageNet-1.2M Preparation

The dataset's underlying format must be converted to one optimized for streaming from cloud storage. We measure how long it takes to convert the 1.2 million images.

| Framework | Train Preparation Time | Val Preparation Time | Dataset Size | # Files |
|---|---|---|---|---|
| PL Data | 10:05 min | 00:30 min | 143.1 GB | 2,339 |
| Web Dataset | 32:36 min | 01:22 min | 147.8 GB | 1,144 |
| Mosaic ML | 49:49 min | 01:04 min | 143.1 GB | 2,298 |
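
The preparation times can be compared the same way. This is a rough sketch that assumes the `MM:SS min` cells mean minutes and seconds, which the table does not state explicitly; `TRAIN_PREP`, `to_seconds`, and `slowdown_vs` are hypothetical names introduced for illustration.

```python
# Train preparation times copied from the table above.
# Assumption: "MM:SS min" denotes minutes and seconds.
TRAIN_PREP = {
    "PL Data": "10:05 min",
    "Web Dataset": "32:36 min",
    "Mosaic ML": "49:49 min",
}


def to_seconds(cell: str) -> int:
    """Convert a table cell such as '10:05 min' to a number of seconds."""
    minutes, seconds = cell.split()[0].split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown_vs(reference: str) -> dict:
    """Return each framework's train preparation time relative to `reference`."""
    ref = to_seconds(TRAIN_PREP[reference])
    return {name: to_seconds(cell) / ref for name, cell in TRAIN_PREP.items()}


if __name__ == "__main__":
    for name, ratio in slowdown_vs("PL Data").items():
        print(f"{name}: {ratio:.2f}x the PL Data train preparation time")
```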

# 🎬 Getting Started

## 💾 Installation
