ethz-spylab/lm-extraction-benchmark-data

Datasets for the SaTML 2023 challenge on training data extraction

This repository contains the raw datasets for the Training Data Extraction Challenge organized at SaTML 2023.

The main repository provides the challenge data as a list of pointers into The Pile.

To spare participants from downloading and decompressing 800GB of text, the raw NumPy files are provided here:

Train

Val

Will be added once the validation set is released.

Test

Will be added once the validation set is released.
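
Once a split has been downloaded, it can be opened with NumPy. The snippet below is a minimal sketch, not the organizers' loading code: the file name `demo_split.npy`, the helper `load_split`, and the small integer array it writes are all hypothetical stand-ins, since this page does not state the actual file names, shapes, or dtypes. Replace `demo_path` with the path to a real downloaded file.

```python
import os
import tempfile

import numpy as np


def load_split(path):
    """Open one .npy file without reading it fully into memory (hypothetical helper)."""
    # mmap_mode="r" maps the file read-only, so data is read from disk on access.
    return np.load(path, mmap_mode="r")


# Stand-in data so the sketch runs anywhere; the real files' layout is an assumption.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_split.npy")
np.save(demo_path, np.arange(12, dtype=np.int32).reshape(3, 4))

split = load_split(demo_path)
print(split.shape, split.dtype)  # inspect the layout before relying on it
print(split[0])                  # first row of the array
```

Printing the shape and dtype first is a cheap way to check what a file actually contains before writing code that depends on its layout.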
