Fix the license term for the dataset (#25)
* Update README.md

* Add README to bench/data

* fix

* fix
vbkaisetsu authored Dec 15, 2021
1 parent 051933c commit 1a1bd13
Showing 2 changed files with 10 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -208,6 +208,8 @@ Licensed under either of

at your option.

For software under `bench/data`, follow the license terms of each software package.

## Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
8 changes: 8 additions & 0 deletions bench/data/README.md
@@ -0,0 +1,8 @@
# Datasets for benchmarking

These datasets are copied from third-party repositories.

* `unidic`: [National Institute for Japanese Language and Linguistics](https://ccd.ninjal.ac.jp/unidic/)
* `sherlock.txt`: [Project Gutenberg](https://www.gutenberg.org/ebooks/1661)
* `wagahaiwa_nekodearu.txt`: [Aozora Bunko](https://www.aozora.gr.jp/cards/000148/card789.html)
* `words_100000`: [fst crate](https://github.com/BurntSushi/fst/blob/master/data/words-100000)
