Commit

add wikipedia-fa
pourmand1376 committed Aug 2, 2023
1 parent aa96100 commit 7e438b3
Showing 2 changed files with 7 additions and 0 deletions.
1 change: 1 addition & 0 deletions data/datasets/__init__.py
@@ -4,6 +4,7 @@
"tv_dialogue": "sedthh/tv_dialogue", # TV and Movie dialogues and transcripts
"fd_dialogue": "sedthh/fd_dialogue", # TV and Movie dialogues and transcripts from ForeverDreaming
"tlcv2.0_oa": "pythainlp/tlcv2.0_oa", # Thai classical literature texts
"fa-wikipedia": "pourmand1376/fa-wikipedia", # Farsi Wikipedia texts
}

INSTRUCTION_DATASETS = {
6 changes: 6 additions & 0 deletions data/datasets/fa-wikipedia/README.md
@@ -0,0 +1,6 @@
This dataset was crawled from
[Farsi Wikipedia](https://fa.wikipedia.org/wiki/%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C).
It provides clean text data in Persian (Farsi) and covers a broad range of
subjects.

It has 2.53M articles.
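
The registry entry added in this commit maps a short dataset name to a Hugging Face Hub ID. A minimal sketch of how such a mapping might be consumed is shown below. The dict name `text_datasets`, the helper names, and the `train` split are assumptions for illustration, since the diff hunk does not show them; only the two registry entries come from the commit.

```python
# Hypothetical stand-in for the registry dict edited in data/datasets/__init__.py.
# The real variable name is not visible in the diff hunk; only the entries are.
text_datasets = {
    "tlcv2.0_oa": "pythainlp/tlcv2.0_oa",  # Thai classical literature texts
    "fa-wikipedia": "pourmand1376/fa-wikipedia",  # Farsi Wikipedia texts
}


def resolve_hub_id(name: str) -> str:
    """Return the Hub ID registered under a short dataset name."""
    try:
        return text_datasets[name]
    except KeyError:
        known = ", ".join(sorted(text_datasets))
        raise KeyError(f"unknown dataset {name!r}; known names: {known}") from None


def load_registered(name: str, split: str = "train", streaming: bool = True):
    """Load a registered dataset from the Hub (needs the `datasets` package).

    The default split name is an assumption; check the dataset card.
    Streaming avoids downloading all 2.53M articles up front.
    """
    from datasets import load_dataset  # imported lazily so lookups work offline

    return load_dataset(resolve_hub_id(name), split=split, streaming=streaming)
```

Calling `resolve_hub_id("fa-wikipedia")` only consults the dict, so it works without network access; `load_registered` is the step that contacts the Hub.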
