The link of RCV1 dataset is invalid #5

Open
AppleXY opened this issue Mar 10, 2020 · 2 comments

Comments

@AppleXY

AppleXY commented Mar 10, 2020

Hi, when I followed the link to the RCV1 dataset, I got a "404 not found" error. Could you provide another link to the dataset? If possible, could you also provide the other datasets used in your paper? It's a little hard for me to understand the code without the data. Thank you very much!

@YipingNUS

You can work out the format of the data by looking at the load_data method.

The following line shows that each data file is a pickle containing four attributes (the last two are never used, so you can ignore them).

[train, test, vocab, catgy] = pickle.load(fin)

Then, in the load_data_and_labels method, you can see that the train and test data are each a list of document dicts, with the key 'text' for the plain-text document and 'catgy' for the label.
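If it helps, here is a minimal sketch of how a small file in that layout could be built and read back. It only illustrates the structure described above and is not taken from the repository. The toy documents, the file name toy_data.p, the use of integer lists for 'catgy', and the empty placeholders for the two unused attributes are all assumptions.

```python
import os
import pickle
import tempfile

# Hypothetical toy documents standing in for the real dataset.
# Assumption: each label is stored as a list of integer category ids.
train = [
    {"text": "stocks fell sharply on monday", "catgy": [0, 2]},
    {"text": "the central bank raised rates", "catgy": [1]},
]
test = [
    {"text": "markets rallied after the announcement", "catgy": [0]},
]

# The last two attributes are described as unused, so empty
# placeholders are assumed to be sufficient here.
vocab, catgy = {}, {}

# Write the four attributes as one pickled list (file name is made up).
path = os.path.join(tempfile.mkdtemp(), "toy_data.p")
with open(path, "wb") as fout:
    pickle.dump([train, test, vocab, catgy], fout)

# Read the file back with the same unpacking pattern as in load_data.
with open(path, "rb") as fin:
    [train_in, test_in, vocab_in, catgy_in] = pickle.load(fin)

# Each entry is a dict with the keys 'text' and 'catgy'.
for doc in train_in:
    print(doc["text"], doc["catgy"])
```

Converting your own corpus into this shape should let you step through the code even without the original RCV1 file, although the exact label encoding the model expects is worth checking against load_data_and_labels.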

There's another closed issue providing a link to some other datasets used in the paper.

@purviprajapati196

Please provide the .p files for the eurlex, wiki10, and amazonCat datasets.
