The link of RCV1 dataset is invalid #5

AppleXY · 2020-03-10T13:47:49Z

Hi, when I opened the link to the RCV1 dataset, I got "404 not found". Could you provide another link to the RCV1 dataset? If possible, could you also provide the other datasets used in your paper? It's a little hard for me to understand the code without the dataset. Thank you very much!

YipingNUS · 2020-08-19T03:49:23Z

You can work out the format of the data by looking at the load_data method.

In the following line, you can see that the data is stored in pickle files containing four objects (the last two are never used and can thus be ignored).

```python
[train, test, vocab, catgy] = pickle.load(fin)
```
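A minimal sketch of reading such a file, assuming only the four-entry layout quoted above. The helper name `load_pickled_dataset`, the file name `toy_dataset.p` and the toy documents are hypothetical, made up so the example runs without the real data; the same trick lets you fake a small .p file to step through the code while the original link is down.

```python
import os
import pickle
import tempfile


def load_pickled_dataset(path):
    """Load a pickle laid out as [train, test, vocab, catgy].

    Only train and test are returned; vocab and catgy are unpacked
    but ignored, since they are never used downstream.
    """
    with open(path, "rb") as fin:
        train, test, vocab, catgy = pickle.load(fin)
    return train, test


# Demo: write a tiny toy file in the same layout, then load it back.
# The file name and all contents are hypothetical placeholders.
toy = [
    [{"text": "oil prices rose", "catgy": [0]}],  # train
    [{"text": "bank cuts rates", "catgy": [1]}],  # test
    {},  # vocab (placeholder, real contents unknown)
    {},  # catgy (placeholder, real contents unknown)
]
path = os.path.join(tempfile.mkdtemp(), "toy_dataset.p")
with open(path, "wb") as fout:
    pickle.dump(toy, fout)

train, test = load_pickled_dataset(path)
print(len(train), len(test))  # prints "1 1"
```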

Then, looking at the load_data_and_labels method, you can see that the train/test data are each a list of document dicts, with the key 'text' holding the plain-text document and 'catgy' holding the label.
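The document format can be exercised the same way. This is not the repository's load_data_and_labels, only a simplified stand-in that assumes each 'catgy' value is a list of integer label indices (the thread only says it holds the label). The helper names `split_texts_and_labels` and `to_multi_hot` and the toy documents are hypothetical.

```python
def split_texts_and_labels(docs):
    """Separate a list of document dicts into texts and label lists.

    Assumes each dict has 'text' (a plain-text string) and 'catgy'
    (assumed here to be a list of integer label indices).
    """
    texts = [doc["text"] for doc in docs]
    labels = [doc["catgy"] for doc in docs]
    return texts, labels


def to_multi_hot(labels, num_labels):
    """Turn label-index lists into 0/1 rows of length num_labels."""
    rows = []
    for label_ids in labels:
        row = [0] * num_labels
        for idx in set(label_ids):  # ignore duplicate labels
            row[idx] = 1
        rows.append(row)
    return rows


# Hypothetical toy documents in the format described above.
docs = [
    {"text": "oil prices rose sharply", "catgy": [0, 2]},
    {"text": "central bank cuts rates", "catgy": [1]},
]
texts, labels = split_texts_and_labels(docs)
num_labels = 1 + max(i for ids in labels for i in ids)
Y = to_multi_hot(labels, num_labels)
print(Y)  # [[1, 0, 1], [0, 1, 0]]
```

Keeping Y as a dense list of lists is only sensible for toy data; with thousands of labels, a sparse matrix would be the more usual choice.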

There is another closed issue that provides a link to some of the other datasets used in the paper.

purviprajapati196 · 2021-05-08T05:33:58Z

Please provide the .p files for the eurlex, wiki10, and amazonCat datasets.
