
Code and data for *Multi-Document Summarization with Determinantal Point Process Attention*.

## Datasets

We use the WikiCatSum dataset available here. In particular, for our controlled experiments we use an Oracle (Section 4 in the paper) to rank the input and then truncate it to 500 input tokens.
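For illustration only, here is a minimal sketch of this kind of oracle preprocessing, assuming paragraphs are ranked by unigram recall against the reference summary; the exact scoring function and tokenisation used in the paper may differ.

```python
from collections import Counter

def oracle_rank_and_truncate(paragraphs, reference, max_tokens=500):
    """Rank input paragraphs by word overlap with the reference summary,
    then concatenate them and truncate to max_tokens tokens.
    NOTE: illustrative sketch only; the paper's oracle may use a different scorer."""
    ref_counts = Counter(reference.lower().split())
    ref_total = max(1, sum(ref_counts.values()))

    def recall(paragraph):
        # Unigram recall: how much of the reference this paragraph covers.
        par_counts = Counter(paragraph.lower().split())
        overlap = sum(min(count, par_counts[word]) for word, count in ref_counts.items())
        return overlap / ref_total

    ranked = sorted(paragraphs, key=recall, reverse=True)
    tokens = " ".join(ranked).split()
    return " ".join(tokens[:max_tokens])

# Example usage:
# truncated_source = oracle_rank_and_truncate(input_paragraphs, gold_summary)
```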

We use the MultiNews data as preprocessed by Fabbri et al. (2019) (here).

## Code and Model Training

Our code extends implementations in OpenNMT (Pointer-Generator and Transformer) here and Fairseq (ConvSeq2Seq) here to use DPP attention.
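As a rough illustration of the idea (not the code in this repository), DPP attention replaces the usual attention distribution with the marginal probabilities of a determinantal point process whose kernel combines per-token quality scores with pairwise similarity. A minimal PyTorch sketch for a single decoding step, assuming precomputed quality scores and encoder states:

```python
import torch

def dpp_attention(quality, enc_states, eps=1e-6):
    """Sketch of DPP-based attention weights for one decoding step.

    quality:    (n,) positive quality scores for the n source tokens
                (e.g. exponentiated standard attention logits) -- an assumption here.
    enc_states: (n, d) encoder hidden states used to measure similarity.
    Returns an attention distribution over the n source tokens.
    NOTE: illustrative only; the repository's integration with OpenNMT and
    Fairseq differs in details (batching, masking, training).
    """
    # Pairwise similarity S from normalised encoder states (cosine similarity).
    h = enc_states / (enc_states.norm(dim=-1, keepdim=True) + eps)
    S = h @ h.t()                                   # (n, n)

    # L-ensemble kernel: L = diag(q) S diag(q).
    q = quality.clamp_min(eps)
    L = q.unsqueeze(1) * S * q.unsqueeze(0)

    # Marginal kernel K = L (L + I)^{-1}; its diagonal gives P(i in Y).
    n = L.size(0)
    K = L @ torch.linalg.inv(L + torch.eye(n))
    marginals = K.diagonal().clamp_min(0.0)

    # Renormalise the marginals into an attention distribution.
    return marginals / (marginals.sum() + eps)
```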

## Evaluation

### ROUGE

We use the wrapper script test_rouge.py as provided in the MultiNews repository.
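For a quick sanity check outside that wrapper, the standalone rouge-score package can be used instead; note it is a Python reimplementation and will not exactly reproduce the Perl-based ROUGE that test_rouge.py calls.

```python
from rouge_score import rouge_scorer

# Quick sanity check only: rouge-score reimplements ROUGE in Python and will
# not exactly match the Perl ROUGE scores reported via test_rouge.py.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score("the gold summary text", "the generated summary text")
print(scores["rouge1"].fmeasure, scores["rouge2"].fmeasure, scores["rougeL"].fmeasure)
```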

### BERTScore

We installed BERTScore with pip install bert-score (version 0.3.9). Our script to run BERTScore is run_bertscore.sh; the formatting that Fairseq outputs need beforehand is done by running the Python script format-fairseqout-to-bertscore.
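For reference, a minimal example of the same metric called through the bert-score Python API (the file names here are hypothetical; run_bertscore.sh is what we actually use):

```python
from bert_score import score

# Hypothetical file names: one summary per line, aligned across the two files.
with open("system_outputs.txt") as f:
    cands = [line.strip() for line in f]
with open("references.txt") as f:
    refs = [line.strip() for line in f]

# Same library version as in our experiments (bert-score 0.3.9).
P, R, F1 = score(cands, refs, lang="en", verbose=True)
print(f"BERTScore F1: {F1.mean().item():.4f}")
```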

### Sentence Movers' Similarity

For the sentence mover's similarity metrics we use the code from the SMS github repository. Our script to run this metric is run_sms.sh.
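The metric treats each document as a distribution over sentence embeddings and scores a candidate by the earth mover's distance needed to transform it into the reference. A rough sketch of the core computation, assuming a user-supplied sentence embedding function and the pyemd solver (the SMS repository's implementation differs in its weighting and embedding choices):

```python
import numpy as np
from pyemd import emd

def sms_similarity(cand_sents, ref_sents, embed):
    """Rough sketch of sentence mover's similarity.

    cand_sents, ref_sents: lists of sentence strings.
    embed: hypothetical function mapping a sentence to a 1-D numpy vector
           (e.g. averaged word embeddings).
    NOTE: illustrative only; see the SMS repository for the actual metric.
    """
    c_vecs = np.stack([embed(s) for s in cand_sents]).astype(np.float64)
    r_vecs = np.stack([embed(s) for s in ref_sents]).astype(np.float64)

    # Each side distributes unit mass uniformly over its sentences.
    n, m = len(cand_sents), len(ref_sents)
    cand_weights = np.concatenate([np.full(n, 1.0 / n), np.zeros(m)])
    ref_weights = np.concatenate([np.zeros(n), np.full(m, 1.0 / m)])

    # Pairwise Euclidean distances between all sentence embeddings.
    all_vecs = np.vstack([c_vecs, r_vecs])
    diffs = all_vecs[:, None, :] - all_vecs[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(-1)).astype(np.float64)

    # Earth mover's distance; negate so that larger means more similar.
    return -emd(cand_weights, ref_weights, dist)
```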

### Fact_acc MultiNews

We adapt the model proposed in Neural Text Summarization: A Critical Evaluation for multi-document evaluation. Installation instructions and the trained model can be found in the FACTCC github repository. You will need to run format-to-factCC-eval.py to format model outputs as expected, factcc-eval.sh (with directory references updated from factCC/modeling/scripts/) to run the model evaluation, and factCC-summarise-predictions.py to summarise the results. Note that we provide our modified version of FactCC's run.py.
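The formatting step essentially pairs each sentence of a generated summary (a claim) with its source document as one FactCC example. A minimal sketch of that idea; the field names follow the FactCC JSONL format as we understand it, and format-to-factCC-eval.py remains the authoritative version:

```python
import json

def write_factcc_eval(sources, summaries, out_path):
    """Pair each summary sentence (claim) with its source text for FactCC.
    Labels are placeholders, since we only need the model's predictions.
    NOTE: sketch only; format-to-factCC-eval.py is the script we actually use."""
    with open(out_path, "w") as out:
        idx = 0
        for source, summary in zip(sources, summaries):
            # Naive sentence split; the real script may use a proper tokenizer.
            for claim in filter(None, (s.strip() for s in summary.split(". "))):
                example = {"id": idx, "text": source, "claim": claim, "label": "CORRECT"}
                out.write(json.dumps(example) + "\n")
                idx += 1
```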

### Fact_acc WikiCatSum

We implement the Fact_acc metric from Assessing the factual accuracy of generated text and use the relation extraction system proposed by Sorokin and Gurevych (2017), available at the Relation Extraction github repository. For installation, follow the instructions provided there. Our script to run this metric is run_relext.sh.
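Fact_acc is essentially the proportion of relation triples extracted from a generated summary that are also extracted from the reference text. A minimal sketch of the scoring step, assuming a hypothetical extract_triples wrapper around the relation extraction system:

```python
def fact_acc(generated, reference, extract_triples):
    """Fraction of (subject, relation, object) triples in the generated text
    that are supported by the reference text.

    extract_triples: hypothetical wrapper around the relation extraction
    system, returning a set of triples for a piece of text.
    NOTE: sketch only; run_relext.sh drives the actual pipeline.
    """
    gen_triples = set(extract_triples(generated))
    ref_triples = set(extract_triples(reference))
    if not gen_triples:
        return 0.0
    return len(gen_triples & ref_triples) / len(gen_triples)
```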

## Outputs

Model Outputs