mediatechnologycenter/CHeeSE


CHeeSE

Dataset and baselines presented in the paper "Towards Stance Detection in German News Articles". The dataset consists of debate questions and news articles that are matched and annotated for stance detection. It contains ~2000 news articles in German from the Swiss newspapers "NZZ", "NZZ am Sonntag" and "Blick", matched with 91 debate questions. On average, each article is matched with ~1.9 questions, resulting in ~3800 stance-annotated article-question pairs. In addition, every article and each of its paragraphs is annotated with emotions, which enables emotion detection at the article or paragraph level. We believe this hand-annotated dataset enables research in areas such as multi-task learning, transfer learning, and emotion flow in news articles. The repository also contains the code to reproduce the stance detection baselines of the paper: one using fastText and a deep learning approach with a pretrained transformer (BERT) using Hugging Face.

Dataset Structure

All the data is contained in the file data/cheese.json. Each article record is structured as follows:

    {
        "title": "...",
        "snippet": "...",
        "paragraphs": [
            {
                "text": "...",
                "paragraph_emotion": ["Angst", "Traurigkeit"]
            },
            {
                "text": "...",
                "paragraph_emotion": ["Antizipation"]
            }
        ],
        "article_emotion": ["Angst", "Überraschung"],
        "article_stance": [
            {
                "question_id": 10,
                "question": "...",
                "stance": "Nein, dagegen",
                "selection_stage": 1,
                "selection_rank": "gold",
                "general_area_of_interest": "...",
                "target_topic": "..."
            },
            {
                "question_id": 53,
                "question": "...",
                "stance": "Kein Bezug",
                "selection_stage": 2,
                "selection_rank": "silver",
                "general_area_of_interest": "...",
                "target_topic": "..."
            }
        ],
        "article_id": "...",
        "source": "NZZ",
        "date": "..."
    }
    {
        ...
    }
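
Since each article carries a list of matched questions, a common first step is to flatten the records into one row per article-question pair. The following is a minimal sketch, not code from this repository: the helper names `load_articles` and `article_question_pairs` are hypothetical, and it assumes that cheese.json parses as a single JSON array of records shaped like the one above. The sample record uses made-up values.

```python
import json
from pathlib import Path


def load_articles(path):
    """Load article records.

    Assumption: the file holds one JSON array of article objects.
    """
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def article_question_pairs(articles):
    """Flatten each article into one row per matched debate question."""
    pairs = []
    for article in articles:
        # Join the paragraph texts to recover the full article body.
        body = "\n".join(p["text"] for p in article["paragraphs"])
        for match in article["article_stance"]:
            pairs.append({
                "article_id": article["article_id"],
                "question_id": match["question_id"],
                "question": match["question"],
                "text": body,
                "stance": match["stance"],
            })
    return pairs


# Tiny in-memory record mirroring the schema above (all values are made up).
sample = [{
    "title": "t",
    "snippet": "s",
    "paragraphs": [
        {"text": "Erster Absatz.", "paragraph_emotion": ["Angst"]},
        {"text": "Zweiter Absatz.", "paragraph_emotion": ["Keine"]},
    ],
    "article_emotion": ["Angst"],
    "article_stance": [
        {"question_id": 10, "question": "Frage A?", "stance": "Nein, dagegen"},
        {"question_id": 53, "question": "Frage B?", "stance": "Kein Bezug"},
    ],
    "article_id": "a1",
    "source": "NZZ",
    "date": "",
}]

pairs = article_question_pairs(sample)
# Number of pairs and average number of matched questions per article.
print(len(pairs), len(pairs) / len(sample))
```

On the full dataset, the same ratio should land near the ~1.9 questions per article stated above, provided the file layout matches the assumption.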

The possible choices for stance are:

  • "Kein Bezug" (unrelated)
  • "Diskutierend" (discussing)
  • "Ja, dafür" (in favor)
  • "Nein, dagegen" (against)
  • "Unklar" (unclear)
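
For classification, the German label strings usually need to be mapped to class ids. The sketch below shows one possible mapping. The English glosses are taken from the list above. The integer ids simply follow the list order, which is an assumption and not necessarily the encoding used by the baselines in the paper, and `encode_stance` is a hypothetical helper.

```python
# German stance labels with the English glosses from the list above.
STANCE_TO_ENGLISH = {
    "Kein Bezug": "unrelated",
    "Diskutierend": "discussing",
    "Ja, dafür": "in favor",
    "Nein, dagegen": "against",
    "Unklar": "unclear",
}

# Assumption: class ids follow the list order; the paper's baselines may differ.
STANCE_TO_ID = {label: idx for idx, label in enumerate(STANCE_TO_ENGLISH)}


def encode_stance(label):
    """Return the integer class id for a stance label, rejecting unknown strings."""
    try:
        return STANCE_TO_ID[label]
    except KeyError:
        raise ValueError(f"unknown stance label: {label!r}") from None


print(encode_stance("Nein, dagegen"), STANCE_TO_ENGLISH["Nein, dagegen"])
```

Raising on unknown strings instead of returning a default makes typos and encoding problems (for example a mangled "ü") visible early.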

The possible choices for article/paragraph emotion are:

  • "Angst" (fear)
  • "Antizipation" (anticipation)
  • "Ärger" (anger)
  • "Ekel" (disgust)
  • "Freude" (joy)
  • "Keine" (none)
  • "Traurigkeit" (sadness)
  • "Überraschung" (surprise)
  • "Vertrauen" (trust)
  • "Unklar" (unclear)
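
Because an article or paragraph can carry several emotions at once (see the "article_emotion" and "paragraph_emotion" lists in the structure above), emotion detection is naturally a multi-label problem. A minimal sketch of a multi-hot encoding follows. The label order is taken from the list above, while the helper name `multi_hot` and the choice of encoding are assumptions made for illustration.

```python
# Emotion labels in the order listed above.
EMOTIONS = [
    "Angst", "Antizipation", "Ärger", "Ekel", "Freude",
    "Keine", "Traurigkeit", "Überraschung", "Vertrauen", "Unklar",
]
EMOTION_INDEX = {name: i for i, name in enumerate(EMOTIONS)}


def multi_hot(labels):
    """Encode a list of emotion labels as a 0/1 vector over EMOTIONS."""
    vector = [0] * len(EMOTIONS)
    for label in labels:
        # A KeyError here signals a label outside the documented set.
        vector[EMOTION_INDEX[label]] = 1
    return vector


# Labels taken from the example record in the dataset structure.
article_vector = multi_hot(["Angst", "Überraschung"])
paragraph_vectors = [
    multi_hot(["Angst", "Traurigkeit"]),
    multi_hot(["Antizipation"]),
]
print(article_vector)
```

Keeping one vector per paragraph preserves their order, which is what a study of emotion flow across an article would need.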

Baselines

The paper presents two baselines for stance detection:

  • fastText classifier
  • BERT classifier

Data

First, download the data from https://projects.mtc.ethz.ch/cheese-data and place the cheese.json file in CHeeSE/data.

Setup

We recommend using a Python virtual environment to install the Python packages required for this project. Once it is set up, install the packages with:

    pip3 install -r requirements.txt

System settings of the machine used for the results in the paper:

  • CPU: Intel(R) Core(TM) i7-8700 @ 3.20GHz
  • GPU: GeForce RTX 2070
  • Python Version: Python 3.6
  • CUDA Version: 11.2

fastText Baseline

To reproduce the fastText baseline, run the script fasttext_baseline.py from within the baselines/stance_detection directory:

    cd baselines/stance_detection
    python3 fasttext_baseline.py
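
For readers who want to experiment outside the provided script: fastText's supervised mode reads plain-text files in which each line starts with a label prefixed by `__label__`, followed by the text. The sketch below converts one article-question pair into such a line. It is not taken from fasttext_baseline.py. The helper `to_fasttext_line` is hypothetical, and both concatenating the question with the article text and replacing whitespace in labels with underscores are assumptions made for illustration.

```python
import re


def to_fasttext_line(question, text, stance):
    """Format one article-question pair as a fastText supervised training line."""
    # Labels are whitespace-delimited tokens, so "Nein, dagegen" must not contain spaces.
    label = re.sub(r"\s+", "_", stance.strip())
    # Collapse newlines and repeated spaces: fastText expects one example per line.
    content = " ".join(f"{question} {text}".split())
    return f"__label__{label} {content}"


line = to_fasttext_line("Frage?", "Ein\nText.", "Nein, dagegen")
print(line)
```

Lines produced this way can be written to a text file and passed to `fasttext.train_supervised(input=...)` from the fasttext Python package. Whether this matches the preprocessing used for the results in the paper is not confirmed here.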

BERT Baseline

To reproduce the BERT baseline, run the script bert_baseline.py from within the baselines/stance_detection directory, passing the provided config file:

    cd baselines/stance_detection
    python3 bert_baseline.py --config bert_baseline_config.json

Reference

The dataset and baseline models are presented in:

https://aclanthology.org/2021.fever-1.8.pdf

When using the CHeeSE dataset for research purposes, please cite:

    @inproceedings{mascarell-etal-2021-stance,
        title = "Stance Detection in {G}erman News Articles",
        author = "Mascarell, Laura and Ruzsics, Tatyana and Schneebeli, Christian and Schlattner, Philippe and Campanella, Luca and Klingler, Severin and Kadar, Cristina",
        booktitle = "Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER)",
        month = nov,
        year = "2021",
        address = "Dominican Republic",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2021.fever-1.8",
        pages = "66--77"
    }

Acknowledgements

This project is supported by Ringier, TX Group, NZZ, SRG, VSM, viscom, and the ETH Zurich Foundation.
