Commit d33883c
WIP
johann-petrak committed Apr 9, 2024
1 parent fafa166 commit d33883c
Showing 5 changed files with 107 additions and 34 deletions.
4 changes: 0 additions & 4 deletions site/announcement.md
@@ -1,6 +1,2 @@
* Text to use for announcing the shared task by email or on social media.
* May contain a copy of the [overview](overview.md) text.

16 changes: 15 additions & 1 deletion site/closed-track.md
@@ -1 +1,15 @@
# Closed Track Competition

There is a _Closed Track_ competition for each of the two subtasks. Please note the following:

* The closed track competitions are the main competitions: in order to submit a paper that describes your approach, you have to submit to the closed track of
one or both of subtask 1 and subtask 2.
* If you have submitted to the closed track of one or both subtasks, you can also include information about your open track approach in your paper submission.
* IMPORTANT: In the closed tracks, participants agree to use **only** the annotated data provided within this task to develop their model. More specifically:
    * the use of additional data labelled for sexism or misogyny is not allowed
    * the use of pretrained models or embeddings trained on data labelled for sexism or misogyny is not allowed
    * the use of other models, ontologies, knowledge bases or similar resources that contain specific knowledge about sexism/misogyny is not allowed
    * pretrained models like BERT or embeddings are allowed as long as they have not been specifically pre-trained or fine-tuned on sexism/misogyny-specific data other than the data shared for this competition
* If in doubt whether your approach is compatible with the closed track requirements, please ask in the competition forum or send an email to the organizers. In an email you can include information that you might not want to share in the forum; the organizers will keep it confidential.
17 changes: 10 additions & 7 deletions site/index.md
@@ -43,24 +43,27 @@ The shared task is divided into two subtasks:

## Closed and open tracks

Each of the [subtask 1](subtask1.md) and [subtask 2](subtask2.md) competitions
are organized into two different tracks:

* [Closed Track](closed-track.md): in this track, models can only be trained with the provided training set. Models are limited as to what kind of data for pretraining is allowed. Only the closed track counts towards the competition of the shared task and a closed track submission is required for the submission of a paper. See the linked document for details.
* [Open Track](open-track.md): in this track, anything goes: you can use language models, use your own training data (but you have to share it with the community), or use other interesting approaches. The open track does NOT count towards the competition ranking but has its own leaderboard; it has been added to allow for the exploration of interesting strategies which may be hard to reproduce.



## Timeline

* **Development phase**: April 14 - June 12, 2024
    * During the development phase, a labeled training set and an unlabeled development set are made available. You can upload the labels for the development set to the competition site and will see the ranking of your submission on the leaderboard.
* **Competition phase**: June 13 - June 25, 2024
    * During the competition phase, the labeled training and development sets are released and an unlabeled test set is made available. You can upload the labels for the test set to the competition site; your most recent submission will be the one used for ranking. During that phase, the leaderboard is not shown. The final leaderboard/ranking is shown after the end of the competition phase.
* **Paper submission due**: July 1, 2024
* **Camera ready due**: July 20, 2024
* **Shared Task @KONVENS**: September 9, 2024

## Organizers

The task is organized by the [**Austrian Research Institute for Artificial Intelligence (OFAI)**](https://ofai.at). The organizing team are:

* [Brigitte Krenn](https://www.ofai.at/~brigitte.krenn/) (brigitte.krenn (AT) ofai.at)
12 changes: 11 additions & 1 deletion site/open-track.md
@@ -1 +1,11 @@
# Open Track Competition

There is an _Open Track_ competition for each of the two subtasks. Please note the following:

* In the open tracks, participants are encouraged to use whatever approach they prefer.
* Additional labelled data, or models or embeddings trained on labelled data, are allowed.
* HOWEVER: additional labelled data, embeddings or models must be publicly available as open source or under a Creative Commons license.
* IMPORTANT: Participants submitting in open tracks are only invited to submit a paper for the Shared Task at KONVENS 2024 describing their system if they also made a submission in a closed track during the Competition Phase.
* Due to reproducibility issues, e.g. when including results from commercial or closed-source models, we do not accept papers which solely present approaches for the open tracks.
* We do, however, look forward to finding out how the results in the open tracks compare to the closed track results.
92 changes: 71 additions & 21 deletions site/subtask1.md
@@ -1,34 +1,84 @@
# Subtask 1

In subtask 1 the goal is to predict labels for each text in a dataset where the labels are derived from the original
labels assigned by several human annotators.

The human annotators assigned, according to the [annotation guidelines](guidelines.md), the strength of misogyny/sexism present in the given text via the following labels:

* `0-Kein`: no sexism/misogyny present
* `1-Gering`: mild sexism/misogyny
* `2-Vorhanden`: sexism/misogyny present
* `3-Stark`: strong sexism/misogyny
* `4-Extrem`: extreme sexism/misogyny

While the annotation guidelines define what kind of sexism/misogyny should be annotated, no attempt was made to give rules for how to decide on the strength. For this reason, if an annotator decided that sexism/misogyny is present in a text, the strength assigned is a matter of personal judgement.

The labels to predict in subtask 1 reflect different strategies for how the multiple labels from annotators can be used to derive a final target label:

* `bin_maj`: predict `1` if a majority of annotators assigned a label other than `0-Kein`, predict `0` if a majority of annotators assigned the label `0-Kein`. If there is no majority, both `1` and `0` will count as correct in the evaluation.
* `bin_one`: predict `1` if at least one annotator assigned a label other than `0-Kein`, `0` otherwise
* `bin_all`: predict `1` if all annotators assigned labels other than `0-Kein`, `0` otherwise
* `multi_maj`: predict the majority label if there is one; if there is no majority label, any of the labels assigned is counted as a correct prediction in the evaluation
* `disagree_bin`: predict `1` if there is disagreement between annotators on `0-Kein` versus all other labels and `0` otherwise

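For illustration only (not part of the official task materials), here is a minimal Python sketch of how these targets could be derived from the labels assigned to a single example. The function name is made up, and treating "majority" as the most frequent label, with ties counting as no majority, is an interpretation of the description above:

```python
# Illustrative sketch only: derive the subtask 1 targets from the labels
# assigned to one example, e.g. ["0-Kein", "2-Vorhanden", "3-Stark"].
from collections import Counter

def derive_targets(labels):
    binary = [0 if lab == "0-Kein" else 1 for lab in labels]

    def majority(values):
        counts = Counter(values).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            return None  # tie: no majority; either value counts as correct
        return counts[0][0]

    return {
        "bin_maj": majority(binary),
        "bin_one": int(any(binary)),
        "bin_all": int(all(binary)),
        "multi_maj": majority(labels),
        "disagree_bin": int(len(set(binary)) > 1),
    }

print(derive_targets(["0-Kein", "2-Vorhanden", "3-Stark"]))
# {'bin_maj': 1, 'bin_one': 1, 'bin_all': 0, 'multi_maj': None, 'disagree_bin': 1}
```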

## Data

For the development phase of subtask 1, we provide all participants with the following data:
* the labeled training set containing 'id', 'text', and 'annotations' (annotator ids and the labels assigned by them)
* the unlabeled dev set containing 'id', 'text' and 'annotators' (annotator ids)

Both files are in JSONL format (one JSON-serialized object per line) where each object is a dictionary with the following
fields:

* `id`: a hash that identifies the example
* `text`: the text to classify. The text can contain arbitrary Unicode and new lines
* `annotations` (only in the labeled dataset): an array of dictionaries which contain the following key/value pairs:
* `user`: a string in the form "A003" which is an anonymized id for the annotator who assigned the label
* `label`: the label assigned by the annotator
* Note that the number of annotations and the specific annotators who assigned labels vary between examples
* `annotators` (only in the unlabeled dataset): an array of annotator ids who labeled the example

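For illustration, the files can be read with a few lines of Python; the file name used here is just an example, not the official name of the released file:

```python
# Illustrative sketch: read one of the JSONL files (one JSON object per line).
import json

with open("subtask1_train.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

ex = examples[0]
print(ex["id"], ex["text"])
for ann in ex.get("annotations", []):   # present only in the labeled data
    print(ann["user"], ann["label"])
```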
You can [download](download.md) the labeled and unlabeled data for the development phase and for the competition phase.


## Submission

Your submission must be a file in TSV (tab separated values) format which contains the following columns, in any order:
* `id`: the id of the example in the unlabeled dataset for which the predictions are submitted
* `bin_maj`: prediction of `0` or `1`
* `bin_one`: prediction of `0` or `1`
* `bin_all`: prediction of `0` or `1`
* `multi_maj`: prediction of one of `0-Kein`, `1-Gering`, `2-Vorhanden`, `3-Stark`, `4-Extrem`
* `disagree_bin`: prediction of `1` or `0`

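Here is a minimal sketch of writing such a file with the Python standard library; the column names are the ones listed above, while the file name and the prediction values are placeholders:

```python
# Illustrative sketch: write predictions in TSV format with a header row.
# The prediction values below are placeholders only.
import csv

predictions = [
    {"id": "abc123", "bin_maj": 1, "bin_one": 1, "bin_all": 0,
     "multi_maj": "1-Gering", "disagree_bin": 1},
    # ... one row per example in the unlabeled dataset
]

columns = ["id", "bin_maj", "bin_one", "bin_all", "multi_maj", "disagree_bin"]
with open("predictions.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t")
    writer.writeheader()
    writer.writerows(predictions)
```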
The **goal** of subtask 1 is thus to solve four binary classification tasks and to predict the majority label. How you derive those labels is up to you, as long as the rules for the closed or open tracks are followed:

* you can train several models or a single model to get the predictions
* you can derive the training set for each prediction target in any way from the labeled training data
* you can use the information about which annotator assigned a label, or ignore it
To submit your predictions to the competition:

* the file MUST have the file name extension `.tsv`
* the TSV file must be compressed into a ZIP file with the extension `.zip` (see the sketch after this list)
* the ZIP file should then be uploaded as a submission to the correct competition
* IMPORTANT: please make sure you submit to the competition that corresponds to the correct subtask (1 or 2) and the correct track (Open or Closed)!
* under "My Submissions" make sure to fill out the form and:
* enter the name of your team which has been registered for the competition
* give a name to your method
* confirm that you have checked that you are indeed submitting to the correct competition for the subtask and track desired

## Phases

* For the Development Phase, multiple submissions are allowed; they serve the purpose of developing and improving the model(s).
* For the Test Phase, participants may only submit a limited number of times. Please note that only the latest valid submission determines the final task ranking.

## Evaluation
