
CS_Probing

The official data repository for the COLING 2022 paper "Are Visual-Linguistic Models Commonsense Knowledge Bases?".

We release the two datasets used in our commonsense knowledge probing experiments:

(1) CWWV_IMG and (2) CWWV_CLIP.

Dataset Construction Pipeline of CWWV_IMG

[Figure: Overview of the CWWV_IMG dataset construction pipeline]

  1. CWWV_IMG (Download):

    CWWV_IMG is automatically generated by following the procedures proposed by Ma et al. (2019).

    Additionally, we rely on an efficient image retrieval process to compensate for missing image sources (please refer to our paper for details).

    Dimension       Count
    part-whole      1,165
    taxonomic       1,323
    distinctness      828
    similarity        644
    quality         1,840
    utility         2,090
    creation          100
    temporal        1,889
    spatial         1,599
    desire          1,781
    total          13,259
  2. CWWV_CLIP (Download):

    CWWV_CLIP is a subset of CWWV_IMG that contains higher-quality image-word pairs, as measured by CLIPScore.

    Dimension       Count
    part-whole        170
    taxonomic          85
    distinctness       86
    similarity        188
    quality           143
    utility           120
    creation            8
    temporal          154
    spatial           144
    desire             91
    total           1,189
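CWWV_CLIP keeps the CWWV_IMG image-word pairs judged higher quality by CLIPScore. As a rough illustration of that kind of filtering, the sketch below implements the reference-free CLIPScore definition from Hessel et al. (2021), CLIPScore = 2.5 * max(cos(text, image), 0), and keeps pairs at or above a threshold. It is not the authors' selection code: the `ImageWordPair` record, the assumption that CLIP text and image embeddings are already computed, and the threshold value are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

# Rescaling weight w from the CLIPScore definition (Hessel et al., 2021).
CLIPSCORE_W = 2.5


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def clip_score(text_emb: List[float], image_emb: List[float], w: float = CLIPSCORE_W) -> float:
    """CLIPScore = w * max(cos(text, image), 0); negative similarity is clipped to 0."""
    return w * max(cosine(text_emb, image_emb), 0.0)


@dataclass
class ImageWordPair:
    # Hypothetical record type for illustration; NOT the released file format.
    word: str
    text_emb: List[float]   # assumed: precomputed CLIP text embedding of the word
    image_emb: List[float]  # assumed: precomputed CLIP image embedding of its image


def filter_by_clipscore(pairs: List[ImageWordPair], threshold: float) -> List[ImageWordPair]:
    """Keep pairs whose CLIPScore is at least the (hypothetical) threshold."""
    return [p for p in pairs if clip_score(p.text_emb, p.image_emb) >= threshold]


if __name__ == "__main__":
    # Toy 2-d "embeddings": "dog" is perfectly aligned, "cat" is orthogonal.
    pairs = [
        ImageWordPair("dog", [1.0, 0.0], [1.0, 0.0]),
        ImageWordPair("cat", [1.0, 0.0], [0.0, 1.0]),
    ]
    kept = filter_by_clipscore(pairs, threshold=0.5)
    print([p.word for p in kept])  # ['dog']
```

Real CLIP embeddings are high-dimensional and would come from a CLIP model; the toy vectors only show how the score and the cut-off interact.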
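Because CWWV_CLIP is a subset of CWWV_IMG, the two tables can be cross-checked against each other. The short script below copies the per-dimension counts from the tables, confirms that they sum to the reported totals, and computes what share of each dimension's CWWV_IMG examples ends up in CWWV_CLIP. The dictionary names and the `retention` helper are illustrative only and are not part of the released data.

```python
from typing import Dict

# Per-dimension counts copied from the CWWV_IMG and CWWV_CLIP tables.
CWWV_IMG = {
    "part-whole": 1165, "taxonomic": 1323, "distinctness": 828,
    "similarity": 644, "quality": 1840, "utility": 2090,
    "creation": 100, "temporal": 1889, "spatial": 1599, "desire": 1781,
}
CWWV_CLIP = {
    "part-whole": 170, "taxonomic": 85, "distinctness": 86,
    "similarity": 188, "quality": 143, "utility": 120,
    "creation": 8, "temporal": 154, "spatial": 144, "desire": 91,
}


def retention(full: Dict[str, int], subset: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each dimension's examples in `full` that also appear in `subset`."""
    return {dim: subset[dim] / full[dim] for dim in full}


if __name__ == "__main__":
    # The per-dimension counts should add up to the reported totals.
    print(sum(CWWV_IMG.values()), sum(CWWV_CLIP.values()))
    ratios = retention(CWWV_IMG, CWWV_CLIP)
    # Print dimensions from highest to lowest retention.
    for dim, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{dim:<13} {r:6.1%}")
```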

Citation

If you find this dataset useful for your research, please cite:

@inproceedings{yang-2022,
  title={Are Visual-Linguistic Models Commonsense Knowledge Bases?},
  author={Hsiu-Yu Yang and Carina Silberer},
  booktitle={Proceedings of the 29th International Conference on Computational
             Linguistics, {COLING} 2022, Gyeongju, Republic of Korea, October 12-17,
             2022},
  year={2022}
}