CS_Probing

The official data repository for the COLING 2022 paper "Are Visual-Linguistic Models Commonsense Knowledge Bases?".

We release the two datasets used in the commonsense knowledge probing experiments:

(1) CWWV_IMG (2) CWWV_CLIP.

Datasets Construction Pipeline of CWWV_IMG

CWWV_IMG (Download) is generated automatically, following the procedure proposed by Ma et al. (2019).

Additionally, we rely on an efficient image retrieval process to compensate for missing image sources (please refer to our paper for details).

Dimensions	Counts
part-whole	1,165
taxonomic	1,323
distinctness	828
similarity	644
quality	1,840
utility	2,090
creation	100
temporal	1,889
spatial	1,599
desire	1,781
total	13,259

CWWV_CLIP (Download) is a subset of CWWV_IMG that keeps only the higher-quality image-word pairs, as measured by CLIPScore.
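For readers unfamiliar with CLIPScore, the snippet below is a minimal sketch of how score-based filtering of image-word pairs can work. It uses the standard CLIPScore definition, w * max(cos(image, text), 0) with w = 2.5, on embeddings that are assumed to be precomputed with a CLIP model. The function names, the toy embeddings, and the threshold value are illustrative assumptions, not the selection criterion used to build CWWV_CLIP; please refer to the paper for the actual procedure.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_score(image_emb: Vector, text_emb: Vector, w: float = 2.5) -> float:
    """CLIPScore of one image-text pair: w * max(cos(image, text), 0)."""
    return w * max(cosine(image_emb, text_emb), 0.0)


def filter_pairs(pairs: List[Tuple[Vector, Vector]], threshold: float) -> List[int]:
    """Return the indices of pairs whose CLIPScore reaches the threshold.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    return [
        i for i, (image_emb, text_emb) in enumerate(pairs)
        if clip_score(image_emb, text_emb) >= threshold
    ]


# Toy 2-d embeddings standing in for real CLIP image/text embeddings.
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.0]),    # identical direction
    ([0.0, 1.0], [1.0, 0.0]),    # orthogonal
    ([1.0, 1.0], [-1.0, -1.0]),  # opposite direction, clipped to 0
]
kept = filter_pairs(toy_pairs, threshold=1.0)
print(kept)
```

In practice the embeddings would come from a CLIP image encoder and text encoder, and the cut-off would be tuned on the data.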

Dimensions	Counts
part-whole	170
taxonomic	85
distinctness	86
similarity	188
quality	143
utility	120
creation	8
temporal	154
spatial	144
desire	91
total	1,189
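To compare the two releases, the per-dimension counts from the tables above can be entered in a small script that checks the reported totals and computes the share of CWWV_IMG instances retained in CWWV_CLIP. The counts are copied from this README; the variable and function names are illustrative and not part of the released data format.

```python
from typing import Dict

# Per-dimension counts copied from the tables in this README.
CWWV_IMG: Dict[str, int] = {
    "part-whole": 1165, "taxonomic": 1323, "distinctness": 828,
    "similarity": 644, "quality": 1840, "utility": 2090,
    "creation": 100, "temporal": 1889, "spatial": 1599, "desire": 1781,
}
CWWV_CLIP: Dict[str, int] = {
    "part-whole": 170, "taxonomic": 85, "distinctness": 86,
    "similarity": 188, "quality": 143, "utility": 120,
    "creation": 8, "temporal": 154, "spatial": 144, "desire": 91,
}


def retention(full: Dict[str, int], subset: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each dimension in `full` that is kept in `subset`."""
    return {dim: subset[dim] / full[dim] for dim in full}


# Sanity check against the totals reported in the tables.
assert sum(CWWV_IMG.values()) == 13259
assert sum(CWWV_CLIP.values()) == 1189

for dim, share in retention(CWWV_IMG, CWWV_CLIP).items():
    print(f"{dim:<12} {CWWV_CLIP[dim]:>4} / {CWWV_IMG[dim]:>5}  ({share:.1%})")
```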

Citation

If you find this dataset useful for your research, please cite:

@inproceedings{
  yang-2022,
  title={Are Visual-Linguistic Models Commonsense Knowledge Bases?},
  author={Hsiu-Yu Yang and 
          Carina Silberer},
  booktitle={Proceedings of the 29th International Conference on Computational
             Linguistics, {COLING} 2022, Gyeongju, Republic of Korea, October 12-17,
             2022},
  year={2022}
}
