Zak-Hussain/psychProbing

psychProbing

This repository contains the code for the paper:

Hussain, Z., Mata, R., Newell, B. R., & Wulff, D. U. (2024). Probing the contents of semantic representations from text, behavior, and brain data using the psychNorms metabase. arXiv. https://arxiv.org/abs/2412.04936

@misc{hussain2024probingcontentssemanticrepresentations,
      title={Probing the contents of semantic representations from text, behavior, and brain data using the psychNorms metabase}, 
      author={Zak Hussain and Rui Mata and Ben R. Newell and Dirk U. Wulff},
      year={2024},
      eprint={2412.04936},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.04936}, 
}

Environment setup

  1. To set up the environment, you can use the environment.yml file in the root directory of this repository.
  2. Before running any other code, run code/setup.py to download or generate the necessary data files. To reduce the download size, the representations have already been subsetted to their intersection with the psychNorms dataset.
  3. For licensing reasons, you will need to manually download SWOW-EN.R100.csv into data/free_assoc/ from the Small World of Words.
  4. To obtain the representations that we trained ourselves, you will need to run the notebooks in code/embed_training/.
  5. Analyses (code/rsa and code/rca) can then be run in the order implied by the numbering of the notebooks.
  6. Finally, figures can be generated by running the notebooks in code/figures/.
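Because step 3 requires a manual download, it can help to confirm the file is in place before starting the analyses. The following is a minimal sketch, not part of the repository: the function name `missing_files` and the `REQUIRED_FILES` list are hypothetical, and only the path itself is taken from the steps above.

```python
from pathlib import Path

# Relative path taken from step 3 of the setup instructions.
# The list and the function below are illustrative helpers, not repository code.
REQUIRED_FILES = ["data/free_assoc/SWOW-EN.R100.csv"]


def missing_files(repo_root, required=REQUIRED_FILES):
    """Return the required relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the root directory of the repository.
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All manually downloaded files are in place.")
```

An empty list means the manually downloaded file was found; any other files you want to check can be appended to the list.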

Representations

The original sources of the representations are as follows:

Text:

Brain:

Behavior:

  • PPMI_SVD_SWOW ('SWOW-EN18', further processed with PPMI and SVD transformations)
  • SGSoftMaxInput_SWOW ('SWOW-EN18', further processed with Skip-Gram Softmax embedding algorithm)
  • SGSoftMaxOutput_SWOW ('SWOW-EN18', further processed with Skip-Gram Softmax embedding algorithm)
  • PPMI_SVD_SouthFlorida ('Appendix A. The normed cues, their targets and related information', further processed with PPMI and SVD transformations)
  • PPMI_SVD_EAT ('ea-thesaurus.json', further processed with PPMI and SVD transformations)
  • THINGS ('spose_embedding_49d_sorted.txt' and 'items1854names.tsv')
  • feature_overlap ('double_words.csv')
  • norms_sensorimotor ('Lancaster_sensorimotor_norms_for_39707_words.csv')
  • compo_attribs ('word_ratings.zip')
  • SVD_sim_rel: 'AG203', 'BakerVerb', 'MartinezAldana', 'MC30', 'MEN3000', 'RG65', 'SimLex999', 'SimVerb3500', 'SL7576sem', 'SL7576vis', 'WP300', 'YP130', 'Atlasify240', 'GM30', 'MT287', 'MT771', 'Rel122', 'RW2034', 'WordSim353', 'Zie25', 'Zie30' (datasets were combined, min-max scaled and then processed with SVD transformation).
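Several of the representations above are described as processed with PPMI and SVD transformations, and SVD_sim_rel as min-max scaled before SVD. The sketch below illustrates these generic transformations on a toy matrix using NumPy. It is not the code used in this repository: the function names, the toy counts, the choice to weight the left singular vectors by the singular values, and the number of dimensions kept are all assumptions made for illustration.

```python
import numpy as np


def ppmi(counts):
    """Positive pointwise mutual information of a (cue x response) count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts yield -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)   # clip negative PMI values to 0


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)  # constant input: no spread to rescale
    return (values - values.min()) / span


def svd_embed(matrix, k):
    """Keep the top-k singular directions; each row is one embedding.

    Weighting U by the singular values is an assumption of this sketch.
    """
    u, s, _ = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    return u[:, :k] * s[:k]


# Toy cue-by-response association counts (made up for illustration).
toy_counts = [[4, 0, 1],
              [0, 3, 1],
              [2, 2, 0]]
toy_embeddings = svd_embed(ppmi(toy_counts), k=2)
print(toy_embeddings.shape)
```

Each row of `toy_embeddings` is a two-dimensional vector for one cue. In practice the matrix would be much larger and sparse, so a truncated or sparse SVD routine would likely be preferable to the dense decomposition used here.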

Note: compo_attribs has been renamed to 'experiential attributes' in the paper and figures to be consistent with the terminology in the psychNorms metabase.

Norms

Information on the norms used in our analysis can be found in the psychNorms repository, and in the metadata file in data/psychNorms/psychNorms_metadata.csv.
