GitHub - Zak-Hussain/psychProbing

psychProbing

This repository contains the code for the paper:

Hussain, Z., Mata, R., Newell, B. R., & Wulff, D. U. (2024). Probing the contents of semantic representations from text, behavior, and brain data using the psychNorms metabase. arXiv. https://arxiv.org/abs/2412.04936

@misc{hussain2024probingcontentssemanticrepresentations,
      title={Probing the contents of semantic representations from text, behavior, and brain data using the psychNorms metabase}, 
      author={Zak Hussain and Rui Mata and Ben R. Newell and Dirk U. Wulff},
      year={2024},
      eprint={2412.04936},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.04936}, 
}

Environment setup

To set up the environment, you can use the environment.yml file in the root directory of this repository.
Before running any other code, make sure to run code/setup.py to download/generate the necessary data files. Please note that to reduce the download size of the representations, we have already subsetted them to their intersection with the psychNorms dataset.
For licensing reasons, you will need to manually download SWOW-EN.R100.csv into data/free_assoc/ from the Small World of Words.
To obtain the representations that we trained ourselves, you will need to run the notebooks in code/embed_training/.
Analyses (code/rsa and code/rca) can then be run in the order implied by the numbering of the notebooks.
Finally, figures can be generated by running the notebooks in code/figures/.
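
Before starting, it can help to confirm that the files named in the steps above are in place. The following is a minimal sketch and not part of the repository: the helper `missing_prerequisites` and the list `REQUIRED_FILES` are hypothetical names, and only the paths themselves come from the setup notes.

```python
from pathlib import Path

# Paths taken from the setup notes above; this helper is a hypothetical
# convenience and does not exist in the repository.
REQUIRED_FILES = [
    Path("environment.yml"),
    Path("code/setup.py"),
    Path("data/free_assoc/SWOW-EN.R100.csv"),  # must be downloaded manually
]


def missing_prerequisites(repo_root="."):
    """Return the required files that are not present under repo_root."""
    root = Path(repo_root)
    return [p.as_posix() for p in REQUIRED_FILES if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All listed files found.")
```

Run from the repository root, this only reports which of the listed files are absent; it does not download anything or replace code/setup.py.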

Representations

The original sources of the representations are as follows:

Text:

CBOW_GoogleNews ('GoogleNews-vectors-negative300.bin.gz')
fastText_CommonCrawl ('crawl-300d-2M.vec.zip')
fastText_Wiki_News ('wiki-news-300d-1M.vec.zip')
fastTextSub_OpenSub ('English, en, OpenSubtitles')
GloVe_CommonCrawl ('glove.840B.300d.zip')
GloVe_Twitter ('glove.twitter.27B.zip')
GloVe_Wikipedia ('glove.6B.zip')
LexVec_CommonCrawl ('Word Vectors (2.2GB)')
morphoNLM ('HSMN+csmRNN')
spherical_text_Wikipedia ('300-d')

Brain:

microarray ('results/tungsten/word_projections.pickle')
EEG_speech ('cognival-vectors/eeg_speech/naturalspeech_scaled.txt')
EEG_text ('cognival-vectors/eeg_text/zuco_scaled.txt')
fMRI_speech_hyper_align ('cognival-vectors/fmri/harry-potter/1000-random-voxels/', further processed with 'hyper alignment')
fMRI_text_hyper_align ('cognival-vectors/fmri/alice/', further processed with 'hyper alignment')
eye_tracking ('cognival-vectors/eye-tracking/all_scaled.txt')

Behavior:

PPMI_SVD_SWOW ('SWOW-EN18', further processed with PPMI and SVD transformations)
SGSoftMaxInput_SWOW ('SWOW-EN18', further processed with Skip-Gram Softmax embedding algorithm)
SGSoftMaxOutput_SWOW ('SWOW-EN18', further processed with Skip-Gram Softmax embedding algorithm)
PPMI_SVD_SouthFlorida ('Appendix A. The normed cues, their targets and related information', further processed with PPMI and SVD transformations)
PPMI_SVD_EAT ('ea-thesaurus.json', further processed with PPMI and SVD transformations)
THINGS ('spose_embedding_49d_sorted.txt' and 'items1854names.tsv')
feature_overlap ('double_words.csv')
norms_sensorimotor ('Lancaster_sensorimotor_norms_for_39707_words.csv')
compo_attribs ('word_ratings.zip')
SVD_sim_rel: 'AG203', 'BakerVerb', 'MartinezAldana', 'MC30', 'MEN3000', 'RG65', 'SimLex999', 'SimVerb3500', 'SL7576sem', 'SL7576vis', 'WP300', 'YP130', 'Atlasify240', 'GM30', 'MT287', 'MT771', 'Rel122', 'RW2034', 'WordSim353', 'Zie25', 'Zie30' (datasets were combined, min-max scaled and then processed with SVD transformation).

Note: compo_attribs has been renamed to 'experiential attributes' in the paper and figures to be consistent with the terminology in the psychNorms metabase.
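
Several of the behavioral representations listed above are described as "further processed with PPMI and SVD transformations". As a rough illustration of what such a pipeline can look like, here is a minimal NumPy sketch. It is not the repository's implementation: the function names `ppmi` and `svd_embed`, the log base, the choice to scale the left singular vectors by the singular values, and the number of retained dimensions are all assumptions made for the example.

```python
import numpy as np


def ppmi(counts):
    """Positive pointwise mutual information of a cue-by-response count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)  # cue marginals
    col = counts.sum(axis=0, keepdims=True)  # response marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        expected = row @ col / total  # counts expected under independence
        pmi = np.log2(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan; treat as 0
    return np.maximum(pmi, 0.0)  # keep only positive associations


def svd_embed(matrix, n_dims):
    """Project rows onto the top n_dims singular directions (assumed weighting: U * S)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :n_dims] * s[:n_dims]


if __name__ == "__main__":
    # Toy cue-by-response counts; real free-association data would be far larger.
    toy_counts = np.array([[4, 0, 1], [0, 4, 1], [1, 1, 6]])
    embedding = svd_embed(ppmi(toy_counts), n_dims=2)
    print(embedding.shape)
```

For matrices the size of a full free-association dataset, a sparse truncated SVD would likely be preferable to the dense `np.linalg.svd` call used here.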

Norms

Information on the norms used in our analysis can be found in the psychNorms repository, and in the metadata file in data/psychNorms/psychNorms_metadata.csv.

