
CPAE-PyTorch

CPAE-PyTorch is a library of CPAE (Consistency Penalized AutoEncoder) re-implemented in PyTorch. The model was introduced in the EMNLP 2018 paper "Auto-Encoding Dictionary Definitions into Consistent Word Embeddings", and its original implementation can be found here.

Installation

This repo was developed with Python 3.6, PyTorch 1.0.0, and AllenNLP 0.8.5.

You can create the experimental environment with conda as follows:

conda env create -f environment.yml

Or, install dependencies step by step:

conda create -n cpae-pytorch python=3.6
conda activate cpae-pytorch
conda install pytorch=1.0 cudatoolkit=9.0 -c pytorch
pip install allennlp jsonlines
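
To sanity-check the environment, you can confirm that PyTorch imports with the expected version and run AllenNLP's built-in install test:

python -c "import torch; print(torch.__version__)"
allennlp test-install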

Train

The default configuration is provided in training_config, which you can play with. You can change alpha (the autoencoding coefficient) and beta (the consistency-penalty coefficient) to switch between a plain AutoEncoder and the Consistency Penalized AutoEncoder, or supply pre-trained word embeddings to improve the model.
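
To make the roles of the two coefficients concrete, here is a minimal PyTorch sketch of how such a combined objective can be formed. This is illustrative only; the function and argument names are hypothetical, not this repo's actual module:

import torch

def cpae_objective(reconstruction_loss, definition_emb, word_emb, alpha=1.0, beta=8.0):
    # consistency penalty: squared distance between the embedding computed
    # from a word's definition and the embedding of the defined word itself
    consistency = ((definition_emb - word_emb) ** 2).sum(dim=-1).mean()
    # with alpha=1, beta=0 this reduces to a plain autoencoder loss;
    # beta > 0 adds the consistency penalty that distinguishes CPAE
    return alpha * reconstruction_loss + beta * consistency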

Because the implementation is built on AllenNLP, a flexible, configuration-driven library, you can swap any component for a compatible counterpart, add components you find helpful, or remove ones you do not need.

For convenience and fair comparison, we include en_wn_full_all.jsonl and vocab.txt in the data directory; both were generated by the original CPAE code.

To train a model, run:

allennlp train -s path/to/serialization/directory training_config/cpae.jsonnet --include-package cpae
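
If you only want to change a coefficient, AllenNLP's train command also accepts a JSON overrides string via -o/--overrides, so you do not have to edit the jsonnet file. Note that the key path below ("model.beta") is an assumption about how the parameter is named in training_config/cpae.jsonnet; adjust it to match the actual config:

allennlp train -s path/to/serialization/directory training_config/cpae.jsonnet --include-package cpae -o '{"model": {"beta": 64}}'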

Generate definition embeddings using AllenNLP's predictor:

allennlp predict path/to/serialization/directory/model.tar.gz data/en_wn_full_all.jsonl --output-file path/to/serialization/directory/definition_embeddings.txt --include-package cpae --predictor cpae_definition_embedding_generator --batch-size 32 --cuda-device 0 --silent
# strip the leading and trailing double quotes the predictor writes around each line
sed -i 's/^"//g' path/to/serialization/directory/definition_embeddings.txt
sed -i 's/"$//g' path/to/serialization/directory/definition_embeddings.txt

After generating the definition embeddings (GloVe text format, i.e., no header line), they can be evaluated or used just like ordinary word embeddings.
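
For instance, a minimal Python sketch that loads the generated file into a dictionary of vectors (the path is the one used in the commands above):

import numpy as np

embeddings = {}
with open("path/to/serialization/directory/definition_embeddings.txt") as f:
    for line in f:
        # GloVe text format: word followed by its vector components, no header line
        parts = line.rstrip("\n").split(" ")
        embeddings[parts[0]] = np.array(parts[1:], dtype=np.float32)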

Comparison with the original implementation

We compare our re-implemented models with the original models using the included word-embeddings-benchmarks toolkit (the original version of the toolkit can be found here).

As the table below shows, our models achieve performance comparable to, and sometimes better than, the originals.

| Model | MEN-dev | MEN-test | MTurk | RG65 | RW | SCWS | SimLex333 | SimLex999 | SimVerb3500-dev | SimVerb3500-test | WS353 | WS353R | WS353S | AP | BLESS | Battig | ESSLI_1a | ESSLI_2b | ESSLI_2c | Google | MSR | SemEval2012_2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| our AE (alpha=1, beta=0) | 0.399109683 | 0.44381856 | 0.374776443 | 0.520243471 | 0.186448245 | 0.495065492 | 0.253624435 | 0.368178852 | 0.357852756 | 0.349119334 | 0.430635419 | 0.292890592 | 0.55375016 | 0.514925373 | 0.59 | 0.228445804 | 0.545454545 | 0.7 | 0.444444444 | 0.083862055 | 0.1 | 0.128368539 |
| original AE (alpha=1, beta=0) | 0.384803476 | 0.424013127 | 0.374223152 | 0.596125059 | 0.141162454 | 0.47554452 | 0.26243494 | 0.334538441 | 0.367640014 | 0.331242873 | 0.407243453 | 0.26709243 | 0.526226658 | 0.480099502 | 0.515 | 0.225960619 | 0.568181818 | 0.675 | 0.511111111 | 0.088518215 | 0.1045 | 0.117133135 |
| our CPAE (alpha=1, beta=8) | 0.498663069 | 0.496606982 | 0.433008813 | 0.634411542 | 0.256603718 | 0.551788864 | 0.259022761 | 0.394054538 | 0.425242418 | 0.368174528 | 0.543721278 | 0.440885165 | 0.634893993 | 0.509950249 | 0.5 | 0.243356911 | 0.590909091 | 0.725 | 0.466666667 | 0.025890299 | 0.047125 | 0.129653634 |
| original CPAE (alpha=1, beta=8) | 0.498157962 | 0.495570312 | 0.434743114 | 0.556321716 | 0.234406662 | 0.537071954 | 0.242319671 | 0.387031863 | 0.415217566 | 0.347100864 | 0.480991963 | 0.382172741 | 0.5842947 | 0.509950249 | 0.47 | 0.240298222 | 0.613636364 | 0.75 | 0.577777778 | 0.016373312 | 0.030875 | 0.117190979 |
| our CPAE (alpha=1, beta=64, word2vec) | 0.660632874 | 0.668232132 | 0.542060783 | 0.811922197 | 0.324839691 | 0.627628157 | 0.346681441 | 0.471233914 | 0.484940154 | 0.435970855 | 0.600053185 | 0.478884821 | 0.709479011 | 0.641791045 | 0.67 | 0.319441789 | 0.772727273 | 0.75 | 0.577777778 | 0.027629963 | 0.04625 | 0.183607132 |
| original CPAE (alpha=1, beta=64, word2vec, reported in paper) | 0.651 | 0.638 | 0.615 | 0.72 | - | 0.604 | 0.309 | 0.458 | 0.441 | 0.423 | 0.613 | - | - | - | - | - | - | - | - | - | - | - |

(The original models correspond to the s2sg_w2v_defs_1_pen0 and s2sg_w2v_defs_1_pen8 configurations, respectively.)
