
CPAE-PyTorch

CPAE-PyTorch is a library of CPAE (Consistency Penalized AutoEncoder) re-implemented in PyTorch. The model was introduced in the EMNLP 2018 paper "Auto-Encoding Dictionary Definitions into Consistent Word Embeddings", and its original implementation can be found here.

Installation

This repo was developed with Python 3.6, PyTorch 1.0.0, and AllenNLP 0.8.5.

You can create the experimental environment with conda as follows:

conda env create -f environment.yml

Or, install dependencies step by step:

conda create -n cpae-pytorch python=3.6
conda activate cpae-pytorch
conda install pytorch=1.0 cudatoolkit=9.0 -c pytorch
pip install allennlp jsonlines
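
To sanity-check the environment, you can confirm that PyTorch imports with the expected version and run AllenNLP's built-in install test:

python -c "import torch; print(torch.__version__)"
allennlp test-install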

Train

The default configuration is provided in training_config, which you can play with. You can change alpha (the autoencoding coefficient) and beta (the consistency-penalty coefficient) to switch between a plain AutoEncoder and the Consistency Penalized AutoEncoder, or supply pre-trained word embeddings to improve the model.
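
To make the roles of the two coefficients concrete, here is a minimal PyTorch sketch of how such a combined objective can be formed. This is illustrative only; the function and argument names are hypothetical, not this repo's actual module:

import torch

def cpae_objective(reconstruction_loss, definition_emb, word_emb, alpha=1.0, beta=8.0):
    # consistency penalty: squared distance between the embedding computed
    # from a word's definition and the embedding of the defined word itself
    consistency = ((definition_emb - word_emb) ** 2).sum(dim=-1).mean()
    # with alpha=1, beta=0 this reduces to a plain autoencoder loss;
    # beta > 0 adds the consistency penalty that distinguishes CPAE
    return alpha * reconstruction_loss + beta * consistency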

Because the implementation is built on AllenNLP, a flexible, configuration-driven library, you can swap any component for a compatible counterpart, add components you find helpful, or remove ones you do not need.

For convenience and fair comparison, we include en_wn_full_all.jsonl and vocab.txt in the data directory; both were generated by the original CPAE code.

To train a model, run:

allennlp train -s path/to/serialization/directory training_config/cpae.jsonnet --include-package cpae
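
If you only want to change a coefficient, AllenNLP's train command also accepts a JSON overrides string via -o/--overrides, so you do not have to edit the jsonnet file. Note that the key path below ("model.beta") is an assumption about how the parameter is named in training_config/cpae.jsonnet; adjust it to match the actual config:

allennlp train -s path/to/serialization/directory training_config/cpae.jsonnet --include-package cpae -o '{"model": {"beta": 64}}'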

Generate definition embeddings using AllenNLP's predictor:

allennlp predict path/to/serialization/directory/model.tar.gz data/en_wn_full_all.jsonl --output-file path/to/serialization/directory/definition_embeddings.txt --include-package cpae --predictor cpae_definition_embedding_generator --batch-size 32 --cuda-device 0 --silent
# strip the leading and trailing double quotes the predictor writes around each line
sed -i 's/^"//g' path/to/serialization/directory/definition_embeddings.txt
sed -i 's/"$//g' path/to/serialization/directory/definition_embeddings.txt

After generating the definition embeddings (GloVe text format, i.e., no header line), they can be evaluated or used just like ordinary word embeddings.
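
For instance, a minimal Python sketch that loads the generated file into a dictionary of vectors (the path is the one used in the commands above):

import numpy as np

embeddings = {}
with open("path/to/serialization/directory/definition_embeddings.txt") as f:
    for line in f:
        # GloVe text format: word followed by its vector components, no header line
        parts = line.rstrip("\n").split(" ")
        embeddings[parts[0]] = np.array(parts[1:], dtype=np.float32)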

Comparison with the original implementation

We compare our re-implemented models with the original models using the included word-embeddings-benchmarks toolkit (the original version of the toolkit can be found here).

As the table below shows, our models achieve performance comparable to, and sometimes better than, the originals.

| Model | MEN-dev | MEN-test | MTurk | RG65 | RW | SCWS | SimLex333 | SimLex999 | SimVerb3500-dev | SimVerb3500-test | WS353 | WS353R | WS353S | AP | BLESS | Battig | ESSLI_1a | ESSLI_2b | ESSLI_2c | Google | MSR | SemEval2012_2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| our AE (alpha=1, beta=0) | 0.399109683 | 0.44381856 | 0.374776443 | 0.520243471 | 0.186448245 | 0.495065492 | 0.253624435 | 0.368178852 | 0.357852756 | 0.349119334 | 0.430635419 | 0.292890592 | 0.55375016 | 0.514925373 | 0.59 | 0.228445804 | 0.545454545 | 0.7 | 0.444444444 | 0.083862055 | 0.1 | 0.128368539 |
| original AE (alpha=1, beta=0) | 0.384803476 | 0.424013127 | 0.374223152 | 0.596125059 | 0.141162454 | 0.47554452 | 0.26243494 | 0.334538441 | 0.367640014 | 0.331242873 | 0.407243453 | 0.26709243 | 0.526226658 | 0.480099502 | 0.515 | 0.225960619 | 0.568181818 | 0.675 | 0.511111111 | 0.088518215 | 0.1045 | 0.117133135 |
| our CPAE (alpha=1, beta=8) | 0.498663069 | 0.496606982 | 0.433008813 | 0.634411542 | 0.256603718 | 0.551788864 | 0.259022761 | 0.394054538 | 0.425242418 | 0.368174528 | 0.543721278 | 0.440885165 | 0.634893993 | 0.509950249 | 0.5 | 0.243356911 | 0.590909091 | 0.725 | 0.466666667 | 0.025890299 | 0.047125 | 0.129653634 |
| original CPAE (alpha=1, beta=8) | 0.498157962 | 0.495570312 | 0.434743114 | 0.556321716 | 0.234406662 | 0.537071954 | 0.242319671 | 0.387031863 | 0.415217566 | 0.347100864 | 0.480991963 | 0.382172741 | 0.5842947 | 0.509950249 | 0.47 | 0.240298222 | 0.613636364 | 0.75 | 0.577777778 | 0.016373312 | 0.030875 | 0.117190979 |
| our CPAE (alpha=1, beta=64, word2vec) | 0.660632874 | 0.668232132 | 0.542060783 | 0.811922197 | 0.324839691 | 0.627628157 | 0.346681441 | 0.471233914 | 0.484940154 | 0.435970855 | 0.600053185 | 0.478884821 | 0.709479011 | 0.641791045 | 0.67 | 0.319441789 | 0.772727273 | 0.75 | 0.577777778 | 0.027629963 | 0.04625 | 0.183607132 |
| original CPAE (alpha=1, beta=64, word2vec, reported in paper) | 0.651 | 0.638 | 0.615 | 0.72 | - | 0.604 | 0.309 | 0.458 | 0.441 | 0.423 | 0.613 | - | - | - | - | - | - | - | - | - | - | - |

(The original models correspond to the s2sg_w2v_defs_1_pen0 and s2sg_w2v_defs_1_pen8 configurations, respectively.)
