CGMH is a sampling-based model for constrained sentence generation. It can be used for keyword-to-sentence generation, paraphrasing, sentence correction, and many other tasks.
- Running example for paraphrasing (all rejected proposals are omitted; a toy sketch of the underlying edit loop follows these examples):
  - what movie do you like most . ->
  - which movie do you like most . (replace `what` with `which`) ->
  - which movie do you like . (delete `most`) ->
  - which movie do you like best . (insert `best`) ->
  - which movie do you think best . (replace `like` with `think`) ->
  - which movie do you think the best . (insert `the`) ->
  - which movie do you think is the best . (insert `is`)
- Running example for sentence correction:
  - in the word oil price very high right now . ->
  - in the word , oil price very high right now . (insert `,`) ->
  - in the word , oil prices very high right now . (replace `price` with `prices`) ->
  - in the word , oil prices are very high right now . (insert `are`)
- Extra examples for sentence correction:
  - origin: even if we are failed , we have to try to get a new things . ->
    generated: even if we are failing , we have to try to get some new things .
  - origin: in the word oil price very high right now . ->
    generated: in the word , oil prices are very high right now .
  - origin: the reason these problem occurs is also becayse of the exam . ->
    generated: the reason these problems occur is also because of the exam .
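
The trajectories above come from Metropolis-Hastings sampling over word-level edits (replace, insert, delete), where rejected proposals are simply discarded. Below is a minimal, self-contained sketch of that loop with a toy scoring function; the scorer, vocabulary, and uniform proposal are illustrative assumptions and do not reproduce the repository's actual models.

```python
import random

VOCAB = ["which", "what", "movie", "do", "you", "like", "think", "is", "the", "best", "most", "."]

def score(words):
    # Stand-in for the constrained sentence probability used by CGMH;
    # here we simply reward shorter sentences that end with "." so the demo runs.
    return 1.0 / (1.0 + len(words)) + (0.5 if words and words[-1] == "." else 0.0)

def propose(words):
    """Propose one word-level edit: replace, insert, or delete at a random position."""
    words = list(words)
    op = random.choice(["replace", "insert", "delete"])
    pos = random.randrange(len(words)) if words else 0
    if op == "replace" and words:
        words[pos] = random.choice(VOCAB)
    elif op == "insert":
        words.insert(pos, random.choice(VOCAB))
    elif op == "delete" and len(words) > 1:
        del words[pos]
    return words

def cgmh_style_sampler(sentence, steps=200):
    current = sentence.split()
    for _ in range(steps):
        candidate = propose(current)
        # Metropolis acceptance ratio; the real model also includes the
        # forward/backward proposal probabilities, omitted here for brevity.
        ratio = score(candidate) / max(score(current), 1e-12)
        if random.random() < min(1.0, ratio):
            current = candidate  # accepted proposal
        # rejected proposals are discarded, as in the examples above
    return " ".join(current)

print(cgmh_style_sampler("what movie do you like most ."))
```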
- python == 2.7
- python packages (a quick version check follows this list)
  - TensorFlow == 1.3.0 (other versions are not tested)
  - numpy
  - pickle
  - Rake (`pip install python-rake`)
  - zpar (`pip install python-zpar`; download the model file from https://github.com/frcchang/zpar/releases/download/v0.7.5/english-models.zip and extract it to `POS/english-models`)
  - skipthoughts (needed only when `config.sim=='skipthoughts'`)
  - en (get `en` from https://www.nodebox.net/code/index.php/Linguistics and put it under `liguistics/`)
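
Before training or generation, it can help to confirm that the pinned versions are actually the ones in the environment. This is only a suggested sanity check; the import name `RAKE` for python-rake may differ across package versions.

```python
# Sanity-check the core dependencies listed above (Python 2.7, TensorFlow 1.3.0).
import sys

import numpy
import tensorflow as tf

print("python      : %s" % sys.version.split()[0])  # expected 2.7.x
print("tensorflow  : %s" % tf.__version__)          # expected 1.3.0 (other versions are not tested)
print("numpy       : %s" % numpy.__version__)

try:
    import RAKE  # installed via `pip install python-rake`
    print("python-rake : available")
except ImportError:
    print("python-rake : missing")
```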
- word embedding
  - If you want to try using a word embedding for paraphrasing, first download or train a word embedding, place it at `config.emb_path`, and set `config.sim='word_max'`.
  - For a pretrained language model, please download the following files and extract them under `model/`:
    - Correction and key-gen: https://drive.google.com/open?id=1L3q-xGD3lHNETfibERTIh-ciCXmzRs3i
    - Paraphrasing: https://drive.google.com/open?id=1kTjnqO69CjwpBXwPtOPT6v7Ur7ro5nRR. Please put the `.pkl` file under `data/quora`.
  - For a pretrained word embedding, please download the following files (a short loading example follows this section):
    - Correction and key-gen: https://drive.google.com/open?id=1q79Dvrx3eapffHL4ApfrT0XpOgm3sKKF. Please put the `.pkl` file under `data/1-billion`.
    - Paraphrasing: https://drive.google.com/open?id=1ggEdFyLIrr9sjfG1SHxjyHgOYNKy3ySE. Please put the `.pkl` file under `data/quora`.
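
The downloaded `.pkl` files are plain pickle dumps. A short loading sketch is given below; the file name `data/quora/emb.pkl` and the inspection logic are assumptions for illustration only, so check the actual contents of the archive you downloaded.

```python
import pickle

# Illustrative path only; substitute the actual .pkl file name from the download
# and make sure it matches config.emb_path.
EMB_PATH = "data/quora/emb.pkl"

with open(EMB_PATH, "rb") as f:
    emb = pickle.load(f)

# Inspect what the pickle actually contains before wiring it into config.py.
print(type(emb))
if hasattr(emb, "keys"):       # e.g. a word -> vector dictionary
    print(list(emb.keys())[:10])
elif hasattr(emb, "shape"):    # e.g. a raw embedding matrix
    print(emb.shape)
```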
- Training language models
  - For each task, first train a forward and a backward language model: set `mode='forward'` and `mode='backward'` in `config.py` successively, and run `python correction.py` / `paraphrase.py` / `key-gen.py` to train each model (see the sketch below).
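
Concretely, this means running each task script twice with a different `mode`. The driver below automates the two runs by rewriting the `mode=...` line in `config.py` and calling one task script via subprocess; it assumes `mode` is assigned a quoted string literal in `config.py`, and uses `paraphrase.py` only as an example.

```python
import re
import subprocess

# Train the forward and then the backward language model for one task.
# Editing config.py by hand (mode='forward', then mode='backward') and
# rerunning the script works exactly the same way.
for mode in ("forward", "backward"):
    with open("config.py") as f:
        source = f.read()
    source = re.sub(r"mode\s*=\s*'[^']*'", "mode='%s'" % mode, source, count=1)
    with open("config.py", "w") as f:
        f.write(source)
    subprocess.check_call(["python", "paraphrase.py"])  # or correction.py / key-gen.py
```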
- Generation
  - To generate new samples for each task: set `mode='use'` and choose proper parameters in `config.py`. Give inputs in `input/input.txt`, then run `python correction.py` / `paraphrase.py` / `key-gen.py` to generate. Outputs are written to `output/` (a small end-to-end driver is sketched below).
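
Put together, a single generation run looks roughly like the driver below: write the inputs, switch `config.py` to `mode='use'`, run one task script, and read whatever lands under `output/`. The output file names and the regex edit of `config.py` are assumptions, so adapt them to your checkout.

```python
import glob
import os
import re
import subprocess

# One input sentence per line in input/input.txt.
sentences = ["in the word oil price very high right now ."]
with open("input/input.txt", "w") as f:
    f.write("\n".join(sentences) + "\n")

# Switch to generation mode, assuming config.py assigns mode as a quoted string.
with open("config.py") as f:
    source = f.read()
with open("config.py", "w") as f:
    f.write(re.sub(r"mode\s*=\s*'[^']*'", "mode='use'", source, count=1))

subprocess.check_call(["python", "correction.py"])  # or paraphrase.py / key-gen.py

# Print whatever the run produced under output/.
for path in glob.glob("output/*"):
    if os.path.isfile(path):
        print("==== %s ====" % path)
        with open(path) as f:
            print(f.read())
```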
- Details
  - Make sure that the paths for packages and data are correctly set in `config.py` (a quick path check is sketched below).
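
One way to catch path problems early is to list every string setting in `config.py` that looks like a path and check that it exists. The snippet assumes the settings are module-level attributes of `config`; if they live on a class or object instead, point `settings` at that object.

```python
import os

import config  # the repository's config.py

# Assumes the settings are module-level attributes; adjust if they are wrapped
# in a class or object inside config.py.
settings = config

for name in dir(settings):
    value = getattr(settings, name)
    if isinstance(value, str) and ("/" in value or value.endswith(".pkl")):
        status = "ok" if os.path.exists(value) else "MISSING"
        print("%-24s %-8s %s" % (name, status, value))
```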