CGMH is a sampling-based model for constrained sentence generation. It can be used for keyword-to-sentence generation, paraphrasing, sentence correction, and many other tasks.
- Running example for paraphrasing (all rejected proposals are omitted; a toy sketch of the underlying edit loop follows these examples):
  - what movie do you like most . ->
  - which movie do you like most . (replace `what` with `which`) ->
  - which movie do you like . (delete `most`) ->
  - which movie do you like best . (insert `best`) ->
  - which movie do you think best . (replace `like` with `think`) ->
  - which movie do you think the best . (insert `the`) ->
  - which movie do you think is the best . (insert `is`)
- Running example for sentence correction:
  - in the word oil price very high right now . ->
  - in the word , oil price very high right now . (insert `,`) ->
  - in the word , oil prices very high right now . (replace `price` with `prices`) ->
  - in the word , oil prices are very high right now . (insert `are`)
- Extra examples for sentence correction:
  - origin: even if we are failed , we have to try to get a new things . ->
    generated: even if we are failing , we have to try to get some new things .
  - origin: in the word oil price very high right now . ->
    generated: in the word , oil prices are very high right now .
  - origin: the reason these problem occurs is also becayse of the exam . ->
    generated: the reason these problems occur is also because of the exam .
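
The trajectories above come from Metropolis-Hastings sampling over word-level edits (replace, insert, delete), where rejected proposals are simply discarded. Below is a minimal, self-contained sketch of that loop with a toy scoring function; the scorer, vocabulary, and uniform proposal are illustrative assumptions and do not reproduce the repository's actual models.

```python
import random

VOCAB = ["which", "what", "movie", "do", "you", "like", "think", "is", "the", "best", "most", "."]

def score(words):
    # Stand-in for the constrained sentence probability used by CGMH;
    # here we simply reward shorter sentences that end with "." so the demo runs.
    return 1.0 / (1.0 + len(words)) + (0.5 if words and words[-1] == "." else 0.0)

def propose(words):
    """Propose one word-level edit: replace, insert, or delete at a random position."""
    words = list(words)
    op = random.choice(["replace", "insert", "delete"])
    pos = random.randrange(len(words)) if words else 0
    if op == "replace" and words:
        words[pos] = random.choice(VOCAB)
    elif op == "insert":
        words.insert(pos, random.choice(VOCAB))
    elif op == "delete" and len(words) > 1:
        del words[pos]
    return words

def cgmh_style_sampler(sentence, steps=200):
    current = sentence.split()
    for _ in range(steps):
        candidate = propose(current)
        # Metropolis acceptance ratio; the real model also includes the
        # forward/backward proposal probabilities, omitted here for brevity.
        ratio = score(candidate) / max(score(current), 1e-12)
        if random.random() < min(1.0, ratio):
            current = candidate  # accepted proposal
        # rejected proposals are discarded, as in the examples above
    return " ".join(current)

print(cgmh_style_sampler("what movie do you like most ."))
```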
- python == 2.7
- python packages (a quick version check follows this list)
  - TensorFlow == 1.3.0 (other versions are not tested)
  - numpy
  - pickle
  - Rake (`pip install python-rake`)
  - zpar (`pip install python-zpar`; download the model file from https://github.com/frcchang/zpar/releases/download/v0.7.5/english-models.zip and extract it to `POS/english-models`)
  - skipthoughts (needed only when `config.sim=='skipthoughts'`)
  - en (get `en` from https://www.nodebox.net/code/index.php/Linguistics and put it under `liguistics/`)
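
Before training or generation, it can help to confirm that the pinned versions are actually the ones in the environment. This is only a suggested sanity check; the import name `RAKE` for python-rake may differ across package versions.

```python
# Sanity-check the core dependencies listed above (Python 2.7, TensorFlow 1.3.0).
import sys

import numpy
import tensorflow as tf

print("python      : %s" % sys.version.split()[0])  # expected 2.7.x
print("tensorflow  : %s" % tf.__version__)          # expected 1.3.0 (other versions are not tested)
print("numpy       : %s" % numpy.__version__)

try:
    import RAKE  # installed via `pip install python-rake`
    print("python-rake : available")
except ImportError:
    print("python-rake : missing")
```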
- word embedding
  - If you want to try using a word embedding for paraphrasing, first download or train a word embedding, place it at `config.emb_path`, and set `config.sim='word_max'`.
  - For a pretrained language model, please download the following files and extract them under `model/`:
    - Correction and key-gen: https://drive.google.com/open?id=1L3q-xGD3lHNETfibERTIh-ciCXmzRs3i
    - Paraphrasing: https://drive.google.com/open?id=1kTjnqO69CjwpBXwPtOPT6v7Ur7ro5nRR. Please put the `.pkl` file under `data/quora`.
  - For a pretrained word embedding, please download the following files (a short loading example follows this section):
    - Correction and key-gen: https://drive.google.com/open?id=1q79Dvrx3eapffHL4ApfrT0XpOgm3sKKF. Please put the `.pkl` file under `data/1-billion`.
    - Paraphrasing: https://drive.google.com/open?id=1ggEdFyLIrr9sjfG1SHxjyHgOYNKy3ySE. Please put the `.pkl` file under `data/quora`.
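
The downloaded `.pkl` files are plain pickle dumps. A short loading sketch is given below; the file name `data/quora/emb.pkl` and the inspection logic are assumptions for illustration only, so check the actual contents of the archive you downloaded.

```python
import pickle

# Illustrative path only; substitute the actual .pkl file name from the download
# and make sure it matches config.emb_path.
EMB_PATH = "data/quora/emb.pkl"

with open(EMB_PATH, "rb") as f:
    emb = pickle.load(f)

# Inspect what the pickle actually contains before wiring it into config.py.
print(type(emb))
if hasattr(emb, "keys"):       # e.g. a word -> vector dictionary
    print(list(emb.keys())[:10])
elif hasattr(emb, "shape"):    # e.g. a raw embedding matrix
    print(emb.shape)
```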
- Training language models
  - For each task, first train a forward and a backward language model: set `mode='forward'` and `mode='backward'` in `config.py` successively, and run `python correction.py` / `paraphrase.py` / `key-gen.py` to train each model (see the sketch below).
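
Concretely, this means running each task script twice with a different `mode`. The driver below automates the two runs by rewriting the `mode=...` line in `config.py` and calling one task script via subprocess; it assumes `mode` is assigned a quoted string literal in `config.py`, and uses `paraphrase.py` only as an example.

```python
import re
import subprocess

# Train the forward and then the backward language model for one task.
# Editing config.py by hand (mode='forward', then mode='backward') and
# rerunning the script works exactly the same way.
for mode in ("forward", "backward"):
    with open("config.py") as f:
        source = f.read()
    source = re.sub(r"mode\s*=\s*'[^']*'", "mode='%s'" % mode, source, count=1)
    with open("config.py", "w") as f:
        f.write(source)
    subprocess.check_call(["python", "paraphrase.py"])  # or correction.py / key-gen.py
```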
- Generation
  - To generate new samples for each task: set `mode='use'` and choose proper parameters in `config.py`. Give inputs in `input/input.txt`, then run `python correction.py` / `paraphrase.py` / `key-gen.py` to generate. Outputs are written to `output/` (a small end-to-end driver is sketched below).
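
Put together, a single generation run looks roughly like the driver below: write the inputs, switch `config.py` to `mode='use'`, run one task script, and read whatever lands under `output/`. The output file names and the regex edit of `config.py` are assumptions, so adapt them to your checkout.

```python
import glob
import os
import re
import subprocess

# One input sentence per line in input/input.txt.
sentences = ["in the word oil price very high right now ."]
with open("input/input.txt", "w") as f:
    f.write("\n".join(sentences) + "\n")

# Switch to generation mode, assuming config.py assigns mode as a quoted string.
with open("config.py") as f:
    source = f.read()
with open("config.py", "w") as f:
    f.write(re.sub(r"mode\s*=\s*'[^']*'", "mode='use'", source, count=1))

subprocess.check_call(["python", "correction.py"])  # or paraphrase.py / key-gen.py

# Print whatever the run produced under output/.
for path in glob.glob("output/*"):
    if os.path.isfile(path):
        print("==== %s ====" % path)
        with open(path) as f:
            print(f.read())
```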
- Details
  - Make sure that the paths for packages and data are correctly set in `config.py` (a quick path check is sketched below).
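
One way to catch path problems early is to list every string setting in `config.py` that looks like a path and check that it exists. The snippet assumes the settings are module-level attributes of `config`; if they live on a class or object instead, point `settings` at that object.

```python
import os

import config  # the repository's config.py

# Assumes the settings are module-level attributes; adjust if they are wrapped
# in a class or object inside config.py.
settings = config

for name in dir(settings):
    value = getattr(settings, name)
    if isinstance(value, str) and ("/" in value or value.endswith(".pkl")):
        status = "ok" if os.path.exists(value) else "MISSING"
        print("%-24s %-8s %s" % (name, status, value))
```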