GitHub - baoy-nlp/DSS-VAE-pytorch: Generating Sentences from Disentangled Syntactic and Semantic Spaces

Code base: https://github.com/baoy-nlp/TextVAE-pytorch

tokenize

python preprocess/tokenize.py --raw_file [raw_file_path] --token_file [token_out_path] --for_parse

parse the data with ZPar

(refer to the ZPar documentation)

prepare dataset
- convert to <Sentence, Linearized Tree>
  
  python preprocess/tree_convert.py --tree_file [tree_file_path] --out_file [tree_out_path] --mode s2b
- generate dataset and vocabulary
  
  python struct_self/generate_dataset.py --train_file [<Sentence,LinearTree> file] --dev_file [<Sentence,LinearTree> file] --test_file [<Sentence,LinearTree> file] --tgt_dir [output_dir] --max_src_vocab 30000 --max_src_len 30 --max_tgt_len 90 --train_size 100000
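To illustrate what a <Sentence, Linearized Tree> pair can look like, here is a minimal sketch that splits a bracketed constituency parse into its words and a flat sequence of tree tokens. The output format (opening tokens such as `(NP`, closing tokens such as `)NP`, and POS tags standing in for words) is an assumption made for illustration; the exact format written by `--mode s2b` is not described here, and the `linearize` helper is hypothetical, not part of this repository.

```python
import re


def linearize(tree):
    """Split a bracketed parse into (words, linearized tree tokens).

    Assumed format (illustrative only): phrase labels become "(X" / ")X"
    tokens and each word is replaced by its POS tag.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", tree)
    words, linear, stack = [], [], []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            label = tokens[i + 1]
            is_preterminal = (
                i + 3 < len(tokens)
                and tokens[i + 2] not in ("(", ")")
                and tokens[i + 3] == ")"
            )
            if is_preterminal:
                # "( TAG word )": keep the word for the sentence side,
                # keep only the tag on the tree side.
                words.append(tokens[i + 2])
                linear.append(label)
                i += 4
            else:
                # Phrase node: remember the label so the closing token can name it.
                stack.append(label)
                linear.append("(" + label)
                i += 2
        elif tok == ")":
            linear.append(")" + stack.pop())
            i += 1
        else:
            raise ValueError("unexpected token: " + tok)
    return words, linear


words, linear = linearize("(S (NP (DT the) (NN cat)) (VP (VBZ sits)))")
print(" ".join(words))
print(" ".join(linear))
```

Replacing words with their tags keeps the tree side free of lexical content, which is one common choice when the sentence and its syntax are modeled as separate sequences.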
After preprocessing, the target directory has the following structure:
SNLI-SVAE [Tgt Dir]
- train.bin
- test.bin
- dev.bin
- vocab.bin
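
Before training, it can help to confirm that preprocessing produced all four files listed above. The following is a small sketch of such a check; the `missing_files` helper is hypothetical (not part of this repository), it only tests that the files exist, and it makes no assumption about their internal format.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("train.bin", "dev.bin", "test.bin", "vocab.bin")


def missing_files(tgt_dir):
    """Return the expected dataset files that are absent from tgt_dir."""
    root = Path(tgt_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, `missing_files("SNLI-SVAE")` returns an empty list when the directory is complete, and the names of the absent files otherwise.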

Train from scratch with the following command:

python main.py --config_files [config.yaml file] --mode train_vae --exp_name [exp_name:for note]
