AndroidHowTo dataset

🚨 Attention 🚨 The code in this repo was designed to facilitate the download and generation of the AndroidHowTo dataset. All credit goes to the authors of the Seq2Act paper; I am only reproducing their results. If you use the dataset, please cite their work:

@inproceedings{seq2act,
  title = {Mapping Natural Language Instructions to Mobile UI Action Sequences},
  author = {Yang Li and Jiacong He and Xin Zhou and Yuan Zhang and Jason Baldridge},
  booktitle = {Annual Conference of the Association for Computational Linguistics (ACL 2020)},
  year = {2020},
  url = {https://www.aclweb.org/anthology/2020.acl-main.729.pdf},
}

Dataset files

The generated AndroidHowTo dataset is available in the androidhowto_dataset folder. The downloaded files before generation are available in prep. If you want to generate the dataset yourself, follow the steps below.

Running the code to generate AndroidHowTo dataset:

Download all the files listed in used_warc.paths:

python download_and_extract.py

This will take a LONG time; come back after two days ⏳

For each file in used_warc.paths, an output file will be generated inside prep. Make sure the prep folder is empty before you start, or you might run into problems 🐛
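Before starting a multi-day run, it can help to confirm that prep is empty and to see how many inputs are listed. The sketch below is one way to do that and is not part of the repository: the helper names (`count_expected`, `list_outputs`, `report`) are hypothetical, and it assumes that used_warc.paths holds one path per line and that every regular file in prep is an output.

```python
from pathlib import Path
from typing import List


def count_expected(paths_file: Path) -> int:
    """Count non-empty lines in the path list (assumes one WARC path per line)."""
    with paths_file.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def list_outputs(prep_dir: Path) -> List[Path]:
    """Return the regular, non-hidden files in the output folder (empty if it is missing)."""
    if not prep_dir.is_dir():
        return []
    return sorted(
        p for p in prep_dir.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )


def report(paths_file: Path = Path("used_warc.paths"),
           prep_dir: Path = Path("prep")) -> str:
    """Summarise how many outputs exist versus how many inputs are listed."""
    expected = count_expected(paths_file)
    done = len(list_outputs(prep_dir))
    if done == 0:
        return f"prep is empty; {expected} input file(s) listed"
    return f"prep already holds {done} file(s); {expected} input file(s) listed"
```

Calling `print(report())` from the repository root gives a one-line summary. Run again while the download is in progress, the same call gives a rough progress count, since one output is expected per listed input.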

After you have downloaded and parsed all the files (3,414 in total), merge them into a single file by running:

python merge_files.py

The output file is crawled_output.json.

Then, generate the TFRecords by running:


python -m seq2act.data_generation.create_commoncrawl_dataset \
--input_instruction_json_file="crawled_output.json" \
--input_csv_file="common_crawl_annotation.csv" \
--vocab_file="commoncrawl_rico_vocab_subtoken_44462" \
--output_dir="androidhowto_dataset/"
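Once the command finishes, a quick sanity check is to count the records in a generated file. The following is a minimal sketch, not part of the repository: `count_tfrecords` is a hypothetical helper, it assumes the output files are uncompressed TFRecord files, and it only walks the record framing (8-byte length, 4-byte length CRC, payload, 4-byte payload CRC) without verifying the CRCs or decoding the examples.

```python
import struct
from pathlib import Path


def count_tfrecords(path: Path) -> int:
    """Count records in an uncompressed TFRecord file without verifying CRCs."""
    count = 0
    with path.open("rb") as handle:
        while True:
            header = handle.read(8)  # little-endian uint64: payload length
            if not header:
                return count  # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated length header in {path}")
            (length,) = struct.unpack("<Q", header)
            # Remaining frame: 4-byte length CRC, payload, 4-byte payload CRC.
            rest = handle.read(4 + length + 4)
            if len(rest) < 4 + length + 4:
                raise ValueError(f"truncated record {count} in {path}")
            count += 1
```

Pass it the path of one file from androidhowto_dataset/. A `ValueError` suggests the file is incomplete, or that it is not an uncompressed TFRecord file, in which case the assumption above does not hold.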
