Reproducibility analysis for CMash-related tasks:
- Install conda or miniconda (version >= 4.6)
- Clone this repo
git clone https://github.com/KoslickiLab/CMASH-reproducibles.git
- Get the required tools/resources (only needs to be done once)
cd CMASH-reproducibles/src
bash 0.install_all_required_dependency_run_once.sh
- Uninstall everything if necessary
cd CMASH-reproducibles/src
bash uninstall.sh
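Before running the install script, the conda requirement above (version >= 4.6) can be checked from the shell. The following is a minimal sketch, not part of the repository: the helper name `version_ge` is hypothetical, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed conda, if any, against the 4.6 minimum stated above.
if command -v conda >/dev/null 2>&1; then
    # `conda --version` prints e.g. "conda 4.10.3"; keep the second field.
    conda_version="$(conda --version 2>&1 | awk '{print $2}')"
    if version_ge "$conda_version" "4.6"; then
        echo "conda $conda_version is recent enough"
    else
        echo "conda $conda_version is older than 4.6; please upgrade" >&2
    fi
else
    echo "conda not found on PATH" >&2
fi
```

Version sort is used rather than a plain string comparison so that, for example, 4.10 is treated as newer than 4.6.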
This section reproduces the results in the CMash manuscript for ISMB 2022. Please complete the dependency installation steps above first.
- Regenerate all the output data (may take more than 1 day)
cd CMASH-reproducibles/src
nohup bash 1.reproduce_ISMB_2022_CMash_manuscript_results.sh &
# accepts 1 optional positional parameter: the number of threads (default 16)
- Find the results
cd CMASH-reproducibles/1_ISMB_2022/
ls -d CMash_out_* #output folder: CMash_out_${time_tag}
- Folder structure
# final_output: stores all final output files and Fig 1f, 2, and 3
# fig2_JI_estimation: intermediate outputs for pairwise JI estimation within the Brucella genus
# fig3_CI_estimation: intermediate outputs for containment estimation of 1000 random genomes in the simulated metagenomic data
# Brucella_30 / random_1000 / simulation_200: downloaded genome files
# sup_f2_cmash_profile: intermediate outputs for CMash profiling of one real-world dataset for supplementary figure S2
# sup_f3_compare_sourmash_mash: intermediate outputs for comparison of CMash, Sourmash, and Mash for supplementary figure S3
# sup_f4_BF_distri: intermediate outputs for bias factor measurements for supplementary figure S4
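Once the run has finished, locating the newest output folder and checking it against the structure listed above can be sketched as follows. Both function names are hypothetical and not part of the repository. The sketch also assumes that every entry in the list, including `Brucella_30`, `random_1000`, and `simulation_200`, is a subfolder directly inside `CMash_out_${time_tag}`. Because the format of `${time_tag}` is not stated here, the newest folder is chosen by modification time rather than by name.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of the repository.

# Print the most recently modified CMash_out_* folder under $1 (default: current dir).
# Returns non-zero when no such folder exists.
latest_cmash_out() {
    local base="${1:-.}" newest="" dir
    for dir in "$base"/CMash_out_*/; do
        [ -d "$dir" ] || continue            # skip the literal pattern when nothing matches
        if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
            newest="$dir"
        fi
    done
    [ -n "$newest" ] && echo "${newest%/}"   # strip the trailing slash
}

# Print "missing: <name>" for each expected subfolder absent from output folder $1.
# Assumption: all listed folders sit directly inside the output folder.
check_cmash_out() {
    local out_dir="$1" missing=0 name
    for name in final_output fig2_JI_estimation fig3_CI_estimation \
                Brucella_30 random_1000 simulation_200 \
                sup_f2_cmash_profile sup_f3_compare_sourmash_mash sup_f4_BF_distri; do
        if [ ! -d "$out_dir/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

From inside `CMASH-reproducibles/1_ISMB_2022/`, a call such as `check_cmash_out "$(latest_cmash_out)"` would then print any subfolders that the latest run did not produce.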
- Rush implementation
- Explore metagenomic dark matter
- JI as function of k related to evolutionary distances