hzi-bifo/AMR_benchmarking

Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes

Contents

Introduction

Software list

  1. Aytan-Aktug [1],
  2. Seq2Geno2Pheno (Seq2Geno&Geno2Pheno) [2],
  3. PhenotypeSeeker v 0.7.3 [3],
  4. Kover 2.0 [4],
  5. ResFinder 4.0 [5], a direct-association tool based on an AMR determinant database, which was used as the baseline.

Datasets

Prerequisites

  • Dependencies

    • To reproduce the output, you need Linux and conda. We used Miniconda2 4.8.4. All software environments were activated from the "base" environment, which is the default environment.

    • Installation of the conda environments:

      git clone https://github.com/hzi-bifo/AMR_benchmarking.git
      cd AMR_benchmarking
      bash ./install/install.sh # Creates 9 conda environments and installs the packages for each
    • For Kover, refer to the Kover documentation if you want to try other installation methods.

    • Finally, install PyTorch manually in the multi_torch_env environment. To install a PyTorch build compatible with your CUDA version, follow the instructions at https://pytorch.org/get-started/locally/. Our code was tested with PyTorch v1.7.1 and CUDA versions 10.1 and 11.0.

  • Memory requirement: Some procedures require a very large amount of memory. The feature-building procedure of the Aytan-Aktug multi-species model (adapted version) needs ~370 GB of memory. The other ML software needs up to 80 GB, depending on the number of CPUs and the specific species-antibiotic combination.

  • Disk storage requirement: Some procedures generate extremely large intermediate files, although our pipeline deletes them once the procedure finishes. For example, PhenotypeSeeker (adapted version) needs the most disk storage, up to the order of 10 TB depending on the species.
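
The memory and disk figures above can be checked against a machine before starting a run. The following is a minimal sketch, not part of the repository: the function names (total_mem_gb, free_disk_gb, preflight) are hypothetical, and the thresholds in the example call (80 for memory, 10000 for disk, both in GiB) are taken from the figures quoted above.

```shell
#!/usr/bin/env bash
# Pre-flight check (illustrative, not part of the repository): compare total
# RAM and free disk space with thresholds given by the caller.
# Linux only, because it reads /proc/meminfo.

total_mem_gb() {
    # MemTotal is reported in kB; convert to whole GiB.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

free_disk_gb() {  # usage: free_disk_gb <directory>
    # -P prints one POSIX-format line per filesystem; column 4 is available kB.
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

preflight() {  # usage: preflight <log_path> <min_mem_gb> <min_disk_gb>
    mem=$(total_mem_gb)
    disk=$(free_disk_gb "$1")
    echo "RAM: ${mem} GiB (want >= $2), free disk at $1: ${disk} GiB (want >= $3)"
    [ "$mem" -ge "$2" ] && [ "$disk" -ge "$3" ]
}

# Example call: 80 GiB RAM for the single-species ML tools, ~10 TB for log_path.
preflight . 80 10000 || echo "Warning: this machine may be too small for some procedures." >&2
```

For the Aytan-Aktug multi-species feature-building step, the memory threshold would be raised to about 370.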

Input file

The input file is the YAML file Config.yaml in the root folder, where all options are described:

A. Basic/required parameters setting

  • Replace every value after the ":" in section A with your own.

| option | action | values ([default]) |
|---|---|---|
| dataset_location | Where the PATRIC data will be downloaded (~246 GB). | /vol/projects/BIFO/patric_genome |
| output_path | Where the Results folder is generated, holding the direct results of each software and the further visualization. | ./ |
| log_path | Where the log folder for intermediate files is generated (~10 TB, with regular cleaning of files related to species whose benchmarking has completed). | ./ |
| n_jobs | Number of CPU cores (>1) to use. | 10 |
| gpu_on | Whether to use a GPU for the Aytan-Aktug SSSA model. If set to False, the model is parallelized on CPU; otherwise it runs sequentially on one GPU. | False |
| clean_software | Clean large intermediate files of the specified software (optional). Large temporary files can also be removed manually from <log_path>/log/software/<software_name>/software_output. | |
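
Two of the section A options have constraints that are easy to verify before a long run: n_jobs must be greater than 1 and gpu_on is a True/False switch. The sketch below is not part of the repository. It assumes that each option sits on its own top-level "key: value" line in Config.yaml with an unquoted value, which should be checked against the actual file; get_option and check_basic_options are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: read two section A values from a Config.yaml and sanity-check them.
# Assumption: options are top-level "key: value" lines with unquoted values.

get_option() {  # usage: get_option <key> <file>
    # Print the value after "key:", without an inline comment or trailing blanks.
    sed -n "s/^$1:[[:space:]]*//p" "$2" \
        | sed 's/[[:space:]]*#.*$//; s/[[:space:]]*$//' \
        | head -n 1
}

check_basic_options() {  # usage: check_basic_options <file>
    status=0
    n_jobs=$(get_option n_jobs "$1")
    gpu_on=$(get_option gpu_on "$1")
    # n_jobs is documented as "CPU cores (>1)".
    case "$n_jobs" in
        ''|*[!0-9]*)
            echo "n_jobs should be an integer > 1, got: '$n_jobs'" >&2
            status=1 ;;
        *)
            if [ "$n_jobs" -le 1 ]; then
                echo "n_jobs should be > 1, got: $n_jobs" >&2
                status=1
            fi ;;
    esac
    # gpu_on is documented with the default False.
    case "$gpu_on" in
        True|False) ;;
        *)
            echo "gpu_on should be True or False, got: '$gpu_on'" >&2
            status=1 ;;
    esac
    return "$status"
}

# Only runs when a Config.yaml is present in the current directory.
if [ -f Config.yaml ]; then
    check_basic_options Config.yaml && echo "n_jobs and gpu_on look consistent."
fi
```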

B. Optional parameters setting

  • Change the conda environment names if environments with the same names already exist on your machine.

| option | action | values ([default]) |
|---|---|---|
| amr_env_name, amr_env_name2 | conda envs for general use | amr_env, amr2 |
| PhenotypeSeeker_env_name | conda env for PhenotypeSeeker | PhenotypeSeeker_env |
| multi_env_name | conda env for | multi_env |
| multi_torch_env_name | conda env for the NN model | multi_torch_env |
| kover_env_name | conda env for Kover | kover_env |
| se2ge_env_name | conda env for Seq2Geno | snakemake_env |
| kmer_env_name | conda env for Seq2Geno k-mer generation | kmer_kmc |
| phylo_name | conda env for Seq2Geno phylogenetic tree generation | phylo_env |
| phylo_name2 | conda env for visualization of misclassified genomes | phylo_env2 |
| resfinder_env | conda env for ResFinder | res_env |

C. Advanced/optional parameters setting

  • You can evaluate a subset of species at a time by modifying the values of the 'species_list', 'species_list_phylotree', and 'species_list_multi_antibiotics' options.
  • For multi-species models, we list all species that the datasets of this study support; you can explore further by making new combinations of the listed species. Users who want to reproduce the AMR benchmarking results are advised not to change the settings in this category.

| option | action | values ([default]) |
|---|---|---|
| species_list | Benchmarked species under random and homology-aware folds for single-species evaluation. | Escherichia_coli, Staphylococcus_aureus, Salmonella_enterica, Klebsiella_pneumoniae, Pseudomonas_aeruginosa, Acinetobacter_baumannii, Streptococcus_pneumoniae, Mycobacterium_tuberculosis, Campylobacter_jejuni, Enterococcus_faecium, Neisseria_gonorrhoeae |
| species_list_phylotree | Benchmarked species under phylogeny-aware folds for single-species evaluation. | Escherichia_coli, Staphylococcus_aureus, Salmonella_enterica, Klebsiella_pneumoniae, Pseudomonas_aeruginosa, Acinetobacter_baumannii, Streptococcus_pneumoniae, Campylobacter_jejuni, Enterococcus_faecium, Neisseria_gonorrhoeae |
| species_list_multi_antibiotics | Benchmarked species for the single-species multi-antibiotic model. | Mycobacterium_tuberculosis, Escherichia_coli, Staphylococcus_aureus, Salmonella_enterica, Klebsiella_pneumoniae, Pseudomonas_aeruginosa, Acinetobacter_baumannii, Streptococcus_pneumoniae, Neisseria_gonorrhoeae |
| species_list_multi_species | Benchmarked species for multi-species models. | Mycobacterium_tuberculosis, Salmonella_enterica, Streptococcus_pneumoniae, Escherichia_coli, Staphylococcus_aureus, Klebsiella_pneumoniae, Acinetobacter_baumannii, Pseudomonas_aeruginosa, Campylobacter_jejuni |
| cv_number | The k value of k-fold nested cross-validation. | 10 |
| QC_criteria | Sample quality control level. Can be loose or strict. | loose |
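
The four species lists differ only in a few entries, which is easy to miss when choosing a subset. The following sketch, not part of the repository, copies the default lists from the options above into shell variables (space-separated for convenience) and prints which species of species_list are absent from each of the other lists. The helper name not_in is hypothetical.

```shell
#!/usr/bin/env bash
# Default species lists copied from the options above, space-separated.
species_list="Escherichia_coli Staphylococcus_aureus Salmonella_enterica Klebsiella_pneumoniae Pseudomonas_aeruginosa Acinetobacter_baumannii Streptococcus_pneumoniae Mycobacterium_tuberculosis Campylobacter_jejuni Enterococcus_faecium Neisseria_gonorrhoeae"
species_list_phylotree="Escherichia_coli Staphylococcus_aureus Salmonella_enterica Klebsiella_pneumoniae Pseudomonas_aeruginosa Acinetobacter_baumannii Streptococcus_pneumoniae Campylobacter_jejuni Enterococcus_faecium Neisseria_gonorrhoeae"
species_list_multi_antibiotics="Mycobacterium_tuberculosis Escherichia_coli Staphylococcus_aureus Salmonella_enterica Klebsiella_pneumoniae Pseudomonas_aeruginosa Acinetobacter_baumannii Streptococcus_pneumoniae Neisseria_gonorrhoeae"
species_list_multi_species="Mycobacterium_tuberculosis Salmonella_enterica Streptococcus_pneumoniae Escherichia_coli Staphylococcus_aureus Klebsiella_pneumoniae Acinetobacter_baumannii Pseudomonas_aeruginosa Campylobacter_jejuni"

not_in() {  # usage: not_in "<list A>" "<list B>"; prints entries of A absent from B
    for s in $1; do
        case " $2 " in
            *" $s "*) ;;          # present in B: print nothing
            *) echo "$s" ;;       # absent from B
        esac
    done
}

echo "Not in species_list_phylotree:"
not_in "$species_list" "$species_list_phylotree"
# prints: Mycobacterium_tuberculosis

echo "Not in species_list_multi_antibiotics:"
not_in "$species_list" "$species_list_multi_antibiotics"
# prints: Campylobacter_jejuni, Enterococcus_faecium (one per line)

echo "Not in species_list_multi_species:"
not_in "$species_list" "$species_list_multi_species"
# prints: Enterococcus_faecium, Neisseria_gonorrhoeae (one per line)
```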

Output

└── Results
    ├── final_figures_tables
    ├── other_figures_tables
    ├── supplement_figures_tables    
    └── software
        ├── AytanAktug
        ├── kover
        ├── majority
        ├── phenotypeseeker
        ├── resfinder_b
        ├── resfinder_folds
        ├── resfinder_k
        └── seq2geno

  • Cross-validation results of each ML software and evaluation results of ResFinder are generated under output_path/Results/software/<name of the software>.
  • Visualization tables and graphs are generated under output_path/Results/final_figures_tables and output_path/Results/supplement_figures_tables.
  • Numbers and statistical results mentioned in our benchmarking article are generated under output_path/Results/other_figures_tables.
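
Because the benchmark is run in several stages, it can help to see which of the folders in the tree above exist so far. This is a minimal sketch, not part of the repository; missing_results is a hypothetical helper, and the folder names are copied from the tree above.

```shell
#!/usr/bin/env bash
# Sketch: list the expected Results subfolders that do not exist yet
# under a given output_path.

missing_results() {  # usage: missing_results <output_path>
    for sub in final_figures_tables other_figures_tables supplement_figures_tables \
               software/AytanAktug software/kover software/majority \
               software/phenotypeseeker software/resfinder_b \
               software/resfinder_folds software/resfinder_k software/seq2geno; do
        [ -d "$1/Results/$sub" ] || echo "$sub"
    done
}

# Example call with the default output_path "./".
missing_results .
```

An empty output means every folder from the tree is present; it does not show whether the contents are complete.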

Usage

git clone https://github.com/hzi-bifo/AMR_benchmarking.git
cd AMR_benchmarking
bash main.sh # Usage details are explained in main.sh. The whole AMR benchmarking cannot be completed by running this command once.
bash ./scripts/model/clean.sh # Optional. Clean intermediate files
  • See main.sh for the benchmarking workflow.
  • Use clean.sh to remove large, less important intermediate files. You can run it any time after the specified software has finished running on a benchmarked species. Do not run it while the corresponding software is running on a new benchmarked species.
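
Before deciding whether to run clean.sh, it can be useful to see how much space the temporary output of each software currently takes. The sketch below is not part of the repository; it relies only on the directory pattern <log_path>/log/software/<software_name>/software_output given for the clean_software option, and report_temp_usage is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the size of each software's temporary output directory.
# Layout assumed from the clean_software option:
#   <log_path>/log/software/<software_name>/software_output

report_temp_usage() {  # usage: report_temp_usage <log_path>
    for dir in "$1"/log/software/*/software_output; do
        # Skip the unexpanded glob when no such directory exists yet.
        [ -d "$dir" ] || continue
        du -sh "$dir"
    done
}

# Example call with the default log_path "./".
report_temp_usage .
```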

References

[1] D Aytan-Aktug, Philip Thomas Lanken Conradsen Clausen, Valeria Bortolaia, Frank Møller Aarestrup, and Ole Lund. Prediction of acquired antimicrobial resistance for multiple bacterial species using neural networks. mSystems, 5(1), 2020.

[2] Ariane Khaledi, Aaron Weimann, Monika Schniederjans, Ehsaneddin Asgari, Tzu-Hao Kuo, Antonio Oliver, Gabriel Cabot, Axel Kola, Petra Gastmeier, Michael Hogardt, et al. Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning-enabled molecular diagnostics. EMBO Molecular Medicine, 12(3):e10264, 2020.

[3] Erki Aun, Age Brauer, Veljo Kisand, Tanel Tenson, and Maido Remm. A k-mer-based method for the identification of phenotype-associated genomic biomarkers and predicting phenotypes of sequenced bacteria. PLoS Computational Biology, 14(10):e1006434, 2018.

[4] Alexandre Drouin, Gaël Letarte, Frédéric Raymond, Mario Marchand, Jacques Corbeil, and François Laviolette. Interpretable genotype-to-phenotype classifiers with performance guarantees. Scientific Reports, 9(1):1–13, 2019.

[5] Valeria Bortolaia, Rolf S Kaas, Etienne Ruppe, Marilyn C Roberts, Stefan Schwarz, Vincent Cattoir, Alain Philippon, Rosa L Allesoe, Ana Rita Rebelo, Alfred Ferrer Florensa, et al. ResFinder 4.0 for predictions of phenotypes from genotypes. Journal of Antimicrobial Chemotherapy, 75(12):3491–3500, 2020.

License

MIT License

Citation

Contact