Dependencies

Software

    MAFFT
    GMAP

Python Modules

    outlier-utils

Installation

    cd /path/to/install
    git clone https://github.com/Chaiyuangungun/PanMarker.git
    conda env create -f PanMarker.yml
    chmod +x PanMarker/get_promoter_and_cds.py
    echo 'export PATH=/path/to/install/PanMarker:$PATH' >> ~/.bash_profile
    source ~/.bash_profile
    conda activate PanMarker
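After activating the environment, a quick check that the dependencies are visible can save a failed run later. The sketch below is not part of PanMarker; it assumes the executables are named `mafft`, `gmap` and `gmap_build`, and that outlier-utils is importable as `outliers`.

```python
import importlib.util
import shutil

# Assumed names: executables mafft/gmap/gmap_build and the module "outliers"
# (the import name of outlier-utils). Adjust if your install differs.
DEFAULT_EXES = ("mafft", "gmap", "gmap_build")
DEFAULT_MODULES = ("outliers",)


def check_dependencies(executables=DEFAULT_EXES, modules=DEFAULT_MODULES):
    """Return {name: bool} telling whether each executable/module is available."""
    status = {}
    for exe in executables:
        # shutil.which returns None when the program is not on PATH
        status[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module cannot be found
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:<12}{'found' if ok else 'MISSING'}")
```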

Usage

1. Map the reference CDS to every genome in the pan-genome

    # For each sample, run the commands below:
    gmap_build -D . -d DB genomeN.fasta
    gmap -D . -d DB -f 2 -n 1 -t 20 ref.cds > genomeN.gff3
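The two commands above have to be repeated for every genome. A minimal Python sketch of that loop follows; it assumes genomes are named `<sample>.fasta` and that `ref.cds` sits in the working directory. Giving each sample its own database name (`DB_<sample>`, a hypothetical convention) keeps one `gmap_build` run from overwriting the previous sample's index. By default it only prints the commands.

```python
import subprocess
from pathlib import Path


def gmap_commands(genome_fasta, ref_cds="ref.cds", threads=20):
    """Return (build_cmd, map_cmd, gff3_path) for one genome FASTA."""
    sample = Path(genome_fasta).stem      # genome1.fasta -> genome1
    db = f"DB_{sample}"                   # hypothetical per-sample DB name
    build = ["gmap_build", "-D", ".", "-d", db, str(genome_fasta)]
    # -f 2: GFF3 gene format; -n 1: at most one path per query; -t: threads
    mapping = ["gmap", "-D", ".", "-d", db, "-f", "2", "-n", "1",
               "-t", str(threads), ref_cds]
    return build, mapping, f"{sample}.gff3"


def run_all(genomes, dry_run=True):
    for fasta in genomes:
        build, mapping, gff3 = gmap_commands(fasta)
        if dry_run:
            print(" ".join(build))
            print(" ".join(mapping), ">", gff3)
            continue
        subprocess.run(build, check=True)
        # gmap writes alignments to stdout, so redirect them into the GFF3 file
        with open(gff3, "w") as out:
            subprocess.run(mapping, check=True, stdout=out)


if __name__ == "__main__":
    run_all(sorted(str(p) for p in Path(".").glob("*.fasta")))
```

Call `run_all(genomes, dry_run=False)` once the printed commands look right.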

2. Extract CDS and promoter sequences from the GFF3 file generated by GMAP

    # For each sample, run the command below:
    ./get_promoter_and_cds.py genomeN.gff3 genomeN.fasta 2000 genomeN
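The third argument, `2000`, appears to be the promoter length in bp. As an illustration of strand-aware promoter extraction under that assumption, the sketch below takes GFF3-style 1-based, inclusive CDS coordinates and returns the upstream sequence in the gene's own orientation. The `promoter` function is hypothetical and is not the code in `get_promoter_and_cds.py`.

```python
# Hypothetical helper; assumes `length` = bp upstream of the CDS (e.g. 2000).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCAntgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def promoter(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Return up to `length` bp immediately upstream of a CDS.

    cds_start/cds_end are 1-based, inclusive GFF3 coordinates (start <= end).
    The result is clipped at the chromosome ends.
    """
    if strand == "+":
        # upstream lies left of the start; convert to a 0-based half-open slice
        left = max(0, cds_start - 1 - length)
        return chrom_seq[left:cds_start - 1]
    # minus strand: upstream lies right of the end, reported reverse-complemented
    right = min(len(chrom_seq), cds_end + length)
    return reverse_complement(chrom_seq[cds_end:right])
```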

3. Extract variants and identify variant sites associated with traits

    # Prepare the list of cds files and the list of promoter files.
    ls *.prm >prm.file 
    ls *.cds >cds.file
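Variant sites are positions where the aligned sequences of different samples disagree. Assuming the per-gene CDS or promoter sequences have been aligned (MAFFT is a listed dependency) and loaded into a `{sample: aligned_sequence}` dict, a minimal sketch of finding those columns is below. Ignoring gaps and `N` is one reasonable convention, not necessarily PanMarker's.

```python
# Assumes `aln` maps sample name -> aligned sequence (all the same length).
def variable_sites(aln, ignore=("-", "N")):
    """Return [(1-based column, {sample: base})] for columns with >1 distinct base."""
    lengths = {len(s) for s in aln.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned to a common length")
    sites = []
    for i in range(lengths.pop()):
        column = {name: seq[i].upper() for name, seq in aln.items()}
        # gap and N characters do not count as alleles
        alleles = {b for b in column.values() if b not in ignore}
        if len(alleles) > 1:
            sites.append((i + 1, column))
    return sites
```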

To identify CDS sequence variants and correlate them with phenotype and gene expression, run the command below:

    python3 PanMarker.py -i cds.file -p trait.file -e FPKM.file -s cds -o prefix -g T/F [-t num -a pearson_cor]
    
    Parameters:

            -i INPUTFILE, --inputfile  input file (list of cds files)
            -p PHE, --phe              phenotype file
            -e EXP, --exp              expression profile
            -s TYPE, --type            file type, cds or prm
            -o OUTPUT, --output        output file prefix
            -g GRU, --gru              T or F (True or False): whether to perform phenotype outlier filtering
            -t THREAT, --threat        number of threads (default=10)
            -a PERVALUE, --pervalue    Pearson correlation coefficient cutoff (default=0.3)
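The `-a` cutoff is a Pearson correlation coefficient. The sketch below shows one plausible reading: a site is kept when |r| between a per-sample genotype code (0/1 here, a hypothetical encoding) and the trait or expression values reaches the cutoff. PanMarker's actual encoding and comparison may differ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # r is undefined for a constant vector; treated as 0 here
    return sxy / sqrt(sxx * syy)


def passes_cutoff(genotypes, values, cutoff=0.3):
    # Hypothetical rule: keep the site if |r| >= cutoff (cf. -a, default 0.3)
    return abs(pearson(genotypes, values)) >= cutoff
```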
    Tips:

    trait.file:

            sample1 value1
            sample2 value2
            ...     ...
            sampleN valueN

    FPKM.file:

                    gene1 gene2 gene3 ... geneN
            sample1 xx    xx    xx    ... xx
            sample2 xx    xx    xx    ... xx
            ...
            sampleN xx    xx    xx    ... xx
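Readers for the two input layouts above might look like the following sketch. It assumes whitespace-separated fields and numeric values, and accepts an FPKM header row either with or without a leading label for the sample column; none of this is taken from PanMarker's source.

```python
# Assumes whitespace-separated fields; illustrative readers, not PanMarker's.
def read_trait(lines):
    """Parse 'sample value' lines into {sample: float}, skipping blank lines."""
    traits = {}
    for line in lines:
        fields = line.split()
        if fields:
            traits[fields[0]] = float(fields[1])
    return traits


def read_fpkm(lines):
    """Parse an FPKM matrix (genes as columns, samples as rows) into {sample: {gene: float}}."""
    rows = [line.split() for line in lines if line.strip()]
    if len(rows) < 2:
        return {}
    header, body = rows[0], rows[1:]
    n_values = len(body[0]) - 1
    # drop a leading corner label (e.g. "sample") if the header has one
    genes = header[1:] if len(header) == n_values + 1 else header
    table = {}
    for fields in body:
        if len(fields) != len(genes) + 1:
            raise ValueError(f"row {fields[0]!r} has {len(fields) - 1} values, expected {len(genes)}")
        table[fields[0]] = {g: float(v) for g, v in zip(genes, fields[1:])}
    return table
```

Both functions accept any iterable of lines, so an open file handle can be passed directly.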

To identify promoter sequence variants and correlate them with gene expression, run the command below:

    python3 PanMarker.py -i prm.file -p FPKM.file -s prm -o prefix -g T/F [-t num]
    
    Parameters:

            -i INPUTFILE, --inputfile  input file (list of prm files)
            -p PHE, --phe              expression profile
            -s TYPE, --type            file type, cds or prm
            -o OUTPUT, --output        output file prefix
            -g GRU, --gru              T or F (True or False): whether to perform phenotype outlier filtering
            -t THREAT, --threat        number of threads (int, default=10)
            -a PERVALUE, --pervalue    Pearson correlation coefficient cutoff (float, default=0.3)
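`-g T` turns on phenotype outlier filtering. PanMarker lists outlier-utils as a dependency, and that package implements Grubbs' test; to stay dependency-free, the sketch below substitutes the simpler Tukey IQR fence (drop values more than 1.5 x IQR outside the quartiles). It illustrates the effect of filtering, not PanMarker's actual test.

```python
from statistics import quantiles


def drop_outliers_iqr(values, k=1.5):
    """Return {sample: value} without values outside [Q1 - k*IQR, Q3 + k*IQR].

    Stand-in technique (Tukey fence), NOT the Grubbs' test from outlier-utils.
    """
    q1, _, q3 = quantiles(values.values(), n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return {s: v for s, v in values.items() if lo <= v <= hi}
```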

You can also identify variants in transcription factor binding sites (TFBS) with PanTFBS (https://github.com/Chaiyuangungun/PanTFBS).

If you don't have expression data, run one of the commands below:

    python3 PanMarker_noexpress.py -i cds.file -p trait.file -s cds -o prefix -g T/F [-t num]

    or

    python3 PanMarker_noexpress.py -i prm.file -p trait.file -s prm -o prefix -g T/F [-t num]

    Parameters:

            -i INPUTFILE, --inputfile  input file (list of cds or prm files)
            -p PHE, --phe              phenotype file
            -s TYPE, --type            file type, cds or prm
            -o OUTPUT, --output        output file prefix
            -g GRU, --gru              T or F (True or False): whether to perform phenotype outlier filtering
            -t THREAT, --threat        number of threads (default=10)
            -a PERVALUE, --pervalue    Pearson correlation coefficient cutoff (default=0.3)
    Tips:

    trait.file:

            sample1 value1
            sample2 value2
            ...     ...
            sampleN valueN

4. Results (output files)

    prefix.result: all variant sites associated with the phenotype

    prefix.out: the top 5% of variant sites associated with the phenotype
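`prefix.out` keeps the top 5% of sites. Which score the ranking uses is not stated here, so the sketch assumes a single score per site where larger means a stronger association (for example |r|) and keeps ceil(5% of N) sites, at least one.

```python
# Hypothetical ranking: `score` maps a site to a number, larger = stronger.
def top_percent(sites, score, percent=5):
    """Return the highest-scoring ceil(percent% of len(sites)) sites, best first."""
    if not sites:
        return []
    k = max(1, -(-len(sites) * percent // 100))  # integer ceiling division
    return sorted(sites, key=score, reverse=True)[:k]
```

Integer ceiling division avoids floating-point surprises when computing the 5% cut.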