Dependencies

Software

    MAFFT
    GMAP

Python Modules

    outlier-utils

Installation

    cd /path/to/install
    git clone https://github.com/Chaiyuangungun/PanMarker.git
    conda env create -f PanMarker/PanMarker.yml
    chmod +x PanMarker/get_promoter_and_cds.py
    echo 'export PATH=/path/to/install/PanMarker:$PATH' >> ~/.bash_profile
    source ~/.bash_profile
    conda activate PanMarker

Usage

Step 1: Map the reference CDS to every genome in the pan-genome

    # For each sample, run the commands below:
    gmap_build -D . -d DB genomeN.fasta
    gmap -D . -d DB -f 2 -n 1 -t 20 ref.cds > genomeN.gff3
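
The per-sample commands above can be generated for a whole pan-genome instead of being typed by hand. The following is a minimal Python sketch and is not part of PanMarker. The function names and the per-sample database name (`<sample>_DB`, chosen here so that databases built in the same directory do not collide) are assumptions; the `gmap_build` and `gmap` options are copied from the commands above.

```python
import shlex
from pathlib import Path


def gmap_commands(genome_fasta, ref_cds="ref.cds", threads=20):
    """Return (sample, gmap_build argv, gmap argv) for one genome FASTA."""
    sample = Path(genome_fasta).stem      # "genome1.fasta" -> "genome1"
    db = f"{sample}_DB"                   # assumed naming: one database per sample
    build = ["gmap_build", "-D", ".", "-d", db, genome_fasta]
    align = ["gmap", "-D", ".", "-d", db, "-f", "2", "-n", "1",
             "-t", str(threads), ref_cds]
    return sample, build, align


def mapping_script_lines(genomes, ref_cds="ref.cds", threads=20):
    """Return shell command lines; gmap output is redirected to <sample>.gff3."""
    lines = []
    for fasta in genomes:
        sample, build, align = gmap_commands(fasta, ref_cds, threads)
        lines.append(shlex.join(build))
        lines.append(f"{shlex.join(align)} > {shlex.quote(sample + '.gff3')}")
    return lines
```

Writing the returned lines to a file, for example with `print("\n".join(mapping_script_lines(["genome1.fasta", "genome2.fasta"])))`, gives a shell script that can be reviewed before it is run.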

Step 2: Extract CDS and promoter sequences from the GFF3 file generated by GMAP

    # For each sample, run the command below:
    ./get_promoter_and_cds.py genomeN.gff3 genomeN.fasta 2000 genomeN
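
When there are many samples, the extraction command above can be assembled for every GFF3 file in a directory. This is a hedged Python sketch rather than part of PanMarker: the function name is hypothetical, it assumes each `<sample>.gff3` sits next to a `<sample>.fasta`, and it passes the same four positional arguments as the command above (with `2000` kept as the third argument).

```python
from pathlib import Path


def extraction_commands(directory=".", third_arg="2000",
                        script="get_promoter_and_cds.py"):
    """Pair every <sample>.gff3 with <sample>.fasta and build one argv per sample."""
    commands = []
    for gff in sorted(Path(directory).glob("*.gff3")):
        sample = gff.stem                      # used as the last argument
        fasta = gff.with_suffix(".fasta")      # assumed file naming
        if not fasta.exists():
            raise FileNotFoundError(f"no genome FASTA found for {gff.name}")
        commands.append([script, str(gff), str(fasta), third_arg, sample])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once `get_promoter_and_cds.py` is on `PATH`, as set up in the Installation section.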

Step 3: Extract variants and identify variant sites associated with traits

    # Prepare the list of cds files and the list of promoter files.
    ls *.prm >prm.file 
    ls *.cds >cds.file

To identify CDS sequence variants and correlate them with phenotype and gene expression, run the command below:

    python3 PanMarker.py -i cds.file -p trait.file -e FPKM.file -s cds -o prefix -g T|F [-t num] [-a pearson_cor]
    
    Parameters:

            -i INPUTFILE, --inputfile  input file (list of cds files)
            -p PHE, --phe              phenotype file
            -e EXP, --exp              expression profile
            -s TYPE, --type            file type, cds or prm
            -o OUTPUT, --output        output file prefix
            -g GRU, --gru              T or F (True or False), whether to perform phenotype outlier filtering
            -t THREAT, --threat        number of threads (default=10)
            -a PERVALUE, --pervalue    Pearson correlation coefficient (default=0.3)

    Tips:

    trait.file:

            sample1 value1
            sample2 value2
            ...     ...
            sampleN valueN

    FPKM.file:

                    gene1 gene2 gene3 ... geneN
            sample1 xx    xx    xx    ... xx
            sample2 xx    xx    xx    ... xx
            ...
            sampleN xx    xx    xx    ... xx
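
Because the phenotype and expression files are matched by sample name, it can help to check both files before a long run. The sketch below is an illustration only and is not part of PanMarker. It assumes whitespace-separated columns, a trait.file with exactly two columns, and an FPKM.file whose first row lists only gene names; the function names are hypothetical.

```python
def read_trait(path):
    """Parse 'sample value' lines into a dict mapping sample -> float."""
    traits = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue                       # skip blank lines
            if len(fields) != 2:
                raise ValueError(f"expected 2 columns, got {len(fields)}: {line!r}")
            traits[fields[0]] = float(fields[1])
    return traits


def read_fpkm(path):
    """Parse a header row of gene names followed by 'sample v1 v2 ...' rows."""
    with open(path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    if not rows:
        raise ValueError("expression file is empty")
    genes, table = rows[0], {}
    for fields in rows[1:]:
        if len(fields) != len(genes) + 1:
            raise ValueError(
                f"row {fields[0]} has {len(fields) - 1} values, expected {len(genes)}")
        table[fields[0]] = [float(value) for value in fields[1:]]
    return genes, table


def unmatched_samples(traits, table):
    """Return the sorted sample names that appear in only one of the two files."""
    return sorted(set(traits) ^ set(table))
```

An empty list from `unmatched_samples(read_trait("trait.file"), read_fpkm("FPKM.file")[1])` indicates that both files describe the same set of samples.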

To identify promoter sequence variants and correlate them with gene expression, run the command below:

    python3 PanMarker.py -i prm.file -p FPKM.file -s prm -o prefix -g T|F [-t num]
    
    Parameters:

            -i INPUTFILE, --inputfile  input file (list of prm files)
            -p PHE, --phe              expression profile
            -s TYPE, --type            file type, cds or prm
            -o OUTPUT, --output        output file prefix
            -g GRU, --gru              T or F (True or False), whether to perform phenotype outlier filtering
            -t THREAT, --threat        number of threads (int, default=10)
            -a PERVALUE, --pervalue    Pearson correlation coefficient (float, default=0.3)

You can also identify TF binding site variants with PanTFBS (https://github.com/Chaiyuangungun/PanTFBS).
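
The `-a` option listed in the parameters takes a Pearson correlation coefficient (default 0.3). As a rough illustration of what such a value measures, the sketch below computes Pearson's r for two numeric series and compares it with a cutoff. It is not PanMarker's implementation: the function names are hypothetical, and comparing the absolute value of r against the cutoff is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0   # constant input: r is undefined, treated as 0 in this sketch
    return cov / math.sqrt(var_x * var_y)


def passes_cutoff(x, y, cutoff=0.3):
    """True when |r| reaches the cutoff (use of the absolute value is an assumption)."""
    return abs(pearson(x, y)) >= cutoff
```

For example, `passes_cutoff(expression_values, trait_values)` would compare one gene's expression across samples with the trait values of the same samples, in the same sample order.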

If you don't have expression data, run one of the commands below:

    python3 PanMarker_noexpress.py -i cds.file -p trait.file -s cds -o prefix -g T|F [-t num]

    # or

    python3 PanMarker_noexpress.py -i prm.file -p trait.file -s prm -o prefix -g T|F [-t num]

    Parameters:

            -i INPUTFILE, --inputfile  input file (list of cds or prm files)
            -p PHE, --phe              phenotype file
            -s TYPE, --type            file type, cds or prm
            -o OUTPUT, --output        output file prefix
            -g GRU, --gru              T or F (True or False), whether to perform phenotype outlier filtering
            -t THREAT, --threat        number of threads (default=10)
            -a PERVALUE, --pervalue    Pearson correlation coefficient (default=0.3)

    Tips:

    trait.file:

            sample1 value1
            sample2 value2
            ...     ...
            sampleN valueN

Step 4: Results (output files)

    prefix.result (all variant sites associated with the phenotype)

    prefix.out (top 5% of variant sites associated with the phenotype)
