Software:
  MAFFT
  GMAP
Python modules:
  outlier-utils
cd /path/to/install
git clone https://github.com/Chaiyuangungun/PanMarker.git
conda env create -f PanMarker/PanMarker.yml
chmod +x PanMarker/get_promoter_and_cds.py
echo 'export PATH=/path/to/install/PanMarker:$PATH' >> ~/.bash_profile
source ~/.bash_profile
conda activate PanMarker
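After activating the environment, it can help to confirm that the external tools and the outlier-utils module are reachable. This is a minimal Python sketch; the helper names are hypothetical, and it assumes outlier-utils is imported under the package name "outliers" and that GMAP provides the gmap and gmap_build executables used in step 1.

```python
import importlib.util
import shutil

# Executables called by the workflow (GMAP ships both gmap and gmap_build).
REQUIRED_TOOLS = ["mafft", "gmap", "gmap_build"]
# Assumption: outlier-utils is importable as the package "outliers".
REQUIRED_MODULES = ["outliers"]


def find_missing_tools(tools):
    """Return the executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def find_missing_modules(modules):
    """Return the top-level Python modules that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS) + find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```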
1. Map the reference CDS sequences to every genome in the pan-genome
# For each sample (genomeN.fasta), run the commands below:
gmap_build -D . -d DB genomeN.fasta
gmap -D . -d DB -f 2 -n 1 -t 20 ref.cds > genomeN.gff3
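Running these two commands for many genomes is easy to script. The sketch below builds one GMAP index per genome and writes genomeN.gff3 for each; the helper names and the <genome>_db index naming are assumptions (a separate index per genome keeps genomes from overwriting one shared DB), not part of PanMarker.

```python
import subprocess
from pathlib import Path


def gmap_commands(genome_fasta, ref_cds="ref.cds", threads=20):
    """Return (gmap_build argv, gmap argv, output GFF3 name) for one genome.

    The index name is derived from the FASTA name, e.g. genome1.fasta ->
    genome1_db (hypothetical naming), so each genome gets its own index.
    """
    stem = Path(genome_fasta).stem
    db = f"{stem}_db"
    build = ["gmap_build", "-D", ".", "-d", db, str(genome_fasta)]
    # Same options as above: -f 2 (GFF3 gene output), -n 1 (one path), -t threads
    align = ["gmap", "-D", ".", "-d", db, "-f", "2", "-n", "1",
             "-t", str(threads), ref_cds]
    return build, align, f"{stem}.gff3"


def run_all(genome_fastas, ref_cds="ref.cds", threads=20):
    """Index each genome, map ref_cds to it, and save the GFF3 output."""
    for fasta in genome_fastas:
        build, align, gff3 = gmap_commands(fasta, ref_cds, threads)
        subprocess.run(build, check=True)
        with open(gff3, "w") as out:
            subprocess.run(align, check=True, stdout=out)
```

For example, run_all(sorted(Path(".").glob("*.fasta"))) processes every .fasta file in the current directory, one genome at a time.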
2. Extract CDS and promoter sequences from the GFF3 files generated by GMAP
# For each sample, run the command below:
get_promoter_and_cds.py genomeN.gff3 genomeN.fasta 2000 genomeN
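The third argument (2000) appears to be the promoter length in bp. As an illustration only, this is one way to take a fixed-length upstream region from a chromosome sequence given a gene's 1-based GFF3 coordinates and strand; it is a hypothetical sketch, not the logic of get_promoter_and_cds.py.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_region(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Return the `length` bp immediately upstream of a gene, read 5'->3'.

    cds_start/cds_end are 1-based inclusive (GFF3 style). The region is
    clipped at the chromosome ends, so genes near an end get a shorter one.
    """
    if strand == "+":
        start = max(1, cds_start - length)  # 1-based first promoter base
        return chrom_seq[start - 1:cds_start - 1]
    if strand == "-":
        end = min(len(chrom_seq), cds_end + length)  # 1-based last promoter base
        # Upstream of a minus-strand gene lies to its right; flip to gene orientation.
        return reverse_complement(chrom_seq[cds_end:end])
    raise ValueError(f"unknown strand: {strand!r}")
```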
3. Extract variants and identify variant sites associated with traits
# Prepare the list of CDS files and the list of promoter files.
ls *.prm >prm.file
ls *.cds >cds.file
To identify CDS sequence variants and correlate them with phenotype and gene expression, run:
python3 PanMarker.py -i cds.file -p trait.file -e FPKM.file -s cds -o prefix -g T/F [-t num] [-a pearson_cor]
Parameters:
-i INPUTFILE, --inputfile   input file (list of CDS files)
-p PHE, --phe               phenotype file
-e EXP, --exp               expression profile
-s TYPE, --type             file type: cds or prm
-o OUTPUT, --output         output file prefix
-g GRU, --gru               T or F (True or False): whether to perform phenotype outlier filtering
-t THREAT, --threat         number of threads (default=10)
-a PERVALUE, --pervalue     Pearson correlation coefficient threshold (default=0.3)
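The -a threshold is a Pearson correlation coefficient. The sketch below computes Pearson's r and applies a 0.3 cutoff to one variant site. It assumes each sample's genotype is coded as a number (for example 0 for the reference allele, 1 for the alternative) and that the absolute value of r is compared; PanMarker.py may encode genotypes or apply the cutoff differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # r is undefined for a constant vector; treat it as no correlation
    return cov / math.sqrt(var_x * var_y)


def passes_threshold(genotypes, values, cutoff=0.3):
    """Assumed rule: keep a site if |r| between genotype codes and values >= cutoff."""
    return abs(pearson(genotypes, values)) >= cutoff
```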
Tips:
trait.file (one sample per line: sample name, then its trait value):
sample1 value1
sample2 value2
... ...
sampleN valueN
FPKM.file (a header row of gene IDs, then one row of FPKM values per sample):
gene1 gene2 gene3 ... geneN
sample1 xx xx xx ... xx
sample2 xx xx xx ... xx
...
sampleN xx xx xx ... xx
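Both input tables are plain whitespace-delimited text. A minimal reader for the two layouts shown above could look like this; the function names are hypothetical, and it assumes any run of spaces or tabs separates columns and that the FPKM header holds only gene IDs.

```python
def parse_trait(lines):
    """Parse trait.file lines ('sample value') into {sample: float}."""
    traits = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        traits[fields[0]] = float(fields[1])
    return traits


def parse_fpkm(lines):
    """Parse FPKM.file lines into {sample: {gene: fpkm}}."""
    lines = iter(lines)
    genes = next(lines).split()  # header: gene1 gene2 ... geneN
    table = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        sample, values = fields[0], [float(v) for v in fields[1:]]
        if len(values) != len(genes):
            raise ValueError(f"{sample}: expected {len(genes)} values, got {len(values)}")
        table[sample] = dict(zip(genes, values))
    return table


def read_table(path, parser):
    """Open a file and hand its lines to parse_trait or parse_fpkm."""
    with open(path) as handle:
        return parser(handle)
```

For example, read_table("trait.file", parse_trait) returns the trait values keyed by sample name.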
To identify promoter sequence variants and correlate them with gene expression, run:
python3 PanMarker.py -i prm.file -p FPKM.file -s prm -o prefix -g T/F [-t num] [-a pearson_cor]
Parameters:
-i INPUTFILE, --inputfile   input file (list of promoter files)
-p PHE, --phe               expression profile
-s TYPE, --type             file type: cds or prm
-o OUTPUT, --output         output file prefix
-g GRU, --gru               T or F (True or False): whether to perform phenotype outlier filtering
-t THREAT, --threat         number of threads (int, default=10)
-a PERVALUE, --pervalue     Pearson correlation coefficient threshold (float, default=0.3)
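The README does not state which test -g T applies. outlier-utils, listed as a dependency, implements Grubbs' test; to stay within the standard library, this sketch swaps in Tukey's 1.5*IQR fences instead, so treat it as an illustration of outlier filtering in general rather than PanMarker's method. The function name is hypothetical.

```python
import statistics


def drop_outliers(values_by_sample, k=1.5):
    """Remove samples whose value lies outside Tukey's fences.

    Fences are [Q1 - k*IQR, Q3 + k*IQR]; quartiles come from
    statistics.quantiles with its default 'exclusive' method.
    """
    values = list(values_by_sample.values())
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {s: v for s, v in values_by_sample.items() if low <= v <= high}
```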
You can also identify variants in transcription factor binding sites (TFBSs) with PanTFBS (https://github.com/Chaiyuangungun/PanTFBS).
If you do not have expression data, run one of the commands below:
python3 PanMarker_noexpress.py -i cds.file -p trait.file -s cds -o prefix -g T/F [-t num] [-a pearson_cor]
or
python3 PanMarker_noexpress.py -i prm.file -p trait.file -s prm -o prefix -g T/F [-t num] [-a pearson_cor]
Parameters:
-i INPUTFILE, --inputfile   input file (list of CDS or promoter files)
-p PHE, --phe               phenotype file
-s TYPE, --type             file type: cds or prm
-o OUTPUT, --output         output file prefix
-g GRU, --gru               T or F (True or False): whether to perform phenotype outlier filtering
-t THREAT, --threat         number of threads (default=10)
-a PERVALUE, --pervalue     Pearson correlation coefficient threshold (default=0.3)
Tips:
trait.file (one sample per line: sample name, then its trait value):
sample1 value1
sample2 value2
... ...
sampleN valueN
4. Results (output files)
prefix.result: all variant sites associated with the phenotype
prefix.out: the top 5% of variant sites associated with the phenotype
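prefix.out keeps the top 5% of sites. The ranking score is not documented here, so the sketch below takes it as a parameter (for example the absolute correlation); the helper name and the rounding rule (round up, keep at least one site) are assumptions.

```python
import math


def top_fraction(sites, score, fraction=0.05):
    """Return the highest-scoring `fraction` of sites (at least one site).

    `sites` is a sequence of records and `score` maps a record to a number.
    Ties keep their original order because Python's sort is stable,
    including with reverse=True.
    """
    if not sites:
        return []
    keep = max(1, math.ceil(len(sites) * fraction))
    return sorted(sites, key=score, reverse=True)[:keep]
```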