This repository is an attempt to debug the expression prediction protocol used by Peter Ulz and their team.

DoryAbelman/ExpressionPrediction

 
 


Updated Expression Prediction

Analyze gene expression from cfDNA based on the coverage distribution around transcription start sites (TSS).
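To make "coverage distribution around TSS" concrete, here is a minimal sketch that averages per-base coverage in a window around a set of TSS positions. It is an illustration only, not the code in this repository: the function name `tss_profile`, the in-memory coverage dictionary, the 0-based positions and the `flank` parameter are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple


def tss_profile(
    coverage: Dict[str, Sequence[float]],
    tss_sites: List[Tuple[str, int, str]],
    flank: int = 1000,
) -> List[float]:
    """Average per-base coverage in a window of +/- flank around each TSS.

    coverage  : hypothetical mapping of chromosome -> per-base coverage values
    tss_sites : (chromosome, 0-based TSS position, strand) tuples
    flank     : bases to include on each side of the TSS (assumed value)
    """
    width = 2 * flank + 1
    sums = [0.0] * width
    used = 0
    for chrom, pos, strand in tss_sites:
        cov = coverage.get(chrom)
        # Skip sites on unknown chromosomes or with windows running off the end.
        if cov is None or pos - flank < 0 or pos + flank >= len(cov):
            continue
        window = list(cov[pos - flank : pos + flank + 1])
        # Orient minus-strand genes so index 0 is always upstream of the TSS.
        if strand == "-":
            window.reverse()
        for i, value in enumerate(window):
            sums[i] += value
        used += 1
    if used == 0:
        return [0.0] * width
    return [total / used for total in sums]
```

Calling this once for housekeeping genes and once for unexpressed genes would give two profiles that can be compared, which is the comparison step 5 describes.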

Basic Usage: expression_prediction.py [-h] -fq FASTQ_FILE -s NAME -g {m,f} [-o OUTDIR] [-k] [-t THREADS] -cna CNA_FILE [-tmp TEMP_DIR] [-step START_STEP]
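The usage line can be reproduced with `argparse`, as in the sketch below. The flags, metavars and the `{m,f}` choices come from the usage string; the `dest` names, the help texts, the defaults, and the reading of `-g` as sex and `-k` as "keep temporary files" are assumptions, not taken from the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented usage line (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        prog="expression_prediction.py",
        description="Analyze gene expression based on coverage around TSS from cfDNA",
    )
    parser.add_argument("-fq", dest="fastq_file", metavar="FASTQ_FILE", required=True,
                        help="input FASTQ file")
    parser.add_argument("-s", dest="name", metavar="NAME", required=True,
                        help="sample name")
    parser.add_argument("-g", dest="gender", choices=["m", "f"], required=True,
                        help="assumed to be the sex of the sample")
    parser.add_argument("-o", dest="outdir", metavar="OUTDIR", default=".",
                        help="output directory (default is an assumption)")
    parser.add_argument("-k", dest="keep", action="store_true",
                        help="assumed to mean: keep temporary files")
    parser.add_argument("-t", dest="threads", metavar="THREADS", type=int, default=1,
                        help="number of threads (default is an assumption)")
    parser.add_argument("-cna", dest="cna_file", metavar="CNA_FILE", required=True,
                        help="copy-number file, e.g. a *.segments file")
    parser.add_argument("-tmp", dest="temp_dir", metavar="TEMP_DIR", default=None,
                        help="temporary directory")
    parser.add_argument("-step", dest="start_step", metavar="START_STEP", type=int,
                        default=1, help="step to start from (default is an assumption)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

For example, `build_parser().parse_args(["-fq", "reads.fastq", "-s", "sample1", "-g", "f", "-cna", "sample1.segments", "-step", "5"])` yields a namespace whose `start_step` is 5.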

Note: This runs every step in the list below. Coverage must be normalized against copy number alterations (CNAs). This is done by specifying a list of copy-number states in the format . For plasma-seq analyses this is the *.segments file.
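One plausible way to normalize coverage against CNAs is sketched below. The segment layout used here, `(chromosome, start, end, log2_ratio)` with half-open 0-based coordinates, is purely an assumption for illustration and may not match the *.segments file or what the script does. The idea is to divide the observed coverage by the relative copy number of the segment that contains the position.

```python
from typing import List, Tuple

# Hypothetical segment layout: (chromosome, start, end, log2 copy-number ratio).
Segment = Tuple[str, int, int, float]


def normalize_coverage(
    coverage: float, chrom: str, position: int, segments: List[Segment]
) -> float:
    """Divide coverage by the relative copy number of the enclosing segment.

    A log2 ratio of 0 means a balanced region, so the coverage is unchanged.
    Positions not covered by any segment are returned unchanged as well.
    """
    for seg_chrom, start, end, log2_ratio in segments:
        if seg_chrom == chrom and start <= position < end:
            return coverage / (2.0 ** log2_ratio)
    return coverage
```

With this convention, a gained region (log2 ratio 1.0) has its coverage halved and a lost region (log2 ratio -1.0) has it doubled, so both are brought onto the scale of a balanced region.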

Attention: The large hg19 reference index files for BWA are not included in this commit.

Also needed for analysis:
. java
. R (Package e1071)

-) Step1 create directory and create MD5 file of input
-) Step2 Trim fastq
-) Step3 alignment and conversion to BAM
-) Step4 remove PCR duplicates
-) Step5 analyze TSS profile in housekeeping vs. unexpressed genes and plot in R
-) Step6 extract coverage parameters for expression prediction
-) Step7 expression prediction
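The interaction between this step list and the `-step START_STEP` option can be sketched as a small dispatcher that skips every step before the requested one. This is a hypothetical illustration: `run_pipeline`, `STEP_NAMES` and the placeholder callables are not taken from the repository's script.

```python
from typing import Callable, List, Tuple

# Step descriptions copied from the list above; the callables are placeholders.
STEP_NAMES = [
    "create directory and MD5 file of input",
    "trim fastq",
    "alignment and conversion to BAM",
    "remove PCR duplicates",
    "analyze TSS profile and plot in R",
    "extract coverage parameters",
    "expression prediction",
]


def run_pipeline(
    steps: List[Tuple[str, Callable[[], None]]], start_step: int = 1
) -> List[str]:
    """Run steps in order, beginning at the 1-based start_step.

    Returns the names of the steps that were actually executed.
    """
    if not 1 <= start_step <= len(steps):
        raise ValueError(f"start_step must be between 1 and {len(steps)}")
    executed = []
    for number, (name, func) in enumerate(steps, start=1):
        if number < start_step:
            continue  # earlier steps are assumed to be done already
        func()
        executed.append(name)
    return executed


if __name__ == "__main__":
    placeholder_steps = [(name, lambda: None) for name in STEP_NAMES]
    for name in run_pipeline(placeholder_steps, start_step=5):
        print("ran:", name)
```

Starting at step 5 runs only the TSS profile analysis, parameter extraction and prediction steps.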

New added notes:

As rmdup is now obsolete and we have paired-end reads, our lab uses Picard's MarkDuplicates to remove PCR duplicates. We therefore start from step 5, after the FASTQ files have been converted to BAM files and duplicates have been removed in the way the lab prefers. These new scripts will be uploaded to the repository.
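As a rough sketch of the duplicate-removal step, the helper below builds a Picard MarkDuplicates command line. The file paths and the function name are hypothetical, and setting `REMOVE_DUPLICATES=true` is an assumption about how the lab runs Picard; the lab's actual scripts may differ.

```python
from typing import List


def mark_duplicates_command(
    picard_jar: str, input_bam: str, output_bam: str, metrics_file: str
) -> List[str]:
    """Build a Picard MarkDuplicates command that drops duplicates from the output.

    All paths are supplied by the caller; nothing is executed here.
    """
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={input_bam}",            # input BAM
        f"O={output_bam}",           # output BAM with duplicates removed
        f"M={metrics_file}",         # duplication metrics report
        "REMOVE_DUPLICATES=true",    # assumed: remove rather than only flag
    ]


if __name__ == "__main__":
    cmd = mark_duplicates_command(
        "picard.jar", "sample.sorted.bam", "sample.dedup.bam", "sample.metrics.txt"
    )
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, after which the deduplicated BAM would be the input for step 5.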

