R scripts for analysis of mass spectrometry and Olink PEA proteomics data from cord blood

Supporting repository for the manuscript “The proteome signature of cord blood plasma with high hematopoietic stem and progenitor cell count”

What this workflow does

This repository contains a set of R scripts that perform protein-, pathway- and correlation-module-centric analyses of proteomics data.

More specifically, it uses data from two technologies that largely complement each other in protein coverage, with a small overlap between them. It takes two Excel files as input: one with Olink NPX protein expression values and one with mass spectrometry normalized abundances (formatted output from Proteome Discoverer).
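As a minimal sketch of how the overlap between the two platforms could be inspected, assume each input table has already been read into a data frame (for example with `openxlsx::read.xlsx()`) and carries a shared protein identifier column. The column name `protein_id`, the identifiers and the values below are illustrative assumptions, not the actual contents of the input files.

```r
# Toy stand-ins for the two input tables; in the real workflow these would be
# read from the Olink and mass spectrometry Excel files.
olink <- data.frame(
  protein_id  = c("PROT_A", "PROT_B", "PROT_C"),  # hypothetical identifiers
  npx_sample1 = c(5.1, 3.2, 7.8)
)
ms <- data.frame(
  protein_id        = c("PROT_C", "PROT_D", "PROT_E", "PROT_F"),
  abundance_sample1 = c(1.2e6, 3.4e5, 8.9e6, 2.2e5)
)

# Proteins measured by both technologies, and by only one of them
shared     <- intersect(olink$protein_id, ms$protein_id)
olink_only <- setdiff(olink$protein_id, ms$protein_id)
ms_only    <- setdiff(ms$protein_id, olink$protein_id)

coverage <- c(shared     = length(shared),
              olink_only = length(olink_only),
              ms_only    = length(ms_only))
print(coverage)
```

Keeping the three sets separate makes it easy to decide later whether overlapping proteins should be compared across platforms or analyzed per platform.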

The data analyzed in this project were collected from cord blood plasma: 8 samples with high CD34 concentration and 8 samples with low CD34 concentration. The project aims to find proteomic biomarkers for CD34 concentration. It looks for biomarkers at the single-protein level and also looks more broadly for pathways with differing expression patterns.
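To illustrate what a single-protein comparison between the 8 high and 8 low samples can look like, here is a small base R sketch on simulated data. The data, the protein names, the use of the default Welch variant of `t.test()` and the Benjamini-Hochberg correction are all assumptions made for illustration; the repository's scripts may be set up differently.

```r
set.seed(1)  # simulated data only; not the study's measurements

n_proteins <- 50
group <- factor(rep(c("high", "low"), each = 8), levels = c("low", "high"))

# Rows are proteins, columns are the 16 samples (log2-scale values)
expr <- matrix(rnorm(n_proteins * 16), nrow = n_proteins,
               dimnames = list(paste0("protein_", seq_len(n_proteins)), NULL))
# Add a true shift to the first protein so there is something to detect
expr[1, group == "high"] <- expr[1, group == "high"] + 3

# Welch two-sample t-test per protein (high vs low)
results <- do.call(rbind, lapply(rownames(expr), function(p) {
  tt <- t.test(expr[p, group == "high"], expr[p, group == "low"])
  data.frame(protein = p,
             log2_fc = unname(tt$estimate[1] - tt$estimate[2]),
             p_value = tt$p.value)
}))

# Benjamini-Hochberg correction for testing many proteins
results$p_adj <- p.adjust(results$p_value, method = "BH")
head(results[order(results$p_value), ])
```

With only 8 samples per group, the multiple-testing correction matters: many raw p-values below 0.05 are expected by chance alone when dozens or hundreds of proteins are tested.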

It looks for differentially expressed proteins using t-tests and the DEqMS package, for pathways using ReactomeGSA, and for correlation modules using WGCNA.
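To give a feel for what a correlation module is, the sketch below groups simulated proteins using a soft-thresholded correlation and hierarchical clustering in base R. It is a simplified stand-in for illustration only, not WGCNA's implementation (which, among other things, uses topological overlap and dynamic tree cutting). The soft power of 6, the cut into two clusters and all data are assumptions.

```r
set.seed(42)  # simulated data only

n_samples <- 16
# Two latent signals, each driving five proteins, to create two modules
signal_a <- rnorm(n_samples)
signal_b <- rnorm(n_samples)
expr <- cbind(
  sapply(1:5, function(i) signal_a + rnorm(n_samples, sd = 0.3)),
  sapply(1:5, function(i) signal_b + rnorm(n_samples, sd = 0.3))
)
colnames(expr) <- paste0("protein_", 1:10)  # samples in rows, proteins in columns

# Soft-thresholded (weighted) adjacency: |correlation| raised to a power
soft_power <- 6  # assumed value for illustration
adjacency <- abs(cor(expr))^soft_power

# Cluster proteins on the dissimilarity 1 - adjacency and cut into two modules
tree <- hclust(as.dist(1 - adjacency), method = "average")
modules <- cutree(tree, k = 2)
split(names(modules), modules)
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones, which is why proteins driven by the same latent signal end up in the same cluster.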

The purpose of this repository

The main purpose of this repository is to make the published analysis reproducible. A secondary use is for anyone to run the code on another project with similar datasets. Hopefully, the repository will also develop beyond its state at publication. The workflow applies a set of recently published algorithms and software, reported to outperform their predecessors, to identify biomarkers.

How to run the analysis (the R scripts)

There is a README.Rmd file. It can be used to run all or a subset of the R scripts of the analysis. It also generates the README.md file for the GitHub site.

To reproduce the published analysis:

  1. Clone this repository.
  2. Check out commit 778d19a5f01fe38871da7a947e42700041e4886b from the master branch.
  3. Move or delete the out_r folder and create a new, empty out_r folder.
  4. Edit “path_to_my_project” in README.Rmd.
  5. Rerun the scripts in the “r_subscripts” folder. This is most easily done by running the README.Rmd file, which runs the R scripts, renders their markdown reports, and generates a README.md file with links to those reports. (This was the only way I could render markdown from the scripts and keep the main directory as the working directory. I am sure there is a tidier way to make README files.) Note: There is also a README_local_html.Rmd that can be used to generate a corresponding README file in HTML for local use rather than on GitHub.
  6. Each subscript to be run from the README file(s) might first have to be “uncommented” by removing the hash in front of it.
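Steps 3 and 5 can also be done from an R session. The sketch below is one possible way, assuming the repository has already been cloned and the commit checked out with git. The helper `reset_output_dir()` is hypothetical and not part of the repository, and the temporary directory only stands in for your local clone so that the example runs anywhere.

```r
# Hypothetical helper: move an existing output folder aside and start with an empty one
reset_output_dir <- function(project_dir, out_name = "out_r") {
  out_dir <- file.path(project_dir, out_name)
  if (dir.exists(out_dir)) {
    # Keep the old results under a timestamped name instead of deleting them
    backup <- paste0(out_dir, "_", format(Sys.time(), "%Y%m%d_%H%M%S"))
    if (!file.rename(out_dir, backup)) stop("Could not move ", out_dir)
  }
  dir.create(out_dir)
  invisible(out_dir)
}

# Placeholder path: replace with the location of your local clone
path_to_my_project <- file.path(tempdir(), "my_project_clone")
dir.create(path_to_my_project, showWarnings = FALSE)

reset_output_dir(path_to_my_project)

# Knit README.Rmd only if it is present and rmarkdown is installed
readme <- file.path(path_to_my_project, "README.Rmd")
if (file.exists(readme) && requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::render(readme)
}
```

Moving the old folder aside rather than deleting it keeps earlier results available for comparison with the rerun.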

Take a look at the code

If you would like to see what is going on in the scripts, the code and output can be accessed with the links below. The links are in order of execution.

Links to Reactome webserver to browse results

The links are only active on the Reactome server for seven days. After that, this analysis (or script 006) has to be rerun.

Correlation Adjusted MEan RAnk (CAMERA)

Generating the README.md file for GitHub (the file that you are reading now)

Run/Knit “README.Rmd”

Generating the README.html for local browsing

Run “render_html.R”

References to all used R packages

Blighe, Kevin, Sharmila Rana, and Myles Lewis. 2020. EnhancedVolcano: Publication-Ready Volcano Plots with Enhanced Colouring and Labeling. https://github.com/kevinblighe/EnhancedVolcano.

Carlson, Marc. 2020. org.Hs.eg.db: Genome Wide Annotation for Human.

Epskamp, Sacha, Giulio Costantini, Jonas Haslbeck, and Adela Isvoranu. 2021. Qgraph: Graph Plotting Methods, Psychometric Data Visualization and Graphical Model Estimation. https://CRAN.R-project.org/package=qgraph.

Epskamp, Sacha, Angélique O. J. Cramer, Lourens J. Waldorp, Verena D. Schmittmann, and Denny Borsboom. 2012. “qgraph: Network Visualizations of Relationships in Psychometric Data.” Journal of Statistical Software 48 (4): 1–18.

Griss, Johannes. 2021. ReactomeGSA: Client for the Reactome Analysis Service for Comparative Multi-Omics Gene Set Analysis. https://github.com/reactome/ReactomeGSA.

Griss, Johannes, Guilherme Viteri, Konstantinos Sidiropoulos, Vy Nguyen, Antonio Fabregat, and Henning Hermjakob. 2020. “ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis.” bioRxiv. https://doi.org/10.1101/2020.04.16.044958.

Langfelder, Peter, and Steve Horvath, with contributions by Chaochao Cai, Jun Dong, Jeremy Miller, Lin Song, Andy Yip, and Bin Zhang. 2021. WGCNA: Weighted Correlation Network Analysis. http://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/.

Langfelder, Peter, and Steve Horvath. 2008. “WGCNA: An R Package for Weighted Correlation Network Analysis.” BMC Bioinformatics 9 (1): 559. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559.

———. 2012. “Fast R Functions for Robust Correlations and Hierarchical Clustering.” Journal of Statistical Software 46 (11): 1–17. https://www.jstatsoft.org/v46/i11/.

Luo, Weijun. 2020. Pathview: A Tool Set for Pathway Based Data Integration and Visualization. https://pathview.uncc.edu/.

Luo, Weijun, and Cory Brouwer. 2013. “Pathview: An R/Bioconductor Package for Pathway-Based Data Integration and Visualization.” Bioinformatics 29 (14): 1830–31. https://doi.org/10.1093/bioinformatics/btt285.

Lüdecke, Daniel. 2018. “Sjmisc: Data and Variable Transformation Functions.” Journal of Open Source Software 3 (26): 754. https://doi.org/10.21105/joss.00754.

———. 2021. Sjmisc: Data and Variable Transformation Functions. https://strengejacke.github.io/sjmisc/.

Ooms, Jeroen. 2021. Writexl: Export Data Frames to Excel Xlsx Format. https://CRAN.R-project.org/package=writexl.

Ren, Kun. 2021. Rlist: A Toolbox for Non-Tabular Data Manipulation. https://CRAN.R-project.org/package=rlist.

Robinson, David, Alex Hayes, and Simon Couch. 2022. Broom: Convert Statistical Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.

Schauberger, Philipp, and Alexander Walker. 2021. Openxlsx: Read, Write and Edit Xlsx Files. https://CRAN.R-project.org/package=openxlsx.

Turner, Stephen. 2022. Annotables: Ensembl Annotation Tables. https://github.com/stephenturner/annotables.

Wickham, Hadley. 2021. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Yu, Guangchuang. 2021a. clusterProfiler: Statistical Analysis and Visualization of Functional Profiles for Genes and Gene Clusters. https://yulab-smu.top/biomedical-knowledge-mining-book/.

———. 2021b. Enrichplot: Visualization of Functional Enrichment Result. https://yulab-smu.top/biomedical-knowledge-mining-book/.

Yu, Guangchuang, Li-Gen Wang, Yanyan Han, and Qing-Yu He. 2012. “clusterProfiler: An R Package for Comparing Biological Themes Among Gene Clusters.” OMICS: A Journal of Integrative Biology 16 (5): 284–87. https://doi.org/10.1089/omi.2011.0118.

Zhao, Shilin, Linlin Yin, Yan Guo, Quanhu Sheng, and Yu Shyr. 2021. Heatmap3: An Improved Heatmap Package. https://CRAN.R-project.org/package=heatmap3.