-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
continues notebook for intersecting phospho sites and total peptides
- Loading branch information
1 parent
2a81ded
commit 596e62a
Showing
9 changed files
with
1,340 additions
and
1,808 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
--- | ||
title: "Phosphoproteomics using Tandem Mass Tags" | ||
subtitle: "Intersecting phosphosites and total peptides and statistical testing" | ||
author: "Tom Smith" | ||
date: "`r Sys.Date()`" | ||
output: | ||
html_document: | ||
code_folding: show | ||
pdf_document: default | ||
bibliography: bib.json | ||
--- | ||
```{r, message=FALSE, warning=FALSE} | ||
library(camprotR) | ||
library(MSnbase) | ||
library(ggplot2) | ||
library(tidyr) | ||
library(dplyr) | ||
library(here) | ||
library(limma) | ||
library(uniprotREST) | ||
``` | ||
|
||
|
||
```{r} | ||
total <- readRDS(here('results/total.rds')) | ||
phospho_sites <- readRDS(here('results/phospho_sites.rds')) | ||
``` | ||
|
||
|
||
Below, we identify the total peptides that overlap each phosphosite. Note that each phosphosite may intersect multiple total peptides (due to missed cleavages) and each total peptide may intersect multiple phosphosites (due to multiple phosphosites per peptide). The code chunk takes a while to run.. | ||
```{r} | ||
phospho_f <- phospho_sites %>% fData() %>% tibble::rownames_to_column('ID') | ||
total_f <- total %>% fData() %>% tibble::rownames_to_column('ID') | ||
phospho_peptide_to_total_peptides <- vector('list', length=length(phospho_f$ID)) | ||
names(phospho_peptide_to_total_peptides) <- phospho_f$ID | ||
n <- 0 | ||
for(phospho_id in phospho_f$ID){ | ||
row <- phospho_f %>% filter(ID==phospho_id) | ||
n <- n + 1 | ||
ptm_positions <- as.numeric(strsplit(row$ptm_position, split='\\|')[[1]]) | ||
total_peptide_ids <- total_f %>% | ||
filter(Master.Protein.Accessions %in% row$Master.Protein.Accessions, | ||
peptide_start<=min(ptm_positions), | ||
peptide_end>=max(ptm_positions)) %>% | ||
pull(ID) | ||
phospho_peptide_to_total_peptides[[row$ID]] <- total_peptide_ids | ||
} | ||
``` | ||
|
||
Now, how many overlapping features. Note that for the phospho and total to be considered intersecting, there must be at least one total peptide which overlaps the phosphosite. Out of the `r length(phospho_f$ID)` phosphosites, only `r sum(sapply(phospho_peptide_to_total_peptides, function(x) length(x)>0))` have at least one overlapping total peptide. | ||
```{r} | ||
print(length(phospho_f$ID)) | ||
print(table(sapply(phospho_peptide_to_total_peptides, function(x) length(x)>0))) | ||
print(table(sapply(phospho_peptide_to_total_peptides, length))) | ||
``` | ||
limma requires a single row per feature with a columns per sample. Here, the phospho and total are separate samples so we need to combine them into a single row. | ||
```{r} | ||
combined_exprs <- phospho_peptide_to_total_peptides %>% names() %>% lapply(function(x){ | ||
total_peptides <- phospho_peptide_to_total_peptides[[x]] | ||
if(length(total_peptides)>0) { | ||
exprs_values <- c(exprs(phospho_sites[x,]), | ||
colSums(exprs(total[total_peptides,]), na.rm=TRUE)) | ||
exprs_values <- unname(exprs_values) | ||
return(exprs_values) | ||
} else { return(NULL) } | ||
}) | ||
names(combined_exprs) <- names(phospho_peptide_to_total_peptides) | ||
``` | ||
|
||
Create the completed matrix of phospho and total. | ||
```{r} | ||
combined_exprs <- combined_exprs[sapply(combined_exprs, function(x) !is.null(x))] | ||
all_combined_exprs <- bind_rows(combined_exprs) %>% t() %>% as.matrix | ||
rownames(all_combined_exprs) <- names(combined_exprs) | ||
colnames(all_combined_exprs) <- c(paste0(colnames(phospho_sites), '_phospho'), | ||
paste0(colnames(total), '_total')) | ||
``` | ||
Inspect the combined quantification matrix. | ||
```{r} | ||
head(all_combined_exprs) # | ||
``` | ||
|
||
```{r} | ||
all_combined_fdata <- fData(phospho_sites[rownames(all_combined_exprs),])[ | ||
,c('Master.Protein.Accessions', 'ptm_position')] | ||
all_combined_fdata$n_total <- sapply(rownames(all_combined_exprs), | ||
function(x) length(phospho_peptide_to_total_peptides[[x]])) | ||
all_combined_pdata <- rbind((pData(phospho_sites) %>% mutate(type='phospho')), | ||
(pData(total) %>% mutate(type='total'))) | ||
rownames(all_combined_pdata) <- colnames(all_combined_exprs) | ||
all_combined_res <- MSnSet(as.matrix.data.frame(all_combined_exprs), | ||
all_combined_fdata, | ||
all_combined_pdata) | ||
head(fData(all_combined_res)) | ||
pData(all_combined_res) | ||
``` | ||
|
||
Below, we subset to the first 2 'conditions', where condition 1 = tags 1-4 (x1 phospho; x1 total) and condition 2 = tags 5-7 (x6 phospho; x2 total). | ||
The ground truth for the difference between the first 4 tags and the next 3 is therefore a 3-fold increase in phosphorylation for the yeast proteins and reduced phosphorylation for the human proteins. For the human proteins, given the amounts of spike in yeast and balancing human proteins to make up the labelled material, we expect a 30% drop in phosphorylation. | ||
```{r} | ||
pairwise_comparison_combined_res <- all_combined_res[,pData(all_combined_res)$condition %in% 1:2] | ||
pData(pairwise_comparison_combined_res) | ||
``` | ||
|
||
Below, we perform the limma analysis | ||
```{r} | ||
dat <- exprs(pairwise_comparison_combined_res) %>% log(base=2) | ||
type <- factor(pData(pairwise_comparison_combined_res)$type, level=c('total', 'phospho')) | ||
condition <- factor(pData(pairwise_comparison_combined_res)$condition, levels=1:2) | ||
study.design <- model.matrix(~type*condition) | ||
fit <- lmFit(dat, study.design) | ||
fit <- eBayes(fit, trend=TRUE) | ||
limma.results <- topTable(fit, coef = colnames(fit$coefficients)[4], n = Inf, confint=TRUE) | ||
limma.results$sigma <- fit$sigma | ||
``` | ||
|
||
|
||
We would like to add information about the species to make sure the fold-changes are in the expected direction | ||
```{r} | ||
limma.results$protein <- rownames(limma.results) %>% | ||
strsplit(split='_') %>% | ||
sapply('[[', 1) | ||
species_res <- uniprot_map( | ||
ids = unique(limma.results$protein), | ||
from = "UniProtKB_AC-ID", | ||
to = "UniProtKB", | ||
fields = "organism_name") %>% | ||
mutate(species=recode(Organism, | ||
"Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)"="S.cerevisiae", | ||
"Homo sapiens (Human)"="H.sapiens")) | ||
``` | ||
|
||
```{r} | ||
limma.results %>% | ||
merge(species_res, by.x='protein', by.y='From') %>% | ||
ggplot(aes(logFC, -log10(P.Value), fill=adj.P.Val<0.01)) + | ||
geom_point(pch=21, size=2, colour='grey70') + | ||
facet_wrap(~species) + | ||
theme_camprot(base_size=15, base_family='sans', border=FALSE) + | ||
theme(strip.background=element_blank()) + | ||
xlab('Log2 fold change') + | ||
ylab('-log10(p-value)') + | ||
scale_fill_manual(values=c('grey90', get_cat_palette(2)[2]), name='FDR < 1%') | ||
``` | ||
OK, so the direction of change is as expected (reduced for human, increased for yeast), but only a subset of the fold-changes reach a 1% FDR threshold for significance. |
Large diffs are not rendered by default.
Oops, something went wrong.
Oops, something went wrong.