updated metabo-batches example #233

Open: wants to merge 36 commits into base `main`.
- `2ca8aed` updated metabo-batches example (mwalzer, Jul 23, 2024)
- `443855e` Added batch-correction and data details to worked metabo example md (mwalzer, Sep 17, 2024)
- `9b56b9c` Add meeting notes July 10 (bittremieux, Jul 23, 2024)
- `45ae62d` Add meeting notes July 24 (bittremieux, Jul 29, 2024)
- `f8bed0c` Update set mzQC example (#219) (bittremieux, Jul 29, 2024)
- `da64a5c` update QC metrics in mzML files example (#228) (bittremieux, Jul 29, 2024)
- `05f0ab1` updated USI example to mzQC v1.0 (#229) (cbielow, Jul 29, 2024)
- `63e5a60` Update example index (bittremieux, Jul 29, 2024)
- `3192215` Remove QC-CV (#236) (bittremieux, Aug 6, 2024)
- `a54c743` trimmed some redundant sections (mwalzer, Jul 24, 2024)
- `c3bbd53` Moved validation info under one document (mwalzer, Jul 24, 2024)
- `0e114b0` Added mzQC structure info (mwalzer, Jul 24, 2024)
- `736179f` updated readme for CV changes (mwalzer, Jul 27, 2022)
- `417c030` updated use-case story with a more robust (abstract) CV term use (mwalzer, Jul 27, 2022)
- `b7fb28c` Added link to MS-Quality-hub (nilshoffmann, May 5, 2022)
- `ce821bf` added two metric sub-pages for use and request guide (mwalzer, Jul 27, 2022)
- `3b2e5e9` added mzQC in-a-nutshell page (mwalzer, Jul 27, 2022)
- `3089f84` moved website in-a-nutshell page into intro as section (mwalzer, Jul 25, 2024)
- `65789aa` website updated: about page and a bit pruned contrib page (mwalzer, Jul 25, 2024)
- `8b205e6` Add meeting notes August 7 (bittremieux, Aug 7, 2024)
- `a370fb1` Update QC2 sample example (#231) (mwalzer, Aug 21, 2024)
- `a2c0443` Linkcheck on PR (#242) (cbielow, Aug 22, 2024)
- `4ed1c08` Fix link check (bittremieux, Aug 22, 2024)
- `7a0cc5e` some menu cleanup -- not final (cbielow, Aug 22, 2024)
- `50d999a` fix broken links (cbielow, Aug 22, 2024)
- `1e334f7` Add meeting notes August 21 (bittremieux, Sep 2, 2024)
- `77ec5be` Add meetings notes Oct 4 (bittremieux, Oct 17, 2024)
- `24ac34b` Add meeting notes October 18 (bittremieux, Oct 25, 2024)
- `55b3c02` Add meeting notes November 12 (bittremieux, Nov 14, 2024)
- `1905200` Add meeting notes November 15 (bittremieux, Nov 15, 2024)
- `2cb3707` Add meeting notes Nov 29 (bittremieux, Dec 2, 2024)
- `58cc6c5` Meeting notes Dec 13 (bittremieux, Dec 16, 2024)
- `c5bb8c2` Update example (bittremieux, Jan 10, 2025)
- `3c3d136` Merge branch 'main' into Update_metabobatches_example (bittremieux, Jan 10, 2025)
- `026ec98` Minor formatting change (bittremieux, Jan 10, 2025)
- `128be7a` Fix broken links (bittremieux, Jan 10, 2025)
9 changes: 6 additions & 3 deletions docs/pages/examples.md
@@ -4,12 +4,15 @@ title: "mzQC Examples"
permalink: /examples/
---

The following use cases provide several hands-on examples of how mzQC files are structured and can be used:
These introductory use cases provide examples of how mzQC files are structured and can be used:

- [Representing QC data for an individual mass spectrometry run](intro_run/)
- [Deriving QC data from multiple related mass spectrometry runs](intro_set/)
- [Tracking instrument performance using controlled QC samples](intro_qc2/)
- [Batch correction](metabo-batches/)

The following use cases demonstrate how mzQC files can be used for real-life quality control reporting:

- [Tracking instrument performance using controlled QC samples](example_qc2_longitudinal/)
- [Batch correction in metabolomics](example_batch_correction/)
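All of these examples share the same top-level layout: a single `mzQC` object carrying format metadata, one or more `runQualities` (or `setQualities`) with their quality metrics, and the controlled vocabularies used. As an abridged sketch only (ellipses mark omitted content; this is not a complete valid mzQC file):

```json
{
  "mzQC": {
    "version": "1.0.0",
    "creationDate": "2025-01-10T12:00:00",
    "runQualities": [
      {
        "metadata": { "inputFiles": ["..."], "analysisSoftware": ["..."] },
        "qualityMetrics": ["..."]
      }
    ],
    "controlledVocabularies": ["..."]
  }
}
```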

Additionally, for more advanced usage, mzQC can closely interoperate with several other file formats developed by the Proteomics Standards Initiative:

3 changes: 1 addition & 2 deletions docs/pages/metrics.md
@@ -4,11 +4,10 @@ title: Metrics
permalink: /metrics/
---


The mzQC format owes much of its _simplicity_ **and** _flexibility_ to the use of controlled vocabulary (CV) terms to define and instantiate quality metric records.
You can find out more about how to use and define your own CV terms below.
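For illustration, a quality metric record instantiates a CV term by accession and name, plus a value. A hypothetical sketch (accession and value chosen for illustration; consult the PSI-MS CV for the actual terms):

```json
{
  "accession": "MS:4000059",
  "name": "number of MS1 spectra",
  "value": 5207
}
```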

{% include_relative cv/howto_use_cv_terms.md %}

{% include_relative cv/howto_create_cv_terms.md %}


8 changes: 3 additions & 5 deletions docs/pages/validation.md
@@ -11,10 +11,8 @@ of mzQC files. Full validation is implemented in the

## Syntactic Validation

With the help of the mzQC JSON schema, mzQC instances can be readily checked for
syntactic schema compliance. There are a
[number of JSON schema implementations for use in various programming languages](https://json-schema.org/implementations.html)
that also support validation of JSON schema draft-07 from which the mzQC schema is designed.
With the help of the mzQC JSON schema, mzQC instances can be readily checked for syntactic schema compliance.
There are a [number of JSON schema implementations for use in various programming languages](https://json-schema.org/tools) that also support validation of JSON schema draft-07 from which the mzQC schema is designed.
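As a minimal sketch in Python of what such a check looks like, using the widely used third-party `jsonschema` package with a toy stand-in schema (not the official mzQC schema, whose required fields and types are richer):

```python
from jsonschema import Draft7Validator

# Toy stand-in schema (NOT the official mzQC schema), draft-07 dialect.
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["mzQC"],
    "properties": {
        "mzQC": {
            "type": "object",
            "required": ["version", "runQualities"],
            "properties": {
                "version": {"type": "string"},
                "runQualities": {"type": "array"},
            },
        }
    },
}

good = {"mzQC": {"version": "1.0.0", "runQualities": []}}
bad = {"mzQC": {"version": 1.0}}  # wrong type for version, runQualities missing

validator = Draft7Validator(schema)
print(validator.is_valid(good))  # True
print(validator.is_valid(bad))   # False

# List the individual violations for the invalid instance
for err in validator.iter_errors(bad):
    print(err.message)
```

In practice you would load the schema from the mzQC repository and your mzQC instance from disk with `json.load` before validating.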

## Semantic Validation

@@ -62,7 +60,7 @@ The pymzqc library now supports an offline validator:
mzqc-validator [OPTIONS] INFILE
```
You will have to install the pymzqc Python module, but the process is quick and easy thanks to PyPI.
More details can be found at the [pymzqc site](https://pymzqc.readthedocs.io/en/v1.0.0rc3/).
More details can be found at the [pymzqc site](https://pymzqc.readthedocs.io/).
The validator is a CLI tool built on Click.
It performs a joint validation of the syntax and semantics of a given mzQC input and emits its report in JSON format.
The validator segments the validation report into lists of the following categories:
249 changes: 249 additions & 0 deletions docs/pages/worked-examples/example_batch_correction.R
@@ -0,0 +1,249 @@
## Batch correction on the example data shipped with BatchCorrMetabolomics
library(BatchCorrMetabolomics)
data(BC)  # loads set.3 (log peak areas) and set.3.Y (sample metadata)

set.3.lod <- min(set.3[!is.na(set.3)])  # limit of detection: smallest observed value
minBatchOccurrence.Ave <- 2   # min. non-NA samples per batch for batch-average correction
minBatchOccurrence.Line <- 4  # min. non-NA samples per batch for order-dependent correction


TOF.ngenotypes <- nlevels(set.3.Y$SCode)-1 ## take away the ref samples
TOF.nNA <- apply(set.3, 2, function(x) sum(is.na(x)))  # missing values per feature
TOF.nref <- sum(set.3.Y$SCode == "ref")  # number of reference (QC) sample injections


## Experiment labels: strategy Q uses the reference (QC) samples, S the study
## samples; "0"/"1"/"2" impute NAs with 0, LOD/2, LOD; "c" uses censored (tobit) regression
conditions <- c("", "0", "1", "2", "c")
experiments <- c(t(outer(c("Q", "S"), conditions, paste, sep = "")))
methods <- rep("lm", length(experiments))
methods[grep("c", experiments)] <- "tobit"
imputeValues <- rep(NA, length(experiments))
imputeValues[grep("0", experiments)] <- 0
imputeValues[grep("1", experiments)] <- set.3.lod / 2
imputeValues[grep("2", experiments)] <- set.3.lod
imputeValues[grep("c", experiments)] <- set.3.lod - .01
refSamples <- list("Q" = which(set.3.Y$SCode == "ref"),
"S" = which(set.3.Y$SCode != "ref"))
strategies <- rep(c("Q", "S"), each = length(conditions))

## leave out the censored regressions for the batch-average correction
exp.idx <- (1:length(experiments))[-grep("c", experiments)]
suppressMessages(allResultsAve <-
lapply(exp.idx,
function(ii)
apply(set.3, 2, doBC,
ref.idx = refSamples[[ strategies[[ii]] ]],
batch.idx = set.3.Y$Batch,
minBsamp = minBatchOccurrence.Ave,
correctionFormula = formula("X ~ B"),
seq.idx = set.3.Y$SeqNr,
method = methods[ii],
imputeVal = imputeValues[ii])))
names(allResultsAve) <- experiments[exp.idx]

suppressMessages(allResultsLine <-
lapply(seq(along = experiments),
function(ii)
apply(set.3, 2, doBC,
ref.idx = refSamples[[ strategies[[ii]] ]],
batch.idx = set.3.Y$Batch,
minBsamp = minBatchOccurrence.Line,
seq.idx = set.3.Y$SeqNr,
method = methods[ii],
imputeVal = imputeValues[ii])))
names(allResultsLine) <- experiments

library("RUVSeq")

## Replicate structure for RUVs: row 1 lists all reference (QC) samples,
## each following row holds a single study sample (padded with -1)
idx <- which(set.3.Y$SCode == "ref")
replicates.ind <- matrix(-1, nrow(set.3) - length(idx) + 1, length(idx))
replicates.ind[1,] <- idx
replicates.ind[-1,1] <- (1:nrow(set.3))[-idx]

allResultsRUV <-
lapply(0:2,
function(ImpVal) {
huhn <- set.3
huhn[is.na(huhn)] <- ImpVal * set.3.lod / 2
woppa <- t(RUVs(t(huhn),
1:ncol(huhn),
k = 3,
replicates.ind,
round = FALSE, isLog = TRUE)$normalizedCounts)
woppa[is.na(set.3)] <- NA
woppa
})


# figure 2

par(mfrow = c(1,1))
huhnPCA <- evaluateCorrection(set.3, set.3.Y, what = "PCA",
plot = TRUE, legend.loc = "bottomright")
title(main = paste("Uncorrected Interbatch Distance:", round(huhnPCA, 3)))

huhnPCA.A <- evaluateCorrection(allResultsAve[["Q"]], set.3.Y, what = "PCA",
plot = TRUE, legend.loc = "bottomright")
title(main = paste("Corrected Interbatch Distance:", round(huhnPCA.A, 3)))

# huhnPCA.A <- evaluateCorrection(allResultsAve[["Q0"]], set.3.Y, what = "PCA",
# plot = TRUE, legend.loc = "bottomright")
# title(main = paste("Q0: Interbatch distance:", round(huhnPCA.A, 3)))



results.ave <- results.line <- matrix(NA, length(experiments) + 1, 2)
dimnames(results.line) <- dimnames(results.ave) <-
list(c("No correction", experiments), c("PCA", "duplo"))

results.line[1,1] <- results.ave[1,1] <-
evaluateCorrection(set.3, set.3.Y, what = "PCA", plot = FALSE)
results.line[1,2] <- results.ave[1,2] <-
evaluateCorrection(set.3, set.3.Y, what = "duplo", plot = FALSE)

for (exp in experiments[exp.idx]) {
x <- allResultsAve[[exp]]
woppa <- set.3
woppa[!is.na(x)] <- x[!is.na(x)]
results.ave[exp, 1] <-
evaluateCorrection(woppa, set.3.Y, "PCA", plot = FALSE)
results.ave[exp, 2] <-
evaluateCorrection(woppa, set.3.Y, "duplo", plot = FALSE)
}

for (exp in experiments) {
x <- allResultsLine[[exp]]
woppa <- set.3
woppa[!is.na(x)] <- x[!is.na(x)]
results.line[exp, 1] <-
evaluateCorrection(woppa, set.3.Y, "PCA", plot = FALSE)
results.line[exp, 2] <-
evaluateCorrection(woppa, set.3.Y, "duplo", plot = FALSE)
}

results.ruv <-
cbind(sapply(allResultsRUV,
evaluateCorrection, set.3.Y,
what = "PCA", plot = FALSE),
sapply(allResultsRUV,
evaluateCorrection, set.3.Y,
what = "duplo", plot = FALSE))
dimnames(results.ruv) <- list(paste("R", 0:2, sep = ""),
c("PCA", "duplo"))

# figure 6
floodresults.ave <- rbind(results.ave, results.ruv)
floodresults.ave.label <- factor(substr(rownames(floodresults.ave), 1, 1))
floodresults.line <- rbind(results.line, results.ruv)
floodresults.line.label <- factor(substr(rownames(floodresults.line), 1, 1))
PCA.range <- range(c(floodresults.ave[,1], floodresults.line[,1]),
na.rm = TRUE)
duplo.range <- range(c(floodresults.ave[,2], floodresults.line[,2]),
na.rm = TRUE)
par(mfrow = c(1,1))
plot(floodresults.ave[,1], floodresults.ave[,2],
main = "Data set III - only batch correction",
ylab= "Repeatability", xlab = "Interbatch distance",
xlim = PCA.range, ylim = duplo.range,
col = as.integer(floodresults.ave.label))
text(floodresults.ave[,1], floodresults.ave[,2],
pos = ifelse(floodresults.ave.label == "N", 2, 4),
col = as.integer(floodresults.ave.label),
labels = rownames(floodresults.ave))

plot(floodresults.line[,1], floodresults.line[,2],
main = "Data set III - batch and order correction",
xlim = PCA.range, ylim = duplo.range,
ylab= "Repeatability", xlab = "Interbatch distance",
col = as.integer(floodresults.line.label))
text(floodresults.line[,1], floodresults.line[,2],
pos = ifelse(floodresults.line.label %in% c("N", "Q"), 2, 4),
col = as.integer(floodresults.line.label),
labels = rownames(floodresults.line))


# log peak area mean
library("dplyr")
library("ggplot2")

raw <- data.frame(set.3)
#corrected <- data.frame(allResultsAve[["Q"]])
corrected <- data.frame(allResultsRUV[[1]])  # RUV result for ImpVal = 0; R lists are 1-indexed, so [[0]] would be an error
row.names(corrected) <- row.names(raw)

raw[is.na(raw)] <- 0
corrected[is.na(corrected)] <- 0

raw['mean'] <- apply(raw, 1, mean)
corrected['mean'] <- apply(corrected, 1, mean)
raw_plus_meta <- merge(raw, set.3.Y, by="row.names", all=TRUE)
corrected_plus_meta <- merge(corrected, set.3.Y, by="row.names", all=TRUE)

plot_areas_batchwise <- function(df, title) {
gm = mean(df$mean)
ggplot() +
geom_point(data=df, mapping=aes(x=SeqNr, y=mean, color=Batch)) +
xlab("injection order") + ylab("log mean peak area") + ggtitle(title) +
geom_hline(yintercept=gm, linetype="dashed", color = "black") +
stat_smooth(method="lm", formula=y~1, se=FALSE)
}

plot_areas_batchwise(raw_plus_meta,"Raw")
plot_areas_batchwise(corrected_plus_meta,"Corrected")

raw_refs <- raw_plus_meta[ which(raw_plus_meta$SCode=='ref'),]
raw_samples <- raw_plus_meta[ which(raw_plus_meta$SCode!='ref'),]
corrected_refs <- corrected_plus_meta[ which(corrected_plus_meta$SCode=='ref'),]
corrected_samples <- corrected_plus_meta[ which(corrected_plus_meta$SCode!='ref'),]

plot_areas_batchwise(raw_refs,"raw refs")
plot_areas_batchwise(corrected_refs,"corrected refs")
plot_areas_batchwise(raw_samples,"raw samples")
plot_areas_batchwise(corrected_samples,"corrected samples")



raw_plus_meta$Batch <- as.character(raw_plus_meta$Batch)
raw_plus_meta[ which(raw_plus_meta$SCode=='ref'),]$Batch = "QC"
raw_plus_meta$Batch <- as.factor(raw_plus_meta$Batch)

corrected_plus_meta$Batch <- as.character(corrected_plus_meta$Batch)
corrected_plus_meta[ which(corrected_plus_meta$SCode=='ref'),]$Batch = "QC"
corrected_plus_meta$Batch <- as.factor(corrected_plus_meta$Batch)

plot_areas_batchwise(raw_plus_meta,"raw")
plot_areas_batchwise(corrected_plus_meta,"corrected")


getPCA <- function(X, Y, GRAPH)
{
nbatches <- nlevels(Y$Batch)
noref.idx <- which(Y$SCode != "ref")
Xsample <- X[noref.idx,]
YSample <- Y[noref.idx,]

Xsample <- Xsample[, apply(Xsample, 2, function(x) !all(is.na(x)))]

## replace NA values with column means
for (i in 1:ncol(Xsample))
Xsample[is.na(Xsample[,i]),i] <- mean(Xsample[,i], na.rm = TRUE)

Xsample <- Xsample[, apply(Xsample, 2, sd, na.rm = TRUE) > 0]
## The original used ChemometricsWithR, whose PCA result object is hard to
## work with; FactoMineR is well maintained and exposes the PCA scores directly
X.PCA <- FactoMineR::PCA(Xsample, scale.unit = TRUE, graph=GRAPH)

return(X.PCA)
}



set3.uncorrected.PCA <- getPCA(set.3, set.3.Y,GRAPH = FALSE)
set3.uncorrected.PCAscores <- data.frame(set3.uncorrected.PCA$ind$coord)
set3.uncorrected.PCAscores <- merge(set3.uncorrected.PCAscores, set.3.Y, by="row.names", all=TRUE)
write.csv(set3.uncorrected.PCAscores,"set3.uncorrected.PCA.csv", row.names = TRUE)

set3.corrected.PCA <- getPCA(corrected, set.3.Y,GRAPH = FALSE)
set3.corrected.PCAscores <- data.frame(set3.corrected.PCA$ind$coord)
set3.corrected.PCAscores <- merge(set3.corrected.PCAscores, set.3.Y, by="row.names", all=TRUE)
write.csv(set3.corrected.PCAscores,"set3.corrected.PCA.csv", row.names = TRUE)

