diff --git a/vignettes/dataset-investigation.qmd b/vignettes/dataset-investigation.qmd
new file mode 100644
index 0000000..f05286a
--- /dev/null
+++ b/vignettes/dataset-investigation.qmd
@@ -0,0 +1,79 @@
+---
+title: "Dataset investigation: What to do when you get your data"
+format: html
+editor: visual
+minimal: true
+date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`'
+vignette: >
+  %\VignetteIndexEntry{Dataset investigation: What to do when you get your data}
+  %\VignetteEngine{quarto::html}
+  %\VignetteEncoding{UTF-8}
+---
+
+## In Construction
+
+```{r setup, include=FALSE}
+library(knitr)
+library(quarto)
+knitr::opts_knit$set(root.dir = './')
+```
+
+# Introduction
+
+So, you (or your amazing lab mate) have finally finished the data acquisition,
+and now you have a dataset in hand. What's next? Unfortunately, the work isn't
+over yet. Before diving into any analysis, it's crucial to understand the
+dataset itself. This is the first step in any data analysis workflow, ensuring
+that the data is of good quality and is well-prepared for preprocessing and any
+downstream analysis you plan to perform.
+
+In this vignette, we present the dataset used throughout the different vignettes
+of this website. It's far from a *perfect* dataset, which actually mirrors the
+reality of most datasets you'll encounter in research.
+
+Some issues will indeed be specific to this described dataset. However, the
+purpose of this vignette is to encourage you to think critically about your data
+and guide you through steps that can help you avoid spending hours on an
+analysis, only to realize later that some samples or features should have been
+removed or flagged earlier on.
+
+# Dataset Description
+
+In this workflow, two datasets are used:
+
+1.  An LC-MS-based (MS1 level only) untargeted metabolomics dataset to quantify
+    small polar metabolites in human plasma samples.
+2.  An additional LC-MS/MS dataset of selected samples from the former study for
+    the identification and annotation of significant features.
+
+The samples were randomly selected from a larger study aimed at identifying
+metabolites with varying abundances between individuals suffering from
+cardiovascular disease (CVD) and healthy controls (CTR). The subset analyzed
+here includes data for three CVD patients, three CTR individuals, and four
+quality control (QC) samples. The QC samples, representing a pooled serum sample
+from a large cohort, were measured repeatedly throughout the experiment to
+monitor signal stability.
+
+The data and metadata for this workflow are available on the MetaboLights
+database under the ID: MTBLS8735.
+
+The detailed materials and methods used for the sample analysis are also
+available in the MetaboLights entry. This is particularly important for
+understanding the analysis and the parameters used. It should be noted that the
+samples were analyzed using ultra-high-performance liquid chromatography (UHPLC)
+coupled to a Q-TOF mass spectrometer (TripleTOF 5600+), and chromatographic
+separation was achieved using hydrophilic interaction liquid chromatography
+(HILIC).
+
+-   Consider moving visualizations from the end-to-end vignette to here for a
+    clearer understanding of the dataset.
+-   Provide more in-depth visualizations to explore and understand the dataset
+    quality.
+-   Compare pool lc-ms and pool lc-ms/ms and show that we have better separation
+    on the second run.
+
+```{r}
+getwd()
+list.files()
+list.dirs()
+```