Wilms Tumor Dataset Annotation (SCPCP000006) #635
-
Hi @maud-p. I'm Jen, the Scientific Community Manager at the Data Lab. Thank you for sharing your proposed analysis! Have you filled out the contributor form yet? On this form, you will provide the name and email address that will be associated with the AWS account that we'll create for you. We also need this form returned to ensure you have agreed to the OpenScPCA terms and conditions and other policies. Once we receive this, our team will review your proposed analyses and get back to you with next steps within 3 business days! In the meantime, please let us know if you have any questions about OpenScPCA. We look forward to discussing more with you soon!
-
@maud-p, I also wanted to point out that another researcher previously proposed an analysis of the same Wilms Tumor dataset (Group ID: SCPCAB0010, Project ID: SCPCP000006). We are only able to provide one award per project. If more than one person is working on the same project, the researcher who completes the analysis first would be eligible to apply for the award. You are still welcome to continue with your proposed analysis! I just wanted to make sure you're aware of this.
Please let me know if you have any questions!
-
Hi @maud-p, I'm Ally, one of the Data Scientists in the Data Lab. We're looking forward to having you on board as an OpenScPCA contributor! Please follow the below steps to start contributing to the project:
After you have initiated your module, you will be ready to continue with the rest of the analysis that you proposed. I would recommend that you break up your work into the steps that you outlined in your initial post, where each step corresponds to an issue and at least one subsequent pull request. I've left some initial thoughts for each of the steps you mentioned below.
As for your note about using the Visium data, I would definitely encourage you to use it if you think it will help distinguish normal from malignant cell types. As for getting access to higher-resolution images, you would have to reach out to the original submitters, as we only have access to the images that are on the Portal. You can find their contact information on the project page.
For more information on contributing to the project, I recommend you review these sections of the documentation:
Please let me know if you have any additional questions; we are looking forward to working with you!
-
Proposed analysis
Wilms tumor (WT) is the most common pediatric kidney cancer and is characterized by pronounced intra- and inter-tumor heterogeneity. The genetic landscape of WT is highly diverse within each of its histological components. The Children's Oncology Group (COG) classifies WT patients into two groups: favorable histology and diffuse anaplasia. Each of these groups comprises blastemal, epithelial, and stromal populations of cancer cells in different proportions, as well as cells from the normal kidney, mostly kidney epithelial cells, endothelial cells, immune cells, and normal stromal cells (fibroblasts).
Scientific goals
Here, we first aim to annotate the Wilms tumor snRNA-seq samples in the SCPCP000006 dataset (n = 40). To do so, we will:
• Provide annotations of normal cells composing the kidney, including normal kidney epithelium, endothelium, stroma and immune cells
• Provide annotations of tumor cell populations that may be present in the WT samples, including blastemal, epithelial, and stromal populations of cancer cells
Based on these annotations, we would also like to provide a reference set of marker genes for the three cancer cell populations, which the WT community currently lacks.
Methods or approach
Step 1 – acquisition of INPUT data
To ensure a uniform analysis across modules, we will start with the processed objects available on the ScPCA Portal (_processed.rds). From my understanding, these objects have already undergone empty droplet removal, low-quality cell filtering, and normalization, and should be ready to use.
Step 2 – dimensionality reduction and clustering
We hypothesize that, after dimensionality reduction and clustering, each cluster contains a relatively homogeneous cell population, so we will first annotate the dataset at the cluster level. Refined, cell-level annotation can be performed afterwards.
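As a minimal sketch of what cluster-level annotation can look like (assuming per-cell labels already exist, for example from the label transfer in Step 3, together with cluster IDs from Seurat), each cluster can take the majority label of its cells. The `annotate_clusters` helper and its `min_purity` cutoff are hypothetical; clusters whose majority label is too weak are flagged as "ambiguous" so they can be revisited during refinement.

```python
from collections import Counter, defaultdict


def annotate_clusters(cell_labels, cluster_ids, min_purity=0.6):
    """Assign each cluster the majority label of its cells (hypothetical helper).

    Clusters whose majority label covers less than `min_purity` of their
    cells are returned as "ambiguous" for later cell-level refinement.
    """
    members = defaultdict(list)
    for label, cluster in zip(cell_labels, cluster_ids):
        members[cluster].append(label)

    cluster_annotation = {}
    for cluster, labels in members.items():
        label, count = Counter(labels).most_common(1)[0]
        purity = count / len(labels)
        cluster_annotation[cluster] = label if purity >= min_purity else "ambiguous"
    return cluster_annotation


labels = ["immune", "immune", "immune", "endothelial", "stroma", "fetal nephron"]
clusters = [0, 0, 0, 1, 2, 2]
print(annotate_clusters(labels, clusters))
# {0: 'immune', 1: 'endothelial', 2: 'ambiguous'}
```

The purity threshold makes the "relatively homogeneous cluster" hypothesis explicit: clusters that violate it are surfaced rather than silently labeled.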
Step 3 – identification of immune and endothelial (normal) cells using annotation transfer from the healthy fetal kidney atlas, and a first overview of the cell composition
Wilms tumors have been reported to resemble the fetal kidney more closely than the mature kidney. We will therefore first map the snRNA-seq data to the fetal kidney reference from the Kidney Cell Atlas (kidneycellatlas.org [1]). This will allow straightforward identification and annotation of immune and endothelial cells. Cells labelled as fetal nephron are expected to contain (i) normal kidney epithelial cells and (ii) epithelial and blastemal cancer cells. Cells labelled as stromal cells might contain (i) normal fibroblasts, (ii) cancer-associated fibroblasts (CAFs), and (iii) stromal cancer cells.
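To illustrate the principle of annotation transfer, here is a deliberately simplified nearest-centroid stand-in: each query nucleus receives the reference cell type whose average expression profile it correlates with best. This is not the method the proposal commits to (reference mapping would normally be done with dedicated tools in Seurat or similar); the four-gene centroids, the coarse labels, and the `min_corr` cutoff are all hypothetical, chosen only because PTPRC, PECAM1, SIX2 and COL1A1 are familiar immune, endothelial, cap mesenchyme and stromal markers.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def transfer_labels(query_cells, reference_centroids, min_corr=0.2):
    """Label each query cell with its best-correlated reference centroid."""
    results = []
    for cell in query_cells:
        scores = {ct: pearson(cell, c) for ct, c in reference_centroids.items()}
        best = max(scores, key=scores.get)
        # Weak best matches are left unassigned rather than forced into a type.
        results.append(best if scores[best] >= min_corr else "unassigned")
    return results


# Hypothetical centroids over genes (PTPRC, PECAM1, SIX2, COL1A1).
reference = {
    "immune": [5, 0, 0, 0],
    "endothelial": [0, 5, 0, 0],
    "cap mesenchyme": [0, 0, 5, 0],
    "stroma": [0, 0, 0, 5],
}
query = [[4, 0.5, 0, 0], [0, 0, 1, 3], [0, 0, 0, 0]]
print(transfer_labels(query, reference))
# ['immune', 'stroma', 'unassigned']
```

Nuclei mapped to "cap mesenchyme", fetal nephron or "stroma" in this way are exactly the ambiguous groups described above, which is why the CNV step follows.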
Step 4 – identification of malignant cells using inferCNV (and/or Numbat)
We will use inferCNV to infer copy number alterations, using immune and endothelial cells as the normal reference. Ideally, we would like to compare the results with other tools such as Numbat (if BAM files are available) or copykat. Cells with CNVs will be annotated as malignant and the others as normal. In ambiguous cases, cells will be annotated at the level of their Seurat cluster, since cells within a cluster should be relatively homogeneous in cell type.
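The core idea behind expression-based CNV inference can be shown with a toy Python sketch. It is not inferCNV's actual algorithm, which adds log transformation, clipping, denoising and optional HMM-based calls; it assumes log-normalized expression vectors whose genes are already sorted by genomic position, and it ignores chromosome boundaries when smoothing. Each cell is centered on the mean of the reference cells (immune and endothelial, as proposed above), smoothed along the genome, and summarized as a mean squared deviation. The names `cnv_scores`, `call_malignant`, `window` and `margin` are hypothetical.

```python
def moving_average(values, window):
    """Centered moving average; edges use the available neighbours only."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def cnv_scores(cells, reference_cells, window=3):
    """Toy CNV score: smoothed deviation from the reference mean, per cell."""
    n_genes = len(reference_cells[0])
    ref_mean = [sum(c[g] for c in reference_cells) / len(reference_cells)
                for g in range(n_genes)]
    scores = []
    for cell in cells:
        relative = [cell[g] - ref_mean[g] for g in range(n_genes)]
        smoothed = moving_average(relative, window)
        scores.append(sum(v * v for v in smoothed) / n_genes)
    return scores


def call_malignant(cells, reference_cells, window=3, margin=1.5):
    """Flag cells whose score exceeds `margin` times the highest reference score."""
    threshold = margin * max(cnv_scores(reference_cells, reference_cells, window))
    return [s > threshold for s in cnv_scores(cells, reference_cells, window)]


# Six genes in genomic order; the "tumor" cell carries a gain on the last three.
reference_cells = [[1, 1, 1, 1, 1, 1], [1.2, 0.8, 1, 1, 1.1, 0.9]]
tumor = [1, 1, 1, 2, 2, 2]
normal_like = [1.1, 0.9, 1, 1, 1, 1]
print(call_malignant([tumor, normal_like], reference_cells))
# [True, False]
```

Smoothing is what makes this work: deviations from a genuine gain or loss are shared by neighbouring genes and reinforce each other, whereas gene-level noise largely averages out.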
Step 5 – integration of CNV, label transfer and marker genes to complete the annotation
Finally, we will integrate the information obtained in the previous steps to assign the following annotations:
o Epithelial cancer cells: labeled as fetal kidney epithelium or cap mesenchyme + epithelial markers PODXL, CDH1, LTL, WT1 (?)
o Stromal cancer cells: labeled as fetal stromal cells + stromal markers VIM, collagens
o Blastemal cancer cells: labeled mostly as cap mesenchyme + blastemal markers CITED1, SIX2 (?)
o Immune cells
o Endothelial cells
o Normal kidney epithelium (PODXL+ podocytes, renal vesicle, ureteric bud)
o Normal kidney stroma, mostly fibroblasts
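The decision logic in the list above can be written down as a small rule table, which makes the order of precedence explicit and easy to audit. This is a hypothetical sketch: it assumes a boolean CNV call from Step 4, a coarse transferred label from Step 3 (the label strings are invented for illustration), and the set of marker genes detected in a cell or cluster. The marker sets are a subset of those listed above, with COL1A1 added as an example collagen gene (an assumption, not part of the proposal).

```python
# Hypothetical marker sets (COL1A1 is an assumed example collagen gene).
EPITHELIAL_MARKERS = {"PODXL", "CDH1", "WT1"}
STROMAL_MARKERS = {"VIM", "COL1A1"}
BLASTEMAL_MARKERS = {"CITED1", "SIX2"}


def final_annotation(cnv_malignant, transfer_label, detected_markers):
    """Combine CNV status, transferred label and detected markers.

    `transfer_label` is one of the hypothetical coarse labels "immune",
    "endothelial", "fetal nephron", "cap mesenchyme" or "stroma".
    """
    # Immune and endothelial cells are taken directly from label transfer.
    if transfer_label in ("immune", "endothelial"):
        return transfer_label
    if cnv_malignant:
        # Blastemal is checked first, so cap mesenchyme + CITED1/SIX2 wins.
        if transfer_label == "cap mesenchyme" and detected_markers & BLASTEMAL_MARKERS:
            return "blastemal cancer cell"
        if (transfer_label in ("fetal nephron", "cap mesenchyme")
                and detected_markers & EPITHELIAL_MARKERS):
            return "epithelial cancer cell"
        if transfer_label == "stroma" and detected_markers & STROMAL_MARKERS:
            return "stromal cancer cell"
        # Malignant, but label and markers disagree: leave for manual review.
        return "cancer cell (unresolved)"
    # No CNV signal: normal compartments (CAFs would fall into normal stroma).
    if transfer_label == "fetal nephron":
        return "normal kidney epithelium"
    if transfer_label == "stroma":
        return "normal kidney stroma"
    return "unresolved"


print(final_annotation(True, "cap mesenchyme", {"SIX2", "CITED1"}))
# blastemal cancer cell
```

Keeping an explicit "unresolved" outcome avoids forcing a call where the three lines of evidence disagree.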
Step 6 – validation by integration of the 40 samples
Next, we will integrate the 40 snRNA-seq samples using scVI or Harmony, then perform dimensionality reduction and clustering. This will allow us to validate our annotations, as cells of the same cell type should cluster together. We will further check this by comparing cluster marker expression across all integrated samples.
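One way to quantify "cells of the same cell type should cluster together" is the adjusted Rand index (ARI) between the per-sample annotations and the clusters found after integration: 1.0 means perfect agreement and values near 0 mean chance-level agreement. ARI is a metric swapped in here for illustration; it is not named in the proposal. The sketch below is a stdlib re-implementation (scikit-learn's `adjusted_rand_score` computes the same quantity).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_a)
    # Pairs of cells grouped together in both labelings.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells together
    return (index - expected) / (max_index - expected)


annotation = ["immune", "immune", "stroma", "stroma", "blastemal", "blastemal"]
integrated_clusters = [3, 3, 7, 7, 1, 1]
print(adjusted_rand_index(annotation, integrated_clusters))
# 1.0
print(round(adjusted_rand_index(annotation, [0, 1, 0, 1, 0, 1]), 3))
# -0.364
```

Cluster IDs are arbitrary, so ARI only rewards the grouping itself, which is what the validation step cares about.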
Step 7 – identification of marker genes for each cell subtype using differential expression analysis
Finally, we would like to provide the WT community with universal marker genes for rapid identification of the different cell types found within these tumors. To do so, we will use pseudobulk differential expression analysis (DElegate package [2-3]) to find markers of each cell type with the function FindAllMarkers2 (default parameters, patient as replicate). Additionally, we could compare relapse and non-relapse samples per cell type using the function findDE (replicate_column = "patient", method = "edger") to evaluate whether a specific phenotype within the cancer cells or the microenvironment could indicate relapse in WT.
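Conceptually, "patient as replicate" in a pseudobulk analysis means summing raw counts over all nuclei of one cell type from one patient, so each (patient, cell type) pair becomes one bulk-like sample for a count-based method such as edgeR. The sketch below shows only that aggregation step; it is an illustration under that assumption, not DElegate's code, and `pseudobulk` is a hypothetical helper.

```python
from collections import defaultdict


def pseudobulk(counts, patients, cell_types):
    """Sum raw counts per (patient, cell type) pair.

    counts: one list of raw counts per cell, all in the same gene order.
    """
    n_genes = len(counts[0])
    sums = defaultdict(lambda: [0] * n_genes)
    for row, patient, cell_type in zip(counts, patients, cell_types):
        acc = sums[(patient, cell_type)]
        for g, value in enumerate(row):
            acc[g] += value
    return dict(sums)


counts = [[1, 0, 3], [2, 1, 0], [0, 5, 1], [4, 0, 0]]
patients = ["P1", "P1", "P1", "P2"]
cell_types = ["blastemal", "blastemal", "stroma", "blastemal"]
print(pseudobulk(counts, patients, cell_types))
# {('P1', 'blastemal'): [3, 1, 3], ('P1', 'stroma'): [0, 5, 1], ('P2', 'blastemal'): [4, 0, 0]}
```

Raw counts (not normalized values) are summed because edgeR-style methods model counts directly; treating patients rather than individual nuclei as replicates is what keeps the test from overstating significance.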
Existing modules
Input data
To ensure a uniform analysis across modules, we will start with the processed objects available on the ScPCA Portal (_processed.rds). From my understanding, these objects have already undergone empty droplet removal, low-quality cell filtering, and normalization, and should be ready to use.
Scientific literature
[1] Spatiotemporal immune zonation of the human kidney. Science.
[2] Single-cell RNA-seq differential expression tests within a sample should use pseudo-bulk data of pseudo-replicates. bioRxiv.
[3] cancerbits/DElegate: Wrapper and helper functions to use bulk RNA-seq differential expression methods with single-cell data. GitHub.
Other details
Suggestion
Correctly distinguishing normal fibroblasts from stromal cancer cells might not be as straightforward as described in the plan. Telling cancer cells from their healthy counterparts is generally not trivial (for all cancer entities), but it is particularly difficult for WT due to a lack of unequivocal markers. Since the ScPCA Portal also contains Visium data for WT, I would suggest using these data to identify, together with a pathologist, homogeneous areas of normal kidney epithelium, normal stroma, nephrogenic rests, cancer stroma, cancer epithelium, and cancer blastema, and to derive reliable signatures for each cell population from the Visium data. Unfortunately, the resolution of the H&E images available on the Portal is not sufficient for a pathologist to perform such an annotation. Might higher-resolution images be available for a few slides?
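If region-level signatures are derived from pathologist-annotated Visium spots, one simple way to carry them over to the snRNA-seq nuclei is to score each nucleus by the mean z-scored expression of the signature genes. This is a simplified stand-in for approaches such as Seurat's AddModuleScore (which additionally subtracts the score of expression-matched control genes); the gene list and toy matrix below are hypothetical.

```python
import statistics


def signature_scores(expression, genes, signature):
    """Mean z-scored expression of signature genes for each cell.

    expression: one list of normalized values per cell, ordered like `genes`.
    """
    gene_idx = [genes.index(g) for g in signature if g in genes]
    if not gene_idx:
        raise ValueError("no signature genes found in the data")
    z = {}
    for g in gene_idx:
        column = [cell[g] for cell in expression]
        mu = statistics.fmean(column)
        sd = statistics.pstdev(column)
        # Constant genes carry no information, so they contribute 0.
        z[g] = [(v - mu) / sd if sd > 0 else 0.0 for v in column]
    return [statistics.fmean(z[g][i] for g in gene_idx)
            for i in range(len(expression))]


genes = ["COL1A1", "COL3A1", "SIX2"]
expression = [[5, 4, 0], [0, 1, 3], [1, 0, 4]]
stroma_signature = ["COL1A1", "COL3A1"]  # hypothetical Visium-derived signature
scores = signature_scores(expression, genes, stroma_signature)
print(scores.index(max(scores)))
# 0
```

Scores computed this way are relative to the cells in the object, so they are best compared within a sample or after integration rather than as absolute thresholds.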