Annotate cell types in Ewing sarcoma samples on the Portal #292
allyhawkins started this conversation in Propose a new analysis
Proposed analysis
This analysis aims to annotate cell types for the existing Ewing sarcoma samples on the Portal. These can be found in SCPCP000015.
Specifically, we plan to add three levels of annotations:
In addition to providing human-readable names for cell types, we will also provide cell ontology identifiers where possible.
As part of this analysis, we will create a reference of marker genes associated with Ewing sarcoma tumor cells and cell states. We will also create a reference dataset containing well-annotated Ewing sarcoma cells. These references can then be used to annotate other samples from Ewing sarcoma tumors.
Scientific goals
The goal of this analysis is to provide validated cell type annotations that can be used for downstream analysis of the Ewing sarcoma samples. The analysis will accomplish the following goals:
By providing standardized cell labels across all samples in the project, we can then perform joint analysis on all samples such as differential expression or gene set enrichment.
Additionally, these annotations can be added to the existing objects available on the ScPCA Portal. This will give users the validated annotations without having to perform their own cell type annotations.
Methods or approach
As part of this analysis, we would like to annotate all Ewing sarcoma samples and provide a reference that can be used by the community. To do this, we will perform the following steps on a handful of samples (2-3) to first create a well-annotated reference dataset for Ewing sarcoma. Then we can use this as a reference to annotate cells from all other samples using a reference-based approach like SingleR.
Step 1
Before starting to identify any cell types or cell states, it will be helpful to curate a list of marker genes that are associated with Ewing sarcoma tumor cells or specific cell states. This may be helpful in assigning cell types or can be used to validate assigned cell types and cell states. To do this, we should comb through the literature and identify markers that can be used to identify:
Step 2
Next, we will need to identify which cells in each sample are tumor cells or normal cells.
There are multiple ways to do this:
Use CopyKat to predict which cells are tumor or normal using copy number aberrations.
Although using CopyKat may be more straightforward, the quiet genome typical of Ewing sarcoma tumors may make this difficult. Additionally, a number of genes have been shown to act as markers for Ewing sarcoma tumor cells, such as CD99 and NKX2.2.
As part of Step 1, we should have a list of marker genes we expect to be expressed in Ewing sarcoma tumor cells. We can use that list with a marker gene-based cell type annotation method, like CellAssign or scType, to classify cells as either tumor or normal.
One option is to do this in conjunction with CopyKat and look for cells where we see agreement between the methods. Another way we could validate this is by performing unsupervised clustering, where we expect tumor cells and normal cells to separate.
Step 3
Next, we want to classify the types of normal cells that are present. Here, we can pull out any cells that are classified as normal cells and use a reference-based method, such as SingleR, to classify those cells.
For the reference, we can use a publicly available reference from celldex containing both immune cells and other non-immune stromal cells (fibroblasts, endothelial cells).
The benefit of using celldex is that its reference datasets include cell ontology terms.
We will then need to validate these findings, which we can do by first curating a list of known markers for the normal cell types in our dataset. Then we can plot the expression of those known markers across the cell types. We expect each marker gene to have its highest expression in the cells assigned to the corresponding cell type. One thing to note is that in some cases a specific cell type is indicated by correlated expression of multiple marker genes rather than by a single gene.
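The marker-based validation described above could be sketched roughly as follows. This is only an illustration in plain Python, not part of the planned R workflow: the function names (`mean_expression_by_label`, `check_markers`), the expression values, and the choice of one marker per cell type are all assumptions made for the example. It averages each marker's expression within every assigned cell type and checks whether the marker peaks in the cell type it is supposed to identify.

```python
from collections import defaultdict
from statistics import mean


def mean_expression_by_label(expression, labels):
    """Average one gene's expression within each assigned cell type."""
    grouped = defaultdict(list)
    for value, label in zip(expression, labels):
        grouped[label].append(value)
    return {label: mean(values) for label, values in grouped.items()}


def check_markers(expr_by_gene, labels, expected_markers):
    """Report, per cell type, whether its marker peaks in that cell type."""
    results = {}
    for cell_type, gene in expected_markers.items():
        means = mean_expression_by_label(expr_by_gene[gene], labels)
        top_type = max(means, key=means.get)
        results[cell_type] = top_type == cell_type
    return results


# Toy data (made-up values): one assigned label per cell,
# one normalized expression value per cell for each gene.
labels = ["T cell", "T cell", "endothelial cell", "endothelial cell", "fibroblast"]
expr_by_gene = {
    "CD3E": [2.5, 3.1, 0.0, 0.2, 0.1],
    "PECAM1": [0.1, 0.0, 2.8, 3.0, 0.3],
    "COL1A1": [0.0, 0.2, 0.4, 0.1, 3.5],
}
# Illustrative marker choices; the real list would come from curation.
expected_markers = {
    "T cell": "CD3E",
    "endothelial cell": "PECAM1",
    "fibroblast": "COL1A1",
}

validation = check_markers(expr_by_gene, labels, expected_markers)
print(validation)
```

A cell type that comes back `False` would be a candidate for closer inspection, for example by plotting the marker across all cell types. Cell types defined by several co-expressed markers would need a combined check rather than this single-gene one.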
Step 4
The last thing we will want to do is identify any tumor cells that can be further classified into known cell states. Ewing sarcoma tumors are characterized by the presence of the EWS-FLI1 fusion gene. However, the literature has shown that tumor cells exist along a mesenchymal trajectory and vary in their levels of EWS-FLI1 expression and expression of EWS-FLI1 targets. EWS-low cells mimic a mesenchymal stem cell-like state and display hallmarks of the epithelial-to-mesenchymal transition (EMT), while EWS-high cells have a more proliferative phenotype.
Multiple publications have highlighted key marker genes to distinguish these cell states and we can use these marker genes to further stratify tumor cells based on EWS-FLI1 states.
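As a rough illustration of stratifying tumor cells with marker genes, the sketch below scores each cell by the difference between its mean expression of EWS-high markers and EWS-low markers. This is a simple stand-in, not how CellAssign or scType work. The gene names (`HIGH_A`, `LOW_A`, and so on), the expression values, and the `margin` threshold are hypothetical placeholders; the real marker lists would come from the literature curation in Step 1.

```python
from statistics import mean


def state_score(cell_expr, high_markers, low_markers):
    """Mean expression of EWS-high markers minus mean of EWS-low markers."""
    high = mean(cell_expr.get(gene, 0.0) for gene in high_markers)
    low = mean(cell_expr.get(gene, 0.0) for gene in low_markers)
    return high - low


def classify_state(score, margin=0.5):
    """Assign a state label; scores within the margin are left intermediate."""
    if score > margin:
        return "EWS-high"
    if score < -margin:
        return "EWS-low"
    return "intermediate"


# Hypothetical placeholder marker sets (not real gene symbols).
high_markers = ["HIGH_A", "HIGH_B"]
low_markers = ["LOW_A", "LOW_B"]

# Toy normalized expression for three tumor cells (made-up values).
cells = {
    "cell_1": {"HIGH_A": 3.0, "HIGH_B": 2.0, "LOW_A": 0.5, "LOW_B": 0.5},
    "cell_2": {"HIGH_A": 0.2, "HIGH_B": 0.4, "LOW_A": 2.0, "LOW_B": 3.0},
    "cell_3": {"HIGH_A": 1.0, "HIGH_B": 1.2, "LOW_A": 1.0, "LOW_B": 1.0},
}

states = {
    name: classify_state(state_score(expr, high_markers, low_markers))
    for name, expr in cells.items()
}
print(states)
```

Keeping an explicit "intermediate" label reflects the idea that cells sit along a trajectory rather than in two discrete groups. The same scoring could be applied to averaged profiles of grouped cells instead of to single cells.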
One approach would be to first group all tumor cells using the metacell approach. Then we can use a marker gene-based approach to classify each of the metacells into cell states that have been identified in the literature. Grouping cells into smaller clusters, like metacells, decreases the size of the dataset and makes it easier to implement marker gene-based methods like CellAssign or scType.
Existing modules
No, this module is not related to any existing modules.
However, the output of this module may be useful for any future modules that perform downstream analysis of Ewing sarcoma samples.
Input data
This analysis will use the processed SingleCellExperiment objects for SCPCP000015. Depending on the exact methods and tools implemented, we may need to use the processed AnnData objects as well.
Additionally, we will obtain a reference dataset from the celldex package to use for annotating normal cells.
Scientific literature
The following papers may be helpful in curating marker gene lists to use for this analysis:
Other details
Computational resources
Much of the analysis could be done on my local computer, but if we use programs like CellAssign we will need access to more computing resources. I understand that CellAssign prefers a GPU and can require a large amount of memory, but these requirements decrease with smaller numbers of marker genes. Here, we will use CellAssign in stages so that we are only working with small references, rather than trying to use it to assign all possible cell types at once. For this, I plan on using Lightsail for Research.
I also plan on using mostly R for this analysis, with the exception of CellAssign.
Timeline
This analysis will be done in stages, where each stage is, at minimum, a single pull request:
SingleR on all remaining Ewing sarcoma samples using our well-annotated reference dataset