Annotate cell types in Ewing sarcoma samples on the Portal #292

allyhawkins (Maintainer) · 2024-03-27T14:39:28Z

Proposed analysis

This analysis aims to annotate cell types for the existing Ewing sarcoma samples on the Portal. These can be found in SCPCP000015.

Specifically, we plan to add three levels of annotations:

The first will annotate each cell as either a tumor cell or a normal cell.
The second will annotate the normal cells, assigning each a distinct cell type (e.g., fibroblasts, T cells).
The last will classify cell states found in the tumor cells (e.g., EWS-high and EWS-low cell states).

In addition to providing human-readable names for cell types, we will also provide cell ontology identifiers where possible.
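
As a rough illustration of how the three annotation levels and an optional ontology identifier could sit together for a single cell, here is a minimal sketch. The class name, field names, and validation rules are assumptions made for illustration and are not part of the proposal or of any existing ScPCA schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CellAnnotation:
    """Hypothetical per-cell record holding the three annotation levels."""

    barcode: str
    tumor_status: str                        # level 1: "tumor" or "normal"
    normal_cell_type: Optional[str] = None   # level 2: only set for normal cells
    tumor_cell_state: Optional[str] = None   # level 3: only set for tumor cells
    ontology_id: Optional[str] = None        # Cell Ontology ID, where one exists

    def __post_init__(self):
        # Reject combinations that mix the tumor and normal branches.
        if self.tumor_status not in ("tumor", "normal"):
            raise ValueError("tumor_status must be 'tumor' or 'normal'")
        if self.tumor_status == "tumor" and self.normal_cell_type is not None:
            raise ValueError("tumor cells cannot carry a normal cell type")
        if self.tumor_status == "normal" and self.tumor_cell_state is not None:
            raise ValueError("normal cells cannot carry a tumor cell state")

    def label(self) -> str:
        """Return the most specific human-readable label available."""
        if self.tumor_status == "tumor":
            return self.tumor_cell_state or "tumor"
        return self.normal_cell_type or "normal"


# Example records; the barcodes are made up.
fibroblast = CellAnnotation(
    "cell_1", "normal", normal_cell_type="fibroblast", ontology_id="CL:0000057"
)
ews_high = CellAnnotation("cell_2", "tumor", tumor_cell_state="EWS-high")
```

Leaving `ontology_id` optional reflects the "where possible" wording: tumor cell states may not have a matching Cell Ontology term.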

As part of this analysis, we will create a reference of marker genes associated with Ewing sarcoma tumor cells and cell states. We will also create a reference dataset containing well-annotated Ewing sarcoma cells. Both references can then be used to annotate other samples from Ewing sarcoma tumors.

Scientific goals

The goal of this analysis is to provide validated cell type annotations that can be used for downstream analysis of the Ewing sarcoma samples. The analysis will accomplish the following goals:

Provide annotations of normal cells found in the Ewing sarcoma samples
Provide annotations of tumor cell states that may be present in the Ewing sarcoma samples
Include cell ontology identifiers for all annotations where possible
Create a reference of marker genes for Ewing sarcoma tumor cells and cell states

Standardized cell labels across all samples in the project will make it possible to perform joint analyses on all samples, such as differential expression or gene set enrichment.

Additionally, these annotations can be added to the existing objects available on the ScPCA Portal. This will give users validated annotations so that they do not have to perform their own cell type annotation.

Methods or approach

As part of this analysis, we would like to annotate all Ewing sarcoma samples and provide a reference that can be used by the community. To do this, we will first perform the following steps on a handful of samples (2-3) to create a well-annotated reference dataset for Ewing sarcoma. We can then use this dataset to annotate cells from all other samples with a reference-based method like SingleR.
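
To make the reference-based idea concrete, the sketch below assigns a query cell the label of the reference profile it correlates with best, using Spearman correlation. This is a deliberately simplified stand-in and not SingleR itself, which does considerably more (for example marker selection and fine-tuning). All function names and the toy values are assumptions for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = (
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    ) ** 0.5
    return numerator / denominator if denominator else 0.0


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def transfer_label(cell, reference_profiles):
    """Pick the reference label whose profile correlates best with the cell.

    `cell` and every reference profile must list the same genes in the
    same order.
    """
    scores = {
        label: spearman(cell, profile)
        for label, profile in reference_profiles.items()
    }
    return max(scores, key=scores.get)


# Toy reference with four genes; the values are made up.
reference = {
    "tumor": [5.0, 4.0, 0.0, 0.5],
    "T cell": [0.0, 0.5, 5.0, 4.0],
}
predicted = transfer_label([4.0, 3.0, 0.2, 0.1], reference)
```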

Step 1

Before identifying any cell types or cell states, it will be helpful to curate a list of marker genes that are associated with Ewing sarcoma tumor cells or specific cell states. This list can be used to assign cell types or to validate assigned cell types and cell states. To build it, we should comb through the literature for markers that identify:

Ewing sarcoma tumor cells (e.g., NKX2.2 and CD99/MIC2)
Cell states (e.g., EWS-low/high)
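
One possible shape for the curated marker list is a small table with one row per gene, sketched below. The column names and helper functions are assumptions for illustration. Only the two tumor markers come from the text above (NKX2.2 is listed under its official symbol NKX2-2), and the `source` column is left empty until citations are curated.

```python
import csv
import io

# Hypothetical column layout for the marker gene reference.
FIELDS = ["gene_symbol", "alias", "marks", "source"]

MARKERS = [
    {"gene_symbol": "NKX2-2", "alias": "NKX2.2", "marks": "tumor", "source": ""},
    {"gene_symbol": "CD99", "alias": "MIC2", "marks": "tumor", "source": ""},
]


def markers_for(table, marks):
    """Return the sorted gene symbols annotated as marking `marks`."""
    return sorted(row["gene_symbol"] for row in table if row["marks"] == marks)


def to_tsv(table):
    """Serialize the marker table as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()
```

Rows for cell states such as EWS-high and EWS-low would be added to the same table once the literature search identifies them.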

Step 2

Next, we will need to identify which cells in each sample are tumor cells or normal cells.
There are multiple ways to do this:

Use a program such as CopyKat to predict which cells are tumor or normal using copy number aberrations.
Use the list of marker genes mentioned in step 1 to identify tumor cells.

Although using CopyKat may be more straightforward, it relies on copy number aberrations, and the quiet genome typical of Ewing sarcoma tumors may make those difficult to detect. Additionally, a number of genes have been shown to act as markers for Ewing sarcoma tumor cells, such as CD99 and NKX2.2.
From Step 1, we should have a list of marker genes we expect to be expressed in Ewing sarcoma tumor cells. We can use that list with a marker gene-based cell type annotation method, like CellAssign or scType, to classify cells as either tumor or normal.

One option is to do this in conjunction with CopyKat and look for cells where we see agreement between the methods. Another way we could validate this is by performing unsupervised clustering, where we expect tumor cells and normal cells to separate.
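
A minimal sketch of the agreement check described above, assuming both methods' outputs have already been converted to per-cell "tumor" or "normal" labels keyed by barcode. The function names and the "ambiguous" label are assumptions for illustration.

```python
def consensus_calls(cnv_calls, marker_calls):
    """Keep a label only where both methods agree.

    Both inputs map barcode -> "tumor" or "normal". Cells called by only
    one method are dropped; disagreements are labeled "ambiguous".
    """
    consensus = {}
    for barcode in cnv_calls.keys() & marker_calls.keys():
        cnv_label = cnv_calls[barcode]
        marker_label = marker_calls[barcode]
        consensus[barcode] = cnv_label if cnv_label == marker_label else "ambiguous"
    return consensus


def agreement_rate(cnv_calls, marker_calls):
    """Fraction of shared cells on which the two methods agree."""
    shared = cnv_calls.keys() & marker_calls.keys()
    if not shared:
        return 0.0
    agreeing = sum(cnv_calls[b] == marker_calls[b] for b in shared)
    return agreeing / len(shared)


# Toy calls; the barcodes and labels are made up.
cnv = {"c1": "tumor", "c2": "normal", "c3": "tumor"}
marker = {"c1": "tumor", "c2": "tumor", "c3": "tumor", "c4": "normal"}
calls = consensus_calls(cnv, marker)
```

A low agreement rate on a sample would be a signal to inspect that sample more closely before trusting either set of calls.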

Step 3

Next, we want to classify the types of normal cells that are present. Here, we can pull out any cells classified as normal and annotate them with a reference-based method, such as SingleR.

For the reference, we can use a publicly available reference from celldex containing both immune cells and other non-immune stromal cells (fibroblasts, endothelial cells).
The benefit of using celldex is the presence of cell ontology terms included in the reference datasets.

We will then need to validate these findings, which we can do by first curating a list of known markers for the normal cell types in our dataset. Then we can plot the expression of those known markers across the cell types. We expect each marker gene to show its highest expression in the cells assigned to the cell type it marks. Note that in some cases we expect correlated expression of multiple marker genes to indicate a specific cell type.
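
The validation check for a single marker gene can be sketched as follows, assuming expression values and assigned cell types are both available as per-barcode dictionaries. The function names and toy values are assumptions for illustration; in practice this comparison would more likely be made visually from the plots.

```python
from collections import defaultdict
from statistics import mean


def mean_expression_by_type(expression, cell_types):
    """Average one gene's expression within each assigned cell type.

    `expression` maps barcode -> expression value for a single gene, and
    `cell_types` maps barcode -> assigned cell type. Every barcode in
    `expression` must also be present in `cell_types`.
    """
    grouped = defaultdict(list)
    for barcode, value in expression.items():
        grouped[cell_types[barcode]].append(value)
    return {cell_type: mean(values) for cell_type, values in grouped.items()}


def marker_supports_type(expression, cell_types, expected_type):
    """True if the expected cell type has the highest mean expression."""
    means = mean_expression_by_type(expression, cell_types)
    return max(means, key=means.get) == expected_type


# Toy data for one hypothetical T cell marker; the values are made up.
assigned = {"c1": "T cell", "c2": "T cell", "c3": "fibroblast", "c4": "fibroblast"}
marker_expression = {"c1": 5.0, "c2": 4.0, "c3": 0.5, "c4": 0.0}
supported = marker_supports_type(marker_expression, assigned, "T cell")
```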

Step 4

The last thing we will want to do is identify any tumor cells that can be further classified into known cell states. Ewing sarcoma tumors are characterized by the presence of the EWS-FLI1 fusion gene. However, the literature has shown that tumor cells exist along a mesenchymal trajectory and differ in their levels of EWS-FLI1 expression and in the expression of EWS-FLI1 targets. EWS-low cells mimic a mesenchymal stem cell-like state and display hallmarks of the epithelial-to-mesenchymal transition (EMT), while EWS-high cells have a more proliferative phenotype.

Multiple publications have highlighted key marker genes that distinguish these cell states, and we can use these marker genes to further stratify tumor cells by EWS-FLI1 state.

One approach would be to first group all tumor cells using the metacell approach. Then we can use a marker gene-based approach to classify each of the metacells into cell states that have been identified in the literature. Grouping cells into smaller clusters, like metacells, decreases the size of the dataset and makes it easier to implement marker gene-based methods like CellAssign or scType.
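
To illustrate classifying metacells with marker genes, the sketch below scores each metacell by the mean expression of each state's signature genes and picks the highest-scoring state. This simple scoring is a stand-in and is not how CellAssign or scType work internally. The gene names are placeholders, not real markers, and the function names are assumptions.

```python
from statistics import mean


def signature_score(profile, genes):
    """Mean expression of the signature genes present in the profile."""
    values = [profile[gene] for gene in genes if gene in profile]
    return mean(values) if values else 0.0


def classify_metacells(metacell_profiles, signatures):
    """Assign each metacell the state whose signature scores highest.

    `metacell_profiles` maps metacell ID -> {gene: mean expression} and
    `signatures` maps state name -> list of marker genes.
    """
    calls = {}
    for metacell_id, profile in metacell_profiles.items():
        scores = {
            state: signature_score(profile, genes)
            for state, genes in signatures.items()
        }
        calls[metacell_id] = max(scores, key=scores.get)
    return calls


# Placeholder signatures; real genes would come from the Step 1 marker list.
signatures = {
    "EWS-high": ["HIGH_1", "HIGH_2"],
    "EWS-low": ["LOW_1", "LOW_2"],
}
profiles = {
    "m1": {"HIGH_1": 3.0, "HIGH_2": 2.0, "LOW_1": 0.5, "LOW_2": 0.1},
    "m2": {"HIGH_1": 0.2, "HIGH_2": 0.1, "LOW_1": 2.5, "LOW_2": 3.0},
}
states = classify_metacells(profiles, signatures)
```

Because the input is one profile per metacell rather than one per cell, the number of profiles to classify is much smaller than the number of cells.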

Existing modules

No, this module is not related to any existing modules.
However, the output of this module may be useful for any future modules that perform downstream analysis of Ewing sarcoma samples.

Input data

This analysis will use the processed SingleCellExperiment objects for SCPCP000015. Depending on the exact methods and tools implemented, we may need to use the processed AnnData objects as well.

Additionally, we will obtain a reference dataset from the celldex package to use for annotating normal cells.

Scientific literature

The following papers may be helpful in curating marker gene lists to use for this analysis:

Other details

Computational resources

Much of the analysis could be done on my local computer, but programs like CellAssign will require access to more computing resources. I understand that CellAssign prefers a GPU and can require a large amount of memory, but these requirements decrease with smaller numbers of marker genes. Here, we will use CellAssign in stages so that we only work with small references at a time, rather than trying to assign all possible cell types at once. For this, I plan on using Lightsail for Research.

I also plan on using mostly R for this analysis, with the exception of CellAssign.

Timeline

This analysis will be done in stages, where each stage is, at minimum, a single pull request:

Stage 1: Curate lists of marker genes from the literature
Stage 2: Classify cells as tumor vs. normal in a single sample
Stage 3: Classify normal cells in a single sample
Stage 4: Classify tumor cell states in a single sample
Stage 5: Apply this analysis to 2-3 samples to create a reference dataset
Stage 6: Create the code to run SingleR on all remaining Ewing sarcoma samples using our well-annotated reference dataset
