Gene Lists from cell type clusters #760

dvenprasad · 2024-06-11T20:25:06Z

Context

During eval, users looking at the SingleR and CellAssign cell type predictions were confused because the two sets of predictions differed so much. They were trying to figure out which one they could trust. Some even said they would just run it with their own cell typing method.

Problem or idea

They said they would like to see the top 10/15 genes for each of the cell type clusters on the UMAP, so they can validate the calls made by the cell typing methods.

Solution or next step

Tagging @allyhawkins / @jashapiro for feasibility and a go/no-go decision.

allyhawkins · 2024-06-14T14:33:53Z

> They said they would like to see top 10/15 genes for each of the cell type cluster on the UMAP, so they can validate the calls made by the cell typing methods.

I'm not quite sure how much of a priority this should be. Part of the reason for including both methods in the report is to encourage users to validate the methods rather than blindly trusting the annotations we provide. The methods we are using are far from perfect at this point.

Also, I'm not sure which top genes they are referring to. Is it the top genes used to identify each of the cell types in the first place? For CellAssign this would mean plotting the marker genes used, which for some cell types is hundreds of genes. For SingleR this would mean pulling out the gene lists from the SingleR reference object. Again, the number of genes could be in the hundreds.

Alternatively, it could mean taking all of the cells assigned to a specific cell type, performing marker gene analysis to identify the genes that distinguish those cells from all other cells in the dataset, and then plotting the top genes from that analysis. If you have knowledge about the cell types, you could then check whether the genes that show up match the marker genes you expect for that cell type. I think this is much more reasonable and feasible to implement; however, it still relies on the user to know which genes are expected and to explore the data to validate cell types on their own. I think it's something we could add, but I also don't think it's a priority.
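
For illustration only, here is a minimal sketch of that second option: for each annotated cell type, compare its cells against all other cells and keep the top genes. It is not how the pipeline or report does anything today. The function name `top_marker_genes`, the toy matrix, and the ranking statistic (a plain difference in mean log-expression rather than a proper statistical test) are all assumptions made to keep the example small.

```python
import numpy as np


def top_marker_genes(log_expr, labels, gene_names, n_top=10):
    """Rank genes per cell type by a one-vs-rest difference in mean log-expression.

    Hypothetical helper, not part of any existing package.
    log_expr:   (n_cells, n_genes) log-normalized expression values
    labels:     cell type annotation for each cell (length n_cells)
    gene_names: gene identifier for each column (length n_genes)
    """
    log_expr = np.asarray(log_expr, dtype=float)
    labels = np.asarray(labels)
    gene_names = np.asarray(gene_names)

    markers = {}
    for cell_type in np.unique(labels):
        in_group = labels == cell_type
        if in_group.all():
            continue  # no other cells to compare against
        # Mean expression in this cell type minus mean expression in all other cells
        diff = log_expr[in_group].mean(axis=0) - log_expr[~in_group].mean(axis=0)
        # Largest positive differences first
        order = np.argsort(-diff, kind="stable")[:n_top]
        markers[str(cell_type)] = [(str(gene_names[i]), float(diff[i])) for i in order]
    return markers


# Toy data (made up): 6 cells x 4 genes, two annotated cell types
toy_expr = [
    [5, 0, 1, 1],
    [4, 0, 1, 1],
    [6, 0, 1, 1],
    [0, 5, 1, 1],
    [0, 4, 1, 1],
    [0, 6, 1, 1],
]
toy_labels = ["T cell"] * 3 + ["B cell"] * 3
toy_genes = ["CD3E", "CD79A", "ACTB", "GAPDH"]

toy_markers = top_marker_genes(toy_expr, toy_labels, toy_genes, n_top=1)
for cell_type, genes in toy_markers.items():
    print(cell_type, genes)
```

Even with a list like this, the user still has to know which genes to expect for each cell type, so it supports validation rather than replacing it.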

allyhawkins · 2024-06-17T18:33:15Z

@dvenprasad The science team discussed this and we agree that this will require more analysis and exploration and doesn't quite make sense at this point. So for now, we will hold off on implementing this.

dvenprasad · 2024-06-18T14:23:09Z

Yes, that sounds reasonable to me. These questions came up with the more computationally savvy folks and they also have the skills to extract the genes themselves, so this isn't "hindering" anyone.

To clarify your question:

> Also, I'm not sure which top genes they are referring to. Is it the top genes used to identify each of the cell types in the first place?

This came up when they were looking at the cell types on the UMAP. They want to know what the top genes are for the data points in each of the cell types colored on the UMAP. So yes, I think we are talking about the same thing.

dvenprasad added the "enhancement" and "future thoughts" labels on Jun 11, 2024
