Gene Lists from cell type clusters #760

Open
dvenprasad opened this issue Jun 11, 2024 · 3 comments
Labels: enhancement (New feature or request), future thoughts


@dvenprasad
Member

Context

During evaluation, users looking at the SingleR and CellAssign cell type predictions were confused because the predictions differed so much. They were trying to figure out which one they could trust. Some even said they would just run their own cell typing method.

Problem or idea

They said they would like to see the top 10 to 15 genes for each of the cell type clusters on the UMAP, so they can validate the calls made by the cell typing methods.

Solution or next step

Tagging @allyhawkins / @jashapiro for a feasibility assessment and a go/no-go decision.

@dvenprasad added the enhancement (New feature or request) and future thoughts labels on Jun 11, 2024
@allyhawkins
Member

> They said they would like to see the top 10 to 15 genes for each of the cell type clusters on the UMAP, so they can validate the calls made by the cell typing methods.

I'm not quite sure how much of a priority this should be. Part of the reason we include both methods and the report is to encourage users to validate the annotations rather than blindly trust the ones we provide. The methods we are using are far from perfect at this point.

Also, I'm not sure which top genes they are referring to. Is it the top genes used to identify each of the cell types in the first place? For CellAssign, this would mean plotting the marker genes used, which for some cell types number in the hundreds. For SingleR, this would mean pulling the gene lists out of the SingleR reference object; again, there could be hundreds of genes.

Alternatively, it could mean taking all of the cells assigned to a specific cell type, performing marker gene analysis to identify the genes that distinguish those cells from all other cells in the dataset, and then plotting the top genes from that analysis. If you know the cell types, you could then check whether those genes match the marker genes you expect for that cell type. I think this is much more reasonable and feasible to implement; however, it still relies on users knowing which genes to expect and exploring the data to validate cell types on their own. I think it's something we could add, but I also don't think it's a priority.
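For illustration only, here is a minimal sketch of the one-vs-rest idea described above, assuming a log-normalized expression matrix stored as a list of per-cell rows (cells × genes) plus one cell type label per cell. The function name `top_marker_genes` is hypothetical, and it ranks genes only by the difference in mean expression between the cell type and all other cells, with no statistical test. Dedicated tools such as scran's `findMarkers` or scanpy's `rank_genes_groups` do this more rigorously, and this is not necessarily how it would be implemented here.

```python
from statistics import mean


def top_marker_genes(expr, genes, labels, cell_type, n_top=10):
    """Hypothetical helper: rank genes for one cell type vs. all other cells.

    expr   -- list of per-cell rows of log-normalized expression (cells x genes)
    genes  -- gene names, one per column of expr
    labels -- cell type label for each row of expr
    Ranks by mean(in group) - mean(rest) only; no p-values or test statistic.
    """
    in_group = [row for row, lab in zip(expr, labels) if lab == cell_type]
    rest = [row for row, lab in zip(expr, labels) if lab != cell_type]
    if not in_group or not rest:
        raise ValueError("need cells both inside and outside the cell type")

    scores = []
    for j, gene in enumerate(genes):
        # Difference in mean expression for gene j between the two groups
        diff = mean(row[j] for row in in_group) - mean(row[j] for row in rest)
        scores.append((diff, gene))

    # Highest difference first; break ties by gene name for stable output
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [gene for _, gene in scores[:n_top]]


if __name__ == "__main__":
    # Toy data (made up for illustration): 4 cells x 3 genes
    genes = ["CD3E", "MS4A1", "LYZ"]
    expr = [
        [3.0, 0.0, 0.1],
        [2.5, 0.2, 0.0],
        [0.1, 2.8, 0.3],
        [0.0, 0.1, 3.2],
    ]
    labels = ["T cell", "T cell", "B cell", "monocyte"]
    for ct in sorted(set(labels)):
        print(ct, top_marker_genes(expr, genes, labels, ct, n_top=2))
```

A user who knows the biology would then compare each list against the markers they expect for that cell type; the sketch only surfaces candidate genes and does not validate the annotation by itself.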

@allyhawkins
Member

@dvenprasad The science team discussed this and we agree that this will require more analysis and exploration and doesn't quite make sense at this point. So for now, we will hold off on implementing this.

@dvenprasad
Member Author

Yes, that sounds reasonable to me. These questions came up with the more computationally savvy folks and they also have the skills to extract the genes themselves, so this isn't "hindering" anyone.

To answer your question:

> Also, I'm not sure which top genes they are referring to. Is it the top genes used to identify each of the cell types in the first place?

This came up when they were looking at the cell types on the UMAP. They want to know which genes are the top genes for the cells (data points) in each cell type colored on the UMAP. So yes, I think we are talking about the same thing.
