-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SingleR delta median plot to QC report #432
Add SingleR delta median plot to QC report #432
Conversation
…plifies downstream code which often needs those levels
Thank you for working on this and apologies if this is really annoying, but... I do wonder if we should be doing something similar to what I did with the ridge plot for CellAssign. We would plot just the score and label by the top score and then everything else and look at the separation. I don't know if it would work quite as well because it's a score and not a probability, but I think it's worth a shot. |
Not annoying at all! It's been a fun week spending lots of time plotting :) Let's see how it looks.. |
My interpretation of this comment was to plot the scores themselves, not the median delta values. Perhaps I got that wrong, but if we're going to plot the median delta, it's helpful to use a completely different style of plot IMO so folks know they're looking at something quite different. |
Ah no, I think you are right! Let's see.. |
I think we want to be mindful of overly-discussing strategies in this PR, mostly because as comments build up things will be become harder to track & review. So, I'm going to open an issue that we can use to discuss visualization strategies, and then we can come back here to continue the PR. Edit - issue for discussion opened in #434 |
As discussed in #434, I've updated this to still visualize |
@allyhawkins, I can't re-request review here since only comments were left before, so this is my re-request ping :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly looks good, I just had one clarifying question and a suggestion about adding a median point.
templates/qc_report/celltypes_qc.rmd
Outdated
new_levels <- levels(delta_median_df$celltype) | ||
new_levels <- new_levels[-length(new_levels)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused what you are doing here? Do you need both or can you just use the first line without the second line since Unknown cell type
shouldn't be here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's confusing! I realize it can be simplified too. I will add some comments. Here's what's happening:
- Although there is no longer an
"Unknown cell type"
value in the data, that level still exists in thedelta_median_df$celltype
variable - This doesn't matter for plotting though! One could proceed to just plot, and the x-axis order would be fine. But, it does matter if I want to wrap the labels, since cell type names are very long.
- So, this code was setting up to wrap the labels while also getting rid of the
Unknown
level. - Looking again with fresh eyes, we really don't need to get rid of the
Unknown
level though! So, I will simplify to this:
# add column with ordered levels with wrapped labels for visualization
delta_median_df$annotation_wrapped <- factor(
delta_median_df$celltype,
levels = levels(delta_median_df$celltype),
labels = stringr::str_wrap(levels(delta_median_df$celltype), 30)
)```
legend.title = element_text(size = rel(0.75)), | ||
legend.text = element_text(size = rel(0.75)), | ||
legend.position = "bottom" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to add a median point here too? I'm not sure what color though since red
is being used for the cells that were pruned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think blue would probably be fine for median. One question though is how this stat should deal with the current grouping. I feel like it would be best if the median only reflected the black points? Any thoughts?
Also, do you think it would be too busy to also make the red points a different shape, like a diamond or something? It might make them easier to spot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it would be best if the median only reflected the black points? Any thoughts?
This makes sense to me.
Also, do you think it would be too busy to also make the red points a different shape, like a diamond or something? It might make them easier to spot?
I don't think I would make them a different color and a different shape, that feels like it might be a lot. I might make the median a different shape or a line though.
I've updated the plot as discussed and simplified that factor code, so this is ready for another look! One important bit: In 601087b, I made some updates which could be reverted. This commit sets things up if we want to pass in the workflow seed to the QC report, for the sina plot layout. But for this to work, we'd need some small changes over in
|
You should be able to use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I will hold off on approving though until we had in the seed argument in scpcaTools.
) | ||
} | ||
|
||
prepare_annotation_values(cellassign_celltype_annotation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be assigned to a variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is! see line 79 (not part of diff) too :)
templates/qc_report/celltypes_qc.rmd
Outdated
delta_median_df <- tibble::tibble( | ||
delta_median = rowMaxs(singler_scores) - rowMedians(singler_scores), | ||
# Need to grab the non-pruned label for this plot | ||
ontology = metadata(processed_sce)$singler_result$labels, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the ontology id or the ontology id label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the labels that were actually assigned which are ontology ids. I needed to grab this vector since we don't want the pruned labels for this plot. But, then I need to make sure we don't actually use ontology ids in the plot, but the actually cell names.
All that said, I realize I need to tweak some things here to make sure this works if, for some reason, ontology ids weren't used for singler annotation..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some changes to this end in 6ba6351 (plus bonus forcats
cleanup code from @jashapiro)
Co-authored-by: Joshua Shapiro <[email protected]>
…ere NOT used for annotation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
Stacked on #427
Closes #410
This PR adds the delta median plot for
SingleR
. Implementation notes:celltypes_df
. This way, all downstream code that uses this data frame inherits these levels!qc_report.rmd
, but it might be preferable to use whatever seed is used for the overall workflow and pass that in as a parameter to the report? I'm not sure how much this really matters though for this situation.Report: qc_report.html.zip