Add SingleR delta median plot to QC report #432

sjspielman · 2023-08-31T19:25:40Z

Stacked on #427
Closes #410

This PR adds the delta median plot for SingleR. Implementation notes:

I refactored (bonus this is a pun 🎉) some of the earlier QC report code since we'll want to use the annotation factor order for this plot as well. I removed code that sets the factor order from the celltype tables, and instead added a helper function to accomplish this while setting up celltypes_df. This way, all downstream code that uses this data frame inherits these levels!
I made this overall section generally about assessing cell types, and placed it immediately after the cell type tables and before the section with UMAPs + heatmaps. This way, we get a sense of reliability before diving into plots.
I'm using a sina plot here, so I had to set a seed. For now, I just put a seed into qc_report.rmd, but it might be preferable to use whatever seed is used for the overall workflow and pass that in as a parameter to the report? I'm not sure how much this really matters though for this situation.
I wrapped the labels that are over 30 characters, and this seems to really help with plot layout.
Let me know any feedback about my description of delta median too!

allyhawkins · 2023-09-01T14:42:50Z

Thank you for working on this and apologies if this is really annoying, but... I do wonder if we should be doing something similar to what I did with the ridge plot for CellAssign. We would plot just the score and label by the top score and then everything else and look at the separation. I don't know if it would work quite as well because it's a score and not a probability, but I think it's worth a shot.

sjspielman · 2023-09-01T14:59:42Z

> Thank you for working on this and apologies if this is really annoying, but... I do wonder if we should be doing something similar to what I did with the AlexsLemonade/sc-data-integration#231.

Not annoying at all! It's been a fun week spending lots of time plotting :) Let's see how it looks..

sjspielman · 2023-09-01T18:31:03Z

Here's a super quick side-by-side (well, stacked) comparison of sina vs ridgeplot, just to get a sense:

On one hand, I do like the ridgeplot more, but on the other hand, I'm not entirely sure it's usable (this may go for CellAssign as well...!) - for any categories that have <= 2 cells, nothing gets drawn, and that's just how the algorithm works; >=3 points are needed to estimate the distribution.

I wonder if there's a good middle ground we could achieve here, since I really do like the ridgeplot more... Would it make sense to show both sina + ridgeplot, and/or only show cell types with >=3 cells for the ridgeplot (we'd add text explaining this)?
Very curious to hear your thoughts!
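
For reference, the ">=3 cells" option could look roughly like the sketch below. This is a base R illustration only, not code from this PR: the toy `cells_df` data frame, its column names, and the `min_cells` threshold variable are all assumptions made for the example.

```r
# Toy data: one row per cell (hypothetical columns, for illustration only)
cells_df <- data.frame(
  celltype = c(rep("T cell", 5), rep("B cell", 3), rep("monocyte", 2), "neuron"),
  score = c(0.31, 0.28, 0.35, 0.30, 0.27, 0.22, 0.25, 0.21, 0.15, 0.18, 0.09)
)

# Minimum number of cells needed before a density can be drawn for a group
min_cells <- 3

# Count cells per annotation and keep only the sufficiently large groups
celltype_counts <- table(cells_df$celltype)
keep_celltypes <- names(celltype_counts)[celltype_counts >= min_cells]

# Cells that would go into the ridgeplot
ridge_df <- cells_df[cells_df$celltype %in% keep_celltypes, ]

# Cell types that would need to be mentioned in the explanatory text
dropped_celltypes <- setdiff(names(celltype_counts), keep_celltypes)
```

Keeping `dropped_celltypes` around makes it straightforward to name the omitted cell types in the report text.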

Edit - also, I wonder if it makes sense to show unknown cell types in this plot? Is it meaningful to show "confidence" for something that was unclassified? I'm starting to think we should exclude those cells?

jaclyn-taroni · 2023-09-01T18:37:32Z

> I do wonder if we should be doing something similar to what I did with the ridge plot for CellAssign. We would plot just the score and label by the top score and then everything else and look at the separation. I don't know if it would work quite as well because it's a score and not a probability, but I think it's worth a shot.

My interpretation of this comment was to plot the scores themselves, not the median delta values. Perhaps I got that wrong, but if we're going to plot the median delta, it's helpful to use a completely different style of plot IMO so folks know they're looking at something quite different.

sjspielman · 2023-09-01T18:43:22Z

> My interpretation of this comment was to plot the scores themselves, not the median delta values.

Ah no, I think you are right! Let's see..

allyhawkins · 2023-09-01T18:55:18Z

> My interpretation of this comment was to plot the scores themselves, not the median delta values.
>
> Ah no, I think you are right! Let's see..

Yes Jackie is correct. I was thinking we would plot the actual scores themselves and then create a plot similar to the one below.

sjspielman · 2023-09-01T19:25:52Z

I think we want to be mindful of over-discussing strategies in this PR, mostly because as comments build up things will become harder to track & review. So, I'm going to open an issue that we can use to discuss visualization strategies, and then we can come back here to continue the PR.

Edit - issue for discussion opened in #434

sjspielman · 2023-09-06T14:05:41Z

As discussed in #434, I've updated this to still visualize delta_median, but highlighting points that were pruned out. I've updated text in the plot preamble to match what the plot currently shows. Note that this involved a decent bit of wrangling, since we need to plot based on the full labels, not the pruned labels, in order to color by whether a cell was pruned or not. The points are pretty small and possibly tricky to see, but I think this is inevitable when visualizing this many data points (or more!).

qc_report.html.zip

sjspielman · 2023-09-06T14:06:22Z

@allyhawkins, I can't re-request review here since only comments were left before, so this is my re-request ping :)

allyhawkins

This mostly looks good, I just had one clarifying question and a suggestion about adding a median point.

allyhawkins · 2023-09-06T17:21:54Z

templates/qc_report/celltypes_qc.rmd

+new_levels <- levels(delta_median_df$celltype)
+new_levels <- new_levels[-length(new_levels)]


I'm a little confused what you are doing here? Do you need both or can you just use the first line without the second line since Unknown cell type shouldn't be here?

Yeah, it's confusing! I realize it can be simplified too. I will add some comments. Here's what's happening:

Although there is no longer an "Unknown cell type" value in the data, that level still exists in the delta_median_df$celltype variable

This doesn't matter for plotting though! One could proceed to just plot, and the x-axis order would be fine. But, it does matter if I want to wrap the labels, since cell type names are very long.

So, this code was setting up to wrap the labels while also getting rid of the Unknown level.

Looking again with fresh eyes, we really don't need to get rid of the Unknown level though! So, I will simplify to this:

```r
# add column with ordered levels with wrapped labels for visualization
delta_median_df$annotation_wrapped <- factor(
  delta_median_df$celltype,
  levels = levels(delta_median_df$celltype),
  labels = stringr::str_wrap(levels(delta_median_df$celltype), 30)
)
```
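
To make the point about lingering levels concrete, here is a small self-contained sketch. It is an illustration under assumptions, not the report code: the toy `celltype` factor is made up, and base R's `strwrap()` stands in for `stringr::str_wrap()`, so line breaks may fall in slightly different places than in the real report.

```r
# Toy factor that still carries an "Unknown cell type" level with no observations
celltype <- factor(
  c("naive thymus-derived CD4-positive, alpha-beta T cell", "B cell", "B cell"),
  levels = c(
    "B cell",
    "naive thymus-derived CD4-positive, alpha-beta T cell",
    "Unknown cell type"
  )
)

# The unused level is still present even though no cell has that value
levels(celltype)

# Wrap a single label at roughly 30 characters
# (base R stand-in for stringr::str_wrap)
wrap_label <- function(x, width = 30) {
  paste(strwrap(x, width = width), collapse = "\n")
}
wrapped_levels <- vapply(
  levels(celltype), wrap_label, character(1), USE.NAMES = FALSE
)

# Relabel: the level order is preserved, only the displayed labels change
annotation_wrapped <- factor(
  celltype,
  levels = levels(celltype),
  labels = wrapped_levels
)
```

Because `levels` and `labels` are matched by position, the unused level can simply ride along; it never appears on the axis when no cells carry it.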

allyhawkins · 2023-09-06T17:23:59Z

templates/qc_report/celltypes_qc.rmd

+    legend.title = element_text(size = rel(0.75)),
+    legend.text = element_text(size = rel(0.75)),
+    legend.position = "bottom"
+  )


I think we want to add a median point here too? I'm not sure what color though since red is being used for the cells that were pruned.

I think blue would probably be fine for median. One question though is how this stat should deal with the current grouping. I feel like it would be best if the median only reflected the black points? Any thoughts?

Also, do you think it would be too busy to also make the red points a different shape, like a diamond or something? It might make them easier to spot?

> I feel like it would be best if the median only reflected the black points? Any thoughts?

This makes sense to me.

> Also, do you think it would be too busy to also make the red points a different shape, like a diamond or something? It might make them easier to spot?

I don't think I would make them a different color and a different shape, that feels like it might be a lot. I might make the median a different shape or a line though.
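
A minimal sketch of the "median only reflects the non-pruned points" idea, in base R. The toy `delta_df` data frame and its `pruned` column are assumptions for illustration; the actual report code may compute this summary differently (for example, inside the plotting layer).

```r
# Toy data: delta median per cell, plus whether the cell's label was pruned
delta_df <- data.frame(
  celltype = c(rep("B cell", 4), rep("T cell", 4)),
  delta_median = c(0.20, 0.30, 0.40, 0.02, 0.25, 0.35, 0.45, 0.01),
  pruned = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

# Keep only cells whose labels were retained after pruning
retained_df <- delta_df[!delta_df$pruned, ]

# One median per cell type, computed from the retained cells only
median_df <- aggregate(delta_median ~ celltype, data = retained_df, FUN = median)
```

Here `median_df` has one row per cell type and could back a separate overlay layer, so low-scoring pruned cells do not drag the summary down.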

sjspielman · 2023-09-06T18:52:10Z

I've updated the plot as discussed and simplified that factor code, so this is ready for another look!
qc_report.html.zip

One important bit: In 601087b, I made some updates which could be reverted. This commit sets things up if we want to pass in the workflow seed to the QC report, for the sina plot layout. But for this to work, we'd need some small changes over in scpcaTools::generate_qc_report().
If we want to take this route for the seed then, two ways forward:

make scpcaTools compatible, then merge this PR
revert that commit, hardcode the seed for now in this PR. Later, we could circle back with a new PR to set the seed from the workflow seed, after making scpcaTools compatible

jashapiro · 2023-09-06T19:19:13Z

> I've updated the plot as discussed and simplified that factor code, so this is ready for another look! qc_report.html.zip
>
> One important bit: In 601087b, I made some updates which could be reverted. This commit sets things up if we want to pass in the workflow seed to the QC report, for the sina plot layout. But for this to work, we'd need some small changes over in scpcaTools::generate_qc_report(). If we want to take this route for the seed then, two ways forward:
>
> make scpcaTools compatible, then merge this PR
>
> revert that commit, hardcode the seed for now in this PR. Later, we could circle back with a new PR to set the seed from the workflow seed, after making scpcaTools compatible

You should be able to use the extra_params argument to scpcaTools::generate_qc_report() to pass in the seed. It is there just so we don't need to update the function every time we make changes to the template!
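
As a rough illustration of the idea (not the scpcaTools implementation, whose internals are not shown in this thread), the sketch below merges a caller-supplied parameter list over template defaults and then uses the seed to make a random point layout reproducible. The names `default_params`, `report_params`, and `jitter_points()` are hypothetical, and `runif()` is only a stand-in for the random horizontal placement a sina plot performs.

```r
# Hypothetical defaults, standing in for params declared in a report template
default_params <- list(library = "example_library", seed = NULL)

# Hypothetical extra parameters supplied by the caller, e.g. the workflow seed
extra_params <- list(seed = 2023)

# Values in extra_params override defaults of the same name
report_params <- modifyList(default_params, extra_params)

# Seeded stand-in for the random horizontal spread of points in a sina plot
jitter_points <- function(n, seed) {
  set.seed(seed)
  runif(n, min = -0.4, max = 0.4)
}

# Two "renders" with the same seed give the same point layout
first_render <- jitter_points(5, report_params$seed)
second_render <- jitter_points(5, report_params$seed)
```

With a pass-through list like this, new template parameters do not require a change to the function signature, which is the benefit described above.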

allyhawkins

This looks good to me. I will hold off on approving though until we add in the seed argument in scpcaTools.

allyhawkins · 2023-09-06T19:16:59Z

templates/qc_report/celltypes_qc.rmd

-  )
-}
-
+    prepare_annotation_values(cellassign_celltype_annotation)


I think this needs to be assigned to a variable?

It is! see line 79 (not part of diff) too :)

allyhawkins · 2023-09-06T19:18:55Z

templates/qc_report/celltypes_qc.rmd

+delta_median_df <- tibble::tibble(
+  delta_median = rowMaxs(singler_scores) - rowMedians(singler_scores),
+  # Need to grab the non-pruned label for this plot
+  ontology = metadata(processed_sce)$singler_result$labels,


is this the ontology id or the ontology id label?

These are the labels that were actually assigned, which are ontology ids. I needed to grab this vector since we don't want the pruned labels for this plot. But then I need to make sure we don't actually use ontology ids in the plot, but the actual cell type names.

All that said, I realize I need to tweak some things here to make sure this works if, for some reason, ontology ids weren't used for singler annotation..

Made some changes to this end in 6ba6351 (plus bonus forcats cleanup code from @jashapiro)
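
For readers unfamiliar with the metric, here is a small base R sketch of the delta median calculation from the diff above: for each cell, the top score minus the median of all scores. The toy `singler_scores` matrix is made up, and `apply()` is used as a stand-in for the `rowMaxs()` / `rowMedians()` calls in the report code.

```r
# Toy score matrix: rows are cells, columns are reference labels
singler_scores <- matrix(
  c(0.50, 0.20, 0.10,
    0.30, 0.28, 0.29,
    0.10, 0.40, 0.20),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("cell1", "cell2", "cell3"),
    c("B cell", "T cell", "monocyte")
  )
)

# Delta median: best score minus the median score, computed per cell
delta_median <- apply(singler_scores, 1, max) - apply(singler_scores, 1, median)

# Label with the top score for each cell (i.e. before any pruning)
top_label <- colnames(singler_scores)[max.col(singler_scores, ties.method = "first")]
```

In this toy example, cell2 has a delta median near zero because its top score barely exceeds the others, which is the kind of low-confidence call the plot is meant to surface.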

allyhawkins

LGTM 🚀

sjspielman added 3 commits August 31, 2023 14:23

Update earlier code to make factors from the very beginning. This sim…

921090f

…plifies downstream code which often needs those levels

add seed, here for now

2759eee

add section for delta median plot

8a83b70

sjspielman requested a review from allyhawkins August 31, 2023 19:25

Base automatically changed from sjspielman/409-qc-celltypes-umaps to development September 1, 2023 12:38

Merge branch 'development' into sjspielman/410-qc-singler-median-delta

92f95e3

sjspielman mentioned this pull request Sep 1, 2023

Heatmap for comparing annotations #433

Merged

add a non-styled ridgeplot

6ed692b

sjspielman mentioned this pull request Sep 1, 2023

Discussion: visualizing SingleR scores in the QC report #434

Closed

sjspielman added 2 commits September 6, 2023 09:59

Update singler delta median plot to color by pruning, and update over…

ec33c95

…all plot paragraph

update text and turn off messages to hide the joining output

5af423e

fix extra words

08e6a07

sjspielman mentioned this pull request Sep 6, 2023

Add CellAssign ridgeplot #437

Merged

allyhawkins reviewed Sep 6, 2023

sjspielman added 3 commits September 6, 2023 14:06

Simplify levels code for label wrapping

972a253

Update plot: use blue pruned points and red crossbar overlay for medi…

440adc5

…an +iqr

Add code we would need if we want to pass in the workflow seed; this …

601087b

…requires scpcaTools changes

sjspielman requested a review from allyhawkins September 6, 2023 18:52

allyhawkins reviewed Sep 6, 2023

jashapiro reviewed Sep 6, 2023

bin/sce_qc_report.R (outdated; resolved)

Update bin/sce_qc_report.R

ba1d54d

Co-authored-by: Joshua Shapiro <[email protected]>

jashapiro reviewed Sep 6, 2023

templates/qc_report/celltypes_qc.rmd (outdated; resolved)

Add that new forcats code, and cover the condition where ontologies w…

6ba6351

…ere NOT used for annotation

sjspielman requested a review from allyhawkins September 6, 2023 19:46

allyhawkins approved these changes Sep 6, 2023

sjspielman merged commit 4efef62 into development Sep 6, 2023
3 checks passed

sjspielman deleted the sjspielman/410-qc-singler-median-delta branch September 6, 2023 21:26

sjspielman mentioned this pull request Sep 6, 2023

Add plot for SingleR delta median score to the cell type report #410

Closed
