(Where to) include visualisations of the standard names on the site #547

sadielbartholomew · 2024-09-27T21:41:14Z

At a Hackathon session of the recent CF Workshop 2024, I began updating the code I created in 2020 to produce some visualisations of the standard names: a plot of the total number of standard names over time and per version, plus word clouds which show sets of names added between specified versions (see #110).

This week, in between other work, I managed to finalise it to the point where I am happy that the outputs are good enough and the code is maintainable and flexible enough. I now have a branch here (standard-names-vis-cfworkshop24) on my fork with the final script to generate the plot and wordclouds, along with the generated outputs. If you follow the link, you can see the code and the generated plots, for reference.

The next steps are:

to get this code added to the repository, so it can be run when suitable e.g. after a new version of the table to update the totals plot and overall wordcloud, and add a new wordcloud to cover only the names added for the new version;
to get the visualisation images included on the site.

I am opening this separate Issue because it is clear there is consensus from #110, and from conversations in person at the workshops etc., that people would like these included on the website somewhere (then in turn also available for people to take and use elsewhere e.g. to promote the CF Conventions overall). But now we need to agree where the best place to put them is, hence this Issue.

Proposal

I think it helps to start off with a proposal, so here is my suggestion of what to do to get my branch, and some or all of the generated images (the word cloud code is quite flexible and can show the names added from any version to any older one), incorporated into the site:

Add the overall word cloud (see https://github.com/sadielbartholomew/cf-convention.github.io/blob/0ebe966ff92065b59c94e0f0f7b02fd032ef4b11/generated_vis_images/sn_wordcloud_versions1_to_current.png) showing all the names in the latest table to the 'Vocabularies' page, just above the 'Documents' heading. Perhaps we could also add it to the site homepage to add a little colour and media to what is otherwise a purely text-based page?
Add the totals plot (see https://github.com/sadielbartholomew/cf-convention.github.io/blob/0ebe966ff92065b59c94e0f0f7b02fd032ef4b11/generated_vis_images/sn_totals_plot.png) to the same page, under a new title of 'History' before or after the 'Discussion' sub-heading, with a few sentences to introduce it, summarising that the table has been under development for a few decades now, or something along those lines.
Create a new page, linked under that 'History' heading with a summarising sentence and after the totals plot, which shows a word cloud covering the new names added to a version for every version from 1 to the current, under a heading of the version name to cover every version in reverse order (so top heading 'Version 86', then 'Version 85' and so on).

And as for the code, I would keep it in its own self-contained directory vis, so that it can be separate from other parts of the repo. (I don't mind what we call it, but chose vis, which is descriptive but short and notably avoids the British vs. American English spelling decision for 'visualisation'.)
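As a rough illustration of the input to each per-version word cloud mentioned in the proposal, here is a minimal sketch of selecting the names added between two versions of the table and counting the words in them. It is not the script on the branch. The two XML snippets are tiny invented stand-ins for real tables, the function names are hypothetical, and splitting names on underscores to get words is an assumption about how the word clouds could be fed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented, heavily truncated stand-ins for two versions of the standard
# name table XML. They do NOT reflect the real contents of any version.
OLD_TABLE = """<standard_name_table>
  <entry id="air_temperature"/>
  <entry id="sea_water_salinity"/>
</standard_name_table>"""

NEW_TABLE = """<standard_name_table>
  <entry id="air_temperature"/>
  <entry id="sea_water_salinity"/>
  <entry id="mass_concentration_of_ozone_in_air"/>
  <entry id="mole_concentration_of_ozone_in_air"/>
</standard_name_table>"""


def standard_names(table_xml: str) -> set[str]:
    """Collect the 'id' attribute of every <entry> element in a table."""
    root = ET.fromstring(table_xml)
    return {entry.attrib["id"] for entry in root.iter("entry")}


def names_added(old_xml: str, new_xml: str) -> set[str]:
    """Names present in the newer table but absent from the older one."""
    return standard_names(new_xml) - standard_names(old_xml)


def word_counts(names) -> Counter:
    """Count the underscore-separated words across a collection of names."""
    return Counter(word for name in names for word in name.split("_"))


added = names_added(OLD_TABLE, NEW_TABLE)
counts = word_counts(added)
```

The resulting counts are the kind of word-to-frequency mapping a word cloud generator could take as input. Because the difference is computed between any two tables, the same helper covers both the single-version clouds and the overall one.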


JonathanGregory · 2024-09-30T12:46:39Z

Dear @sadielbartholomew

Thanks for your work on this and for this proposal. Some comments.

The word cloud and total plots both have to be quite large to read them clearly, and I'm concerned that they'd take up a disproportionate amount of space on the Vocabularies page. Perhaps you could put thumbnail versions of them on that page instead, enlarging to full-size when clicked? If they're thumbnails, they could go side-by-side on the same "line". I'd put them after the Discussion section, before Area Types.
I agree that artistry (in addition to the globe) would be welcome on the home page. If we have an image to illustrate vocabulary (again, a thumbnail would be better, I think) we ought also to have an image to illustrate the conventions. Then they could both go in the section of the home page with "Ask a question ... Read the conventions ...". What do you think?
On the totals plot, could you program "manual" additions to give the reasons for the two large steps, in 2009 and 2018? I think they might be chemical species and radioisotopes, but I'm not sure.
I would suggest that the images and the vis directory should all be kept in the new vocabularies repo. They can be presented on the home page nonetheless.

Best wishes

Jonathan

DocOtak · 2024-09-30T13:27:03Z

Technical question: the images are rather large data-wise; can the output be an SVG? And if it can be, does that reduce the file size from the 3+MB? For browsers, SVG is supported by everything. For downloads that people could use in presentations, a different vector format might be preferred (eps?).

This question is from ignorance and not in opposition to doing this in any way, and I want to learn new things: What am I meant to learn from a word cloud like this? Is it more "art" than "information"? In what contexts would I pick a word cloud over some other visualization, like a bar chart with a categorical axis?

sadielbartholomew · 2024-10-01T17:21:17Z

Thanks for the comments, Jonathan and Andrew. Let me respond to the various components in turn. Firstly, @JonathanGregory:

The word cloud and total plots both have to be quite large to read them clearly, and I'm concerned that they'd take up a disproportionate amount of space on the Vocabularies page. Perhaps you could put thumbnail versions of them on that page instead, enlarging to full-size when clicked? If they're thumbnails, they could go side-by-side on the same "line". I'd put them after the Discussion section, before Area Types.

I am happy to go with your thumbnail idea; it seems sensible for the reasons you outline. I should add that I can easily edit the font size on the plot and on the wordclouds (i.e. for the latter, set a minimum text size so that overall it is easier to read at a given size), which could be an alternative approach to keeping things readable without taking up too much space. However, the totals plot is quite detailed, so I don't think tweaking the font sizes is a good idea in that case. (I was already careful with my selections to keep the plot balanced and readable given a fairly large viewing size.)

So unless anyone objects, let's go with the thumbnails solution.

I agree that artistry (in addition to the globe) would be welcome on the home page. If we have an image to illustrate vocabulary (again, a thumbnail would be better, I think) we ought also to have an image to illustrate the conventions. Then they could both go in the section of the home page with "Ask a question ... Read the conventions ...". What do you think?

I agree, this would be optimal and provide good balance.

That said, it is not clear to me what a good image would be to represent the conventions as a whole - so we should try to ask around to get ideas. I am happy to open a separate Discussion Thread or Issue on this topic of 'cover/featured' images for the homepage, with the proposal of using the full-table standard name word cloud as the vocabularies one, and asking for ideas regarding the conventions one - how does that sound?

On the totals plot, could you program "manual" additions to give the reasons for the two large steps, in 2009 and 2018? I think they might be chemical species and radioisotopes, but I'm not sure.

I am not quite sure what you mean by manual additions, but I guess you mean something along the lines of adding some logic to plot annotations, i.e. some text and perhaps a corresponding arrow for each of the two cases, to briefly explain why so many names were added relative to other versions? Matplotlib is very flexible, as you may well know, and I am happy to add something to indicate the jumps in number at those two versions, as you suggest.
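To make the idea concrete, here is a minimal sketch of how such hardcoded annotations might look in Matplotlib. It is only an illustration, not the code on the branch. The totals are invented figures, the label wording, function names, and output filename are all hypothetical, and the version numbers 12 and 49 are taken from the word clouds discussed in this thread.

```python
# Invented totals, {version: number of standard names}, for illustration
# only. These are NOT the real table sizes.
TOTALS = {10: 900, 11: 950, 12: 1700, 13: 1750, 48: 2900, 49: 4300, 50: 4350}

# Hardcoded ("manual") labels for the two large steps, keyed by version.
JUMP_LABELS = {
    12: "chemistry-based names",
    49: "radioactivity names",
}


def annotation_points(totals, labels):
    """Return (version, total, label) for each labelled version in totals."""
    return [
        (version, totals[version], text)
        for version, text in sorted(labels.items())
        if version in totals
    ]


def plot_totals(totals, labels, path="sn_totals_annotated.png"):
    """Plot totals per version and mark the labelled jumps with arrows."""
    # Imported here so annotation_points stays usable without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    versions = sorted(totals)
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(versions, [totals[v] for v in versions], marker="o")
    ax.set_xlabel("Table version")
    ax.set_ylabel("Total number of standard names")
    for version, total, text in annotation_points(totals, labels):
        # Text is offset from the data point, with an arrow pointing at it.
        ax.annotate(
            text,
            xy=(version, total),
            xytext=(-90, 20),
            textcoords="offset points",
            arrowprops={"arrowstyle": "->"},
        )
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling plot_totals(TOTALS, JUMP_LABELS) would write the annotated figure. Since savefig infers the format from the file extension, passing a path ending in .svg should give vector output instead of PNG.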

I think they might be chemical species and radioisotopes, but I'm not sure.

Yes, this seems to be the case, as shown by the wordclouds: at version 12 a whole batch of chemistry-based names went in, bulked up by 'moles' and 'mass' 'content/concentration' names for sets of chemical species (see: https://github.com/sadielbartholomew/cf-convention.github.io/blob/0ebe966ff92065b59c94e0f0f7b02fd032ef4b11/generated_vis_images/sn_wordcloud_versions11_to_12.png) and at version 49 the radioactivity names were initiated and hence there are hundreds to cover all of the relevant chemical species (see: https://github.com/sadielbartholomew/cf-convention.github.io/blob/0ebe966ff92065b59c94e0f0f7b02fd032ef4b11/generated_vis_images/sn_wordcloud_versions48_to_49.png).

I would suggest that the images and the vis directory should all be kept in the new vocabularies repo. They can be presented on the home page nonetheless.

OK, sure, happy to add them there. Would you suggest therefore that the underlying script, which generates them, goes in that repository too?

JonathanGregory · 2024-10-06T22:24:36Z

Dear @sadielbartholomew

Yes, by "manual additions" I meant something hardcoded in the figure-generating program to label particular features, just as you say.

Yes, I think that the script to produce the diagrams from the standard names files would be appropriate to put in the vocabularies repo, unless there's some reason against that which occurs to @japamment @feggleton @efisher008.

Perhaps the conventions could be represented by something like .

Best wishes

Jonathan

sadielbartholomew added the enhancement Enhancements to the website's presentation or contents label Sep 27, 2024

sadielbartholomew mentioned this issue Sep 27, 2024

Visualisations for standard names data (includes POC)? #110

Closed
