(Where to) include visualisations of the standard names on the site #547
Comments
Dear @sadielbartholomew Thanks for your work on this and for this proposal. Some comments.
Best wishes Jonathan
Technical question: the images are rather large data-wise. Can the output be an SVG, and if so, does that reduce the file size from the 3+ MB? For browsers, SVG is supported by everything. For downloads that people could use in presentations, a different vector format might be preferred (EPS?). This next question comes from ignorance, not from opposition to doing this in any way, and I want to learn new things: what am I meant to learn from a word cloud like this? Is it more "art" than "information"? In what contexts would I pick a word cloud over some other visualization, such as a bar chart with a categorical axis?
Thanks for the comments, Jonathan and Andrew. Let me respond to the various components in turn. Firstly, @JonathanGregory:
I am happy to go with your thumbnail idea; it seems sensible for the reasons you outline. I should add that I can easily edit the font size on the plot and on the word clouds (for the latter, by setting a minimum text size so that they are easier to read at a given image size). That could be an alternative way to keep things readable without taking up too much space. However, the totals plot is quite detailed, so I don't think tweaking the font sizes is a good idea in that case. (I was already careful with my selections to keep the plot balanced and readable at a fairly large viewing size.) So unless anyone states an opinion otherwise, let's go with the thumbnails solution.
I agree, this would be optimal and provide good balance. That said, it is not clear to me what a good image would be to represent the conventions as a whole - so we should try to ask around to get ideas. I am happy to open a separate Discussion Thread or Issue on this topic of 'cover/featured' images for the homepage, with the proposal of using the full-table standard name word cloud as the vocabularies one, and asking for ideas regarding the conventions one - how does that sound?
I am not quite sure what you mean by manual additions, but I guess you mean something along the lines of adding some logic to the plot annotations, to put some text and perhaps a corresponding arrow on the pair of cases, briefly explaining why so many names were added relative to other versions? Matplotlib is very flexible, as you may well know, and I am happy to add something to indicate the jumps in number at those two versions, as you suggest.
Yes, this seems to be the case, as shown by the word clouds: at version 12 a whole batch of chemistry-based names went in, bulked up by 'moles' and 'mass' 'content/concentration' names for sets of chemical species (see: https://github.com/sadielbartholomew/cf-convention.github.io/blob/0ebe966ff92065b59c94e0f0f7b02fd032ef4b11/generated_vis_images/sn_wordcloud_versions11_to_12.png), and at version 49 the radioactivity names were initiated, hence there are hundreds to cover all of the relevant chemical species (see: https://github.com/sadielbartholomew/cf-convention.github.io/blob/0ebe966ff92065b59c94e0f0f7b02fd032ef4b11/generated_vis_images/sn_wordcloud_versions48_to_49.png).
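To illustrate how the versions worth annotating might be picked out automatically rather than hardcoded, here is a minimal sketch. It is not taken from the actual script: the function name `find_jumps`, the `factor` threshold, and the toy totals are all assumptions made up for this example.

```python
from statistics import median


def find_jumps(totals, factor=5.0):
    """Return (version, names_added) pairs for unusually large additions.

    `totals` maps a version number to the total number of standard names
    at that version. A version counts as a "jump" when the number of names
    added exceeds `factor` times the median addition (an arbitrary,
    hypothetical rule chosen for this sketch).
    """
    versions = sorted(totals)
    # Names added at each version relative to the previous one.
    added = {v: totals[v] - totals[prev] for prev, v in zip(versions, versions[1:])}
    if not added:
        return []
    typical = median(added.values())
    return [(v, n) for v, n in added.items() if n > factor * typical]


# Toy totals, invented for illustration only (not the real table sizes).
toy_totals = {10: 100, 11: 110, 12: 400, 13: 415, 14: 425}
jumps = find_jumps(toy_totals)
print(jumps)
```

Each returned pair could then be labelled on the totals plot, for example with Matplotlib's `Axes.annotate`, using explanatory text supplied by hand for each version.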
OK, sure, happy to add them there. Would you suggest therefore that the underlying script, which generates them, goes in that repository too?
Dear @sadielbartholomew Yes, by "manual additions" I meant something hardcoded in the figure-generating program to label particular features, just as you say. Yes, I think that the script to produce the diagrams from the standard names files would be appropriate to put in the Perhaps the conventions could be represented by something like . Best wishes Jonathan |
At a Hackathon session of the recent CF Workshop 2024, I began updating the code I created in 2020 to produce some visualisations of the standard names: a plot of the total number of standard names over time and per version, plus word clouds which show sets of names added between specified versions (see #110).
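For readers unfamiliar with what feeds a word cloud of this kind, the underlying idea can be sketched in a few lines. This is only an assumption about the general approach, not the code on the branch: the helper names `added_names` and `word_weights` are hypothetical, the two short lists are toy inputs rather than real version contents, and splitting names on underscores is a guess at a reasonable tokenisation.

```python
from collections import Counter


def added_names(older, newer):
    """Return the names present in the newer list but not in the older one."""
    return sorted(set(newer) - set(older))


def word_weights(names):
    """Count how often each underscore-separated word occurs across names."""
    return Counter(word for name in names for word in name.split("_"))


# Toy inputs for illustration only; real tables hold thousands of names.
older = ["air_temperature", "sea_water_salinity"]
newer = older + [
    "mass_concentration_of_ozone_in_air",
    "mole_concentration_of_ozone_in_air",
]

new = added_names(older, newer)
weights = word_weights(new)
print(weights.most_common(3))
```

A word cloud library would then draw each word at a size proportional to its count, which is why a batch of similar names added in one version dominates the image.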
This week, in between other work, I managed to finalise it to the point where I am happy with it: the outputs are good enough and the code is maintainable and flexible enough. I now have a branch (standard-names-vis-cfworkshop24) on my fork with the final script to generate the plot and word clouds, along with the generated outputs. If you follow the link, you can see the code and the generated plots, for reference.
The next steps are:
I am opening this separate Issue because it is clear there is consensus from #110, and from conversations in person at the workshops etc., that people would like these included on the website somewhere (then in turn also available for people to take and use elsewhere e.g. to promote the CF Conventions overall). But now we need to agree where the best place to put them is, hence this Issue.
Proposal
I think it helps to start off with a proposal, so here is my suggestion of what to do to get my branch, and some or all of the generated images (the word cloud code is quite flexible and can show the names added from any version to any older one), incorporated into the site:
And as for the code, I would keep it in its own self-contained directory, vis, so that it stays separate from other parts of the repo. (I don't mind what we call it, but I chose vis, which is descriptive but short and notably avoids having to decide between the British and American English spellings of 'visualisation'.)