data_groups #5

Open
johanvonboer opened this issue Aug 31, 2023 · 3 comments


@johanvonboer
Contributor

In some cases, notably dendro and ceramics but also abundance counting datasets, I use a concept I call "data_groups". These are groupings of datasets, needed because each key/value pair in e.g. a dendro analysis is considered a "dataset" of its own. That is impractical to work with as-is, so we need something that binds the datasets belonging to the same sample together. That is the basic concept of a data_group, if I remember correctly.
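A minimal sketch of the grouping idea, written in Python for brevity. The field names (`dataset_id`, `sample_id`, `key`, `value`), the example values, and the function name `build_data_groups` are assumptions made for illustration; they are not the actual JAS output format.

```python
from collections import defaultdict

# Hypothetical flat output: each key/value pair of an analysis is its own
# "dataset". Field names and values are illustrative only.
datasets = [
    {"dataset_id": 1, "sample_id": "S1", "key": "k1", "value": "v1"},
    {"dataset_id": 2, "sample_id": "S1", "key": "k2", "value": "v2"},
    {"dataset_id": 3, "sample_id": "S2", "key": "k1", "value": "v3"},
]


def build_data_groups(datasets):
    """Bind the datasets that belong to the same sample into one group."""
    by_sample = defaultdict(list)
    for ds in datasets:
        by_sample[ds["sample_id"]].append(ds)
    # One group per sample, in order of first appearance.
    return [
        {"sample_id": sample_id, "datasets": members}
        for sample_id, members in by_sample.items()
    ]


data_groups = build_data_groups(datasets)
```

Here sample `S1` ends up with one group holding both of its key/value datasets, which is the binding the data_group is meant to provide.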

Anyway, the point here is that this needs to be looked over. These "data groups" need to be as consistent as possible across data types, and I am not currently sure they are. They are also currently output in parallel with the regular datasets array from the JAS server, which is inefficient since much of the same data gets output twice. Perhaps it would be possible to create bindings/references across the data structure that avoid most of this duplication?
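One way such references could look, as a rough sketch only: each data group stores the ids of its datasets rather than copies of them, and a consumer resolves the ids against the regular datasets array. All field and function names here (`dataset_id`, `sample_id`, `dataset_ids`, `build_reference_groups`, `resolve`) are hypothetical and not taken from the JAS code.

```python
# Hypothetical flat datasets array; field names are illustrative only.
datasets = [
    {"dataset_id": 1, "sample_id": "S1", "key": "k1", "value": "v1"},
    {"dataset_id": 2, "sample_id": "S1", "key": "k2", "value": "v2"},
    {"dataset_id": 3, "sample_id": "S2", "key": "k1", "value": "v3"},
]


def build_reference_groups(datasets):
    """Group by sample, storing dataset ids instead of copies of the data."""
    groups = {}
    for ds in datasets:
        group = groups.setdefault(
            ds["sample_id"],
            {"sample_id": ds["sample_id"], "dataset_ids": []},
        )
        group["dataset_ids"].append(ds["dataset_id"])
    return list(groups.values())


def resolve(group, datasets_by_id):
    """Look up the full datasets that a group refers to."""
    return [datasets_by_id[dataset_id] for dataset_id in group["dataset_ids"]]


data_groups = build_reference_groups(datasets)
datasets_by_id = {ds["dataset_id"]: ds for ds in datasets}
```

With this shape the key/value data is output once, in the datasets array, and the groups only add a small list of ids per sample.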

All of this of course also raises the question: why do we even need this "data groups" construct to begin with? Can't we just rearrange our data so that a "dataset" becomes the more intuitive grouping that the "data group" is trying to be? The answer is probably yes, but it requires structural changes in the database.

@johanvonboer
Contributor Author

johanvonboer commented Sep 1, 2023

I have found out that a 'data_group' can be quite a different thing depending on context. Wonderful.

For example, a dendro data_group has 'datasets' attached to it, while a C14 data_group has 'data_points'.
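Until the shapes are made consistent, a consumer has to handle both cases. A small sketch of what that could look like, assuming only the two property names mentioned above (`datasets` and `data_points`); the `type` field, the example contents, and the helper `group_members` are made up for illustration.

```python
# Property names a data group may use for its members, per the two cases
# observed so far. Other data types may differ.
MEMBER_KEYS = ("datasets", "data_points")


def group_members(data_group):
    """Return the member list of a data group, whichever key it uses."""
    for key in MEMBER_KEYS:
        if key in data_group:
            return data_group[key]
    return []


# Illustrative groups only; the real structures contain more fields.
dendro_group = {"type": "dendro", "datasets": [{"key": "k1", "value": "v1"}]}
c14_group = {"type": "c14", "data_points": [{"label": "p1", "value": 1}]}
```

A helper like this only hides the inconsistency from callers; it does not remove it.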

@johanvonboer
Contributor Author

Also, have a look at postProcessSiteData in the MeasuredValuesModule. There we have something we're calling "datasets" inside data groups, but they are quite different from what we normally call datasets. This should be corrected.

@johanvonboer
Contributor Author

We need to keep the data group concept until the database is fixed. But we should perhaps rename it to "dataset groups", because that's what they are. The core issue is that ceramics and dendro store each key/value pair of an analysis on a sample as a separate dataset, with just one analysis entity in each.
