Set the `name` attribute for derived data variables #321

niksirbi · 2024-10-10T09:48:52Z

Describe the bug

When assigning a derived array (e.g. velocity) as a dataset variable, its `name` attribute is automatically set to the name we assign to that variable.

However, if we keep it as a standalone array, its `name` stays the same as that of the input array from which it was derived.

To Reproduce

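>>> from movement import sample_data
>>> from movement.analysis import kinematics as kin
>>>
>>> # Any sample dataset with a `position` variable will do; this file is just an example
>>> ds = sample_data.fetch_dataset("DLC_single-wasp.predictions.h5")
>>> ds["velocity"] = kin.compute_velocity(ds.position)
>>> ds.velocity.name
'velocity'
>>> velocity = kin.compute_velocity(ds.position)
>>> velocity.name
'position'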

Expected behaviour
Both methods in the above example should return 'velocity'. This also affects every other derived variable, such as displacement, acceleration, head_direction etc.

The matter can be easily fixed by setting the `name` attribute inside the function that computes the variable.
Having an appropriate name is quite handy for printing, plotting, etc.
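
A minimal sketch of that fix, using plain xarray rather than movement's actual code. The helper `compute_velocity_named` and the toy `position` array are hypothetical and exist only for illustration; the assumption is that the derivative is taken with `DataArray.differentiate`, which carries over the input's name.

```python
import numpy as np
import xarray as xr


def compute_velocity_named(position: xr.DataArray) -> xr.DataArray:
    """Hypothetical helper: differentiate over time and name the result."""
    # differentiate() keeps the input's name, so this is still "position"
    velocity = position.differentiate("time")
    # Set the name explicitly inside the function that computes the variable
    velocity.name = "velocity"
    return velocity


# Toy position array: 3 time points, 2 spatial dimensions
position = xr.DataArray(
    np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]),
    dims=("time", "space"),
    coords={"time": [0.0, 1.0, 2.0], "space": ["x", "y"]},
    name="position",
)

velocity = compute_velocity_named(position)
print(velocity.name)
```

With this approach the standalone array and the dataset variable would report the same name, and the input array's own name is left untouched.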

sfmig · 2024-10-18T11:44:46Z

From dev meeting today:

- We can check what xarray is doing, and do something similar?
- Or should we simply set it to empty? (Does a variable need a `name` attribute? This would mean less maintenance.)
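
For reference, a small sketch of how xarray itself appears to handle names in a few common cases. The arrays `a` and `b` are made up for illustration, and the behaviour shown is what I would expect from current xarray releases, not something checked against movement.

```python
import numpy as np
import xarray as xr

time = [0.0, 1.0, 2.0]
a = xr.DataArray(
    np.array([0.0, 1.0, 4.0]), dims="time", coords={"time": time}, name="position"
)
b = xr.DataArray(
    np.array([1.0, 1.0, 1.0]), dims="time", coords={"time": time}, name="offset"
)

derived = a.differentiate("time")  # single-input operation: name is carried over
scaled = a * 2                     # arithmetic with a scalar: name is carried over
combined = a + b                   # operands with different names: name is dropped

# The "empty name" option would amount to clearing the attribute explicitly
unnamed = a.differentiate("time")
unnamed.name = None

print(derived.name, scaled.name, combined.name, unnamed.name)
```

So xarray keeps the input name when there is only one candidate and falls back to `None` when the names disagree, which is why the derived velocity array ends up called 'position'.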

niksirbi · 2024-10-18T17:23:26Z

Arguments in favour of setting an appropriate `name` for every derived variable:

- The name will appear when using built-in xarray plots.
- We will reduce the probability of conflicts when using `xr.merge` (merging objects that have the same name may cause issues).
- For some built-in saving functions, like `to_netcdf()` (an .h5-based format), the variable names in the saved file will be meaningful (though we don't really use that file format).
- We get the chance to nudge users towards "standardised" naming conventions (users may be inclined to stick with these when adding arrays to datasets).
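
To illustrate the `xr.merge` point, here is a rough sketch with toy arrays (not movement data). The `velocity` array deliberately keeps the name 'position', mimicking the behaviour reported in this issue; `compat="no_conflicts"` is passed explicitly so the example does not depend on xarray's default.

```python
import numpy as np
import xarray as xr

coords = {"time": [0, 1, 2]}
position = xr.DataArray(
    np.array([0.0, 1.0, 4.0]), dims="time", coords=coords, name="position"
)
# A derived array that inherited the input's name, as in the bug report
velocity = xr.DataArray(
    np.array([1.0, 2.0, 3.0]), dims="time", coords=coords, name="position"
)

# Same name but different values: merge treats them as one conflicting variable
try:
    xr.merge([position, velocity], compat="no_conflicts")
    clash = False
except xr.MergeError:
    clash = True

# With a distinct name, both arrays become separate data variables
merged = xr.merge([position, velocity.rename("velocity")], compat="no_conflicts")
print(clash, list(merged.data_vars))
```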

Arguments in favour of setting an empty `name` for every derived variable:

- Users have full freedom to set that to whatever they like later; we make no assumed choices for them (though they could always override our choice anyway).
- We reduce the risk of setting the "wrong" name, or of these names becoming outdated after downstream operations. No plots with inadvertently wrong names.
- We don't have to decide what that name should be when writing a new function.

Full disclosure:
I personally currently favour the first approach, but both are better than the status quo.

niksirbi added the bug Something isn't working label Oct 10, 2024

github-project-automation bot added this to movement progress tracker Oct 10, 2024

github-project-automation bot moved this to 🤔 Triage in movement progress tracker Oct 10, 2024
