New types of datasets supported for Delmic HDF5 format #328
base: main
Conversation
New supported formats: intensity, hyperspectral, angle-resolved, E-k, time-resolved.
```python
def get_unit_prefix(number):
    """Return the SI prefix for the given number based on its magnitude."""
    if 1e-15 < np.abs(number) < 1e-12:
        prefix = "f"
```
Code scanning / CodeQL — warning: Variable defined multiple times. "This assignment to 'prefix' is unnecessary as it is redefined." (flagged on four assignments in this hunk)
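The warning arises because each `if` branch assigns `prefix` and a later branch may reassign it. A hedged sketch of one way to resolve it, returning directly from a single ordered lookup table (the bounds and prefix set here are illustrative, not necessarily the PR's actual ranges):

```python
import numpy as np

# Ordered upper bounds for each SI prefix; first match wins, so no
# variable is ever assigned and then overwritten.
_PREFIX_BOUNDS = [
    (1e-12, "f"),  # magnitudes in [1e-15, 1e-12) -> femto
    (1e-9, "p"),
    (1e-6, "n"),
    (1e-3, "µ"),
    (1.0, "m"),
]

def get_unit_prefix(number):
    """Return the SI prefix for the given number based on its magnitude."""
    magnitude = np.abs(number)
    for upper, prefix in _PREFIX_BOUNDS:
        if magnitude < upper:
            return prefix
    return ""  # no prefix for magnitudes >= 1 (table covers sub-unit only)
```

Returning from the loop also makes CodeQL's data-flow analysis trivially happy, since each value is produced exactly once.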
```python
# Ensure SE Image shape is compatible by taking the last two dimensions
if len(SE_Image.shape) >= 2:
    Y, X = SE_Image.shape[-2:]
```

Code scanning / CodeQL — note: Unused local variable.
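If one of the two unpacked dimensions is genuinely not needed, the conventional fix for this note is an underscore name; a minimal sketch (assuming it is `Y` that goes unused):

```python
import numpy as np

SE_Image = np.zeros((3, 4, 5))  # stand-in for the HDF5 dataset
if SE_Image.ndim >= 2:
    # underscore marks the intentionally unused dimension for CodeQL
    _, X = SE_Image.shape[-2:]
```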
Codecov Report — Attention: Patch coverage is …

```
@@            Coverage Diff             @@
##             main     #328      +/-   ##
==========================================
- Coverage   87.84%   87.02%    -0.83%
==========================================
  Files          85       85
  Lines       11180    11534     +354
  Branches     2280     2378      +98
==========================================
+ Hits         9821    10037     +216
- Misses        860      945      +85
- Partials      499      552      +53
```

☔ View full report in Codecov by Sentry.
pre-commit.ci autofix
Thanks @noemiebonnet for the substantial progress! I left a few comments from a first pass over the code, but did not yet have time for a closer look. Please include the standard tracking list in the initial comment to give an overview of how far along the PR is.

There are still a number of lines uncovered by tests, as reported by codecov. If lines should explicitly be ignored in the coverage check, one can add the comment `# pragma: no cover` at the end of the line. I see that most uncovered lines concern warnings or errors that might surface during file reading. It might be hard to test for those if one has only proper test files. What is our current best practice in that case, @ericpre? (Note that you can also get inspiration for such cases from other file readers recently implemented, such as Horiba JobinYvon or Hamamatsu.)

A single spectrum is still loaded as …
```python
    the associated image type.
    """
    if Acq is None:
        raise TypeError(
```
Currently, all the various loading errors are giving codecov warnings, because they are not included in the tests. In hdf5, it should be rather straightforward to create some minimal test files for each case, where certain elements are deleted from the file to test each of these cases. Another idea could be to use hdf5 files from some other rsciio plugins that do not match the Delmic specifications to test some of the errors.
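One way to realize this suggestion, sketched with illustrative group/dataset names (not the actual Delmic layout): write a minimal valid file with h5py, then delete the element whose absence should trigger the reader's error path.

```python
import os
import tempfile

import h5py

# Hypothetical minimal test-file generator: the layout below is
# illustrative only, not the real Delmic HDF5 specification.
path = os.path.join(tempfile.mkdtemp(), "delmic_missing_dataset.h5")
with h5py.File(path, "w") as f:
    acq = f.create_group("Acquisition0")
    acq.create_dataset("ImageData", data=[[0.0]])
    acq.attrs["img_type"] = "intensity"

# Derive the broken variant by removing the element under test.
with h5py.File(path, "a") as f:
    del f["Acquisition0/ImageData"]
```

A test could then load this file with the Delmic reader and assert that the expected `TypeError`/`ValueError` is raised, e.g. with `pytest.raises`.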
```python
# Check if Image has 5 dimensions
if Image.ndim != 5:
    raise ValueError("The input 'Image' must be a 5D h5 dataset.")
```
I see this check and the one above repeated in all the different load functions. Are all Delmic hdf5 files 5D datasets? If so, it could make sense to check this once at a higher level instead of repeating the same checks in every datatype function.
```python
    or Image.shape[3] < 1
    or Image.shape[4] < 1
):
    raise ValueError(
```
As an example for testing errors, here you could use the file from a different type of data and change the `img_type` to get a case that triggers this error.
```python
Scale = np.array(ImgData.get(scale_key))

if axis_name in ["C", "T"]:
    scale_value = np.mean(
```
Some other readers have a parameter to choose whether to read such axes as a `UniformDataAxis` by calculating the mean scale, or as a non-uniform `DataAxis` by taking exactly the vector that is in the file. See e.g. `rosettasciio/rsciio/jobinyvon/_api.py`, line 684 in 1274ed5: `use_uniform_signal_axis : bool, default=False`.
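A sketch of what such a parameter could look like; `make_axis` and the returned axis dictionaries are illustrative, not RosettaSciIO's actual API:

```python
import numpy as np

def make_axis(axis_vector, name, units, use_uniform_signal_axis=False):
    """Build an axis dict from the raw axis vector read from the file."""
    axis_vector = np.asarray(axis_vector, dtype=float)
    if use_uniform_signal_axis:
        # UniformDataAxis: offset plus the mean scale of the vector
        return {"name": name, "units": units,
                "offset": float(axis_vector[0]),
                "scale": float(np.mean(np.diff(axis_vector))),
                "size": int(axis_vector.size)}
    # non-uniform DataAxis: keep exactly the vector from the file
    return {"name": name, "units": units, "axis": axis_vector}
```

With slightly irregular wavelength steps, the uniform variant averages them into one `scale`, while the default preserves the file's values exactly.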
```python
metadata["Signal"]["quantity"] = "Intensity (counts)"  # Default value

if "signal_type" in Acq:
    metadata["Signal"]["signal_type"] = Acq["signal_type"]
```
`signal_type` is used to select the right signal class. If LumiSpy is installed, it should default to the correct signal classes by choosing `Luminescence` for anything where the only signal axis is wavelength (spectra or hyperspectral maps), `LumiTransient` if the only signal axis is time, and `LumiTransientSpectrum` for streak camera images. See e.g. `rosettasciio/rsciio/jobinyvon/_api.py`, line 85 in 1274ed5 (`if self._lumispy_installed:`) and `rosettasciio/rsciio/hamamatsu/_api.py`, line 362 in 332412f (`signal["signal_type"] = "LumiTransientSpectrum"  # pragma: no cover`).
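A hedged sketch of that defaulting logic; the helper function and the axis-name convention are illustrative, not the PR's actual code:

```python
def default_signal_type(signal_axis_names, lumispy_installed=True):
    """Pick a LumiSpy signal class from the names of the signal axes."""
    if not lumispy_installed:
        return ""
    names = set(signal_axis_names)
    if names == {"Wavelength"}:
        return "Luminescence"            # spectra / hyperspectral maps
    if names == {"Time"}:
        return "LumiTransient"           # time-resolved decay traces
    if names == {"Wavelength", "Time"}:
        return "LumiTransientSpectrum"   # streak camera images
    return ""                            # fall back to a plain signal
```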
```python
    return metadata


def make_original_metadata(Acq):
```
`original_metadata` should be 1:1 the metadata tree from the file and can be automatically parsed for hdf5 files, see e.g. `rosettasciio/rsciio/nexus/_api.py`, line 828 in 332412f (`def _load_metadata(group, lazy=False, skip_array_metadata=False):`). `metadata` should contain a curated subset of metadata with predefined key names that follows the LumiSpy/HyperSpy conventions: https://docs.lumispy.org/en/stable/user_guide/metadata_structure.html
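A minimal sketch of such 1:1 parsing for an HDF5 group, much simplified compared to the nexus reader referenced above (the demo file layout is illustrative):

```python
import h5py

def hdf5_group_to_dict(group):
    """Recursively mirror an HDF5 group (attrs, subgroups, datasets)."""
    tree = dict(group.attrs)
    for key, item in group.items():
        if isinstance(item, h5py.Group):
            tree[key] = hdf5_group_to_dict(item)
        else:  # h5py.Dataset: read the value into memory
            tree[key] = item[()]
    return tree

# demo on an in-memory file (nothing is written to disk)
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    f.attrs["version"] = "1.0"
    acq = f.create_group("Acquisition0")
    acq.create_dataset("scale", data=[1.0, 2.0])
    tree = hdf5_group_to_dict(f)
```

For very large array-valued metadata, the nexus reader's `skip_array_metadata` option shows one way to keep the resulting tree manageable.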
As a note, HyperSpy's `load` function takes the parameter `load_original_metadata`, which can be used to parse only the curated `metadata` and drop the `original_metadata` object in case it is very large: https://hyperspy.org/hyperspy-doc/current/user_guide/io.html#metadata
New supported formats: intensity, hyperspectral, angle-resolved, E-k, time-resolved.

Progress of the PR:

- [ ] `original_metadata`,
- [ ] add a changelog entry in the `upcoming_changes` folder (see `upcoming_changes/README.rst`),
- [ ] check the `docs/readthedocs.org:rosettasciio` build of this PR (link in github checks)