
New types of datasets supported for Delmic HDF5 format #328

Open · wants to merge 1 commit into main

Conversation

@noemiebonnet (Contributor) commented Oct 28, 2024

New supported formats: intensity, hyperspectral, angle-resolved, E-k, time-resolved.

Progress of the PR

  • Read intensity
  • Read hyperspectral
  • Read angle-resolved
  • Read E-k
  • Read time-resolved
  • Read original_metadata
  • Read EELS
  • Write EELS
  • Update docstring (if appropriate)
  • Update user guide (if appropriate)
  • Add a changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst)
  • Check the formatting of the changelog entry (and any user guide changes) in the docs/readthedocs.org:rosettasciio build of this PR (link in the GitHub checks)
  • Add tests
  • Increase test coverage for errors
  • Ready for review

def get_unit_prefix(number):
    """Return the SI prefix for the given number based on its magnitude."""
    if 1e-15 < np.abs(number) < 1e-12:
        prefix = "f"

Check warning

Code scanning / CodeQL

Variable defined multiple times (Warning, flagged 5 times)

This assignment to 'prefix' is unnecessary as it is redefined before this value is used.
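The warning arises because a chain of branches each assigns `prefix` before a later branch overwrites it. A hypothetical refactor (thresholds and table contents are illustrative, not the PR's actual code) replaces the chain with a single lookup and returns as soon as a range matches:

```python
import numpy as np

# Illustrative magnitude ranges; extend the table as needed.
_PREFIXES = [
    (1e-15, 1e-12, "f"),  # femto
    (1e-12, 1e-9, "p"),   # pico
    (1e-9, 1e-6, "n"),    # nano
    (1e-6, 1e-3, "u"),    # micro
    (1e-3, 1.0, "m"),     # milli
]

def get_unit_prefix(number):
    """Return the SI prefix for the given number based on its magnitude."""
    mag = np.abs(number)
    for low, high, prefix in _PREFIXES:
        if low < mag < high:
            return prefix
    return ""  # no prefix for magnitudes outside the table
```

Returning directly from the loop leaves no assignment that is overwritten before use, so the CodeQL warning cannot recur.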

# Ensure SE Image shape is compatible by taking the last two dimensions
if len(SE_Image.shape) >= 2:
    Y, X = SE_Image.shape[-2:]

Check notice

Code scanning / CodeQL

Unused local variable (Note)

Variable Y is not used.
Variable X is not used.

codecov bot commented Oct 28, 2024

Codecov Report

Attention: Patch coverage is 62.60163% with 138 lines in your changes missing coverage. Please review.

Project coverage is 87.02%. Comparing base (88923bf) to head (0a3e897).
Report is 31 commits behind head on main.

Files with missing lines Patch % Lines
rsciio/delmic/_api.py 62.60% 85 Missing and 53 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #328      +/-   ##
==========================================
- Coverage   87.84%   87.02%   -0.83%     
==========================================
  Files          85       85              
  Lines       11180    11534     +354     
  Branches     2280     2378      +98     
==========================================
+ Hits         9821    10037     +216     
- Misses        860      945      +85     
- Partials      499      552      +53     

☔ View full report in Codecov by Sentry.

@noemiebonnet (Contributor, Author) commented:

pre-commit.ci autofix

@jlaehne (Member) left a comment:

Thanks @noemiebonnet for the substantial progress! I left a few comments from a first pass over the code, but did not yet have time for a closer look. Please include the standard tracking list in the initial comment to give an overview of how far along the PR is.

There are still a number of lines uncovered by tests, as reported by codecov. If lines should be explicitly ignored in the coverage check, one can add the comment # pragma: no cover at the end of the line. I see that most uncovered lines concern warnings or errors that might surface during file reading. It might be hard to test for those if one only has well-formed test files. What is our current best practice in that case, @ericpre? (Note that you can also take inspiration from other file readers implemented recently, such as Horiba JobinYvon or Hamamatsu.)

doc/user_guide/supported_formats/delmic.rst (comment resolved)
doc/user_guide/supported_formats/delmic.rst (comment resolved)
rsciio/delmic/_api.py (comment resolved)
rsciio/tests/test_delmic.py (comment resolved)
rsciio/delmic/_api.py (comment resolved)
@jlaehne (Member) commented Nov 5, 2024

A single spectrum is still loaded as <Signal1D, title: , dimensions: (1, 1|2048)>, but should be <Signal1D, title: , dimensions: (|2048)> (drop the navigation dimensions).
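The requested fix can be sketched with plain NumPy: drop size-1 navigation axes before constructing the signal, so a single spectrum comes out as `(|2048)`. The function name and the assumption that navigation axes come first are hypothetical, not taken from the PR:

```python
import numpy as np

def squeeze_navigation(data, nav_ndim):
    """Drop size-1 navigation axes (assumed to be the leading axes),
    keeping the signal axes untouched."""
    nav_shape = data.shape[:nav_ndim]
    keep = tuple(i for i, n in enumerate(nav_shape) if n > 1)
    return data.reshape(
        tuple(data.shape[i] for i in keep) + data.shape[nav_ndim:]
    )

# A single spectrum loses both singleton navigation axes...
assert squeeze_navigation(np.zeros((1, 1, 2048)), nav_ndim=2).shape == (2048,)
# ...while a real map keeps its navigation shape.
assert squeeze_navigation(np.zeros((4, 5, 2048)), nav_ndim=2).shape == (4, 5, 2048)
```

The axes list passed to the signal dictionary would need the same filtering so data and axes stay consistent.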

rsciio/delmic/_api.py (comment resolved)
    the associated image type.
    """
    if Acq is None:
        raise TypeError(

Comment (Member):

Currently, all the various loading errors are giving codecov warnings, because they are not included in the tests. In hdf5, it should be rather straight forward to create some minimal test files for each case where certain elements are deleted from the file to test each of these cases. An idea could also be to use hdf5 files from some other rsciio plugins that do not match the delmic specifications to test some of the errors.
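A minimal sketch of that idea, assuming h5py is available; the group and dataset names below are illustrative, not the actual Delmic layout:

```python
import os
import tempfile

import h5py
import numpy as np

# Build a minimal HDF5 file that deliberately omits a group the reader
# expects, so the corresponding error branch can be exercised in a test.
path = os.path.join(tempfile.mkdtemp(), "broken.h5")
with h5py.File(path, "w") as f:
    acq = f.create_group("Acquisition0")  # hypothetical group name
    acq.create_dataset("Image", data=np.zeros((1, 1, 1, 4, 4)))
    # the metadata group ("PhysicalData" here) is intentionally left out

with h5py.File(path, "r") as f:
    assert "Image" in f["Acquisition0"]
    assert "PhysicalData" not in f["Acquisition0"]
# a test would then assert that the reader raises on this file, e.g.
# with pytest.raises(...): file_reader(path)
```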


# Check if Image has 5 dimensions
if Image.ndim != 5:
    raise ValueError("The input 'Image' must be a 5D h5 dataset.")

Comment (Member):
I see this check and the one above repeated in all different load functions. Are all delmic hdf5 files 5D datasets? Then it could make sense to check for this once on a higher level instead of having the same checks in every datatype function.

    or Image.shape[3] < 1
    or Image.shape[4] < 1
):
    raise ValueError(

Comment (Member):
As an example for testing errors, here you could use the file from a different type of data and change the img_type to get a case that triggers this error.

Scale = np.array(ImgData.get(scale_key))

if axis_name in ["C", "T"]:
    scale_value = np.mean(

Comment (Member):
Some other readers have a parameter to choose whether to read such axes as a UniformDataAxis by calculating the mean scale, or as a non-uniform DataAxis by just taking exactly the vector that is in the file.

see e.g.:

use_uniform_signal_axis : bool, default=False
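The suggested switch can be sketched as follows. The key names follow HyperSpy's axis dictionaries; the function name and signature are hypothetical:

```python
import numpy as np

def build_axis(axis_vector, name, units, use_uniform_signal_axis=False):
    """Return either a uniform axis dict (offset/scale from the mean step)
    or a non-uniform axis dict carrying the exact vector from the file."""
    axis_vector = np.asarray(axis_vector, dtype=float)
    if use_uniform_signal_axis:
        scale = np.mean(np.diff(axis_vector))  # average step size
        return {
            "name": name,
            "units": units,
            "size": axis_vector.size,
            "offset": axis_vector[0],
            "scale": scale,
        }
    # non-uniform DataAxis: keep the vector exactly as stored
    return {"name": name, "units": units, "axis": axis_vector}

wavelengths = [500.0, 500.9, 502.1, 503.0]  # slightly non-uniform steps
uniform = build_axis(wavelengths, "Wavelength", "nm",
                     use_uniform_signal_axis=True)
assert abs(uniform["scale"] - 1.0) < 1e-9
```

Defaulting to `use_uniform_signal_axis=False` preserves the file's exact calibration, while `True` gives the lighter uniform representation.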

metadata["Signal"]["quantity"] = "Intensity (counts)"  # Default value

if "signal_type" in Acq:
    metadata["Signal"]["signal_type"] = Acq["signal_type"]

Comment (Member):
signal_type is used to select the right signal class. If LumiSpy is installed, it should default to the correct signal classes by choosing Luminescence for anything where the only signal axis is wavelength (spectra or hyperspectral maps), LumiTransient if the only signal axis is time and LumiTransientSpectrum for streak camera images.

See e.g.

if self._lumispy_installed:
    signal["signal_type"] = "LumiTransientSpectrum"  # pragma: no cover
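The selection rule described above can be sketched as a small dispatcher. The LumiSpy signal-type strings are as named in the comment; the function name and the axis-name checks are illustrative assumptions:

```python
import importlib.util

def select_signal_type(signal_axis_names):
    """Pick a LumiSpy signal_type from the signal axes present.
    `signal_axis_names` is a list such as ["Wavelength"] or ["Time"]."""
    if importlib.util.find_spec("lumispy") is None:
        return ""  # fall back to HyperSpy's generic signal classes
    has_wavelength = "Wavelength" in signal_axis_names
    has_time = "Time" in signal_axis_names
    if has_wavelength and has_time:
        return "LumiTransientSpectrum"  # streak camera image
    if has_time:
        return "LumiTransient"
    if has_wavelength:
        return "Luminescence"
    return ""
```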

    return metadata


def make_original_metadata(Acq):

Comment (Member):
original_metadata should be 1:1 the metadata tree from the file and can be automatically parsed for hdf5 files, see e.g.

def _load_metadata(group, lazy=False, skip_array_metadata=False):

metadata should contain a curated subset of metadata with predefined key names that follows the LumiSpy/HyperSpy conventions:
https://docs.lumispy.org/en/stable/user_guide/metadata_structure.html
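The automatic 1:1 mirroring can be sketched without h5py: anything exposing `.items()` (an h5py Group, or a plain dict in this standalone demo) recurses as a sub-tree, and everything else is kept as a leaf value. Real h5py Datasets would additionally need reading and string decoding, elided here:

```python
def load_original_metadata(group):
    """Recursively mirror a group hierarchy into nested dicts
    (hypothetical sketch of an automatic original_metadata parser)."""
    out = {}
    for key, node in group.items():
        if hasattr(node, "items"):  # sub-group: recurse
            out[key] = load_original_metadata(node)
        else:                       # leaf: keep the value as-is
            out[key] = node
    return out

# Demo on a plain-dict stand-in for an HDF5 tree (names illustrative):
tree = {
    "Acquisition0": {"PhysicalData": {"Magnification": 5000}},
    "SVIData": {"Company": "Delmic"},
}
om = load_original_metadata(tree)
assert om["Acquisition0"]["PhysicalData"]["Magnification"] == 5000
```

The curated `metadata` tree would then be filled by cherry-picking known keys out of this structure and renaming them to the LumiSpy/HyperSpy conventions.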

Comment (Member):

As a note, HyperSpy's load function takes the parameter load_original_metadata that can be used to only parse the curated metadata and drop the object original_metadata in case it is very large:
https://hyperspy.org/hyperspy-doc/current/user_guide/io.html#metadata
