Add data read #85

oruebel · 2024-08-31T09:51:57Z

This PR is to try and implement the proposed approach for data read from #83 to see if this approach is viable. This PR is experimental right now and should not be merged.

2. Proposed Implementation for reading data arrays

`BaseReadData`

Create a new ReadDatasetWrapper and ReadAttributeWrapper classes for reading data. This is modified from the proposal, which suggested a single BaseReadData class for reading any array (datasets, attributes) from a file.
Support conversion to boost multi-dimensional array for convenience
Note I did not update BaseRecordingData to inherit from ReadDataWrapper because the two are not compatible right now. BaseReadData uses the io and the path to access the data, whereas BaseRecordingData leaves that choice to the I/O backend and stores the references to the dataset. We may or may not want to change this.

`BaseIO`

Note I did not add abstract methods for lazy reading objects from a file to BaseIO (or more accurately I remove them) because: 1) I wanted to use the shared_ptr to the I/O rather than a raw pointer, which I can't get from BaseIO, and 2) with the ReadDatasetWrapper this is more approbriately done in the Container class directly.
Add pure virtual method to allow us to get the storage object type (Group, Dataset, Attribute) for a given path
Add pure virtual methods to read data values from a Dataset or Attribute that the BaseReadData can call for read

`HDF5IO`

Note In contrast to the proposal, I did not implement specific version of the BaseReadData for HDF5 but left read logic to HDF5IO itself so that the ReadDatasetWrapper can remain generic. To make this more manageable, I defined ReadAttributeWrapper separately
Implement the methods for reading data values from a Dataset or Attribute that the HDF5ReadDataSet and HDF5ReadAttribute wrappers can call can call for read
Implement the getObjectType method for getting the storage object type (Group, Dataset, Attribute) for a given path

`Container`

Store the io object on the Container so that we can call io->readDataset and io->readAttribute in the read methods

NWB types: `TimeSeries`, `ElectricalSeries` etc.

Remove storage of properties from the Container classes and replace them with access methods that return BaseReadData objects instead. This allows for reading in both read and write mode and avoids keeping data in memory that we have already written to disk. For example, in TimeSeries, these variables would need to change to properties:

aqnwb/src/nwb/base/TimeSeries.hpp

Lines 91 to 140 in e873d95

    
             /** 
        
              * @brief Base unit of measurement for working with the data. Actual stored 
        
              * values are not necessarily stored in these units. To access the data in 
        
              * these units, multiply ‘data’ by ‘conversion’ and add ‘offset’. 
        
              */ 
        
             std::string unit; 
        
             /** 
        
              * @brief The description of the TimeSeries. 
        
              */ 
        
             std::string description; 
        
             /** 
        
              * @brief Human-readable comments about the TimeSeries. 
        
              */ 
        
             std::string comments; 
        
             /** 
        
              * @brief Size used in dataset creation. Can be expanded when writing if 
        
              * needed. 
        
              */ 
        
             SizeArray dsetSize; 
        
             /** 
        
              * @brief Chunking size used in dataset creation. 
        
              */ 
        
             SizeArray chunkSize; 
        
             /** 
        
              * @brief Scalar to multiply each element in data to convert it to the 
        
              * specified ‘unit’. 
        
              */ 
        
             float conversion; 
        
             /** 
        
              * @brief Smallest meaningful difference between values in data, stored in the 
        
              * specified by unit. 
        
              */ 
        
             float resolution; 
        
             /** 
        
              * @brief Scalar to add to the data after scaling by ‘conversion’ to finalize 
        
              * its coercion to the specified ‘unit’. 
        
              */ 
        
             float offset; 
        
             /** 
        
              * @brief The starting time of the TimeSeries. 
        
              */ 
        
             float startingTime = 0.0;

3. Proposed implementation for reading whole `Containers` (e.g., to read an `ElectricalSeries`)

Add access methods on the respective Container that owns the respective objects, e.g., NWBFile owning ElectricalSeries objects to retrieve the object
Add abstract factory method (that is templated on the return type) to Container to create an instance of the specific Container type using only the io and path for the Container as input. The specific Container classes, such as TimeSeries will then need to implement a corresponding constructor that uses io and path as input.

Step 1: Define the Template Factory Method in `Container`

class Container {
public:
   
    template <typename T>
    static std::unique_ptr<T> create(const BaseIO& io, const std::string& path) {
        static_assert(std::is_base_of<Container, T>::value, "T must be a derived class of Container");
        return std::unique_ptr<T>(new T(path, io));
    }
};

Step 2: Implement the constructors on the specific `Container` classes (e.g., `TimeSeries`)

Add the necessary constructor

class TimeSeries : public Container {
public:
    TimeSeries(const std::string& path, const BaseIO& io) {
        // Implementation of TimeSeries constructor
    }
};

4. Proposed implementation for reading untyped groups (e.g., `/acquisition`)

I'm not sure we'll need do this, since a group by itself does not define data. To access the contents we could define access methods on the parent Container class (e.g., NWBFile) that owns the untyped group to access its contents.

TODO

Next steps

Items moved to new issues

…r from BaseIO

oruebel · 2024-09-01T09:08:33Z

@stephprince when you get a chance ,could you please do a first code review of this PR to make sure this is heading in the right direction. I now have a first outline of one possible solution for how we might implement read. There is still a lot more work to be done before this PR is ready, but it would be useful if you could take a look before I go any further with this approach.

I would start by looking at:

tests/examples/test_ecephys_data_read.cpp which shows an example of how read works for the user
BaseIO then defines the main new classes used for reading and HDF5IO then implements the actual reading
Container and ElectricalSeries also have some relevant changes to allow us to construct Container objects for read and how we can get specific datasets/attributes

oruebel · 2024-09-01T22:26:28Z

@stephprince I just added a documentation page as well, which hopefully helps explain the current proposed design for read so we can review and discuss.

…iven path

…t_open Make calling io->open() explicit

Update read example

oruebel · 2025-01-13T19:23:11Z

@stephprince Looking through the remaining ToDo items, I think we can reasonably create separate issues and PRs for those. I think this is now reasonably well tested that we can merge this and create new issues for the missing items.

stephprince

That sounds good to me! 🎉

oruebel · 2025-01-13T22:37:06Z

That sounds good to me! 🎉

👍 thanks for approving the PR. Here the list of follow-up issues:

Items moved to new issues

oruebel added 15 commits August 30, 2024 22:36

Define base classes for reading

1e1f041

First draft of reading datasets and attributes

e58f50f

Split reading of attribute and dataset to separate functions

250a66b

Remove debug print

e1bfa2e

Add functions to construct ReadDatasetWrapper and ReadAttributeWrappe…

4aedefd

…r from BaseIO

Fix formatting

4dae171

Add test for using the ReadDatasetWrapper

140809e

Start refactor containers for read

4d7b1ea

Fix format

2db4bfe

Read ElectricalSeries.data example working

b2959d2

Revert change to tests used for debugging

37d30d3

Fix codespell issue

6cfe416

Move read example to an example file

3d86aff

Fix bug in Container::create

3fc1d7a

Add example for data read

b09a4fa

oruebel requested a review from stephprince September 1, 2024 08:45

Fix spelling error

970f4c8

Add user docs for data read

26fbae9

oruebel added 6 commits September 1, 2024 16:35

Update read user docs

4299f99

Add toc

9c9301c

Add read software design figure and more details on the design

447337a

Some adjustment to the edges in the fig

e3d4330

Some adjustment to the edges in the fig

77f0861

Add intro for data read page

5190731

oruebel mentioned this pull request Sep 3, 2024

Propose refactor of I/O class organization #88

Closed

oruebel added 3 commits September 2, 2024 22:33

Make Container::create inline

f08e42f

Make DataBlock::fromGeneric inline

1a754b9

Implement function to allow us to get the storage object type for a g…

ebfbdf1

…iven path

Merge branch 'add_read_explicit_open' into add_read_examples

05277ff

oruebel mentioned this pull request Jan 13, 2025

Update read example #123

Merged

oruebel and others added 10 commits January 12, 2025 18:31

Adjust slice to check if it fixes Windows error

e8f1d42

Add output to help debug Windows issue

b8ecc38

Add debug output in HDF5IO to debug Windows issue

e07cf58

Update NWBFile.hpp

d6adf75

Fix code formatting

5e08abc

Some more debug ouput for Windows

480015d

Fix read path to avoid potentially reading from different series

dc6e16d

Merge pull request #128 from NeurodataWithoutBorders/add_read_explici…

14a825c

…t_open Make calling io->open() explicit

Merge branch 'add_read' into add_read_examples

57934cb

Clean up deug output

69faa08

oruebel marked this pull request as ready for review January 13, 2025 19:19

Merge pull request #123 from NeurodataWithoutBorders/add_read_examples

9d5d173

Update read example

stephprince approved these changes Jan 13, 2025

View reviewed changes

This was referenced Jan 13, 2025

Enhance direct unit tests for basic and private read functions #133

Closed

Improve unit test coverage for reading neurodata_type objects and fields #134

Open

oruebel merged commit 354147f into main Jan 13, 2025
10 checks passed

oruebel deleted the add_read branch January 13, 2025 22:42

This was referenced Jan 14, 2025

VectorData description not valid #104

Closed

Discuss design for read #83

Closed

Add option to overwrite existing file #72

Closed

NWBFile not inheriting from Container #61

Closed

add support for reading NWB files #44

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add data read #85

Add data read #85

oruebel commented Aug 31, 2024 •

edited

Loading

oruebel commented Sep 1, 2024 •

edited

Loading

oruebel commented Sep 1, 2024

oruebel commented Jan 13, 2025

stephprince left a comment

oruebel commented Jan 13, 2025

	/**
	* @brief Base unit of measurement for working with the data. Actual stored
	* values are not necessarily stored in these units. To access the data in
	* these units, multiply ‘data’ by ‘conversion’ and add ‘offset’.
	*/
	std::string unit;

	/**
	* @brief The description of the TimeSeries.
	*/
	std::string description;

	/**
	* @brief Human-readable comments about the TimeSeries.
	*/
	std::string comments;

	/**
	* @brief Size used in dataset creation. Can be expanded when writing if
	* needed.
	*/
	SizeArray dsetSize;

	/**
	* @brief Chunking size used in dataset creation.
	*/
	SizeArray chunkSize;

	/**
	* @brief Scalar to multiply each element in data to convert it to the
	* specified ‘unit’.
	*/
	float conversion;

	/**
	* @brief Smallest meaningful difference between values in data, stored in the
	* specified by unit.
	*/
	float resolution;

	/**
	* @brief Scalar to add to the data after scaling by ‘conversion’ to finalize
	* its coercion to the specified ‘unit’.
	*/
	float offset;

	/**
	* @brief The starting time of the TimeSeries.
	*/
	float startingTime = 0.0;

Add data read #85

Add data read #85

Conversation

oruebel commented Aug 31, 2024 • edited Loading

2. Proposed Implementation for reading data arrays

BaseReadData

BaseIO

HDF5IO

Container

NWB types: TimeSeries, ElectricalSeries etc.

3. Proposed implementation for reading whole Containers (e.g., to read an ElectricalSeries)

Step 1: Define the Template Factory Method in Container

Step 2: Implement the constructors on the specific Container classes (e.g., TimeSeries)

4. Proposed implementation for reading untyped groups (e.g., /acquisition)

TODO

Next steps

Items moved to new issues

oruebel commented Sep 1, 2024 • edited Loading

oruebel commented Sep 1, 2024

oruebel commented Jan 13, 2025

stephprince left a comment

Choose a reason for hiding this comment

oruebel commented Jan 13, 2025

Items moved to new issues

oruebel commented Aug 31, 2024 •

edited

Loading

`BaseReadData`

`BaseIO`

`HDF5IO`

`Container`

NWB types: `TimeSeries`, `ElectricalSeries` etc.

3. Proposed implementation for reading whole `Containers` (e.g., to read an `ElectricalSeries`)

Step 1: Define the Template Factory Method in `Container`

Step 2: Implement the constructors on the specific `Container` classes (e.g., `TimeSeries`)

4. Proposed implementation for reading untyped groups (e.g., `/acquisition`)

oruebel commented Sep 1, 2024 •

edited

Loading