Add data read #85

oruebel · 2024-08-31T09:51:57Z

This PR tries to implement the approach for data read proposed in #83 to see whether it is viable. The PR is experimental right now and should not be merged.

2. Proposed Implementation for reading data arrays

`BaseReadData`

Create new ReadDatasetWrapper and ReadAttributeWrapper classes for reading data. This differs from the proposal, which suggested a single BaseReadData class for reading any array (datasets, attributes) from a file.
Support conversion to boost multi-dimensional array for convenience
Note I did not update BaseRecordingData to inherit from ReadDataWrapper because the two are not compatible right now. BaseReadData uses the io and the path to access the data, whereas BaseRecordingData leaves that choice to the I/O backend and stores the references to the dataset. We may or may not want to change this.
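To illustrate the idea behind converting a flat read buffer into a multi-dimensional view, here is a minimal sketch. `FlatBlock` and its members are hypothetical names used only for illustration; the sketch avoids a Boost dependency and performs the row-major index arithmetic by hand, which is the same layout a `boost::multi_array` uses by default.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a block of data read from a file: values are
// stored flat in row-major order together with the shape of the array.
template <typename T>
struct FlatBlock {
    std::vector<T> data;
    std::vector<std::size_t> shape;

    // Total number of elements implied by the shape.
    std::size_t size() const {
        return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                               std::multiplies<std::size_t>());
    }

    // Row-major lookup: the last index varies fastest.
    const T& at(const std::vector<std::size_t>& index) const {
        if (index.size() != shape.size()) {
            throw std::invalid_argument("index rank does not match shape");
        }
        std::size_t offset = 0;
        for (std::size_t d = 0; d < shape.size(); ++d) {
            if (index[d] >= shape[d]) {
                throw std::out_of_range("index out of bounds");
            }
            offset = offset * shape[d] + index[d];
        }
        return data.at(offset);
    }
};
```

With a shape of `{2, 3}`, the element at index `{1, 0}` is the fourth value in the flat buffer.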

`BaseIO`

Note I did not add abstract methods for lazily reading objects from a file to BaseIO (or, more accurately, I removed them) because: 1) I wanted to use the shared_ptr to the I/O rather than a raw pointer, which I can't get from BaseIO, and 2) with the ReadDatasetWrapper this is more appropriately done in the Container class directly.
Add pure virtual method to allow us to get the storage object type (Group, Dataset, Attribute) for a given path
Add pure virtual methods to read data values from a Dataset or Attribute that the BaseReadData can call for read
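The pure virtual methods listed above could look roughly like the following sketch. The signatures are simplified assumptions (everything is read as `std::vector<float>`), the `Undefined` enumerator is an addition for missing paths, and `InMemoryIO` is a hypothetical backend that exists only to exercise the interface; none of this is the actual aqnwb API.

```cpp
#include <map>
#include <string>
#include <vector>

// Kinds of storage objects a path can refer to (names follow the PR text;
// Undefined is an added assumption for paths that do not exist).
enum class StorageObjectType { Group, Dataset, Attribute, Undefined };

// Simplified stand-in for BaseIO: only the read-related pure virtuals.
class BaseIO {
public:
    virtual ~BaseIO() = default;
    virtual StorageObjectType getObjectType(const std::string& path) const = 0;
    virtual std::vector<float> readDataset(const std::string& path) const = 0;
    virtual std::vector<float> readAttribute(const std::string& path) const = 0;
};

// Hypothetical in-memory backend used only to exercise the interface.
class InMemoryIO : public BaseIO {
public:
    std::map<std::string, std::vector<float>> datasets;
    std::map<std::string, std::vector<float>> attributes;

    StorageObjectType getObjectType(const std::string& path) const override {
        if (datasets.count(path) > 0) return StorageObjectType::Dataset;
        if (attributes.count(path) > 0) return StorageObjectType::Attribute;
        // Any stored path that starts with "<path>/" implies a group.
        const std::string prefix = (path == "/") ? path : path + "/";
        if (hasPrefix(datasets, prefix) || hasPrefix(attributes, prefix)) {
            return StorageObjectType::Group;
        }
        return StorageObjectType::Undefined;
    }

    std::vector<float> readDataset(const std::string& path) const override {
        return datasets.at(path);
    }

    std::vector<float> readAttribute(const std::string& path) const override {
        return attributes.at(path);
    }

private:
    // True if any key in the map begins with the given prefix.
    static bool hasPrefix(const std::map<std::string, std::vector<float>>& items,
                          const std::string& prefix) {
        for (const auto& entry : items) {
            if (entry.first.compare(0, prefix.size(), prefix) == 0) return true;
        }
        return false;
    }
};
```

Keeping the type query on the I/O interface lets generic code decide whether a path should be read as a dataset or an attribute without knowing the backend.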

`HDF5IO`

Note In contrast to the proposal, I did not implement a specific version of BaseReadData for HDF5 but left the read logic to HDF5IO itself so that the ReadDatasetWrapper can remain generic. To make this more manageable, I defined ReadAttributeWrapper separately
Implement the methods for reading data values from a Dataset or Attribute that the HDF5ReadDataSet and HDF5ReadAttribute wrappers can call for read
Implement the getObjectType method for getting the storage object type (Group, Dataset, Attribute) for a given path

`Container`

Store the io object on the Container so that we can call io->readDataset and io->readAttribute in the read methods
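As a rough sketch of why the Container keeps the io object: a read method can hand out a lightweight wrapper that remembers the io and the path, and the backend is only touched when values are requested. `MockIO`, the `values()` method, the `dataset()` accessor, and the `readCount` counter are hypothetical and exist only for this illustration.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal I/O backend; readDataset mirrors the call named above.
class MockIO {
public:
    std::map<std::string, std::vector<float>> datasets;
    mutable int readCount = 0;  // counts how often the backend is asked to read

    std::vector<float> readDataset(const std::string& path) const {
        ++readCount;
        return datasets.at(path);
    }
};

// Lazy handle: remembers where the data lives and reads only when asked.
class ReadDatasetWrapper {
public:
    ReadDatasetWrapper(std::shared_ptr<MockIO> io, std::string path)
        : m_io(std::move(io)), m_path(std::move(path)) {}

    // Performs the actual read through the I/O object.
    std::vector<float> values() const { return m_io->readDataset(m_path); }

private:
    std::shared_ptr<MockIO> m_io;
    std::string m_path;
};

// Container that stores the io object and its own path in the file.
class Container {
public:
    Container(std::string path, std::shared_ptr<MockIO> io)
        : m_path(std::move(path)), m_io(std::move(io)) {}

    // Returns a handle for a dataset below this container; no I/O happens here.
    ReadDatasetWrapper dataset(const std::string& name) const {
        return ReadDatasetWrapper(m_io, m_path + "/" + name);
    }

private:
    std::string m_path;
    std::shared_ptr<MockIO> m_io;
};
```

Because the wrapper shares ownership of the io object, it stays valid even if the Container that produced it goes out of scope.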

NWB types: `TimeSeries`, `ElectricalSeries` etc.

Remove storage of properties from the Container classes and replace them with access methods that return BaseReadData objects instead. This allows for reading in both read and write mode and avoids keeping data in memory that we have already written to disk. For example, in TimeSeries, these variables would need to change to properties:

aqnwb/src/nwb/base/TimeSeries.hpp

Lines 91 to 140 in e873d95

    
    /**
     * @brief Base unit of measurement for working with the data. Actual stored
     * values are not necessarily stored in these units. To access the data in
     * these units, multiply ‘data’ by ‘conversion’ and add ‘offset’.
     */
    std::string unit;

    /**
     * @brief The description of the TimeSeries.
     */
    std::string description;

    /**
     * @brief Human-readable comments about the TimeSeries.
     */
    std::string comments;

    /**
     * @brief Size used in dataset creation. Can be expanded when writing if
     * needed.
     */
    SizeArray dsetSize;

    /**
     * @brief Chunking size used in dataset creation.
     */
    SizeArray chunkSize;

    /**
     * @brief Scalar to multiply each element in data to convert it to the
     * specified ‘unit’.
     */
    float conversion;

    /**
     * @brief Smallest meaningful difference between values in data, stored in the
     * specified by unit.
     */
    float resolution;

    /**
     * @brief Scalar to add to the data after scaling by ‘conversion’ to finalize
     * its coercion to the specified ‘unit’.
     */
    float offset;

    /**
     * @brief The starting time of the TimeSeries.
     */
    float startingTime = 0.0;

Add access methods that return BaseReadData for missing fields
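The TimeSeries comments quoted above state that data is brought into the declared unit by multiplying by `conversion` and adding `offset`. Once those values are obtained through access methods rather than stored members, applying them could be sketched as follows; `toUnit` is a hypothetical helper name, not part of the codebase.

```cpp
#include <vector>

// Converts raw stored values to the unit of measurement described in the
// TimeSeries comments: value_in_unit = raw * conversion + offset.
inline std::vector<float> toUnit(const std::vector<float>& raw,
                                 float conversion,
                                 float offset) {
    std::vector<float> converted;
    converted.reserve(raw.size());
    for (float value : raw) {
        converted.push_back(value * conversion + offset);
    }
    return converted;
}
```

For example, raw values `{1, 2, 3}` with a conversion of `2.0` and an offset of `0.5` become `{2.5, 4.5, 6.5}`.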

3. Proposed implementation for reading whole `Containers` (e.g., to read an `ElectricalSeries`)

Add access methods on the respective Container that owns the respective objects, e.g., NWBFile owning ElectricalSeries objects to retrieve the object
Add abstract factory method (that is templated on the return type) to Container to create an instance of the specific Container type using only the io and path for the Container as input. The specific Container classes, such as TimeSeries will then need to implement a corresponding constructor that uses io and path as input.

Step 1: Define the Template Factory Method in `Container`

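#include <memory>
#include <string>
#include <type_traits>

class Container {
public:
    // Factory: builds any Container subclass T from the I/O object and the path.
    template <typename T>
    static std::unique_ptr<T> create(const BaseIO& io, const std::string& path) {
        static_assert(std::is_base_of<Container, T>::value, "T must be a derived class of Container");
        return std::unique_ptr<T>(new T(path, io));
    }
};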

Step 2: Implement the constructors on the specific `Container` classes (e.g., `TimeSeries`)

Add the necessary constructor

class TimeSeries : public Container {
public:
    TimeSeries(const std::string& path, const BaseIO& io) {
        // Implementation of TimeSeries constructor
    }
};
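Putting Step 1 and Step 2 together, a self-contained sketch of the factory might look like this. The empty `BaseIO` placeholder, the `Container` constructor, and the `getPath` accessor are assumptions added so the example compiles on its own; they are not taken from the codebase.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Placeholder for the real I/O interface (assumption for this sketch).
class BaseIO {};

class Container {
public:
    Container(const std::string& path, const BaseIO& io)
        : m_path(path), m_io(io) {}
    virtual ~Container() = default;

    // Factory templated on the concrete type to construct.
    template <typename T>
    static std::unique_ptr<T> create(const BaseIO& io, const std::string& path) {
        static_assert(std::is_base_of<Container, T>::value,
                      "T must be a derived class of Container");
        return std::unique_ptr<T>(new T(path, io));
    }

    const std::string& getPath() const { return m_path; }

protected:
    std::string m_path;
    const BaseIO& m_io;  // the caller must keep the io object alive
};

class TimeSeries : public Container {
public:
    // Constructor with the (path, io) signature that the factory expects.
    TimeSeries(const std::string& path, const BaseIO& io)
        : Container(path, io) {}
};
```

A caller would then write `Container::create<TimeSeries>(io, "/acquisition/ts0")`. Holding the io by reference, as in this sketch, ties the object's validity to the io's lifetime, which is one reason a `shared_ptr` may be preferable.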

4. Proposed implementation for reading untyped groups (e.g., `/acquisition`)

I'm not sure we'll need to do this, since a group by itself does not define data. To access the contents we could define access methods on the parent Container class (e.g., NWBFile) that owns the untyped group.

TODO

Items moved to new issues

Next steps

oruebel · 2024-09-01T09:08:33Z

@stephprince when you get a chance, could you please do a first code review of this PR to make sure this is heading in the right direction? I now have a first outline of one possible solution for how we might implement read. There is still a lot more work to be done before this PR is ready, but it would be useful if you could take a look before I go any further with this approach.

I would start by looking at:

tests/examples/test_ecephys_data_read.cpp which shows an example of how read works for the user
BaseIO then defines the main new classes used for reading and HDF5IO then implements the actual reading
Container and ElectricalSeries also have some relevant changes to allow us to construct Container objects for read and how we can get specific datasets/attributes

oruebel · 2024-09-01T22:26:28Z

@stephprince I just added a documentation page as well, which hopefully helps explain the current proposed design for read so we can review and discuss.

oruebel · 2024-10-23T00:21:55Z

We could modify the tests.yml to run all the tests with both states, i.e., with the code as is in the PR as well as with the temporary merge. This would make the CI runtime longer (since all tests would run twice) and make the workflow a bit longer but would help with finding merge errors.

We decided not to do this and to continue testing only the merged version. We also decided to add a note in the developer docs to clarify this behavior.

oruebel · 2024-12-22T01:36:02Z

@stephprince I synced the branch with the main branch. However, Windows tests are currently failing due to 'boost/multi_array.hpp': No such file or directory. Can you check the Windows action to make sure boost is being installed correctly.

oruebel · 2024-12-22T09:41:42Z

> Windows tests are currently failing due to 'boost/multi_array.hpp': No such file or directory

Windows tests are working again. I had to add boost multi-array to the Windows Action to fix the include error for boost/multi_array.hpp, and replace the variable-length arrays in HDF5IO with std::vector, because gcc has an extension that supports variable-length arrays but the default compiler on Windows apparently does not.
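As a small illustration of that portability fix (not the actual HDF5IO code), a buffer whose size is only known at run time can be allocated with `std::vector` instead of a variable-length array. `sumFirst` is a hypothetical function used only for this sketch.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Sums the integers 1..count using a buffer sized at run time.
// Writing `int buffer[count];` here would be a variable-length array, which
// is not standard C++ and only compiles where the compiler offers it as an
// extension. std::vector is the portable replacement.
inline int sumFirst(std::size_t count) {
    std::vector<int> buffer(count);               // run-time size
    std::iota(buffer.begin(), buffer.end(), 1);   // fill with 1, 2, ..., count
    return std::accumulate(buffer.begin(), buffer.end(), 0);
}
```

When a C-style API needs a raw pointer to the buffer, `buffer.data()` provides one without giving up the portable allocation.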

oruebel added 15 commits August 30, 2024 22:36

Define base classes for reading

1e1f041

First draft of reading datasets and attributes

e58f50f

Split reading of attribute and dataset to separate functions

250a66b

Remove debug print

e1bfa2e

Add functions to construct ReadDatasetWrapper and ReadAttributeWrapper from BaseIO

4aedefd

Fix formatting

4dae171

Add test for using the ReadDatasetWrapper

140809e

Start refactor containers for read

4d7b1ea

Fix format

2db4bfe

Read ElectricalSeries.data example working

b2959d2

Revert change to tests used for debugging

37d30d3

Fix codespell issue

6cfe416

Move read example to an example file

3d86aff

Fix bug in Container::create

3fc1d7a

Add example for data read

b09a4fa

oruebel requested a review from stephprince September 1, 2024 08:45

Fix spelling error

970f4c8

Add user docs for data read

26fbae9

oruebel added 6 commits September 1, 2024 16:35

Update read user docs

4299f99

Add toc

9c9301c

Add read software design figure and more details on the design

447337a

Some adjustment to the edges in the fig

e3d4330

Some adjustment to the edges in the fig

77f0861

Add intro for data read page

5190731

oruebel mentioned this pull request Sep 3, 2024

Propose refactor of I/O class organization #88

Open

oruebel added 3 commits September 2, 2024 22:33

Make Container::create inline

f08e42f

Make DataBlock::fromGeneric inline

1a754b9

Implement function to allow us to get the storage object type for a given path

ebfbdf1

oruebel and others added 5 commits October 22, 2024 15:29

Merge branch 'add_read' into add_container_read

415926c

Fix build error due to error during merge with base branch

a93cadd

Apply suggestions from code review

2554667

Co-authored-by: Steph Prince

Fix read.dox figure based on suggestion

6c57527

Fix docs based on code review

a7cf936

stephprince added 9 commits December 12, 2024 10:58

update getAttribute method to detect object type

3d89d78

update read example

c22d658

add file mode options and readonly mode to io

a91cc91

update example to use readonly

b20388c

add getter functions to Container classes with attributes

20afbae

fix formatting

20813c2

update lint workflow

f211978

add NWBFile fields

0b7d181

update filenames

a5cda51

This was referenced Dec 19, 2024

Add developer documentation note on merge commits for PRs #113

Open

Implement read for compound data types and references #114

Open

stephprince added 2 commits December 18, 2024 21:35

remove duplicate file mode checks

6a7bb45

Merge pull request #91 from NeurodataWithoutBorders/add_container_read

bfa740a

Add read for neurodata_types, e.g., Container, TimeSeries

oruebel mentioned this pull request Dec 20, 2024

Support read with classes using REGISTER_SUBCLASS_WITH_TYPENAME #115

Open

oruebel commented Dec 20, 2024

src/nwb/RegisteredType.hpp (resolved)

oruebel added 3 commits December 21, 2024 17:01

Merge branch 'main' into add_read

ddb8f50

Fix spelling and complete merge

54d55da

Fix missing merge in file

8492bf7

Install Boost multi-array in windows action

9ed9bd3

Use std::vector instead of variable-length arrays to avoid Windows built errors

f6b306b

oruebel mentioned this pull request Dec 22, 2024

Allow virtual function for DEFINE_FIELD #117

Open

Add note to docs to clarify non-virtual DEFINE_FIELD

7b88122

	NWBFile::NWBFile(const std::string& path, std::shared_ptr<IO::BaseIO> io)
	: Container("/", io) // Always use "/" for the path
	{
	// Report the problem only when the caller passed a non-root path
	if (path != "/") {
	std::cerr << "NWBFile object is always the root. Path must be /" << std::endl;
	}
	assert(path == "/");
	}
