Usage
This page provides an overview of how to use the exdir-cpp library once it has been built. It assumes that you have either installed the libraries and headers into the default locations, or placed them somewhere else that you know how to point your compiler and linker at.
To use exdir-cpp in your code, include the exdir/exdir.hpp header file in your source file. This should be the only header you need to include; it includes the other exdir-cpp header files, giving you access to all of the library's features. All of the library's features exist inside the exdir namespace.
An Exdir directory is represented as an exdir::File object. You can create a new Exdir directory with
exdir::File exdir_file = exdir::create_file("name/of/file.exdir");
The argument can be either an std::string or an std::filesystem::path object giving the location where you would like the directory to be. This line of code will create the required folder and the exdir.yaml file within it. If a directory with that name already exists, a runtime error will be thrown.
If you would like to open an Exdir directory which already exists on your system, you can then use
exdir::File exdir_file("name/of/existing/file.exdir");
where the argument can again be an std::string or std::filesystem::path object.
To access a Group which already exists, both File and Group objects have a method called get_group, which takes a string with the name of the group. Continuing the previous example, if there is a group called "group_name" in exdir_file, then we can access it with
exdir::Group group = exdir_file.get_group("group_name");
A list of all groups contained in a File or parent Group can be obtained with the member_groups() method, which returns an std::vector<std::string> object naming all the groups which are present within that object. A new group can be added with
exdir::Group group2 = exdir_file.create_group("group_name_2");
A Dataset is used to contain numerical data, and can be read from a group or file with their get_dataset("name") method, which returns a Dataset object. Doing so reads the stored numerical data into the object, ready to be retrieved. Because C++ is statically typed, the data must be cast to the appropriate type when it is retrieved. To help with this, an enum called DType exists in the exdir namespace. The possible values of a DType and the C++ types they represent are:
- CHAR = char
- UCHAR = unsigned char
- INT16 = int16_t
- INT32 = int32_t
- INT64 = int64_t (int)
- UINT16 = uint16_t
- UINT32 = uint32_t
- UINT64 = uint64_t
- FLOAT32 = float
- DOUBLE64 = double
So, when a dataset is loaded, its data type can be checked with
exdir::Dataset data = exdir_file.get_dataset("data_set_1");
exdir::DType data_dtype = data.dtype();
To actually get the data out in a usable manner, it must then be retrieved from the object, at which point it is also cast to the desired type. Should data_dtype == exdir::DType::DOUBLE64 be true, then it is certainly safe to cast the data to double. Data is always retrieved into an exdir::Array object.
exdir::Array data_array = data.retrieve_data<double>();
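The dtype check described above can be wrapped in a small compile-time helper. The following is a minimal, standalone sketch: the DType enum here is a local stand-in that mirrors the list above, and the names DTypeOf and safe_to_retrieve_as are hypothetical helpers for illustration, not part of exdir-cpp.

```cpp
#include <cstdint>

// Local stand-in mirroring the DType values listed above. With the real
// library you would use exdir::DType from exdir/exdir.hpp instead.
enum class DType {
  CHAR, UCHAR, INT16, INT32, INT64, UINT16, UINT32, UINT64, FLOAT32, DOUBLE64
};

// Hypothetical trait: maps a C++ type to its DType tag. The primary template
// is left undefined, so an unsupported type is rejected at compile time.
template <typename T> struct DTypeOf;
template <> struct DTypeOf<char>          { static constexpr DType value = DType::CHAR; };
template <> struct DTypeOf<unsigned char> { static constexpr DType value = DType::UCHAR; };
template <> struct DTypeOf<std::int16_t>  { static constexpr DType value = DType::INT16; };
template <> struct DTypeOf<std::int32_t>  { static constexpr DType value = DType::INT32; };
template <> struct DTypeOf<std::int64_t>  { static constexpr DType value = DType::INT64; };
template <> struct DTypeOf<std::uint16_t> { static constexpr DType value = DType::UINT16; };
template <> struct DTypeOf<std::uint32_t> { static constexpr DType value = DType::UINT32; };
template <> struct DTypeOf<std::uint64_t> { static constexpr DType value = DType::UINT64; };
template <> struct DTypeOf<float>         { static constexpr DType value = DType::FLOAT32; };
template <> struct DTypeOf<double>        { static constexpr DType value = DType::DOUBLE64; };

// Hypothetical helper: true when the stored dtype matches T exactly, i.e.
// when retrieving the data as T involves no conversion at all.
template <typename T>
bool safe_to_retrieve_as(DType stored) {
  return stored == DTypeOf<T>::value;
}
```

With the real library, the value passed in would be the result of data.dtype(), and a true result would be followed by data.retrieve_data<double>().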
From the Array, the data can be accessed in place, or be transferred to the user's preferred container. Once the data has been retrieved and copied into the Array, the data in the Dataset object is erased to conserve memory. If you wish to fill another Array with the same data, you must first re-load the data into the Dataset object, using
data.load_data();
Alternatively, if data has been loaded (which can be verified with the method data.data_loaded(), which returns true if data is retrievable), you can clear the data from the Dataset object using
data.clear_data();
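The load, retrieve, and clear lifecycle described above can be pictured with a toy model. This is only a sketch under stated assumptions: ToyDataset is a hypothetical class that keeps its "on disk" values in memory, it is not the library's implementation, and throwing when nothing is loaded is an assumption made here rather than documented exdir-cpp behavior.

```cpp
#include <stdexcept>
#include <utility>
#include <vector>

// Toy stand-in for a Dataset, for illustration only.
class ToyDataset {
 public:
  // Pretend `on_disk` is the file content; it is loaded on construction,
  // mirroring how get_dataset is described as reading the data in.
  explicit ToyDataset(std::vector<double> on_disk)
      : on_disk_(std::move(on_disk)) {
    load_data();
  }

  // (Re-)read the values from "disk" into the in-memory buffer.
  void load_data() {
    buffer_ = on_disk_;
    loaded_ = true;
  }

  // True while there is data available to retrieve.
  bool data_loaded() const { return loaded_; }

  // Drop the in-memory copy without retrieving it.
  void clear_data() {
    buffer_.clear();
    buffer_.shrink_to_fit();
    loaded_ = false;
  }

  // Hand the data to the caller and erase it from this object.
  std::vector<double> retrieve_data() {
    if (!loaded_) throw std::runtime_error("no data loaded");  // assumption
    std::vector<double> out = std::move(buffer_);
    clear_data();
    return out;
  }

 private:
  std::vector<double> on_disk_;
  std::vector<double> buffer_;
  bool loaded_ = false;
};
```

In this model, a second retrieve_data() call only succeeds after load_data() has been called again, which is the same ordering the text above describes for the real Dataset.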
Creation of a new Dataset also requires an exdir::Array object. Assuming the Array already exists, then one uses
group2.create_dataset("dataset_name", data_array_2);
where data_array_2 is of type exdir::Array. Note that this method does not return a Dataset object for the newly created Dataset. If you would like to access the dataset further (to add attributes, for example), you must first get the object with
exdir::Dataset data_2 = group2.get_dataset("dataset_name");
Arrays are multi-dimensional objects, used to send and receive data from exdir::Dataset objects. They are similar to the NumPy arrays which exist in Python. The data is stored in a 1D vector, and the linear index is calculated from the provided indices depending on whether the stored data is C-contiguous or Fortran-contiguous. An array can be obtained from a Dataset, but can also be created from an std::vector.
std::vector<int> array_data { 1, 2, 3, 4,
5, 6, 7, 8,
9, 10, 11, 12,
13, 14, 15, 16};
std::vector<size_t> array_shape {4,4};
bool array_c_contiguous = true;
exdir::DType array_dtype = exdir::DType::INT64;
exdir::Array<int> test_array(array_data, array_shape, array_dtype, array_c_contiguous);
This example creates a 4x4 matrix of integers. The shape is defined by a vector of type size_t, with one element per dimension. Once created, elements can be accessed using the () operator:
int array_element = test_array(2,2); // array_element = 11 based on the previous array
std::vector<size_t> indicies {2,2};
int array_element_2 = test_array(indicies); // array_element_2 = 11 as well
Index values can be passed directly to the () operator, or they can be put into a vector and passed as a single object. There must be one index per dimension of the Array; if this is not the case, an error will be thrown. The data may also be accessed by its linear index using the [] operator. Like a NumPy array, an exdir::Array may also be reshaped by passing a new vector of size_t values. The new shape must contain the same total number of elements as the old one.
test_array.reshape({4,2,2});
/*
test_array now resembles an array of the form
[[[ 1, 2],
[ 3, 4]],
[[ 5, 6],
[ 7, 8]],
[[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16]]]
*/
The linear index of each data element has remained the same, but an element is no longer indexed with the same values. It also now requires three arguments to index this Array with the () operator, instead of two.
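To make the relationship between multi-dimensional indices and the linear index concrete, here is a minimal standalone sketch of the usual row-major (C) and column-major (Fortran) index calculation. The function name linear_index is hypothetical and the bounds check is an assumption added for safety; this illustrates the general technique, not the library's own code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: converts one index per dimension into a position in
// the flat 1D storage, for either C or Fortran ordering.
std::size_t linear_index(const std::vector<std::size_t>& indices,
                         const std::vector<std::size_t>& shape,
                         bool c_contiguous) {
  if (indices.size() != shape.size()) {
    throw std::invalid_argument("one index per dimension is required");
  }
  const std::size_t n = shape.size();
  std::size_t linear = 0;
  std::size_t stride = 1;
  for (std::size_t k = 0; k < n; ++k) {
    // C order: the last dimension varies fastest, so walk it first.
    // Fortran order: the first dimension varies fastest.
    const std::size_t d = c_contiguous ? n - 1 - k : k;
    if (indices[d] >= shape[d]) {
      throw std::out_of_range("index exceeds the extent of its dimension");
    }
    linear += indices[d] * stride;
    stride *= shape[d];
  }
  return linear;
}
```

For the C-contiguous 4x4 example above, linear_index({2, 2}, {4, 4}, true) gives 10, the position of the value 11. After the reshape to {4, 2, 2}, the same position is reached with linear_index({2, 1, 0}, {4, 2, 2}, true), which matches the reshaped layout shown in the comment.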
Raws can be accessed or created with commands similar to those of groups.
exdir::Raw raw = data_set.get_raw("raw_data");
exdir::Raw raw2 = data_set.create_raw("more_raw_data");
Other objects may not be added to Raws. Raws do, however, have a special method which returns a vector of strings naming the files inside the directory.
std::vector<std::string> raw_files = raw.member_file();
All objects (File, Group, Dataset, Raw) are able to contain attributes, which describe properties of the object. These are accessed through the attrs member of the object. This public member is actually a raw yaml-cpp node. It works somewhat like a dictionary, taking an std::string as a key. You can set an attribute for an object with something along the lines of
group2.attrs["density"] = 2.3;
To learn more about how to use the attributes and yaml-cpp nodes, take a look at the yaml-cpp tutorial.
Changes made to attributes are saved when the object is destroyed. If you would like to write them to disk before then, you can use the write() method:
group2.write();