fix(base_quantity): Removed string return from serialization of pint …
…quantity. (#28)

List of changes:
- Removed str(input_value.magnitude) to return a proper number instead
  of a string when calling `.model_dump()`
- Added a specific serialization option when calling
  `.model_dump(mode="json")` or `.model_dump_json()` to return a string
  version of the field. This is only important for pint quantities, such
  that it returns something like `10 kV` that is easily deserialized
  with pint,
- Added better type check for instances of the BaseQuantity used on the
  pydantic model,
- Relaxed the schema serialization since pint will handle most of it,
- Added better typehint for pydantic classmethods
pesap committed Jun 5, 2024
0 parents commit c2f6b70
Showing 74 changed files with 14,308 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 6af4bc603a8056b95636de2aea193ceb
tags: 645f666f9bcd5a90fca523b33c5a78b7
Empty file added .nojekyll
Empty file.
45 changes: 45 additions & 0 deletions _sources/explanation/components.md.txt
@@ -0,0 +1,45 @@
```{eval-rst}
.. _components-page:
```
# Components
A component is any element that is attached to a system.

All components are required to define a name as a string (it is required in the base class). This
may not be appropriate for all classes. The `Location` class in this package is one example. In
cases like that developers can define their own name field and set its default value to `""`.

Refer to the [Components API](#components-api) for more information.

## Inheritance
Recommended rule: A `Component` that has subclasses should never be directly instantiated.

Consider a scenario where a developer defines a `Load` class and then later decides a new load is
needed because of one custom field.

The temptation may be to create `CustomLoad(Load)`. This is very problematic in the design of
the infrasys API. There will be no way to retrieve only `Load` instances. Consider this example:

```python
for load in system.get_components(Load):
    print(load.name)
```

This will retrieve both `Load` and `CustomLoad` instances.

Instead, our recommendation is to create a base class with the common fields.

```python
class LoadBase(Component):
    """Defines common fields for all Loads."""

    common_field1: float
    common_field2: float

class Load(LoadBase):
    """A load component"""

class CustomLoad(LoadBase):
    """A custom load component"""

    custom_field: float
```
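The effect of this pattern can be sketched with plain Python classes. The `get_components` helper below is a hypothetical stand-in that filters by `isinstance`; that is an assumption about how type-based retrieval behaves, not the `infrasys` implementation, and the default field values are only there to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str


@dataclass
class LoadBase(Component):
    """Defines common fields for all loads; never instantiated directly."""

    common_field1: float = 0.0
    common_field2: float = 0.0


@dataclass
class Load(LoadBase):
    """A load component"""


@dataclass
class CustomLoad(LoadBase):
    """A custom load component"""

    custom_field: float = 0.0


def get_components(components, component_type):
    """Hypothetical stand-in: return every component that is an instance of component_type."""
    return [c for c in components if isinstance(c, component_type)]


components = [Load(name="load1"), CustomLoad(name="load2", custom_field=1.5)]

# Because Load and CustomLoad are siblings, each can be retrieved on its own.
only_loads = get_components(components, Load)
# The shared base class still retrieves both kinds of load.
all_loads = get_components(components, LoadBase)
```

With sibling classes, asking for `Load` no longer pulls in `CustomLoad`, while asking for `LoadBase` still returns every load.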
16 changes: 16 additions & 0 deletions _sources/explanation/index.md.txt
@@ -0,0 +1,16 @@
```{eval-rst}
.. _explanation-page:
```
# Explanation

```{eval-rst}
.. toctree::
    :maxdepth: 2
    :caption: Contents:

    system
    components
    time_series
    location
    serialization
```
4 changes: 4 additions & 0 deletions _sources/explanation/location.md.txt
@@ -0,0 +1,4 @@
# Location
Components can compose this class in order to specify their geographic location.

Refer to the [Location API](#location-api) for more information.
122 changes: 122 additions & 0 deletions _sources/explanation/serialization.md.txt
@@ -0,0 +1,122 @@
# Serialization
This page describes how `infrasys` serializes a system and its components to JSON when a user calls
`System.to_json()` and de-serializes them when a user calls `System.from_json()`.

## Components
`infrasys` converts its nested dictionaries of components-by-type into a flat array. Each component
records metadata about its actual Python type into a field called `__metadata__`. Here is an example
of a serialized `Location` object. Note that it includes the module and type. `infrasys` uses this
information during de-serialization to dynamically import the type and construct it. This allows
serialization to work with types defined outside of `infrasys` as long as the user has imported
those types.

```json
{
  "uuid": "1e5f90ae-a386-4c8a-89ae-0ed123da3e26",
  "name": null,
  "x": 0.0,
  "y": 0.0,
  "crs": null,
  "__metadata__": {
    "fields": {
      "module": "infrasys.location",
      "type": "Location",
      "serialized_type": "base"
    }
  }
}
```
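The dynamic import described above can be sketched with the standard library's `importlib`. This is a minimal illustration, not the `infrasys` implementation: the `deserialize` helper is hypothetical, and `datetime.timedelta` stands in for a component type so that the example runs without any third-party package.

```python
import importlib


def deserialize(record: dict):
    """Hypothetical helper: rebuild an object from a record carrying `__metadata__`."""
    data = dict(record)  # copy so the caller's record is not modified
    fields = data.pop("__metadata__")["fields"]
    # Import the module named in the metadata, then look up the type by name.
    module = importlib.import_module(fields["module"])
    cls = getattr(module, fields["type"])
    # The remaining keys are passed to the constructor as keyword arguments.
    return cls(**data)


# Stand-in record: the "component" here is a datetime.timedelta.
record = {
    "days": 1,
    "hours": 2,
    "__metadata__": {
        "fields": {"module": "datetime", "type": "timedelta", "serialized_type": "base"}
    },
}
obj = deserialize(record)
```

Because the type is located by module path and name, the same approach works for types defined outside the serializing package, provided the module is importable.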

### Composed components
There are many cases where one component will contain an instance of another component. For example,
a `Bus` may contain a `Location` or a `Generator` may contain a `Bus`. When serializing each
component, `infrasys` checks the type of each of that component's fields. If a value is another
component (which means that it must also be attached to the system), `infrasys` replaces that instance
with its UUID. It does this to avoid duplicating data in the JSON file.

Here is an example of a serialized `Bus`. Note the value for the `coordinates` field. It contains the
type and UUID of the actual `coordinates`. During de-serialization, `infrasys` will detect this
condition and only attempt to de-serialize the bus once all `Location` instances have been
de-serialized.

```json
{
  "uuid": "e503984a-3285-43b6-84c2-805eb3889210",
  "name": "bus1",
  "voltage": 1.1,
  "coordinates": {
    "__metadata__": {
      "fields": {
        "module": "infrasys.location",
        "type": "Location",
        "serialized_type": "composed_component",
        "uuid": "1e5f90ae-a386-4c8a-89ae-0ed123da3e26"
      }
    }
  },
  "__metadata__": {
    "fields": {
      "module": "tests.models.simple_system",
      "type": "SimpleBus",
      "serialized_type": "base"
    }
  }
}
```
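One way to picture the ordering constraint is a loop that defers any record whose referenced UUIDs have not been built yet. The sketch below works on plain dictionaries; `resolve_references` and `is_reference` are hypothetical names, the short UUIDs are made up, and the actual `infrasys` algorithm may differ.

```python
def is_reference(value):
    """Return True if value is a placeholder for a composed component."""
    if not isinstance(value, dict):
        return False
    fields = value.get("__metadata__", {}).get("fields", {})
    return fields.get("serialized_type") == "composed_component"


def resolve_references(records):
    """Hypothetical sketch: build components only once their references exist."""
    built = {}
    pending = list(records)
    while pending:
        deferred = []
        for record in pending:
            refs = {
                key: value["__metadata__"]["fields"]["uuid"]
                for key, value in record.items()
                if key != "__metadata__" and is_reference(value)
            }
            if all(uuid in built for uuid in refs.values()):
                component = {k: v for k, v in record.items() if k != "__metadata__"}
                for key, uuid in refs.items():
                    component[key] = built[uuid]  # swap placeholder for real object
                built[record["uuid"]] = component
            else:
                deferred.append(record)
        if len(deferred) == len(pending):
            raise ValueError("unresolvable component references")
        pending = deferred
    return built


base = {"fields": {"module": "m", "type": "T", "serialized_type": "base"}}
location_ref = {
    "__metadata__": {
        "fields": {
            "module": "infrasys.location",
            "type": "Location",
            "serialized_type": "composed_component",
            "uuid": "loc-1",
        }
    }
}
records = [
    {"uuid": "bus-1", "name": "bus1", "coordinates": location_ref, "__metadata__": base},
    {"uuid": "loc-1", "x": 0.0, "y": 0.0, "__metadata__": base},
]
components = resolve_references(records)
```

Even though the bus appears first in the list, it is only built after its location, and both end up sharing a single location object.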

#### Denormalized component data
There are cases where users may prefer to have the full, denormalized JSON data for a component.
All components are subclasses of `pydantic.BaseModel` and so implement the method `model_dump_json`.

Here is an example of a bus serialized that way (`bus.model_dump_json(indent=2)`):

```json
{
  "uuid": "e503984a-3285-43b6-84c2-805eb3889210",
  "name": "bus1",
  "voltage": 1.1,
  "coordinates": {
    "uuid": "1e5f90ae-a386-4c8a-89ae-0ed123da3e26",
    "name": null,
    "x": 0.0,
    "y": 0.0,
    "crs": null
  }
}
```

### Pint Quantities
`infrasys` encodes metadata into component JSON when that component contains a `pint.Quantity`
instance. Here is an example of such a component:

```json
{
  "uuid": "711d2724-5814-4e0e-be5f-4b0b825b7f07",
  "name": "test",
  "distance": {
    "value": 2,
    "units": "meter",
    "__metadata__": {
      "fields": {
        "module": "infrasys.quantities",
        "type": "Distance",
        "serialized_type": "quantity"
      }
    }
  },
  "__metadata__": {
    "fields": {
      "module": "tests.test_serialization",
      "type": "ComponentWithPintQuantity",
      "serialized_type": "base"
    }
  }
}
```
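The shape of the `distance` entry above can be reproduced with a small round-trip sketch. `StandInQuantity`, `encode_quantity`, and `decode_quantity` are hypothetical names used only for illustration; a real implementation would work with `pint.Quantity` objects rather than this stand-in dataclass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInQuantity:
    """Hypothetical stand-in for a pint quantity: a magnitude plus a unit name."""

    magnitude: float
    units: str


def encode_quantity(quantity, module, type_name):
    """Encode a quantity as value, units, and type metadata."""
    return {
        "value": quantity.magnitude,  # kept as a number, not a string
        "units": quantity.units,
        "__metadata__": {
            "fields": {
                "module": module,
                "type": type_name,
                "serialized_type": "quantity",
            }
        },
    }


def decode_quantity(record):
    """Rebuild the stand-in quantity from its encoded form."""
    fields = record["__metadata__"]["fields"]
    if fields["serialized_type"] != "quantity":
        raise ValueError("record does not describe a quantity")
    return StandInQuantity(record["value"], record["units"])


encoded = encode_quantity(StandInQuantity(2, "meter"), "infrasys.quantities", "Distance")
decoded = decode_quantity(encoded)
```

Keeping the magnitude numeric means consumers of the JSON do not have to parse a string to recover the value.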

## Time Series
If the user stores time series data in Arrow files (default behavior), then `infrasys` will copy
the Arrow files into the user-specified directory in `system.to_json()`.

If the user instead chose to store time series in memory, then `infrasys` will serialize that data
into Arrow files in the user-specified directory in `system.to_json()`.
60 changes: 60 additions & 0 deletions _sources/explanation/system.md.txt
@@ -0,0 +1,60 @@
# System
The System class provides a data store for components and time series data.

Refer to the [System API](#system-api) for complete information.

## Items to consider for parent packages

### Composition vs Inheritance
Parent packages must choose one of the following:

1. Derive a custom System class that inherits from `infrasys.System`. Re-implement methods
   as desired. Add custom attributes to the System that will be serialized to JSON.

   - Reimplement `System.add_components` in order to perform custom validation or custom behavior.
     This is only needed for validation that needs information from both the system and the
     component. Note that the `System` constructor provides the keyword argument
     `auto_add_composed_components` that dictates how to handle the condition where a component
     contains another component which is not already attached to the system.

   - Reimplement `System.serialize_system_attributes` and `System.deserialize_system_attributes`.
     `infrasys` will call those methods during `to_json` and `from_json` and serialize/de-serialize
     the contents.

   - Reimplement `System.data_format_version` and `System.handle_data_format_upgrade`. `infrasys`
     will call the upgrade function if it detects a version change during de-serialization.

2. Implement an independent System class and compose the `infrasys.System`. This can be beneficial
   if you want to make the underlying system opaque to users.

   - This pattern requires that you call `System.to_json()` with the keyword argument `data` set
     to a dictionary containing your system's attributes. `infrasys` will add its contents to a
     field called `system` inside that dictionary.

3. Use `infrasys.System` directly. This is probably not what most packages want because they will
   not be able to serialize custom attributes or implement specialized behavior as discussed above.
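The composition option (item 2) can be sketched as follows. `StandInSystem` is a hypothetical stand-in that only mimics the `data` keyword behavior described above, so the example runs without `infrasys`; `PackageSystem` and `base_power` are likewise made-up names, and what the real `System.to_json()` does when `data` is omitted is not covered here.

```python
import json
import tempfile
from pathlib import Path


class StandInSystem:
    """Hypothetical stand-in for infrasys.System (not the real class)."""

    def __init__(self, name):
        self.name = name

    def to_json(self, filename, data=None):
        # Mimic the documented behavior: place this system's contents
        # in a field called "system" inside the caller's dictionary.
        payload = {} if data is None else dict(data)
        payload["system"] = {"name": self.name}
        Path(filename).write_text(json.dumps(payload, indent=2))


class PackageSystem:
    """Independent system class that composes the underlying system."""

    def __init__(self, name, base_power):
        self._system = StandInSystem(name)  # kept opaque to users
        self.base_power = base_power

    def to_json(self, filename):
        # Pass the package's own attributes through the `data` keyword.
        self._system.to_json(filename, data={"base_power": self.base_power})


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "system.json"
    PackageSystem("demo", base_power=100.0).to_json(path)
    saved = json.loads(path.read_text())
```

The resulting file holds the package's attributes at the top level and the underlying system's contents under `system`, so the parent package controls the outer format.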

### Units
`infrasys` uses the [pint library](https://pint.readthedocs.io/en/stable/) to help manage units.
Package developers should consider storing fields that are quantities as subtypes of
[BaseQuantity](#base-quantity-api). Pint performs unit conversion automatically when performing
arithmetic.

If you want to be able to generate JSON schema for a model that contains a Pint quantity, you must
add an annotation as shown below. Otherwise, Pydantic will raise an exception.

```python
from typing import Annotated

from pydantic import WithJsonSchema
from infrasys import Component
from infrasys.quantities import Distance

class ComponentWithPintQuantity(Component):

    distance: Annotated[Distance, WithJsonSchema({"type": "string"})]

ComponentWithPintQuantity.model_json_schema()
```

**Notes**:
- `infrasys` includes some basic quantities in [infrasys.quantities](#quantity-api).
- Pint will automatically convert a list or list of lists of values into a `numpy.ndarray`.
  `infrasys` will handle serialization/de-serialization of these types.
48 changes: 48 additions & 0 deletions _sources/explanation/time_series.md.txt
@@ -0,0 +1,48 @@
# Time Series
`infrasys` supports time series data expressed as a one-dimensional array of floats
using the class [SingleTimeSeries](#singe-time-series-api). Users must provide a `variable_name`
that is typically the field of a component being modeled. For example, if the user has a time array
associated with the active power of a generator, they would assign
`variable_name = "active_power"`.

Here is an example of how to create an instance of `SingleTimeSeries`:

```python
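import random
from datetime import datetime, timedelta

# Import path assumed; adjust if your infrasys version exposes the class elsewhere.
from infrasys.time_series_models import SingleTimeSeries

time_series = SingleTimeSeries.from_array(
    data=[random.random() for x in range(24)],
    variable_name="active_power",
    initial_time=datetime(year=2030, month=1, day=1),
    resolution=timedelta(hours=1),
)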
```

Users can attach their own attributes to each time array. For example,
there might be different profiles for different scenarios or model years.

```python
time_series = SingleTimeSeries.from_array(
    data=[random.random() for x in range(24)],
    variable_name="active_power",
    initial_time=datetime(year=2030, month=1, day=1),
    resolution=timedelta(hours=1),
    scenario="high",
    model_year="2035",
)
```

## Behaviors
Users can customize time series behavior with these flags passed to the `System` constructor:

- `time_series_in_memory`: The `System` stores each array of data in an Arrow file by default. This
is a binary file that enables efficient storage and row access. Set this flag to store the data in
memory instead.
- `time_series_read_only`: The default behavior allows users to add and remove time series data.
Set this flag to disable mutation. That can be useful if you are de-serializing a system, won't be
changing it, and want to avoid copying the data.
- `time_series_directory`: The `System` stores time series data on the computer's tmp filesystem by
default. This filesystem may be of limited size. If your data will exceed that limit, such as what
is likely to happen on an HPC compute node, set this parameter to an alternate location (such as
`/tmp/scratch` on NREL's HPC systems).

Refer to the [Time Series API](#time-series-api) for more information.
12 changes: 12 additions & 0 deletions _sources/how_tos/index.md.txt
@@ -0,0 +1,12 @@
```{eval-rst}
.. _how-tos-page:
```
# How Tos

```{eval-rst}
.. toctree::
    :maxdepth: 2
    :caption: Contents:

    list_time_series
```
63 changes: 63 additions & 0 deletions _sources/how_tos/list_time_series.md.txt
@@ -0,0 +1,63 @@
# How to list existing time series data

Suppose that you have added multiple time series arrays to your components using differing
names and attributes. How can you see what is present?

This example assumes that a system with two generators and time series data has been serialized
to a file.

```python
from infrasys import Component, System

system = System.from_json("system.json")
for component in system.get_components(Component):
    for metadata in system.list_time_series_metadata(component):
        print(f"{component.label}: {metadata.label} {metadata.user_attributes}")

Generator.gen1: SingleTimeSeries.active_power {'scenario': 'high', 'model_year': '2030'}
Generator.gen1: SingleTimeSeries.active_power {'scenario': 'high', 'model_year': '2035'}
Generator.gen1: SingleTimeSeries.active_power {'scenario': 'low', 'model_year': '2030'}
Generator.gen1: SingleTimeSeries.active_power {'scenario': 'low', 'model_year': '2035'}
Generator.gen1: SingleTimeSeries.reactive_power {'scenario': 'high', 'model_year': '2030'}
Generator.gen1: SingleTimeSeries.reactive_power {'scenario': 'high', 'model_year': '2035'}
Generator.gen1: SingleTimeSeries.reactive_power {'scenario': 'low', 'model_year': '2030'}
Generator.gen1: SingleTimeSeries.reactive_power {'scenario': 'low', 'model_year': '2035'}
Generator.gen2: SingleTimeSeries.active_power {'scenario': 'high', 'model_year': '2030'}
Generator.gen2: SingleTimeSeries.active_power {'scenario': 'high', 'model_year': '2035'}
Generator.gen2: SingleTimeSeries.active_power {'scenario': 'low', 'model_year': '2030'}
Generator.gen2: SingleTimeSeries.active_power {'scenario': 'low', 'model_year': '2035'}
Generator.gen2: SingleTimeSeries.reactive_power {'scenario': 'high', 'model_year': '2030'}
Generator.gen2: SingleTimeSeries.reactive_power {'scenario': 'high', 'model_year': '2035'}
Generator.gen2: SingleTimeSeries.reactive_power {'scenario': 'low', 'model_year': '2030'}
Generator.gen2: SingleTimeSeries.reactive_power {'scenario': 'low', 'model_year': '2035'}
```

Now you can retrieve the exact instance you want.

```python
system.time_series.get(gen1, variable_name="active_power", scenario="high", model_year="2035").data
<pyarrow.lib.Int64Array object at 0x107a38d60>
[
0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
...
8774,
8775,
8776,
8777,
8778,
8779,
8780,
8781,
8782,
8783
]
```
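The listing shows why the variable name alone is not enough to identify an array: each name appears once per combination of user attributes. The sketch below illustrates that matching logic with a stand-in metadata class. `StandInMetadata` and `find_metadata` are hypothetical names, not part of the `infrasys` API.

```python
from dataclasses import dataclass, field


@dataclass
class StandInMetadata:
    """Hypothetical stand-in for one entry of the listing above."""

    variable_name: str
    user_attributes: dict = field(default_factory=dict)


def find_metadata(items, variable_name, **user_attributes):
    """Return the single entry matching the name and every given attribute."""
    matches = [
        m
        for m in items
        if m.variable_name == variable_name
        and all(m.user_attributes.get(k) == v for k, v in user_attributes.items())
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


# Rebuild the combinations shown for one generator in the listing.
items = [
    StandInMetadata(name, {"scenario": scenario, "model_year": year})
    for name in ("active_power", "reactive_power")
    for scenario in ("high", "low")
    for year in ("2030", "2035")
]

selected = find_metadata(items, "active_power", scenario="high", model_year="2035")
```

Omitting `model_year` in this sketch would match two entries and raise `LookupError`, which is why the retrieval call above passes both attributes.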