[FEATURE] Pydantic backend to Data Validation #61

gregparkes · 2024-06-16T18:57:27Z

TL;DR - This PR follows on from issue #58 and adds automatic data validation using Pydantic, a validation library that works well with JSON and JSON Schema.

At this point, the PR only defines the schema and basic validations - I have not supplied any means to integrate it into the current library, so all existing behaviour with SigMFFile remains.

Changes

This PR adds a number of files within the component directory (the name is open to change). The main one is the pydantic_metadata.py script, which contains a Pydantic definition derived from the JSON Schema specified in the main SigMF repository.

The pydantic_metadata.py script defines the SigMF Metadata Standard which includes:

SigMFGlobalInfo - global_info
SigMFCapture - a single SigMF capture
SigMFAnnotation - a single SigMF annotation
SigMFMetaFileSchema - a single metadata file (in .sigmf-meta format) containing global, list of captures and list of annotations

Features

To the best of my ability, these classes mirror the defined JSON Schema standard, and they go beyond it in several ways:

The core:datatype, version and DOI strings use regex patterns to enforce compliance (see pydantic_types.py).
core:version (GlobalInfo), core:uuid (Annotation) and core:datetime (Capture) use default factories to fill in automatically on creation if not already defined (auto-filling timestamps, version numbers etc.).
core:collection, core:dataset and core:license use pathlib.Path and HttpUrl objects, which provide extra functionality from the Python standard library and Pydantic once instantiated.
Index attributes (such as core:sample_start) are checked to be non-negative or positive integers, as appropriate.
core:dataset and core:metadata_only are validated as mutually exclusive.
Captures and Annotations are automatically sorted by their respective core:sample_start.
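
A minimal sketch of how a few of the features above can be expressed in Pydantic v2. This is not the code from pydantic_metadata.py: the class names, the simplified datatype pattern and the field selection are all assumptions made for illustration.

```python
from typing import List, Optional
from uuid import uuid4

from pydantic import BaseModel, Field, ValidationError, model_validator

# Simplified, illustrative pattern; the one in the PR's pydantic_types.py may differ.
DATATYPE_PATTERN = r"^[cr](f32|f64|i32|i16|u32|u16|i8|u8)(_le|_be)?$"


class GlobalSketch(BaseModel):
    """Hypothetical stand-in for SigMFGlobalInfo."""

    datatype: str = Field(alias="core:datatype", pattern=DATATYPE_PATTERN)
    dataset: Optional[str] = Field(default=None, alias="core:dataset")
    metadata_only: Optional[bool] = Field(default=None, alias="core:metadata_only")

    @model_validator(mode="after")
    def dataset_excludes_metadata_only(self):
        # Mutual exclusivity: a dataset file cannot accompany a metadata-only recording.
        if self.dataset is not None and self.metadata_only:
            raise ValueError("core:dataset and core:metadata_only are mutually exclusive")
        return self


class AnnotationSketch(BaseModel):
    """Hypothetical stand-in for SigMFAnnotation."""

    # ge=0 rejects negative sample indices.
    sample_start: int = Field(alias="core:sample_start", ge=0)
    # The default factory fills in a fresh UUID when none is supplied.
    uuid: str = Field(default_factory=lambda: str(uuid4()), alias="core:uuid")


class MetaSketch(BaseModel):
    """Hypothetical stand-in for SigMFMetaFileSchema."""

    global_info: GlobalSketch = Field(alias="global")
    annotations: List[AnnotationSketch] = Field(default_factory=list)

    @model_validator(mode="after")
    def sort_annotations(self):
        # Sort in place so annotations are always ordered by core:sample_start.
        self.annotations.sort(key=lambda a: a.sample_start)
        return self


meta = MetaSketch.model_validate(
    {
        "global": {"core:datatype": "cf32_le"},
        "annotations": [{"core:sample_start": 100}, {"core:sample_start": 0}],
    }
)
print([a.sample_start for a in meta.annotations])  # [0, 100]
```

Invalid input (a malformed datatype string, a negative index, or both core:dataset and core:metadata_only set) raises pydantic.ValidationError at construction time.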

How to use

Creating an object

I've added a helper method SigMFMetaFileSchema.from_file() which takes a .sigmf-meta file path and returns the Pydantic object for it.
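
For readers unfamiliar with the pattern, a helper like this can be a thin classmethod around Pydantic's JSON validation. The sketch below is an assumption about the general shape, not the PR's implementation; MetaFileSketch and GlobalSketch are hypothetical, cut-down models.

```python
import json
import tempfile
from pathlib import Path
from typing import Union

from pydantic import BaseModel, Field


class GlobalSketch(BaseModel):
    """Hypothetical, cut-down global object."""

    datatype: str = Field(alias="core:datatype")
    version: str = Field(alias="core:version")


class MetaFileSketch(BaseModel):
    """Hypothetical, cut-down metadata file model."""

    global_info: GlobalSketch = Field(alias="global")

    @classmethod
    def from_file(cls, path: Union[str, Path]) -> "MetaFileSketch":
        """Read a .sigmf-meta file and validate its JSON content."""
        text = Path(path).read_text(encoding="utf-8")
        return cls.model_validate_json(text)


# Write a tiny metadata file to a temporary directory, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "example.sigmf-meta"
    content = {"global": {"core:datatype": "cf32_le", "core:version": "1.0.0"}}
    meta_path.write_text(json.dumps(content), encoding="utf-8")
    obj = MetaFileSketch.from_file(meta_path)

# Attributes are reached by their Python names rather than the "core:" keys.
print(obj.global_info.version)  # 1.0.0
```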

Using the object

All of the attributes are reachable by name, e.g. core:version becomes obj.global_info.version.

Exporting an object

Once a SigMFMetaFileSchema object is created, it can be exported to a dictionary with .model_dump(), or to a JSON string (for writing to file or sending over the network) with .model_dump_json(by_alias=True, exclude_none=True). Both flags matter: by_alias=True makes the exported keys use their SigMF names (core:version and so on) rather than the Python attribute names, and exclude_none=True omits optional fields that were never set.
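
The effect of the two flags is easiest to see side by side. The model below is a hypothetical two-field stand-in, assuming Pydantic v2; only the export calls are the ones named above.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field


class GlobalSketch(BaseModel):
    """Hypothetical, cut-down global object with one optional field."""

    datatype: str = Field(alias="core:datatype")
    author: Optional[str] = Field(default=None, alias="core:author")


info = GlobalSketch.model_validate({"core:datatype": "cf32_le"})

# Default export: Python attribute names, unset optionals kept as None.
plain = info.model_dump()
print(plain)  # {'datatype': 'cf32_le', 'author': None}

# SigMF-style export: alias keys, unset optionals dropped.
sigmf_style = info.model_dump(by_alias=True, exclude_none=True)
print(sigmf_style)  # {'core:datatype': 'cf32_le'}

# The JSON export accepts the same flags and returns a string.
as_json = info.model_dump_json(by_alias=True, exclude_none=True)
round_trip = GlobalSketch.model_validate_json(as_json)
```

Without by_alias=True the output would not be a valid .sigmf-meta document, since the keys would lack the core: namespace.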

Accessing the schema

The JSON schema of the SigMFMetaFileSchema can be accessed using .model_json_schema(), allowing you to integrate with any legacy code using the schema.
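
As a rough illustration of what .model_json_schema() returns, the sketch below uses a hypothetical single-capture model (not the PR's SigMFCapture). By default the generated schema uses the field aliases as property names, so the core: keys reappear.

```python
from typing import Optional

from pydantic import BaseModel, Field


class CaptureSketch(BaseModel):
    """Hypothetical, cut-down capture segment."""

    sample_start: int = Field(alias="core:sample_start", ge=0)
    frequency: Optional[float] = Field(default=None, alias="core:frequency")


schema = CaptureSketch.model_json_schema()

print(sorted(schema["properties"]))  # ['core:frequency', 'core:sample_start']
print(schema["required"])  # ['core:sample_start']

# Constraints declared on the field carry over into the schema.
print(schema["properties"]["core:sample_start"]["minimum"])  # 0
```

The result is a plain dictionary, so it can be passed to json.dumps or handed to any existing tooling that consumes JSON Schema.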

Testing

I've supplied some unit tests which seem to cover the basic cases, although a few extra real-world examples would be handy. I haven't yet properly checked how the outputs compare to the current outputs from SigMFFile.

Current code coverage results (pytest --cov=sigmf && coverage report):

Name	Stmts	Branch	BrPart	Cover
sigmf/component/__init__.py	1	0	0	100%
sigmf/component/extensions/__init__.py	1	0	0	100%
sigmf/component/extensions/core.py	8	0	0	100%
sigmf/component/geo_json.py	31	8	0	100%
sigmf/component/pydantic_metadata.py	110	24	2	99%
sigmf/component/pydantic_types.py	7	0	0	100%

The pipeline I've been using runs in a Python 3.7 environment in Anaconda:

black sigmf/component
ruff check sigmf/component --fix
pylint sigmf/component (gets a 9.95 out of 10 score)
mypy -m sigmf raises no errors in my code

Next steps

At the moment there is no code for manipulating the Pydantic objects (aside from creation) to keep controller functionality separate from the 'data' component.

However, supplying code to convert these objects into nested dictionaries or write them to file should be trivial.

Integration

I'm looking for guidance and ideas on how to integrate this into the existing sigmf-python classes.

I would suggest introducing this as an optional backend in the next version, with it becoming the default in a subsequent release.

Something like adding a backend="pydantic" parameter to sigmf.sigmffile.fromfile or similar.
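
To make the suggestion concrete, here is one possible shape for an opt-in backend switch. Everything in it is hypothetical: fromfile_sketch is not the real sigmf.sigmffile.fromfile, the "legacy" loader just returns a plain dictionary as a stand-in for the existing SigMFFile path, and MetaFileSketch is a cut-down model.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union

from pydantic import BaseModel, Field


class MetaFileSketch(BaseModel):
    """Hypothetical, cut-down metadata model."""

    global_info: Dict[str, Any] = Field(alias="global")


def _load_legacy(text: str) -> Dict[str, Any]:
    # Stand-in for the existing behaviour: no Pydantic involved.
    return json.loads(text)


def _load_pydantic(text: str) -> MetaFileSketch:
    # Opt-in path: parse and validate in one step.
    return MetaFileSketch.model_validate_json(text)


_BACKENDS = {"legacy": _load_legacy, "pydantic": _load_pydantic}


def fromfile_sketch(path: Union[str, Path], backend: str = "legacy"):
    """Load a metadata file with the chosen backend (default keeps old behaviour)."""
    try:
        loader = _BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(_BACKENDS)}"
        ) from None
    return loader(Path(path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "example.sigmf-meta"
    meta_path.write_text(
        json.dumps({"global": {"core:datatype": "cf32_le"}}), encoding="utf-8"
    )
    old_style = fromfile_sketch(meta_path)
    new_style = fromfile_sketch(meta_path, backend="pydantic")
```

Keeping the existing path as the default means current callers see no change, and flipping the default later is a one-line edit.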

Also happy for any changes to names / suggestions to file or internal objects.

SigMF Collections

I've begun an implementation of the SigMF collection standard, but I'm less familiar with this object so I need to play around with it some more.

777arc · 2024-06-20T05:38:51Z

Was pydantic_metadata.py entirely auto generated off the json schema or were there any manual tweaks that needed to be made?

gregparkes · 2024-06-23T18:09:00Z

> Was pydantic_metadata.py entirely auto generated off the json schema or were there any manual tweaks that needed to be made?

Unfortunately, a decent number of manual tweaks were needed. In particular, the autogeneration tool turned every key in the schema into a prefixed variable name, e.g. core:generator became core_generator.

This:

+ Maintains uniqueness of each variable, allows extensions to have the same variable name as a core attribute.
- Makes the variable names longer, which is annoying to write and read.
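
The trade-off can be seen in a small sketch. Both models below are hypothetical, and "myext" is a made-up extension namespace used only to show a name collision; in either style the alias carries the real SigMF key, so the exported document is identical.

```python
from typing import Optional

from pydantic import BaseModel, Field

# "myext:generator" is an invented extension key, used only for illustration.
DOC = {"core:generator": "my-tool", "myext:generator": "other-tool"}


class PrefixedNames(BaseModel):
    """Autogenerated style: the namespace is baked into every attribute name."""

    core_generator: Optional[str] = Field(default=None, alias="core:generator")
    myext_generator: Optional[str] = Field(default=None, alias="myext:generator")


class ShortNames(BaseModel):
    """Hand-tweaked style: shorter names, but colliding keys need renaming."""

    generator: Optional[str] = Field(default=None, alias="core:generator")
    # A second attribute called `generator` is impossible, so this one is renamed.
    ext_generator: Optional[str] = Field(default=None, alias="myext:generator")


prefixed = PrefixedNames.model_validate(DOC)
short = ShortNames.model_validate(DOC)

print(prefixed.core_generator, short.generator)  # my-tool my-tool
```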

The tool also generated mostly base Python types (e.g. int, str, float) for each attribute and did not supply any special typing, such as regex-compliant strings or positive integers (e.g. core:sample_count).

The custom validation and serialization code attached to each object is not generated either, because a number of the rules are specified in the SigMF standard documentation found here but not actually implemented in the underlying JSON schema. Examples are sorting the captures and annotations arrays by core:sample_start, or ensuring core:freq_upper_edge > core:freq_lower_edge. We solve this in Pydantic during the validation process, for instance by ensuring the arrays are sorted there.
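
A cross-field rule such as the frequency-edge check cannot be expressed by a per-field type, but it fits naturally into a model-level validator. The sketch below assumes Pydantic v2 and uses a hypothetical, cut-down annotation model; it only enforces the ordering when both edges are present.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError, model_validator


class AnnotationSketch(BaseModel):
    """Hypothetical, cut-down annotation with optional frequency edges."""

    sample_start: int = Field(alias="core:sample_start", ge=0)
    freq_lower_edge: Optional[float] = Field(default=None, alias="core:freq_lower_edge")
    freq_upper_edge: Optional[float] = Field(default=None, alias="core:freq_upper_edge")

    @model_validator(mode="after")
    def upper_edge_above_lower_edge(self):
        lower, upper = self.freq_lower_edge, self.freq_upper_edge
        # Only compare when both edges were supplied.
        if lower is not None and upper is not None and upper <= lower:
            raise ValueError("core:freq_upper_edge must be greater than core:freq_lower_edge")
        return self


good = AnnotationSketch.model_validate(
    {"core:sample_start": 0, "core:freq_lower_edge": 1.0e6, "core:freq_upper_edge": 2.0e6}
)

try:
    AnnotationSketch.model_validate(
        {"core:sample_start": 0, "core:freq_lower_edge": 2.0e6, "core:freq_upper_edge": 1.0e6}
    )
    swapped_edges_rejected = False
except ValidationError:
    swapped_edges_rejected = True
```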

Gregory Parkes added 3 commits June 15, 2024 20:11

Initial work (pydantic)

d020b13

Unit tests for SigMFMetaSchema, GeoJSON

57a36fd

Added Literal type to GeoJSON

a86af98

Added pydantic requirement to toml

f244009
