# [FEATURE] Pydantic backend to Data Validation #61
TL;DR - This PR is derived from issue #58 to automatically support data validation using Pydantic, a JSON- and JSON Schema-friendly validation library.

At this point, the PR only defines the schema and basic validations - I have not supplied any means to integrate it into the current library, so all existing behaviour with `SigMFFile` remains.

## Changes
A number of files are added within the `component` directory (renamed?), the main one being the `pydantic_metadata.py` script, which contains a Pydantic definition of the JSON Schema as specified on the main SigMF repository.

The `pydantic_metadata.py` script defines the SigMF Metadata Standard, which includes the following (a simplified sketch of how these nest appears after the list):

- `SigMFGlobalInfo` - the global info
- `SigMFCapture` - a single SigMF capture
- `SigMFAnnotation` - a single SigMF annotation
- `SigMFMetaFileSchema` - a single metadata file (in `.sigmf-meta` format) containing the global info, a list of captures and a list of annotations
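A minimal sketch of the nesting (the inner classes are stubbed out here, and the actual field names/aliases in `pydantic_metadata.py` may differ):

```python
from typing import List

from pydantic import BaseModel, Field


class SigMFGlobalInfo(BaseModel): ...   # stand-in for the real class
class SigMFCapture(BaseModel): ...      # stand-in for the real class
class SigMFAnnotation(BaseModel): ...   # stand-in for the real class


class SigMFMetaFileSchema(BaseModel):
    # aliases map the attribute names onto the top-level keys of a .sigmf-meta file
    global_info: SigMFGlobalInfo = Field(alias="global")
    captures: List[SigMFCapture] = Field(default_factory=list)
    annotations: List[SigMFAnnotation] = Field(default_factory=list)
```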
## Features

To the best of my ability, these classes mirror the defined JSON Schema standard and go above and beyond in many ways, including the following features (a rough sketch of the approach appears after this list):

- `core:datatype`, version and DOI strings utilise regex patterns to ensure compliance (see `pydantic_types.py`).
- `core:version` (GlobalInfo), `core:uuid` (Annotation) and `core:datetime` (Capture) use default factories to fill in automatically upon creation if not defined beforehand (auto-filling timestamps, version numbers etc.).
- `core:collection`, `core:dataset` and `core:license` use `pathlib.Path` and `HttpUrl` objects, which supply extra functionality from Python core libraries when instantiated.
- Integer fields (e.g. `core:sample_start`) check for a non-negative or positive integer.
- Cross-field validation covers `core:dataset` and `core:metadata_only`.
- Ordering of captures by `core:sample_start`.
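A rough sketch of what those field definitions look like (illustrative only; the real patterns and field sets live in `pydantic_metadata.py` and `pydantic_types.py`):

```python
from typing import Optional
from uuid import uuid4

from pydantic import BaseModel, Field, HttpUrl


class GlobalInfoSketch(BaseModel):
    # placeholder pattern -- the real datatype regex is defined in pydantic_types.py
    datatype: str = Field(alias="core:datatype", pattern=r"^[cr][fiu]\d{1,2}(_le|_be)?$")
    # in the real code a default factory fills the spec version automatically
    version: str = Field(alias="core:version", pattern=r"^\d+\.\d+\.\d+$")
    license: Optional[HttpUrl] = Field(alias="core:license", default=None)  # validated as a URL


class AnnotationSketch(BaseModel):
    sample_start: int = Field(alias="core:sample_start", ge=0)  # non-negative integer
    uuid: str = Field(alias="core:uuid", default_factory=lambda: str(uuid4()))  # auto-filled if absent
```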
## How to use
### Creating an object

I've added a helper method `SigMFMetaFileSchema.from_file()` which takes a `.sigmf-meta` file path and returns the Pydantic object for it.
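For example (the file name is a placeholder, and the import path may change with any renaming):

```python
from sigmf.component.pydantic_metadata import SigMFMetaFileSchema

meta = SigMFMetaFileSchema.from_file("recording.sigmf-meta")
```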
### Using the object

All of the attributes are reachable by using their name, e.g. `core:version` becomes `obj.global_info.version`.
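Continuing the example above (field names assumed from that convention):

```python
print(meta.global_info.version)       # core:version
print(meta.captures[0].sample_start)  # core:sample_start of the first capture
```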
### Exporting an object
Once a `SigMFMetaFileSchema` object is created, it can be exported to a dictionary using `.model_dump()`, or to a JSON string (prior to storage in a file, or sending over the network) using the `.model_dump_json(by_alias=True, exclude_none=True)` method. Setting `by_alias` and `exclude_none` to `True` is important to ensure the core attributes all begin with `core:` etc.
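For example:

```python
as_dict = meta.model_dump(by_alias=True, exclude_none=True)       # nested dictionary
as_json = meta.model_dump_json(by_alias=True, exclude_none=True)  # JSON string with "core:"-prefixed keys
```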
### Accessing the schema
The JSON schema of the `SigMFMetaFileSchema` can be accessed using `.model_json_schema()`, allowing you to integrate with any legacy code using the schema.
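For example:

```python
import json

schema = SigMFMetaFileSchema.model_json_schema()  # JSON Schema as a plain dictionary
print(json.dumps(schema, indent=2))
```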
## Testing

I've supplied some unit tests which seem to cover the basic cases, although a few extra real examples would be pretty handy, and I haven't properly checked (yet) how its outputs compare to the current outputs from `SigMFFile`.

Current code coverage results (`pytest --cov=sigmf && coverage report`):

The pipeline I've been using is a Python 3.7 environment in Anaconda:
- `black sigmf/component`
- `ruff check sigmf/component --fix`
- `pylint sigmf/component` (gets a 9.95 out of 10 score)
- `mypy -m sigmf` raises no errors in my code

## Next steps
At the moment there is no code for manipulating the Pydantic objects (aside from creation) to keep controller functionality separate from the 'data' component.
However, supplying code to convert these objects into nested dictionaries / to a file should be trivial.
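For instance, a round trip back to disk could be as simple as (illustrative):

```python
from pathlib import Path

Path("copy.sigmf-meta").write_text(meta.model_dump_json(by_alias=True, exclude_none=True, indent=4))
```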
### Integration
Basically I'm seeking some guidance and ideas as to how to integrate this into the existing `sigmf-python` classes. I would suggest introducing this as an optional backend in the next version, with it becoming the default option in a later release - something like adding a `backend=pydantic` parameter to the `sigmf.sigmffile.fromfile` method, or similar.

Also happy for any changes to names / suggestions for the file or internal objects.
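Caller-side, that suggestion could look something like this (the `backend` parameter is hypothetical and does not exist today):

```python
import sigmf

# hypothetical opt-in flag -- not part of the current fromfile() signature
record = sigmf.sigmffile.fromfile("recording.sigmf-meta", backend="pydantic")
```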
### SigMF Collections
I've begun an implementation of the SigMF collection standard, but I'm less familiar with this object, so I need to play around with it some more.