To install from PyPI, run:
python3 -m pip install yamlprocessor
This project provides two command line utilities: yp-data and yp-schema.
The yp-data utility allows automation of the following in a single command:
- Modularisation of YAML files via a simple include mechanism.
- Variable substitutions in string values.
- Environment and pre-defined variables.
- Date-time variables, based on the current time and/or a reference time.
- Validation using JSON schema.
The yp-schema utility is a complement to the YAML modularisation / include
functionality provided by yp-data. It allows users to break up a monolithic
JSON schema file into a set of subschema files.
Command line:
yp-data [options] input-file-name output-file-name
Type yp-data --help for a list of options, and see below for usage detail.
Python:
from yamlprocessor.dataprocess import DataProcessor
processor = DataProcessor()
processor.process_data(in_file_name, out_file_name)
Allow modularisation of YAML files using a controlled include file mechanism, backed by dividing the original JSON schema file into a set of subschema files.
Consider a YAML file hello.yaml:
hello:
  - location: earth
    targets:
      - human
      - cat
      - dog
  - location: mars
    targets:
      - martian
  # And so on
And its associated JSON schema file:
{
  "properties": {
    "hello": {
      "items": {
        "properties": {
          "location": {
            "type": "string"
          },
          "targets": {
            "items": {
              "type": "string"
            },
            "minItems": 1,
            "type": "array",
            "uniqueItems": true
          }
        },
        "required": ["location", "targets"],
        "type": "object"
      },
      "type": "array"
    }
  },
  "required": ["hello"],
  "type": "object"
}
We want to modularise the YAML file in this way.
Let's call this hello-root.yaml:
hello:
  - INCLUDE: earth.yaml
  - INCLUDE: mars.yaml
Where earth.yaml contains:
location: earth
targets:
  - human
  - cat
  - dog
And mars.yaml contains:
location: mars
targets:
  - martian
At runtime, we can run the yp-data INFILE OUTFILE command to process and
recombine the YAML files.
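Conceptually, the include step is a recursive walk over the parsed data that replaces each INCLUDE mapping with the content of the named file. The following is a minimal Python sketch of that idea, not yp-data's implementation: expand_includes is a hypothetical name, the files are assumed to be already parsed into Python data, and the load callable (also hypothetical) stands in for reading and parsing an include file. Mappings that also carry a QUERY are left alone here.

```python
def expand_includes(data, load):
    """Recursively replace {INCLUDE: name} mappings with loaded data.

    load is a hypothetical callable mapping an include file name to its
    already-parsed YAML content.
    """
    if isinstance(data, dict):
        # Only a bare INCLUDE mapping is expanded; QUERY is not handled.
        if set(data) == {'INCLUDE'}:
            # Included content may itself contain INCLUDE mappings.
            return expand_includes(load(data['INCLUDE']), load)
        return {key: expand_includes(value, load) for key, value in data.items()}
    if isinstance(data, list):
        return [expand_includes(item, load) for item in data]
    return data


# In-memory stand-ins for earth.yaml and mars.yaml.
files = {
    'earth.yaml': {'location': 'earth', 'targets': ['human', 'cat', 'dog']},
    'mars.yaml': {'location': 'mars', 'targets': ['martian']},
}
root = {'hello': [{'INCLUDE': 'earth.yaml'}, {'INCLUDE': 'mars.yaml'}]}
print(expand_includes(root, files.__getitem__))
```

Running it prints the same structure as hello.yaml, with both list items recombined under hello.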
To split the schema to support these YAML files, however, we'll use the
yp-schema SCHEMA-FILE CONFIG-FILE command.
For this command to work, CONFIG-FILE must supply settings that tell it
where to split up the schema, in the syntax:
{
"OUTPUT-ROOT-SCHEMA-FILENAME": "",
"OUTPUT-SUB-SCHEMA-FILENAME-1": "JMESPATH-1",
/* and so on */
}
We must have a root schema output file name, which maps to an empty string.
The rest of the entries are output file names for the subschemas, each mapped
to a JMESPath (https://jmespath.org/) expression that tells the yp-schema
command where to split the JSON schema into subschemas. For the example above,
we can use the settings:
{
"hello.schema.json": "",
"hello-location.schema.json": "properties.hello.items"
}
The resulting hello.schema.json will look like this, which can be used to
validate both hello.yaml and hello-root.yaml:
{
  "properties": {
    "hello": {
      "items": {
        "oneOf": [
          {"$ref": "hello-location.schema.json"},
          {
            "properties": {
              "INCLUDE": {
                "type": "string"
              },
              "QUERY": {
                "type": "string"
              }
            },
            "required": ["INCLUDE"],
            "type": "object"
          }
        ]
      },
      "type": "array"
    }
  },
  "required": ["hello"],
  "type": "object"
}
The resulting hello-location.schema.json will look like this, which can be
used to validate earth.yaml and mars.yaml:
{
  "properties": {
    "location": {
      "type": "string"
    },
    "targets": {
      "items": {
        "type": "string"
      },
      "minItems": 1,
      "type": "array",
      "uniqueItems": true
    }
  },
  "required": ["location", "targets"],
  "type": "object"
}
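The splitting step can be sketched in Python as follows. This is not yp-schema's implementation: split_schema and INCLUDE_SCHEMA are hypothetical names, only plain dotted key paths such as properties.hello.items are supported (a small subset of JMESPath), and the replacement node follows the oneOf shape of the root schema output, with the include alternative typed as an object because it must accept a mapping.

```python
import copy

# Alternative accepted in place of a split-out subschema: an INCLUDE mapping
# with an optional QUERY.
INCLUDE_SCHEMA = {
    'properties': {'INCLUDE': {'type': 'string'}, 'QUERY': {'type': 'string'}},
    'required': ['INCLUDE'],
    'type': 'object',
}


def split_schema(schema, path, ref):
    """Split the subschema at a dotted key path into its own document.

    Returns (root_schema, sub_schema); the input schema is not modified.
    """
    root = copy.deepcopy(schema)
    *parents, last = path.split('.')
    node = root
    for key in parents:
        node = node[key]
    sub = node[last]
    # The root now accepts either the referenced subschema or an INCLUDE.
    node[last] = {'oneOf': [{'$ref': ref}, copy.deepcopy(INCLUDE_SCHEMA)]}
    return root, sub


hello_schema = {
    'properties': {'hello': {'items': {'type': 'object'}, 'type': 'array'}},
    'required': ['hello'],
    'type': 'object',
}
root, sub = split_schema(hello_schema, 'properties.hello.items',
                         'hello-location.schema.json')
print(sub)   # {'type': 'object'}
```

Writing each returned dictionary to its own file with json.dump would then produce the root and subschema documents.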
Consider an example where we want to include only a subset of the data structure from the include file. We can use a JMESPath query to achieve this.
For example, we may have something like this in hello-root.yaml:
hello:
  INCLUDE: planets.yaml
  QUERY: "[?type=='rocky'].{location: location, targets: targets}"
Where planets.yaml contains:
- location: earth
  type: rocky
  targets:
    - human
    - cat
    - dog
- location: mars
  type: rocky
  targets:
    - martian
- location: jupiter
  type: gaseous
  targets:
    - ...
Running yp-data hello-root.yaml will return:
hello:
  - location: earth
    targets:
      - human
      - cat
      - dog
  - location: mars
    targets:
      - martian
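To make the QUERY expression concrete, here is a plain-Python equivalent of [?type=='rocky'].{location: location, targets: targets}. It is only an illustration (rocky_locations is a hypothetical name); the jmespath Python package can evaluate the expression itself via jmespath.search(expression, data), though the source does not say which JMESPath implementation yp-data uses.

```python
def rocky_locations(planets):
    """Plain-Python equivalent of the QUERY expression
    [?type=='rocky'].{location: location, targets: targets}
    """
    return [
        # The multiselect hash keeps only the location and targets keys.
        {'location': planet['location'], 'targets': planet['targets']}
        for planet in planets
        # The [?type=='rocky'] filter; an item without a type never matches.
        if planet.get('type') == 'rocky'
    ]


planets = [
    {'location': 'earth', 'type': 'rocky', 'targets': ['human', 'cat', 'dog']},
    {'location': 'mars', 'type': 'rocky', 'targets': ['martian']},
    {'location': 'jupiter', 'type': 'gaseous', 'targets': []},
]
print(rocky_locations(planets))
```

The printed list matches the value of hello in the output above: earth and mars, each without its type field.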
You can tell yp-data to look for a JSON schema file and validate the current
YAML file by adding a #!<SCHEMA-URI> line to the beginning of the YAML file.
The SCHEMA-URI is a string pointing to the location of a JSON schema file.
Some simple assumptions apply:
- If SCHEMA-URI is a normal URI with a leading scheme, e.g. https://, it is used as-is.
- If SCHEMA-URI does not have a leading scheme and exists in the local file system, then it is also used as-is.
- Otherwise, if the YP_SCHEMA_PREFIX environment variable is defined or if --schema-prefix=PREFIX is specified, then the prefix will be added to the front of the value of the SCHEMA-URI.
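The lookup rules above can be sketched in Python. This is an illustration under stated assumptions, not yp-data's code: read_schema_uri and resolve_schema_uri are hypothetical names, an explicit prefix argument (standing in for --schema-prefix) is assumed to win over YP_SCHEMA_PREFIX, and returning the value unchanged when no prefix is available is also an assumption.

```python
import os
from urllib.parse import urlparse


def read_schema_uri(text):
    """Return SCHEMA-URI from a leading '#!' line, or None if absent."""
    first_line = text.split('\n', 1)[0]
    return first_line[2:].strip() if first_line.startswith('#!') else None


def resolve_schema_uri(schema_uri, prefix=None):
    """Apply the SCHEMA-URI lookup rules (hypothetical helper)."""
    # A URI with a leading scheme, e.g. https://, is used as-is.
    if urlparse(schema_uri).scheme:
        return schema_uri
    # A scheme-less value that exists in the local file system is used as-is.
    if os.path.exists(schema_uri):
        return schema_uri
    # Otherwise prepend the prefix. Assumption: an explicit prefix (as if
    # from --schema-prefix) takes precedence over YP_SCHEMA_PREFIX.
    if prefix is None:
        prefix = os.environ.get('YP_SCHEMA_PREFIX')
    # Assumption: with no prefix at all, the value is returned unchanged.
    return prefix + schema_uri if prefix else schema_uri


uri = read_schema_uri('#!hello.schema.json\nhello: []\n')
print(resolve_schema_uri(uri, prefix='https://example.org/schemas/'))
```

Unless a local file named hello.schema.json exists, this prints https://example.org/schemas/hello.schema.json (the prefix URL here is a made-up example).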
Process variable substitution syntax for string values in YAML files. Consider:
key: ${SWEET_HOME}/sugar.txt
(Note: You can write $SWEET_HOME or ${SWEET_HOME} here.)
If SWEET_HOME is defined in the environment and has the value /home/sweet,
then running yp-data on the above will give:
key: /home/sweet/sugar.txt
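Python's string.Template happens to accept the same two spellings, $NAME and ${NAME}, so it makes a compact illustration of the substitution. It is a stand-in, not necessarily how yp-data works: substitute is a hypothetical name, Template also treats $$ as an escape (yp-data may not), and this sketch leaves undefined names untouched, which says nothing about how yp-data handles them.

```python
import string


def substitute(value, variables):
    """Expand $NAME and ${NAME} references in a string value."""
    # safe_substitute leaves unknown names (and stray '$') untouched
    # instead of raising KeyError.
    return string.Template(value).safe_substitute(variables)


print(substitute('${SWEET_HOME}/sugar.txt', {'SWEET_HOME': '/home/sweet'}))
# /home/sweet/sugar.txt
```

Passing os.environ as variables gives the environment-driven behaviour described above.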
You can also use the --define=NAME=VALUE (-D NAME=VALUE) option of yp-data
to define and/or override environment variables.
E.g., yp-data -D SWEET_HOME=/home/sweet provides another way to specify the
value of a variable to use for substitution.
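One way to picture the override is to merge the NAME=VALUE definitions over a copy of the environment before substitution. This Python sketch is hypothetical (build_variables is not part of yp-data), and splitting on the first = so that a value may itself contain = is an assumption.

```python
import os


def build_variables(defines, environ=None):
    """Merge --define NAME=VALUE strings over the environment (hypothetical)."""
    variables = dict(os.environ if environ is None else environ)
    for item in defines:
        # Split on the first '=' only, so a VALUE may itself contain '='.
        name, sep, value = item.partition('=')
        if not sep or not name:
            raise ValueError(f'expected NAME=VALUE, got {item!r}')
        variables[name] = value  # a definition overrides the environment
    return variables


print(build_variables(['SWEET_HOME=/home/sweet'], {'SWEET_HOME': '/home/sour'}))
# {'SWEET_HOME': '/home/sweet'}
```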
The yp-data application also supports date-time substitution using a
similar syntax, for variable names starting with YP_TIME_NOW (the time
when yp-data starts running) or YP_TIME_REF (the reference time,
specified using the YP_TIME_REF_VALUE environment variable
or the --time-ref=VALUE command line option). If no value is set
for the reference time, any reference to the reference time will
simply use the current time.
You can use one or more of these trailing suffixes to apply deltas to the date-time:
- _PLUS_XXX: adds the duration to the date-time.
- _MINUS_XXX: subtracts the duration from the date-time.
- _AT_XXX: sets individual fields of the date-time. E.g. _AT_T0H will set the hour of the day part of the date-time to 00 hours.
where XXX is date-time duration-like syntax in the form nYnMnDTnHnMnS, e.g.:
- 12Y is 12 years.
- 1M2D is 1 month and 2 days.
- 1DT12H is 1 day and 12 hours.
- T12H30M is 12 hours and 30 minutes.
Examples (for the sake of argument, let's assume the current time is
2022-02-01T10:11:18Z and we have set the reference time to
2024-12-25T00:00:00Z):
${YP_TIME_NOW} # 2022-02-01T10:11:18+0000
${YP_TIME_NOW_AT_T0H0M0S} # 2022-02-01T00:00:00+0000
${YP_TIME_NOW_AT_T0H0M0S_PLUS_T12H} # 2022-02-01T12:00:00+0000
${YP_TIME_REF} # 2024-12-25T00:00:00+0000
${YP_TIME_REF_AT_1DT18H} # 2024-12-01T18:00:00+0000
${YP_TIME_REF_PLUS_T6H30M} # 2024-12-25T06:30:00+0000
${YP_TIME_REF_MINUS_1D} # 2024-12-24T00:00:00+0000
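The suffix arithmetic can be sketched in Python with only the standard library. This is an illustration, not yp-data's implementation: parse_duration, shift and set_fields are hypothetical names, and applying years and months before the other fields (clamping the day to the length of the target month) is an assumption. Output uses Python's default str() form rather than yp-data's formats.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone

# nYnMnDTnHnMnS: an M before the T means months, an M after it means minutes.
DURATION = re.compile(
    r'(?:(?P<Y>\d+)Y)?(?:(?P<mo>\d+)M)?(?:(?P<D>\d+)D)?'
    r'(?:T(?:(?P<H>\d+)H)?(?:(?P<mi>\d+)M)?(?:(?P<S>\d+)S)?)?'
)
FIELD_NAMES = {'Y': 'year', 'mo': 'month', 'D': 'day',
               'H': 'hour', 'mi': 'minute', 'S': 'second'}


def parse_duration(text):
    """Return the fields present in a duration-like string, e.g. 1DT18H."""
    match = DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f'invalid duration: {text!r}')
    return {key: int(value) for key, value in match.groupdict().items()
            if value is not None}


def shift(dt, text, sign=1):
    """_PLUS_ (sign=1) or _MINUS_ (sign=-1): move dt by the duration."""
    fields = parse_duration(text)
    # Apply years and months first, clamping the day to the target month.
    months = dt.month - 1 + sign * (12 * fields.get('Y', 0) + fields.get('mo', 0))
    year, month = dt.year + months // 12, months % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    dt = dt.replace(year=year, month=month, day=day)
    # Then apply days, hours, minutes and seconds as a plain time delta.
    return dt + sign * timedelta(days=fields.get('D', 0), hours=fields.get('H', 0),
                                 minutes=fields.get('mi', 0), seconds=fields.get('S', 0))


def set_fields(dt, text):
    """_AT_: set each field named in the duration-like string."""
    return dt.replace(**{FIELD_NAMES[key]: value
                         for key, value in parse_duration(text).items()})


now = datetime(2022, 2, 1, 10, 11, 18, tzinfo=timezone.utc)
ref = datetime(2024, 12, 25, tzinfo=timezone.utc)
print(shift(set_fields(now, 'T0H0M0S'), 'T12H'))  # 2022-02-01 12:00:00+00:00
print(set_fields(ref, '1DT18H'))                  # 2024-12-01 18:00:00+00:00
print(shift(ref, '1D', sign=-1))                  # 2024-12-24 00:00:00+00:00
```

Chaining calls mirrors chaining suffixes: each suffix is applied to the result of the one before it.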
You can control date-time output formats using the
--time-format=[NAME=]FORMAT option or YP_TIME_FORMAT[_<NAME>]
environment variables.
For example, if you set:
- --time-format='%FT%T%z' (default)
- --time-format=CTIME='%a %e %b %T %Z %Y' or export YP_TIME_FORMAT_CTIME='%a %e %b %T %Z %Y'
- --time-format=ABBR='%Y%m%dT%H%M%S%z' or export YP_TIME_FORMAT_ABBR='%Y%m%dT%H%M%S%z'
Then:
${YP_TIME_REF} # 2024-12-25T00:00:00+0000
${YP_TIME_REF_FORMAT_CTIME} # Wed 25 Dec 00:00:00 GMT 2024
${YP_TIME_REF_PLUS_T12H_FORMAT_ABBR} # 20241225T120000+0000
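The format values are ordinary strftime patterns. This Python sketch (format_time and FORMATS are hypothetical names) renders the default and ABBR examples; %F and %T are spelled out in full. The CTIME example is omitted because the time zone name printed by %Z depends on how the zone is represented.

```python
from datetime import datetime, timedelta, timezone

# Named formats, mirroring the --time-format settings above.
FORMATS = {
    None: '%Y-%m-%dT%H:%M:%S%z',  # default, equivalent to '%FT%T%z'
    'ABBR': '%Y%m%dT%H%M%S%z',
}


def format_time(dt, name=None):
    """Render a date-time with the default or a named format."""
    return dt.strftime(FORMATS[name])


ref = datetime(2024, 12, 25, tzinfo=timezone.utc)
print(format_time(ref))                                # 2024-12-25T00:00:00+0000
print(format_time(ref + timedelta(hours=12), 'ABBR'))  # 20241225T120000+0000
```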
Finally, if a variable name is already defined in the environment or in a
--define=... option, then the defined value takes precedence. So if you have
already run export YP_TIME_REF=whatever, you will get the value whatever
instead of the reference time.