fix mlm:artifact_type check missing + update corresponding tests/examples #58

Closed
wants to merge 3 commits into from
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed
- Use JSON `$schema` version `2019-09` to allow use of `unevaluatedProperties` for stricter validation of MLM fields.
- Explicitly disallow `mlm:name`, `mlm:input`, `mlm:output` and `mlm:hyperparameters` at the Asset level.
These fields describe the model as a whole and should therefore be defined in Item properties.
- Moved `norm_type` to `value_scaling` object to better reflect the expected operation, which could be another
operation than what is typically known as "normalization" or "standardization" techniques in machine learning.
- Moved `statistics` to `value_scaling` object to better reflect their mutual `type` and additional
63 changes: 35 additions & 28 deletions README.md
@@ -116,34 +116,41 @@ The fields in the table below can be used in these parts of STAC documents:

[item-assets]: https://github.com/stac-extensions/item-assets

| Field Name | Type | Description |
|-----------------------------|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| mlm:name | string | **REQUIRED** A name for the model. This can include, but must be distinct, from simply naming the model architecture. If there is a publication or other published work related to the model, use the official name of the model. |
| mlm:architecture | [Model Architecture](#model-architecture) string | **REQUIRED** A generic and well established architecture name of the model. |
| mlm:tasks | \[[Task Enum](#task-enum)] | **REQUIRED** Specifies the Machine Learning tasks for which the model can be used for. If multi-tasks outputs are provided by distinct model heads, specify all available tasks under the main properties and specify respective tasks in each [Model Output Object](#model-output-object). |
| mlm:framework | string | Framework used to train the model (ex: PyTorch, TensorFlow). |
| mlm:framework_version | string | The `framework` library version. Some models require a specific version of the machine learning `framework` to run. |
| mlm:memory_size | integer | The in-memory size of the model on the accelerator during inference (bytes). |
| mlm:total_parameters | integer | Total number of model parameters, including trainable and non-trainable parameters. |
| mlm:pretrained | boolean | Indicates if the model was pretrained. If the model was pretrained, consider providing `pretrained_source` if it is known. |
| mlm:pretrained_source | string \| null | The source of the pretraining. Can refer to popular pretraining datasets by name (i.e. Imagenet) or less known datasets by URL and description. If trained from scratch (i.e.: `pretrained = false`), the `null` value should be set explicitly. |
| mlm:batch_size_suggestion | integer | A suggested batch size for the accelerator and summarized hardware. |
| mlm:accelerator | [Accelerator Type Enum](#accelerator-type-enum) \| null | The intended computational hardware that runs inference. If undefined or set to `null` explicitly, the model does not require any specific accelerator. |
| mlm:accelerator_constrained | boolean | Indicates if the intended `accelerator` is the only `accelerator` that can run inference. If undefined, it should be assumed `false`. |
| mlm:accelerator_summary | string | A high level description of the `accelerator`, such as its specific generation, or other relevant inference details. |
| mlm:accelerator_count | integer | A minimum amount of `accelerator` instances required to run the model. |
| mlm:input | \[[Model Input Object](#model-input-object)] | **REQUIRED** Describes the transformation between the EO data and the model input. |
| mlm:output | \[[Model Output Object](#model-output-object)] | **REQUIRED** Describes each model output and how to interpret it. |
| mlm:hyperparameters | [Model Hyperparameters Object](#model-hyperparameters-object) | Additional hyperparameters relevant for the model. |

To decide whether above fields should be applied under Item `properties` or under respective Assets, the context of
each field must be considered. For example, the `mlm:name` should always be provided in the Item `properties`, since
it relates to the model as a whole. In contrast, some models could support multiple `mlm:accelerator`, which could be
handled by distinct source code represented by different Assets. In such case, `mlm:accelerator` definitions should be
nested under their relevant Asset. If a field is defined both at the Item and Asset level, the value at the Asset level
would be considered for that specific Asset, and the value at the Item level would be used for other Assets that did
not override it for their respective reference. For some of the fields, further details are provided in following
sections to provide more precisions regarding some potentially ambiguous use cases.
| Field Name | Type | Description |
|-----------------------------------------|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| mlm:name <sup>[[1]][1]</sup> | string | **REQUIRED** A name for the model. This can include, but must be distinct, from simply naming the model architecture. If there is a publication or other published work related to the model, use the official name of the model. |
| mlm:architecture | [Model Architecture](#model-architecture) string | **REQUIRED** A generic and well established architecture name of the model. |
| mlm:tasks | \[[Task Enum](#task-enum)] | **REQUIRED** Specifies the Machine Learning tasks for which the model can be used for. If multi-tasks outputs are provided by distinct model heads, specify all available tasks under the main properties and specify respective tasks in each [Model Output Object](#model-output-object). |
| mlm:framework | string | Framework used to train the model (ex: PyTorch, TensorFlow). |
| mlm:framework_version | string | The `framework` library version. Some models require a specific version of the machine learning `framework` to run. |
| mlm:memory_size | integer | The in-memory size of the model on the accelerator during inference (bytes). |
| mlm:total_parameters | integer | Total number of model parameters, including trainable and non-trainable parameters. |
| mlm:pretrained | boolean | Indicates if the model was pretrained. If the model was pretrained, consider providing `pretrained_source` if it is known. |
| mlm:pretrained_source | string \| null | The source of the pretraining. Can refer to popular pretraining datasets by name (i.e. Imagenet) or less known datasets by URL and description. If trained from scratch (i.e.: `pretrained = false`), the `null` value should be set explicitly. |
| mlm:batch_size_suggestion | integer | A suggested batch size for the accelerator and summarized hardware. |
| mlm:accelerator | [Accelerator Type Enum](#accelerator-type-enum) \| null | The intended computational hardware that runs inference. If undefined or set to `null` explicitly, the model does not require any specific accelerator. |
| mlm:accelerator_constrained | boolean | Indicates if the intended `accelerator` is the only `accelerator` that can run inference. If undefined, it should be assumed `false`. |
| mlm:accelerator_summary | string | A high level description of the `accelerator`, such as its specific generation, or other relevant inference details. |
| mlm:accelerator_count | integer | A minimum amount of `accelerator` instances required to run the model. |
| mlm:input <sup>[[1]][1]</sup> | \[[Model Input Object](#model-input-object)] | **REQUIRED** Describes the transformation between the EO data and the model input. |
| mlm:output <sup>[[1]][1]</sup> | \[[Model Output Object](#model-output-object)] | **REQUIRED** Describes each model output and how to interpret it. |
| mlm:hyperparameters <sup>[[1]][1]</sup> | [Model Hyperparameters Object](#model-hyperparameters-object) | Additional hyperparameters relevant for the model. |

[1]: #1-allowed-only-in-item-properties

##### <sup>[1]</sup> Allowed Only in Item `properties`

> [!NOTE]
> Unless stated otherwise by <sup>[[1]][1]</sup> in the table, fields can be used at either the Item or Asset level.
> <br><br>
> To decide whether above fields should be applied under Item `properties` or under respective Assets, the context of
> each field must be considered. For example, the `mlm:name` should always be provided in the Item `properties`, since
> it relates to the model as a whole. In contrast, some models could support multiple `mlm:accelerator`, which could be
> handled by distinct source code represented by different Assets. In such case, `mlm:accelerator` definitions should be
> nested under their relevant Asset. If a field is defined both at the Item and Asset level, the value at the Asset
> level would be considered for that specific Asset, and the value at the Item level would be used for other Assets that
> did not override it for their respective reference. For some of the fields, further details are provided in following
> sections to provide more precisions regarding some potentially ambiguous use cases.
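
The Item/Asset rules described in the note can be illustrated with a minimal Python sketch. It is not the extension's JSON schema and not part of this change: the helper names `item_only_violations` and `resolve_field`, and the sample Item with its asset keys and values, are hypothetical and chosen only for illustration. The sketch assumes a STAC Item loaded as a plain dictionary with `properties` and `assets` keys.

```python
from typing import Any, Dict, List

# Fields marked [1] in the table: allowed only in Item properties.
ITEM_ONLY_FIELDS = {"mlm:name", "mlm:input", "mlm:output", "mlm:hyperparameters"}


def item_only_violations(item: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each Asset key to the Item-only MLM fields it defines."""
    violations: Dict[str, List[str]] = {}
    for key, asset in item.get("assets", {}).items():
        found = sorted(ITEM_ONLY_FIELDS & asset.keys())
        if found:
            violations[key] = found
    return violations


def resolve_field(item: Dict[str, Any], asset_key: str, field: str) -> Any:
    """Return the Asset-level value if defined, else the Item-level value."""
    asset = item.get("assets", {}).get(asset_key, {})
    if field in asset:
        # An explicit null at the Asset level still counts as an override.
        return asset[field]
    return item.get("properties", {}).get(field)


# Hypothetical Item used only to exercise the two helpers.
item = {
    "properties": {"mlm:name": "example-model", "mlm:accelerator": "cuda"},
    "assets": {
        "gpu-code": {"href": "./gpu.py"},
        "cpu-code": {"href": "./cpu.py", "mlm:accelerator": None},
        "weights": {"href": "./model.pt", "mlm:name": "misplaced-name"},
    },
}

print(item_only_violations(item))
print(resolve_field(item, "gpu-code", "mlm:accelerator"))
print(resolve_field(item, "cpu-code", "mlm:accelerator"))
```

Here the `weights` Asset would be reported because it defines `mlm:name`, the `gpu-code` Asset inherits `mlm:accelerator` from the Item `properties`, and the `cpu-code` Asset overrides it with an explicit `null`.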

In addition, fields from the multiple relevant extensions should be defined as applicable. See
[Best Practices - Recommended Extensions to Compose with the ML Model Extension](best-practices.md#recommended-extensions-to-compose-with-the-ml-model-extension)