stac-extensions · fmigneault · Nov 6, 2024
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -17,6 +17,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Changed
 - Use JSON `$schema` version `2019-09` to allow use of `unevaluatedProperties` for stricter validation of MLM fields.
+- Explicitly disallow `mlm:name`, `mlm:input`, `mlm:output` and `mlm:hyperparameters` at the Asset level.
+  These fields describe the model as a whole and should therefore be defined in Item properties.
 - Moved `norm_type` to `value_scaling` object to better reflect the expected operation, which could be another
   operation than what is typically known as "normalization" or "standardization" techniques in machine learning.
 - Moved `statistics` to `value_scaling` object to better reflect their mutual `type` and additional

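As an illustration of the Asset-level restriction described in the changelog entry above, the following Python sketch flags Assets that define any of the four Item-only fields. It is not part of the extension or its JSON schema: the function name `find_item_only_fields_in_assets` and the sample Item are hypothetical, and the Item is assumed to be a plain `dict` parsed from STAC JSON.

```python
# Fields that this change restricts to Item properties (taken from the changelog entry).
ITEM_ONLY_FIELDS = {"mlm:name", "mlm:input", "mlm:output", "mlm:hyperparameters"}


def find_item_only_fields_in_assets(item: dict) -> dict:
    """Map each asset key to the sorted Item-only MLM fields it defines (hypothetical helper)."""
    violations = {}
    for asset_key, asset in item.get("assets", {}).items():
        # Intersect the restricted field names with the keys present on the asset.
        misplaced = sorted(ITEM_ONLY_FIELDS.intersection(asset))
        if misplaced:
            violations[asset_key] = misplaced
    return violations


# Hypothetical Item: "model" wrongly repeats mlm:name, "source" is fine.
sample_item = {
    "properties": {"mlm:name": "example-model"},
    "assets": {
        "model": {"href": "model.pt", "mlm:name": "example-model", "mlm:accelerator": "cuda"},
        "source": {"href": "inference.py"},
    },
}
violations = find_item_only_fields_in_assets(sample_item)
```

Here only the `model` Asset would be reported, since `mlm:accelerator` remains allowed at the Asset level.
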
diff --git a/README.md b/README.md
@@ -116,34 +116,41 @@ The fields in the table below can be used in these parts of STAC documents:
 
 [item-assets]: https://github.com/stac-extensions/item-assets
 
-| Field Name                  | Type                                                          | Description                                                                                                                                                                                                                                                                                 |
-|-----------------------------|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| mlm:name                    | string                                                        | **REQUIRED** A name for the model. This can include, but must be distinct, from simply naming the model architecture. If there is a publication or other published work related to the model, use the official name of the model.                                                    |
-| mlm:architecture            | [Model Architecture](#model-architecture) string              | **REQUIRED** A generic and well established architecture name of the model.                                                                                                                                                                                                                 |
-| mlm:tasks                   | \[[Task Enum](#task-enum)]                                    | **REQUIRED** Specifies the Machine Learning tasks for which the model can be used for. If multi-tasks outputs are provided by distinct model heads, specify all available tasks under the main properties and specify respective tasks in each [Model Output Object](#model-output-object). |
-| mlm:framework               | string                                                        | Framework used to train the model (ex: PyTorch, TensorFlow).                                                                                                                                                                                                                   |
-| mlm:framework_version       | string                                                        | The `framework` library version. Some models require a specific version of the machine learning `framework` to run.                                                                                                                                                                         |
-| mlm:memory_size             | integer                                                       | The in-memory size of the model on the accelerator during inference (bytes).                                                                                                                                                                                                                |
-| mlm:total_parameters        | integer                                                       | Total number of model parameters, including trainable and non-trainable parameters.                                                                                                                                                                                                         |
-| mlm:pretrained              | boolean                                                       | Indicates if the model was pretrained. If the model was pretrained, consider providing `pretrained_source` if it is known.                                                                                                                                                                  |
-| mlm:pretrained_source       | string \| null                                                | The source of the pretraining. Can refer to popular pretraining datasets by name (i.e. Imagenet) or less known datasets by URL and description. If trained from scratch (i.e.: `pretrained = false`), the `null` value should be set explicitly.                                            |
-| mlm:batch_size_suggestion   | integer                                                       | A suggested batch size for the accelerator and summarized hardware.                                                                                                                                                                                                                         |
-| mlm:accelerator             | [Accelerator Type Enum](#accelerator-type-enum) \| null       | The intended computational hardware that runs inference. If undefined or set to `null` explicitly, the model does not require any specific accelerator.                                                                                                                                     |
-| mlm:accelerator_constrained | boolean                                                       | Indicates if the intended `accelerator` is the only `accelerator` that can run inference. If undefined, it should be assumed `false`.                                                                                                                                                       |
-| mlm:accelerator_summary     | string                                                        | A high level description of the `accelerator`, such as its specific generation, or other relevant inference details.                                                                                                                                                                        |
-| mlm:accelerator_count       | integer                                                       | A minimum amount of `accelerator` instances required to run the model.                                                                                                                                                                                                                      |
-| mlm:input                   | \[[Model Input Object](#model-input-object)]                  | **REQUIRED** Describes the transformation between the EO data and the model input.                                                                                                                                                                                                          |
-| mlm:output                  | \[[Model Output Object](#model-output-object)]                | **REQUIRED** Describes each model output and how to interpret it.                                                                                                                                                                                                                           |
-| mlm:hyperparameters         | [Model Hyperparameters Object](#model-hyperparameters-object) | Additional hyperparameters relevant for the model.                                                                                                                                                                                                                                          |
-
-To decide whether above fields should be applied under Item `properties` or under respective Assets, the context of
-each field must be considered. For example, the `mlm:name` should always be provided in the Item `properties`, since
-it relates to the model as a whole. In contrast, some models could support multiple `mlm:accelerator`, which could be
-handled by distinct source code represented by different Assets. In such case, `mlm:accelerator` definitions should be
-nested under their relevant Asset. If a field is defined both at the Item and Asset level, the value at the Asset level
-would be considered for that specific Asset, and the value at the Item level would be used for other Assets that did
-not override it for their respective reference. For some of the fields, further details are provided in following
-sections to provide more precisions regarding some potentially ambiguous use cases.
+| Field Name                              | Type                                                          | Description                                                                                                                                                                                                                                                                                 |
+|-----------------------------------------|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| mlm:name <sup>[[1]][1]</sup>            | string                                                        | **REQUIRED** A name for the model. This can include, but must be distinct from, simply naming the model architecture. If there is a publication or other published work related to the model, use the official name of the model.                                                           |
+| mlm:architecture                        | [Model Architecture](#model-architecture) string              | **REQUIRED** A generic and well established architecture name of the model.                                                                                                                                                                                                                 |
+| mlm:tasks                               | \[[Task Enum](#task-enum)]                                    | **REQUIRED** Specifies the Machine Learning tasks for which the model can be used. If multi-task outputs are provided by distinct model heads, specify all available tasks under the main properties and specify respective tasks in each [Model Output Object](#model-output-object).      |
+| mlm:framework                           | string                                                        | Framework used to train the model (ex: PyTorch, TensorFlow).                                                                                                                                                                                                                                |
+| mlm:framework_version                   | string                                                        | The `framework` library version. Some models require a specific version of the machine learning `framework` to run.                                                                                                                                                                         |
+| mlm:memory_size                         | integer                                                       | The in-memory size of the model on the accelerator during inference (bytes).                                                                                                                                                                                                                |
+| mlm:total_parameters                    | integer                                                       | Total number of model parameters, including trainable and non-trainable parameters.                                                                                                                                                                                                         |
+| mlm:pretrained                          | boolean                                                       | Indicates if the model was pretrained. If the model was pretrained, consider providing `pretrained_source` if it is known.                                                                                                                                                                  |
+| mlm:pretrained_source                   | string \| null                                                | The source of the pretraining. Can refer to popular pretraining datasets by name (e.g. ImageNet) or less known datasets by URL and description. If trained from scratch (i.e.: `pretrained = false`), the `null` value should be set explicitly.                                            |
+| mlm:batch_size_suggestion               | integer                                                       | A suggested batch size for the accelerator and summarized hardware.                                                                                                                                                                                                                         |
+| mlm:accelerator                         | [Accelerator Type Enum](#accelerator-type-enum) \| null       | The intended computational hardware that runs inference. If undefined or set to `null` explicitly, the model does not require any specific accelerator.                                                                                                                                     |
+| mlm:accelerator_constrained             | boolean                                                       | Indicates if the intended `accelerator` is the only `accelerator` that can run inference. If undefined, it should be assumed `false`.                                                                                                                                                       |
+| mlm:accelerator_summary                 | string                                                        | A high level description of the `accelerator`, such as its specific generation, or other relevant inference details.                                                                                                                                                                        |
+| mlm:accelerator_count                   | integer                                                       | A minimum amount of `accelerator` instances required to run the model.                                                                                                                                                                                                                      |
+| mlm:input <sup>[[1]][1]</sup>           | \[[Model Input Object](#model-input-object)]                  | **REQUIRED** Describes the transformation between the EO data and the model input.                                                                                                                                                                                                          |
+| mlm:output <sup>[[1]][1]</sup>          | \[[Model Output Object](#model-output-object)]                | **REQUIRED** Describes each model output and how to interpret it.                                                                                                                                                                                                                           |
+| mlm:hyperparameters <sup>[[1]][1]</sup> | [Model Hyperparameters Object](#model-hyperparameters-object) | Additional hyperparameters relevant for the model.                                                                                                                                                                                                                                          |
+
+[1]: #1-allowed-only-in-item-properties
+
+##### <sup>[1]</sup> Allowed Only in Item `properties`
+
+> [!NOTE]
+> Unless stated otherwise by <sup>[[1]][1]</sup> in the table, fields can be used at either the Item or Asset level.
+> <br><br>
+> To decide whether the above fields should be applied under Item `properties` or under respective Assets, consider
+> the context of each field. For example, `mlm:name` should always be provided in the Item `properties`, since
+> it relates to the model as a whole. In contrast, some models could support multiple `mlm:accelerator` values, each
+> handled by distinct source code represented by different Assets. In such a case, `mlm:accelerator` definitions should
+> be nested under their relevant Asset. If a field is defined at both the Item and Asset levels, the Asset-level value
+> applies to that specific Asset, and the Item-level value applies to the other Assets that do not override it.
+> For some of the fields, the following sections give further details to clarify potentially ambiguous use cases.
 
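The Item/Asset precedence described in the note above can be sketched in Python as follows. This is a minimal illustration, not an API of the extension or of any STAC library: the helper `resolve_mlm_field`, the asset keys, and the sample values (`cuda`, `amd64`) are assumptions chosen for the example, and the Item is assumed to be a plain `dict` parsed from STAC JSON.

```python
def resolve_mlm_field(item: dict, asset_key: str, field: str, default=None):
    """Return the value of `field` that applies to one asset (hypothetical helper).

    The Asset-level value wins when present; otherwise the Item-level value
    from `properties` is used, and `default` is returned if neither defines it.
    """
    asset = item.get("assets", {}).get(asset_key, {})
    if field in asset:
        return asset[field]
    return item.get("properties", {}).get(field, default)


# Hypothetical Item: one asset overrides the accelerator, the other inherits it.
sample_item = {
    "properties": {"mlm:name": "example-model", "mlm:accelerator": "cuda"},
    "assets": {
        "gpu-code": {"href": "inference_gpu.py"},
        "cpu-code": {"href": "inference_cpu.py", "mlm:accelerator": "amd64"},
    },
}

gpu_accelerator = resolve_mlm_field(sample_item, "gpu-code", "mlm:accelerator")
cpu_accelerator = resolve_mlm_field(sample_item, "cpu-code", "mlm:accelerator")
```

In this sketch, `gpu-code` inherits the Item-level value while `cpu-code` uses its own override. Fields marked <sup>[1]</sup> in the table never take the Asset-level branch, since they are only allowed in Item `properties`.
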
 In addition, fields from the multiple relevant extensions should be defined as applicable. See
 [Best Practices - Recommended Extensions to Compose with the ML Model Extension](best-practices.md#recommended-extensions-to-compose-with-the-ml-model-extension)