Initial version of defining the interfaces to accept metrics #15913

sm-xu · 2024-12-05T15:26:01Z

Description of PR

Summary:
Fixes # (issue)

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Back port request

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

linux-foundation-easycla · 2024-12-05T15:26:06Z

The committers listed above are authorized under a signed CLA.

✅ login: sm-xu / name: Mei Xu (2c63137, b48641a, 602f06e, 3dd931b, f8fb42f, 9e0a294, 7d24843, 9650f9d, a4cf9fd, 6fcf355, 31f56d5, 2d06ebd)

mssonicbld · 2024-12-05T15:27:26Z

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/snappi_tests/intf_utils/intf_accept_metrics.py

check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/snappi_tests/intf_utils/intf_accept_metrics.py:47:1: E302 expected 2 blank lines, found 1
tests/snappi_tests/intf_utils/intf_accept_metrics.py:50:19: E221 multiple spaces before operator
tests/snappi_tests/intf_utils/intf_accept_metrics.py:51:19: E221 multiple spaces before operator
tests/snappi_tests/intf_utils/intf_accept_metrics.py:52:19: E221 multiple spaces before operator
tests/snappi_tests/intf_utils/intf_accept_metrics.py:72:18: E221 multiple spaces before operator
... [truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
docker container.
Ensure that the pre-commit package is installed:

sudo pip install pre-commit

Go to repository root folder
Install the pre-commit hooks:

pre-commit install

Use pre-commit to check staged file:

pre-commit

Alternatively, you can check committed files using:

pre-commit run --from-ref <commit_id> --to-ref <commit_id>

r12f · 2024-12-05T19:48:04Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+#  Metrics data are organized into the hierarchies below
+#  ResourceMetrics
+#    ├── ResourceID
+#    └── ScopeMetrics


I don't think we need the level of ScopeMetrics.

My thought is
Resource level: all metrics from one test run
Scope level: all metrics belonging to one device
Metric level: all metrics belonging to one category
I might be wrong. Let's discuss this topic tomorrow.

mssonicbld · 2024-12-06T02:41:00Z

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing tests/snappi_tests/intf_utils/intf_accept_metrics.py

fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/snappi_tests/intf_utils/intf_accept_metrics.py:54:1: E303 too many blank lines (4)
tests/snappi_tests/intf_utils/intf_accept_metrics.py:57:1: E266 too many leading '#' for block comment
tests/snappi_tests/intf_utils/intf_accept_metrics.py:63:26: E221 multiple spaces before operator
tests/snappi_tests/intf_utils/intf_accept_metrics.py:64:24: E221 multiple spaces before operator
tests/snappi_tests/intf_utils/intf_accept_metrics.py:66:25: E221 multiple spaces before operator
... [truncated extra lines, please run pre-commit locally to view full check results]


r12f · 2024-12-06T19:45:03Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+
+############################## Report Metrics ##############################
+
+class MetricReporterFactory:


move factory to another file, so we can override easily.

with this change, we can do this in another file:

    class MetricReporterFactory:
        def create_metrics_reporter(self):
            return OtelMetricReporter(...)

    class OtelMetricReporter:
        def emit(....):
            # Real implementation goes here, which each customer can define their own.
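A minimal sketch of the split suggested above, under stated assumptions: the names `MetricReporter`, `PrintMetricReporter`, and the `emit` signature are hypothetical and not taken from the PR. The base class declares the hook, and the factory (which would live in its own file) is the only place that names a concrete backend.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class MetricReporter(ABC):
    """Hypothetical base reporter; subclasses decide where records go."""

    @abstractmethod
    def emit(self, records: List[Dict[str, Any]]) -> int:
        """Send records to a backend and return how many were sent."""


class PrintMetricReporter(MetricReporter):
    """Stand-in backend for illustration; a real one might target OTel."""

    def emit(self, records: List[Dict[str, Any]]) -> int:
        for record in records:
            print(record)
        return len(records)


class MetricReporterFactory:
    """Would live in its own file so a deployment can swap it out."""

    def create_metrics_reporter(self) -> MetricReporter:
        return PrintMetricReporter()


reporter = MetricReporterFactory().create_metrics_reporter()
sent = reporter.emit([{"name": "PortRx", "value": 123}])
```

Because callers only see `MetricReporter`, replacing the factory file would be enough to redirect every test's metrics to a different backend.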

r12f · 2024-12-06T19:50:46Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+#    ├── TestID
+#    └── DeviceMetrics
+#       ├── DeviceID
+#       └── Metric


create a generic Metric class that represents a single metric, which contains:

description/labels: Name, Description, unit, ....

Value: single layer is good enough with inheritance.

Reporter: Reference to MetricsReporter. Register itself to Reporter when created, so Reporter can gather all metrics after everything is changed.

    class Metric...:
        def __init__(self, name, ...., reporter):
            reporter.add_metric(self)
            ....

    class GaugeMetric(Metric):
        def __init__(self, name, ...., reporter):
            super().__init__(...)
            self.value = 0

        def set(self, v):
            self.value = v

    ....

    reporter = MetricReporterFactory(...).build()
    port_rx = GaugeMetric(...., reporter)
    port_rx.set(123)
    reporter.report(time)

Hence, ultimately the final code for people to use would be:

    metrics = {
        "PortRx": GaugeMetric(......, reporter),
        ....
    }

    for r in csv:
        for c in r:
            metrics[c.title].set(c.value)
        reporter.report(time)
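The registration pattern described in this comment could look roughly like the sketch below. It is an illustration only: the `MetricsReporter` class, the tuple returned by `report`, and the in-memory `rows` standing in for CSV data are assumptions, not code from the PR.

```python
from typing import Dict, List, Tuple


class MetricsReporter:
    """Hypothetical reporter that collects metrics registered with it."""

    def __init__(self) -> None:
        self.metrics: List["Metric"] = []

    def add_metric(self, metric: "Metric") -> None:
        self.metrics.append(metric)

    def report(self, timestamp: int) -> Tuple[int, Dict[str, float]]:
        # Placeholder: return a snapshot instead of sending it anywhere.
        return timestamp, {m.name: m.value for m in self.metrics}


class Metric:
    def __init__(self, name: str, description: str, unit: str,
                 reporter: MetricsReporter) -> None:
        self.name = name
        self.description = description
        self.unit = unit
        self.value = 0.0
        reporter.add_metric(self)  # register on creation


class GaugeMetric(Metric):
    def set(self, value: float) -> None:
        self.value = value


reporter = MetricsReporter()
metrics = {
    "PortRx": GaugeMetric("PortRx", "Port RX counter", "packets", reporter),
    "PortTx": GaugeMetric("PortTx", "Port TX counter", "packets", reporter),
}

# Stand-in for rows parsed from a CSV file, one sample per row.
rows = [
    {"time": 1, "PortRx": 100.0, "PortTx": 90.0},
    {"time": 2, "PortRx": 123.0, "PortTx": 95.0},
]
snapshots = []
for row in rows:
    for title, gauge in metrics.items():
        gauge.set(row[title])
    snapshots.append(reporter.report(row["time"]))
```

Test code only touches the metric objects and calls `report`; the reporter already knows every metric because each one registered itself in its constructor.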

r12f · 2024-12-06T19:56:57Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+# software version. They are also from the same test case identified by test_run_id.
+class TestMetrics:
+    def __init__(self, testbed_name, os_version, testcase_name, test_run_id):
+        self.testbed_name  = testbed_name


all these fields can be moved to reporter, since it is shared by everyone.

TestMetrics itself can be removed, once we add the per metric class.

r12f · 2024-12-06T19:57:37Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+# software version. They are also from the same test case identified by test_run_id.
+class TestMetrics:
+    def __init__(self, testbed_name, os_version, testcase_name, test_run_id):
+        self.testbed_name  = testbed_name


TestMetrics itself can be removed, once we add the per metric class.

r12f · 2024-12-06T20:00:52Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+
+############################## Report Metrics ##############################
+
+class MetricReporterFactory:


with this change, we can do this in another file:

    class MetricReporterFactory:
        def create_metrics_reporter(self):
            return OtelMetricReporter(...)

    class OtelMetricReporter:
        def emit(....):
            # Real implementation goes here, which each customer can define their own.

r12f · 2024-12-06T20:02:19Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

@@ -0,0 +1,147 @@
+# This file defines the interfaces that snappi tests accept external metrics.


All common label names are missing too, e.g.: PortId, QueueId, PSUId....

otherwise it will be very hard to create a unified dashboard, because each test could use its own names, causing problems in filters.

mssonicbld · 2024-12-07T06:40:43Z

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/snappi_tests/intf_utils/intf_report_metrics.py

check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Failed
- hook id: check-ast
- exit code: 1

tests/snappi_tests/intf_utils/intf_accept_metrics.py: failed parsing with CPython 3.10.12:

    Traceback (most recent call last):
      File "/home/AzDevOps/.cache/pre-commit/repoqc6a3xnx/py_env-python3/lib/python3.10/site-packages/pre_commit_hooks/check_ast.py", line 21, in main
        ast.parse(f.read(), filename=filename)
      File "/usr/lib/python3.10/ast.py", line 50, in parse
... [truncated extra lines, please run pre-commit locally to view full check results]


r12f · 2024-12-10T21:47:05Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+                 name,
+                 description,
+                 unit,
+                 timestamp,


The following fields are common to the entire test, so they can be moved into the reporter as common metadata:

testbed_name

os_version

testcase_name

test_run_id

The following fields are common to all metrics in a single report action, so they can be lifted into the reporter's report function parameters:

timestamp

The purpose of the following field is not clear; we need to rename it to make it clear:

component_id

maybe the timestamp here means the test_start_time?

r12f · 2024-12-10T21:47:53Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+
+class Metric:
+    def __init__(self,
+                 name,


missing type hints

r12f · 2024-12-10T21:49:26Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+                         testcase_name, test_run_id, device_id, component_id, reporter, metadata, metrics)
+
+        # Additional fields for GaugeMetric
+        self.metrics = metrics or {}


each Metric should only represent a single metric. If we are trying to create something that holds all metrics, it should be 1 layer above, say MetricCollections / MetricList / Metrics or whatever.

the purpose of this field is not too clear...

r12f · 2024-12-10T21:50:29Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

@@ -0,0 +1,103 @@
+# This file defines the interfaces that snappi tests accept external metrics.
+import logging


The file is not part of intf_utils, because it is not related to interfaces.

@sm-xu this comment is missing.

r12f · 2024-12-10T21:52:33Z

tests/snappi_tests/intf_utils/intf_report_metrics.py

+        # Temporary code to report metrics
+        print(f"Reporting metrics at {timestamp}")
+        for metric in self.metrics:
+            print(metric)


it would be great to create a new abstract function for us to override.

r12f · 2024-12-10T21:52:45Z

tests/snappi_tests/intf_utils/intf_report_metrics.py

+        self.reporter = OtelMetricReporter(self.connection)
+        return self.reporter
+
+class OtelMetricReporter:


The reporter should not be limited to Otel.

this is not addressed

r12f · 2024-12-11T17:39:49Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+            name (str): metric name (e.g., psu power, sensor temperature, port stats, etc.)
+            description (str): brief description of the metric
+            unit (str): metric unit (e.g., seconds, bytes)
+            timestamp (int): UNIX Epoch time in nanoseconds when the metric is collected


if the timestamp is for logging the collection time, the reporter already has it, so it can be removed here

r12f · 2024-12-11T17:41:30Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+            unit (str): metric unit (e.g., seconds, bytes)
+            timestamp (int): UNIX Epoch time in nanoseconds when the metric is collected
+            device_id (str): switch device ID
+            component_id (str): ID of the component (e.g., psu, sensor, port, etc.), where metrics are produced


this can be ignored, since the components are included in the name and we won't use it for filtering too.

Please check out my email

r12f · 2024-12-11T17:41:53Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

+            description (str): brief description of the metric
+            unit (str): metric unit (e.g., seconds, bytes)
+            timestamp (int): UNIX Epoch time in nanoseconds when the metric is collected
+            device_id (str): switch device ID


this can be lifted up to reporter, since it is common to all

r12f · 2024-12-11T17:42:30Z

tests/snappi_tests/intf_utils/intf_report_metrics.py

+        self.reporter = OtelMetricReporter(self.connection)
+        return self.reporter
+
+class OtelMetricReporter:


this is not addressed

r12f · 2024-12-11T17:42:52Z

tests/snappi_tests/intf_utils/intf_report_metrics.py

+        pass
+
+
+class KustoReporter:


let's not limit the implementation to kusto

TestMetricRecordRepoter

r12f · 2024-12-11T17:44:11Z

tests/snappi_tests/intf_utils/intf_accept_metrics.py

@@ -0,0 +1,89 @@
+# This file defines the interfaces that snappi tests accept external metrics.


the definitions of the metric names and meta are missing in the file, we need to get them defined and show a unified format. this will be used for crafting the dashboards.

r12f · 2024-12-13T05:45:40Z

tests/snappi_tests/intf_utils/intf_report_metrics.py

+        Returns:
+            An instance of the specified metrics reporter.
+        """
+        if data_type == "metrics":


will be better to split this into 2 functions instead of using magic string.

mssonicbld · 2024-12-16T18:06:02Z

/azp run

azure-pipelines · 2024-12-16T18:06:15Z

Azure Pipelines successfully started running 1 pipeline(s).

r12f · 2024-12-19T00:15:18Z

tests/snappi_tests/metrics_utils/examples.py

+        "psu.id": "psu1",
+        "model": "PWR-2422-HV-RED",
+        "serial": "6A011010142349Q"}
+


remove the sensitive data.

r12f · 2024-12-19T00:15:51Z

tests/snappi_tests/metrics_utils/examples.py

+                        description = "PSU power reading",
+                        unit = "W",
+                        reporter = reporter)
+    power.set_gauge_metric(scope_labels, 222.00)


    # Create a metric and pass it to the reporter
    vol = GaugeMetric(name = "Voltage",
                      description = "PSU voltage reading",
                      unit = "V",
                      reporter = reporter)

    # Create a metric and pass it to the reporter
    cur = GaugeMetric(name = "Current",
                      description = "PSU current reading",
                      unit = "A",
                      reporter = reporter)

    # Create a metric and pass it to the reporter
    power = GaugeMetric(name = "Power",
                        description = "PSU power reading",
                        unit = "W",
                        reporter = reporter)

    scope_labels["psu.id"] = "PSU 1"
    vol.set_gauge_metric(scope_labels, 12.09)
    cur.set_gauge_metric(scope_labels, 18.38)
    power.set_gauge_metric(scope_labels, 222.00)

    scope_labels["psu.id"] = "PSU 2"
    vol.set_gauge_metric(scope_labels, 12.10)
    cur.set_gauge_metric(scope_labels, 17.72)
    power.set_gauge_metric(scope_labels, 214.00)

r12f · 2024-12-19T00:16:24Z

tests/snappi_tests/metrics_utils/metrics_accepter.py

+                 name: str,
+                 description: str,
+                 unit: str,
+                 reporter: MetricReporterFactory):


this is not the factory.

r12f · 2024-12-19T00:17:08Z

tests/snappi_tests/metrics_utils/metrics_accepter.py

+        return (f"Metric(name={self.name!r}, "
+                f"description={self.description!r}, "
+                f"unit={self.unit!r}, "
+                f"reporter={self.reporter!r})")


reporter might not be converted to string.

r12f · 2024-12-19T00:17:30Z

tests/snappi_tests/metrics_utils/metrics_accepter.py

+        # Initialize the base class
+        super().__init__(name, description, unit, reporter)
+
+    def set_gauge_metric(self, scope_labels: Dict[str, str], value: Union[int, str, float]):


rename function to record, we need to support multiple metrics.

r12f · 2024-12-19T00:32:25Z

tests/snappi_tests/metrics_utils/metrics_reporter.py

+
+class MetricReporterFactory:
+    def __init__(self):
+        self.reporter = None


this is not needed.

r12f · 2024-12-19T00:33:27Z

tests/snappi_tests/metrics_utils/examples.py

+    reporter = factory.create_metrics_reporter(resource_labels)
+
+    scope_labels = {
+        "device.id": "str-7060x6-64pe-stress-02",


label names need to be standardized for our test cases. otherwise, there is no way to build standard dashboards.

r12f · 2024-12-19T00:35:31Z

tests/snappi_tests/metrics_utils/metrics_reporter.py

+        # Temporary code initializing a RecordsReporter
+        # will be replaced with a real initializer such as Kusto
+        self.resource_labels = resource_labels
+        self.timestamp = int(time.time() * 1_000_000_000) # epoch time in nanoseconds


timestamp should not be here.

it should be report function parameter.

r12f · 2024-12-19T00:36:28Z

tests/snappi_tests/metrics_utils/metrics_reporter.py

+        self.resource_labels = resource_labels
+        self.timestamp = int(time.time() * 1_000_000_000) # epoch time in nanoseconds
+        self.records = []
+


need function to push records into the self.records list.

r12f · 2024-12-19T00:38:27Z

tests/snappi_tests/metrics_utils/metrics_reporter.py

+        Abstract method to report records at a given timestamp.
+        Subclasses must override this method.
+        """
+        pass


report function is usually written in this way:

    def report(self):
        incoming_records = self.records
        self.records = []
        self.process_incoming_records(incoming_records)
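The swap-then-process shape above pairs naturally with an overridable hook. The sketch below is one possible arrangement, not the PR's implementation: `Reporter`, `ListReporter`, and the `timestamp` parameter on `report` are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class Reporter(ABC):
    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def stash_record(self, record: Dict[str, Any]) -> None:
        self.records.append(record)

    def report(self, timestamp: int) -> None:
        # Swap the buffer first so records stashed later go to the next report.
        incoming_records = self.records
        self.records = []
        self.process_incoming_records(timestamp, incoming_records)

    @abstractmethod
    def process_incoming_records(self, timestamp: int,
                                 records: List[Dict[str, Any]]) -> None:
        """Backend-specific delivery; subclasses must implement this."""


class ListReporter(Reporter):
    """Hypothetical backend that just keeps what it was given."""

    def __init__(self) -> None:
        super().__init__()
        self.sent: List[Tuple[int, Dict[str, Any]]] = []

    def process_incoming_records(self, timestamp, records):
        self.sent.extend((timestamp, record) for record in records)


reporter = ListReporter()
reporter.stash_record({"name": "Voltage", "value": 12.09})
reporter.report(timestamp=1)
```

Marking the hook with `@abstractmethod` means a subclass that forgets to implement it fails at instantiation rather than silently dropping records.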

r12f · 2024-12-19T00:39:32Z

tests/snappi_tests/metrics_utils/metrics_accepter.py

@@ -0,0 +1,64 @@
+"""


rename this file to metrics.py

the point is to consider the usage:

from metrics_utils.metrics_accepter import Metric, GaugeMetric # The code looks weird here

from utils.metrics import GaugeMetric # This looks more natural
from metrics_utils.metrics import GaugeMetric # This works too.

r12f · 2024-12-19T01:07:34Z

tests/snappi_tests/metrics_utils/metrics_reporter.py

+
+#from metrics_accepter import Metric, GaugeMetric
+
+class MetricReporterFactory:


move factory to a dedicated file. for reporter, we can leave in this file or move to metrics.py, no strong opinion in that.

mssonicbld · 2024-12-20T04:46:06Z

/azp run

azure-pipelines · 2024-12-20T04:46:17Z

Azure Pipelines successfully started running 1 pipeline(s).

r12f · 2024-12-20T08:50:10Z

tests/snappi_tests/utils/allowed_labels.json

@@ -0,0 +1,13 @@
+{
+    "allowed_labels": [
+        "testbed.id",


Let's make them constants in code.

Do you mean changing it to this in metrics.py?

    # Only these labels are allowed
    ALLOWED_LABELS = {
        "testbed.id",
        "os.version",
        "testrun.id",
        "testcase",
        "device.id",
        "psu.id",
        "port.id",
        "sensor.id",
        "queue.id",
    }

    class MetricsReporter:
        def __init__(self, resource_labels: Dict[str, str]):
            for label in resource_labels:
                if label not in ALLOWED_LABELS:
                    raise LabelError(f"Invalid label: {label}.")

            # Temporary code initializing a MetricsReporter
            # will be replaced with a real initializer such as OpenTelemetry
            self.resource_labels = resource_labels
            self.metrics = []

r12f · 2024-12-20T08:51:31Z

tests/snappi_tests/utils/examples.py

+
+    """
+    resource_labels = {
+        "testbed.id": "sonic_stress_testbed",


Label keys should be constants instead of using literals.

2 approaches:

1. Add

    ALLOWED_LABELS = {
        "testbed.id",
        "os.version",
        "testrun.id",
        "testcase",
        "device.id",
        "psu.id",
        "port.id",
        "sensor.id",
        "queue.id",
    }

in metrics.py and keep this place unchanged.

2. Add

    ALLOWED_LABELS = {
        "TESTBED_ID": "testbed.id",
        "OS_VERSION": "os.version",
        "TESTCASE": "testcase",
        "TESTRUN_ID": "testrun.id",
        ... ...
    }

in metrics.py and change this place to

    resource_labels = {
        ALLOWED_LABELS["TESTBED_ID"]: "sonic_stress_testbed",
        ALLOWED_LABELS["OS_VERSION"]: "11.2.3",
        ALLOWED_LABELS["TESTCASE"]: "stress_test1",
        ALLOWED_LABELS["TESTRUN_ID"]: "202412101217"
    }

Which way do you prefer?

take os version as an example:

    from typing import Final

    METRIC_LABEL_TEST_TESTBED: Final[str] = "test.testbed"
    METRIC_LABEL_TEST_BRANCH: Final[str] = "test.branch"
    METRIC_LABEL_TEST_CASE: Final[str] = "test.testcase"
    METRIC_LABEL_TEST_FILE: Final[str] = "test.test_file"
    ...
    METRIC_LABEL_DEVICE_ID: Final[str] = "device.id"
    METRIC_LABEL_DEVICE_PORT_ID: Final[str] = "device.port.id"
    METRIC_LABEL_DEVICE_QUEUE_ID: Final[str] = "device.queue.id"
    METRIC_LABEL_DEVICE_PSU_ID: Final[str] = "device.psu.id"
    ...
    resource_labels = {
        METRIC_LABEL_TEST_TESTBED: "abc",
        METRIC_LABEL_TEST_BRANCH: "202411",
        METRIC_LABEL_TEST_CASE: "mock-case",
        METRIC_LABEL_TEST_FILE: "mock-test.py",
        ...
    }
    ...
    scope_labels[METRIC_LABEL_DEVICE_PSU_ID] = "PSU 1"
    voltage.record(scope_labels, 12.09)

please make sure to check the design doc I shared with you for adding the required labels.
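Constants and validation can coexist. The sketch below uses four of the label names proposed in this thread; the `KNOWN_LABELS` set and the `check_labels` helper are hypothetical additions for illustration, and the required label set from the design doc is not reproduced here.

```python
from typing import Dict, Final

METRIC_LABEL_TEST_TESTBED: Final[str] = "test.testbed"
METRIC_LABEL_TEST_CASE: Final[str] = "test.testcase"
METRIC_LABEL_DEVICE_ID: Final[str] = "device.id"
METRIC_LABEL_DEVICE_PSU_ID: Final[str] = "device.psu.id"

# Hypothetical registry of every label the dashboards know about.
KNOWN_LABELS = frozenset({
    METRIC_LABEL_TEST_TESTBED,
    METRIC_LABEL_TEST_CASE,
    METRIC_LABEL_DEVICE_ID,
    METRIC_LABEL_DEVICE_PSU_ID,
})


def check_labels(labels: Dict[str, str]) -> None:
    """Raise ValueError if any label key is not a known constant."""
    unknown = sorted(set(labels) - KNOWN_LABELS)
    if unknown:
        raise ValueError(f"Unknown metric labels: {unknown}")


common_labels = {
    METRIC_LABEL_TEST_TESTBED: "abc",
    METRIC_LABEL_TEST_CASE: "mock-case",
}
check_labels(common_labels)  # passes silently
```

Using the constants as dictionary keys lets a type checker or linter catch misspelled names, while the runtime check still guards against raw string literals.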

r12f · 2024-12-20T08:54:44Z

tests/snappi_tests/utils/metrics.py

+        """
+
+
+class TestResultsReporter:


This is not test result, which usually refers to pass/fail sort of things

What do we want to name it then? How about TestStatus?

r12f · 2024-12-20T08:56:50Z

tests/snappi_tests/utils/metrics.py

+        stashed_test_results = self.test_results
+        self.test_results = []
+
+        """


Were these removed accidentally and not put back?

I don't quite understand you. In the commented code
"""
print(f"Current time (ns): {current_time}")
pprint(self.resource_labels)
pprint(stashed_metrics)
process_stashed_metrics(current_time, stashed_metrics)
"""
The first 3 lines are for my own testing purpose only. process_stashed_metrics() will later be replaced with real code to emit the metrics to InfluxDB.

there is no way at the language level to override commented-out code; here we need to provide a "virtual function" for the subclass to implement.

r12f · 2024-12-20T08:59:43Z

tests/snappi_tests/utils/metrics.py

+        if timestamp is not None:
+            current_time = timestamp
+        else:
+            current_time = time.time_ns()


Can this be moved to parameter?

Is this what you meant?
current_time = timestamp or time.time_ns()

have you tried this?

def report(self, timestamp=time.time_ns()):
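One caveat is worth checking before adopting a call expression as a default: Python evaluates default argument values once, when the `def` statement runs, not on each call. The sketch below illustrates the difference; both function names are made up for this example.

```python
import time
from typing import Optional


def report_eager(timestamp: int = time.time_ns()) -> int:
    # The default was computed once, when this function was defined.
    return timestamp


def report_lazy(timestamp: Optional[int] = None) -> int:
    # A None sentinel defers the clock read to call time.
    return time.time_ns() if timestamp is None else timestamp


eager_first = report_eager()
time.sleep(0.05)
eager_second = report_eager()  # same value as eager_first

lazy_first = report_lazy()
time.sleep(0.05)
lazy_second = report_lazy()  # later than lazy_first
```

The shorter form `timestamp or time.time_ns()` also reads the clock at call time, but it treats an explicit `0` as missing, which the `is None` check avoids.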

r12f · 2024-12-20T09:05:11Z

tests/snappi_tests/utils/metrics.py

+        self.resource_labels = resource_labels
+        self.test_results = []
+
+    def stash_test_results(self, labels: Dict[str, str], value: Union[int, str, float]):


stash_record

r12f · 2024-12-20T09:05:22Z

tests/snappi_tests/utils/metrics.py

+        self.resource_labels = resource_labels
+        self.metrics = []
+
+    def stash_metric(self, new_metric: 'GaugeMetric', labels: Dict[str, str], value: Union[int, str, float]):


stash_record

second parameter type is better to be the base class

r12f · 2024-12-20T09:09:19Z

tests/snappi_tests/utils/metrics.py

+
+    def stash_metric(self, new_metric: 'GaugeMetric', labels: Dict[str, str], value: Union[int, str, float]):
+        # add a new metric
+        self.metrics.append({"labels": labels, "value": value})


labels will need to be deep copied

Change it to

    # Deep copy the labels to ensure stored data is immutable
    copied_labels = deepcopy(labels)
    # Add the new metric
    self.metrics.append({"labels": copied_labels, "value": value})

Do I understand you correctly?

yes, something like this.
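Why the copy matters shows up as soon as a caller reuses one labels dict across calls, as the PSU example earlier in this review does. A small sketch, with a hypothetical `Stash` class standing in for the reporter:

```python
from copy import deepcopy
from typing import Dict, List, Union

Value = Union[int, str, float]


class Stash:
    """Hypothetical stand-in for the reporter's record list."""

    def __init__(self) -> None:
        self.by_reference: List[dict] = []
        self.by_copy: List[dict] = []

    def stash_record(self, labels: Dict[str, str], value: Value) -> None:
        # Stores the caller's dict itself: later mutations leak in.
        self.by_reference.append({"labels": labels, "value": value})
        # Stores a snapshot: later mutations do not affect it.
        self.by_copy.append({"labels": deepcopy(labels), "value": value})


stash = Stash()
labels = {"device.psu.id": "PSU 1"}
stash.stash_record(labels, 12.09)
labels["device.psu.id"] = "PSU 2"  # caller reuses the same dict
stash.stash_record(labels, 12.10)

ids_by_reference = [r["labels"]["device.psu.id"] for r in stash.by_reference]
ids_by_copy = [r["labels"]["device.psu.id"] for r in stash.by_copy]
```

Without the copy, both stored records end up labeled "PSU 2"; with it, each record keeps the labels it was recorded with.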

sm-xu

Please review. Thanks!

r12f · 2024-12-20T19:12:24Z

tests/snappi_tests/utils/metrics.py

+        stashed_test_results = self.test_results
+        self.test_results = []
+
+        """


there is no way at the language level to override commented-out code; here we need to provide a "virtual function" for the subclass to implement.

r12f · 2024-12-20T19:13:45Z

tests/snappi_tests/utils/reporter_factory.py

@@ -0,0 +1,20 @@
+


nit: remove empty line.

I wonder why pre-commit didn't fail for this.... CI did fail due to static analysis. might be better to check that.

r12f · 2024-12-20T19:14:30Z

tests/snappi_tests/utils/metrics.py

+from typing import List, Dict, Union
+
+# Function to load allowed labels from a JSON file
+def load_allowed_labels(filename="allowed_labels.json"):


this could be removed once moved to constants.

r12f · 2024-12-20T19:16:44Z

tests/snappi_tests/utils/examples.py

+
+    """
+    resource_labels = {
+        "testbed.id": "sonic_stress_testbed",


please make sure to check the design doc I shared with you for adding the required labels.

r12f · 2024-12-23T22:30:54Z

the pre-commit checks in CI are failing as below, please check them and we need to get them fixed:

(I highly recommend installing pre-commit on your local dev machine, it will help find these problems as soon as possible).

sm-xu

Please take a look at my replies. Thank you!

mssonicbld · 2024-12-24T21:35:45Z

/azp run

azure-pipelines · 2024-12-24T21:35:56Z

Azure Pipelines successfully started running 1 pipeline(s).

r12f · 2024-12-27T01:12:41Z

tests/common/telemetry/examples.py

+# Add the root directory of the project to sys.path
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))
+
+from snappi_tests.utils.metrics import *


import is not fixed.

r12f · 2024-12-27T01:17:54Z

tests/common/telemetry/examples.py

+
+    # Create a MetricReporterFactory and build a MetricReporter
+    factory = TelemetryReporterFactory()
+    reporter = factory.create_periodic_metrics_reporter(resource_labels)


maybe we don't need to create the factory object here, but directly use:

reporter = TelemetryReporterFactory.create_periodic_metrics_reporter(resource_labels)

r12f · 2024-12-27T01:18:37Z

tests/common/telemetry/examples.py

+    PSU 2  PWR-ABCD         1Z011010156787X        01          12.01          17.72       214.00  OK        green
+
+    """
+    resource_labels = {


renaming resource_labels to common_labels would be better.

r12f · 2024-12-27T01:18:48Z

tests/common/telemetry/examples.py

+    factory = TelemetryReporterFactory()
+    reporter = factory.create_periodic_metrics_reporter(resource_labels)
+
+    scope_labels = {METRIC_LABEL_DEVICE_ID: "switch-A"}


renaming scope_labels to metric_labels would be better.

r12f · 2024-12-27T01:19:32Z

tests/snappi_tests/utils/examples.py

@@ -0,0 +1,69 @@
+import logging


this is duplicated now, we should remove it.

r12f · 2024-12-27T01:19:56Z

tests/snappi_tests/utils/examples.py

@@ -0,0 +1,69 @@
+import logging


and by moving the file, I mean move the metrics.py and report_factory.py, not just the example : D

r12f · 2024-12-27T01:24:09Z

tests/snappi_tests/utils/metrics.py

+METRIC_LABEL_CAST_DIRECTION: Final[str] = "cast_direction"  # unicast or multicast
+METRIC_LABEL_HARDWARE_REVISION: Final[str] = "hardware.revision"
+METRIC_LABEL_PRIORITY_GROUP: Final[str] = "priority_group"
+METRIC_LABEL_BUFFER_POOL: Final[str] = "buffer_pool"


it would be better to have the names more organized per feature. labels need to be explicit, so they don't step on each other's toes in the dashboard, e.g.: 2 metrics might both have the name "model", one for PSU and one for transceiver. So, this naming is not a good approach.

I would recommend:

    METRIC_LABEL_DEVICE_PSU_MODEL: Final[str] = "device.psu.model"  # component refers to the level below, i.e. parts used by a switch
    METRIC_LABEL_DEVICE_PSU_SERIAL: Final[str] = "serial"
    METRIC_LABEL_DEVICE_PG_ID: Final[str] = "device.pg.id"
    METRIC_LABEL_DEVICE_BUFFER_POOL_ID: Final[str] = "device.buffer_pool.id"

I cannot really tell what these below are, but I bet you get the idea now and can make the adjustments on your own.

    METRIC_LABEL_CAST_DIRECTION: Final[str] = "cast_direction"  # unicast or multicast
    METRIC_LABEL_HARDWARE_REVISION: Final[str] = "hardware.revision"

r12f · 2024-12-27T01:47:02Z

Also, we need to treat the CI failure seriously. I am seeing many real failures from pre-commit checks. @sm-xu

mssonicbld · 2025-01-03T02:57:25Z

/azp run

azure-pipelines · 2025-01-03T02:57:37Z

Azure Pipelines successfully started running 1 pipeline(s).

r12f · 2025-01-06T22:59:49Z

tests/common/telemetry/metrics.py

+METRIC_LABEL_TEST_BUILD: Final[str] = "os.version"
+METRIC_LABEL_TEST_CASE: Final[str] = "testcase"
+METRIC_LABEL_TEST_FILE: Final[str] = "test_file"
+METRIC_LABEL_TEST_JOBID: Final[str] = "job_id"


it would be better to put these test ones under "test.*", e.g.: "test.testbed", "test.job.id", and so on.

The os.version looks different: if we'd like to track it as the test build, then it is better to name it "test.build" or simply "test.os.version"

mssonicbld · 2025-01-07T19:38:47Z

/azp run

azure-pipelines · 2025-01-07T19:39:00Z

Azure Pipelines successfully started running 1 pipeline(s).

r12f · 2025-01-08T00:24:30Z

As discussed, please make the design doc update in the README file for this folder in a separate PR.

wangxin · 2025-01-09T06:02:00Z

tests/common/telemetry/reporter_factory.py

+    def __init__(self):
+        return
+
+    def create_periodic_metrics_reporter(common_labels: Dict[str, str]):


Should it be @staticmethod?

wangxin · 2025-01-09T06:02:06Z

tests/common/telemetry/reporter_factory.py

+    def create_periodic_metrics_reporter(common_labels: Dict[str, str]):
+        return (PeriodicMetricsReporter(common_labels))
+
+    def create_final_metrics_reporter(common_labels: Dict[str, str]):


Should it be @staticmethod?
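What the decorator changes can be seen in a small sketch. `PlainFactory` and `StaticFactory` are illustrative names, and the reporter class here is a bare stand-in for the one in the PR.

```python
from typing import Dict


class PeriodicMetricsReporter:
    def __init__(self, common_labels: Dict[str, str]) -> None:
        self.common_labels = common_labels


class PlainFactory:
    # No decorator: fine when called on the class, broken on an instance.
    def create_periodic_metrics_reporter(common_labels: Dict[str, str]):
        return PeriodicMetricsReporter(common_labels)


class StaticFactory:
    @staticmethod
    def create_periodic_metrics_reporter(common_labels: Dict[str, str]):
        return PeriodicMetricsReporter(common_labels)


labels = {"test.testbed": "abc"}
via_class = PlainFactory.create_periodic_metrics_reporter(labels)

try:
    PlainFactory().create_periodic_metrics_reporter(labels)
    instance_call_failed = False
except TypeError:
    # The instance is passed implicitly, so two arguments reach a
    # one-parameter function.
    instance_call_failed = True

via_instance = StaticFactory().create_periodic_metrics_reporter(labels)
```

With `@staticmethod`, the call works the same way on the class and on an instance, so existing callers that still construct a factory object keep working.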

mssonicbld · 2025-01-09T20:43:55Z

/azp run

azure-pipelines · 2025-01-09T20:44:20Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2025-01-15T20:41:18Z

/azp run

azure-pipelines · 2025-01-15T20:41:30Z

Azure Pipelines successfully started running 1 pipeline(s).

1st version of interface utils

2c63137

r12f self-requested a review December 5, 2024 16:09

r12f reviewed Dec 5, 2024

View reviewed changes

tests/snappi_tests/intf_utils/intf_accept_metrics.py Outdated Show resolved Hide resolved

r12f reviewed Dec 5, 2024

View reviewed changes

addressed the review comments

3dd931b

r12f reviewed Dec 6, 2024

View reviewed changes

2nd version of interface utils

7d24843

r12f reviewed Dec 10, 2024

View reviewed changes

3rd version of interface utils

f8fb42f

r12f reviewed Dec 11, 2024

View reviewed changes

r12f reviewed Dec 13, 2024

View reviewed changes

metrics handling v1

b48641a

r12f reviewed Dec 19, 2024

modified classes and files

2d06ebd

r12f reviewed Dec 20, 2024

sm-xu commented Dec 20, 2024

r12f reviewed Dec 20, 2024

sm-xu commented Dec 23, 2024

update on 12/24

9e0a294

r12f added the Request for msft-202412 branch label Dec 27, 2024

r12f reviewed Dec 27, 2024

update 01/02

602f06e

r12f reviewed Jan 6, 2025

update 01/07

a4cf9fd

r12f approved these changes Jan 8, 2025

wangxin reviewed Jan 9, 2025

update 01/09

6fcf355

sm-xu requested review from r12f and wangxin January 14, 2025 00:24

update 01/15

31f56d5

r12f approved these changes Jan 15, 2025

wangxin approved these changes Jan 16, 2025

wangxin merged commit 3b9b320 into sonic-net:master Jan 16, 2025
19 checks passed

r12f added the Approved for msft-202412 branch label Jan 18, 2025


		############################## Report Metrics ##############################

		class MetricReporterFactory:

		@@ -0,0 +1,147 @@
		# This file defines the interfaces that snappi tests accept external metrics.

		@@ -0,0 +1,103 @@
		# This file defines the interfaces that snappi tests accept external metrics.
		import logging


		#from metrics_accepter import Metric, GaugeMetric
