Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial version of defining the interfaces to accept metrics #15913

Merged
merged 12 commits into from
Jan 16, 2025

Conversation

sm-xu
Copy link
Contributor

@sm-xu sm-xu commented Dec 5, 2024

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Copy link

linux-foundation-easycla bot commented Dec 5, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@mssonicbld
Copy link
Collaborator

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/snappi_tests/intf_utils/intf_accept_metrics.py

check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/snappi_tests/intf_utils/intf_accept_metrics.py:47:1: E302 expected 2 blank lines, found 1
tests/snappi_tests/intf_utils/intf_accept_metrics.py:50:19: E221 multiple spaces before operator
tests/snappi_tests/intf_utils/intf_accept_metrics.py:51:19: E221 multiple spaces before operator
tests/snappi_tests/intf_utils/intf_accept_metrics.py:52:19: E221 multiple spaces before operator
tests/snappi_tests/intf_utils/intf_accept_metrics.py:72:18: E221 multiple spaces before operator
...
[truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
    the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
sudo pip install pre-commit
  1. Go to repository root folder
  2. Install the pre-commit hooks:
pre-commit install
  1. Use pre-commit to check staged file:
pre-commit
  1. Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>

@r12f r12f self-requested a review December 5, 2024 16:09
tests/snappi_tests/intf_utils/intf_accept_metrics.py Outdated Show resolved Hide resolved
tests/snappi_tests/intf_utils/intf_accept_metrics.py Outdated Show resolved Hide resolved
tests/snappi_tests/intf_utils/intf_accept_metrics.py Outdated Show resolved Hide resolved
# Metrics data are organized into the hierarchies below
# ResourceMetrics
# ├── ResourceID
# └── ScopeMetrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the level of ScopeMetrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought is
Resource level: all metrics from one test run
Scope level: all metrics belonging to one device
Metric level: all metrics belonging to one category
I might be wrong. Let's discuss this topic tomorrow.

tests/snappi_tests/intf_utils/intf_accept_metrics.py Outdated Show resolved Hide resolved
tests/snappi_tests/intf_utils/intf_accept_metrics.py Outdated Show resolved Hide resolved
@mssonicbld
Copy link
Collaborator

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing tests/snappi_tests/intf_utils/intf_accept_metrics.py

fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/snappi_tests/intf_utils/intf_accept_metrics.py:54:1: E303 too many blank lines (4)
tests/snappi_tests/intf_utils/intf_accept_metrics.py:57:1: E266 too many leading '#' for block comment
tests/snappi_tests/intf_utils/intf_accept_metrics.py:63:26: E221 multiple spaces before operator
tests/snappi_tests/intf_utils/intf_accept_metrics.py:64:24: E221 multiple spaces before operator
tests/snappi_tests/intf_utils/intf_accept_metrics.py:66:25: E221 multiple spaces before operator
...
[truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
    the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
sudo pip install pre-commit
  1. Go to repository root folder
  2. Install the pre-commit hooks:
pre-commit install
  1. Use pre-commit to check staged file:
pre-commit
  1. Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>


############################## Report Metrics ##############################

class MetricReporterFactory:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move factory to another file, so we can override easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this change, we can do this in another file:

class MetricReporterFactory:
    def create_metrics_reporter(self):
        return OtelMetricReporter(...)

class OtelMetricReporter:
    def emit(....):
        # Real implementation goes here, which each customer can define their own.

# ├── TestID
# └── DeviceMetrics
# ├── DeviceID
# └── Metric
Copy link
Contributor

@r12f r12f Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a generic Metric class that represents a single metric, which contains:

  • description/labels: Name, Description, unit, ....
  • Value: single layer is good enough with inheritance.
  • Reporter: Reference to MetricsReporter. Register itself to Reporter when created, so Reporter can gather all metrics after everything is changed.
class Metric...:
    def __init__(name, ...., reporter):
        reporter.add_metric(self)
        ....

class GaugeMetric(Metric):
    def __init__(name, ...., reporter):
        super.__init__(...)
        self.value = 0

    def set(v):
        self.value = v
....


reporter = MetricReporterFactory(...).build()
port_rx = GaugeMetric(...., reporter)

port_rx.set(123)
reporter.report(time)

Copy link
Contributor

@r12f r12f Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hence, ultimately the final code for people to use would be:

metrics = {
   "PortRx" = GaugeMetric(......, reporter)
   ....
}

for r in csv:
    for c in r:
        metric[c.title].set(c.value)

reporter.report(time)

# software version. They are also from the same test case identified by test_run_id.
class TestMetrics:
def __init__(self, testbed_name, os_version, testcase_name, test_run_id):
self.testbed_name = testbed_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these fields can be moved to reporter, since it is shared by everyone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestMetrics itself can be removed, once we add the per metric class.

# software version. They are also from the same test case identified by test_run_id.
class TestMetrics:
def __init__(self, testbed_name, os_version, testcase_name, test_run_id):
self.testbed_name = testbed_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestMetrics itself can be removed, once we add the per metric class.


############################## Report Metrics ##############################

class MetricReporterFactory:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this change, we can do this in another file:

class MetricReporterFactory:
    def create_metrics_reporter(self):
        return OtelMetricReporter(...)

class OtelMetricReporter:
    def emit(....):
        # Real implementation goes here, which each customer can define their own.

@@ -0,0 +1,147 @@
# This file defines the interfaces that snappi tests accept external metrics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All common label names are missing too, e.g.: PortId, QueueId, PSUId....

otherwise it will be very hard to create unified dashboard, because each tests could use its own names, and causing problems in filters.

@mssonicbld
Copy link
Collaborator

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/snappi_tests/intf_utils/intf_report_metrics.py

check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Failed
- hook id: check-ast
- exit code: 1

tests/snappi_tests/intf_utils/intf_accept_metrics.py: failed parsing with CPython 3.10.12:

Traceback (most recent call last):
File "/home/AzDevOps/.cache/pre-commit/repoqc6a3xnx/py_env-python3/lib/python3.10/site-packages/pre_commit_hooks/check_ast.py", line 21, in main
ast.parse(f.read(), filename=filename)
File "/usr/lib/python3.10/ast.py", line 50, in parse
...
[truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
    the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
sudo pip install pre-commit
  1. Go to repository root folder
  2. Install the pre-commit hooks:
pre-commit install
  1. Use pre-commit to check staged file:
pre-commit
  1. Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>

name,
description,
unit,
timestamp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following fields is common for entire tests, so it can be move into the reporter as common metadata:

  • testbed_name
  • os_version
  • testcase_name
  • test_run_id

The following fields are common for all metrics in a single report action, so it can be lifted into the reporter's report function parameters:

  • timestamp

The following fields are not clear on its purpose, we need to rename it to make it clear:

  • component_id

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the timestamp here means the test_start_time?


class Metric:
def __init__(self,
name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing type hints

testcase_name, test_run_id, device_id, component_id, reporter, metadata, metrics)

# Additional fields for GaugeMetric
self.metrics = metrics or {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each Metric should only represent a single metric. If we are trying to create something that holds all metrics, it should be 1 layer above, say MetricCollections / MetricList / Metrics or whatever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the purpose of this field is not too clear...

@@ -0,0 +1,103 @@
# This file defines the interfaces that snappi tests accept external metrics.
import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file it not part of intf_utils, because it is not related to interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sm-xu this comment is missing.

# Temporary code to report metrics
print(f"Reporting metrics at {timestamp}")
for metric in self.metrics:
print(metric)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be great to create a new abstracted function for us to override.

self.reporter = OtelMetricReporter(self.connection)
return self.reporter

class OtelMetricReporter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reporter should not be limited to Otel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not addressed

name (str): metric name (e.g., psu power, sensor temperature, port stats, etc.)
description (str): brief description of the metric
unit (str): metric unit (e.g., seconds, bytes)
timestamp (int): UNIX Epoch time in nanoseconds when the metric is collected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the timestamp is for logging the collection time, the reporter already has it and can be removed

unit (str): metric unit (e.g., seconds, bytes)
timestamp (int): UNIX Epoch time in nanoseconds when the metric is collected
device_id (str): switch device ID
component_id (str): ID of the component (e.g., psu, sensor, port, etc.), where metrics are produced
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be ignored, since the components are included in the name and we won't use it for filtering too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check out my email

description (str): brief description of the metric
unit (str): metric unit (e.g., seconds, bytes)
timestamp (int): UNIX Epoch time in nanoseconds when the metric is collected
device_id (str): switch device ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be lifted up to reporter, since it is common to all

self.reporter = OtelMetricReporter(self.connection)
return self.reporter

class OtelMetricReporter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not addressed

pass


class KustoReporter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not limit the implementation to kusto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestMetricRecordRepoter

@@ -0,0 +1,89 @@
# This file defines the interfaces that snappi tests accept external metrics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the definitions of the metric names and meta are missing in the file, we need to get them defined and show a unified format. this will be used for crafting the dashboards.

Returns:
An instance of the specified metrics reporter.
"""
if data_type == "metrics":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be better to split this into 2 functions instead of using magic string.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

"psu.id": "psu1",
"model": "PWR-2422-HV-RED",
"serial": "6A011010142349Q"}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the sensitive data.

description = "PSU power reading",
unit = "W",
reporter = reporter)
power.set_gauge_metric(scope_labels, 222.00)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    # Create a metric and pass it to the reporter
    vol = GaugeMetric(name = "Voltage",
                      description = "PSU voltage reading",
                      unit = "V",
                      reporter = reporter)

    # Create a metric and pass it to the reporter
    cur = GaugeMetric(name = "Current",
                      description = "PSU current reading",
                      unit = "A",
                      reporter = reporter)

    # Create a metric and pass it to the reporter
    power = GaugeMetric(name = "Power",
                        description = "PSU power reading",
                        unit = "W",
                        reporter = reporter)

    scope_labels["psu.id"] = "PSU 1"
    vol.set_gauge_metric(scope_labels, 12.09)
    cur.set_gauge_metric(scope_labels, 18.38)
    power.set_gauge_metric(scope_labels, 222.00)

    scope_labels["psu.id"] = "PSU 2"
    vol.set_gauge_metric(scope_labels, 12.10)
    cur.set_gauge_metric(scope_labels, 17.72)
    power.set_gauge_metric(scope_labels, 214.00)

name: str,
description: str,
unit: str,
reporter: MetricReporterFactory):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not the factory.

return (f"Metric(name={self.name!r}, "
f"description={self.description!r}, "
f"unit={self.unit!r}, "
f"reporter={self.reporter!r})")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reporter might not be converted to string.

# Initialize the base class
super().__init__(name, description, unit, reporter)

def set_gauge_metric(self, scope_labels: Dict[str, str], value: Union[int, str, float]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename function to record, we need to support multiple metrics.


class MetricReporterFactory:
def __init__(self):
self.reporter = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not needed.

reporter = factory.create_metrics_reporter(resource_labels)

scope_labels = {
"device.id": "str-7060x6-64pe-stress-02",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

label name needs to be standarized for our test cases. otherwise, there is no way to build standard dashboards.

# Temporary code initializing a RecordsReporter
# will be replaced with a real initializer such as Kusto
self.resource_labels = resource_labels
self.timestamp = int(time.time() * 1_000_000_000) # epoch time in nanoseconds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timestamp should not be here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be report function parameter.

self.resource_labels = resource_labels
self.timestamp = int(time.time() * 1_000_000_000) # epoch time in nanoseconds
self.records = []

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need function to push records into the self.records list.

Abstract method to report records at a given timestamp.
Subclasses must override this method.
"""
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

report function is usually written in this way:

def report(self):
    incoming_records = self.records
    self.records = []

    self.process_incoming_records(incoming_records)

@@ -0,0 +1,64 @@
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this file to metrics.py

Copy link
Contributor

@r12f r12f Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the point is to consider the usage:

from metrics_utils.metrics_accepter import Metric, GaugeMetric # The code looks weird here
from utils.metrics import GaugeMetric # This looks more nature
from metrics_utils.metrics import GaugeMetric # This works too.


#from metrics_accepter import Metric, GaugeMetric

class MetricReporterFactory:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move factory to a dedicated file. for reporter, we can leave in this file or move to metrics.py, no strong opinion in that.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -0,0 +1,13 @@
{
"allowed_labels": [
"testbed.id",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make them constants in code.

Copy link
Contributor Author

@sm-xu sm-xu Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean changing it to this in metrics.py?

Only these labels are allowed

ALLOWED_LABELS = {
"testbed.id",
"os.version",
"testrun.id",
"testcase",
"device.id",
"psu.id",
"port.id",
"sensor.id",
"queue.id",
}

class MetricsReporter:
    def __init__(self, resource_labels: Dict[str, str]):
        for label in resource_labels:
            if label not in ALLOWED_LABELS:
                raise LabelError(f"Invalid label: {label}.")

        # Temporary code initializing a MetricsReporter
        # will be replaced with a real initializer such as OpenTelemetry 
        self.resource_labels = resource_labels
        self.metrics = []


"""
resource_labels = {
"testbed.id": "sonic_stress_testbed",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Label keys should be constants instead of using literals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 approaches:

  1. Add
    ALLOWED_LABELS = {
    "testbed.id",
    "os.version",
    "testrun.id",
    "testcase",
    "device.id",
    "psu.id",
    "port.id",
    "sensor.id",
    "queue.id",
    }
    in metrics.py and keep this place unchanged.

  2. Add
    ALLOWED_LABELS = {
    "TESTBED_ID": "testbed.id",
    "OS_VERSION": "os.version",
    "TESTCASE": "testcase",
    "TESTRUN_ID": "testrun.id",
    ... ...
    }
    in metrics.py and change this place to
    resource_labels = {
    ALLOWED_LABELS["TESTBED_ID"]: "sonic_stress_testbed",
    ALLOWED_LABELS["OS_VERSION"]: "11.2.3",
    ALLOWED_LABELS["TESTCASE"]: "stress_test1",
    ALLOWED_LABELS["TESTRUN_ID"]: "202412101217"
    }

Which way do you prefer?

Copy link
Contributor

@r12f r12f Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take os version as an example:

from typing import Final

METRIC_LABEL_TEST_TESTBED: Final[str] = "test.testbed"
METRIC_LABEL_TEST_BRANCH: Final[str] = "test.branch"
METRIC_LABEL_TEST_CASE: Final[str] = "test.testcase"
METRIC_LABEL_TEST_FILE: Final[str] = "test.test_file"
...

METRIC_LABEL_DEVICE_ID: Final[str] = "device.id"
METRIC_LABEL_DEVICE_PORT_ID: Final[str] = "device.port.id"
METRIC_LABEL_DEVICE_QUEUE_ID: Final[str] = "device.queue.id"
METRIC_LABEL_DEVICE_PSU_ID: Final[str] = "device.psu.id"
...

resource_labels = {
    METRIC_LABEL_TEST_TESTBED: "abc",
    METRIC_LABEL_TEST_BRANCH: "202411",
    METRIC_LABEL_TEST_CASE: "mock-case",
    METRIC_LABEL_TEST_FILE: "mock-test.py",
    ...
}
...

scope_labels[METRIC_LABEL_DEVICE_PSU_ID] = "PSU 1"
voltage.record(scope_labels, 12.09)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make sure to check the design doc I shared with you for adding the required labels.

"""


class TestResultsReporter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not test result, which usually refers to pass/fail sort of things

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we want to name it then? How about TestStatus?

stashed_test_results = self.test_results
self.test_results = []

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these removed accidentally and forgot to put back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand you. In the commented code
"""
print(f"Current time (ns): {current_time}")
pprint(self.resource_labels)
pprint(stashed_metrics)
process_stashed_metrics(current_time, stashed_metrics)
"""
The first 3 lines are for my own testing purpose only. process_stashed_metrics() will later be replaced with real code to emit the metrics to InfluxDB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no way in language level to override the commented code, in here we need to provide a "virtual function" for the subclass to implement.

if timestamp is not None:
current_time = timestamp
else:
current_time = time.time_ns()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved to parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what you meant?
current_time = timestamp or time.time_ns()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you tried this?

def report(self, timestamp=time.time_ns()):

self.resource_labels = resource_labels
self.test_results = []

def stash_test_results(self, labels: Dict[str, str], value: Union[int, str, float]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stash_record

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

self.resource_labels = resource_labels
self.metrics = []

def stash_metric(self, new_metric: 'GaugeMetric', labels: Dict[str, str], value: Union[int, str, float]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stash_record

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second parameter type is better to be the base class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.


def stash_metric(self, new_metric: 'GaugeMetric', labels: Dict[str, str], value: Union[int, str, float]):
# add a new metric
self.metrics.append({"labels": labels, "value": value})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

labels will need to be deep copied

Copy link
Contributor Author

@sm-xu sm-xu Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change it to

        # Deep copy the labels to ensure stored data is immutable
        copied_labels = deepcopy(labels)
        # Add the new metric
        self.metrics.append({"labels": copied_labels, "value": value})

Do I understand you correctly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, something like this.

Copy link
Contributor Author

@sm-xu sm-xu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review. Thanks!

stashed_test_results = self.test_results
self.test_results = []

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no way in language level to override the commented code, in here we need to provide a "virtual function" for the subclass to implement.

@@ -0,0 +1,20 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove empty line.

I wonder why pre-commit didn't fail for this.... CI does failed due to static analysis. might be better to check that.

from typing import List, Dict, Union

# Function to load allowed labels from a JSON file
def load_allowed_labels(filename="allowed_labels.json"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be removed once moved to constants.


"""
resource_labels = {
"testbed.id": "sonic_stress_testbed",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make sure to check the design doc I shared with you for adding the required labels.

@r12f
Copy link
Contributor

r12f commented Dec 23, 2024

the pre-commit checks in CI are failing as below, please check them and we need to get them fixed:

(I highly recommend installing pre-commit on your local dev machine, it will help find these problems as soon as possible).

image

image

Copy link
Contributor Author

@sm-xu sm-xu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at my replies. Thank you!

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

# Add the root directory of the project to sys.path
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

from snappi_tests.utils.metrics import *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import is not fixed.


# Create a MetricReporterFactory and build a MetricReporter
factory = TelemetryReporterFactory()
reporter = factory.create_periodic_metrics_reporter(resource_labels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we don't need to create the factory object here, but directly use:

reporter = TelemetryReporterFactory.create_periodic_metrics_reporter(resource_labels)

PSU 2 PWR-ABCD 1Z011010156787X 01 12.01 17.72 214.00 OK green

"""
resource_labels = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update resource_labels to common_labels will be better.

factory = TelemetryReporterFactory()
reporter = factory.create_periodic_metrics_reporter(resource_labels)

scope_labels = {METRIC_LABEL_DEVICE_ID: "switch-A"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update scope_labels to metric_labels will be better.

@@ -0,0 +1,69 @@
import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is duplicated now, we should remove it.

@@ -0,0 +1,69 @@
import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and by moving the file, I mean move the metrics.py and report_factory.py, not just the example : D

METRIC_LABEL_CAST_DIRECTION: Final[str] = "cast_direction" # unicast or multicast
METRIC_LABEL_HARDWARE_REVISION: Final[str] = "hardware.revision"
METRIC_LABEL_PRIORITY_GROUP: Final[str] = "priority_group"
METRIC_LABEL_BUFFER_POOL: Final[str] = "buffer_pool"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be better to have the name more... organized per feature. labels needs to be explicit, so they don't step on each other's toe in the dashboard, e.g.: 2 metrics might both having the name "model" - one for PSU, one for transceiver. So, this naming is not a good approach.

I would recommend to be:

METRIC_LABEL_DEVICE_PSU_MODEL: Final[str] = "device.psu.model"          # component refers to the level below, i.e. parts used by a switch
METRIC_LABEL_DEVICE_PSU_SERIAL: Final[str] = "serial"
METRIC_LABEL_DEVICE_PG_ID: Final[str] = "device.pg.id"
METRIC_LABEL_DEVICE_BUFFER_POOL_ID: Final[str] = "device.buffer_pool.id"

I cannot really tell what are these below, but I bet you should now getting the idea and can make the adjustments by your own now.

METRIC_LABEL_CAST_DIRECTION: Final[str] = "cast_direction"  # unicast or multicast
METRIC_LABEL_HARDWARE_REVISION: Final[str] = "hardware.revision"

@r12f
Copy link
Contributor

r12f commented Dec 27, 2024

Also, we need to treat the CI failure seriously. I am seeing many real failures from pre-commit checks. @sm-xu

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

METRIC_LABEL_TEST_BUILD: Final[str] = "os.version"
METRIC_LABEL_TEST_CASE: Final[str] = "testcase"
METRIC_LABEL_TEST_FILE: Final[str] = "test_file"
METRIC_LABEL_TEST_JOBID: Final[str] = "job_id"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be better to put these test ones under the "test.*", e.g.: "test.testbed", "test.job.id", and so on.

The os.version looks different - if we like to track it as test build, then better to name it as "test.build" or simply "test.os.version"

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@r12f
Copy link
Contributor

r12f commented Jan 8, 2025

As discussed, please make the design doc update in the README file for this folder in a separate PR.

def __init__(self):
return

def create_periodic_metrics_reporter(common_labels: Dict[str, str]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be @staticmethod?

def create_periodic_metrics_reporter(common_labels: Dict[str, str]):
return (PeriodicMetricsReporter(common_labels))

def create_final_metrics_reporter(common_labels: Dict[str, str]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be @staticmethod?

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sm-xu sm-xu requested review from r12f and wangxin January 14, 2025 00:24
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@wangxin wangxin merged commit 3b9b320 into sonic-net:master Jan 16, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants