-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial version of defining the interfaces to accept metrics #15913
Conversation
The pre-commit check detected issues in the files touched by this pull request. Detailed pre-commit check results: To run the pre-commit checks locally, you can follow below steps:
|
# Metrics data are organized into the hierarchies below | ||
# ResourceMetrics | ||
# ├── ResourceID | ||
# └── ScopeMetrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the level of ScopeMetrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought is
Resource level: all metrics from one test run
Scope level: all metrics belonging to one device
Metric level: all metrics belonging to one category
I might be wrong. Let's discuss this topic tomorrow.
The pre-commit check detected issues in the files touched by this pull request. Detailed pre-commit check results: To run the pre-commit checks locally, you can follow below steps:
|
|
||
############################## Report Metrics ############################## | ||
|
||
class MetricReporterFactory: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move factory to another file, so we can override easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with this change, we can do this in another file:
class MetricReporterFactory:
def create_metrics_reporter(self):
return OtelMetricReporter(...)
class OtelMetricReporter:
def emit(....):
# Real implementation goes here, which each customer can define their own.
# ├── TestID | ||
# └── DeviceMetrics | ||
# ├── DeviceID | ||
# └── Metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create a generic Metric class that represents a single metric, which contains:
- description/labels: Name, Description, unit, ....
- Value: single layer is good enough with inheritance.
- Reporter: Reference to MetricsReporter. Register itself to Reporter when created, so Reporter can gather all metrics after everything is changed.
class Metric...:
def __init__(name, ...., reporter):
reporter.add_metric(self)
....
class GaugeMetric(Metric):
def __init__(name, ...., reporter):
super.__init__(...)
self.value = 0
def set(v):
self.value = v
....
reporter = MetricReporterFactory(...).build()
port_rx = GaugeMetric(...., reporter)
port_rx.set(123)
reporter.report(time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hence, ultimately the final code for people to use would be:
metrics = {
"PortRx" = GaugeMetric(......, reporter)
....
}
for r in csv:
for c in r:
metric[c.title].set(c.value)
reporter.report(time)
# software version. They are also from the same test case identified by test_run_id. | ||
class TestMetrics: | ||
def __init__(self, testbed_name, os_version, testcase_name, test_run_id): | ||
self.testbed_name = testbed_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all these fields can be moved to reporter, since it is shared by everyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestMetrics itself can be removed, once we add the per metric class.
# software version. They are also from the same test case identified by test_run_id. | ||
class TestMetrics: | ||
def __init__(self, testbed_name, os_version, testcase_name, test_run_id): | ||
self.testbed_name = testbed_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestMetrics itself can be removed, once we add the per metric class.
|
||
############################## Report Metrics ############################## | ||
|
||
class MetricReporterFactory: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with this change, we can do this in another file:
class MetricReporterFactory:
def create_metrics_reporter(self):
return OtelMetricReporter(...)
class OtelMetricReporter:
def emit(....):
# Real implementation goes here, which each customer can define their own.
@@ -0,0 +1,147 @@ | |||
# This file defines the interfaces that snappi tests accept external metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All common label names are missing too, e.g.: PortId, QueueId, PSUId....
otherwise it will be very hard to create unified dashboard, because each tests could use its own names, and causing problems in filters.
The pre-commit check detected issues in the files touched by this pull request. Detailed pre-commit check results: To run the pre-commit checks locally, you can follow below steps:
|
name, | ||
description, | ||
unit, | ||
timestamp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following fields is common for entire tests, so it can be move into the reporter as common metadata:
- testbed_name
- os_version
- testcase_name
- test_run_id
The following fields are common for all metrics in a single report action, so it can be lifted into the reporter's report function parameters:
- timestamp
The following fields are not clear on its purpose, we need to rename it to make it clear:
- component_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the timestamp here means the test_start_time?
|
||
class Metric: | ||
def __init__(self, | ||
name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing type hints
testcase_name, test_run_id, device_id, component_id, reporter, metadata, metrics) | ||
|
||
# Additional fields for GaugeMetric | ||
self.metrics = metrics or {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
each Metric should only represent a single metric. If we are trying to create something that holds all metrics, it should be 1 layer above, say MetricCollections / MetricList / Metrics or whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the purpose of this field is not too clear...
@@ -0,0 +1,103 @@ | |||
# This file defines the interfaces that snappi tests accept external metrics. | |||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file it not part of intf_utils, because it is not related to interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sm-xu this comment is missing.
# Temporary code to report metrics | ||
print(f"Reporting metrics at {timestamp}") | ||
for metric in self.metrics: | ||
print(metric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be great to create a new abstracted function for us to override.
self.reporter = OtelMetricReporter(self.connection) | ||
return self.reporter | ||
|
||
class OtelMetricReporter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reporter should not be limited to Otel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not addressed
name (str): metric name (e.g., psu power, sensor temperature, port stats, etc.) | ||
description (str): brief description of the metric | ||
unit (str): metric unit (e.g., seconds, bytes) | ||
timestamp (int): UNIX Epoch time in nanoseconds when the metric is collected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the timestamp is for logging the collection time, the reporter already has it and can be removed
unit (str): metric unit (e.g., seconds, bytes) | ||
timestamp (int): UNIX Epoch time in nanoseconds when the metric is collected | ||
device_id (str): switch device ID | ||
component_id (str): ID of the component (e.g., psu, sensor, port, etc.), where metrics are produced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be ignored, since the components are included in the name and we won't use it for filtering too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check out my email
description (str): brief description of the metric | ||
unit (str): metric unit (e.g., seconds, bytes) | ||
timestamp (int): UNIX Epoch time in nanoseconds when the metric is collected | ||
device_id (str): switch device ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be lifted up to reporter, since it is common to all
self.reporter = OtelMetricReporter(self.connection) | ||
return self.reporter | ||
|
||
class OtelMetricReporter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not addressed
pass | ||
|
||
|
||
class KustoReporter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not limit the implementation to kusto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestMetricRecordRepoter
@@ -0,0 +1,89 @@ | |||
# This file defines the interfaces that snappi tests accept external metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the definitions of the metric names and meta are missing in the file, we need to get them defined and show a unified format. this will be used for crafting the dashboards.
Returns: | ||
An instance of the specified metrics reporter. | ||
""" | ||
if data_type == "metrics": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be better to split this into 2 functions instead of using magic string.
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
"psu.id": "psu1", | ||
"model": "PWR-2422-HV-RED", | ||
"serial": "6A011010142349Q"} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the sensitive data.
description = "PSU power reading", | ||
unit = "W", | ||
reporter = reporter) | ||
power.set_gauge_metric(scope_labels, 222.00) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Create a metric and pass it to the reporter
vol = GaugeMetric(name = "Voltage",
description = "PSU voltage reading",
unit = "V",
reporter = reporter)
# Create a metric and pass it to the reporter
cur = GaugeMetric(name = "Current",
description = "PSU current reading",
unit = "A",
reporter = reporter)
# Create a metric and pass it to the reporter
power = GaugeMetric(name = "Power",
description = "PSU power reading",
unit = "W",
reporter = reporter)
scope_labels["psu.id"] = "PSU 1"
vol.set_gauge_metric(scope_labels, 12.09)
cur.set_gauge_metric(scope_labels, 18.38)
power.set_gauge_metric(scope_labels, 222.00)
scope_labels["psu.id"] = "PSU 2"
vol.set_gauge_metric(scope_labels, 12.10)
cur.set_gauge_metric(scope_labels, 17.72)
power.set_gauge_metric(scope_labels, 214.00)
name: str, | ||
description: str, | ||
unit: str, | ||
reporter: MetricReporterFactory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the factory.
return (f"Metric(name={self.name!r}, " | ||
f"description={self.description!r}, " | ||
f"unit={self.unit!r}, " | ||
f"reporter={self.reporter!r})") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reporter might not be converted to string.
# Initialize the base class | ||
super().__init__(name, description, unit, reporter) | ||
|
||
def set_gauge_metric(self, scope_labels: Dict[str, str], value: Union[int, str, float]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename function to record, we need to support multiple metrics.
|
||
class MetricReporterFactory: | ||
def __init__(self): | ||
self.reporter = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not needed.
reporter = factory.create_metrics_reporter(resource_labels) | ||
|
||
scope_labels = { | ||
"device.id": "str-7060x6-64pe-stress-02", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
label name needs to be standarized for our test cases. otherwise, there is no way to build standard dashboards.
# Temporary code initializing a RecordsReporter | ||
# will be replaced with a real initializer such as Kusto | ||
self.resource_labels = resource_labels | ||
self.timestamp = int(time.time() * 1_000_000_000) # epoch time in nanoseconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timestamp should not be here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be report function parameter.
self.resource_labels = resource_labels | ||
self.timestamp = int(time.time() * 1_000_000_000) # epoch time in nanoseconds | ||
self.records = [] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need function to push records into the self.records list.
Abstract method to report records at a given timestamp. | ||
Subclasses must override this method. | ||
""" | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
report function is usually written in this way:
def report(self):
incoming_records = self.records
self.records = []
self.process_incoming_records(incoming_records)
@@ -0,0 +1,64 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename this file to metrics.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the point is to consider the usage:
from metrics_utils.metrics_accepter import Metric, GaugeMetric # The code looks weird here
from utils.metrics import GaugeMetric # This looks more nature
from metrics_utils.metrics import GaugeMetric # This works too.
|
||
#from metrics_accepter import Metric, GaugeMetric | ||
|
||
class MetricReporterFactory: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move factory to a dedicated file. for reporter, we can leave in this file or move to metrics.py, no strong opinion in that.
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@@ -0,0 +1,13 @@ | |||
{ | |||
"allowed_labels": [ | |||
"testbed.id", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make them constants in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean changing it to this in metrics.py?
Only these labels are allowed
ALLOWED_LABELS = {
"testbed.id",
"os.version",
"testrun.id",
"testcase",
"device.id",
"psu.id",
"port.id",
"sensor.id",
"queue.id",
}
class MetricsReporter:
def __init__(self, resource_labels: Dict[str, str]):
for label in resource_labels:
if label not in ALLOWED_LABELS:
raise LabelError(f"Invalid label: {label}.")
# Temporary code initializing a MetricsReporter
# will be replaced with a real initializer such as OpenTelemetry
self.resource_labels = resource_labels
self.metrics = []
tests/snappi_tests/utils/examples.py
Outdated
|
||
""" | ||
resource_labels = { | ||
"testbed.id": "sonic_stress_testbed", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Label keys should be constants instead of using literals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 approaches:
-
Add
ALLOWED_LABELS = {
"testbed.id",
"os.version",
"testrun.id",
"testcase",
"device.id",
"psu.id",
"port.id",
"sensor.id",
"queue.id",
}
in metrics.py and keep this place unchanged. -
Add
ALLOWED_LABELS = {
"TESTBED_ID": "testbed.id",
"OS_VERSION": "os.version",
"TESTCASE": "testcase",
"TESTRUN_ID": "testrun.id",
... ...
}
in metrics.py and change this place to
resource_labels = {
ALLOWED_LABELS["TESTBED_ID"]: "sonic_stress_testbed",
ALLOWED_LABELS["OS_VERSION"]: "11.2.3",
ALLOWED_LABELS["TESTCASE"]: "stress_test1",
ALLOWED_LABELS["TESTRUN_ID"]: "202412101217"
}
Which way do you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take os version as an example:
from typing import Final
METRIC_LABEL_TEST_TESTBED: Final[str] = "test.testbed"
METRIC_LABEL_TEST_BRANCH: Final[str] = "test.branch"
METRIC_LABEL_TEST_CASE: Final[str] = "test.testcase"
METRIC_LABEL_TEST_FILE: Final[str] = "test.test_file"
...
METRIC_LABEL_DEVICE_ID: Final[str] = "device.id"
METRIC_LABEL_DEVICE_PORT_ID: Final[str] = "device.port.id"
METRIC_LABEL_DEVICE_QUEUE_ID: Final[str] = "device.queue.id"
METRIC_LABEL_DEVICE_PSU_ID: Final[str] = "device.psu.id"
...
resource_labels = {
METRIC_LABEL_TEST_TESTBED: "abc",
METRIC_LABEL_TEST_BRANCH: "202411",
METRIC_LABEL_TEST_CASE: "mock-case",
METRIC_LABEL_TEST_FILE: "mock-test.py",
...
}
...
scope_labels[METRIC_LABEL_DEVICE_PSU_ID] = "PSU 1"
voltage.record(scope_labels, 12.09)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make sure to check the design doc I shared with you for adding the required labels.
tests/snappi_tests/utils/metrics.py
Outdated
""" | ||
|
||
|
||
class TestResultsReporter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not test result, which usually refers to pass/fail sort of things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we want to name it then? How about TestStatus?
tests/snappi_tests/utils/metrics.py
Outdated
stashed_test_results = self.test_results | ||
self.test_results = [] | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these removed accidentally and forgot to put back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand you. In the commented code
"""
print(f"Current time (ns): {current_time}")
pprint(self.resource_labels)
pprint(stashed_metrics)
process_stashed_metrics(current_time, stashed_metrics)
"""
The first 3 lines are for my own testing purpose only. process_stashed_metrics() will later be replaced with real code to emit the metrics to InfluxDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no way in language level to override the commented code, in here we need to provide a "virtual function" for the subclass to implement.
tests/snappi_tests/utils/metrics.py
Outdated
if timestamp is not None: | ||
current_time = timestamp | ||
else: | ||
current_time = time.time_ns() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be moved to parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this what you meant?
current_time = timestamp or time.time_ns()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you tried this?
def report(self, timestamp=time.time_ns()):
tests/snappi_tests/utils/metrics.py
Outdated
self.resource_labels = resource_labels | ||
self.test_results = [] | ||
|
||
def stash_test_results(self, labels: Dict[str, str], value: Union[int, str, float]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stash_record
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
tests/snappi_tests/utils/metrics.py
Outdated
self.resource_labels = resource_labels | ||
self.metrics = [] | ||
|
||
def stash_metric(self, new_metric: 'GaugeMetric', labels: Dict[str, str], value: Union[int, str, float]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stash_record
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second parameter type is better to be the base class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
tests/snappi_tests/utils/metrics.py
Outdated
|
||
def stash_metric(self, new_metric: 'GaugeMetric', labels: Dict[str, str], value: Union[int, str, float]): | ||
# add a new metric | ||
self.metrics.append({"labels": labels, "value": value}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
labels will need to be deep copied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change it to
# Deep copy the labels to ensure stored data is immutable
copied_labels = deepcopy(labels)
# Add the new metric
self.metrics.append({"labels": copied_labels, "value": value})
Do I understand you correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, something like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review. Thanks!
tests/snappi_tests/utils/metrics.py
Outdated
stashed_test_results = self.test_results | ||
self.test_results = [] | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no way in language level to override the commented code, in here we need to provide a "virtual function" for the subclass to implement.
@@ -0,0 +1,20 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove empty line.
I wonder why pre-commit didn't fail for this.... CI does failed due to static analysis. might be better to check that.
tests/snappi_tests/utils/metrics.py
Outdated
from typing import List, Dict, Union | ||
|
||
# Function to load allowed labels from a JSON file | ||
def load_allowed_labels(filename="allowed_labels.json"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be removed once moved to constants.
tests/snappi_tests/utils/examples.py
Outdated
|
||
""" | ||
resource_labels = { | ||
"testbed.id": "sonic_stress_testbed", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make sure to check the design doc I shared with you for adding the required labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look at my replies. Thank you!
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
tests/common/telemetry/examples.py
Outdated
# Add the root directory of the project to sys.path | ||
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..'))) | ||
|
||
from snappi_tests.utils.metrics import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import is not fixed.
tests/common/telemetry/examples.py
Outdated
|
||
# Create a MetricReporterFactory and build a MetricReporter | ||
factory = TelemetryReporterFactory() | ||
reporter = factory.create_periodic_metrics_reporter(resource_labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we don't need to create the factory object here, but directly use:
reporter = TelemetryReporterFactory.create_periodic_metrics_reporter(resource_labels)
tests/common/telemetry/examples.py
Outdated
PSU 2 PWR-ABCD 1Z011010156787X 01 12.01 17.72 214.00 OK green | ||
|
||
""" | ||
resource_labels = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update resource_labels to common_labels
will be better.
tests/common/telemetry/examples.py
Outdated
factory = TelemetryReporterFactory() | ||
reporter = factory.create_periodic_metrics_reporter(resource_labels) | ||
|
||
scope_labels = {METRIC_LABEL_DEVICE_ID: "switch-A"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update scope_labels to metric_labels
will be better.
tests/snappi_tests/utils/examples.py
Outdated
@@ -0,0 +1,69 @@ | |||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is duplicated now, we should remove it.
tests/snappi_tests/utils/examples.py
Outdated
@@ -0,0 +1,69 @@ | |||
import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and by moving the file, I mean move the metrics.py and report_factory.py, not just the example : D
tests/snappi_tests/utils/metrics.py
Outdated
METRIC_LABEL_CAST_DIRECTION: Final[str] = "cast_direction" # unicast or multicast | ||
METRIC_LABEL_HARDWARE_REVISION: Final[str] = "hardware.revision" | ||
METRIC_LABEL_PRIORITY_GROUP: Final[str] = "priority_group" | ||
METRIC_LABEL_BUFFER_POOL: Final[str] = "buffer_pool" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be better to have the name more... organized per feature. labels needs to be explicit, so they don't step on each other's toe in the dashboard, e.g.: 2 metrics might both having the name "model" - one for PSU, one for transceiver. So, this naming is not a good approach.
I would recommend to be:
METRIC_LABEL_DEVICE_PSU_MODEL: Final[str] = "device.psu.model" # component refers to the level below, i.e. parts used by a switch
METRIC_LABEL_DEVICE_PSU_SERIAL: Final[str] = "serial"
METRIC_LABEL_DEVICE_PG_ID: Final[str] = "device.pg.id"
METRIC_LABEL_DEVICE_BUFFER_POOL_ID: Final[str] = "device.buffer_pool.id"
I cannot really tell what are these below, but I bet you should now getting the idea and can make the adjustments by your own now.
METRIC_LABEL_CAST_DIRECTION: Final[str] = "cast_direction" # unicast or multicast
METRIC_LABEL_HARDWARE_REVISION: Final[str] = "hardware.revision"
Also, we need to treat the CI failure seriously. I am seeing many real failures from pre-commit checks. @sm-xu |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
tests/common/telemetry/metrics.py
Outdated
METRIC_LABEL_TEST_BUILD: Final[str] = "os.version" | ||
METRIC_LABEL_TEST_CASE: Final[str] = "testcase" | ||
METRIC_LABEL_TEST_FILE: Final[str] = "test_file" | ||
METRIC_LABEL_TEST_JOBID: Final[str] = "job_id" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be better to put these test ones under the "test.*", e.g.: "test.testbed", "test.job.id", and so on.
The os.version looks different - if we like to track it as test build, then better to name it as "test.build" or simply "test.os.version"
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
As discussed, please make the design doc update in the README file for this folder in a separate PR. |
def __init__(self): | ||
return | ||
|
||
def create_periodic_metrics_reporter(common_labels: Dict[str, str]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be @staticmethod
?
def create_periodic_metrics_reporter(common_labels: Dict[str, str]): | ||
return (PeriodicMetricsReporter(common_labels)) | ||
|
||
def create_final_metrics_reporter(common_labels: Dict[str, str]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be @staticmethod
?
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Description of PR
Summary:
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
How did you do it?
How did you verify/test it?
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation