-
Notifications
You must be signed in to change notification settings - Fork 21
Add negative_packet_loss_mode pass option for Mellanox nic as traffic… #172
base: master
Are you sure you want to change the base?
Conversation
Add negative_packet_loss_mode pass option for Mellanox nic as traffic sender |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functionality that this patch is attempting to add is something that we have not previously seen much value in which is why we do not have this capability already. In almost every scenario where we use trafficgen these days we use stream statistics to be able to only calculate packet loss on packets that our code generates -- meaning we ignore broadcast traffic and other "noise" that exists on the network fabric when doing our packet loss calculations. The only way that negative packet loss should be observed is the following scenarios:
- You use
--use-device-stats
because you have to and you receive packets that you did not transmit -- so RX-TX equals a negative number. The question is why do you have to use--use-device-stats
since this is often a crutch solving a different problem entirely than what is addressed in this patch. - Something is wrong with your network topology/packet forwarding setup such that you receive more packets than you transmit. If this occurs with latency packets (which are uniquely identifiable then you will get a different error (see
Lines 1712 to 1719 in 6cbf236
if 'latency_duplicate_error' in trial_stats[dev_pair['rx']]: if trial_params['duplicate_packet_failure_mode'] == 'quit': trial_result = 'abort' bs_logger("\t(critical requirement failure, duplicate latency packets detected, device pair: %d -> %d, pg_ids: [%s], trial result: %s)" % (dev_pair['tx'], dev_pair['rx'], trial_stats[dev_pair['rx']]['latency_duplicate_error'], trial_result)) Lines 212 to 217 in 6cbf236
parser.add_argument('--duplicate-packet-failure', dest='duplicate_packet_failure_mode', help='What to do when a duplicate packet failure is encountered', default = 'quit', choices = [ 'fail', 'quit', 'retry-to-fail', 'retry-to-quit' ] ) --use-device-stats
then you cannot trust any of the packet loss calculations since the bulk packets are not uniquely identifiable -- ie. you cannot tell when you transmit X packets and receive X packets if you received X unique packets or you receive 1 packet X times and dropped X-1 packets. For this reason all of the results that are generated are questionable and you have to ask yourself what you are really determining by running the test.
Ultimately, before relying on the functionality that this patch is trying to add I would take a long, hard look at why you need this because this may just mask an issue that should be resolved in some other way.
Also, if we were going to merge this (assuming the requested changes are addressed), I would prefer that rather than adding pass
as an option we do something like adding ignore
as the option. Then, if --negative-packet-loss
mode is set to ignore
you go ahead and log the detection of this result but do not modify the value of trial_result
.
@@ -230,8 +230,8 @@ def process_options (): | |||
parser.add_argument('--negative-packet-loss', | |||
dest='negative_packet_loss_mode', | |||
help='What to do when negative packet loss is encountered', | |||
default = 'quit', | |||
choices = [ 'fail', 'quit', 'retry-to-fail', 'retry-to-quit' ] | |||
default = 'pass', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot change the default behavior for this result.
trial_result = trial_params['negative_packet_loss_mode'] | ||
bs_logger("\t(trial failed requirement, negative direction packet loss, direction: %s, trial result status: modified, trial result: %s)" % | ||
(direction, | ||
trial_result)) | ||
if trial_result != 'pass': | ||
bs_logger("\t(trial failed requirement, negative direction packet loss, direction: %s, trial result status: modified, trial result: %s)" % | ||
(direction, | ||
trial_result)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way this is written it is possible that if other trial criteria would have already resulted in a failing trial result that encountering a negative packet loss "error" would negate that existing failure when --negative-packet-loss=pass
was used because it does not factor in the previous state of trial_result
.
if trial_result == 'fail': | ||
return trial_result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this prevents other code in the evaluate_trial
function from running which is not something we want to do. We want all criteria to be evaluated so that the user has as clear a picture as possible to understand the behavior observed.
… sender