Add negative_packet_loss_mode pass option for Mellanox nic as traffic… #172

wanghekai · 2021-06-30T11:08:08Z

… sender

wanghekai · 2021-06-30T11:08:28Z

Add negative_packet_loss_mode pass option for Mellanox nic as traffic sender

k-rister

The functionality that this patch is attempting to add is something that we have not previously seen much value in which is why we do not have this capability already. In almost every scenario where we use trafficgen these days we use stream statistics to be able to only calculate packet loss on packets that our code generates -- meaning we ignore broadcast traffic and other "noise" that exists on the network fabric when doing our packet loss calculations. The only way that negative packet loss should be observed is the following scenarios:

You use --use-device-stats because you have to and you receive packets that you did not transmit -- so RX-TX equals a negative number. The question is why do you have to use --use-device-stats since this is often a crutch solving a different problem entirely than what is addressed in this patch.

Something is wrong with your network topology/packet forwarding setup such that you receive more packets than you transmit. If this occurs with latency packets (which are uniquely identifiable then you will get a different error (see

trafficgen/binary-search.py

Lines 1712 to 1719 in 6cbf236

    
           if 'latency_duplicate_error' in trial_stats[dev_pair['rx']]: 
        
                if trial_params['duplicate_packet_failure_mode'] == 'quit': 
        
                     trial_result = 'abort' 
        
                     bs_logger("\t(critical requirement failure, duplicate latency packets detected, device pair: %d -> %d, pg_ids: [%s], trial result: %s)" % 
        
                               (dev_pair['tx'], 
        
                                dev_pair['rx'], 
        
                                trial_stats[dev_pair['rx']]['latency_duplicate_error'], 
        
                                trial_result))

) which can be avoided using a configuration parameter (see

trafficgen/binary-search.py

Lines 212 to 217 in 6cbf236

    
           parser.add_argument('--duplicate-packet-failure', 
        
                               dest='duplicate_packet_failure_mode', 
        
                               help='What to do when a duplicate packet failure is encountered', 
        
                               default = 'quit', 
        
                               choices = [ 'fail', 'quit', 'retry-to-fail', 'retry-to-quit' ] 
        
                               )

). If you disable latency packets and are still seeing negative packet loss while not using --use-device-stats then you cannot trust any of the packet loss calculations since the bulk packets are not uniquely identifiable -- ie. you cannot tell when you transmit X packets and receive X packets if you received X unique packets or you receive 1 packet X times and dropped X-1 packets. For this reason all of the results that are generated are questionable and you have to ask yourself what you are really determining by running the test.

Ultimately, before relying on the functionality that this patch is trying to add I would take a long, hard look at why you need this because this may just mask an issue that should be resolved in some other way.

Also, if we were going to merge this (assuming the requested changes are addressed), I would prefer that rather than adding pass as an option we do something like adding ignore as the option. Then, if --negative-packet-loss mode is set to ignore you go ahead and log the detection of this result but do not modify the value of trial_result.

k-rister · 2021-06-30T13:34:31Z

binary-search.py

@@ -230,8 +230,8 @@ def process_options ():
    parser.add_argument('--negative-packet-loss',
                        dest='negative_packet_loss_mode',
                        help='What to do when negative packet loss is encountered',
-                        default = 'quit',
-                        choices = [ 'fail', 'quit', 'retry-to-fail', 'retry-to-quit' ]
+                        default = 'pass',


We cannot change the default behavior for this result.

k-rister · 2021-06-30T13:38:54Z

binary-search.py

                         trial_result = trial_params['negative_packet_loss_mode']
-                         bs_logger("\t(trial failed requirement, negative direction packet loss, direction: %s, trial result status: modified, trial result: %s)" %
-                                   (direction,
-                                    trial_result))
+                         if trial_result != 'pass':
+                              bs_logger("\t(trial failed requirement, negative direction packet loss, direction: %s, trial result status: modified, trial result: %s)" %
+                                        (direction,
+                                        trial_result))


The way this is written it is possible that if other trial criteria would have already resulted in a failing trial result that encountering a negative packet loss "error" would negate that existing failure when --negative-packet-loss=pass was used because it does not factor in the previous state of trial_result.

k-rister · 2021-06-30T13:43:39Z

binary-search.py

+                    if trial_result == 'fail':
+                         return trial_result


Adding this prevents other code in the evaluate_trial function from running which is not something we want to do. We want all criteria to be evaluated so that the user has as clear a picture as possible to understand the behavior observed.

Add negative_packet_loss_mode pass option for Mellanox nic as traffic…

e05b9a9

… sender

k-rister suggested changes Jun 30, 2021

View reviewed changes

wanghekai and others added 2 commits October 13, 2022 19:28

increse sync and async timeout to 30 secs

f5d8651

Merge branch 'atheurer:master' into master

bba6d62

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add negative_packet_loss_mode pass option for Mellanox nic as traffic… #172

Add negative_packet_loss_mode pass option for Mellanox nic as traffic… #172

wanghekai commented Jun 30, 2021

wanghekai commented Jun 30, 2021

k-rister left a comment

k-rister Jun 30, 2021

k-rister Jun 30, 2021

k-rister Jun 30, 2021

	if 'latency_duplicate_error' in trial_stats[dev_pair['rx']]:
	if trial_params['duplicate_packet_failure_mode'] == 'quit':
	trial_result = 'abort'
	bs_logger("\t(critical requirement failure, duplicate latency packets detected, device pair: %d -> %d, pg_ids: [%s], trial result: %s)" %
	(dev_pair['tx'],
	dev_pair['rx'],
	trial_stats[dev_pair['rx']]['latency_duplicate_error'],
	trial_result))

	parser.add_argument('--duplicate-packet-failure',
	dest='duplicate_packet_failure_mode',
	help='What to do when a duplicate packet failure is encountered',
	default = 'quit',
	choices = [ 'fail', 'quit', 'retry-to-fail', 'retry-to-quit' ]
	)

Add negative_packet_loss_mode pass option for Mellanox nic as traffic… #172

Are you sure you want to change the base?

Add negative_packet_loss_mode pass option for Mellanox nic as traffic… #172

Conversation

wanghekai commented Jun 30, 2021

wanghekai commented Jun 30, 2021

k-rister left a comment

Choose a reason for hiding this comment

k-rister Jun 30, 2021

Choose a reason for hiding this comment

k-rister Jun 30, 2021

Choose a reason for hiding this comment

k-rister Jun 30, 2021

Choose a reason for hiding this comment