Update the performance regression detection method to reduce false positives. #58

linlin-s · 2023-10-10T18:07:08Z

Issue #, if available:

#53

Description of changes:

This PR improves the performance regression detection method with the following changes:

Removed the logic that calculated a threshold value for regression detection. Instead, a two-sample t-test compares the benchmark results before and after a change to determine whether a regression occurred. More details on how the t-test is used to detect regressions are included in the source code comments.
Added logic to remove outliers from the raw data. The two-sample t-test requires normally distributed data. Because noise causes some of our raw data to deviate from a normal distribution, we filter out outliers using the IQR (interquartile range) method. After this preprocessing, we verify that the data is normally distributed using the Shapiro–Wilk test.
Added unit tests for the methods removeOutliers and detectRegression.
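
For readers unfamiliar with the IQR method mentioned above, here is a minimal sketch of the idea using only the JDK. It is not the code from this PR: the 1.5 multiplier is the conventional fence and is an assumption here, the linear-interpolation percentile helper is hypothetical, and a statistics library may estimate quartiles slightly differently.

```java
import java.util.Arrays;

final class IqrOutlierSketch {
    // Hypothetical helper: percentile by linear interpolation on a sorted array, p in [0, 1].
    static double percentile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    // Keeps only values inside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
    // The 1.5 factor is an assumption (the conventional choice), not taken from the PR.
    static double[] removeOutliers(double[] data) {
        if (data.length == 0) {
            return data; // nothing to filter
        }
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double q1 = percentile(sorted, 0.25);
        double q3 = percentile(sorted, 0.75);
        double iqr = q3 - q1;
        double lowerBound = q1 - 1.5 * iqr;
        double upperBound = q3 + 1.5 * iqr;
        return Arrays.stream(data)
                .filter(d -> lowerBound <= d && d <= upperBound)
                .toArray();
    }
}
```

With the sample {10, 11, 12, 13, 14, 100}, the quartiles are 11.25 and 13.75, the fences are 7.5 and 17.5, and only the value 100 is dropped.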

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

popematt · 2023-10-10T23:03:32Z

src/com/amazon/ion/benchmark/.DS_Store

Please remove this from the commit.

popematt · 2023-10-11T19:40:02Z

src/com/amazon/ion/benchmark/ParseAndCompareBenchmarkResults.java

+        // Calculate means of both datasets
+        double meanBefore = StatUtils.mean(before);
+        double meanAfter = StatUtils.mean(after);
+        // Calculate the difference in means (regression value)
+        double regressionValue = (meanAfter - meanBefore) / meanBefore;


The code would be more logical and readable if these lines were moved into the if (pValue < 0.05) { block.
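
A sketch of what that restructuring could look like, under some assumptions: the p-value is passed in as a parameter only to keep the example free of library dependencies (in the PR it presumably comes from the t-test), the mean helper stands in for StatUtils.mean, and returning 0 for a non-significant result is inferred from the review discussion rather than confirmed by the diff shown here.

```java
final class DetectRegressionSketch {
    static final double SIGNIFICANCE_LEVEL = 0.05;

    // Stand-in for StatUtils.mean so the sketch needs no external library.
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // pValue is a parameter here only for illustration (assumption).
    static double relativeChange(double[] before, double[] after, double pValue) {
        if (pValue < SIGNIFICANCE_LEVEL) {
            // The means are only needed once the difference is known to be significant.
            double meanBefore = mean(before);
            double meanAfter = mean(after);
            return (meanAfter - meanBefore) / meanBefore;
        }
        return 0.0; // assumed "no significant change" value
    }
}
```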

popematt · 2023-10-11T19:41:08Z

src/com/amazon/ion/benchmark/ParseAndCompareBenchmarkResults.java

-        for (int i = 0; i < rawDataList.size(); i++) {
-            IonDecimal score = (IonDecimal) rawDataList.get(i);
-            rawData.add(score.bigDecimalValue());
+    public static double detectRegression(double[] before, double[] after) {


Technically, this is detecting whether there is a statistically significant change. It looks like it will return a non-zero value both for a significant regression and a significant improvement.

Yes. Only a significant regression value is added to the comparisonResults map, which decides whether the ion-java-regression-detection workflow fails. Here is the comment about the conditional check

popematt · 2023-10-11T19:51:19Z

src/com/amazon/ion/benchmark/ParseAndCompareBenchmarkResults.java

+        List<Double> filteredData = new ArrayList<>();
+        for (double value : data) {
+            if (value >= lowerBound && value <= upperBound) {
+                filteredData.add(value);
+            }
        }
-        return scoreStruct;
+        return filteredData.stream().mapToDouble(d -> d).toArray();


We can make this a lot more concise. (I'm only 95% sure the syntax of the suggestion is correct, but I know the concept is sound.)

return Arrays.stream(data).filter(d -> lowerBound <= d && d <= upperBound).toArray();

Thanks for the suggestion.
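
For anyone who wants to check the suggested syntax, the following standalone sketch places the loop-based shape from the diff next to the stream-based one-liner. The class name, method names, and the bounds passed as parameters are illustrative assumptions; both versions keep the same elements in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OutlierFilterComparison {
    // Loop-based version, following the shape of the code under review.
    static double[] filterWithLoop(double[] data, double lowerBound, double upperBound) {
        List<Double> filteredData = new ArrayList<>();
        for (double value : data) {
            if (value >= lowerBound && value <= upperBound) {
                filteredData.add(value);
            }
        }
        return filteredData.stream().mapToDouble(d -> d).toArray();
    }

    // Stream-based version from the review suggestion; avoids boxing into List<Double>.
    static double[] filterWithStream(double[] data, double lowerBound, double upperBound) {
        return Arrays.stream(data)
                .filter(d -> lowerBound <= d && d <= upperBound)
                .toArray();
    }
}
```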

popematt · 2023-10-11T19:52:24Z

src/com/amazon/ion/benchmark/ParseAndCompareBenchmarkResults.java

-        if (score.getType().equals(IonType.FLOAT)) {
-            IonFloat scoreFloat = (IonFloat) score;
-            return scoreFloat.bigDecimalValue();
+    public static double[] preProcess(String benchmarkResult, String keyWord) throws Exception {


We should use a more descriptive name than preProcess. How about loadKeywordSpecificBenchmarkResults?

Yes, sounds good to me. I'll update it in the next commit.

popematt · 2023-10-11T19:54:23Z

src/com/amazon/ion/benchmark/ParseAndCompareBenchmarkResults.java

-        IonStruct benchmarkResultStruct = readHelper(benchmarkResultFilePath);
-        IonStruct parameterStruct = (IonStruct) benchmarkResultStruct.get(PARAMETERS);
-        return parameterStruct.get(keyWord);
+    public static DoubleStream toDouble(IonList data){


This can be private. You should also just return List<Double> or double[] unless you're going to be passing around DoubleStream to other functions. (E.g. if the input and output of removeOutliers was DoubleStream.)

Thanks, will change to private in the next commit.
The raw data is a nested IonList, and toDouble transforms each inner IonList into a DoubleStream while the raw data is being flattened. The return value of toDouble is not passed directly to removeOutliers. Here is where the data flattening happens. In this case, should the return type still be List<Double> or double[]?
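
To make the flattening question concrete, here is a rough sketch in which List<List<BigDecimal>> stands in for the nested IonList (an assumption, so that the example runs without ion-java). It shows one case where a DoubleStream return type is convenient: a private helper that is only ever consumed by flatMapToDouble.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.stream.DoubleStream;

final class FlattenSketch {
    // Stand-in for toDouble(IonList): converts one inner list to a DoubleStream.
    // Private, because the stream is only consumed inside this class.
    private static DoubleStream toDouble(List<BigDecimal> inner) {
        return inner.stream().mapToDouble(BigDecimal::doubleValue);
    }

    // Flattens the nested structure into a single double[].
    static double[] flatten(List<List<BigDecimal>> nested) {
        return nested.stream()
                .flatMapToDouble(FlattenSketch::toDouble)
                .toArray();
    }
}
```

If the helper returned double[] instead, the call site would need to wrap it again with Arrays.stream before flattening, so keeping the stream internal and exposing only double[] from the outer method is one reasonable split.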

popematt · 2023-10-11T19:55:24Z

src/com/amazon/ion/benchmark/ParseAndCompareBenchmarkResults.java

-    private final static String GC_ALLOCATE = "·gc.alloc.rate";
-    private final static String HEAP_USAGE = "Heap usage";


Can you explain to me why it's okay to get rid of these?

These two constants were used when composing the regression detection summary to indicate which metric regressed. We now compose the summary by iterating over comparisonResults, a map whose keys are the metrics with a detected regression and whose values are the amounts by which they regressed. The keys already include the names held by the deleted constants, so those names no longer need to be defined separately.
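
As an illustration of that approach, the sketch below builds a summary purely from the map entries. The Map<String, Double> type, the method name, and the summary wording are assumptions made for the example; the actual output format in the PR may differ.

```java
import java.util.Locale;
import java.util.Map;

final class RegressionSummarySketch {
    // Builds one line per regressed metric; the metric name comes from the map key,
    // so no separate constant per metric is needed.
    static String summarize(Map<String, Double> comparisonResults) {
        StringBuilder summary = new StringBuilder();
        for (Map.Entry<String, Double> entry : comparisonResults.entrySet()) {
            summary.append(entry.getKey())
                    .append(" regressed by ")
                    .append(String.format(Locale.ROOT, "%.2f%%", entry.getValue() * 100))
                    .append('\n');
        }
        return summary.toString();
    }
}
```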

tgregg

Nice! The new technique is simpler (in the code we own, anyway) and more sound.

tgregg · 2023-10-11T19:48:09Z

src/com/amazon/ion/benchmark/.DS_Store

This looks like an accidental commit?

tgregg · 2023-10-11T19:50:28Z

src/com/amazon/ion/benchmark/Main.java

@@ -44,7 +44,7 @@ public class Main {

        + "  ion-java-benchmark run-suite (--test-ion-data <file_path>) (--benchmark-options-combinations <file_path>) <output_file>\n"

-        + "  ion-java-benchmark compare (--benchmark-result-previous <file_path>) (--benchmark-result-new <file_path>) <output_file>\n"
+        + "  ion-java-benchmark compare (--benchmark-result-previous <file_path>) (--benchmark-result-new <file_path>)\n"


Was <output_file> not used? Or did you determine that it is not useful, and we should always write to stdout?

Yes, <output_file> is not useful for now. We will write out the regression results.

linlin-s · 2023-10-11T20:09:57Z

src/com/amazon/ion/benchmark/ParseAndCompareBenchmarkResults.java

+            double[] previousData = preProcess(benchmarkResultPrevious, benchmarkScoreKeyword);
+            double[] newData = preProcess(benchmarkResultNew, benchmarkScoreKeyword);
+            double comparisonResult = detectRegression(previousData, newData);
+            if (comparisonResult > 0) {


Replying to this comment
The detectRegression() method returns a non-zero value for both a significant regression and a significant improvement. Only a significant regression (comparison result > 0) is added to the final comparisonResults map.
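
The conditional check described here can be sketched as follows. The input map of metric name to relative change is a hypothetical stand-in for calling detectRegression once per metric, and treating a positive value as a regression assumes metrics where a larger score is worse.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RegressionFilterSketch {
    // Keeps only metrics whose relative change is positive (treated as a regression).
    // Zero (no significant change) and negative values (improvements) are skipped.
    static Map<String, Double> collectRegressions(Map<String, Double> changes) {
        Map<String, Double> comparisonResults = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : changes.entrySet()) {
            double comparisonResult = entry.getValue();
            if (comparisonResult > 0) {
                comparisonResults.put(entry.getKey(), comparisonResult);
            }
        }
        return comparisonResults;
    }
}
```

A workflow built on this could then fail whenever the returned map is non-empty.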

Update the performance regression detection method to reduce false po…

2b02d3a

…stives.

popematt reviewed Oct 11, 2023

tgregg approved these changes Oct 11, 2023

linlin-s commented Oct 11, 2023

Updates the commit to resolve the comments.

163d14e

linlin-s mentioned this pull request Oct 12, 2023

Optimize ion-java-regression-detection workflow to improve the accuracy. amazon-ion/ion-java#603

Merged

linlin-s requested a review from popematt October 12, 2023 17:56

popematt approved these changes Oct 12, 2023

linlin-s merged commit 2e995de into master Oct 12, 2023
1 check passed
