Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the performance regression detection method to reduce false positives. #58

Merged
merged 2 commits into from
Oct 12, 2023

Conversation

linlin-s
Copy link
Contributor

Issue #, if available:

#53

Description of changes:

This PR optimized the method of detecting performance regression by making these changes:

  1. Removing the logic of calculating threshold value of regression detection. Instead, using two-sample t-test to compare the benchmark results before and after changes to see if regression happened. More details about how to use t-test method to detect regression is included in the comments of source code.
  2. Adding logic to remove outliers from raw data. Two-sample t-test requires normally distributed data. Given that some of our raw data deviates from this due to noise, we preprocess using the IQR (Interquartile range) method to filter out outliers. After this preprocessing, we verify data is normally distributed by using Shapiro–Wilk test.
  3. Adding unit tests for methods removeOutliers and detectRegression.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this from the commit.

Comment on lines 76 to 80
// Calculate means of both datasets
double meanBefore = StatUtils.mean(before);
double meanAfter = StatUtils.mean(after);
// Calculate the difference in means (regression value)
double regressionValue = (meanAfter - meanBefore) / meanBefore;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code would be more logical/readable if these lines were moved into the if (pValue <0.05) { block.

for (int i = 0; i < rawDataList.size(); i++) {
IonDecimal score = (IonDecimal) rawDataList.get(i);
rawData.add(score.bigDecimalValue());
public static double detectRegression(double[] before, double[] after) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, this is detecting whether there is a statistically significant change. It looks like it will return a non-zero value both for a significant regression and a significant improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, this is detecting whether there is a statistically significant change. It looks like it will return a non-zero value both for a significant regression and a significant improvement.

Yes, and only significant regression value will be added to the comparisonResults map which is used for deciding whether ion-java-regression-detection workflow should fail or not. Here is the comment about the conditional check

Comment on lines 129 to 135
List<Double> filteredData = new ArrayList<>();
for (double value : data) {
if (value >= lowerBound && value <= upperBound) {
filteredData.add(value);
}
}
return scoreStruct;
return filteredData.stream().mapToDouble(d -> d).toArray();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make this a lot more concise. (I'm only 95% sure the syntax of suggestion is correct, but I know the concept is sound.)

return Arrays.stream(data).filter(d -> lowerBound <= d && d <= upperBound).toArray();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion.

if (score.getType().equals(IonType.FLOAT)) {
IonFloat scoreFloat = (IonFloat) score;
return scoreFloat.bigDecimalValue();
public static double[] preProcess(String benchmarkResult, String keyWord) throws Exception {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use a more descriptive name than preProcess. How about loadKeywordSpecificBenchmarkResults?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sounds good to me. I'll update it in the next commit.

IonStruct benchmarkResultStruct = readHelper(benchmarkResultFilePath);
IonStruct parameterStruct = (IonStruct) benchmarkResultStruct.get(PARAMETERS);
return parameterStruct.get(keyWord);
public static DoubleStream toDouble(IonList data){

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be private. You should also just return List<Double> or double[] unless you're going to be passing around DoubleStream to other functions. (E.g. if the input and output of removeOutliers was DoubleStream.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will change to private in the next commit.
The raw data is a nested IonList, and toDouble is used for transform the inner IonList to DoubleStream while flattening the raw data. The return value of toDouble is not directly passing to removeOutliers. Here is where the data flattening happened. In this case, should we still consider the returned data type as List<Double> or double[] ?

Comment on lines -41 to -42
private final static String GC_ALLOCATE = "·gc.alloc.rate";
private final static String HEAP_USAGE = "Heap usage";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain to me why it's okay to get rid of these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain to me why it's okay to get rid of these?

These two constant variables were used for composing the regression detection summary to represent which metric has regression. For now, we compose the regression detection summary by iterating the comparisonResults. The map comparisonResults contains key-value pairs where the keys are metrics that have regression detected and the values are the amounts they have regressed. The keys already include the names of deleted variables, so the deleted information does not need to be initialized separately.

Copy link
Contributor

@tgregg tgregg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! The new technique is simpler (in the code we own, anyway) and more sound.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an accidental commit?

@@ -44,7 +44,7 @@ public class Main {

+ " ion-java-benchmark run-suite (--test-ion-data <file_path>) (--benchmark-options-combinations <file_path>) <output_file>\n"

+ " ion-java-benchmark compare (--benchmark-result-previous <file_path>) (--benchmark-result-new <file_path>) <output_file>\n"
+ " ion-java-benchmark compare (--benchmark-result-previous <file_path>) (--benchmark-result-new <file_path>)\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was <output_file> not used? Or did you determine that it is not useful, and we should always write to stdout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, <output_file> is not useful for now. We will write out the regression results.

double[] previousData = preProcess(benchmarkResultPrevious, benchmarkScoreKeyword);
double[] newData = preProcess(benchmarkResultNew, benchmarkScoreKeyword);
double comparisonResult = detectRegression(previousData, newData);
if (comparisonResult > 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replying to this comment
detectRegression() method will return a non-zero value both for a significant regression and a significant improvement and only significant regression (when comparison result > 0) will be added to the final comparisonResults map.

@linlin-s linlin-s merged commit 2e995de into master Oct 12, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants