[WIP] Add E2E Integration Test For Adaptive Sampling Processor #5951

mahadzaryab1 · 2024-09-07T15:11:29Z

Which problem is this PR solving?

Resolves #5717 (Create e2e integration tests for Adaptive Sampling)

Description of the changes

How was this change tested?

Checklist

I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
I have signed all commits
I have added unit tests for the new functionality
I have run lint and test steps successfully
- for jaeger: make lint test
- for jaeger-ui: yarn lint and yarn test

Signed-off-by: Mahad Zaryab <[email protected]>

codecov · 2024-09-07T15:18:44Z

Codecov Report

Attention: Patch coverage is 72.22222% with 5 lines in your changes missing coverage. Please review.

Project coverage is 96.89%. Comparing base (f411b3c) to head (ca9a8c9).

Files with missing lines	Patch %	Lines
internal/safeexpvar/safeexpvar.go	0.00%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5951      +/-   ##
==========================================
- Coverage   96.91%   96.89%   -0.03%     
==========================================
  Files         349      349              
  Lines       16587    16598      +11     
==========================================
+ Hits        16076    16082       +6     
- Misses        328      333       +5     
  Partials      183      183

Flag	Coverage Δ
badger_v1	`7.99% <0.00%> (-0.01%)`	⬇️
badger_v2	`1.82% <0.00%> (-0.01%)`	⬇️
cassandra-4.x-v1	`15.76% <0.00%> (-0.01%)`	⬇️
cassandra-4.x-v2	`1.74% <0.00%> (-0.01%)`	⬇️
cassandra-5.x-v1	`15.76% <0.00%> (-0.01%)`	⬇️
cassandra-5.x-v2	`1.74% <0.00%> (-0.01%)`	⬇️
elasticsearch-6.x-v1	`18.71% <0.00%> (+<0.01%)`	⬆️
elasticsearch-7.x-v1	`18.77% <0.00%> (-0.02%)`	⬇️
elasticsearch-8.x-v1	`18.95% <0.00%> (-0.03%)`	⬇️
elasticsearch-8.x-v2	`1.82% <0.00%> (-0.01%)`	⬇️
grpc_v1	`9.37% <0.00%> (-0.01%)`	⬇️
grpc_v2	`7.12% <0.00%> (-0.01%)`	⬇️
kafka-v1	`9.70% <0.00%> (-0.01%)`	⬇️
kafka-v2	`1.82% <0.00%> (-0.01%)`	⬇️
memory_v2	`1.82% <0.00%> (-0.01%)`	⬇️
opensearch-1.x-v1	`18.81% <0.00%> (+<0.01%)`	⬆️
opensearch-2.x-v1	`18.81% <0.00%> (-0.02%)`	⬇️
opensearch-2.x-v2	`1.82% <0.00%> (+0.01%)`	⬆️
tailsampling-processor	`0.46% <0.00%> (-0.01%)`	⬇️
unittests	`95.68% <72.22%> (-0.03%)`	⬇️

Flags with carried-forward coverage won't be shown.


yurishkuro · 2024-09-07T15:58:05Z

docker-compose/adaptive-sampling/Makefile

+# Copyright (c) 2024 The Jaeger Authors.
+# SPDX-License-Identifier: Apache-2.0
+
+BINARY ?= all-in-one


all-in-one is a v1-style binary. I would prefer we test the v2 version (or both, but v2 is the higher priority, and testing v1 at this point is wasted work since v1 will be EOLed in a year).

@yurishkuro I was trying to test the jaeger binary, but I can't seem to access port 14268. Do you know why that is? I left more details on the issue: #5717 (comment)


mahadzaryab1 · 2024-09-17T03:37:39Z

plugin/sampling/strategyprovider/adaptive/aggregator.go

@@ -152,7 +152,7 @@ func (a *aggregator) HandleRootSpan(span *span_model.Span, logger *zap.Logger) {
 	}
 	samplerType, samplerParam := span.GetSamplerParams(logger)
 	if samplerType == span_model.SamplerTypeUnrecognized {
-		return
+		samplerType = span_model.SamplerTypeProbabilistic


@yurishkuro what kind of a config do we want to add to perform this override?

Something like "do not check sampler tags".

@yurishkuro should this config be exposed as part of the YAML configuration, or do we just want it to be internal?

It should be user-settable.
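A minimal sketch of what a user-settable override could look like, assuming the aggregator receives an options struct. The field name IgnoreSamplerTags, the ignore_sampler_tags key, and the SamplerType constants are hypothetical stand-ins, not Jaeger's actual definitions. With the flag off, untagged root spans are skipped as before; with it on, they are counted as probabilistic, matching the override in the aggregator.go diff above.

```go
package main

import "fmt"

// SamplerType is a hypothetical stand-in for span_model.SamplerType;
// these constants are illustrative, not Jaeger's actual definitions.
type SamplerType int

const (
	SamplerTypeUnrecognized SamplerType = iota
	SamplerTypeProbabilistic
	SamplerTypeLowerBound
)

// AggregatorOptions sketches a user-settable option. Both the field name
// and the YAML key are hypothetical.
type AggregatorOptions struct {
	// IgnoreSamplerTags counts root spans even when sampler.type/sampler.param
	// tags are missing, treating them as probabilistic.
	IgnoreSamplerTags bool `mapstructure:"ignore_sampler_tags"`
}

// effectiveSamplerType decides which sampler type to record for a root span
// and whether the span should be counted at all.
func effectiveSamplerType(tagged SamplerType, opts AggregatorOptions) (SamplerType, bool) {
	if tagged != SamplerTypeUnrecognized {
		// Tags were present and parsed: keep them.
		return tagged, true
	}
	if opts.IgnoreSamplerTags {
		// Override: assume probabilistic instead of dropping the span.
		return SamplerTypeProbabilistic, true
	}
	// Default: keep the existing behavior of skipping the span.
	return SamplerTypeUnrecognized, false
}

func main() {
	t, ok := effectiveSamplerType(SamplerTypeUnrecognized, AggregatorOptions{})
	fmt.Println(t == SamplerTypeUnrecognized, ok) // true false
	t, ok = effectiveSamplerType(SamplerTypeUnrecognized, AggregatorOptions{IgnoreSamplerTags: true})
	fmt.Println(t == SamplerTypeProbabilistic, ok) // true true
}
```

Keeping the default off preserves current behavior for clients that do emit sampler tags.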

mahadzaryab1 · 2024-09-17T03:38:22Z

plugin/sampling/strategyprovider/adaptive/post_aggregator.go

+	// 	}
+	// }
+	// return false
+	return true


@yurishkuro this causes the unit tests to fail, and I believe it's messing with the calculations as well. Any ideas on how we can get around this? However, if we don't hardcode this here, the probability only gets calculated once.

It's very difficult to troubleshoot like this. I would suggest altering tracegen to manually add the sampler.type=probabilistic / sampler.param=0.5 (any value for now) attributes to the span, to see how the system reacts. To my knowledge, aside from this check, the probability used by the sampler should not affect the calculations, but I may be wrong.

Another thing that would help is exposing internal state via expvar, so that we can actually monitor how that state changes.


yurishkuro · 2024-09-24T16:35:54Z

.github/workflows/ci-e2e-adaptivesampling-processor.yml

+    - name: Setup Node.js version
+      uses: ./.github/actions/setup-node.js


Not sure you need this, unless the test specifically checks that the UI is able to render the metrics.


mahadzaryab1 · 2024-10-06T01:05:50Z

@yurishkuro I added the expvar reporting to debug the first element in the service cache. Here is what I see after the first few intervals. Do these calculations make sense to you?

"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{1 true}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.2083785217916667 true}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.021478606836127154 true}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.002214752631527174 true}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.00022832613436132902 true}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{2.3538892285428596e-05 true}]]"

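After the first interval, the reported probabilities shrink by a nearly constant factor (0.0215/0.2084 ≈ 0.103, 0.00221/0.0215 ≈ 0.103, and so on). A multiplicative update produces exactly that pattern if the observed QPS never changes between intervals. The sketch below assumes an update of the form newProbability = oldProbability × targetQPS / currentQPS with a cap on increases; the function name, the 0.5 cap, and the QPS values (chosen so the factor is about 0.103) are hypothetical, and the real code path may also clamp to minimum/maximum probabilities.

```go
package main

import "fmt"

// calculate scales the previous probability by targetQPS/currentQPS and caps
// increases at increaseCap (as a fraction of the previous probability).
// Hypothetical name and signature, for illustration only.
func calculate(targetQPS, currentQPS, prevProbability, increaseCap float64) float64 {
	factor := targetQPS / currentQPS
	next := prevProbability * factor
	if factor > 1.0 {
		// Raise the probability slowly to guard against oversampling.
		if (next-prevProbability)/prevProbability > increaseCap {
			next = prevProbability * (1 + increaseCap)
		}
	}
	// Decreases are applied in full, in one step.
	return next
}

func main() {
	p := 1.0
	const targetQPS = 1.0
	// Assumption: observed QPS stays fixed because the client never applies
	// the new probability, so every interval applies the same factor.
	const observedQPS = 9.7
	for i := 1; i <= 5; i++ {
		p = calculate(targetQPS, observedQPS, p, 0.5)
		fmt.Printf("interval %d: %.6g\n", i, p)
	}
}
```

If this model holds, the decay points at the feedback loop (the client not honoring the computed probability) rather than at the formula itself.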

mahadzaryab1 · 2024-10-06T03:13:48Z

plugin/sampling/strategyprovider/adaptive/post_aggregator.go

@@ -398,7 +401,7 @@ func (p *PostAggregator) isUsingAdaptiveSampling(
 	// before.
 	if len(p.serviceCache) > 1 {
 		if e := p.serviceCache[1].Get(service, operation); e != nil {
-			return e.UsingAdaptive && !FloatEquals(e.Probability, p.InitialSamplingProbability)
+			return !FloatEquals(e.Probability, p.InitialSamplingProbability)


@yurishkuro with this patch, the numbers seem to make a bit more sense. Here's the output I see now:

"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{1 true}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.20840949054166666 false}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.021478798306074933 false}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.012634530863339348 false}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.02105483853348014 false}]]"
…
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.08421935413392057 false}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.1403881259748575 false}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.08254900503950167 false}]]"
"post_aggregator_service_cache[0]": "map[tracegen:map[lets-go:{0.051602341629876265 false}]]"

Let me know if you have any thoughts on how to proceed here.

The post_aggregator_service_cache name is unclear: are these probabilities or throughput?

The Boolean value at the end looks suspicious. What does it mean? If it's the "using adaptive sampling" indicator, then we need to know why it goes to false.

It's the sampling cache entry: the first value is the probability and the second is the "using adaptive sampling" indicator.

I mostly reverted the old isUsingAdaptiveSampling logic, aside from the e.UsingAdaptive check. When I had it hardcoded to true, the values didn't make sense (see my earlier comment on #5951).
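To make the effect of the isUsingAdaptiveSampling patch concrete, here is a minimal sketch of both predicates over a cache entry holding the (probability, usingAdaptive) pair from the debug output. floatEquals, its tolerance, and the 0.001 initial probability in the example are assumptions for illustration, not Jaeger's code. An entry whose flag is false but whose probability has moved away from the initial value counts as adaptive only after the patch.

```go
package main

import "fmt"

// samplingCacheEntry mirrors the {probability usingAdaptive} pair printed in
// the debug output; field names follow the diff.
type samplingCacheEntry struct {
	Probability   float64
	UsingAdaptive bool
}

const epsilon = 1e-9 // hypothetical tolerance

// floatEquals is a stand-in for the FloatEquals helper used in the diff.
func floatEquals(a, b float64) bool {
	d := a - b
	return d < epsilon && d > -epsilon
}

// usingAdaptiveBefore is the original check: flag set AND probability moved.
func usingAdaptiveBefore(e samplingCacheEntry, initial float64) bool {
	return e.UsingAdaptive && !floatEquals(e.Probability, initial)
}

// usingAdaptiveAfter is the patched check: only the probability matters.
func usingAdaptiveAfter(e samplingCacheEntry, initial float64) bool {
	return !floatEquals(e.Probability, initial)
}

func main() {
	initial := 0.001 // hypothetical initial sampling probability
	e := samplingCacheEntry{Probability: 0.2084, UsingAdaptive: false}
	fmt.Println(usingAdaptiveBefore(e, initial), usingAdaptiveAfter(e, initial)) // false true
}
```

Because the patched check ignores the flag, any entry whose flag stays false (as in the output above) is still treated as adaptive once its probability moves, which may explain why the values started oscillating instead of decaying.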

yurishkuro · 2024-10-06T20:55:01Z

plugin/sampling/strategyprovider/adaptive/post_aggregator.go

@@ -346,6 +348,7 @@ func (p *PostAggregator) calculateProbability(service, operation string, qps flo
 		Probability:   oldProbability,
 		UsingAdaptive: usingAdaptiveSampling,
 	})
+	safeexpvar.SetString("post_aggregator_service_cache[0]", fmt.Sprintf("%v", p.serviceCache[0].ToValue()))


The string conversion (fmt.Sprintf) loses important information; we should use a hierarchical expvar.Map instead.

Setup Docker Compose With Jaeger All In One And Tracegen

6cc7c1b

Signed-off-by: Mahad Zaryab <[email protected]>

yurishkuro reviewed Sep 7, 2024

mahadzaryab1 added 2 commits September 7, 2024 15:28

Use V2 Binary Instead of V1

e4eb3b6

Signed-off-by: Mahad Zaryab <[email protected]>

Adjust Parameters For Integration Test

ea76c8e

Signed-off-by: Mahad Zaryab <[email protected]>

yurishkuro reviewed Sep 7, 2024

docker-compose/adaptive-sampling/docker-compose.yml (resolved)

docker-compose/adaptive-sampling/Makefile (outdated, resolved)

mahadzaryab1 and others added 8 commits September 7, 2024 15:48

Fix Makefile Cleanup

5483545

Signed-off-by: Mahad Zaryab <[email protected]>

Expose Port 4318 In Jaeger

8ff74a7

Signed-off-by: Mahad Zaryab <[email protected]>

Revert To Port 5000

6a9699a

Signed-off-by: Mahad Zaryab <[email protected]>

Merge branch 'main' into adaptive-sampling-e2e

a65c190

Remove Leader Check In Adaptive Strategy Provider

3018caf

Signed-off-by: Mahad Zaryab <[email protected]>

Merge branch 'main' into adaptive-sampling-e2e

d7ba2ce

Reduce Calculation Interval And Calculation Delay

c3b1ca4

Signed-off-by: Mahad Zaryab <[email protected]>

Remove Unused Method

7561945

Signed-off-by: Mahad Zaryab <[email protected]>

yurishkuro reviewed Sep 14, 2024

docker-compose/adaptive-sampling/docker-compose.yml (outdated, resolved)

mahadzaryab1 and others added 8 commits September 14, 2024 18:56

Make Forwarding Port Explicit

9aaaf75

Signed-off-by: Mahad Zaryab <[email protected]>

Merge branch 'main' into adaptive-sampling-e2e

3dc923d

Hardcode Adaptive Sampling

7dd33d1

Signed-off-by: Mahad Zaryab <[email protected]>

Add Expvar Extension

b2be33c

Signed-off-by: Mahad Zaryab <[email protected]>

Add Script For E2E Integration Test

b996270

Signed-off-by: Mahad Zaryab <[email protected]>

Add Github Action

d661a0f

Signed-off-by: Mahad Zaryab <[email protected]>

Fix Typo

7aba57e

Signed-off-by: Mahad Zaryab <[email protected]>

Merge branch 'main' into adaptive-sampling-e2e

c9811ab

mahadzaryab1 commented Sep 17, 2024

mahadzaryab1 added 2 commits September 16, 2024 22:00

Add Build Step To Script

735704c

Signed-off-by: Mahad Zaryab <[email protected]>

Add Missing Components To Workflow File

f485cc8

Signed-off-by: Mahad Zaryab <[email protected]>

yurishkuro reviewed Sep 24, 2024

Add ExpVar Debugging For Post Aggregator Service Cache

54cd6b8

Signed-off-by: Mahad Zaryab <[email protected]>

mahadzaryab1 and others added 5 commits October 5, 2024 21:21

Merge branch 'main' into adaptive-sampling-e2e

cf8dd9c

Use New Configuration Schema

e18d8d5

Signed-off-by: Mahad Zaryab <[email protected]>

Patch To Only Remove One Check

24d11d4

Signed-off-by: Mahad Zaryab <[email protected]>

Fix Linting

9ebf133

Signed-off-by: Mahad Zaryab <[email protected]>

Comment Out Failing Tests For Now

ca9a8c9

Signed-off-by: Mahad Zaryab <[email protected]>

mahadzaryab1 commented Oct 6, 2024

yurishkuro reviewed Oct 6, 2024
