[cmd/opampsupervisor] Report mismatched identifying attributes found during bootstrapping #29864

Open
evan-bradley opened this issue Dec 13, 2023 · 6 comments
Labels: cmd/opampsupervisor, enhancement, never stale


@evan-bradley
Contributor

Component(s)

cmd/opampsupervisor

Is your feature request related to a problem? Please describe.

During bootstrapping, it may happen that the Supervisor's OpAMP server receives an AgentDescription message with a service instance ID or other identifying attribute that doesn't match what the Supervisor has set or expects.

This is a problem because the identifying attributes are intended to be used by the OpAMP server to determine which configuration to send to a Collector.

This could happen for a handful of reasons, using the instance ID as an example:

  1. The Collector has been configured to obtain its own instance ID and is ignoring the ID set by the Supervisor.
  2. There is an issue with the Supervisor setting the Collector's instance ID.
  3. Unlikely, but another Collector has connected to the Supervisor's bootstrap OpAMP server and is reporting its instance ID.
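To make the check concrete, here is a minimal sketch of how a mismatch could be detected. It is not the Supervisor's actual code: `findMismatches` is a hypothetical helper, the attribute values are made up, and plain string maps stand in for the key/value identifying attributes carried in an AgentDescription.

```go
package main

import (
	"fmt"
	"sort"
)

// findMismatches is a hypothetical helper. It compares the identifying
// attributes the Supervisor expects with those the Collector reported and
// returns a sorted list of human-readable differences.
func findMismatches(expected, reported map[string]string) []string {
	var out []string
	for key, want := range expected {
		got, ok := reported[key]
		switch {
		case !ok:
			out = append(out, fmt.Sprintf("%s: expected %q, not reported", key, want))
		case got != want:
			out = append(out, fmt.Sprintf("%s: expected %q, got %q", key, want, got))
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(out)
	return out
}

func main() {
	// Illustrative values only.
	expected := map[string]string{
		"service.name":        "otelcol",
		"service.instance.id": "instance-a",
	}
	reported := map[string]string{
		"service.name":        "otelcol",
		"service.instance.id": "instance-b",
	}
	for _, m := range findMismatches(expected, reported) {
		fmt.Println(m)
	}
}
```

An empty result means every expected attribute was reported with the expected value. Attributes the Collector reports beyond the expected set are ignored in this sketch.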

Describe the solution you'd like

If this happens, we should report the mismatch to the OpAMP server and stop the Collector. This behavior should be defined in the Supervisor specification document as well.

It's unclear whether the Supervisor should stay running after such an error occurs.
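One possible shape for this, as a rough sketch only: report the mismatch, stop the Collector, and return an error so the caller can decide whether the Supervisor itself keeps running. The `collectorProcess` and `mismatchReporter` interfaces, `handleMismatch`, and the fakes are all hypothetical names for illustration and do not correspond to existing Supervisor types.

```go
package main

import (
	"errors"
	"fmt"
)

// errIdentityMismatch marks bootstrap failures caused by mismatched
// identifying attributes. Hypothetical sentinel for this sketch.
var errIdentityMismatch = errors.New("identifying attribute mismatch during bootstrap")

// collectorProcess and mismatchReporter are assumed abstractions.
type collectorProcess interface{ Stop() error }
type mismatchReporter interface{ Report(details string) error }

// handleMismatch reports the mismatch and stops the Collector. It always
// returns a non-nil error wrapping errIdentityMismatch, joined with any
// failure from reporting or stopping.
func handleMismatch(c collectorProcess, r mismatchReporter, details string) error {
	reportErr := r.Report(details)
	stopErr := c.Stop() // stop even if reporting failed
	return errors.Join(fmt.Errorf("%w: %s", errIdentityMismatch, details), reportErr, stopErr)
}

// Fakes used only to exercise the sketch.
type fakeCollector struct{ stopped bool }

func (f *fakeCollector) Stop() error { f.stopped = true; return nil }

type fakeReporter struct{ reports []string }

func (f *fakeReporter) Report(details string) error {
	f.reports = append(f.reports, details)
	return nil
}

func main() {
	c, r := &fakeCollector{}, &fakeReporter{}
	err := handleMismatch(c, r, "service.instance.id does not match")
	fmt.Println("collector stopped:", c.stopped)
	fmt.Println("reports sent:", len(r.reports))
	fmt.Println("is mismatch error:", errors.Is(err, errIdentityMismatch))
}
```

Returning an error rather than exiting leaves the open question above to the caller: it can shut the Supervisor down or keep it running and wait for further instructions.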

Describe alternatives you've considered

No response

Additional context

No response

Contributor

Pinging code owners for cmd/opampsupervisor: @evan-bradley @atoulme @tigrannajaryan. See Adding Labels via Comments if you do not have permissions to add labels yourself.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


@github-actions github-actions bot added the Stale label Feb 13, 2024
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Apr 13, 2024
@evan-bradley evan-bradley added the never stale label and removed the Stale and closed as inactive labels Apr 15, 2024
@evan-bradley evan-bradley reopened this Apr 15, 2024
@evan-bradley
Contributor Author

There's an associated TODO for this in the Supervisor code, and it is a case we should handle.

@bacherfl
Contributor

@evan-bradley I can look into creating a PR for this

@bacherfl
Contributor

I have created a rough PR for now: #37541

Regarding the report to the OpAMP server, we have a bit of a chicken-and-egg problem here: the startOpampClient method requires the agent identifier to be available, so right now the client can only be established after the bootstrap info has been retrieved successfully.
Regarding the restart of the Collector, I have added a simple retry mechanism, but my feeling is that this just adds more complexity to the code, and restarting the Collector with the same config is unlikely to resolve a mismatched instance UID. There might be something I'm missing here, though, so I'd appreciate any opinions on this.
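To illustrate the retry concern, here is a small sketch (not the code in the PR) of a bounded retry that treats a mismatch as permanent and only retries other failures. `bootstrapWithRetry`, `errInstanceIDMismatch`, and `errTimeout` are hypothetical names introduced for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors for this sketch.
var (
	errInstanceIDMismatch = errors.New("instance UID mismatch")
	errTimeout            = errors.New("timed out waiting for AgentDescription")
)

// bootstrapWithRetry calls try up to attempts times. A mismatch is returned
// immediately, since restarting with the same config is unlikely to change
// the reported instance UID. Other errors are retried.
func bootstrapWithRetry(attempts int, try func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = try()
		if err == nil {
			return nil
		}
		if errors.Is(err, errInstanceIDMismatch) {
			return err // permanent failure, do not retry
		}
	}
	return fmt.Errorf("bootstrap failed after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := bootstrapWithRetry(3, func() error {
		calls++
		return errInstanceIDMismatch
	})
	fmt.Println("calls:", calls, "err:", err)

	calls = 0
	err = bootstrapWithRetry(3, func() error {
		calls++
		if calls < 3 {
			return errTimeout
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

With this split, the extra complexity of retrying applies only to failures that a restart could plausibly fix.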
