
[OSQuery Manager] Remove or greatly increase OSQueryMaxTimeout #42352

Open
qcorporation opened this issue Jan 20, 2025 · 8 comments
Labels
bug Osquerybeat Team:Security-Deployment and Devices Deployment and Devices Team in Security Solution

Comments

@qcorporation

qcorporation commented Jan 20, 2025

Context

One of our customers is running into timeout issues when executing YARA disk scans, which can take hours depending on the environment.
From the field:

We would like to be able to use this for actions requiring a full system scan, which would, of course, take more than 15 minutes.
The author has pointed out that `osqueryMaxTimeout = 15 * time.Minute` is hard-coded in osquerybeat.go.

The limit of 15 minutes was put there on purpose to avoid long-running queries blocking other possible query executions.
At some point, we need to weigh the benefits and possible implications of changing this hard-coded timeout.

Discussion

  • What should the maximum limit be, if any?
  • Will we later get complaints that queries never return or never execute (because one long-running query is blocking them), or that queries triggered by a security alert expired, or were delayed so much that they became useless?

These are the trade-offs to consider when increasing the maximum timeout.

Tasks

  • Work with the PM team to define a reasonable maximum timeout value, or whether to remove the limit entirely
  • Implement the change in line with the next stack release.
  • Create documentation regarding increasing the timeout and its implications at the stack level
@qcorporation qcorporation added bug Osquerybeat Team:Security-Deployment and Devices Deployment and Devices Team in Security Solution labels Jan 20, 2025
@elasticmachine

Pinging @elastic/sec-deployment-and-devices (Team:Security-Deployment and Devices)

@qcorporation

@jamiehynds, @roxana-gheorghe Can you please weigh in on the questions within the ticket?
@andrewkroh : great to get your feedback on this

cc'ing @mjwolf @ThorbenJ

@ThorbenJ

ThorbenJ commented Jan 20, 2025

I would argue for letting the customer decide by making this configurable in the policy (not removing it), and for documenting the possible impact of increasing the timeout.
The customer I am working with points out that this issue is not exclusive to Elastic: other endpoint vendors have similar problems with long-running (collection) jobs delaying or blocking automated or detection-triggered jobs.

A stretch goal might be to provide more insight into the job queue and job status, or to do some sort of multiprocessing, e.g. one queue for long jobs and one for (automated) short jobs. But most importantly right now, I would argue, we should give end users choice and control over timeouts.
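The multiprocessing idea could be sketched as two independent worker goroutines fed by separate queues, so a long-running collection job can never delay a short, detection-triggered one. This is an assumed design for illustration, not existing osquerybeat code:

```go
package main

import (
	"fmt"
	"sync"
)

type job struct {
	name string
	long bool
}

// dispatch routes jobs onto separate long/short queues, each drained by its
// own worker, and collects results from both. Only ordering within a single
// queue is guaranteed; a short job never waits behind a long one.
func dispatch(jobs []job) []string {
	longQ := make(chan job, len(jobs))
	shortQ := make(chan job, len(jobs))
	results := make(chan string, len(jobs))

	var wg sync.WaitGroup
	for _, q := range []chan job{longQ, shortQ} {
		wg.Add(1)
		go func(queue chan job) {
			defer wg.Done()
			for j := range queue {
				results <- j.name // a real worker would execute the query here
			}
		}(q)
	}

	for _, j := range jobs {
		if j.long {
			longQ <- j
		} else {
			shortQ <- j
		}
	}
	close(longQ)
	close(shortQ)
	wg.Wait()
	close(results)

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(dispatch([]job{
		{"full-disk-yara-scan", true},
		{"alert-triggered-query", false},
	}))
}
```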

@jamiehynds

++ to making it configurable in the policy, but with a maximum value to ensure we don't block other queries. As for a reasonable max, I'm open to suggestions. How do we feel about a 24-hour max: too long, too short?

@qcorporation

> I would argue for letting the customer decide by making this configurable in the policy (not removing it), and for documenting the possible impact of increasing the timeout. The customer I am working with points out that this issue is not exclusive to Elastic: other endpoint vendors have similar problems with long-running (collection) jobs delaying or blocking automated or detection-triggered jobs.
>
> A stretch goal might be to provide more insight into the job queue and job status, or to do some sort of multiprocessing, e.g. one queue for long jobs and one for (automated) short jobs. But most importantly right now, I would argue, we should give end users choice and control over timeouts.

@ThorbenJ, thanks for the feedback; I've added a task to document the implications at the deployment level when there is a code change to address this.
Do you happen to have any telemetry on how long a YARA file scan could take before a user would abandon the results?

@ThorbenJ

ThorbenJ commented Jan 21, 2025

I've shared a link to this with the particular customer I am working with, and I will comment if they have an opinion on the max. To me, 24h seems reasonable. However, I would not make that the default; I think we should keep the current default of 15 minutes or thereabouts, and make the config per-policy so that hosts with different policies can have different timeouts.

@qcorporation

qcorporation commented Jan 21, 2025

> I've shared a link to this with the particular customer I am working with, and I will comment if they have an opinion on the max. To me, 24h seems reasonable. However, I would not make that the default; I think we should keep the current default of 15 minutes or thereabouts, and make the config per-policy so that hosts with different policies can have different timeouts.

@ThorbenJ Thanks, any information would be great.
I agree; we are not changing the defaults within this ticket, just the maximum limit for the timeout.

Thank you, @mjwolf, for posting the PR. If folks are in agreement, let's try to get it in before the feature freeze.

@mjwolf

mjwolf commented Jan 21, 2025

I will increase the max timeout to 24h. Two PRs are needed to do this, as Kibana also enforces the maximum timeout that can be set: #42356 and elastic/kibana#207276.

This will be a global increase in the max timeout. If we want to set it per-policy or add other things like a work queue, I think that would be more work than can be done before the 9.0 or 8.18 feature freeze.

mjwolf added a commit to elastic/kibana that referenced this issue Jan 31, 2025
Some Osquery queries are expected to be long running. To accommodate
this, increase the maximum timeout in the query creation UI to 24 hours
(86400 seconds).

24 hours should allow most long-running queries, while still having a
limit that ensures misbehaving queries do not block others for an
extremely long time.

Relates to elastic/beats#42352. Osquerybeat
will also increase its timeout limit to 24h, this change will allow the
higher timeout to be set by users in Kibana.
kibanamachine added a commit to elastic/kibana that referenced this issue Jan 31, 2025
# Backport

This will backport the following commits from `main` to `8.x`:
- [Increase maximum Osquery timeout to 24 hours (#207276)](#207276)

Co-authored-by: Michael Wolf <[email protected]>
kibanamachine added a commit to elastic/kibana that referenced this issue Jan 31, 2025
# Backport

This will backport the following commits from `main` to `8.17`:
- [Increase maximum Osquery timeout to 24 hours (#207276)](#207276)

Co-authored-by: Michael Wolf <[email protected]>
kibanamachine added a commit to elastic/kibana that referenced this issue Jan 31, 2025
# Backport

This will backport the following commits from `main` to `8.16`:
- [Increase maximum Osquery timeout to 24 hours (#207276)](#207276)

Co-authored-by: Michael Wolf <[email protected]>
kibanamachine added a commit to elastic/kibana that referenced this issue Jan 31, 2025
# Backport

This will backport the following commits from `main` to `8.18`:
- [Increase maximum Osquery timeout to 24 hours (#207276)](#207276)

Co-authored-by: Michael Wolf <[email protected]>