Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PluginNotificationTask::ScriptFunc(): on Linux truncate output and comment #9887

Merged
merged 1 commit into from
Dec 19, 2023

Conversation

Al2Klimov
Copy link
Member

@Al2Klimov Al2Klimov commented Oct 25, 2023

not to run into an exec(3) error E2BIG due to a too long argument. This sends a notification with truncated output instead of not sending.

fixes #9340

Reproduction before this PR

[2023-10-25 12:14:34 +0200] warning/PluginCheckTask: Check command for object 'e2big' (PID: 90872, arguments: 'true') terminated with exit code 128, output: execvpe(true) failed: Argument list too long

@Al2Klimov Al2Klimov added the consider backporting Should be considered for inclusion in a bugfix release label Oct 25, 2023
@Al2Klimov Al2Klimov added this to the 2.15.0 milestone Oct 25, 2023
@cla-bot cla-bot bot added the cla/signed label Oct 25, 2023
@icinga-probot icinga-probot bot added area/notifications Notification events bug Something isn't working labels Oct 25, 2023
@julianbrost
Copy link
Contributor

I strongly disagree with the basic idea of this PR: if a function is supposed to execute a specific command, it may not simply truncate the command and hope for the best. If you alter the command, you don't know if it still performs the same or at least a desirable action. For example, you don't know that you're not truncating a list of e-mail addresses, cut off a domain name somewhere in the middle and then send a notification to a wrong address. If automatic truncation should happen, it would have to know where it's safe to do so.

So for me, truncating specific values in the command definition as suggested by #9340 (comment) sound like a way safer and better approach in the short term. In the longer term, we could discuss if it even makes sense allowing arbitrarily large check outputs or if it's a broken check that's just sending garbage anyways if it exceeds some limit.

@julianbrost julianbrost removed their request for review October 25, 2023 12:57
@Al2Klimov
Copy link
Member Author

@julianbrost Do you want to touch every single command definition? (As a user.) Before or after you realise that you're loosing notifications?

@lippserd Please share your opinion on the concept (as a review).

@julianbrost
Copy link
Contributor

Do you want to touch every single command definition? (As a user.)

If that and this PR are my only options, yes. If the configuration results in a command that's too long to execute, it's safer to return an error instead of arbitrarily removing parts of the command until it's short enough to execute.

There are other options we could do though. After all, your PR probably tries to truncate the plugin output but that code location no longer has the information what part was the plugin output, so you make a guess. For a safe option, we have to know what we're truncating. In order to achieve this for existing configurations without user intervention, we'd have to ensure that the $output$ length is limited to a sane value, so one option could be to say that $output$ is always truncated to 64K and if you really want it, there's some $output_full$. Or we could place a general limit on the output size, which might also be a good idea in terms of memory usage and database growth. But all of this sound more like something to be discussed for 2.15 instead of 2.14.1.

@slalomsk8er
Copy link
Contributor

This reminds me of Linuxfabrik/monitoring-plugins#559

@Al2Klimov
Copy link
Member Author

I understand and respect your position. I just don't agree with it.

First of all, almost everything can handle long outputs. Only exec(3) can't. So IMAO just it should shorten stuff as it's objectively necessary.

Second, why 64K? Let me derive a formula from your unexpected shortening is bad opinion: the more you shorten unexpectedly, the worse. So I'd prefer to shorten only as much as the OS requires me to do. Admittedly the current algorithm is a bit radical while dividing strings by 2 every time. Maybe I can change this to, say, 1.5 if you wish. After all this is the slow path(TM) where such delay is acceptable.

Also, what about this alternative: the Process takes a bool whether to shorten which isn't set for checks. So you notice when your checks fail due to long strings. Also, I mean... we have to shorten notification cmd lines directly or indirectly. So, given this alternative, does it really matter where in the code that happens?

@julianbrost
Copy link
Contributor

does it really matter where in the code that happens?

Yes, you're changing a general "execute this command" function that does not know what exactly it's shortening. It should know that it's shortening the output and only the output (and maybe other fields that were explicitly marked as safe to be shortened). Otherwise, you don't know if your shortening operation is safe to perform.

@julianbrost julianbrost removed their request for review November 7, 2023 08:17
@Al2Klimov
Copy link
Member Author

It should know that it's shortening the output and only the output (and maybe other fields that were explicitly marked as safe to be shortened).

OK if PluginNotificationTask::ScriptFunc() fetches host.output, service.output and notification.comment values, passes them down to PluginUtility::ExecuteCommand() and Process::Process() and the latter shortens only strings which equals those values? Of course if they're like >=1024 characters, so that's a safe match?

OK if the above shortening is based on string X contains string Y? So -output=$output$ is also shortened, not just literal $output$.

@Al2Klimov Al2Klimov removed the request for review from julianbrost November 9, 2023 13:06
@Al2Klimov Al2Klimov self-assigned this Nov 9, 2023
@Al2Klimov Al2Klimov force-pushed the argument-list-too-long-9340 branch 2 times, most recently from de91a35 to eb05644 Compare November 9, 2023 13:52
@Al2Klimov
Copy link
Member Author

Test protocol II

Config

object Host "lolcat" {
  check_command = "dummy"
var big = "x"
	for (i in range(20)) {
		big = big + big
	}
vars.dummy_state=2
max_check_attempts=1
  vars.dummy_text = big + "Y"
}

object NotificationCommand "lolcat" {
 command = ["bash", "-c", "echo $$0; exit 139", "$output$"]
}

object User "lolcat" {
}

object Notification "lolcat" {
  host_name = "lolcat"
  command = "lolcat"
  users = ["lolcat"]
}

Result

...xxxxxxx
[2023-11-09 15:19:22 +0100] notice/CheckerComponent: Pending checkables: 0;...

👍 The Y in "$output$" of the NotificationCommand was truncated.

@Al2Klimov Al2Klimov removed their assignment Nov 9, 2023
@Al2Klimov Al2Klimov requested review from julianbrost and removed request for lippserd November 9, 2023 14:21
Comment on lines 167 to 171
for (auto strings : {argv, envp}) {
for (auto s (strings); *s; ++s) {
++totalArgs;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the surrounding code would quite benefit from changing argv and envp to std::vector<char*> and it would also avoid needing C-style code like this.

Comment on lines 157 to 161
if (len >= 1024u) { // Better safe than sorry
safeToTruncateStrings.emplace_back(s, len);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment would be more useful if it said what it actually protects from.

lib/base/process.cpp Outdated Show resolved Hide resolved
Comment on lines 219 to 227
// Initialize safeToTruncateArgs
for (auto strings : {argv, envp}) {
for (auto it (strings); *it; ++it) {
auto s (*it);
auto len (strlen(s));

for (auto suffix : safeToTruncateStrings) {
if (suffix.second <= len) {
auto substr (s + len - suffix.second);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of auto hides what's actually going on here and whether these operations are actually safe to perform in the forked process. I think this would greatly benefit from explicitly specifying the types.

@Al2Klimov
Copy link
Member Author

Honestly tried to unit test all this. Hits the Boost timeout, not to even mention GHA. a2ea751

@Al2Klimov Al2Klimov self-assigned this Dec 12, 2023
@Al2Klimov Al2Klimov changed the title Process: if exec(3) fails with "Argument list too long", truncate the input PluginNotificationTask::ScriptFunc(): on Linux truncate output and comment Dec 12, 2023
@Al2Klimov Al2Klimov removed their assignment Dec 12, 2023
lib/methods/pluginnotificationtask.cpp Outdated Show resolved Hide resolved
auto output (cr->GetOutput());

if (output.GetLength() > l_MaxOutLen) {
resolvers.emplace_back("service", new Dictionary({{"output", output.SubStr(0, l_MaxOutLen)}}));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's Linux, the macro resolver key service is set twice? Is it safe to rely on the vector element insertion order and expect this element to be found before the actual service object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notification is already set twice, on all platforms.

lib/methods/pluginnotificationtask.cpp Outdated Show resolved Hide resolved

// Make e.g. the $host.output$ itself even 10% shorter to leave enough room
// for e.g. --host-output= as in --host-output=$host.output$
const static auto l_MaxOutLen = MAX_ARG_STRLEN * 9u / 10u;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know in detail about what you agreed with Julian about these particular numbers, otherwise I have nothing to complain about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not on these particular numbers. My out of the blue suggestion was 64K which then sounded quite reasonable as MAX_ARG_STRLEN turns out to be 128K on most platforms (PAGE_SIZE is 4K) so that would have been half of that leaving more than enough room for any --output= prefixes or similar. This is just trying to push it closer to the limit.

FYI: you could do MAX_ARG_STRLEN - MAX_ARG_STRLEN / 10u as this doesn't have intermediate values that could overflow if this was raised to something like "basically unlimited" in the future (like MAX_ARG_STRINGS which currently is 0x7FFFFFFF).

Comment on lines 64 to 71
String commandline = Array::Ptr(future.get())->Join(" ");

BOOST_CHECK(commandline.Contains("echo Hx"));
BOOST_CHECK(!commandline.Contains("xh"));
BOOST_CHECK(commandline.Contains("x Sx"));
BOOST_CHECK(!commandline.Contains("xs"));
BOOST_CHECK(commandline.Contains("x Cx"));
BOOST_CHECK(!commandline.Contains("xc"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be more straight-forward to check that the specific parts of the command line array are prefixes of the input values?

Also, this does actually execute echo, doesn't it? So wouldn't it make sense to also check check that exit code and output so that the test case also covers the actual execution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not yet solved! You haven't made any changes to the address this suggestion:

So wouldn't it make sense to also check check that exit code and output so that the test case also covers the actual execution?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the actual execution via the output is totally enough.

test/methods-pluginnotificationtask.cpp Outdated Show resolved Hide resolved

// Make e.g. the $host.output$ itself even 10% shorter to leave enough room
// for e.g. --host-output= as in --host-output=$host.output$
const static auto l_MaxOutLen = MAX_ARG_STRLEN * 9u / 10u;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not on these particular numbers. My out of the blue suggestion was 64K which then sounded quite reasonable as MAX_ARG_STRLEN turns out to be 128K on most platforms (PAGE_SIZE is 4K) so that would have been half of that leaving more than enough room for any --output= prefixes or similar. This is just trying to push it closer to the limit.

FYI: you could do MAX_ARG_STRLEN - MAX_ARG_STRLEN / 10u as this doesn't have intermediate values that could overflow if this was raised to something like "basically unlimited" in the future (like MAX_ARG_STRINGS which currently is 0x7FFFFFFF).

…mment

not to run into an exec(3) error E2BIG due to a too long argument.
This sends a notification with truncated output instead of not sending.
@Al2Klimov Al2Klimov merged commit 96cfc4a into master Dec 19, 2023
25 checks passed
@Al2Klimov Al2Klimov deleted the argument-list-too-long-9340 branch December 19, 2023 13:36
@Al2Klimov Al2Klimov added backported Fix was included in a bugfix release and removed consider backporting Should be considered for inclusion in a bugfix release labels May 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/notifications Notification events backported Fix was included in a bugfix release bug Something isn't working cla/signed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Notfication scripts exiting with "Argument list too long"
4 participants