PluginNotificationTask::ScriptFunc(): on Linux truncate output and comment #9887

Al2Klimov · 2023-10-25T11:18:39Z

Truncate the output and comment so as not to run into an exec(3) E2BIG error caused by an overly long argument. This sends a notification with truncated output instead of sending none at all.

fixes #9340

Reproduction before this PR

[2023-10-25 12:14:34 +0200] warning/PluginCheckTask: Check command for object 'e2big' (PID: 90872, arguments: 'true') terminated with exit code 128, output: execvpe(true) failed: Argument list too long

julianbrost · 2023-10-25T12:57:14Z

I strongly disagree with the basic idea of this PR: if a function is supposed to execute a specific command, it may not simply truncate the command and hope for the best. If you alter the command, you don't know whether it still performs the same, or at least a desirable, action. For example, you don't know that you're not truncating a list of e-mail addresses, cutting off a domain name somewhere in the middle, and then sending a notification to a wrong address. If automatic truncation should happen, it would have to know where it's safe to do so.

So for me, truncating specific values in the command definition, as suggested by #9340 (comment), sounds like a much safer and better approach in the short term. In the longer term, we could discuss whether it even makes sense to allow arbitrarily large check outputs, or whether a check that exceeds some limit is broken and just sending garbage anyway.

Al2Klimov · 2023-10-27T17:15:12Z

@julianbrost Do you want to touch every single command definition? (As a user.) Before or after you realise that you're losing notifications?

@lippserd Please share your opinion on the concept (as a review).

julianbrost · 2023-11-02T13:53:17Z

> Do you want to touch every single command definition? (As a user.)

If that and this PR are my only options, yes. If the configuration results in a command that's too long to execute, it's safer to return an error instead of arbitrarily removing parts of the command until it's short enough to execute.

There are other options we could pursue, though. After all, your PR probably tries to truncate the plugin output, but that code location no longer has the information about which part was the plugin output, so you make a guess. For a safe option, we have to know what we're truncating. In order to achieve this for existing configurations without user intervention, we'd have to ensure that the $output$ length is limited to a sane value, so one option could be to say that $output$ is always truncated to 64K and, if you really want it, there's some $output_full$. Or we could place a general limit on the output size, which might also be a good idea in terms of memory usage and database growth. But all of this sounds more like something to be discussed for 2.15 instead of 2.14.1.
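The "truncate only a field you know is the output" idea from the comment above can be sketched as follows. This is a minimal illustration, not code from the PR: `TruncateOutput` and `kMaxOutputLen` are hypothetical names, and the 64K value simply mirrors the figure floated in the comment.

```cpp
#include <cstddef>
#include <string>

// Hypothetical limit, mirroring the 64K suggestion from the discussion.
constexpr std::size_t kMaxOutputLen = 64 * 1024;

// Returns `output` unchanged if it fits, otherwise its first `maxLen` bytes.
// The caller knows this value is plugin output, so cutting it is safe.
std::string TruncateOutput(const std::string& output, std::size_t maxLen = kMaxOutputLen)
{
	return output.size() > maxLen ? output.substr(0, maxLen) : output;
}
```

Note that this cuts at a byte boundary, so a multi-byte UTF-8 sequence at the cut point could be split; whether that matters depends on the consumer of the notification.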

slalomsk8er · 2023-11-02T15:26:35Z

This reminds me of Linuxfabrik/monitoring-plugins#559

Al2Klimov · 2023-11-06T17:31:12Z

I understand and respect your position. I just don't agree with it.

First of all, almost everything can handle long outputs. Only exec(3) can't. So IMAO only it should shorten stuff, as that's where it's objectively necessary.

Second, why 64K? Let me derive a formula from your "unexpected shortening is bad" opinion: the more you shorten unexpectedly, the worse. So I'd prefer to shorten only as much as the OS requires. Admittedly, the current algorithm is a bit radical in dividing string lengths by 2 every time. Maybe I can change this to, say, 1.5 if you wish. After all, this is the slow path(TM), where such a delay is acceptable.
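The trade-off between dividing by 2 and dividing by 1.5 can be sketched like this. It is an assumption-laden illustration, not the algorithm from this PR: `ShrinkToFit` is a hypothetical helper that models only the length arithmetic, with the divisor given as the fraction `num/den`.

```cpp
#include <cassert>
#include <cstddef>

// Repeatedly divides `len` by num/den (e.g. 2/1 or 3/2) until it no longer
// exceeds `limit`, and returns the resulting length.
// Assumes num > den > 0 so that every step makes progress, and that
// len * den does not overflow std::size_t.
std::size_t ShrinkToFit(std::size_t len, std::size_t limit, std::size_t num, std::size_t den)
{
	assert(num > den && den > 0);

	while (len > limit) {
		len = len * den / num;
	}

	return len;
}
```

A smaller divisor takes more iterations but tends to end closer to the limit, which is the point made above about shortening no more than necessary.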

Also, what about this alternative: Process takes a bool that says whether to shorten, and it isn't set for checks. That way you notice when your checks fail due to long strings. Also, I mean... we have to shorten notification command lines directly or indirectly. So, given this alternative, does it really matter where in the code that happens?

julianbrost · 2023-11-07T08:17:11Z

> does it really matter where in the code that happens?

Yes, you're changing a general "execute this command" function that does not know what exactly it's shortening. It should know that it's shortening the output and only the output (and maybe other fields that were explicitly marked as safe to be shortened). Otherwise, you don't know if your shortening operation is safe to perform.

Al2Klimov · 2023-11-07T10:50:08Z

> It should know that it's shortening the output and only the output (and maybe other fields that were explicitly marked as safe to be shortened).

Would it be OK if PluginNotificationTask::ScriptFunc() fetches the host.output, service.output and notification.comment values, passes them down to PluginUtility::ExecuteCommand() and Process::Process(), and the latter shortens only strings that equal those values? Of course only if they're, say, >= 1024 characters, so that it's a safe match.

Would it be OK if the above shortening is based on "string X contains string Y"? Then -output=$output$ is also shortened, not just a literal $output$.

Al2Klimov · 2023-11-09T14:21:29Z

Test protocol II

Config

object Host "lolcat" {
  check_command = "dummy"
  var big = "x"
  for (i in range(20)) {
    big = big + big
  }
  vars.dummy_state = 2
  max_check_attempts = 1
  vars.dummy_text = big + "Y"
}

object NotificationCommand "lolcat" {
  command = ["bash", "-c", "echo $$0; exit 139", "$output$"]
}

object User "lolcat" {
}

object Notification "lolcat" {
  host_name = "lolcat"
  command = "lolcat"
  users = ["lolcat"]
}

Result

...xxxxxxx
[2023-11-09 15:19:22 +0100] notice/CheckerComponent: Pending checkables: 0;...

👍 The Y in "$output$" of the NotificationCommand was truncated.

julianbrost · 2023-11-21T10:47:29Z

lib/base/process.cpp

+		for (auto strings : {argv, envp}) {
+			for (auto s (strings); *s; ++s) {
+				++totalArgs;
+			}
+		}


I think the surrounding code would quite benefit from changing argv and envp to std::vector<char*> and it would also avoid needing C-style code like this.
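A rough sketch of what the `std::vector<char*>` suggestion could look like, assuming the arguments are available as `std::string`s. `ToExecVector` and `CountArgs` are hypothetical helpers for illustration, not functions from lib/base/process.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds a NULL-terminated pointer array as expected by the exec(3) family.
// The pointers refer into `args`, which must outlive the returned vector.
std::vector<char*> ToExecVector(std::vector<std::string>& args)
{
	std::vector<char*> out;
	out.reserve(args.size() + 1);

	for (std::string& arg : args) {
		out.push_back(&arg[0]);
	}

	out.push_back(nullptr);
	return out;
}

// With vectors, counting needs no pointer walking: each vector carries
// exactly one trailing nullptr that is not an argument.
std::size_t CountArgs(const std::vector<char*>& argv, const std::vector<char*>& envp)
{
	return (argv.size() - 1) + (envp.size() - 1);
}
```

The `data()` member of each vector can then be handed to the exec call in place of the raw `char**`.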

julianbrost · 2023-11-21T10:47:57Z

lib/base/process.cpp

+				if (len >= 1024u) { // Better safe than sorry
+					safeToTruncateStrings.emplace_back(s, len);
+				}


That comment would be more useful if it said what it actually protects from.

julianbrost · 2023-11-21T11:06:44Z

lib/base/process.cpp

+						// Initialize safeToTruncateArgs
+						for (auto strings : {argv, envp}) {
+							for (auto it (strings); *it; ++it) {
+								auto s (*it);
+								auto len (strlen(s));
+
+								for (auto suffix : safeToTruncateStrings) {
+									if (suffix.second <= len) {
+										auto substr (s + len - suffix.second);


The use of auto hides what's actually going on here and whether these operations are actually safe to perform in the forked process. I think this would greatly benefit from explicitly specifying the types.
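For illustration, the suffix match from the hunk above could be written with explicit types roughly as below. The hunk is cut off after `auto substr (...)`, so the comparison step is an assumption, and `FindSuffix` is a hypothetical name. The sketch sticks to `strlen` and `memcmp` and performs no allocation, which is the kind of property that matters in a forked child.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Returns a pointer to where `suffix` starts inside `s` if `s` ends with it,
// otherwise nullptr. `suffix` is a (pointer, length) pair as in the hunk.
const char* FindSuffix(const char* s, const std::pair<const char*, std::size_t>& suffix)
{
	const std::size_t len = std::strlen(s);

	if (suffix.second > len) {
		return nullptr;
	}

	const char* substr = s + (len - suffix.second);

	return std::memcmp(substr, suffix.first, suffix.second) == 0 ? substr : nullptr;
}
```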

Al2Klimov · 2023-11-22T16:03:09Z

I honestly tried to unit test all this, but it hits the Boost timeout, not to mention GHA. a2ea751

yhabteab · 2023-12-13T09:11:41Z

lib/methods/pluginnotificationtask.cpp

+			auto output (cr->GetOutput());
+
+			if (output.GetLength() > l_MaxOutLen) {
+				resolvers.emplace_back("service", new Dictionary({{"output", output.SubStr(0, l_MaxOutLen)}}));


If it's Linux, the macro resolver key service is set twice? Is it safe to rely on the vector element insertion order and expect this element to be found before the actual service object?

notification is already set twice, on all platforms.
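The ordering question can be made concrete with a small model. This is a generic sketch under the assumption that lookup walks the resolver list front to back and the first entry that has the field wins; it does not reproduce the real macro processor, and `Fields`, `Resolvers` and `Resolve` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Fields = std::map<std::string, std::string>;
using Resolvers = std::vector<std::pair<std::string, Fields>>;

// Returns the value of `name.field` from the first resolver that provides it,
// or nullptr if none does. Later entries with the same name act as fallbacks.
const std::string* Resolve(const Resolvers& resolvers, const std::string& name, const std::string& field)
{
	for (const auto& resolver : resolvers) {
		if (resolver.first != name) {
			continue;
		}

		auto it = resolver.second.find(field);

		if (it != resolver.second.end()) {
			return &it->second;
		}
	}

	return nullptr;
}
```

Under that assumption, an entry holding only the truncated `output` shadows the full object's `output` when it is inserted first, while every other field still resolves from the later entry.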

yhabteab · 2023-12-13T14:51:09Z

lib/methods/pluginnotificationtask.cpp

+
+// Make e.g. the $host.output$ itself even 10% shorter to leave enough room
+// for e.g. --host-output= as in --host-output=$host.output$
+const static auto l_MaxOutLen = MAX_ARG_STRLEN * 9u / 10u;


I don't know in detail about what you agreed with Julian about these particular numbers, otherwise I have nothing to complain about.

julianbrost · 2023-12-18T10:05:15Z

test/methods-pluginnotificationtask.cpp

+	String commandline = Array::Ptr(future.get())->Join(" ");
+
+	BOOST_CHECK(commandline.Contains("echo Hx"));
+	BOOST_CHECK(!commandline.Contains("xh"));
+	BOOST_CHECK(commandline.Contains("x Sx"));
+	BOOST_CHECK(!commandline.Contains("xs"));
+	BOOST_CHECK(commandline.Contains("x Cx"));
+	BOOST_CHECK(!commandline.Contains("xc"));


Wouldn't it be more straightforward to check that the specific parts of the command line array are prefixes of the input values?

Also, this does actually execute echo, doesn't it? So wouldn't it make sense to also check the exit code and output so that the test case also covers the actual execution?

This is not yet solved! You haven't made any changes to address this suggestion:

> So wouldn't it make sense to also check the exit code and output so that the test case also covers the actual execution?

Checking the actual execution via the output is totally enough.
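The prefix check suggested in this thread could look roughly like this. `IsPrefixOf` is a hypothetical helper and not part of the test file; it only shows the comparison that would replace the `Contains()` checks.

```cpp
#include <string>

// True if `candidate` equals the first candidate.size() characters of `full`.
// A truncated argument should satisfy this against the original input value.
bool IsPrefixOf(const std::string& candidate, const std::string& full)
{
	return candidate.size() <= full.size()
		&& full.compare(0, candidate.size(), candidate) == 0;
}
```

In a Boost.Test case this would be wrapped as, for example, `BOOST_CHECK(IsPrefixOf(actualArg, inputValue))`, where both variable names are placeholders.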

julianbrost · 2023-12-18T10:24:51Z

lib/methods/pluginnotificationtask.cpp

+
+// Make e.g. the $host.output$ itself even 10% shorter to leave enough room
+// for e.g. --host-output= as in --host-output=$host.output$
+const static auto l_MaxOutLen = MAX_ARG_STRLEN * 9u / 10u;


Not on these particular numbers. My out of the blue suggestion was 64K which then sounded quite reasonable as MAX_ARG_STRLEN turns out to be 128K on most platforms (PAGE_SIZE is 4K) so that would have been half of that leaving more than enough room for any --output= prefixes or similar. This is just trying to push it closer to the limit.

FYI: you could do MAX_ARG_STRLEN - MAX_ARG_STRLEN / 10u as this doesn't have intermediate values that could overflow if this was raised to something like "basically unlimited" in the future (like MAX_ARG_STRINGS which currently is 0x7FFFFFFF).
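The overflow remark can be checked with a small sketch. It assumes a 32-bit unsigned limit purely for illustration (the actual type of MAX_ARG_STRLEN is not stated in this thread), and both function names are hypothetical.

```cpp
#include <cstdint>

// n * 9 is computed first, so it wraps around for n > UINT32_MAX / 9.
constexpr std::uint32_t NinetyPercentNaive(std::uint32_t n)
{
	return n * 9u / 10u;
}

// n / 10 never exceeds n, so no intermediate value can overflow.
constexpr std::uint32_t NinetyPercentSafe(std::uint32_t n)
{
	return n - n / 10u;
}
```

For a 128K limit the two forms differ by at most one due to rounding; for a value like 0x7FFFFFFF only the second form stays near 90% of the input.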

Al2Klimov added the consider backporting label Oct 25, 2023

Al2Klimov added this to the 2.15.0 milestone Oct 25, 2023

Al2Klimov requested a review from julianbrost October 25, 2023 11:18

cla-bot bot added the cla/signed label Oct 25, 2023

icinga-probot bot added the area/notifications and bug labels Oct 25, 2023

julianbrost removed their request for review October 25, 2023 12:57

Al2Klimov requested a review from lippserd October 25, 2023 13:00

Al2Klimov requested a review from julianbrost November 6, 2023 17:31

julianbrost removed their request for review November 7, 2023 08:17

Al2Klimov requested a review from julianbrost November 7, 2023 10:50

Al2Klimov force-pushed the argument-list-too-long-9340 branch from e63008f to 91cd8e1 Compare November 9, 2023 13:06

Al2Klimov removed the request for review from julianbrost November 9, 2023 13:06

Al2Klimov self-assigned this Nov 9, 2023

Al2Klimov force-pushed the argument-list-too-long-9340 branch 2 times, most recently from de91a35 to eb05644 Compare November 9, 2023 13:52

Al2Klimov removed their assignment Nov 9, 2023

Al2Klimov requested review from julianbrost and removed request for lippserd November 9, 2023 14:21

julianbrost requested changes Nov 21, 2023

Al2Klimov force-pushed the argument-list-too-long-9340 branch from eb05644 to 7119051 Compare November 22, 2023 15:59

Al2Klimov requested a review from yhabteab December 11, 2023 16:47

Al2Klimov self-assigned this Dec 12, 2023

Al2Klimov removed request for julianbrost and yhabteab December 12, 2023 10:44

Al2Klimov force-pushed the argument-list-too-long-9340 branch from 117f4ec to 33ff948 Compare December 12, 2023 14:52

Al2Klimov changed the title ~~Process: if exec(3) fails with "Argument list too long", truncate the input~~ PluginNotificationTask::ScriptFunc(): on Linux truncate output and comment Dec 12, 2023

Al2Klimov removed their assignment Dec 12, 2023

Al2Klimov requested review from julianbrost and yhabteab December 12, 2023 14:53

yhabteab reviewed Dec 13, 2023

Al2Klimov force-pushed the argument-list-too-long-9340 branch from 33ff948 to c589d49 Compare December 13, 2023 09:55

Al2Klimov requested a review from yhabteab December 13, 2023 09:55

yhabteab reviewed Dec 13, 2023

Al2Klimov requested a review from yhabteab December 13, 2023 14:12

yhabteab reviewed Dec 13, 2023

julianbrost requested changes Dec 18, 2023

Al2Klimov force-pushed the argument-list-too-long-9340 branch from c589d49 to 7fa4035 Compare December 18, 2023 11:26

Al2Klimov requested review from julianbrost and yhabteab December 18, 2023 11:27

PluginNotificationTask::ScriptFunc(): on Linux truncate output and co…

175153c

…mment not to run into an exec(3) error E2BIG due to a too long argument. This sends a notification with truncated output instead of not sending.

Al2Klimov force-pushed the argument-list-too-long-9340 branch from 7fa4035 to 175153c Compare December 19, 2023 11:21

julianbrost approved these changes Dec 19, 2023

yhabteab approved these changes Dec 19, 2023

Al2Klimov merged commit 96cfc4a into master Dec 19, 2023
25 checks passed

Al2Klimov deleted the argument-list-too-long-9340 branch December 19, 2023 13:36

Al2Klimov mentioned this pull request Dec 20, 2023

Truncate too big notification command lines, fix GelfWriter deadlock and return 503 in /v1/console/* during reload #9947

Merged

julianbrost mentioned this pull request Jan 3, 2024

icinga2 may be removed from Debian armel, mips64el, ppc64el, and riscv64 #9954

Open

Al2Klimov added the backported label and removed the consider backporting label May 14, 2024

oxzi mentioned this pull request Aug 15, 2024

When using a very large argument key icingadb will see a fatal error and quit Icinga/icingadb#791

Closed
