Track effect of an object on dependent children #10158

Open
nilmerg opened this issue Sep 17, 2024 · 12 comments · May be fixed by #10290
Labels
enhancement New feature or request

Comments

@nilmerg
Member

nilmerg commented Sep 17, 2024

Is your feature request related to a problem? Please describe.

In Icinga DB Web we'd like to show an indicator in lists showing the number of potentially affected children of a particular host/service. All affected children, i.e. also grandchildren.

Describe the solution you'd like

Icinga must calculate this for each parent during startup. Replace startup with whatever you like, my expectation is just that Icinga does not need to calculate this on every state change.

Though, what I'd like Icinga to calculate on every state change, is whether a parent is responsible for any directly dependent child. (Causing it to be unreachable)

The result should be that in the database the number is available in e.g. host.affected_children (uint) and the responsibility flag in host_state.affects_children (bool enum).

@julianbrost
Contributor

Though, what I'd like Icinga to calculate on every state change, is whether a parent is responsible for any now unreachable child, wherever in the hierarchy. (Somewhat similar to #10143)

The result should be that in the database the number is available in [...] host_state.affects_children (bool enum).

I'm not 100% sure what this is asking for. Is this supposed to say whether any of the potentially affected children is actually in a problem state?

@raviks789

raviks789 commented Sep 23, 2024

This column says that there may be at least one child that would be in a problem state if there is a problem with the parent.

@julianbrost
Contributor

I still don't get it. A configured dependency does not imply that the child must be in a problem state if the parent is in a problem state. It just says that if both are failed, there's a good chance that one caused the other. Can you provide an example of a dependency structure and how you'd expect that bool to be set?

@yhabteab
Member

yhabteab commented Sep 23, 2024

The result should be that in the database the number is available in e.g. host.affected_children (uint) and host_state.affects_children (bool enum).

I actually do understand that: the host.affected_children column just shows the number of dependent children of that host, and host_state.affects_children is just the boolean representation of the expression host.affected_children != 0. A host problem state may not directly affect its children when the host is part of a redundancy group; in this case host.affected_children would simply be 0 and host_state.affects_children would be set to false accordingly.

@raviks789

raviks789 commented Sep 23, 2024

This is just a simple example.

Suppose we have a parent, say Service-A (I will assume the parent is a service here), with two children (Child-1, Child-2), and its dependency is configured to fail if Service-A is not in OK state. One of the children also belongs to another dependency: Child-2 additionally depends on a parent Service-B, and that dependency is configured to fail if Service-B is in neither OK nor Warning state.

Now, if Service-B is in OK or Warning state and Service-A is OK, then both Child-1 and Child-2 are reachable and service_state.affects_children is false for both parents. If Service-A is not in OK state (with Service-B still in OK or Warning state), then both children are unreachable and service_state.affects_children is true for Service-A but false for Service-B. If Service-A is in OK state and Service-B is in neither OK nor Warning state, then only Child-2 is unreachable and service_state.affects_children is true for Service-B only. If Service-A is not in OK state and Service-B is in neither OK nor Warning state, then service_state.affects_children is true for both and both children are unreachable again.

Example evaluation of affects_children

| Service-A | Service-B | Service-A.affects_children | Service-B.affects_children | Child-1 | Child-2 |
|---|---|---|---|---|---|
| OK | Warning | false | false | reachable | reachable |
| Warning | OK | true | false | unreachable | unreachable |
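The example above can be sketched as a small, self-contained model. This is only an illustration under stated assumptions: the `State` enum, the `Dependency` struct and the free functions are hypothetical stand-ins, not Icinga 2's real classes, and a dependency is treated as failed simply when the parent's state is outside its configured state filter (redundancy groups and time periods are ignored).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT Icinga 2's real types.
enum class State { OK, Warning, Critical, Unknown };

struct Dependency {
    std::string parent;
    std::string child;
    std::set<State> okStates; // parent states in which the child stays reachable
};

// Assumption: a dependency is available iff the parent's state passes its filter.
bool IsAvailable(const Dependency& dep, State parentState) {
    return dep.okStates.count(parentState) > 0;
}

// True if at least one dependency that has `parent` as its parent has failed.
// Only direct dependencies are inspected; child states are never consulted.
bool AffectsChildren(const std::string& parent, State parentState,
                     const std::vector<Dependency>& deps) {
    for (const auto& dep : deps) {
        if (dep.parent == parent && !IsAvailable(dep, parentState)) {
            return true;
        }
    }
    return false;
}

// The dependency setup from the example: Service-A -> Child-1, Child-2
// (fails unless OK) and Service-B -> Child-2 (fails unless OK or Warning).
std::vector<Dependency> ExampleDependencies() {
    return {
        {"Service-A", "Child-1", {State::OK}},
        {"Service-A", "Child-2", {State::OK}},
        {"Service-B", "Child-2", {State::OK, State::Warning}},
    };
}

// Replays both table rows with assertions.
void CheckTableRows() {
    const auto deps = ExampleDependencies();
    // Row 1: Service-A OK, Service-B Warning.
    assert(!AffectsChildren("Service-A", State::OK, deps));
    assert(!AffectsChildren("Service-B", State::Warning, deps));
    // Row 2: Service-A Warning, Service-B OK.
    assert(AffectsChildren("Service-A", State::Warning, deps));
    assert(!AffectsChildren("Service-B", State::OK, deps));
}
```

Calling `CheckTableRows()` replays the two table rows. The flag is derived per parent from its own state and its own dependencies only, which matches the point that child states play no role.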

@julianbrost
Contributor

Now, if Service-B is in OK or Warning state and Service-A is OK then both Child-1 and Child-2 are unreachable

I guess that should say "reachable" instead?

You never mentioned any state of the children. Does this imply that affects_children does not have to take this into account?

Let me have another shot at trying to rephrase this so that we can see if we think of the same now: x.affects_children says whether there exists any child that has a path of failed dependencies to x. However, I have a hard time describing what that would tell in the end, like "fixing this checkable will make other checkables reachable again" (not necessarily, there could be a second failed dependency) or "fixing this checkable is required to make other checkables reachable" (not necessarily, if a redundancy group is involved, fixing another checkable could make the children reachable as well).

Icinga/icingadb-web#1058 doesn't really help me understand either what information this bool should convey to the user in the end.

@nilmerg
Member Author

nilmerg commented Sep 23, 2024

Forget child states. (Don't confuse this with reachability!) They're not relevant at all when talking about reachability.

affects_children has one single purpose: Show the total number (affected_children) only if the parent is actually responsible (i.e. one of its directly related dependencies results in the respective child being unreachable)

stateDiagram-v2
    state if_state <<choice>>
    [*] --> affects_children
    affects_children --> if_state
    if_state --> HideTotal: if n
    if_state --> ShowTotal: if y
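As a rough sketch of the decision in the diagram, assuming a hypothetical `ParentRow` struct that carries the two proposed columns (the struct and the function name are illustrative only, not part of Icinga DB Web or Icinga 2):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical view of one list row; field names mirror the proposed columns.
struct ParentRow {
    std::uint32_t affectedChildren; // host.affected_children
    bool affectsChildren;           // host_state.affects_children
};

// Returns the total to display, or std::nullopt if the total should be hidden.
std::optional<std::uint32_t> TotalToShow(const ParentRow& row) {
    if (!row.affectsChildren) {
        return std::nullopt; // HideTotal
    }
    return row.affectedChildren; // ShowTotal
}

// Exercises both branches of the diagram.
void CheckTotalToShow() {
    assert(!TotalToShow({12, false}).has_value());
    assert(TotalToShow({12, true}).value() == 12);
}
```

The count itself stays static between configuration changes; only the boolean needs to change with state.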

I don't know what is difficult to understand here. If you need further explanations, we should discuss this in person tomorrow.

@julianbrost
Contributor

Thank you very much for this extraordinarily helpful flowchart. Unfortunately, it doesn't answer the question why you want to hide the number sometimes. There must be something that makes the boolean different from just checking the number for zero. Is it supposed to be false if all children are still reachable due to another OK parent in a redundancy group? Is it supposed to be false if the parent is in a WARN state and all dependencies with it as a parent have states = [ OK, Warning ] set?

Forget child states. (Don't confuse this with reachability!) They're not relevant at all when talking about reachability.

So I guess that's a yes for the question I asked:

You never mentioned any state of the children. Does this imply that affects_children does not have to take this into account?

Does Yonas' comment describe what you're asking for? Otherwise, I have the feeling that the specification might be a bit unclear if multiple persons fail to understand it.

Oh and your comment just added a new reason for confusion:

is responsible for any now unreachable child, wherever in the hierarchy

i.e. one of its directly related dependencies results in the respective child being unreachable

"wherever in the hierarchy" and "direct related dependencies" don't really fit together.

@nilmerg
Member Author

nilmerg commented Sep 23, 2024

Thank you very much for this extraordinarily helpful flowchart.

Sorry, my impression was that I communicate with Icinga professionals.

--

I don't think I need to outline the exact behavior of Icinga dependencies here. Ravi included one of them in his example, for clarification. Redundancy groups are of course another one, so are time periods. (surprise!)

The bool affects_children just indicates that a dependency, where the respective host/service is the parent, decides that its child is unreachable.

Can we stop nitpicking now, please?

@nilmerg
Member Author

nilmerg commented Oct 17, 2024

We just noticed that it might be necessary to have the count of affected children also for redundancy groups. But I need to discuss this with Florian first, this is just a note (to myself) to not forget about this.

@yhabteab
Member

After reading this countless times and with the extra information from #10190 (comment), I think I get it now.

In Icinga DB Web we'd like to show an indicator in lists showing the number of potentially affected children of a particular host/service. All affected children, i.e. also grandchildren.

That should be easy and we don't have to do much on the Icinga 2 side. It only needs a one-line patch in the Icinga DB component code.

attributes->Set("affected_children", checkable->GetAllChildren().size());
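To illustrate what "all children, including grandchildren" means for the count, here is a minimal standalone sketch. It does not show how `GetAllChildren()` is implemented in Icinga 2; the `ChildMap` alias, the function name and the sample object names are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical adjacency list: parent name -> names of its direct children.
using ChildMap = std::map<std::string, std::vector<std::string>>;

// Counts every distinct descendant of `root` with a breadth-first walk.
// A child reachable via several parents is counted once.
std::size_t CountAllChildren(const ChildMap& children, const std::string& root) {
    std::set<std::string> seen;
    std::queue<std::string> pending;
    pending.push(root);

    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();

        const auto it = children.find(current);
        if (it == children.end()) {
            continue; // leaf: no dependent children
        }

        for (const auto& child : it->second) {
            // Never count the root itself, and visit each object only once.
            if (child != root && seen.insert(child).second) {
                pending.push(child);
            }
        }
    }

    return seen.size();
}

// Sample hierarchy: "web" depends on both switches, "db" only on switch-2.
ChildMap ExampleHierarchy() {
    return {
        {"router", {"switch-1", "switch-2"}},
        {"switch-1", {"web"}},
        {"switch-2", {"web", "db"}},
    };
}

void CheckCountAllChildren() {
    const ChildMap children = ExampleHierarchy();
    assert(CountAllChildren(children, "router") == 4);
    assert(CountAllChildren(children, "switch-2") == 2);
    assert(CountAllChildren(children, "db") == 0);
}
```

Because the result depends only on the configured dependency graph, it can be computed once per configuration load rather than on every state change, which is what the request asks for.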

Though, what I'd like Icinga to calculate on every state change, is whether a parent is responsible for any now unreachable child, wherever in the hierarchy. (Somewhat similar to #10143)

If I have understood this correctly, that should also be easy to implement. Icinga 2 already tracks the child dependencies as well, one just needs to implement a new method that evaluates the necessary conditions. And no, it's not similar to #10143, we only need to check the direct child dependencies and not all child ones recursively. So, something like this should do the job for this!

bool Checkable::AffectsChildren() const
{
	auto cr(GetLastCheckResult());
	if (!cr || IsStateOK(cr->GetState()) || !IsReachable()) {
		// If there is no check result, the state is OK, or the Checkable is not reachable, we can't
		// safely determine whether the Checkable affects its child dependencies.
		return false;
	}

	for (auto& dep: GetReverseDependencies()) {
		if (!dep->IsAvailable()) {
			// If one of the child dependencies is not available, then it's definitely due to the
			// current Checkable state, so we don't need to verify the remaining ones.
			return true;
		}
	}

	return false;
}

@nilmerg
Member Author

nilmerg commented Dec 4, 2024

"wherever in the hierarchy" and "direct related dependencies" don't really fit together.

Corrected the opening post.

And no, it's not similar to #10143, we only need to check the direct child dependencies and not all child ones recursively.

Indeed. Sorry for the confusion.

@yhabteab yhabteab linked a pull request Jan 13, 2025 that will close this issue