Redundant restarts #278

Closed
widhalmt opened this issue Sep 28, 2023 · 1 comment · Fixed by #279

Comments

@widhalmt
Member

We have handlers restarting services and thus creating lag that can lead to timeouts in our checks.

I think we can reduce the number of restarts, especially for Elasticsearch, and speed up the checks considerably.

One issue I found so far is that in #137 we agreed on not restarting the cluster after the change, but we still notify the handler.

I'll look into the roles and try to remove any redundant restarts.

@widhalmt widhalmt added bug Something isn't working feature New feature or request labels Sep 28, 2023
@widhalmt widhalmt self-assigned this Sep 28, 2023
@widhalmt
Member Author

This might also be connected to #252.

widhalmt added a commit that referenced this issue Sep 28, 2023
We don't need to restart Elasticsearch after this task. Everything is set
in a similar task earlier. This one is only to change the start
behaviour to a safer one (not reinitializing the cluster). The change is
only needed during restarts, so whenever Elasticsearch is restarted, the
new version will be used.

fixes #278
github-merge-queue bot pushed a commit that referenced this issue Oct 16, 2023
Restarting Elasticsearch takes quite a while and may lead to connection
issues as well as sync issues. So keeping restarts to a minimum is
important. These changes will make sure that, even when the `Restart
Elasticsearch` handler is notified, it will only restart if
Elasticsearch was running before. If there's a fresh start (after
reconfiguration) we don't need to restart again.
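
A minimal sketch of how such a conditional handler could look, not the actual change from the PR. It assumes a systemd host where the unit is called `elasticsearch.service`; the variable name `es_was_running` and the file layout are hypothetical.

```yaml
# tasks/main.yml (hypothetical layout)
- name: Gather service facts before the role changes anything
  ansible.builtin.service_facts:

- name: Remember whether Elasticsearch was already running
  ansible.builtin.set_fact:
    # es_was_running is an illustrative name, not taken from the collection
    es_was_running: >-
      {{ ansible_facts.services.get('elasticsearch.service', {}).get('state') == 'running' }}

# handlers/main.yml (hypothetical layout)
- name: Restart Elasticsearch
  ansible.builtin.service:
    name: elasticsearch
    state: restarted
  # Skip the restart when the service was not running before this play,
  # because a fresh start already picks up the new configuration.
  when: es_was_running | bool
```

Recording the state before any configuration task runs is what makes the condition meaningful: by the time the handler fires, the service may already have been started for the first time.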

Same goes for Logstash and Kibana. Some restarts of these tools happen
fairly fast. But others (like after fresh installs or updates) will
trigger internal jobs that should not be interrupted by another restart.

Beats restart very fast and as far as I know there's not a big downside
to restarting them right after the first start so I didn't include them
in the change.

Additionally, this PR will make sure some tasks in `verify.yml` of the
full stack are only run when the service to be checked is actually
running on this node. This helps with spreading services over nodes to
save resources.
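
As an illustration only, a verification task can be limited to the nodes that host the service with a `when` condition on inventory group membership. The group name `kibana`, the port `5601`, and the timeout below are assumptions and may not match what `verify.yml` in this repository uses.

```yaml
# Hypothetical excerpt in the style of a Molecule verify.yml
- name: Check that Kibana is listening
  ansible.builtin.wait_for:
    host: localhost
    port: 5601        # assumed default Kibana port
    timeout: 120      # illustrative value
  # group_names lists the inventory groups of the current host, so the
  # check is skipped on nodes that do not run Kibana.
  when: "'kibana' in group_names"
```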

Since GitHub-hosted runners are quite low on resources, we can't run
every service on every node in a cluster setup anymore. So this PR will
make sure that only Elasticsearch runs everywhere and the others are
spread out.

Caches get cleared after every role during a Molecule test. This
helps with saving resources, too.

Because Elasticsearch still won't sync all shards when volumes are full,
the watermarks for Elasticsearch are set to extremely high values so that
the cluster can at least get into sync.
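
For orientation, these are the disk watermark settings Elasticsearch exposes. The percentages below are purely illustrative; the values actually used in the PR are not stated here.

```yaml
# elasticsearch.yml fragment; values are assumptions for illustration
cluster.routing.allocation.disk.watermark.low: 97%
cluster.routing.allocation.disk.watermark.high: 98%
cluster.routing.allocation.disk.watermark.flood_stage: 99%
```

Thresholds this high are only reasonable on short-lived test nodes, where getting shards allocated matters more than protecting the disk.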

fixes #278
fixes #141 
fixes #194