Update the rolling upgrade procedure, and add automation to it. #1463

arooshap · 2024-04-15T13:09:52Z

Our main focus is to make the procedure descriptive enough that the migration does not cause any major outages. We should also add scripts to automate more of it, e.g. deployments based on namespaces, automatic configuration of fluentd, etc.

I am adding all the points that we need to focus on/include in the documentation, so that we don't miss anything.

Add more endpoint checks for the services. Some new ones that I have discovered are for das-server, dbs, and rucio monitor.
Include the nginx settings in the rolling upgrade document.
Create a separate directory for storing the secrets of each cluster. The .pem files can be encrypted (the procedure already followed for the DBS cluster).
Improve the procedure for stress testing the cluster.
Remove IT services that are not being used. One particular example is the fluentd service that was causing major issues with the nodes.
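
As a starting point for the endpoint checks mentioned above, here is a minimal Python sketch. It is only an illustration: the base URL and the service paths are hypothetical placeholders, not the real routes for das-server, dbs, or rucio monitor, and it ignores authentication (client certificates, tokens), which the real services may require. The `fetch` parameter is there so the check logic can be exercised without touching the network.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET on url, or 0 if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout, and similar.
        return 0


def check_endpoints(
    base_url: str,
    paths: Dict[str, str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True if its endpoint returned a 2xx/3xx status."""
    results = {}
    for name, path in paths.items():
        status = fetch(base_url.rstrip("/") + path)
        results[name] = 200 <= status < 400
    return results


# Hypothetical paths, for illustration only; replace with the real routes.
EXAMPLE_PATHS = {
    "das-server": "/das",
    "dbs": "/dbs",
    "rucio-monitor": "/rucio-monitor",
}
```

Something like `check_endpoints("https://cluster.example.org", EXAMPLE_PATHS)` could then be run after each step of the upgrade, and the procedure halted if any value in the result is `False`.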

I will add more points to this.

arooshap self-assigned this Apr 15, 2024
