Upgrading the controller nodes

Short version of https://etherpad.nue.suse.com/p/cloud-upgrade-6-to-7

What's the state before going into upgrade?

All non-essential services on all nodes are already stopped
Remaining are drbd, keystone, neutron

Select the first node to upgrade

Pick the DRBD slave
Let's assume we have 2 nodes, first node to upgrade (= DRBD slave) is node1, remaining one is node2

On node1:

3.1. Migrate all l3 routers off the l3-agent

3.2. Shut down pacemaker

3.3. Upgrade the packages to Cloud7 and SP2 (zypper dup)

3.4. Reboot
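Steps 3.2 to 3.4 could be scripted roughly as follows. This is only a sketch: it assumes the repositories on node1 already point at Cloud7 and SP2, that pacemaker is managed as the systemd unit `pacemaker`, and it uses a hypothetical `run` helper that only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 3.2-3.4 on the node being upgraded. The chain stops at the first
# failing command, so the node is not rebooted after a failed upgrade.
upgrade_node() {
    run systemctl stop pacemaker &&       # 3.2: shut down pacemaker
    run zypper --non-interactive dup &&   # 3.3: upgrade packages (zypper dup)
    run systemctl reboot                  # 3.4: reboot
}
```

Calling `upgrade_node` without `DRY_RUN=0` prints the three commands, which is a cheap way to review the sequence before running it for real.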

On node2, after node1 is upgraded:
- Stop and delete pacemaker resources, except for drbd and vip:
- neutron-agents: can be deleted as well (although they will be started there)
- Done in ~~crowbar/crowbar-core#716~~
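
The resource cleanup on node2 can be pictured as a filter plus a loop. The sketch below is not the code from the PR above. It assumes that DRBD and VIP resources can be recognised by `drbd` or `vip` in their id, which is a guess about the naming scheme, and the `run` helper is hypothetical and only prints commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0. stdin is detached so that executed
# commands cannot swallow the resource list being read by the loop.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@" </dev/null
    fi
}

# Read resource ids from stdin (one per line) and print those that should be
# removed. ASSUMPTION: ids containing "drbd" or "vip" are the ones to keep.
resources_to_delete() {
    grep -v -E 'drbd|vip'
}

# Stop and delete every resource that passes the filter.
cleanup_resources() {
    resources_to_delete | while read -r rsc; do
        run crm resource stop "$rsc"
        run crm configure delete "$rsc"
    done
}
```

The resource ids are fed in on stdin, for example from a reviewed list in a file. In a real run the stop is asynchronous, so a script would likely have to wait for each resource to stop before deleting it.
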
Upgrade-related pacemaker location constraint

5.1. Create the pre-upgrade role (technically, it is a pacemaker node attribute) and assign it to all controller nodes that are not upgraded yet (node2)

We could probably do this already while still on Cloud6
Or directly via crm node attribute ...
Done incorrectly in ~~crowbar/crowbar-core#702~~
Fixed in crowbar/crowbar-core#749 (pending)
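
Setting the attribute directly via `crm node attribute` might look like the sketch below. The attribute name `pre-upgrade` is taken from these notes, while the value `true` and the `run` helper (which only prints commands unless `DRY_RUN=0` is set) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

PRE_UPGRADE_ATTR="pre-upgrade"

# Mark every node given as an argument as not upgraded yet.
# ASSUMPTION: the attribute value is the string "true".
mark_pre_upgrade() {
    for node in "$@"; do
        run crm node attribute "$node" set "$PRE_UPGRADE_ATTR" true
    done
}

# Drop the attribute from a node once it has been upgraded.
clear_pre_upgrade() {
    run crm node attribute "$1" delete "$PRE_UPGRADE_ATTR"
}
```

In the two-node example, `mark_pre_upgrade node2` would be used at this point, and `clear_pre_upgrade` on a node after its upgrade.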

5.2. Create a location constraint that does not allow starting a service on a node that has the pre-upgrade role

Done in ~~crowbar/crowbar-openstack#562~~

5.3. Do not apply this location constraint to neutron-agents, ms-drbd-postgresql-controller and ms-drbd-rabbitmq-controller (see postgresql/recipes/ha_storage.rb and rabbitmq/recipes/ha.rb)

Done in ~~crowbar/crowbar-openstack#567~~
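
Outside of chef, the constraint from 5.2 together with the exclusions from 5.3 could be approximated with crmsh as sketched below. This is not what the PRs above implement. The constraint id `l-<resource>-pre-upgrade`, the attribute value `true`, and the `run` helper (which only prints commands unless `DRY_RUN=0` is set) are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Resources that must not get the constraint (step 5.3).
EXCLUDED="neutron-agents ms-drbd-postgresql-controller ms-drbd-rabbitmq-controller"

is_excluded() {
    case " $EXCLUDED " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# For every resource given as an argument, forbid it (score -inf) on nodes
# whose "pre-upgrade" attribute equals "true".
add_pre_upgrade_constraints() {
    for rsc in "$@"; do
        if is_excluded "$rsc"; then
            continue
        fi
        run crm configure location "l-${rsc}-pre-upgrade" "$rsc" \
            rule -inf: pre-upgrade eq true
    done
}
```

For example, `add_pre_upgrade_constraints keystone neutron-agents` prints a single `crm configure location` command for `keystone` and skips `neutron-agents`.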

5.4. TODO Figure out how to handle neutron-agents correctly

5.4.1. We need to allow starting `neutron-agents` on **node2** so they have access to routers that are present there.

5.4.2. We need to migrate routers again from non-upgraded node to **node1** once all services are running with new configuration at **node1**

This could be achieved by not adding any constraint to the `neutron-agents` resource. The first chef-client run on **node1** would then start `neutron-agents` on both nodes. But:

5.4.3. Once we upgrade **node2**, we can't allow starting `neutron-agents` there before the configuration is updated, i.e. before the chef-client run on **node2** is finished. So it looks like for this time, we need a constraint that allows `neutron-agents` to run on upgraded nodes only. **FIXME** this paragraph seems to be obsolete, see Step 13 below.
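
The router migration mentioned in 5.4.2 is still an open question in these notes. One possible shape, using the `neutron` command line client, is sketched below. The agent ids are placeholders, OpenStack credentials are assumed to be present in the environment, and the `run` helper is hypothetical and only prints commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0. stdin is detached so that executed
# commands cannot swallow the router list being read by the loop.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@" </dev/null
    fi
}

# Move routers from one L3 agent to another.
# $1: id of the source L3 agent, $2: id of the target L3 agent.
# Router ids are read from stdin, one per line.
migrate_routers() {
    src_agent="$1"
    dst_agent="$2"
    while read -r router; do
        [ -n "$router" ] || continue
        run neutron l3-agent-router-remove "$src_agent" "$router"
        run neutron l3-agent-router-add "$dst_agent" "$router"
    done
}
```

The router ids could come from `neutron router-list-on-l3-agent <agent>`. Reducing that output to bare ids depends on the client version and is left out here.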

Remove "pre-upgrade" attribute from node1

So that the location constraint no longer applies to the upgraded node
When could we do it and from where? Probably from node2, which still has pacemaker running
- Done in ~~crowbar/crowbar-core#725~~

Cluster founder settings

7.1. Explicitly mark node1 as the cluster founder

* Also remove the founder attribute from node2 if it is there
* This is needed because pacemaker starts the services on the founder nodes
* Done in ~~crowbar/crowbar-ha#152~~ and ~~crowbar/crowbar-core#725~~

7.2. Set `node['drbd']['rsc']['postgresql']['master']` to true for node1 AND false for node2, otherwise the drbd recipe will notice the inconsistency and complain.

* Done in the same PRs as the cluster founder change

DRBD upgrade

Recreate the DRBD metadata on node1 using `drbdadm create-md all`; you can pass `-- --force` to skip the interactive "yes" confirmation
Done in ~~crowbar/crowbar-core#725~~
This has to be done explicitly from a script. We cannot do it from a chef resource, because the DRBD service is restarted right after each resource is upgraded (see https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/drbd/providers/resource.rb#L66), which leaves it in an inconsistent state where the postgresql resource metadata is up to date while the rabbitmq metadata is still old.
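
A script-side version of the metadata recreation could look like the sketch below. It assumes DRBD is down on node1 at this point (pacemaker was stopped before the package upgrade). The optional repeat count and the `run` helper, which only prints commands unless `DRY_RUN=0` is set, are additions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Recreate the metadata of all DRBD resources without prompting.
# $1: how many times to run create-md (default 1).
recreate_drbd_metadata() {
    runs="${1:-1}"
    i=0
    while [ "$i" -lt "$runs" ]; do
        run drbdadm -- --force create-md all || return 1
        i=$((i + 1))
    done
}
```

Handling all resources in one pass, before DRBD is started again, avoids the mixed state described above where one resource already has new metadata and the other does not.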

On node1, start pacemaker

so that pacemaker starts DRBD and synchronizes it with node2.
FIXME currently this only works after a second run of create-md, see https://bugzilla.suse.com/show_bug.cgi?id=1006105
Wait until DRBD is correctly synchronized (done as part of ~~crowbar/crowbar-core#725~~, with the workaround for the bug above)
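
Waiting for the synchronization could be done with a simple polling loop. The sketch below is not the code from the PR above. It assumes that `drbdadm dstate all` prints one `local/peer` disk state per resource, for example `UpToDate/UpToDate`, and the function names, polling interval and timeout are illustrative.

```shell
#!/bin/sh
# Read disk states from stdin (one "local/peer" pair per line) and succeed
# only if there is at least one line and every line is UpToDate/UpToDate.
all_up_to_date() {
    seen=0
    while read -r state; do
        [ -n "$state" ] || continue
        seen=1
        [ "$state" = "UpToDate/UpToDate" ] || return 1
    done
    [ "$seen" -eq 1 ]
}

# Poll DRBD every 10 seconds until everything is in sync.
# $1: timeout in seconds (default 3600). Returns 1 on timeout.
# ASSUMPTION: "drbdadm dstate all" prints one state pair per resource.
wait_for_drbd_sync() {
    timeout="${1:-3600}"
    waited=0
    until drbdadm dstate all | all_up_to_date; do
        [ "$waited" -lt "$timeout" ] || return 1
        sleep 10
        waited=$((waited + 10))
    done
}
```

Keeping the state check separate from the polling loop makes it possible to test the parsing against canned output without a running DRBD.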

On node1, start crowbar-join, which runs chef-client and moves the node to the ready state

waiting for sync marks is skipped (see ~~crowbar/crowbar-ha#146~~)
when creating new pacemaker resources, the services are started on upgraded nodes only (see point 5 for how to achieve that)

Manually promote DRBD on node1 to master

This should not be needed. Once we cleanly shut down node2, promotion should happen automatically.

Wait until node1 is ready. Proceed with node2
Execute the pre-upgrade script on node2 so that

- neutron routers are migrated off this node
- FIXME How should we do it? See https://etherpad.nue.suse.com/p/cloud-upgrade-6-to-7 lines 284-287
- pacemaker is stopped

Upgrade and reboot node2
