# Upgrading the controller nodes

Short version of https://etherpad.nue.suse.com/p/cloud-upgrade-6-to-7

1. What's the state before going into the upgrade?
   * All non-essential services on all nodes are already stopped
   * The remaining ones are drbd, keystone and neutron
2. Select the first node to upgrade
   * Pick the DRBD slave
   * Let's assume we have 2 nodes; the first node to upgrade (= the DRBD slave) is node1, the remaining one is node2
3. On node1:

3.1. Migrate all l3 routers off the l3-agent
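The exact commands are not spelled out here; with the neutron CLI shipped in Cloud 6/7 this could look roughly as follows (agent and router IDs are placeholders):

```sh
# Source the OpenStack admin credentials first (e.g. an .openrc file)

# Find the l3 agents and the routers currently hosted on node1's agent
neutron agent-list --agent_type='L3 agent'
neutron router-list-on-l3-agent <node1-l3-agent-id>

# Move each router from node1's agent to node2's agent
neutron l3-agent-router-remove <node1-l3-agent-id> <router-id>
neutron l3-agent-router-add <node2-l3-agent-id> <router-id>
```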

3.2. Shut down pacemaker

3.3. Upgrade the packages to Cloud7 and SP2 (zypper dup)

3.4. Reboot
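In shell terms, steps 3.2 to 3.4 on node1 amount to something like this (assuming the SP2/Cloud7 repositories are already set up):

```sh
# On node1: stop the cluster stack, upgrade to SLES 12 SP2 / Cloud 7, reboot
systemctl stop pacemaker
zypper dup        # distribution upgrade to the new repositories
reboot
```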

4. On node2, after node1 is upgraded:

   * Stop and delete the pacemaker resources, except for drbd and vip (see the sketch below)
   * neutron-agents: can be deleted as well (they will be started there again, though)
   * Done in crowbar/crowbar-core#716
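A rough manual equivalent with crmsh, for reference (the actual cleanup is scripted by the PR above; `<resource-id>` is a placeholder for each non-drbd, non-vip resource):

```sh
# On node2: stop and remove every resource except the drbd master/slave resources and the vip
crm resource stop <resource-id>
crm configure delete <resource-id>

# Verify that only ms-drbd-postgresql-controller, ms-drbd-rabbitmq-controller
# and the vip are still configured/running
crm_mon -1
```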

5. Upgrade-related pacemaker location constraint

5.1. Create the pre-upgrade role (technically, it's a pacemaker node attribute) and assign it to all controller nodes that are not upgraded yet (node2)

5.2. Create a location constraint that does not allow starting services on a node that has the pre-upgrade role

5.3. Do not apply this location constraint to neutron-agents, ms-drbd-postgresql-controller and ms-drbd-rabbitmq-controller (see postgresql/recipes/ha_storage.rb and rabbitmq/recipes/ha.rb)
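A minimal crmsh sketch of 5.1 and 5.2, assuming the node attribute is literally called `pre-upgrade` and using a placeholder resource name:

```sh
# 5.1: mark the not-yet-upgraded node (node2) with the pre-upgrade attribute
crm node attribute node2 set pre-upgrade true

# 5.2: forbid running a resource on any node that carries the pre-upgrade attribute
# (repeat per resource, skipping the ones listed in 5.3)
crm configure location l-<resource-id>-pre-upgrade <resource-id> \
    rule -inf: pre-upgrade eq true
```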

5.4. TODO Figure out how to handle neutron-agents correctly

5.4.1. We need to allow starting `neutron-agents` on **node2** so they have access to routers that are present there.

5.4.2. We need to migrate the routers again from the non-upgraded node back to **node1** once all services are running with the new configuration on **node1**

This could be achieved by not adding any constraint to the `neutron-agents` resource. The first chef-client run on **node1** would then start `neutron-agents` on both nodes. But:

5.4.3. Once we upgrade **node2**, we can't allow starting neutron-agents there before the configuration is updated, i.e. before the chef-client run on **node2** is finished. So it looks like, for this period, we need a constraint that allows `neutron-agents` to run on upgraded nodes only. **FIXME** this paragraph seems to be obsolete, see Step 13 below.
6. Remove the "pre-upgrade" attribute from node1

   * So the location constraint does not apply to the upgraded node
   * When could we do it and from where? Probably from node2, which still has pacemaker running
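Assuming the same `pre-upgrade` attribute name as above, this would be roughly:

```sh
# Run from node2 (pacemaker still up there): drop the attribute from node1
# so the -inf location rules no longer match it
crm node attribute node1 delete pre-upgrade
```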

7. Cluster founder settings

7.1. Explicitly mark node1 as the cluster founder

   * Also remove the founder attribute from node2 if it is there
   * This is needed because pacemaker starts the services on the founder node
   * Done in crowbar/crowbar-ha#152 and crowbar/crowbar-core#725

7.2. Set `node['drbd']['rsc']['postgresql']['master']` to true for node1 AND to false for node2, otherwise the drbd recipe will notice the inconsistency and complain.

   * Done in the same PRs as the cluster founder change
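The PRs above do this via chef attributes; a manual sketch with `knife exec` could look roughly like the following (the `pacemaker.founder` attribute path is an assumption, the drbd path is the one quoted in 7.2, and node1/node2 stand for the actual Chef node names):

```sh
# Run on the Crowbar admin node
knife exec -E "
  nodes.find('name:node1') do |n|
    n.set['pacemaker']['founder'] = true                   # assumed attribute path
    n.set['drbd']['rsc']['postgresql']['master'] = true
    n.save
  end
  nodes.find('name:node2') do |n|
    n.set['pacemaker']['founder'] = false
    n.set['drbd']['rsc']['postgresql']['master'] = false
    n.save
  end
"
```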
8. DRBD upgrade
9. On node1, start pacemaker
10. On node1, start crowbar-join, which runs chef-client and moves the node to the ready state
    * Waiting for sync marks is skipped (see crowbar/crowbar-ha#146)
    * When creating new pacemaker resources, the services are started on the upgraded nodes only (see point 5 for how to achieve that)
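For steps 9 and 10, roughly (the crowbar_join service name is an assumption; on some releases it is a script instead):

```sh
# On node1: bring the cluster stack back and re-join the node via Crowbar
systemctl start pacemaker
systemctl start crowbar_join   # runs chef-client and moves the node back to "ready"
crm_mon -1                     # watch the resources come up on the upgraded node only
```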
11. Manually promote DRBD on node1 to master
    * This should not be needed. Once we cleanly shut down node2, the promotion should happen automatically.
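If manual promotion turns out to be needed after all, a sketch using the master/slave resource names from 5.3:

```sh
# Ask pacemaker to promote the DRBD master/slave resources on node1
crm resource promote ms-drbd-postgresql-controller
crm resource promote ms-drbd-rabbitmq-controller
crm_mon -1      # verify node1 is now the DRBD master
```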
12. Wait until node1 is ready. Proceed with node2.
13. Execute the pre-upgrade script on node2.
14. Upgrade and reboot node2.
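The node2 upgrade itself then mirrors the node1 steps, roughly:

```sh
# On node2: stop the cluster stack, upgrade to SLES 12 SP2 / Cloud 7, reboot,
# then re-join via crowbar-join as was done on node1
systemctl stop pacemaker
zypper dup
reboot
```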