Making-of: docker-based MongoDB replica-set migration from single instance

https://github.com/xverges/mongo-replica-set

Initial situation

Docker-based, running MongoDB 3.4.4. We would be happy to move to a version with a later end of life (3.6 or 4.0 as of today).
The code uses PyMongo 3.5.1 (see the compatibility chart).
Starting point: two disconnected single nodes, with data that needs to be consolidated and migrated into a replica set. The mongodb instances have root credentials set up with MONGO_INITDB_ROOT_USERNAME and additional accounts created with a script in docker-entrypoint-initdb.d.
Simplest target configuration: primary, secondary and arbiter

Vagrant environment to test

Learned that Vagrant now has a docker provisioner and that there is a docker compose provisioner plugin.
So... vagrant plugin install vagrant-docker-compose
Created three boxes: first, second and arbitrer

Plan

Taking advantage of docker-entrypoint.sh to run mongo commands is not a valid approach: it only works on empty instances.
Steps
- backup: mongodump first and second
- verify that the backups work
- consolidate: mongorestore second's dump into first (alternatively, could mongoexport and mongoimport)
- stop second. Delete its data folder (Would not do that if I could have file system snapshot backups)
- start second
- run

A Tangent

I was looking for a simple way to set up and activate a virtualenv associated with the project, and went jumping from cool project to cool project: from autoenv to direnv, finally betting on Pipenv: Python Development Workflow for Humans. Thus, the project dependencies are tracked in Pipfile.

Doing. First (half) successful attempt.

Get our python dependencies and the environment variables defined in .env.

$ pipenv shell
(mongo-replica-set-qvtM3FSm)$

Start the first and second vagrant boxes and their docker-compose (vagrant destroy + vagrant up). This requires installing docker inside the guests and pulling the mongodb docker image, so it takes a while (about 15 minutes on my home network).

(mongo-replica-set-qvtM3FSm)$ ./scripts/01-start-standalone.sh

^{01-start-standalone.sh}

Reset the boxes. Trial and error required coming back to this step very often.

- Stop the docker containers.
- Delete the mongodb data folders.
- Recreate the docker containers (by re-provisioning the vagrant box). The mongodb accounts specified in a script in /docker-entrypoint-initdb.d are created. These scripts are only executed when the data folder is empty.
- Give the xv_mongo_rw account privileges to access collections in the local database.

(mongo-replica-set-qvtM3FSm)$ ./scripts/reset-standalone.sh all

^{reset-standalone.sh}
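
The last step of the reset could be expressed from Python roughly as follows. This is only a sketch: it assumes that the xv_mongo_rw account lives in the admin database and that the built-in readWrite role on local is enough, neither of which is confirmed here. The helper names are made up for the illustration.

```python
def grant_local_access_command(username, role="readWrite"):
    """Build a grantRolesToUser command document for the local database.

    The role name is an assumption; the project's script may grant
    something narrower or broader.
    """
    return {
        "grantRolesToUser": username,
        "roles": [{"role": role, "db": "local"}],
    }


def grant_local_access(client, username):
    """Run the command with root credentials.

    `client` is a pymongo.MongoClient authenticated as root. The account is
    assumed to have been created in the admin database.
    """
    command = grant_local_access_command(username)
    # Pass the command name first and the rest as keyword arguments, so the
    # key order of the dict never matters.
    return client.admin.command(
        "grantRolesToUser", command["grantRolesToUser"], roles=command["roles"]
    )
```

Calling grant_local_access(client, "xv_mongo_rw") on each instance would then mirror what the reset script is described as doing.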

Create dbs, collections and documents in the first and second instances

(mongo-replica-set-qvtM3FSm)$ ./scripts/02-feed-standalone.py
(mongo-replica-set-qvtM3FSm)$ ./scripts/read-standalone.py

^{02-feed-standalone.py} ^{read-standalone.py}

Consolidate all the info on the first.

mongodump + mongorestore to have everything in the first
delete the data folder of second to remove its contents

(mongo-replica-set-qvtM3FSm)$ ./scripts/backup.sh
(mongo-replica-set-qvtM3FSm)$ ./scripts/restore.sh second 192.168.100.10
(mongo-replica-set-qvtM3FSm)$ ./scripts/reset-standalone.sh second
(mongo-replica-set-qvtM3FSm)$ ./scripts/read-standalone.py

^{backup.sh} ^{restore.sh} ^{reset-standalone.sh} ^{read-standalone.py}
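
For reference, the dump-and-restore consolidation can be sketched in Python as below. It is not the content of backup.sh or restore.sh: the function names, the credentials, the dump directory and the address used for second are placeholders, and authenticating against the admin database is an assumption.

```python
import subprocess


def mongodump_cmd(host, port, user, password, out_dir):
    """Command line that dumps one instance into out_dir."""
    return [
        "mongodump",
        "--host", host,
        "--port", str(port),
        "--username", user,
        "--password", password,
        "--authenticationDatabase", "admin",  # assumption: root lives in admin
        "--out", out_dir,
    ]


def mongorestore_cmd(host, port, user, password, dump_dir):
    """Command line that loads a dump directory into the target instance."""
    return [
        "mongorestore",
        "--host", host,
        "--port", str(port),
        "--username", user,
        "--password", password,
        "--authenticationDatabase", "admin",
        dump_dir,
    ]


def consolidate(source_host, target_host, user, password, dump_dir, port=27017):
    """Dump the source instance, then restore that dump into the target."""
    subprocess.run(
        mongodump_cmd(source_host, port, user, password, dump_dir), check=True
    )
    subprocess.run(
        mongorestore_cmd(target_host, port, user, password, dump_dir), check=True
    )
```

With second at a hypothetical 192.168.100.11, consolidate("192.168.100.11", "192.168.100.10", "root", "secret", "/tmp/dump-second") would copy its data into first. check=True makes the run stop if either tool fails, which matters because the next step deletes the data folder of second.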

Restart the instances, now with the --replSet param set and allowing hosts other than localhost to connect to mongodb. I allowed everything with the param --bind_ip 0.0.0.0.

Note that, when I specified the --replSet param with an empty data directory, the scripts in /docker-entrypoint-initdb.d were not executed.

(mongo-replica-set-qvtM3FSm)$ ./scripts/03-stop-standalone.sh
(mongo-replica-set-qvtM3FSm)$ ./scripts/04-start-with-repl-param.sh

^{03-stop-standalone.sh} ^{04-start-with-repl-param.sh}

The instances are not operational now. If we try to read from them...

(mongo-replica-set-qvtM3FSm) bash-3.2$ ./scripts/read-standalone.py
----FIRST----
node is not in primary or recovering state
----SECOND----
{'local': {'not_replicated': ()}}

...and the reported status points in the right direction:

(mongo-replica-set-qvtM3FSm) bash-3.2$ ./scripts/get-replicaset-status.sh
Working with first... Mapped to the host port 27110
MongoDB shell version v3.4.4
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.4
{
	"info" : "run rs.initiate(...) if not yet done for the set",
	"ok" : 0,
	"errmsg" : "no replset config has been received",
	"code" : 94,
	"codeName" : "NotYetInitialized"
}
Working with second... Mapped to the host port 27111
MongoDB shell version v3.4.4
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.4
{
	"info" : "run rs.initiate(...) if not yet done for the set",
	"ok" : 0,
	"errmsg" : "no replset config has been received",
	"code" : 94,
	"codeName" : "NotYetInitialized"
}

^{get-replicaset-status.sh}
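
The same check can be done from PyMongo. The sketch below is not part of the repository; it only classifies a replSetGetStatus reply, using the error code 94 (NotYetInitialized) seen in the output above. The function names are invented for the example.

```python
NOT_YET_INITIALIZED = 94  # error code shown in the shell output above


def describe_replset_status(status):
    """Summarise a replSetGetStatus reply or error document.

    Returns a (summary, member_states) tuple.
    """
    if status.get("ok") == 1:
        states = {m["name"]: m["stateStr"] for m in status.get("members", [])}
        return "initialized", states
    if status.get("code") == NOT_YET_INITIALIZED:
        return "needs rs.initiate()", {}
    return "error: " + status.get("errmsg", "unknown"), {}


def fetch_replset_status(client):
    """Ask a node for its status, returning the error document on failure.

    `client` is a pymongo.MongoClient connected directly to one node.
    """
    from pymongo.errors import OperationFailure

    try:
        return client.admin.command("replSetGetStatus")
    except OperationFailure as exc:
        # exc.details holds the server's error document when available.
        return exc.details or {"ok": 0, "code": exc.code, "errmsg": str(exc)}
```

Feeding fetch_replset_status(client) into describe_replset_status() gives a quick way to tell "not initiated yet" apart from other failures while iterating.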

And, all the previous commands in a single line to make trial-and-error faster:

 (mongo-replica-set-qvtM3FSm)$ ./scripts/reset-standalone.sh all && ./scripts/02-feed-standalone.py && ./scripts/backup.sh && ./scripts/restore.sh second 192.168.100.10 && ./scripts/reset-standalone.sh second && ./scripts/03-stop-standalone.sh && ./scripts/04-start-with-repl-param.sh

Initialize the replica set. This is done in the replicaset-init.js and replicaset-add-additional.js scripts.

It is important to pass the IP of first explicitly: when I used rs.initiate() without params, the configuration for the primary member of the replica set pointed to an unreachable address, and second was unable to reach first.
Adding second as a second member in the config param of rs.initiate() also resulted in failure.
Adding second right after rs.initiate() could also fail: you have to wait until its stateStr is set to PRIMARY.

(mongo-replica-set-qvtM3FSm)$ ./scripts/05-init-replicaset.sh
(mongo-replica-set-qvtM3FSm)$ ./scripts/read-standalone.py

^{05-init-replicaset.sh}
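
The three lessons above (explicit host for first, initiate with a single member, wait for PRIMARY before adding anything) translate into the following PyMongo sketch. It stands in for the .js scripts rather than reproducing them; the function names, the timeout and the host strings are assumptions.

```python
import time


def initial_config(replset_name, first_host):
    """Single-member config with an explicit, reachable host:port."""
    return {"_id": replset_name, "members": [{"_id": 0, "host": first_host}]}


def is_primary(status, host):
    """True when replSetGetStatus reports `host` with stateStr PRIMARY."""
    return any(
        m["name"] == host and m["stateStr"] == "PRIMARY"
        for m in status.get("members", [])
    )


def with_added_member(config, host, arbiter=False):
    """Return a copy of `config` with one more member and a bumped version."""
    new_config = dict(config)
    new_config["version"] = config.get("version", 1) + 1
    member = {"_id": max(m["_id"] for m in config["members"]) + 1, "host": host}
    if arbiter:
        member["arbiterOnly"] = True
    new_config["members"] = list(config["members"]) + [member]
    return new_config


def init_and_add(client, replset_name, first_host, other_hosts, timeout=60):
    """Initiate on first, wait for PRIMARY, then add the remaining hosts.

    `client` is a pymongo.MongoClient connected directly to first.
    """
    client.admin.command("replSetInitiate", initial_config(replset_name, first_host))
    deadline = time.time() + timeout
    while not is_primary(client.admin.command("replSetGetStatus"), first_host):
        if time.time() > deadline:
            raise RuntimeError("first did not become PRIMARY in time")
        time.sleep(1)
    for host in other_hosts:
        current = client.admin.command("replSetGetConfig")["config"]
        client.admin.command("replSetReconfig", with_added_member(current, host))
```

with_added_member() does by hand what rs.add() and rs.addArb() do in the mongo shell: fetch the current config, append a member, increase the version and reconfigure.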

The arbiter has not been set up yet, but we can access both instances. Using the previous connection params, which do not specify anything related to the replicaset:

we can read from both instances
we can update the primary (that is replicated to the secondary)
we get a failure when updating the secondary (pymongo.errors.NotMasterError: not master)

Networking issues

When trying to use the replicaset param when creating MongoClient, I learned that my OSX host cannot reach my VirtualBox guests, nor can my guests reach the host-only address where I expected my host to be. Lots of googling, but nothing helped.

For the record:

$ VBoxManage list hostonlyifs

...

Name:            vboxnet2
GUID:            786f6276-656e-4274-8000-0a0027000002
DHCP:            Disabled
IPAddress:       192.168.100.1
NetworkMask:     255.255.255.0
IPV6Address:
IPV6NetworkMaskPrefixLength: 0
HardwareAddress: 0a:00:27:00:00:02
MediumType:      Ethernet
Wireless:        No
Status:          Up
VBoxNetworkName: HostInterfaceNetworking-vboxnet2

$ ifconfig
...
vboxnet2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
  ether 0a:00:27:00:00:02
  inet 192.168.100.1 netmask 0xffffff00 broadcast 192.168.100.255
...

$ VBoxManage showvminfo mongo-replica-set_arbitrer_1537298209734_479
...
Guest OS:        Red Hat (64-bit)
...
NIC 1:           MAC: 08002737F846, Attachment: NAT, Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 1 Settings:  MTU: 0, Socket (send: 64, receive: 64), TCP Window (send:64, receive: 64)
NIC 1 Rule(0):   name = ssh, protocol = tcp, host ip = 127.0.0.1, host port = 2201, guest ip = , guest port = 22
NIC 1 Rule(1):   name = tcp27112, protocol = tcp, host ip = , host port = 27112, guest ip = , guest port = 27017
NIC 2:           MAC: 080027EA7320, Attachment: Host-only Interface 'vboxnet2', Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: allow-all, Bandwidth group: none
...
Guest:

Configured memory balloon size:      0 MB
OS type:                             Linux26_64
Additions run level:                 2
Additions version:                   5.1.26 r117224

Guest Facilities:

Facility "VirtualBox Base Driver": active/running (last update: 2018/09/18 19:17:03 UTC)
Facility "VirtualBox System Service": active/running (last update: 2018/09/18 19:17:06 UTC)
Facility "Seamless Mode": not active (last update: 2018/09/18 19:17:03 UTC)
Facility "Graphics Mode": not active (last update: 2018/09/18 19:17:03 UTC)

Connecting to the replica set from a proper network host

The vagrant boxes see each other, and there we can connect to mongo specifying that we are connecting to a replica set:

MongoClient(url_to_local, replicaset=replicaset_name, read_preference=ReadPreference.NEAREST)

We can read and write using this client, from both the box that has the primary and the box that has the secondary.
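
An equivalent way to pass the same settings is a connection string. The helper below is a sketch with made-up names; it uses the standard replicaSet and readPreference URI options, and the second host address and the password are placeholders.

```python
from urllib.parse import quote_plus, urlencode


def replica_set_uri(hosts, replset_name, user=None, password=None,
                    read_preference="nearest"):
    """Build a mongodb:// URI that names the replica set.

    `hosts` is a list of "host:port" seeds. Credentials are percent-escaped
    so that characters such as "@" or ":" do not break the URI.
    """
    credentials = ""
    if user is not None:
        credentials = "{}:{}@".format(quote_plus(user), quote_plus(password or ""))
    options = urlencode(
        {"replicaSet": replset_name, "readPreference": read_preference}
    )
    return "mongodb://{}{}/?{}".format(credentials, ",".join(hosts), options)
```

For example, replica_set_uri(["192.168.100.10:27017", "192.168.100.11:27017"], "rs0", "xv_mongo_rw", "p@ss") yields a URI that can be handed to MongoClient as its only argument. Listing more than one seed lets the driver discover the set even when one of the nodes is down.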

Working with proper credentials and with the local database

So far the tests have been run using mongo root's credentials and regular databases. We now need to verify that we can work with less privileged credentials and with the local database. We will:

add a script that sets up the permissions to be able to use the local database
use non-root credentials to read and update the database and the replicaset

The tests show that

the local database can be updated when using the secondary and no replicaset specification
the local database of the secondary WON'T be updated if we specify a replicaset.
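
These two findings suggest choosing the connection options per database. A minimal sketch of that decision, with hypothetical function names and assuming that anything other than local is meant to be replicated:

```python
def client_options(db_name, replset_name):
    """Keyword arguments for MongoClient, depending on the target database.

    - local: connect directly to one node, with no replica-set options, so
      that writes land on that node's own local database.
    - anything else: name the replica set so that writes go to the primary.
    """
    if db_name == "local":
        return {}
    return {"replicaset": replset_name, "readPreference": "nearest"}


def make_client(url, db_name, replset_name):
    """Create a pymongo.MongoClient suited to `db_name`."""
    from pymongo import MongoClient

    return MongoClient(url, **client_options(db_name, replset_name))
```

Keeping the choice in one function means the rest of the code only has to say which database it wants, not how to reach it.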

Wrap up

Migrate to local the databases that we don't want replicated
Modify the code to have different ways to create the pymongo.MongoClient if we are targeting the local database or a database that we want replicated
Use mongodump to create a backup of the two instances
Use mongorestore to consolidate all the info in first
Stop the servers
Delete the data volume of the second server
Create and make available the shared key
Start the servers, specifying --keyFile, --bind_ip and --replSet params for the mongodb container
Run rs.initiate() on the first server
Wait for the first to be PRIMARY and then add the rest of servers: rs.add() and rs.addArb()
Modify and deploy the code so that it specifies the replicaset when creating a pymongo.MongoClient for databases that we want replicated
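
The shared key step can be sketched as below. This is an illustration rather than the project's script: the function name and the key size are choices made for the example. MongoDB expects a keyfile of 6 to 1024 base64 characters that is not readable by group or others.

```python
import base64
import os
import stat


def write_keyfile(path, n_bytes=756):
    """Write a random base64 key to `path` and make it owner-read-only.

    756 random bytes encode to 1008 base64 characters, which stays within
    the 1024-character limit for keyfiles.
    """
    key = base64.b64encode(os.urandom(n_bytes))
    # Create the file with restrictive permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key + b"\n")
    os.chmod(path, stat.S_IRUSR)  # equivalent to chmod 400
    return path
```

The same file then has to be made available to every member of the set and referenced with --keyFile. Inside the containers it also needs to be readable by the user that runs mongod, which this sketch does not handle.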
