Merge #1407
1407: fix(fs/shutdown): pause nexus when faulted r=tiagolobocastro a=tiagolobocastro

When the last healthy child of a nexus is faulted, errors are propagated up the stack, causing the filesystem to shut itself down.

Instead, I propose that we simply leave the nexus in a paused state. This lets the initiator start its reconnection cycle and gives us some “time” to recreate the nexus once the replica is available again.

todo: we might want to add a time-limit after which we resort to failing IOs?

Co-authored-by: Tiago Castro <[email protected]>
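
The behaviour proposed above, together with the open todo about a time limit, can be sketched as a small decision function. This is an illustrative Python model only, not Mayastor code: `Action`, `after_child_retire`, `paused_for_s` and `limit_s` are hypothetical names, only the `Faulted` status appears in the diff, and the `limit_s` branch models the todo, which this commit does not implement.

```python
from enum import Enum


class NexusStatus(Enum):
    ONLINE = "online"    # assumed name; only Faulted is shown in the diff
    FAULTED = "faulted"


class Action(Enum):
    RESUME = "resume"            # unpause the subsystem, IO flows again
    STAY_PAUSED = "stay_paused"  # hold IO while the initiator reconnects
    FAIL_IO = "fail_io"          # give up and let errors reach the initiator


def after_child_retire(status, paused_for_s, limit_s=None):
    """Decide what to do with a paused nexus once a child has been retired.

    limit_s=None mirrors the commit: a faulted nexus stays paused indefinitely.
    A numeric limit_s models the todo (hypothetical): fail IO after the limit.
    """
    if status is not NexusStatus.FAULTED:
        return Action.RESUME
    if limit_s is not None and paused_for_s >= limit_s:
        return Action.FAIL_IO
    return Action.STAY_PAUSED
```

With `limit_s` left at `None` the function never returns `FAIL_IO`, which matches the early `return Ok(())` added in `nexus_bdev.rs`.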
mayastor-bors and tiagolobocastro committed Jun 13, 2023
2 parents d5ec5af + 08dff5d commit 5e08424
Showing 11 changed files with 359 additions and 8 deletions.
2 changes: 1 addition & 1 deletion deploy/csi-daemonset.yaml
@@ -30,7 +30,7 @@ spec:
         # the same.
       containers:
       - name: mayastor-csi
-        image: mayadata/mayastor:v1.0.5
+        image: mayadata/mayastor:v1.0.7
         imagePullPolicy: IfNotPresent
         # we need privileged because we mount filesystems and use mknod
         securityContext:
2 changes: 1 addition & 1 deletion deploy/mayastor-daemonset.yaml
@@ -33,7 +33,7 @@ spec:
         command: ['sh', '-c', 'until nc -vz nats 4222; do echo "Waiting for message bus..."; sleep 1; done;']
       containers:
       - name: mayastor
-        image: mayadata/mayastor:v1.0.5
+        image: mayadata/mayastor:v1.0.7
         imagePullPolicy: IfNotPresent
         env:
         - name: RUST_LOG
14 changes: 12 additions & 2 deletions mayastor/src/bdev/nexus/nexus_bdev.rs
@@ -983,9 +983,9 @@ impl<'n> Nexus<'n> {
     ) -> Result<(), Error> {
         self.child_retire_for_each_channel(Some(name.clone()))
             .await?;
-        debug!(?self, "PAUSE");
+        debug!(?self, "PAUSING");
         self.as_mut().pause().await?;
-        debug!(?self, "UNPAUSE");
+        debug!(?self, "PAUSED");
         if let Some(child) = self.lookup_child(&name) {
             let uri = child.name.clone();
             // schedule the deletion of the child even though etcd has not been
@@ -1030,6 +1030,16 @@ impl<'n> Nexus<'n> {
             })))
             .await;
         }
+        // If we are faulted then rather than failing all IO back to the
+        // initiator we can instead leave the subsystem paused, and wait
+        // for the control-plane to do something about this.
+        // Meanwhile the initiator will begin its reconnect loop and won't see
+        // a swarm of IO failures which could cause a fs to shut down.
+        if self.status() == NexusStatus::Faulted {
+            tracing::warn!(?self, "Nexus Faulted: not resuming subsystem");
+            return Ok(());
+        }
+        debug!(?self, "RESUMING");
         self.resume().await
     }

2 changes: 1 addition & 1 deletion scripts/check-deploy-yamls.sh
@@ -8,7 +8,7 @@ DEPLOYDIR="$ROOTDIR"/deploy

CORES=2
PROFILE=release
-TAG=v1.0.5
+TAG=v1.0.7

"$SCRIPTDIR"/generate-deploy-yamls.sh -c "$CORES" -t "$TAG" "$PROFILE"

9 changes: 9 additions & 0 deletions scripts/pytest-tests.sh
@@ -21,6 +21,14 @@ function run_tests()
         python -m pytest --tc-file='test_config.ini' --docker-compose="$name" "$name"
       )
     fi
+    if [ -f "$name" ]
+    then
+      (
+        set -x
+        base=$(dirname "$name")
+        python -m pytest --tc-file='test_config.ini' --docker-compose="$base" "$name"
+      )
+    fi
   done
 }

@@ -46,6 +54,7 @@ tests/replica_uuid
# tests/rpc
tests/nexus_multipath
tests/nexus_fault
+tests/nexus
v1/pool
1 change: 1 addition & 0 deletions shell.nix
@@ -46,6 +46,7 @@ mkShell {
     pytest_inputs
     python3
     utillinux
+    xfsprogs
     libunwind
   ] ++ (if (nospdk) then [ libspdk-dev.buildInputs ] else [ libspdk-dev ]);

15 changes: 12 additions & 3 deletions test/python/common/fio.py
@@ -2,7 +2,7 @@


 class Fio(object):
-    def __init__(self, name, rw, device, runtime=15, optstr=""):
+    def __init__(self, name, rw, device, size=None, runtime=15, optstr=""):
         self.name = name
         self.rw = rw
         self.device = device
@@ -11,17 +11,26 @@ def __init__(self, name, rw, device, runtime=15, optstr=""):
         self.success = {}
         self.runtime = runtime
         self.optstr = optstr
+        self.size = size

     def build(self):
         devs = [self.device] if isinstance(self.device, str) else self.device
+        size = ""
+        if self.size is not None:
+            size = "--size={}".format(self.size)

         command = (
             "sudo fio --ioengine=linuxaio --direct=1 --bs=4k "
             "--time_based=1 {} --rw={} "
             "--group_reporting=1 --norandommap=1 --iodepth=64 "
-            "--runtime={} --name={} --filename={}"
+            "--runtime={} --name={} --filename={} {}"
         ).format(
-            self.optstr, self.rw, self.runtime, self.name, ":".join(map(str, devs))
+            self.optstr,
+            self.rw,
+            self.runtime,
+            self.name,
+            ":".join(map(str, devs)),
+            size,
         )

         return command
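
To illustrate the new `size` argument, here is a minimal sketch that builds a command for a file on a mounted filesystem, presumably the case this change targets, since fio needs a size to lay out a regular file. The class is a trimmed copy of what the diff shows (fields hidden in the collapsed hunk are omitted), and the job name, path and size are placeholder values, not taken from the test suite.

```python
class Fio(object):
    """Trimmed copy of the helper as it appears in the diff above."""

    def __init__(self, name, rw, device, size=None, runtime=15, optstr=""):
        self.name = name
        self.rw = rw
        self.device = device
        self.runtime = runtime
        self.optstr = optstr
        self.size = size

    def build(self):
        # A single device may be passed as a string, several as a list.
        devs = [self.device] if isinstance(self.device, str) else self.device
        # Only emit --size when the caller asked for one.
        size = ""
        if self.size is not None:
            size = "--size={}".format(self.size)

        command = (
            "sudo fio --ioengine=linuxaio --direct=1 --bs=4k "
            "--time_based=1 {} --rw={} "
            "--group_reporting=1 --norandommap=1 --iodepth=64 "
            "--runtime={} --name={} --filename={} {}"
        ).format(
            self.optstr,
            self.rw,
            self.runtime,
            self.name,
            ":".join(map(str, devs)),
            size,
        )
        return command


# Hypothetical job: the mount path and the 64m size are placeholders.
job = Fio("fault-job", "randwrite", "/mnt/nexus/fio.dat", size="64m", runtime=30)
command = job.build()
print(command)
```

`build()` only formats the command string; nothing is executed here. When `size` is left as `None` the command ends with a trailing space and no `--size` flag, so existing callers keep their previous behaviour.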
1 change: 1 addition & 0 deletions test/python/requirements.txt
@@ -9,3 +9,4 @@ pytest-docker-compose
pytest-testconfig
pytest-timeout
pytest-variables
+retrying
66 changes: 66 additions & 0 deletions test/python/tests/nexus_fault/docker-compose.yml
@@ -0,0 +1,66 @@
+#
+# {SRCDIR} should point to your working tree which should be your current pwd
+#
+
+version: '3'
+services:
+  ms0:
+    container_name: "ms0"
+    image: rust:latest
+    environment:
+      - MY_POD_IP=10.0.0.2
+      - NEXUS_NVMF_ANA_ENABLE=1
+      - NEXUS_NVMF_RESV_ENABLE=1
+    command: ${SRCDIR}/target/debug/mayastor -g 0.0.0.0 -l 1 -r /tmp/ms0.sock
+    networks:
+      mayastor_net:
+        ipv4_address: 10.0.0.2
+    cap_add:
+      # NUMA related
+      - SYS_ADMIN
+      - SYS_NICE
+      # uring needs mmap
+      - IPC_LOCK
+    security_opt:
+      # we can set this to a JSON file to allow per syscall access
+      - seccomp=unconfined
+    volumes:
+      - ${SRCDIR}:${SRCDIR}
+      - /nix:/nix
+      - /dev/hugepages:/dev/hugepages
+      - /tmp:/tmp
+      - /var/tmp:/var/tmp
+  ms1:
+    container_name: "ms1"
+    image: rust:latest
+    environment:
+      - MY_POD_IP=10.0.0.3
+      - NEXUS_NVMF_ANA_ENABLE=1
+      - NEXUS_NVMF_RESV_ENABLE=1
+    command: ${SRCDIR}/target/debug/mayastor -g 0.0.0.0 -l 2 -r /tmp/ms1.sock
+    networks:
+      mayastor_net:
+        ipv4_address: 10.0.0.3
+    cap_add:
+      # NUMA related
+      - SYS_ADMIN
+      - SYS_NICE
+      # uring needs mmap
+      - IPC_LOCK
+    security_opt:
+      # we can set this to a JSON file to allow per syscall access
+      - seccomp=unconfined
+    volumes:
+      - ${SRCDIR}:${SRCDIR}
+      - /nix:/nix
+      - /dev/hugepages:/dev/hugepages
+      - /tmp:/tmp
+      - /var/tmp:/var/tmp
+
+networks:
+  mayastor_net:
+    name: mayastor_net
+    ipam:
+      driver: default
+      config:
+        - subnet: "10.0.0.0/16"
16 changes: 16 additions & 0 deletions test/python/tests/nexus_fault/features/nexus_fault.feature
@@ -0,0 +1,16 @@
+Feature: Faulted nexus I/O management
+
+  Background:
+    Given a local mayastor instance
+    And a remote mayastor instance
+
+  Scenario: a temporarily faulted nexus should not cause initiator filesystem to shutdown
+    Given a single replica (remote) nexus is published via nvmf
+    And the nexus is connected to a kernel initiator
+    And a filesystem is placed on top of the connected device
+    And the filesystem is mounted
+    And a fio workload is started on top of the mounted filesystem
+    When the remote mayastor instance is restarted
+    And the faulted nexus is recreated
+    Then the fio workload should complete gracefully
+    And the initiator filesystem should not be shutdown
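
The "remote mayastor instance is restarted" and "faulted nexus is recreated" steps imply waiting for the replica to come back before the nexus can be recreated. A minimal, standard-library sketch of such a wait loop follows. `wait_until` and its parameters are hypothetical and are not taken from this commit's step definitions, which are not shown on this page; the commit also adds the `retrying` package, which may serve the same purpose.

```python
import time


def wait_until(check, timeout_s=30.0, interval_s=0.5):
    """Call check() until it returns a truthy value or timeout_s elapses.

    Exceptions raised by check() are treated as "not ready yet" and retried;
    the last one is chained to the TimeoutError raised on expiry.
    """
    deadline = time.monotonic() + timeout_s
    last_error = None
    while True:
        try:
            result = check()
            if result:
                return result
        except Exception as error:  # e.g. connection refused during restart
            last_error = error
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "condition not met within {}s".format(timeout_s)
            ) from last_error
        time.sleep(interval_s)
```

A step could pass a callable that attempts to recreate the nexus and returns it on success, so that transient failures while the remote instance restarts are retried rather than failing the scenario immediately.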