Commit
1407: fix(fs/shutdown): pause nexus when faulted r=tiagolobocastro a=tiagolobocastro

When the last healthy child of a nexus is faulted, errors are propagated up the stack, causing the filesystem to shut itself down. Instead, I propose that we simply leave the nexus in a paused state, allowing the initiator to start the reconnection cycle, which gives us some “time” to recreate the nexus once the replica is available again.

todo: we might want to add a time limit after which we resort to failing I/Os?

Co-authored-by: Tiago Castro <[email protected]>
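The behaviour proposed above can be pictured with a small state model. This is only a sketch under stated assumptions, not mayastor's implementation: `NexusSketch`, `NexusState`, `pause_limit_s`, and the `"held"` result are hypothetical names made up for illustration, and the optional time limit models the open "todo" rather than anything the commit is known to implement.

```python
import enum
import time


class NexusState(enum.Enum):
    OPEN = "open"
    PAUSED = "paused"
    FAILED = "failed"


class NexusSketch:
    """Hypothetical model of a nexus; not mayastor's real types."""

    def __init__(self, children, pause_limit_s=None, clock=time.monotonic):
        self.healthy = set(children)
        self.state = NexusState.OPEN
        self.pause_limit_s = pause_limit_s  # None means "pause indefinitely"
        self._clock = clock
        self._paused_at = None

    def fault_child(self, child):
        self.healthy.discard(child)
        if not self.healthy and self.state is NexusState.OPEN:
            # Last healthy child is gone: pause instead of failing I/O,
            # so the initiator retries rather than seeing errors.
            self.state = NexusState.PAUSED
            self._paused_at = self._clock()

    def submit_io(self):
        """Return 'ok', 'held', or 'error' for a single I/O."""
        if self.state is NexusState.OPEN:
            return "ok"
        if self.state is NexusState.PAUSED:
            expired = (
                self.pause_limit_s is not None
                and self._clock() - self._paused_at > self.pause_limit_s
            )
            if expired:
                # Assumed handling of the "todo": give up after the limit.
                self.state = NexusState.FAILED
                return "error"
            return "held"
        return "error"

    def recreate(self, children):
        # The replica is back: the nexus is recreated and I/O resumes.
        self.healthy = set(children)
        self.state = NexusState.OPEN
        self._paused_at = None
```

With `pause_limit_s=None` the sketch matches the commit as described: I/O is held while the nexus is paused and resumes after `recreate`. A finite limit shows one possible shape for the time limit raised in the todo.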
Showing 11 changed files with 359 additions and 8 deletions.
@@ -9,3 +9,4 @@ pytest-docker-compose
pytest-testconfig
pytest-timeout
pytest-variables
retrying
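The hunk above adds `retrying` to the list of pytest-related dependencies. The page does not show how the tests use it, so the following is only a guess at the kind of problem it addresses: repeating an operation (for example, reconnecting to a restarted instance) until it succeeds. The sketch uses the standard library only; `retry_until` and its parameters are hypothetical names, not the `retrying` package's API.

```python
import time


def retry_until(fn, attempts=5, wait_s=0.0, sleep=time.sleep):
    """Call fn() until it returns without raising, up to `attempts` times.

    Hypothetical helper for illustration; waits `wait_s` seconds between
    attempts and re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if attempt + 1 < attempts:
                sleep(wait_s)
    raise last_exc
```

To my knowledge the `retrying` package exposes the same idea as a `retry` decorator with options such as `stop_max_attempt_number` and `wait_fixed`; check its documentation before relying on those names.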
@@ -0,0 +1,66 @@
#
# {SRCDIR} should point to your working tree which should be your current pwd
#

version: '3'
services:
  ms0:
    container_name: "ms0"
    image: rust:latest
    environment:
      - MY_POD_IP=10.0.0.2
      - NEXUS_NVMF_ANA_ENABLE=1
      - NEXUS_NVMF_RESV_ENABLE=1
    command: ${SRCDIR}/target/debug/mayastor -g 0.0.0.0 -l 1 -r /tmp/ms0.sock
    networks:
      mayastor_net:
        ipv4_address: 10.0.0.2
    cap_add:
      # NUMA related
      - SYS_ADMIN
      - SYS_NICE
      # uring needs mmap
      - IPC_LOCK
    security_opt:
      # we can set this to a JSON file to allow per syscall access
      - seccomp=unconfined
    volumes:
      - ${SRCDIR}:${SRCDIR}
      - /nix:/nix
      - /dev/hugepages:/dev/hugepages
      - /tmp:/tmp
      - /var/tmp:/var/tmp
  ms1:
    container_name: "ms1"
    image: rust:latest
    environment:
      - MY_POD_IP=10.0.0.3
      - NEXUS_NVMF_ANA_ENABLE=1
      - NEXUS_NVMF_RESV_ENABLE=1
    command: ${SRCDIR}/target/debug/mayastor -g 0.0.0.0 -l 2 -r /tmp/ms1.sock
    networks:
      mayastor_net:
        ipv4_address: 10.0.0.3
    cap_add:
      # NUMA related
      - SYS_ADMIN
      - SYS_NICE
      # uring needs mmap
      - IPC_LOCK
    security_opt:
      # we can set this to a JSON file to allow per syscall access
      - seccomp=unconfined
    volumes:
      - ${SRCDIR}:${SRCDIR}
      - /nix:/nix
      - /dev/hugepages:/dev/hugepages
      - /tmp:/tmp
      - /var/tmp:/var/tmp

networks:
  mayastor_net:
    name: mayastor_net
    ipam:
      driver: default
      config:
        - subnet: "10.0.0.0/16"
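The compose file pins each service to a static address on `mayastor_net`, so those addresses have to be distinct and fall inside the declared subnet. A quick sanity check using the values from the file can be sketched as follows; `check_addresses`, `SUBNET`, and `SERVICES` are hypothetical names, and the values are copied by hand rather than parsed from the YAML.

```python
import ipaddress

# Values copied from the compose file above.
SUBNET = ipaddress.ip_network("10.0.0.0/16")
SERVICES = {"ms0": "10.0.0.2", "ms1": "10.0.0.3"}


def check_addresses(subnet, services):
    """Return the sorted service addresses, or raise ValueError if any
    address is outside the subnet or used by more than one service."""
    seen = set()
    for name, addr in services.items():
        ip = ipaddress.ip_address(addr)
        if ip not in subnet:
            raise ValueError(f"{name}: {ip} is outside {subnet}")
        if ip in seen:
            raise ValueError(f"{name}: {ip} is already assigned")
        seen.add(ip)
    return sorted(seen)
```

Calling `check_addresses(SUBNET, SERVICES)` succeeds for the two services defined here. The same check would catch a copy-paste slip if a third service were added by duplicating the `ms1` block without changing its address.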
16 changes: 16 additions & 0 deletions
test/python/tests/nexus_fault/features/nexus_fault.feature
@@ -0,0 +1,16 @@
Feature: Faulted nexus I/O management

  Background:
    Given a local mayastor instance
    And a remote mayastor instance

  Scenario: a temporarily faulted nexus should not cause initiator filesystem to shutdown
    Given a single replica (remote) nexus is published via nvmf
    And the nexus is connected to a kernel initiator
    And a filesystem is placed on top of the connected device
    And the filesystem is mounted
    And a fio workload is started on top of the mounted filesystem
    When the remote mayastor instance is restarted
    And the faulted nexus is recreated
    Then the fio workload should complete gracefully
    And the initiator filesystem should not be shutdown