The append recovery logic currently depends on the inode numbers of the file and the staging file, as stored in the append log. However, the file's inode number may change during recovery in some scenarios.
Consider the following example happening in order (SplitFS Strict mode):
file1 is created with size 0. Let its inode number be 1. A LOG_FILE_CREATE entry is written to the op log.
An append operation is done on file1. The contents are written to a staging file with, say, inode number 2.
Also, an append log entry is created storing the source (1) and destination (2) inode numbers.
There's a crash (power failure) before any fsync. Let's assume it crashed after the append/write call returned to the application.
During recovery, the following happens:
Op log recovery replays the LOG_FILE_CREATE entry from step 1 of the example and attempts to re-create the file via ext4-dax (file1 was lost because there was no fsync, so it relies on log recovery). The new inode number is not guaranteed to be 1. Let's say it is 3 now.
Append log recovery attempts to relink using the stale inode number (1) recorded in the append log, but the file now has inode number 3, and thus the append is lost.
To fix this, one solution I can think of is to build a mapping from old to new inode numbers during op log recovery. During append log recovery, look up the mapping and use the new inode number in place of the old one.
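A minimal sketch of what such a mapping could look like in C. Every name here (`inode_map`, `inode_map_record`, `inode_map_resolve`, `INODE_MAP_CAPACITY`) is hypothetical and not part of the SplitFS code base. The fixed-size array with a linear scan is only for illustration, and it assumes inode numbers that have no mapping (such as the staging file's) are still valid after the crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical capacity; a real implementation would likely size the table
 * from the number of LOG_FILE_CREATE entries or use a hash map. */
#define INODE_MAP_CAPACITY 1024

struct inode_map_entry {
    ino_t old_ino; /* inode number recorded in the logs before the crash */
    ino_t new_ino; /* inode number assigned when the file was re-created */
};

struct inode_map {
    struct inode_map_entry entries[INODE_MAP_CAPACITY];
    size_t count;
};

static void inode_map_init(struct inode_map *map)
{
    assert(map != NULL);
    map->count = 0;
}

/* Intended to be called from op log recovery right after a file is
 * re-created. Returns false if the table is full. */
static bool inode_map_record(struct inode_map *map, ino_t old_ino, ino_t new_ino)
{
    assert(map != NULL);
    for (size_t i = 0; i < map->count; i++) {
        if (map->entries[i].old_ino == old_ino) {
            map->entries[i].new_ino = new_ino; /* latest re-creation wins */
            return true;
        }
    }
    if (map->count == INODE_MAP_CAPACITY)
        return false;
    map->entries[map->count].old_ino = old_ino;
    map->entries[map->count].new_ino = new_ino;
    map->count++;
    return true;
}

/* Intended to be called from append log recovery: translates an inode number
 * read from the append log. Returns the input unchanged if no mapping exists. */
static ino_t inode_map_resolve(const struct inode_map *map, ino_t logged_ino)
{
    assert(map != NULL);
    for (size_t i = 0; i < map->count; i++) {
        if (map->entries[i].old_ino == logged_ino)
            return map->entries[i].new_ino;
    }
    return logged_ino;
}
```

In the example above, op log recovery would call `inode_map_record(&map, 1, 3)` after re-creating file1, and append log recovery would pass the logged inode number 1 through `inode_map_resolve` to get 3 before relinking.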