Problems when file:// storage runs out of space #67
Comments
Similar panic happens in
Thank you for the issue, that's a good edge case. The unfinished data won't be recoverable, as writing is processed in a transaction. If writing doesn't call finish(), the transaction is aborted and the data is discarded.
After a crash with unfinished writes, the storage is still full in megabytes. Are those files securely inaccessible (e.g. no decryption key available) or just unreferenced?
With file:// storage, I think ZboxFS should actively mind the available room.
Yes, those files are still secured but they are inaccessible.
That could be hard, as some storages are not able to report free space at run time, and even where it can be retrieved, the cost of constantly monitoring it is still high.
ZboxFS can just pre-allocate some space needed to complete a transaction.
Secured with the same key as the rest of the files, or secured with some session key that is supposed to be written to storage only on transaction commit? If the former, there should be some recovery function that trades consistency for availability, allowing possibly broken data to be read outside the normal transaction finish.
Accurately pre-allocating space is impossible because ZboxFS has no idea how much data the user will write. File writing is like streaming; until the transaction is committed there is no way to know the actual size in advance.
I see megabytes used by the repository. Anyway, even a heuristic "pre-allocate X kilobytes for transaction commits", where X is configurable by the user, can be beneficial, to reduce the probability of a failed commit. In short: running out of space should not lead to big data loss, only maybe a limited one.
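A minimal sketch of that pre-allocation idea at the application level (the helper names and the headroom file are hypothetical, not part of the zbox API): keep a throwaway file of configurable size next to the repository and delete it when a commit hits the out-of-space error, so the commit can complete using the freed space.

use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: reserve `size` bytes of headroom as a throwaway
/// file next to the repository (not a zbox API).
fn reserve_headroom(path: &Path, size: usize) -> io::Result<()> {
    let mut f = std::fs::File::create(path)?;
    // Write real bytes rather than seeking, to force actual allocation
    // even on filesystems that create sparse files.
    f.write_all(&vec![0u8; size])?;
    f.sync_all()
}

/// Hypothetical helper: drop the headroom so a pending commit can reuse
/// the freed space.
fn release_headroom(path: &Path) -> io::Result<()> {
    std::fs::remove_file(path)
}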
ZboxFS uses the former. When writing data to a file, it first writes the data to a staging area. When the transaction is committing, the staged data is transferred to the content management system. That means any uncommitted data will not be in the content management system, thus it cannot be accessed, and it is discarded when the transaction aborts.
How do I recover data from that staging area if the transaction was neither cleanly finished nor cleanly aborted? Does that staging area contain all the information needed to complete a transaction, or are some things only kept in RAM?
You see the storage usage going up because of the staging data; the file content has been written to storage, and committing the tx just 'marks' it as permanent. Committing the tx needs to write a WAL file; its size is not large but still depends on how large the file you are writing is. A larger file produces more content nodes to be added to the tx, thus the WAL file size will increase. Heuristically estimating the WAL file size in advance is impossible, but I agree pre-allocating some blocks for the WAL is a good idea, and it might be able to handle most cases.
The staging area only contains file content, and it is referenced by a temporary content node which exists only in memory. That content node is merged with the pre-existing file content node and written to storage on commit. Thus, if the tx is not completed, the temporary content node is lost and the staging data cannot be accessed.
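As a reading aid, the scheme described above can be modeled roughly like this (all names here are illustrative, not zbox's actual internals):

/// Illustrative model only; these are not zbox's real types.
/// Staged bytes are already on disk, but the only thing that makes them
/// reachable is this in-memory node, so a crash before commit loses them.
struct TempContentNode {
    staged_ranges: Vec<(u64, u64)>, // (offset, len) ranges in the staging area
}

struct ContentNode {
    ranges: Vec<(u64, u64)>, // durable, persisted ranges
}

impl ContentNode {
    /// Commit: merge the in-memory node into the durable content node.
    /// Only after this merged node is written to storage does the staged
    /// data become part of the file content.
    fn merge(&mut self, tmp: TempContentNode) {
        self.ranges.extend(tmp.staged_ranges);
        // ... persist `self.ranges` to storage here ...
    }
}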
Would you mind sharing your code and setup, so I can reproduce this issue locally? That would be helpful. Thanks.
High-level setup is outlined in the opening comment. Here is example code.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();
    zbox::init_env();
    let mut r = zbox::RepoOpener::new().create(true).force(true).open("file:///tmp/limit/q", "123asd")?;
    use std::io::Write;
    let mut f = r.create_file("/test")?;
    // Keep writing 64 KiB blocks until the underlying storage is exhausted.
    loop {
        match f.write(&[77; 65536]) {
            Ok(_) => {
                eprint!(".")
            }
            Err(e) => {
                eprintln!("\nwrite error: {}", e);
                break;
            }
        }
    }
    f.finish()?;
    Ok(())
}
Cargo.toml:
[package]
name = "zboxtest"
version = "0.1.0"
edition = "2018"

[dependencies]
zbox = { version = "0.8.8", features = ["storage-file"] }
env_logger = "0.7.1"
Either this is not true (I suppose it should be something like a logarithm of the file size being written) or the design is flawed somewhere. It may be reasonable that it's not possible to calculate the size exactly beforehand, but the impossibility of estimating it (e.g. providing some upper bound) heuristically (i.e. so that it works with high probability, but can fail) looks very suspicious. The space reserved for the WAL file can even grow dynamically based on the size of the ongoing written-but-uncommitted data. If I want, for example, to store 10 terabytes of data into ZboxFS, how often am I recommended to call finish()?
Do I correctly understand that recovery of that lost content is theoretically possible (given the correct password, obviously), but is not currently implemented?
ZboxFS uses mega-transactions, so if you use a single
Theoretically it is possible, but hard to implement, and I also think it is not necessary. The application should be aware that any uncommitted data can be lost, and take proper actions to deal with this situation. ZboxFS itself only gives an atomicity promise: a transaction either succeeds or fails in full. Data recovery is the application's responsibility. You can think of ZboxFS as an RDBMS, which requires the user to re-submit the whole transaction in case of failure.
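Following that RDBMS analogy, the application side might wrap a whole write in a re-submittable unit; with_retries below is a hypothetical application-level helper, not a zbox API:

/// Hypothetical application-side helper: re-submit the whole operation
/// on failure, the same way one would re-run a failed RDBMS transaction.
fn with_retries<T, E>(
    mut attempts: u32,
    mut tx: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    loop {
        match tx() {
            Ok(v) => return Ok(v),
            Err(e) if attempts <= 1 => return Err(e),
            // The app could free disk space here before the next attempt.
            Err(_) => attempts -= 1,
        }
    }
}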
So for this issue, the root cause is actually improper error propagation while loading a COW object. A COW object has two copies (arms); each change makes a copy of the current arm and writes to the other arm. When loading the COW object, the error should pop up only when loading both of the arms has failed. If either arm was successfully loaded, we should use that arm. In this case, the correct result should be like this:
I've changed it in c30a303, can you test it again on your end? Thanks.
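The corrected propagation rule can be sketched like this (illustrative types, not the actual change in c30a303):

/// Sketch of the corrected rule: loading a COW object fails only when
/// BOTH arms fail. Real code would also prefer the arm with the newer
/// version when both load successfully.
fn load_cow<A, E>(left: Result<A, E>, right: Result<A, E>) -> Result<A, E> {
    match (left, right) {
        // Either arm alone is enough to recover the object.
        (Ok(arm), _) | (_, Ok(arm)) => Ok(arm),
        // Only when both arms are unreadable is the object lost.
        (Err(e), Err(_)) => Err(e),
    }
}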
What is a failed transaction (in terms of the public ZboxFS API)? Does any error returned from write mean the transaction has failed?
Tried again with zbox commit c30a303:
Still the same panic.
Transactions are used internally when mutating a file or the file system. There is no API that can check their status. Before writing to a file, a transaction is created internally and keeps being updated while that file is written. Any error during the writing marks that transaction as failed and to be aborted. If everything goes fine, finish() commits the transaction.
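In other words (an illustrative model, not the real internals), the lifecycle reads roughly as:

/// Illustrative model of the internal transaction lifecycle described
/// above; these are not zbox's actual types.
enum TxState {
    Active,    // created before the first write, updated on every write
    Committed, // reached only via a successful finish()
    Aborted,   // any write error marks the tx failed and to be aborted
}

fn on_write_result(state: &mut TxState, result: &std::io::Result<usize>) {
    if result.is_err() {
        // One failed write poisons the whole transaction.
        *state = TxState::Aborted;
    }
}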
This panic is because you're calling finish() after the write has already failed.
After eb8b135 that panic is gone. Similar protection however should be added to write() as well: a similar panic is still happening with a modified test code:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();
    zbox::init_env();
    use std::io::Write;
    let mut f: zbox::File;
    let mut r = zbox::RepoOpener::new().create(true).force(true).open("file:///tmp/limit/q", "qwe123")?;
    f = r.create_file("/test")?;
    // Keep calling write() a few more times after the first failure.
    let mut remaining_failed_writes = 10;
    loop {
        match f.write(&[77; 65536]) {
            Ok(_) => {
                eprint!(".")
            }
            Err(e) => {
                eprintln!("\nwrite error: {}", e);
                if remaining_failed_writes == 0 { break; }
                remaining_failed_writes -= 1;
            }
        }
    }
    f.finish()?;
    Ok(())
}
Actually I expect that a failed write results in an error, not a panic.
What is the official way to recover from ENOSPC? Even after a proper (non-panic) repository close after ENOSPC, it remains cluttered with an uncleared aborted transaction and no more transactions are accepted, even smaller ones. Here is source code illustrating what I mean:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();
    zbox::init_env();
    use std::io::Write;
    let mut f: zbox::File;
    let mut r = zbox::RepoOpener::new().create(true).force(true).open("file:///tmp/limit/q", "qwe123")?;
    f = r.create_file("/test")?;
    let mut written_size: usize = 0;
    eprintln!("Trying to create a file as big as possible");
    loop {
        match f.write(&[77; 65536]) {
            Ok(x) => {
                eprint!(".");
                written_size += x;
            }
            Err(e) => {
                eprintln!("\nwrite error: {}", e);
                break;
            }
        }
    }
    eprintln!("\nOK, we cannot write {} bytes here. Let's try storing a {}-byte file instead", written_size, written_size / 2);
    r.remove_file("/test")?;
    f = r.create_file("/test.smaller")?;
    let mut written_size2: usize = 0;
    loop {
        match f.write(&[77; 65536]) {
            Ok(x) => {
                eprint!(".");
                written_size2 += x;
                if written_size2 * 2 >= written_size {
                    eprintln!("\nOK, enough. Committing");
                    break;
                }
            }
            Err(e) => {
                eprintln!("\nwrite error: {}", e);
                break;
            }
        }
    }
    f.finish()?;
    Ok(())
}
Log:
Expected: after failing to write the big file, it succeeds in writing and committing the smaller file.
There is no way to recover when writing has failed; the reason is described above. The recommended way of writing a big file is to use smaller batches and call finish() after each batch. The example code is like below:
fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();
    zbox::init_env();
    let mut r = zbox::RepoOpener::new()
        .create(true)
        .force(true)
        .open("file:///mnt/q", "123asd")?;
    use std::io::Write;
    let src = vec![77; 65536 * 1024];
    let batch_size = 8 * 1024 * 1024; // choose a proper batch size that suits your needs
    let mut offset = 0;
    let mut written_in_batch = 0;
    let mut retry = 3;
    let mut f = r.create_file("/test")?;
    while offset < src.len() {
        match f.write(&src[offset..]) {
            Ok(written) => {
                eprint!(".");
                offset += written;
                written_in_batch += written;
                // Commit each batch so an eventual failure loses at most one batch.
                if written_in_batch >= batch_size {
                    f.finish()?;
                    written_in_batch = 0;
                }
            }
            Err(e) => {
                eprintln!("\nwrite error (retry #{}): {}", retry, e);
                retry -= 1;
                if retry == 0 {
                    break;
                }
            }
        }
    }
    f.finish()?; // commit any remaining partial batch
    Ok(())
}
Can this be documented in the API docs?
Looks like the answer is: ZboxFS does not like multi-gigabyte transactions, and keeping a reasonable upper bound on the number of unfinished bytes is recommended.
Sure, I will make it clearer in the documentation.
Actually, ZboxFS doesn't have a preferred way or transaction size for writing a big file; it all depends on the application's requirements. If the app wants integrity and atomicity, use a single transaction.
As of 330ed30 it no longer panics. However, once it runs out of free space, I cannot even delete an existing file to free up some space. The repository effectively becomes read-only.
Yes, that's true. Each transaction needs to write a WAL file beforehand; when there is no free space left, no WAL file can be persisted, thus no transaction can be made and the repo effectively becomes read-only. There might be a better way to deal with this situation, but it needs thorough thinking and might require some trade-offs.
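The commit protocol being described is the classic WAL-first one. Sketched under invented names (this is not zbox's code), it makes clear why a full disk blocks every transaction, including deletions:

use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Illustrative WAL-first commit. The WAL must be fully persisted
/// before the real data is touched, so ENOSPC on step 1 blocks ALL
/// transactions, even ones that would free space.
fn commit(wal_path: &Path, wal_bytes: &[u8]) -> io::Result<()> {
    // 1. Persist the write-ahead log first. On a full disk this write
    //    fails, and no transaction of any kind can proceed.
    let mut wal = fs::File::create(wal_path)?;
    wal.write_all(wal_bytes)?;
    wal.sync_all()?;

    // 2. Apply the changes to the repository proper (elided).

    // 3. Remove the WAL once the changes are durable.
    fs::remove_file(wal_path)
}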
Is there an upper bound on the WAL file size needed to remove one ZboxFS File? Such a file may be kept allocated in advance for such cases. Also, why does an aborted transaction not clean up its unused files in the storage?
The difficulty is that the WAL file size is dynamic and unknown in advance. Pre-allocation might work in most cases, but cannot be guaranteed to.
For the same reason. The sector file needs to write a copy first when changed; that's how COW works. When no space is left, the aborted tx cannot even reclaim used space. That's why you see the tx abort fail.
I just checked with sqlite; it also cannot complete any transaction when the disk is full. So this behavior might be valid for ZboxFS as well, but I will keep trying to mitigate it as much as possible. My thinking is that, when the disk is full, at least the transaction for file deletion should still be able to execute, otherwise the app gets stuck in this limbo. The solution might be pre-allocating some fixed-size space for the WAL and rotating it as transactions flow in.
After carefully checking different possible solutions, I think repo mutability when the disk is full is hard to implement, so I will have to leave it unimplemented. The expected result will be the same as sqlite: when the disk is full, the repo becomes read-only and no more transactions can be made, not even deletion. The major difficulties I found are:
What does a WAL file consist of? What affects its size? Maybe the pre-allocated WAL size should just be a user-configurable parameter? That file would normally be unused except when:
The WAL file contains all affected objects in the transaction, including new, updated and deleted ones. When writing data to a file, the data is split into chunks, and chunks are grouped into limit-sized segments; each segment is a new object to be put in the transaction. So as you keep writing data, more and more new segments are created and added to the transaction, thus the WAL file size keeps increasing. It's hard to predict the WAL size because we don't know how many bytes the user will write to a file. Even if we knew that, the number of chunks and segments would still be hard to know, because they depend on the Rabin rolling-hash algorithm, which produces variable-length chunks based on the data content.
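For intuition, content-defined chunking looks roughly like this (a toy rolling hash standing in for the Rabin fingerprint; the multiplier and boundary mask are made-up illustrative parameters): chunk boundaries depend on the bytes themselves, so the chunk count, and hence the WAL size, cannot be derived from the byte count alone.

/// Toy content-defined chunker. Two inputs of equal length can split
/// into different numbers of chunks, because boundaries depend on the
/// data content, not on the length.
fn chunk_lengths(data: &[u8]) -> Vec<usize> {
    const BOUNDARY_MASK: u64 = (1 << 13) - 1; // ~8 KiB average chunk
    let mut lengths = Vec::new();
    let (mut hash, mut start) = (0u64, 0usize);
    for (i, &b) in data.iter().enumerate() {
        hash = hash.wrapping_mul(31).wrapping_add(b as u64);
        if (hash & BOUNDARY_MASK) == 0 {
            // A boundary: close the current chunk here.
            lengths.push(i + 1 - start);
            start = i + 1;
        }
    }
    if start < data.len() {
        lengths.push(data.len() - start); // trailing partial chunk
    }
    lengths
}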
How many chunks/segments are required to delete one file? Maybe in the "only delete one file" special case the WAL size is more predictable than in the general case?
To delete one file, one transaction is needed, which consists of 2 WAL files (one for the tx itself, one for the WAL queue). For deletion, if it is not an extreme case, that is normally no more than several kilobytes. But the difficulty is not this; it is how to reliably capture the disk-full error without sacrificing efficiency.
Just capture it unreliably. If it is not a disk-full error, it is probably no problem to handle it as if it were one (i.e. trying to use the reserved files). It would just fail again.
That would be overkill; that error in Rust is too general and you would end up doing unnecessary error handling for all the other errors.
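For what it's worth, the disk-full case can be singled out without handling every other error: on Unix, the raw OS error code survives in std::io::Error. A sketch, assuming the libc crate as a dependency (this is not what zbox currently does):

/// Sketch: detect specifically ENOSPC on Unix via the raw OS error
/// code, leaving every other io::Error to the generic path.
fn is_disk_full(e: &std::io::Error) -> bool {
    e.raw_os_error() == Some(libc::ENOSPC)
}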
Steps to reproduce:
- Create a file:// storage on a volume of limited size.
- Create a zbox::File there and start filling it with data (without finishing) ad infinitum.

Expected:
Eventually std::io::Write::write returns an IO error suggesting that the allowed storage space is exhausted. Ideally the file may be finish()ed at that point, with the data previously accepted by std::io::Write::write without error being guaranteed to fit in the remaining space.

Actual:
Additionally, the storage cannot be opened again after that: