Error writing to datastore on Windows 10 #7115
We do close the files before renaming. This suggests that some other process (such as an antivirus) might be opening them. It would be helpful to try this golang/go#36568 (comment) and see if it's indeed the antivirus. The change we made was to create the temp files in a dedicated temp folder that can be cleaned up later, rather than creating them in their final folder and renaming them in place. I guess the fact that the rename has to move files across folders causes issues. Workarounds:
@Stebalien I don't see a good fix here; should we revert the temp file change for the moment while we rethink it?
The older IPFS works correctly. I checked the antivirus and there are no exceptions defined for the old IPFS data directory.
But does it still happen if you add such exceptions?
I added an exception to AVG antivirus and there are still a number of files left in blocks\.temp, plus occasional exceptions. No change from the previous behaviour; I didn't get the Error writing to datastore stack trace every time. I am not sure the exception defined in AVG actually works, because while browsing http://docs.ipfs.io.ipns.localhost:8080/guides/concepts/dht/ I got this error:
To be sure, I disabled AVG completely and the errors are still there. No change: temp files still get stuck and stack traces still appear.
Can you build and use a custom branch: https://github.com/ipfs/go-ipfs/tree/test/flatfs? This branch reverts to a version of flatfs before the change, to make sure we're on the right track.
Note: the easiest way to build on Windows is to run:
Alternatively, you can use a pre-built binary.
I did the same testing with the binary you provided, with the IPFS directory added to the antivirus exceptions. This error is still there:
There are a lot of put-DDDDD files in the repository. I checked an ipfs repo from the stable branch and no put-DDDDD files are there. After inspecting the blocks and comparing the file sizes of a put-DDDDD file and a .data block file, it looks like everything in the leftover put-DDDDD file is also in the .data file, so it looks like a cleanup problem. A sample of such a directory; look carefully at the file dates:
and more interesting case:
It looks like the leftover put file is written after the block has already finished downloading from another source; the rename onto an existing file fails, but there is no cleanup procedure to delete the stale put-DDDDDD file.
Yeah, ok, so it's likely due to some change in timing in bitswap. It looks like we should be opening all files in SHARE_DELETE mode.
Could you try the build linked in #7133 (comment)? (or just build that branch).
Testing go-ipfs version: 0.5.0-rc1-924e870 shows no change in behaviour |
Ok, well, we're now opening these files with FILE_SHARE_DELETE. There's still a potential issue where a get could fail because a put is in progress, but we're seeing the reverse here, which should be impossible given only IPFS. Do you use any local search/indexing tools? Are you storing your IPFS repo in something like Dropbox? Given the fact that we're somehow failing to delete the temporary file, something must be keeping these files open.
I would not blame external influence, because the old 0.4 version works just fine.
An updated stack trace / error output from the latest version would be useful for continued investigation, if you can reliably reproduce it.
The exceptions do not relate 1:1 to every file left behind. It's about 15 files left behind per exception.
Are you still seeing temporary files get left behind? We delete the temporary file on failure so that should only be possible if some outside application is opening them. Unfortunately, this working in previous go-ipfs versions could simply be a timing difference. One thing I have noticed is that we're now using batch puts in bitswap instead of putting each node sequentially. I'll put together a patch to revert that and see if it helps. Things I've checked:
Hm. I do have an alternative explanation for why we might not be deleting temporary files. I need to check it. edit: yes, we had a bug there. But that still doesn't explain this behavior.
Could you try the test/flatfs branch again, or ipfs.zip? This version will just ignore the put errors if the put was idempotent anyways (which is always the case here).
No change |
Could you be more specific?
The second issue should definitely have been fixed. If you're still seeing the first issue, something is preventing us from opening the target block for reading, not just for writing.
Temporary files are left behind. I didn't test long enough to wait for a stack trace; it needs about 10 minutes to show.
What about building the old 0.4.x version with the new datastore for testing purposes?
Here's a test binary of the previous release with the new datastore. It's not perfect because I needed to upgrade multiple datastores (interface changes), but it should (hopefully) work. I expect this version will work, but you're right that we should still test it. I believe this issue is as follows: in the previous release, when we wrote blocks, we'd do the following repeatedly:
Now, we collect a batch of blocks, then:
If we're writing multiple temporary files at once, this opens up a larger window for third-party applications to open these temporary files and start messing with them. That's why we asked you to disable your AV (and I'd still like you to try to completely disable all AVs). I've tried to reproduce this issue on my own Windows machine but couldn't. As for how to proceed:
I'm currently leaning towards 2 as it should "just work" in most cases and won't have any performance impact in the normal non-error case.
This works OK. No files left behind and no exceptions after 20 minutes of testing.
Ok, we've updated flatfs to retry more aggressively. Could you try ipfs.zip?
It seems to be fixed now. I also noticed that CPU usage is significantly higher compared to 0.4.x, but that may not be caused by the filesystem fix.
It probably is, but I'd love to take a look and see if we have a different issue. If you have some time, I'd appreciate it if you could follow the debug guide (while observing the increased CPU usage) and file an issue with the results. Anyway, thanks for all your help in debugging this!
Another attempt to fix ipfs#7115.
Version information:
0.5 rc1
Description:
The new flatfs datastore version introduced in 0.5 has problems writing blocks on Windows 10.
There is a very large number of temp-* files in the blocks\.temp directory, which means this error happens quite often. Writing does not fail for all blocks; some blocks are written correctly. My guess is that the file has to be closed before it can be renamed on Windows.