Use O_DIRECT #51

Closed
6 of 8 tasks
JackKelly opened this issue Feb 10, 2024 · 12 comments · Fixed by #92
Labels
enhancement New feature or request performance Improvements to runtime performance

Comments

@JackKelly
Owner

JackKelly commented Feb 10, 2024

todo

JackKelly added enhancement New feature or request performance Improvements to runtime performance labels Feb 10, 2024
JackKelly moved this to Todo in light-speed-io Feb 10, 2024
JackKelly self-assigned this Feb 14, 2024
@JackKelly
Owner Author

I think the problem is that the buffer needs to be aligned to 512-byte boundaries. And possibly the length also needs to be a multiple of 512 bytes.

My SSD uses 512-byte logical block sizes:

jack@jack-NUC:~/dev/rust/light-speed-io$ sudo blockdev --getss /dev/nvme0n1
512
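`blockdev --getss` reports the device's logical block size. Linux also exposes the same value in sysfs at /sys/block/<device>/queue/logical_block_size (for a whole device such as nvme0n1, not a partition), so it can be read without root. A minimal std-only sketch, where the helper names `parse_block_size` and `logical_block_size` are hypothetical. Note this gives the device's block size, not the per-file limits that statx reports.

```rust
use std::fs;
use std::io;

/// Parses the contents of a sysfs `logical_block_size` file (e.g. "512\n").
fn parse_block_size(contents: &str) -> Option<usize> {
    contents.trim().parse().ok()
}

/// Reads the logical block size of a whole block device such as "nvme0n1".
/// Hypothetical helper; this is the value `blockdev --getss` prints.
fn logical_block_size(device: &str) -> io::Result<usize> {
    let path = format!("/sys/block/{device}/queue/logical_block_size");
    let contents = fs::read_to_string(&path)?;
    parse_block_size(&contents).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, format!("unexpected contents in {path}"))
    })
}

fn main() {
    match logical_block_size("nvme0n1") {
        Ok(size) => println!("nvme0n1 logical block size: {size} bytes"),
        Err(e) => eprintln!("could not read block size: {e}"),
    }
}
```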

See the O_DIRECT notes near the bottom of the open(2) man page.
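The alignment rules discussed here can be expressed as a quick sanity check before submitting a read. This is a minimal sketch, not code from this project: the function name `o_direct_compatible` is hypothetical, and it applies the conservative rule from open(2) that the buffer address, the transfer length, and the file offset should all be multiples of the alignment (typically the logical block size, 512 bytes on this SSD).

```rust
/// Hypothetical check: does a read/write with this buffer address, length and
/// file offset meet the conservative O_DIRECT rule from open(2)? All three
/// must be multiples of `align` (typically the logical block size, e.g. 512).
fn o_direct_compatible(buf_addr: usize, len: usize, file_offset: u64, align: usize) -> bool {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    buf_addr % align == 0 && len % align == 0 && file_offset % align as u64 == 0
}

fn main() {
    let align = 512; // e.g. from `blockdev --getss`
    // Vec<u8> only guarantees 1-byte alignment, so this may print either value.
    let buf = vec![0u8; 4096];
    let ok = o_direct_compatible(buf.as_ptr() as usize, buf.len(), 0, align);
    println!("plain Vec<u8> satisfies O_DIRECT rules: {ok}");
}
```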

I should try using statx (with the STATX_DIOALIGN flag, available since Linux 6.1) to find the O_DIRECT support and alignment restrictions for each file.

JackKelly linked a pull request Feb 14, 2024 that will close this issue
JackKelly moved this from Todo to In Progress in light-speed-io Feb 14, 2024
@JackKelly
Owner Author

JackKelly commented Feb 14, 2024

Huh! Surprisingly, O_DIRECT seems to be slower on my Intel NUC! Some speeds:

  1. 560 MiB/s: O_DIRECT, and using vec.push(0) in a loop to init the AVec. The vec.push is slow.
  2. 886 MiB/s: O_DIRECT, but not initialising the AVec.
  3. 1011 MiB/s: Not using O_DIRECT. But still using (uninitialised) AVec.
  4. 1200 MiB/s: Not using O_DIRECT. Using a normal vec![].

UPDATE: These results are invalid, because my benchmarks weren't clearing the page cache before each run. See comments below.

I've run a flamegraph for the second scenario. It's in the associated pull request.

@JackKelly
Owner Author

Because this is so much slower (on my NUC), I'm not going to merge this PR.

But I should re-visit this when I get my new workstation with a PCIe gen5 SSD.

And maybe I should try some other ways to create a vector which is aligned to 512-byte boundaries:

JackKelly added a commit that referenced this issue Feb 14, 2024
Slow! Only about 560 MiB/s. See #51. Pausing work on this PR for now because it's so slow. I'll re-visit when I get my workstation with a PCIe gen 5 SSD!
@JackKelly
Owner Author

Hopefully O_DIRECT will come into its own when we're loading huge numbers of small chunks (O_DIRECT avoids readahead) and when we recycle buffers (because allocating aligned buffers seems to take a while!).
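Recycling buffers could be as simple as a free list. This is a hypothetical sketch (`BufferPool`, `get` and `put` are illustrative names, not this project's API): the potentially slow constructor, for example an aligned allocation, only runs when the pool is empty, so repeated reads reuse existing allocations instead of allocating each time.

```rust
/// Hypothetical free list of reusable buffers. `make` is only called when no
/// buffer is free, so a slow (e.g. aligned) allocation happens once per buffer
/// rather than once per read.
struct BufferPool<B> {
    free: Vec<B>,
    make: fn() -> B,
}

impl<B> BufferPool<B> {
    fn new(make: fn() -> B) -> Self {
        Self { free: Vec::new(), make }
    }

    /// Reuses a free buffer if there is one, otherwise allocates a new one.
    fn get(&mut self) -> B {
        let make = self.make;
        self.free.pop().unwrap_or_else(make)
    }

    /// Returns a buffer (contents untouched) so a later `get` can reuse it.
    fn put(&mut self, buf: B) {
        self.free.push(buf);
    }
}

fn main() {
    // Stand-in constructor; in practice this would allocate an aligned buffer.
    let mut pool: BufferPool<Vec<u8>> = BufferPool::new(|| vec![0u8; 4096]);
    let first = pool.get();
    let addr = first.as_ptr();
    pool.put(first);
    let second = pool.get();
    println!("reused same allocation: {}", second.as_ptr() == addr);
}
```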

JackKelly moved this from In Progress to Todo in light-speed-io Feb 21, 2024
@JackKelly
Owner Author

I think the high number of page faults might actually be due to copying data from the OS's page cache into the program's memory space, so O_DIRECT should help. I also need to re-run the benchmarks now that I've fixed them! #71

@JackKelly
Owner Author

I recently figured out that my benchmarks weren't correctly clearing the cache before every run. That invalidates my findings above that O_DIRECT is slow. So I should have another go with O_DIRECT!

JackKelly added a commit that referenced this issue Feb 28, 2024
uring: 100 MiB/s
local_file_system: 105 MiB/s

Hopefully uring will pull ahead when using O_DIRECT (#51)!

Closes #73
JackKelly added a commit that referenced this issue Feb 28, 2024
uring: 100 MiB/s
local_file_system: 105 MiB/s

Hopefully uring will pull ahead when we use O_DIRECT! (#51)

Closes #73
@JackKelly
Owner Author

JackKelly commented Feb 28, 2024

How to create an aligned buffer?

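In summary: I think I should create a new AlignedBuffer<const ALIGNMENT: usize>{vec: Vec<[u8; ALIGNMENT]>, len: usize} struct. Internally, allocate memory using Vec<[u8; ALIGNMENT]>::with_capacity(). impl Deref for AlignedBuffer to allow a view using std::slice::from_raw_parts. Implement a set_len_in_bytes method to set the length of the slice. (Caveat: [u8; ALIGNMENT] only has the 1-byte alignment of u8, so the element type would need to be a #[repr(align(N))] wrapper, and repr(align) only accepts an integer literal, not a const generic parameter.)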
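A sketch of the Vec-of-blocks plan with a fixed 512-byte alignment. Because a bare [u8; 512] is only 1-byte aligned, each block is wrapped in a #[repr(align(512))] struct, which fixes the alignment at compile time. `Block512`, `with_capacity_in_bytes` and `capacity_in_bytes` are hypothetical names; `set_len_in_bytes` follows the plan above. Unlike the plan, this sketch zero-initialises the blocks so that Deref never exposes uninitialised memory; skipping that would require unsafe code plus a guarantee that every byte up to `len` was written (e.g. by the kernel) before being read.

```rust
use std::ops::Deref;

/// One 512-byte block. `repr(align(512))` is what provides the alignment:
/// a bare `[u8; 512]` only has the 1-byte alignment of `u8`.
#[repr(align(512))]
#[derive(Clone, Copy)]
struct Block512([u8; 512]);

/// Hypothetical fixed-alignment buffer: a Vec of blocks plus a byte length.
struct AlignedBuffer {
    blocks: Vec<Block512>,
    len: usize, // number of valid bytes
}

impl AlignedBuffer {
    /// Allocates zeroed storage for at least `bytes` bytes, rounded up to whole blocks.
    fn with_capacity_in_bytes(bytes: usize) -> Self {
        let n_blocks = (bytes + 511) / 512;
        Self { blocks: vec![Block512([0; 512]); n_blocks], len: 0 }
    }

    fn capacity_in_bytes(&self) -> usize {
        self.blocks.len() * 512
    }

    /// Sets how many bytes are valid, e.g. after a read has filled the buffer.
    fn set_len_in_bytes(&mut self, len: usize) {
        assert!(len <= self.capacity_in_bytes(), "len exceeds capacity");
        self.len = len;
    }

    /// Pointer to hand to the kernel; 512-byte aligned because of `Block512`.
    fn as_mut_ptr(&mut self) -> *mut u8 {
        self.blocks.as_mut_ptr().cast::<u8>()
    }
}

impl Deref for AlignedBuffer {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        // SAFETY: the blocks are contiguous, initialised bytes, and len <= capacity.
        unsafe { std::slice::from_raw_parts(self.blocks.as_ptr().cast::<u8>(), self.len) }
    }
}

fn main() {
    let mut buf = AlignedBuffer::with_capacity_in_bytes(1000);
    let aligned = buf.as_mut_ptr() as usize % 512 == 0;
    buf.set_len_in_bytes(700);
    println!("aligned: {aligned}, capacity: {}, len: {}", buf.capacity_in_bytes(), buf.len());
}
```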

Alternatives:

  • Create a new AlignedBuffer struct which uses alloc::alloc to allocate memory. And uses std::slice::from_raw_parts to provide a view into the buffer. AlignedBuffer will take care of deallocating using alloc::dealloc. Similar to this answer. One disadvantage of this (compared to plan A) is that I need to manually handle what happens when allocation fails.
  • Allocate using alloc::alloc and then create a Vec with Vec::from_raw_parts(ptr: *mut u8, length: usize, capacity: usize). This won't deallocate correctly, because deallocation must use the same layout (size and alignment) as allocation, and a Vec<u8> will deallocate with u8's alignment of 1.
  • slice::align_to (callable on a Vec via Deref) should kind of work. But it's unsafe, it requires allocating more memory than we need, and I'd have to keep the Vec alive while using the middle (aligned) slice.
  • Creating a Vec<[u8; 512]>, dismantling it to raw parts, and then re-creating a Vec<u8> (like this answer) will also lead to UB, because the Vec<u8> won't deallocate correctly.
  • aligned_vec::AVec looked good. But I can't see how to create an uninitialised AVec. AVec doesn't have a set_len method.
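A safe variant of the align_to alternative: over-allocate a plain Vec<u8> by align - 1 bytes and use the standard pointer method align_offset to locate an aligned window inside it. This is a sketch (the helper `aligned_window` is hypothetical). It still wastes up to align - 1 bytes per buffer, but here the borrow checker, rather than discipline, keeps the backing Vec alive while the window is in use.

```rust
/// Hypothetical helper: returns a `len`-byte window of `storage` whose start
/// address is a multiple of `align`, or `None` if `storage` is too short.
fn aligned_window(storage: &mut [u8], len: usize, align: usize) -> Option<&mut [u8]> {
    // Bytes to skip until the first `align`-aligned address
    // (panics if `align` is not a power of two).
    let offset = storage.as_ptr().align_offset(align);
    let end = offset.checked_add(len)?;
    storage.get_mut(offset..end)
}

fn main() {
    let (align, len) = (512, 4096);
    // Over-allocate by `align - 1` bytes so an aligned window always fits.
    let mut storage = vec![0u8; len + align - 1];
    let window = aligned_window(&mut storage, len, align).expect("window should fit");
    window[0] = 1;
    println!("aligned: {}, len: {}", window.as_ptr() as usize % align == 0, window.len());
}
```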

JackKelly linked a pull request Feb 28, 2024 that will close this issue
@JackKelly
Owner Author

JackKelly commented Feb 28, 2024

On second thoughts, let's use alloc::alloc so that we can set the alignment at runtime (different filesystems may have different alignment requirements).
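A minimal sketch of an AlignedBuffer whose alignment is chosen at runtime, using std::alloc::alloc_zeroed with a Layout and freeing with std::alloc::dealloc in Drop. It is an illustration under assumptions, not the project's actual implementation. It zero-initialises the memory so safe code never reads uninitialised bytes, it stores the Layout so dealloc receives exactly the size and alignment used to allocate, and it handles allocation failure by calling std::alloc::handle_alloc_error (the standard out-of-memory path), which covers the allocation-failure concern raised for this approach.

```rust
use std::alloc::{self, Layout};
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;

/// Byte buffer whose start address is aligned to a runtime-chosen alignment.
struct AlignedBuffer {
    ptr: NonNull<u8>,
    layout: Layout, // kept so Drop can call `dealloc` with the exact same layout
}

impl AlignedBuffer {
    /// Allocates `len` zeroed bytes aligned to `align`.
    /// Panics if `align` is not a power of two or `len` is zero or too large.
    fn new(len: usize, align: usize) -> Self {
        let layout = Layout::from_size_align(len, align).expect("invalid size/alignment");
        assert!(layout.size() > 0, "zero-sized buffers are not supported");
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc::alloc_zeroed(layout) };
        // A null pointer means allocation failed; use the standard OOM handler.
        let ptr = NonNull::new(raw).unwrap_or_else(|| alloc::handle_alloc_error(layout));
        Self { ptr, layout }
    }
}

// SAFETY: the buffer uniquely owns its allocation (like Box<[u8]>).
unsafe impl Send for AlignedBuffer {}

impl Deref for AlignedBuffer {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        // SAFETY: `ptr` points to `layout.size()` initialised bytes owned by `self`.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl DerefMut for AlignedBuffer {
    fn deref_mut(&mut self) -> &mut [u8] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuffer {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from the global allocator with exactly this layout.
        unsafe { alloc::dealloc(self.ptr.as_ptr(), self.layout) }
    }
}

fn main() {
    let align = 512; // chosen at runtime, e.g. from `blockdev --getss`
    let mut buf = AlignedBuffer::new(4096, align);
    buf[0] = 42;
    println!("aligned: {}, len: {}, first byte: {}", buf.as_ptr() as usize % align == 0, buf.len(), buf[0]);
}
```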

@JackKelly
Owner Author

See fio results in this comment for evidence that O_DIRECT appears pretty important to achieve full speed.

JackKelly moved this from Todo to In Progress in light-speed-io Mar 11, 2024
@JackKelly
Owner Author

Hmm, so, I've implemented AlignedBuffer. It passes unit tests for AlignedBuffer. But, for some reason, test_get_with_io_uring_local doesn't pass!

@JackKelly
Owner Author

JackKelly commented Mar 12, 2024

Still TODO:

EDIT: Moved to top of this thread

@JackKelly
Owner Author

Yay! Using O_DIRECT has sped things up!


Projects
Status: Done

1 participant