[Bug] - Md bitmap writes to unallocated pages #797

Open
meir6264 opened this issue Sep 19, 2024 · 3 comments


A bug was discovered in the md driver: a bitmap write exceeded the page limit.
The fix has already been approved and merged upstream via md-6.11 (commit ab99a87542f194f28e2364a42afbf9fb48b1c724).
Can you please backport it to Amazon Linux 2023 as soon as possible? This bug may cause a crash.

Patch Description:
__write_sb_page() rounds up the IO size to the optimal IO size if it
doesn't exceed the data offset, but it doesn't check whether the final
size exceeds the bitmap length.

For example:
page count - 1
page size - 4K
data offset - 1M
optimal io size - 256K

The final IO size would be 256K (64 pages), but md_bitmap_storage_alloc()
allocated only 1 page, so the IO would write 1 valid page and 63 pages that
happen to be allocated after it. This leaks memory to the raid device
superblock.
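
For reference, the upstream fix clamps the rounded-up size so the write can never extend past the bitmap's last allocated page. A minimal sketch of the idea (paraphrased from commit ab99a87542f1, not the verbatim diff; the variable names follow the ones used inside `__write_sb_page()`):

```c
/*
 * Sketch of the fix in __write_sb_page() (drivers/md/md-bitmap.c),
 * paraphrased from commit ab99a87542f1; the verbatim patch may differ.
 */
unsigned int bitmap_limit = (bitmap->storage.file_pages - pg_index) << PAGE_SHIFT;

/* size may have been rounded up to the device's optimal IO size;
 * never let the write extend past the last allocated bitmap page.
 */
size = min_t(unsigned int, size, bitmap_limit);
```

With the example above (file_pages = 1, pg_index = 0), this clamps the 256K IO back down to the single 4K bitmap page.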

This issue caused a data transfer failure in nvme-tcp. The network
driver checks the first page of an IO with sendpage_ok(), which returns
true if the page isn't a slab page and its refcount is >= 1. If the first
page is !sendpage_ok(), the network driver disables MSG_SPLICE_PAGES.

As of now the network layer assumes all the pages of the IO are
sendpage_ok() when MSG_SPLICE_PAGES is on.
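
For reference, sendpage_ok() is a small helper in include/linux/net.h that encodes exactly the two conditions above:

```c
/* include/linux/net.h */
static inline bool sendpage_ok(struct page *page)
{
	return !PageSlab(page) && page_count(page) >= 1;
}
```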

The bitmap pages aren't slab pages, so the first page of the IO is
sendpage_ok(), but the additional pages that happen to be allocated
after the bitmap pages might be !sendpage_ok(). That causes
skb_splice_from_iter() to stop the data transfer; in the case below it
hangs 'mdadm --create'.
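
That is the WARNING visible in the log below: unlike the single-page check done when the IO is submitted, skb_splice_from_iter() re-checks every spliced page and aborts on the first failure. Roughly (a sketch of the relevant check, not a verbatim quote of net/core/skbuff.c):

```c
/* Inside skb_splice_from_iter() (net/core/skbuff.c): each extracted
 * page is re-checked, so a stray page swept in by the oversized bitmap
 * write trips the warning and aborts the transfer. Sketch only.
 */
if (WARN_ON_ONCE(!sendpage_ok(page))) {
	ret = -EIO;
	goto out;
}
```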

The bug is reproducible; to reproduce it you need nvme-over-tcp
controllers with an optimal IO size bigger than PAGE_SIZE. Creating a raid
with a bitmap over those devices reproduces the bug.

To simulate a large optimal IO size you can use dm-stripe with a
single device.
A script that reproduces the issue on top of brd devices using dm-stripe is
attached below (it will be added to blktests).

I have added some logs to test the theory:
...
md: created bitmap (1 pages) for device md127
__write_sb_page before md_super_write offset: 16, size: 262144. pfn: 0x53ee
=== __write_sb_page before md_super_write. logging pages ===
pfn: 0x53ee, slab: 0 <-- the only page allocated for the bitmap
pfn: 0x53ef, slab: 1
pfn: 0x53f0, slab: 0
pfn: 0x53f1, slab: 0
pfn: 0x53f2, slab: 0
pfn: 0x53f3, slab: 1
...
nvme_tcp: sendpage_ok - pfn: 0x53ee, len: 262144, offset: 0
skbuff: before sendpage_ok() - pfn: 0x53ee
skbuff: before sendpage_ok() - pfn: 0x53ef
WARNING at net/core/skbuff.c:6848 skb_splice_from_iter+0x142/0x450
skbuff: !sendpage_ok - pfn: 0x53ef. is_slab: 1, page_count: 1


bjoernd commented Oct 4, 2024

I tried running the reproducer from that upstream commit in AL2023. I am unable to reproduce any warning or bug. Have you actually experienced this issue in AL2023 with kernel 6.1?


meir6264 commented Oct 6, 2024

Hi, I'm using 6.1.109-118.189.amzn2023 and I've noticed that it's missing this important fix.
I've seen some problems that might be caused by the writes to these unallocated pages (although I haven't proven it yet). Is there any reason not to patch it?


bjoernd commented Oct 6, 2024

Sure, I was just inquiring about the urgency of the issue to help our kernel team prioritize it against other bug reports.
