-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hook for updating the canonical address #3787
base: master
Are you sure you want to change the base?
Conversation
I'm sorry I am a bit confused here, could you explain the problem this is trying to solve? In Rust it is generally not correct to have an allocation be accessed via two different addresses. So "only use alias once set" doesn't work; if an alias is to be used, there must not have been any accesses to the original address ever. The existing t-opsem discussions for pointer tagging schemes concluded that these schemes are fine if we can just pretend that the full pointer with the tag is the actual address that the allocation is located at. However, accessing the same allocation with pointers that use different tags is equivalent to accessing physical memory that is mapped into the virtual address space multiple times, and that's something Rust doesn't really support currently (and AFAIK neither doses LLVM). |
Maybe This seems to me like something that needs opsem discussions, so maybe bring this up on Zulip or at https://github.com/rust-lang/unsafe-code-guidelines/ ? |
It's currently not possible to run code which modifies the top byte of a pointer (whether via TBI or MTE) under Miri, as Miri will consider the pointer to be dangling on dereference. Given that rustc itself does not currently have any issues (in practice) with this, and the only issues arise when running under Miri, the intent is to allow for updating the address Miri considers an allocation to live at.
I asked on Zulip for advice regarding adding TBI accessors and MTE intrinsics, and was referred to discussions which - to my reading - concluded what you describe.
Yes, that would be the intent, I'm just not sure how best to handle this as
Could you elaborate on what you mean by this, sorry? The code that this would enable to run under Miri currently already runs under Rust, and MTE has, at least, been generally supported by LLVM for some time now; in the case of TBI, the only issues I'm aware of are that some operations (such as But, in short, I would be happy to replace the current approach which uses a single, mutable, alias with actually changing the address that the allocation is considered to live at, as you describe - but advice on where such a change needs to take place would be appreciated, as it's not clear to me. |
I assume this is UB in rustc as well, you just happen to not hit any optimizations that would cause problems. What is the smallest example demonstrating the unexpected UB?
I mean that the code has UB.
A lot of code has UB but still "works", until it doesn't.
It's not clear to me either as I know absolutely nothing about how pointer tags are set up in the (non-Miri) code.^^ |
So, in general, anything that looks like the following is UB: // let pointer be some pointer to an allocation
let addr = pointer.addr() & 0x00ffffffffffffff;
let p = addr | ((0x70 as usize) << 56);
let pointer = pointer.with_addr(p);
// dereferencing `pointer` will now always trip Miri. Of course, generally, we're happy to accept that this should be considered UB, but we were therefore hoping that by providing a Rust-controlled mechanism to do this instead (rust-lang/stdarch#1622), the UB could be mitigated as Rust/Miri can track the address change. That's the intent of this work, as Miri raises an error when dereferencing a pointer from Put another way, the |
Can you provide a small program with a |
LLVM tracks allocations, so this definitely needs changes in rustc and maybe LLVM to make them aware of the |
To give one example, if you do something like this fn tag_ptr(ptr: *const i32, tag: u8) -> *const i32 {
ptr.map_addr(|a| {
(a & 0x00ffffffffffffff) | ((tag as usize) << 56)
})
}
let my_local_var = 42;
tag_ptr(&my_local_var, 0x70).read();
tag_ptr(&my_local_var, 0x80).read(); then LLVM might be able to see that this pointer points to 4 bytes of memory and it has been offset in two different ways that are way more than 4 bytes apart, so one of these reads must be out-of-bounds, and therefore this program has UB. I don't know if LLVM has any optimizations today that may actually detect this (@nikic and @comex might know), but it would totally be in its right to start doing analyses like that any time, so we can't just say "yeah let's accept such code and hope nothing goes wrong". |
Here's an example where LLVM seems more likely to notice and cause trouble: fn tag_ptr(ptr: *mut i32, tag: u8) -> *mut i32 {
ptr.map_addr(|a| {
(a & 0x00ffffffffffffff) | ((tag as usize) << 56)
})
}
let mut my_local_var = 0;
tag_ptr(&mut my_local_var, 0x70).write(1);
tag_ptr(&mut my_local_var, 0x80).write(2);
assert_eq!(
tag_ptr(&mut my_local_var, 0x70).read(),
2
); From LLVMs perspective it should not be too hard to notice that the two writes cannot alias, and therefore the final read must return what was written by the first write. |
☔ The latest upstream changes (presumably #3854) made this pull request unmergeable. Please resolve the merge conflicts. |
Where a method modifies the address that should be considered canonical - such as via TBI or when an MTE tag has been set - Miri will need to be notified. This adds a hook to inform Miri of the new address.
Hi, in line with the changes to the TBIBox helper and discussions on Zulip here's a smaller version of the address hook, mainly intended for use with the aforementioned TBIBox. |
Thanks! I'll wait for the discussion about the library surface and "real rustc" implementation for this to settle before taking a closer look at this PR. (And sadly I don't have the capacity to be involved in the design there, sorry.) TbiBox should probably be using some new Miri primitives to properly model this all. |
Which way around should this be done then? Since the TBIBox will require this hook before it gets merged to avoid breaking miri I was thinking we should focus on merging this first. Should we then finalise stdarch#1622 first, then finalise & merge this and then add the hooks to TBIBox and merge that? |
Which existing code breaks if that stdarch PR lands? It's new code, it's okay if Miri does not support it from the start. First a design needs to be proposed for how TbiBox's API looks and how it is implemented. Once t-libs-api and t-libs are happy with that, we can look at how it should be supported in Miri. (t-opsem should likely also be involved there, due to the nature of this type.) |
That approach makes sense to me as well, fair enough, thanks! |
Where a method modifies the address that should be considered canonical - such as via TBI or when an MTE tag has been set - Miri will need to be notified.
This adds a hook to inform Miri of the new address.
The implementation used here permits a single (mutable) alias, with the intent that user code will only use the alias (once set) rather than the original
base_addr
. I'm open to feedback on this approach, but ran into issues when trying to update the actualbase_addr
when popping the function off the stack (and thus deallocating anything created in the function), hence this implementation which allows that to work correctly.In particular, this would allow rust-lang/stdarch#1622 to work under Miri, as it could use this hook to ensure that Miri is aware that the change to the pointer address is actually fine in context.