The purpose of this document is to describe the interface between a MiniRust program and memory.
The interface shown below already makes several key decisions. It is not intended to support every imaginable memory model, but rather to start the process of reducing the design space of what we consider a "reasonable" memory model for Rust. For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well, but you have to read the standard very carefully, and consult defect report responses, to see this). Another key property of the interface presented below is that it is untyped. This implies that in MiniRust, operations are typed, but memory is not: a key difference from C and C++ with their type-based strict aliasing rules.
One key question a memory model has to answer is what a pointer is. It might seem like the answer is just "an integer of appropriate size", but that is not the case (as more and more discussion shows). This becomes even more prominent with aliasing models such as Stacked Borrows. The memory model hence takes the stance that a pointer consists of the address (which truly is just an integer of appropriate size) and a provenance. What exactly provenance is remains up to the memory model. As far as the interface is concerned, it is some opaque extra data that we carry around with our pointers and that restricts which pointers may be used for which operations, and when.
The unit of communication between the memory model and the rest of the program is a byte.
To distinguish our MiniRust bytes from `u8`, we will call them "abstract bytes".
Abstract bytes differ from `u8` in order to represent uninitialized memory and to maintain pointer provenance when pointers are stored in memory.
We define the `AbstractByte` type as follows, where `Provenance` will later be instantiated with the `Memory::Provenance` associated type.
pub enum AbstractByte<Provenance> {
    /// An uninitialized byte.
    Uninit,
    /// An initialized byte, optionally with some provenance (if it is encoding a pointer).
    Init(u8, Option<Provenance>),
}
impl<Provenance> AbstractByte<Provenance> {
    pub fn data(self) -> Option<u8> {
        match self {
            AbstractByte::Uninit => None,
            AbstractByte::Init(data, _) => Some(data),
        }
    }
    pub fn provenance(self) -> Option<Provenance> {
        match self {
            AbstractByte::Uninit => None,
            AbstractByte::Init(_, provenance) => provenance,
        }
    }
}
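As an illustration of how abstract bytes behave, the following is a minimal sketch (not part of the MiniRust specification) that encodes a `u32` into abstract bytes and decodes it again. The names `AllocId`, `encode_u32`, and `decode_u32` are hypothetical, the byte order is fixed to little-endian for simplicity, and the decoder simply ignores provenance, which is an assumption of this sketch rather than a statement about MiniRust. The `AbstractByte` definition is repeated so that the block is self-contained.

```rust
/// Hypothetical stand-in for a provenance type (not defined by the interface).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AllocId(pub u32);

/// Same shape as the `AbstractByte` type above, repeated for self-containment.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AbstractByte<Provenance> {
    Uninit,
    Init(u8, Option<Provenance>),
}

impl<Provenance> AbstractByte<Provenance> {
    pub fn data(self) -> Option<u8> {
        match self {
            AbstractByte::Uninit => None,
            AbstractByte::Init(data, _) => Some(data),
        }
    }
}

/// Encode a `u32` as four initialized bytes without provenance.
/// Assumption of this sketch: little-endian byte order.
pub fn encode_u32<P>(value: u32) -> Vec<AbstractByte<P>> {
    value
        .to_le_bytes()
        .iter()
        .map(|&b| AbstractByte::Init(b, None))
        .collect()
}

/// Decode four abstract bytes into a `u32`.
/// Returns `None` if the length is wrong or any byte is uninitialized.
/// Assumption of this sketch: provenance on the bytes is ignored.
pub fn decode_u32<P: Copy>(bytes: &[AbstractByte<P>]) -> Option<u32> {
    if bytes.len() != 4 {
        return None;
    }
    let mut raw = [0u8; 4];
    for (slot, &byte) in raw.iter_mut().zip(bytes) {
        *slot = byte.data()?;
    }
    Some(u32::from_le_bytes(raw))
}
```

A single `Uninit` byte is enough to make decoding fail here, which is exactly the distinction a plain `u8` could not express.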
The MiniRust memory interface is described by the following (not-yet-complete) trait definition:
/// An "address" is a location in memory. This corresponds to the actual
/// location in the real program.
/// We make it a mathematical integer, but of course it is bounded by the size
/// of the address space.
pub type Address = Int;
/// A "pointer" is an address together with its Provenance.
/// Provenance can be absent; those pointers are
/// invalid for all non-zero-sized accesses.
pub struct Pointer<Provenance> {
    pub addr: Address,
    pub provenance: Option<Provenance>,
}
/// *Note*: All memory operations can be non-deterministic, which means that
/// executing the same operation on the same memory can have different results.
/// We also let read operations potentially mutate memory (they actually can
/// change the current state in concurrent memory models and in Stacked Borrows).
pub trait Memory {
    /// The type of pointer provenance.
    type Provenance;
    /// The size and align of a pointer.
    const PTR_SIZE: Size;
    const PTR_ALIGN: Align;
    /// The endianness used for encoding multi-byte integer values (and pointers).
    const ENDIANNESS: Endianness;
    /// Maximum size of an atomic operation.
    const MAX_ATOMIC_SIZE: Size;
    fn new() -> Self;
    /// Create a new allocation.
    /// The initial contents of the allocation are `AbstractByte::Uninit`.
    fn allocate(&mut self, size: Size, align: Align) -> NdResult<Pointer<Self::Provenance>>;
    /// Remove an allocation.
    fn deallocate(&mut self, ptr: Pointer<Self::Provenance>, size: Size, align: Align) -> Result;
    /// Write some bytes to memory.
    fn store(&mut self, ptr: Pointer<Self::Provenance>, bytes: List<AbstractByte<Self::Provenance>>, align: Align) -> Result;
    /// Read some bytes from memory.
    fn load(&mut self, ptr: Pointer<Self::Provenance>, len: Size, align: Align) -> Result<List<AbstractByte<Self::Provenance>>>;
    /// Test whether the given pointer is dereferenceable for the given size and alignment.
    /// Raises UB if that is not the case.
    /// Note that a successful read/write/deallocate implies that the pointer
    /// was dereferenceable before that operation (but not vice versa).
    fn dereferenceable(&self, ptr: Pointer<Self::Provenance>, size: Size, align: Align) -> Result;
    /// Retag the given pointer, which has the given type.
    /// `fn_entry` indicates whether this is one of the special retags that happen
    /// right at the top of each function.
    /// FIXME: Referencing `PtrType` here feels like a layering violation, but OTOH
    /// also seems better than just outright duplicating that type.
    ///
    /// Return the retagged pointer.
    fn retag_ptr(&mut self, ptr: Pointer<Self::Provenance>, ptr_type: PtrType, fn_entry: bool) -> Result<Pointer<Self::Provenance>>;
    /// Checks that `size` is not too large for the Memory.
    fn valid_size(size: Size) -> bool;
}
This is a very basic memory interface that is incomplete in at least the following ways:
- To represent concurrency, many operations need to take a "thread ID" and `load` and `store` need to take an `Option<Ordering>` (with `None` indicating non-atomic accesses).
- Maybe we want operations that can compare pointers without casting them to integers. Or else we decide only the address can matter for comparison.
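To make the shape of the interface more concrete, here is a deliberately simplified toy memory. It is a sketch under several assumptions and not an implementation of the `Memory` trait: provenance is taken to be the index of an allocation (`AllocId`), sizes and alignments are plain `u64` values, allocation is deterministic, and Undefined Behavior is modeled as `Err` with a message. All names (`ToyMemory`, `AllocId`, `UbResult`, `check`) are hypothetical, and `deallocate` omits the alignment argument for brevity.

```rust
/// Hypothetical provenance for this sketch: the index of the allocation.
pub type AllocId = usize;
/// In this sketch, an `Err` stands in for Undefined Behavior.
pub type UbResult<T> = Result<T, &'static str>;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AbstractByte {
    Uninit,
    Init(u8, Option<AllocId>),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pointer {
    pub addr: u64,
    pub provenance: Option<AllocId>,
}

struct Allocation {
    base: u64,
    bytes: Vec<AbstractByte>,
    live: bool,
}

pub struct ToyMemory {
    allocations: Vec<Allocation>,
    next_addr: u64,
}

impl ToyMemory {
    pub fn new() -> Self {
        // Start above zero so that no allocation sits at address 0.
        ToyMemory { allocations: Vec::new(), next_addr: 16 }
    }

    /// Create an allocation whose bytes are all `Uninit`.
    /// `align` must be a power of two.
    pub fn allocate(&mut self, size: u64, align: u64) -> Pointer {
        let base = (self.next_addr + align - 1) & !(align - 1);
        self.next_addr = base + size;
        let bytes = vec![AbstractByte::Uninit; size as usize];
        self.allocations.push(Allocation { base, bytes, live: true });
        Pointer { addr: base, provenance: Some(self.allocations.len() - 1) }
    }

    /// Returns `Some((allocation, offset))` for a valid access and `None` for
    /// a zero-sized access, which needs no provenance in this sketch.
    fn check(&self, ptr: Pointer, len: u64, align: u64) -> UbResult<Option<(AllocId, usize)>> {
        if ptr.addr % align != 0 {
            return Err("unaligned access");
        }
        if len == 0 {
            return Ok(None);
        }
        let id = ptr.provenance.ok_or("pointer without provenance")?;
        let alloc = self.allocations.get(id).ok_or("unknown allocation")?;
        if !alloc.live {
            return Err("use after free");
        }
        let size = alloc.bytes.len() as u64;
        let offset = ptr.addr.checked_sub(alloc.base).ok_or("out-of-bounds access")?;
        if offset > size || len > size - offset {
            return Err("out-of-bounds access");
        }
        Ok(Some((id, offset as usize)))
    }

    pub fn store(&mut self, ptr: Pointer, bytes: &[AbstractByte], align: u64) -> UbResult<()> {
        if let Some((id, start)) = self.check(ptr, bytes.len() as u64, align)? {
            self.allocations[id].bytes[start..start + bytes.len()].copy_from_slice(bytes);
        }
        Ok(())
    }

    pub fn load(&mut self, ptr: Pointer, len: u64, align: u64) -> UbResult<Vec<AbstractByte>> {
        match self.check(ptr, len, align)? {
            Some((id, start)) => Ok(self.allocations[id].bytes[start..start + len as usize].to_vec()),
            None => Ok(Vec::new()),
        }
    }

    pub fn deallocate(&mut self, ptr: Pointer, size: u64) -> UbResult<()> {
        let id = ptr.provenance.ok_or("pointer without provenance")?;
        let alloc = self.allocations.get_mut(id).ok_or("unknown allocation")?;
        if !alloc.live || alloc.base != ptr.addr || alloc.bytes.len() as u64 != size {
            return Err("invalid deallocation");
        }
        alloc.live = false;
        Ok(())
    }
}
```

Note how a pointer with the right address but no provenance is rejected for any non-zero-sized access: the address alone does not grant permission to touch memory.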
impl<Provenance> Pointer<Provenance> {
    /// Calculates the offset from a pointer in bytes using wrapping arithmetic.
    /// This does not check whether the pointer is still in-bounds of its allocation.
    pub fn wrapping_offset<M: Memory<Provenance=Provenance>>(self, offset: Int) -> Self {
        let offset = offset.modulo(Signed, M::PTR_SIZE);
        let addr = self.addr + offset;
        let addr = addr.modulo(Unsigned, M::PTR_SIZE);
        Pointer { addr, ..self }
    }
}
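The wrapping arithmetic can be illustrated on plain integers. The sketch below assumes that `modulo(Signed, size)` reduces a value into the signed range of the given width and that `modulo(Unsigned, size)` reduces it into the unsigned range; this reading is an assumption, not something the interface spells out here. The helper names `modulo_unsigned` and `modulo_signed` and the `ptr_bits` parameter are hypothetical, `i128` stands in for the mathematical `Int`, and `ptr_bits` is assumed to be between 1 and 64.

```rust
/// Reduce `x` into the unsigned range `0 .. 2^bits`.
fn modulo_unsigned(x: i128, bits: u32) -> i128 {
    x.rem_euclid(1i128 << bits)
}

/// Reduce `x` into the signed range `-2^(bits-1) .. 2^(bits-1)`.
fn modulo_signed(x: i128, bits: u32) -> i128 {
    let m = modulo_unsigned(x, bits);
    if m >= (1i128 << (bits - 1)) {
        m - (1i128 << bits)
    } else {
        m
    }
}

/// Address computation mirroring `wrapping_offset`, on bare addresses.
/// Provenance is left out because the offset does not change it.
pub fn wrapping_offset(addr: i128, offset: i128, ptr_bits: u32) -> i128 {
    let offset = modulo_signed(offset, ptr_bits);
    modulo_unsigned(addr + offset, ptr_bits)
}
```

With 64-bit pointers, offsetting address `0` by `-1` wraps around to `u64::MAX`, and offsetting `u64::MAX` by `1` wraps back to `0`; in neither case is any bounds check performed.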