- Feature Name:
references
- Start Date: 2023-08-17
- RFC PR: FuelLabs/sway-rfcs#28
- Sway Issue: FueLabs/sway#5063
This RFC aims to make the paradigm for handling reference types clear, fully typed and to eliminate redundant concepts.
There currently exists a kludge of features on top of pointers that have various levels of type correctness, offer various paradigms and have less than coherent syntax.
We want to clarify the rules for passing dynamic data to and from functions, how
they are expressed in the type system and through syntax, and eliminate untyped
values such as raw_ptr
and raw_slice
.
Unlike Rust, Sway does not manage lifetimes. Due to the nature of smart contract
execution, dynamic memory usage only grows and allocated memory is allocated for
the entire lifetime of the execution. In Rust terms, every dynamic value has a
lifetime of 'static
.
This allows us to do away with some complexity and handle values more like they would be in Java or ML. In most cases, you shouldn't need to use explicit pointer types.
In Sway there are two types of data types. Built-in types and compound types.
Built-in types include u8
, u16
, u32
, u64
, and pointers such as *mut u8
.
Compound types are arrays, structs, tuples, and enums.
Both built-in and compound types live for the lifetime of the function they are declared in. They are passed and returned by value, which means their value is directly copied.
Internally, access to compound types (also called aggregates within the compiler) is realized via pointer to a memory region. That's why those types are internally called reference types. Note that from the Sway programmer's perspective, they still have value semantics.
There are two kinds of such internal references. Slim and fat.
Slim references are implemented using a single address. Slim references
include structs, enums, and arrays, such as String
, Option
, [u8; 42]
.
Fat references are implemented using an address and a small amount of additional
data, either a length or a pointer to a more complex table. Fat references
include slices, such as [u8]
, str
.
References mentioned above are an internal compiler construct, but from the programmer's perspective still regular value types, copied in assignments, and passed and returned by value.
From the programmer's perspective, it is sometimes desirable to have a reference to a value type, such as for passing a mutable reference to value to a function or declaring a recursive data type.
This can be achieved with the new language construct, references.
References to values can be obtained by the reference operator &. References have their own mutability and can point to mutable or immutable values. Thus, we support mutable references to mutable values and any combination of the two.
The below example shows how a reference can be declared.
let mut m_i = 0u64;
let r_m_i = &mut m_i; // Immutable reference to a mutable `m_i`.
let mut m_r_m_i = &mut m_i; // Mutable reference to a mutable `m_i`.
let r_m_i = &m_i; // Immutable reference to an immutable (via reference) `m_i`.
let mut r_m_i = &m_i; // Mutable reference to an immutable (via reference) `m_i`.
let i = 0u64;
let r_m_i = &mut i; // ERROR: `i` is not mutable.
The dereference operator * is used to access the referenced value. In addition, if the reference refers to a compound type, the operators . and [] automatically dereference.
let mut r_i = &i; // Mutable reference to immutable a `u64`: r_i: &u64.
let mut r_m_i = &mut m_i; // Mutable reference to a mutable `u64`: r_i: &mut u64.
*r_i = 1; // ERROR: Referenced value is not mutable.
r_1 = &x; // OK: `r_1` is mutable.
*r_m_i = 1; // OK: Changes `m_i`.
r_m_i = &x; // OK: `r_m_i` is mutable.
// Accessing built-in types and enums over reference via dereferencing operator (*).
let a = 2 * *r_i;
let mut s = Struct { x: 0u64 };
let r_s = &mut s; // `r_s` is immutable reference to a mutable struct.
r_s.x = 1; // Same as `(*r_s).x = 1`.
Here, we are listing the major properties of references:
- References have their own mutability and can point to mutable or immutable values.
- & operator defines the reference. * operator is dereferencing.
- . and [] operators also dereference if the reference is a reference to a struct/tuple or array, respectively.
- References can be parts of aggregates.
- References can reference other references.
- References can be used in pattern matching and deconstructing.
- References can reference parts of aggregates. E.g., having a reference to an array element.
- References can be passed to and returned from functions.
- References will play well with iterators and the
for
loop, once implemented. - References can be used together with generics.
- References can be taken from arbitrary expressions, including constants and literals.
- References can be used in type aliases.
self
keyword will become a reference, and us such comply to the reference passing syntax.- References cannot be used in storage.
- References cannot be used in ABIs and
main
functions. - Equality of references is the equality of the values they refer to if the underlying type implements
std::ops::Eq
. __addr_of
called on a reference returns the address the reference points to.
For detailed examples of syntax and semantics see the accompanied file 0010-references.sw.
There are cases where one may want to directly manipulate memory addresses with pointer arithmetic.
To allow for this we have a pointer type that represents a single address: *mut T
where T
is the type being pointed to.
Pointers can be obtained by using the __addr_of
intrinsic on a reference and dereferenced using the __deref
intrinsic, like so:
let val: u64 = 1;
let ptr: *mut u64 = __addr_of(&val);
let ptr_val: u64 = __deref(ptr);
assert_eq(val, ptr_val);
let ref: &u64 = &1;
let ptr: *mut &u64 = __addr_of(&ref);
let ptr_val: &u64 = __deref(ptr);
assert_eq(ref, ptr_val); // Equality of references is the equality of referenced values.
let ref: &u64 = &1;
let ptr: *mut u64 = __addr_of(ref); // Returns the address od the referenced value.
let ptr_val: u64 = __deref(ptr);
assert_eq(ref, &ptr_val); // Equality of references is the equality of referenced values.
A pointer that does not point to memory currently allocated by the program is considered to be an invalid pointer. Dereferencing an invalid pointer is an undefined behavior that depends on the underlying VM implementation.
E.g., FuelVM has the concept of stack and heap, and instructions that access memory must pass the ownership check. Thus, dereferencing and accessing an invalid pointer in case of the FuelVM will result in VM panic.
Other VMs could have different behavior. Thus, from the language perspective, dereferencing an invalid pointer is an undefined behavior.
Slices represent contiguous areas of dynamic memory. They can be used to represent dynamically sized data. Slices are represented by a pair containing a pointer to the data and a length.
String slices, of type str
represent a series of bytes, encoding a valid UTF-8 string.
This is the type returned by string literals.
let _: str = "Lorem Ipsum";
Typed slices, of type [T]
, represent contiguous series of elements of a type T
.
let _: [u64] = [1, 2, 3].as_slice();
Slices can be obtained from arrays and other slices by using the __slice
intrinsic. This will produce a smaller slice of the argument type at the
specified indices. This slicing is bounds-checked and will produce a revert if
slicing out of bounds.
let array: [u64; 4] = [1, 2, 3, 4];
// will produce 1, 2, 3, 4
let slice: [u64] = __slice(array, 0, 4);
// will produce 2, 3
let slice: [u64] = __slice(array, 1, 3);
// will revert
// let slice: [u64] = __slice(array, 0, 5);
Elements of slices can be obtained using the __slice_elem
intrinsic.
This will return the element at the given index. Invalid indices will produce a revert.
let slice: [u64] = [1, 2, 3, 4].as_slice();
let elem = __slice_elem(slice, 2);
assert_eq(elem, 3);
// will revert
// let elem = __slice_elem(slice, 4);
When values are passed as arguments to a function they are either mutable or
immutable which is denoted by the mut
prefix before the argument name.
Mutable value type arguments can be reassigned.
Mutable reference arguments can both be reassigned and have their fields assigned to, depending on the declaration of the reference.
fn foo(
ref: &u32, // Immutable reference to immutable value.
mut m_ref: &u32, // Mutable reference to immutable value.
mut m_ref_m: &mut u32, // Mutable reference to a mutable value.
val: u32 // Immutable value passed by-value.
mut m_val: u32, // Mutable value passed by-value.
) {
*ref = 0; // ERROR: The referenced value is immutable.
ref = &0; // ERROR: The reference `ref` is immutable.
*m_ref = 0; // ERROR: The referenced value is immutable.
m_ref = &0; // OK: The reference `m_ref` is mutable.
*m_ref_m = 0; // OK: The referenced value is mutable.
m_ref_m = &0; // OK: The reference `m_ref_m` is mutable.
val = 0; // ERROR: `val` is immutable.
m_val = 0; // OK: `m_val` is mutable.
}
For more detailed examples see the section Passing and returning references from functions in the accompanied file 0010-references.sw.
When returning values from functions, copies are returned, as per the by-value semantics of all Sway types.
When returning references, the mutability of the referenced value can be explicitly specified.
fn foo() -> &u32 { // Returns a reference to an immutable `u32`.
&0
}
fn bar() -> &mut u32 { // Returns a reference to a mutable `u32`.
&0
}
pub fn main() {
let r_foo = foo();
*r_foo = 1; // ERROR: `r_foo` is a reference to immutable value.
let r_bar = bar();
*r_bar = 1; // OK: `r_bar` is a reference to mutable value.
let mut m_r_foo = foo();
m_r_foo = &1; // OK: `m_r_foo` is a mutable reference to immutable value.
let mut r_bar = bar();
m_r_bar = &1; // OK: `m_r_bar` is a mutable reference to mutable value.
}
For more detailed examples see the section Passing and returning references from functions in the accompanied file 0010-references.sw.
Methods definitions use self
to refer to the value the method is being called
on, this follows the same rules defined for functions, except of course self
can't be reassigned.
struct S {
a: u32,
}
impl S {
pub fn foo(&self) {
// illegal since self is immutable
// self.a = 0;
}
pub fn foo(&mut self) {
// legal since self is mutable
self.a = 0;
}
}
For more detailed examples see the section self
keyword in the accompanied file 0010-references.sw.
Existing codebases can adapt to this change by changing every occurrence of ref mut
to &mut
and every use of raw_slice
and raw_ptr
to uses of references, typed
pointer and slice types.
In general, switching untyped pointers to a properly typed reference, even a generic
&T
, is preferred. Pointers for whom the pointee isn't a value that can be
named may need to use &()
or *mut ()
, the former is preferred.
Implementing the new slice types should require:
- parser changes to introduce the new syntax
- adding the new types to the type system (this is already partially done)
- implementing the
__slice
and__slice_elem
intrinsics
The slicing intrinsics should be thoroughly tested (specifically on the bounds checking).
There are considerations to be given to the element sizes. We'll want to finish
current changes to the memory representation of small values so that [u8]
represents a continuous series of bytes without padding or provide this capability in other ways.
One thing to keep in mind is that slices' dynamic size makes them difficult to deal with for the current ABI and they can't be returned as element of other dynamic types. This is to be addressed by a later RFC.
String literals are meant to return str
and string array types are meant to
be totally replaced by str
, however due to this ABI issue we'll need to keep
them around until further changes so string data can still be passed through
the ABI. We may want to consider temporarily introducing an intrinsic to convert
a string slice to a string array of a given size (with or without some bounds
checking).
raw_slice
backs a lot of our dynamic types but we should be able to replace
most of its uses with a properly represented [u8]
.
Implementing the new reference construct should require:
- parser changes to introduce the new syntax for referencing (
&
) and dereferencing (*
). - adding the new types to the type system,
&T
and&mut T
. - implementing heap allocation for values.
- implementing escape analysis to avoid heap allocation.
Escape analysis will be improved over time. We want to stepwise come to the solution that
heap-allocates only values whose actual lifecycle is static
and that cannot be promoted
to live on a stack of any of the functions in the call stack.
*mut
should provide a fully typed alternative to raw_ptr
.
__addr_of
will have to be altered to return a *mut
and we'll need to
introduce __deref
.
We will not check pointer mutability at this time, hence the mut
in *mut
,
however we'll probably want to introduce *const
once typed pointers are
stabilized.
Using this instead of raw_ptr
will require heavy edits of std
and core
.
We may need to support pointer casts without asm block hacks. Introducing a
__ptr_cast
intrinsic is on the table if this becomes a requirement.
Changing the base assumptions about what a type represents is a major change and programmers coming from Rust may be surprised at the differences. However the differences are easy enough to explain and being able to shed the complexity of borrow semantics is worth the cost. We may want to make the particulars of how references work in Sway explicit in documentation aimed at programmers coming from Rust.
Abstracting away memory management too much might be a drawback for Sway and its positioning as a smart-contract language. However maintaining access to pointer arithmetic should still allow for atypical usage patterns without compromising the use of safe and zero cost abstractions for most cases.
To make the use of references more explicit and maintain syntax legibility for
people coming from Rust we have chosen to use &
everywhere a reference
is used. However the reason for Rust's use of this syntax rather than ML's
terseness is to support a borrow checker and guarantees that we do not make.
This could be misleading at first for programers coming from Rust. We will properly
document the difference between Sway references and Rust borrowing.
We believe that the difference, properly presented, will be straightforward and
easy to understand. Apart from that, the chosen syntax should feel natural for
programers coming from Rust.
Since we're leaning more towards ML, we may also have chosen to use ref
to denote reference types. However we already majorly borrow
from Rust for our library abstractions. Moreover, the ref
syntax felt much verbose.
A lot of the design choices of this RFC have to do with striking the balance between Rust and ML's levels of abstraction. Sway is not meant to be a general purpose systems programming language, but it is not a functional memory managed language either. We want to maintain access to memory primitives whilst making most Sway code terse, legible and easily understandable.
As prior art we considered Rust, ML and other managed GC languages such as Java or C# that have similar simplified memory models. For the parts of the references semantics we borrow from C++.
The exact memory representation of values, by themselves or as part of a slice is not fully addressed here. The ability to represent a series of bytes is a hard requirement but how we do this and what specific types look like in memory will have to be decided by the implementation.
How dynamic types interact with the ABI will have to be resolved in a later RFC.
We should consider introducing slicing and indexing operator such as [0..n]
and [0]
. We needn't introduce the notion of a range yet to make those work.
As discussed, a natural extension of *mut
is *const
.