
Summary & Disambiguation

A highly controversial, and possibly not very beneficial, attempt to improve CPU and cache efficiency of heap (de)allocations and resizing in Rust. This is not related or specific to Rust coroutines (async/await), nor to any cooperative multitasking/scheduling.

The proof of concept is close to the benchmarking phase. Please help (with building cargo as part of the rust build, and with the internal compiler errors) as per below, or share any insights.

Goal

Leverage Rust's memory & data safety guarantees to streamline de-allocation (and resizing of data on the heap).

The problem/waste being solved

When de-allocating, all that Rust passes through the (standard) allocator API is a pointer to the data being de-allocated (plus its layout). The allocator has to look the block up for its housekeeping in its internal data structures (often hash-based).
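
For reference, a minimal sketch of today's stable GlobalAlloc side of the API (the PassThrough type and name are illustrative only, not part of this proposal): dealloc receives just the pointer and the layout, so any further housekeeping lookup is up to the allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

// Illustrative pass-through global allocator delegating to the system allocator.
struct PassThrough;

unsafe impl GlobalAlloc for PassThrough {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // No caller-side metadata arrives here; the allocator must find this
        // block in its own (often hash-based) internal structures.
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: PassThrough = PassThrough;

fn main() {
    // All heap use in this binary now goes through PassThrough.
    let v = vec![1u8, 2, 3];
    drop(v);
}
```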

How (in short)

Rust could keep some extra metadata alongside every smart pointer. (This metadata would come from the allocator's new functions co_allocate, co_allocate_zeroed, co_grow, co_grow_zeroed, co_shrink.) Then Rust could pass that metadata back to the allocator when de-allocating (or resizing). That could save the allocator some CPU cycles and, more importantly, cache/memory accesses.
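
A minimal caller-side sketch of the idea, with all names and the field layout assumed for illustration (this is not the proof-of-concept code): the smart pointer stores the metadata returned by co_allocate and hands it back on resizing and de-allocation.

```rust
use core::ptr::NonNull;

// Assumed, illustrative metadata type: a fixed-size blob defined by the
// global allocator (e.g. a small multiple of usize).
#[derive(Clone, Copy, Default)]
struct CoMeta(usize);

// Assumed, illustrative Vec-like smart pointer: it carries the metadata
// returned by co_allocate and passes it back to co_grow/co_shrink and to
// de-allocation, so the allocator can skip its internal lookup.
struct CoVec<T> {
    ptr: NonNull<T>,
    cap: usize,
    len: usize,
    meta: CoMeta,
}
```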

The size of the metadata (per smart pointer) must be known at compile time. Hence, this could be supported only for the global allocator, which must define the metadata and make it (somehow) known to the compiler. That may need some new compiler configuration/directives and/or rustc target configuration. However, this doesn't affect the Rust language itself (other than fixing some rustc internal compiler errors). (Would you like this to be clearer, or can you see a better way? Get in touch and help shape this, or at least shine some light on it, please.)
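
Purely as an assumption for illustration (neither this trait nor its name is claimed to be the proof-of-concept API), one way the global allocator could make the metadata type, and hence its size, known at compile time is via an associated type:

```rust
// Hypothetical trait the global allocator would implement so that
// library/alloc (and the compiler) know the metadata's fixed size.
pub trait GlobalCoAllocMeta {
    /// Fixed-size metadata stored alongside each co-allocated smart pointer.
    type Meta: Copy + Default;
}
```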

Controversy: Runtime pros & cons

Of course, this increases the size of the affected smart pointers (wherever they are used/stored/passed). So, the initial candidates are primarily non-leaf smart pointers, which are:

  • Less likely to serve as leaves (of the object tree) and more likely to serve as containers (for example Vec).
  • Less likely to be passed around (up & down the stack, or stored/moved around on the heap) in total than leaf objects (as data flow accumulated over time). (Well-designed data structures have more leaves than non-leaves for most inputs.)
  • Small in total size (as per the previous point). Hence any small metadata (a small multiple of usize, adjacent to those smart pointers) may not increase RAM requirements too much.
  • So (as per the previous two points), this metadata may not excessively affect stack depth, nor the data flow & CPU cycles (of many applications).
  • Due to CPU cache line/page granularity, handling the metadata may increase execution time even less than it increases the number of CPU cycles (since the CPU waits for the cache anyway), especially when these smart pointers are stored on the stack (which grows/shrinks sequentially). Therefore the runtime cost of handling the metadata may not be as high as it may sound.
  • Resizable (which can make use of the metadata): Vec (and hopefully HashMap/HashSet, BTreeMap/BTreeSet).
  • (De)allocated or resized frequently. Hence String, or any frequently created Vec of static data, qualifies too (even though it's a leaf in terms of objects).
  • Of course, leaf smart pointers are (de)allocated more frequently than non-leaf ones (in well-designed data structures), but their overall data flow may make the extra metadata prohibitive for them.

Hence, in general, this would be much less beneficial (or even detrimental) for Box, Rc and Arc. Still, if anyone volunteers to implement this for those smart pointers too, we will see once we benchmark it.

Controversy & How: Choice, Complexity, Safety, Compatibility

Choice

HIGHLY configurable. To be explained here.

Rust internal complexity

Adding the new functions to the Allocator API (listed above; with defaults/fallbacks) is easy. Implementing them is up to the allocator, and out of scope here. (Do you know of any allocators written primarily in Rust - rather than in C - with developers open to adding new functionality? Please connect us.)
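
A hedged sketch of what such methods with default fallbacks could look like, written here as an extension trait for brevity (the actual work would add them to the Allocator trait itself); the exact signatures and the CoMeta type are assumptions, not the proof-of-concept code:

```rust
#![feature(allocator_api)]

use core::alloc::{AllocError, Allocator, Layout};
use core::ptr::NonNull;

// Assumed, illustrative metadata type defined by the (global) allocator.
#[derive(Clone, Copy, Default)]
pub struct CoMeta(pub usize);

pub unsafe trait CoAllocator: Allocator {
    /// Like allocate(), but also returns allocator-defined metadata that the
    /// caller stores next to its smart pointer.
    fn co_allocate(&self, layout: Layout) -> Result<(NonNull<[u8]>, CoMeta), AllocError> {
        // Default fallback: plain allocation plus dummy metadata.
        Ok((self.allocate(layout)?, CoMeta::default()))
    }

    /// Like grow(), but receives the metadata back, so the allocator can skip
    /// its internal lookup. co_allocate_zeroed, co_grow_zeroed and co_shrink
    /// would follow the same pattern.
    unsafe fn co_grow(
        &self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_layout: Layout,
        _meta: CoMeta,
    ) -> Result<(NonNull<[u8]>, CoMeta), AllocError> {
        // Default fallback: ignore the metadata and delegate to grow().
        Ok((unsafe { self.grow(ptr, old_layout, new_layout) }?, CoMeta::default()))
    }
}
```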

However, most of the work in Rust itself (alloc/std and core) is about storing and handling the metadata, along with the developer's choices and defaults.

This affects a lot of Vec, VecDeque (and RawVec) in library/alloc (which is aliased to library/std). Implementing this for String affects Cow and ToOwned (in library/core), too.

Questions

  • About the names of the new methods (for example Vec::new_co).
  • When adding the preference flag to structs/enums/traits that currently work only with the global allocator, should we also add the allocator generic parameter (and related methods that accept an allocator instance, similar to the existing Vec::new_in)? See the sketch below.
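
For illustration, the existing (nightly) allocator-taking constructor next to the hypothetical co-allocating variants this question asks about; Vec::new_co and Vec::new_co_in are assumed names only:

```rust
#![feature(allocator_api)]

use std::alloc::Global;

fn main() {
    // Existing nightly constructor that takes an allocator instance:
    let existing: Vec<u8, Global> = Vec::new_in(Global);

    // The open question, with hypothetical names:
    // let co_global = Vec::new_co();          // co-allocating, global allocator only?
    // let co_in     = Vec::new_co_in(Global); // ...or also an allocator-taking variant?
    let _ = existing;
}
```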

Safety

HIGHLY safe. To be explained here.

Compatibility

HIGHLY compatible. To be explained here.

Scope

  • Primarily, and initially, Vec, VecDeque and RawVec (MVP complete, not tested, not benchmarked).
  • Possibly String and related (Cow, ToOwned). (MVP mostly complete, not tested, not benchmarked. Please help.)
  • Hopefully HashMap/HashSet and BTreeMap/BTreeSet.
  • Possible, but with unlikely benefit (as per above), and requiring deeper changes to the compiler & language internals: Box, Rc, Arc.