Skip to content

Commit

Permalink
pulley: Reimplement wasm loads/stores & memory opcodes
Browse files Browse the repository at this point in the history
This commit is a large refactoring to reimplement how WebAssembly
loads/stores are translated to Pulley opcodes when using the
interpreter. Additionally the functionality related to memory support
has changed quite a bit with the interpreter as well. This is all based
off comments on bytecodealliance#10102 with the end goal of folding the two Pulley
opcodes today of "do the bounds check" and "do the load" into one
opcode. This is intended to reduce the number of opcodes and overall
improve interpreter throughput by minimizing turns of the interpreter
loop.

The basic idea behind this PR is that a new basic suite of loads/stores
are added to Pulley which trap if the address is zero. This provides a
route to translate trapping loads/stores in CLIF to Pulley bytecode
without actually causing segfaults at runtime. WebAssembly translation
to CLIF is then updated to use the `select` trick for wasm loads/stores
where either 0 is loaded from or the actual address is loaded from.
Basic support for translation and such is added for this everywhere, and
this ensures that all loads/stores for wasm will be translated
successfully with Pulley.

The next step was to extend the "g32" addressing mode preexisting in
Pulley to support a bounds check as well. New pattern-matches were added
to ISLE to search for a bounds check in the address of a trapping
load/store. If found then the entire chain of operations necessary to
compute the address are folded into a single "g32" opcode which ends up
being a fallible load/store at runtime.

To fit all this into Pulley this commit contains a number of
refactorings to shuffle around existing opcodes related to memory and
extend various pieces of functionality here and there:

* Pulley now uses a `AddrFoo` types to represent addressing modes as a
  single immediate rather than splitting it up into pieces for each
  method. For example `AddrO32` represents "base + offset32". `AddrZ`
  represents the same thing but traps if the address is zero. The
  `AddrG32` mode represents a bounds-checked 32-bit linear memory access
  on behalf of wasm.

* Pulley loads/stores were reduced to always using an `AddrFoo`
  immediate. This means that the old `offset8` addressing mode was
  removed without replacement here (to be added in the future if
  necessary). Additionally the suite of sign-extension modes supported
  were trimmed down to remove 8-to-64, 16-to-64, and 32-to-64 extensions
  folded as part of the opcode. These can of course always be re-added
  later but probably want to be added just for the `G32` addressing mode
  as opposed to all addressing modes.

* The interpreter itself was refactored to have an `AddressingMode`
  trait to ensure that all memory accesses, regardless of addressing
  modes, are largely just copy/pastes of each other. In the future it
  might make sense to implement these methods with a macro, but for now
  it's copy/paste.

* In ISLE the `XLoad` generic instruction removed its `ext` field to
  have extensions handled exclusively in ISLE instead of partly in
  `emit.rs`.

* Float/vector loads/stores now have "g32" addressing (in addition to
  the "z" that's required for wasm) since it was easy to add them.

* Translation of 1-byte accesses on Pulley from WebAssembly to CLIF no
  longer has a special case for using `a >= b` instead of `a > b - 1` to
  ensure that the same bounds-check instruction can be used for all
  sizes of loads/stores.

* The bounds-check which folded a load-of-the-bound into the opcode is
  now present as a "g32bne" addressing mode. with its of suite of
  instructions to boo.

Overall this PR is not a 1:1 replacement of all previous opcodes with
exactly one opcode. For example loading 8 bits sign-extended to 64-bits
is now two opcodes instead of one. Additionally some previous opcodes
have expanded in size where for example the 8-bit offset mode was remove
in favor of only having 32-bit offsets. The goal of this PR is to reboot
how memory is handled in Pulley. All loads/stores now use a specific
addressing mode and currently all operations supported across addressing
modes are consistently supported. In the future it's expected that some
features will be added to some addressing modes and not others as
necessary, for example extending the "g32" addressing mode only instead
of all addressing modes.

For an evaluation of this PR:

* Code size: `spidermonkey.cwasm` file is reduced from 19M to 16M.
* Sightglass: `pulldown-cmark` is improved by 15%
* Sightglass: `bz2` is improved by 20%
* Sightglass: `spidermonkey` is improved by 22%
* Coremark: score improved by 40%

Overall this PR and new design looks to be a large win. This is all
driven by the reduction in opcodes both for compiled code size and
execution speed by minimizing turns of the interpreter loop. In the end
I'm also pretty happy with how this turned out and I think the
refactorings are well worth it.
  • Loading branch information
alexcrichton committed Jan 29, 2025
1 parent facc992 commit 4cba949
Show file tree
Hide file tree
Showing 36 changed files with 2,195 additions and 1,650 deletions.
10 changes: 10 additions & 0 deletions cranelift/codegen/meta/src/pulley.rs
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,7 @@ pub fn generate_rust(filename: &str, out_dir: &Path) -> Result<(), Error> {
let mut pat = String::new();
let mut uses = Vec::new();
let mut defs = Vec::new();
let mut addrs = Vec::new();
for op in inst.operands() {
match op {
// `{Push,Pop}Frame{Save,Restore}` doesn't participate in
Expand All @@ -200,6 +201,10 @@ pub fn generate_rust(filename: &str, out_dir: &Path) -> Result<(), Error> {
uses.push(name);
pat.push_str(name);
pat.push_str(",");
} else if ty.starts_with("Addr") {
addrs.push(name);
pat.push_str(name);
pat.push_str(",");
}
}
Operand::Writable { name, ty } => {
Expand Down Expand Up @@ -230,12 +235,17 @@ pub fn generate_rust(filename: &str, out_dir: &Path) -> Result<(), Error> {
.iter()
.map(|u| format!("collector.reg_def({u});\n"))
.collect::<String>();
let addrs = addrs
.iter()
.map(|u| format!("{u}.collect_operands(collector);\n"))
.collect::<String>();

rust.push_str(&format!(
"
RawInst::{name} {{ {pat} .. }} => {{
{uses}
{defs}
{addrs}
}}
"
));
Expand Down
205 changes: 195 additions & 10 deletions cranelift/codegen/src/isa/pulley_shared/inst.isle
Original file line number Diff line number Diff line change
Expand Up @@ -73,9 +73,9 @@
;;
;; How much is written to the register is defined by `ExtKind`. The `flags`
;; control behavior such as endianness.
(XLoad (dst WritableXReg) (mem Amode) (ty Type) (flags MemFlags) (ext ExtKind))
(XLoad (dst WritableXReg) (mem Amode) (ty Type) (flags MemFlags))
(FLoad (dst WritableFReg) (mem Amode) (ty Type) (flags MemFlags))
(VLoad (dst WritableVReg) (mem Amode) (ty Type) (flags MemFlags) (ext VExtKind))
(VLoad (dst WritableVReg) (mem Amode) (ty Type) (flags MemFlags))

;; Stores.
(XStore (mem Amode) (src XReg) (ty Type) (flags MemFlags))
Expand Down Expand Up @@ -158,6 +158,22 @@

;;;; Address Modes ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(type ExtKind (enum None Sign32 Sign64 Zero32 Zero64))
(type VExtKind (enum None S8x8 U8x8 S16x4 U16x4 S32x2 U32x2))

;; Helper to convert a `(Value Offset32)` to `(Value i32)` while peeling off
;; constant addition within the first `Value` into the static offset, if
;; possible.
;;
;; Note that ideally this wouldn't be necessary and we could rely on the egraph
;; pass to do this but that's not implemented at this time.
(type ValueOffset (enum (Both (value Value) (offset i32))))
(decl pure amode_base (Value Offset32) ValueOffset)
(rule (amode_base addr (offset32 offset)) (ValueOffset.Both addr offset))
(rule 1 (amode_base (iadd addr (i32_from_iconst b)) (offset32 offset))
(if-let new_offset (s32_add_fallible b offset))
(ValueOffset.Both addr new_offset))

(type StackAMode extern (enum))

(type Amode
Expand All @@ -168,9 +184,53 @@
)
)

(type ExtKind (enum None Sign32 Sign64 Zero32 Zero64))
(decl amode (Value Offset32) Amode)
(rule (amode addr offset)
(if-let (ValueOffset.Both a o) (amode_base addr offset))
(Amode.RegOffset a o))

(type VExtKind (enum None S8x8 U8x8 S16x4 U16x4 S32x2 U32x2))
;;; ISLE representation of the `AddrO32` ("*_o32") addressing mode in Pulley.
(type AddrO32
(enum
(Base
(addr XReg)
(offset i32))))

;; Constructor for the `AddrO32` type used in `*_o32` loads/stores
(decl addro32 (Value Offset32) AddrO32)
(rule (addro32 addr offset)
(if-let (ValueOffset.Both reg off32) (amode_base addr offset))
(AddrO32.Base reg off32))

;;; ISLE representation of the `AddrZ` ("*_z") addressing mode in Pulley.
(type AddrZ
(enum
(Base
(addr XReg)
(offset i32))))

;; Constructor for the `AddrZ` type used in `*_z` loads/stores
(decl addrz (XReg i32) AddrZ)
(rule (addrz base offset) (AddrZ.Base base offset))

;;; ISLE representation of the `AddrG32` ("*_g32") addressing mode in Pulley.
(type AddrG32
(enum
(RegisterBound
(host_heap_base XReg)
(host_heap_bound XReg)
(wasm_addr XReg)
(offset u16))))

;;; ISLE representation of the `AddrG32Bne` ("*_g32bne") addressing mode in Pulley.
(type AddrG32Bne
(enum
(BoundNe
(host_heap_base XReg)
(host_heap_bound_addr XReg)
(host_heap_bound_offset u8)
(wasm_addr XReg)
(offset u8))))

;; Helper to determine the endianness of `MemFlags` taking the current target
;; into account.
Expand Down Expand Up @@ -198,6 +258,7 @@
(rule (sinkable_load value @ (value_type ty))
(if-let inst @ (load flags addr (offset32 offset)) (is_sinkable_inst value))
(if-let true (is_native_endianness (endianness flags)))
(if-let true (memflags_nontrapping flags))
(if-let offset8 (u8_try_from_i32 offset))
(SinkableLoad.Load inst ty addr offset8))

Expand All @@ -220,6 +281,130 @@
(decl pure pointer_width () PointerWidth)
(extern constructor pointer_width pointer_width)

(decl pure memflags_nontrapping (MemFlags) bool)
(extern constructor memflags_nontrapping memflags_nontrapping)

(decl pure memflags_is_wasm (MemFlags) bool)
(extern constructor memflags_is_wasm memflags_is_wasm)

;; Helper type to represent a "pending" `AddrG32` value.
(type G32 (enum (All (heap_base Value) (heap_bound Value) (wasm_addr Value) (offset u16))))

;; Auto-conversion from `G32` to `AddrG32`.
(decl gen_addrg32 (G32) AddrG32)
(rule (gen_addrg32 (G32.All base bound wasm offset))
(AddrG32.RegisterBound base bound wasm offset))
(convert G32 AddrG32 gen_addrg32)

;; Helper type to represent a "pending" `AddrG32Bne` value.
(type G32Bne (enum (All (heap_base Value) (heap_bound SinkableLoad) (wasm_addr Value) (offset u8))))

;; Auto-conversion from `G32Bne` to `AddrG32Bne`.
(decl gen_addrg32bne (G32Bne) AddrG32Bne)
(rule (gen_addrg32bne (G32Bne.All base heap_bound_load wasm offset))
(gen_addrg32bne_for_real base heap_bound_load wasm offset))
(convert G32Bne AddrG32Bne gen_addrg32bne)

;; Small workaround to pattern-match `SunkLoad` here.
(decl gen_addrg32bne_for_real (XReg SunkLoad XReg u8) AddrG32Bne)
(rule (gen_addrg32bne_for_real base (SunkLoad.Load _ bound_addr bound_offset) wasm_addr offset)
(AddrG32Bne.BoundNe base bound_addr bound_offset wasm_addr offset))

;; Helper to extract `G32Bne` from `G32` if applicable.
;;
;; This is possible when the heap bound is itself a sinkable load which can get
;; folded into `G32Bne`.
(decl pure partial addrg32bne (G32) G32Bne)
(rule (addrg32bne (G32.All heap_base heap_bound wasm_addr offset))
(if-let heap_bound_load (sinkable_load heap_bound))
(if-let offset8 (u8_try_from_u16 offset))
(G32Bne.All heap_base heap_bound_load wasm_addr offset8))

;; Main entrypoint for matching a `G32` address which can be used in `*_g32`
;; pulley instructions for loads/stores.
;;
;; Pulley loads/stores are emitted as fallible loads/stores where the only
;; defined trapping address is null. That's modeled as a load-from-`select`
;; where an out-of-bounds condition yields null.
;;
;; Here the top-layer of this rule matches the `(select ...)` node where 0 is
;; used if the oob condition is true. Otherwise the raw address is used for the
;; load/store.
(decl pure partial wasm_g32 (Value Offset32 MemFlags Type) G32)
(rule (wasm_g32 (select oob (u64_from_iconst 0) raw_addr) (offset32 0) flags ty)
;; This must be a wasm load/store according to `MemFlags`, namely that it's
;; got a "HEAP_OUT_OF_BOUNDS" trap code and it's little-endian.
(if-let true (memflags_is_wasm flags))
;; Peel off the static wasm offset from `raw_addr`, if one is present. If
;; one isn't present then `heap_offset` will be zero. This handles the `N`
;; in wasm instructions `i32.load offset=N`.
(if-let (HostOffset.All host_addr heap_offset) (host_offset raw_addr))
;; Next see if the `oob` and `raw_addr` combination match. This will attempt
;; extract a full bounds check from these values. If everything succeeds the
;; final step is then to extract an 8-bit offset of the load/store operation,
;; if appplicable, assuming that the constants used in various places all line
;; up just right.
(if-let (OobSelect.All base bound wasm_addr access_size_plus_offset)
(wasm_oob_select oob host_addr))
(if-let offset16 (g32_offset heap_offset ty access_size_plus_offset))
(G32.All base bound wasm_addr offset16))

;; Host helper to do the math to extract the offset here.
;;
;; Ensures that the first argument, the heap offset on the load, plus the size
;; of the type equals the third argument, the bounds-checked static offset. If
;; the offset then fits within `u16` it's returned.
(decl pure partial g32_offset (i32 Type u64) u16)
(extern constructor g32_offset g32_offset)

;; Helper used in `wasm_g32` above to extract `(iadd addr N)` where `N` is a
;; constant. If there is no constant then the constant 0 is returned instead.
(decl pure host_offset (Value) HostOffset)
(type HostOffset (enum (All (a Value) (b i32))))
(rule 0 (host_offset val) (HostOffset.All val 0))
(rule 1 (host_offset (iadd val (i32_from_iconst n))) (HostOffset.All val n))
(rule 2 (host_offset (iadd (i32_from_iconst n) val)) (HostOffset.All val n))

;; Helper to test whether the first argument `oob` contains a condition for
;; matching whether second argument `addr` is out-of-bounds. Searches for a
;; variety of structures that optimizations or the frontend may produce.
(decl pure partial wasm_oob_select (Value Value) OobSelect)
(type OobSelect (enum (All (a Value) (b Value) (c Value) (d u64))))

;; 32-bit hosts: search either side of the `iadd` for the base address within
;; `oob` to see if it's a matching condition.
(rule 0 (wasm_oob_select oob (iadd base wasm_addr @ (value_type $I32)))
(if-let (OobCond.All bound n) (wasm_oob_cond wasm_addr oob))
(if-let (PointerWidth.PointerWidth32) (pointer_width))
(OobSelect.All base bound wasm_addr n))
(rule 1 (wasm_oob_select oob (iadd wasm_addr @ (value_type $I32) base))
(if-let (OobCond.All bound n) (wasm_oob_cond wasm_addr oob))
(if-let (PointerWidth.PointerWidth32) (pointer_width))
(OobSelect.All base bound wasm_addr n))

;; 64-bit hosts: also search either side, but the wasm address must also be
;; a `uextend` from a 32-bit value.
(rule 0 (wasm_oob_select oob (iadd base wasm_addr_ext @ (value_type $I64)))
(if-let (OobCond.All bound n) (wasm_oob_cond wasm_addr_ext oob))
(if-let (uextend wasm_addr @ (value_type $I32)) wasm_addr_ext)
(if-let (PointerWidth.PointerWidth64) (pointer_width))
(OobSelect.All base bound wasm_addr n))
(rule 1 (wasm_oob_select oob (iadd wasm_addr_ext @ (value_type $I64) base))
(if-let (OobCond.All bound n) (wasm_oob_cond wasm_addr_ext oob))
(if-let (uextend wasm_addr @ (value_type $I32)) wasm_addr_ext)
(if-let (PointerWidth.PointerWidth64) (pointer_width))
(OobSelect.All base bound wasm_addr n))

;; Helper to search for the first argument, `wasm_addr`, within the oob
;; condition second argument, `oob`. It should appear as an integer comparison
;; of the address against a particular bound.
(decl pure partial wasm_oob_cond (Value Value) OobCond)
(type OobCond (enum (All (a Value) (b u64))))
(rule (wasm_oob_cond wasm_addr (ugt wasm_addr (isub bound (u64_from_iconst n))))
(OobCond.All bound n))
(rule (wasm_oob_cond wasm_addr (ult (isub bound (u64_from_iconst n)) wasm_addr))
(OobCond.All bound n))

;;;; Newtypes for Different Register Classes ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(type XReg (primitive XReg))
Expand Down Expand Up @@ -467,10 +652,10 @@
(rule (pulley_br_if cond taken not_taken)
(SideEffectNoResult.Inst (MInst.BrIf cond taken not_taken)))

(decl pulley_xload (Amode Type MemFlags ExtKind) XReg)
(rule (pulley_xload amode ty flags ext)
(decl pulley_xload (Amode Type MemFlags) XReg)
(rule (pulley_xload amode ty flags)
(let ((dst WritableXReg (temp_writable_xreg))
(_ Unit (emit (MInst.XLoad dst amode ty flags ext))))
(_ Unit (emit (MInst.XLoad dst amode ty flags))))
dst))

(decl pulley_xstore (Amode XReg Type MemFlags) SideEffectNoResult)
Expand All @@ -487,10 +672,10 @@
(rule (pulley_fstore amode src ty flags)
(SideEffectNoResult.Inst (MInst.FStore amode src ty flags)))

(decl pulley_vload (Amode Type MemFlags VExtKind) VReg)
(rule (pulley_vload amode ty flags ext)
(decl pulley_vload (Amode Type MemFlags) VReg)
(rule (pulley_vload amode ty flags)
(let ((dst WritableVReg (temp_writable_vreg))
(_ Unit (emit (MInst.VLoad dst amode ty flags ext))))
(_ Unit (emit (MInst.VLoad dst amode ty flags))))
dst))

(decl pulley_vstore (Amode VReg Type MemFlags) SideEffectNoResult)
Expand Down
Loading

0 comments on commit 4cba949

Please sign in to comment.