Unreachable code and `shared` typing #80

abrown · 2024-08-14T22:20:46Z

While implementing validation logic for instructions like any.convert_extern, I ran into a question: if we're validating unreachable code, we may not have a type available for the incoming operand. For the result type of that instruction, the nullable bit is set to false — what should the shared bit be set to?

The text was updated successfully, but these errors were encountered:

tlively · 2024-08-15T00:36:42Z

Good question. @rossberg, this seems to require type variables for the shareability. Do our shared-polymorphic instructions like any.convert_extern and ref.as_non_null violate closed forward principle types?

conrad-watt · 2024-08-15T01:49:24Z

@abrown does your typing algorithm explicitly represent bot? (ref). My understanding is that the addition of bot removed the need to "guess" nullability of types in unreachable code, and the same should be true for shareability.

IIUC there should be two cases:

we're either in unreachable code and popping from the base of the stack, or alternatively popping an explicit bot produced by select etc. In this case the result of any.convert_extern is just bot.
we're popping a type with known nullability and shareability, in which case this is preserved by any.convert_extern.

conrad-watt · 2024-08-15T02:01:14Z

Oh, looking more closely at the current typing rule for any.convert_extern, my explanation above doesn't seem to be true. But I think it should be! @rossberg what's your understanding? I think the other interpretation is that we need explicit "shared` variants of the two instructions.

tlively · 2024-08-15T06:37:46Z

No instruction currently results in bot when its input is bot. For example, (unreachable) (i32.eqz) (i64.add) is not valid. The reason you don't need to guess an output nullability for nullable-polymorphic instructions whose input is bot is that it's always valid to guess "non-nullable," since that is a subtype of "nullable" and is allowed in strictly more places. We do not have such a subtyping relationship between shared and unshared, so there's no conservative guess to make.

rossberg · 2024-08-15T08:48:33Z

[Short reply from vacation.]

Yes, interestingly, it seems like this is okay for Principal Types, but breaks closedness of Principal Forward Types, which is the property engines depend on. We'd need to either introduce "bottom sharability" or require separate instructions / annotations.

tlively · 2024-08-15T20:27:42Z

Adding a bottom shareability for validation purposes would be an interesting solution. The typing rules for shareability-polymorphic instructions would have to propagate bottom shareability from their input to their output, so this solution is similar to the solution @conrad-watt suggested of propagating the bottom value type from the input to the output. But AFAICT bottom shareability propagation maintains the current decomposition property that says any valid sequence of instructions can be decomposed into two valid sequences of instructions at any point, whereas full-blown bottom type propagation would break that property.

(We didn't notice this problem in Binaryen because we do full-blown bottom type propagation anyway, contrary to the spec. It would be nice to fix this in our parsers at some point.)

@abrown, do you think this bottom shareability idea would be easily implementable in wasm-tools?

conrad-watt · 2024-08-16T02:10:41Z

Nullability is recorded directly on the reftype but shareability comes from the heaptype, so there would be some spec messiness unless we restrict ⊥-shared to appear only on abstract heap types (i.e. it's hard to manufacture a new concrete heap type that only appears during validation, since these are identified by index). I guess for this particular case it's sufficient to introduce any ⊥-shared and extern ⊥-shared. Are there any potential polymorphic instructions that might produce non-abstract heap types? How is ref.as_non_null currently validated in dead code, since it doesn't have an annotation?

any valid sequence of instructions can be decomposed into two valid sequences of instructions at any point, whereas full-blown bottom type propagation would break that property.

Well, with bottom propagation the second sequence of instructions would have type bot -> bot. I guess it could make outlining more complicated, since replacing bot -> bot with a well-typed function call might invalidate subsequent dead code. But I'm still kind of cynical about these examples - it seems easier to just remove all syntactically dead code as a first step!

conrad-watt · 2024-08-16T02:20:12Z

How is ref.as_non_null currently validated in dead code, since it doesn't have an annotation?

It looks like we introduce a bot abstract heap type as well as a bot value type, so this all works out (link).

alexcrichton · 2024-08-20T20:27:58Z

The situation is a little funky in wasm-tools because historically we've tried to avoid adding bot in places that can be exposed outside of the validator. Within validation we currently have

enum MaybeType {
    Bot,
    HeapBot,
    Type(ValType),
}

which enables us to encapsulate the bottom-ness within the validator itself and not part of the public API of reading a wasm module or getting its type information. In that sense adding bot to the absheaptype equivalent wouldn't be easy because things aren't set up that way, but it's doable.

I'll admit though I'm not following some of the discussion here 100%, so I wanted to ask a question about this. Is the idea that if there's a bot variant of absheaptype then this problem just goes away? For any.convert_extern which @abrown mentioned I'm not sure what the shared bit would be set to.

Or is the suggestion that there's a new sharedbot variant? In addition to a maybesharedbot variant?

tlively · 2024-08-20T20:55:20Z

The HeapBot in the code you quoted corresponds to the reference type (ref heap-bot) rather than the bottom heap type from the spec. That works so far because we only ever need to use the bottom heap type as part of a non-nullable reference. But it makes it tricky to extend the code to handle the proposed share-bot, which can appear with heap types other than heap-bot.

The type on the stack after (unreachable) (any.convert_extern) would be (ref (share-bot any)) because it must be some kind of reference to either a shared or unshared any. It is conservative to assume the reference is non-nullable, but future instructions might expected either a shared or unshared any, and neither works for instructions that expect the other, so we have to use the new share-bot instead.

If I were trying to implement this in wasm-tools, I would try using something like these enums:

enum MaybeShare {
    ShareBot,
    Share(Shareability),
}

enum MaybeHeapType {
    HeapBot,
    MaybeShare(MaybeShare, AbsHeapType),
    Type(HeapType)
}

enum MaybeType {
    Bot,
    MaybeReference(MaybeHeapType),
    Type(ValType),
}

And enforce an invariant that the MaybeHeapType and MaybeShare representations can only be used for types that cannot be represented as a ValType.

alexcrichton · 2024-08-21T15:06:37Z

In prototyping this I found that it was sufficient to effectively add the concept of a bottom heap type with an optionally known abstract heap type. The heap bottom type suffices to represent "maybe shared, maybe not" and the optionally known abstract heap type handles the any.convert_extern case (and vice versa) by saying that it's at least a known abstract type. For wasm-tools that ended up with

enum MaybeType {
    Bot,
    HeapBot(Option<AbstractHeapType>),
    Type(ValType),
}

in the validator. I've only updated the instructions that @abrown has implemented for shared-everything-threads in wasm-tools though so this may not be sufficient for everything if there's more flavorful instructions to be added.

rossberg · 2024-08-28T11:31:40Z

Sounds plausible.

The spec internally extends the type grammar for validation purposes, as specified here (which come with suitable subtyping rules). For sharedness, we'll need a corresponding extension with bot.

An actual implementation of validation will have to do something similar. But not all expressible cases can arise with the current instruction set. IIUC, the representation you show uses HeapBot to represent (ref (botshared absheaptype)), but it cannot express (ref null (botshared absheaptype)) or (ref (botshared $t)). Currently that's probably okay, because the latter do not arise in any instruction's forward principal type, I think. Though there could be future instructions that need to distinguish some of these cases.

alexcrichton · 2024-09-12T20:42:40Z

To perhaps close the loop on this, an update to wasm-tools's implementation was in bytecodealliance/wasm-tools#1734 with a number of new tests that pass as a result of that PR as well.

tlively · 2024-10-28T15:02:38Z

I'm writing a PR to update the overview to describe bot-share, but there's another choice we have to make. In #76, we decided to disallow mixed-sharedness ref.eq by using a shareability type variable. In this issue we discovered that such a type variable is insufficient to preserve principle typing for other instructions like any.convert_extern.

Do we want to have both shareability type variables and bot-share, in which case we can use the former for ref.eq and the latter for any.convert_extern, or do we just want to use bot-share for all our shared-polymorphic instructions? We choose to use bot-share everywhere, that would allow mixed-shareability ref.eq again.

conrad-watt · 2024-10-28T17:03:02Z

Do we want to have both shareability type variables and bot-share, in which case we can use the former for ref.eq and the latter for any.convert_extern, or do we just want to use bot-share for all our shared-polymorphic instructions? We choose to use bot-share everywhere, that would allow mixed-shareability ref.eq again.

Before looking at the PR in detail, I'm surprised by the following statement in it:

Besides fixing the forward principle typing property and validation in
unreachable code, this also causes mixed-shareability ref.eq to be
allowed again since there is no longer a mechanism for constraining both
of its operands to have the same shareability.

I'd expect that shareability type variables in the rules + bot-share in the type system (not mentioned explicitly in the instruction rules or representable in the concrete syntax) would get exactly the desired behaviour: no mixed concrete shareability but still achieving (forward) principle types.

i.e. consider the following typing rules in a system where one of the allowed values of share (in dead code) is bot-shared:

-------
(ref.eq) : [(ref share a_eq) (ref share b_eq)] -> [i32]

...
--------
any.convert_extern : [(ref share any)] -> [(ref share extern)]

(apologies if I'm getting the any and extern the wrong way round)

for ref.eq, any two concrete share values must match exactly, while bot-shared in either arm (due to dead code) will be implicitly upcasted to match the other arm before applying the rule;
for any.convert_extern, share values are preserved, including bot-shared for maximum permissiveness in dead code.

(@rossberg please check my intuition above)

Unrelated thought - I think there's also a connection between bot-shared and the concept of "either-shared-or-unshared" that we've discussed previously in other contexts.

tlively · 2024-10-28T18:32:37Z

Yes, keeping both the type variables and the bottom shareability means we can still disallow mixed shareability in ref.eq. The PR current removes the new type variables entirely, though. It doesn't seem useful to introduce shareability type variables and expand our definitions of principal types to allow them just to prevent mixed shareability on ref.eq.

conrad-watt · 2024-10-29T00:49:16Z

The PR current removes the new type variables entirely, though. It doesn't seem useful to introduce shareability type variables and expand our definitions of principal types to allow them just to prevent mixed shareability on ref.eq.

Can you expand a little on what we'd be removing with the PR's approach? IIUC the main expansion of the concept of principle types comes from allowing bot-shared to occur in them. I think either way, we'll need to spec a mechanism to extract the "shareness" of a type so it can be referenced by a side condition. At the very least all table loads and stores will need to check that the shareness of the provided ref is the same as the shareness of the table. Once we have this, writing the rules above doesn't seem to present any new complications.

I don't think it's important whether the share variable is written inline with the ref or not. I could equivalently write

shareness(a_eq) = shareness(b_eq)
-------
(ref.eq) : [(ref a_eq) (ref b_eq)] -> [i32]

and so long as the range of shareness is shared | nonshared | bot-shared, I believe we still get principle types while forbidding mixed sharedness for ref.eq.

rossberg · 2024-10-29T12:57:34Z

@tlively, I believe this doesn't work as you suggest, see my comment on #88. AFAICS, contravariance works the other way.

tlively · 2024-10-29T16:41:48Z

Yes, thanks, I was confused. We still need sharedness metavariables for principal types, so there's no reason to make a change to ref.eq. We can still disallow mixed sharedness.

abrown mentioned this issue Aug 14, 2024

threads: improve tests bytecodealliance/wasm-tools#1719

Merged

alexcrichton mentioned this issue Aug 21, 2024

Update handling of unreachable code and heap types bytecodealliance/wasm-tools#1734

Merged

tlively linked a pull request Oct 28, 2024 that will close this issue

Change from type variables to bot-share for polymorphism #88

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unreachable code and `shared` typing #80

Unreachable code and `shared` typing #80

abrown commented Aug 14, 2024

tlively commented Aug 15, 2024

conrad-watt commented Aug 15, 2024

conrad-watt commented Aug 15, 2024 •

edited

Loading

tlively commented Aug 15, 2024

rossberg commented Aug 15, 2024

tlively commented Aug 15, 2024

conrad-watt commented Aug 16, 2024

conrad-watt commented Aug 16, 2024

alexcrichton commented Aug 20, 2024

tlively commented Aug 20, 2024

alexcrichton commented Aug 21, 2024

rossberg commented Aug 28, 2024

alexcrichton commented Sep 12, 2024

tlively commented Oct 28, 2024 •

edited

Loading

conrad-watt commented Oct 28, 2024 •

edited

Loading

tlively commented Oct 28, 2024

conrad-watt commented Oct 29, 2024

rossberg commented Oct 29, 2024

tlively commented Oct 29, 2024

Unreachable code and shared typing #80

Unreachable code and shared typing #80

Comments

abrown commented Aug 14, 2024

tlively commented Aug 15, 2024

conrad-watt commented Aug 15, 2024

conrad-watt commented Aug 15, 2024 • edited Loading

tlively commented Aug 15, 2024

rossberg commented Aug 15, 2024

tlively commented Aug 15, 2024

conrad-watt commented Aug 16, 2024

conrad-watt commented Aug 16, 2024

alexcrichton commented Aug 20, 2024

tlively commented Aug 20, 2024

alexcrichton commented Aug 21, 2024

rossberg commented Aug 28, 2024

alexcrichton commented Sep 12, 2024

tlively commented Oct 28, 2024 • edited Loading

conrad-watt commented Oct 28, 2024 • edited Loading

tlively commented Oct 28, 2024

conrad-watt commented Oct 29, 2024

rossberg commented Oct 29, 2024

tlively commented Oct 29, 2024

Unreachable code and `shared` typing #80

Unreachable code and `shared` typing #80

conrad-watt commented Aug 15, 2024 •

edited

Loading

tlively commented Oct 28, 2024 •

edited

Loading

conrad-watt commented Oct 28, 2024 •

edited

Loading