Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unreachable code and shared typing #80

Open
abrown opened this issue Aug 14, 2024 · 19 comments · May be fixed by #88
Open

Unreachable code and shared typing #80

abrown opened this issue Aug 14, 2024 · 19 comments · May be fixed by #88

Comments

@abrown
Copy link
Collaborator

abrown commented Aug 14, 2024

While implementing validation logic for instructions like any.convert_extern, I ran into a question: if we're validating unreachable code, we may not have a type available for the incoming operand. For the result type of that instruction, the nullable bit is set to false — what should the shared bit be set to?

@tlively
Copy link
Member

tlively commented Aug 15, 2024

Good question. @rossberg, this seems to require type variables for the shareability. Do our shared-polymorphic instructions like any.convert_extern and ref.as_non_null violate closed forward principle types?

@conrad-watt
Copy link
Collaborator

@abrown does your typing algorithm explicitly represent bot? (ref). My understanding is that the addition of bot removed the need to "guess" nullability of types in unreachable code, and the same should be true for shareability.

IIUC there should be two cases:

  1. we're either in unreachable code and popping from the base of the stack, or alternatively popping an explicit bot produced by select etc. In this case the result of any.convert_extern is just bot.
  2. we're popping a type with known nullability and shareability, in which case this is preserved by any.convert_extern.

@conrad-watt
Copy link
Collaborator

conrad-watt commented Aug 15, 2024

Oh, looking more closely at the current typing rule for any.convert_extern, my explanation above doesn't seem to be true. But I think it should be! @rossberg what's your understanding? I think the other interpretation is that we need explicit "shared` variants of the two instructions.

@tlively
Copy link
Member

tlively commented Aug 15, 2024

No instruction currently results in bot when its input is bot. For example, (unreachable) (i32.eqz) (i64.add) is not valid. The reason you don't need to guess an output nullability for nullable-polymorphic instructions whose input is bot is that it's always valid to guess "non-nullable," since that is a subtype of "nullable" and is allowed in strictly more places. We do not have such a subtyping relationship between shared and unshared, so there's no conservative guess to make.

@rossberg
Copy link
Member

[Short reply from vacation.]

Yes, interestingly, it seems like this is okay for Principal Types, but breaks closedness of Principal Forward Types, which is the property engines depend on. We'd need to either introduce "bottom sharability" or require separate instructions / annotations.

@tlively
Copy link
Member

tlively commented Aug 15, 2024

Adding a bottom shareability for validation purposes would be an interesting solution. The typing rules for shareability-polymorphic instructions would have to propagate bottom shareability from their input to their output, so this solution is similar to the solution @conrad-watt suggested of propagating the bottom value type from the input to the output. But AFAICT bottom shareability propagation maintains the current decomposition property that says any valid sequence of instructions can be decomposed into two valid sequences of instructions at any point, whereas full-blown bottom type propagation would break that property.

(We didn't notice this problem in Binaryen because we do full-blown bottom type propagation anyway, contrary to the spec. It would be nice to fix this in our parsers at some point.)

@abrown, do you think this bottom shareability idea would be easily implementable in wasm-tools?

@conrad-watt
Copy link
Collaborator

Nullability is recorded directly on the reftype but shareability comes from the heaptype, so there would be some spec messiness unless we restrict ⊥-shared to appear only on abstract heap types (i.e. it's hard to manufacture a new concrete heap type that only appears during validation, since these are identified by index). I guess for this particular case it's sufficient to introduce any ⊥-shared and extern ⊥-shared. Are there any potential polymorphic instructions that might produce non-abstract heap types? How is ref.as_non_null currently validated in dead code, since it doesn't have an annotation?

any valid sequence of instructions can be decomposed into two valid sequences of instructions at any point, whereas full-blown bottom type propagation would break that property.

Well, with bottom propagation the second sequence of instructions would have type bot -> bot. I guess it could make outlining more complicated, since replacing bot -> bot with a well-typed function call might invalidate subsequent dead code. But I'm still kind of cynical about these examples - it seems easier to just remove all syntactically dead code as a first step!

@conrad-watt
Copy link
Collaborator

How is ref.as_non_null currently validated in dead code, since it doesn't have an annotation?

It looks like we introduce a bot abstract heap type as well as a bot value type, so this all works out (link).

@alexcrichton
Copy link
Contributor

The situation is a little funky in wasm-tools because historically we've tried to avoid adding bot in places that can be exposed outside of the validator. Within validation we currently have

enum MaybeType {
    Bot,
    HeapBot,
    Type(ValType),
}

which enables us to encapsulate the bottom-ness within the validator itself and not part of the public API of reading a wasm module or getting its type information. In that sense adding bot to the absheaptype equivalent wouldn't be easy because things aren't set up that way, but it's doable.

I'll admit though I'm not following some of the discussion here 100%, so I wanted to ask a question about this. Is the idea that if there's a bot variant of absheaptype then this problem just goes away? For any.convert_extern which @abrown mentioned I'm not sure what the shared bit would be set to.

Or is the suggestion that there's a new sharedbot variant? In addition to a maybesharedbot variant?

@tlively
Copy link
Member

tlively commented Aug 20, 2024

The HeapBot in the code you quoted corresponds to the reference type (ref heap-bot) rather than the bottom heap type from the spec. That works so far because we only ever need to use the bottom heap type as part of a non-nullable reference. But it makes it tricky to extend the code to handle the proposed share-bot, which can appear with heap types other than heap-bot.

The type on the stack after (unreachable) (any.convert_extern) would be (ref (share-bot any)) because it must be some kind of reference to either a shared or unshared any. It is conservative to assume the reference is non-nullable, but future instructions might expected either a shared or unshared any, and neither works for instructions that expect the other, so we have to use the new share-bot instead.

If I were trying to implement this in wasm-tools, I would try using something like these enums:

enum MaybeShare {
    ShareBot,
    Share(Shareability),
}

enum MaybeHeapType {
    HeapBot,
    MaybeShare(MaybeShare, AbsHeapType),
    Type(HeapType)
}

enum MaybeType {
    Bot,
    MaybeReference(MaybeHeapType),
    Type(ValType),
}

And enforce an invariant that the MaybeHeapType and MaybeShare representations can only be used for types that cannot be represented as a ValType.

@alexcrichton
Copy link
Contributor

In prototyping this I found that it was sufficient to effectively add the concept of a bottom heap type with an optionally known abstract heap type. The heap bottom type suffices to represent "maybe shared, maybe not" and the optionally known abstract heap type handles the any.convert_extern case (and vice versa) by saying that it's at least a known abstract type. For wasm-tools that ended up with

enum MaybeType {
    Bot,
    HeapBot(Option<AbstractHeapType>),
    Type(ValType),
}

in the validator. I've only updated the instructions that @abrown has implemented for shared-everything-threads in wasm-tools though so this may not be sufficient for everything if there's more flavorful instructions to be added.

@rossberg
Copy link
Member

Sounds plausible.

The spec internally extends the type grammar for validation purposes, as specified here (which come with suitable subtyping rules). For sharedness, we'll need a corresponding extension with bot.

An actual implementation of validation will have to do something similar. But not all expressible cases can arise with the current instruction set. IIUC, the representation you show uses HeapBot to represent (ref (botshared absheaptype)), but it cannot express (ref null (botshared absheaptype)) or (ref (botshared $t)). Currently that's probably okay, because the latter do not arise in any instruction's forward principal type, I think. Though there could be future instructions that need to distinguish some of these cases.

@alexcrichton
Copy link
Contributor

To perhaps close the loop on this, an update to wasm-tools's implementation was in bytecodealliance/wasm-tools#1734 with a number of new tests that pass as a result of that PR as well.

@tlively
Copy link
Member

tlively commented Oct 28, 2024

I'm writing a PR to update the overview to describe bot-share, but there's another choice we have to make. In #76, we decided to disallow mixed-sharedness ref.eq by using a shareability type variable. In this issue we discovered that such a type variable is insufficient to preserve principle typing for other instructions like any.convert_extern.

Do we want to have both shareability type variables and bot-share, in which case we can use the former for ref.eq and the latter for any.convert_extern, or do we just want to use bot-share for all our shared-polymorphic instructions? We choose to use bot-share everywhere, that would allow mixed-shareability ref.eq again.

@conrad-watt
Copy link
Collaborator

conrad-watt commented Oct 28, 2024

Do we want to have both shareability type variables and bot-share, in which case we can use the former for ref.eq and the latter for any.convert_extern, or do we just want to use bot-share for all our shared-polymorphic instructions? We choose to use bot-share everywhere, that would allow mixed-shareability ref.eq again.

Before looking at the PR in detail, I'm surprised by the following statement in it:

Besides fixing the forward principle typing property and validation in
unreachable code, this also causes mixed-shareability ref.eq to be
allowed again since there is no longer a mechanism for constraining both
of its operands to have the same shareability.

I'd expect that shareability type variables in the rules + bot-share in the type system (not mentioned explicitly in the instruction rules or representable in the concrete syntax) would get exactly the desired behaviour: no mixed concrete shareability but still achieving (forward) principle types.

i.e. consider the following typing rules in a system where one of the allowed values of share (in dead code) is bot-shared:

-------
(ref.eq) : [(ref share a_eq) (ref share b_eq)] -> [i32]

...
--------
any.convert_extern : [(ref share any)] -> [(ref share extern)]

(apologies if I'm getting the any and extern the wrong way round)

  • for ref.eq, any two concrete share values must match exactly, while bot-shared in either arm (due to dead code) will be implicitly upcasted to match the other arm before applying the rule;
  • for any.convert_extern, share values are preserved, including bot-shared for maximum permissiveness in dead code.

(@rossberg please check my intuition above)

Unrelated thought - I think there's also a connection between bot-shared and the concept of "either-shared-or-unshared" that we've discussed previously in other contexts.

@tlively
Copy link
Member

tlively commented Oct 28, 2024

Yes, keeping both the type variables and the bottom shareability means we can still disallow mixed shareability in ref.eq. The PR current removes the new type variables entirely, though. It doesn't seem useful to introduce shareability type variables and expand our definitions of principal types to allow them just to prevent mixed shareability on ref.eq.

@conrad-watt
Copy link
Collaborator

The PR current removes the new type variables entirely, though. It doesn't seem useful to introduce shareability type variables and expand our definitions of principal types to allow them just to prevent mixed shareability on ref.eq.

Can you expand a little on what we'd be removing with the PR's approach? IIUC the main expansion of the concept of principle types comes from allowing bot-shared to occur in them. I think either way, we'll need to spec a mechanism to extract the "shareness" of a type so it can be referenced by a side condition. At the very least all table loads and stores will need to check that the shareness of the provided ref is the same as the shareness of the table. Once we have this, writing the rules above doesn't seem to present any new complications.

I don't think it's important whether the share variable is written inline with the ref or not. I could equivalently write

shareness(a_eq) = shareness(b_eq)
-------
(ref.eq) : [(ref a_eq) (ref b_eq)] -> [i32]

and so long as the range of shareness is shared | nonshared | bot-shared, I believe we still get principle types while forbidding mixed sharedness for ref.eq.

@rossberg
Copy link
Member

@tlively, I believe this doesn't work as you suggest, see my comment on #88. AFAICS, contravariance works the other way.

@tlively
Copy link
Member

tlively commented Oct 29, 2024

Yes, thanks, I was confused. We still need sharedness metavariables for principal types, so there's no reason to make a change to ref.eq. We can still disallow mixed sharedness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants