- Where: zoom.us
- When: December 5, 17:00-18:00 UTC
- Location: link on calendar invite
- Contact:
- Name: Dan Gohman
- Email: [email protected]
None required if you've attended before. Email Dan Gohman to sign up if it's your first time. The meeting is open to CG members only.
The meeting will be on a zoom.us video conference. Installation is required, see the calendar invite.
- Opening, welcome and roll call
- Opening of the meeting
- Introduction of attendees
- Find volunteers for note taking (acting chair to volunteer)
- Adoption of the agenda
- Proposals and discussions
- Stack trace API
- WebAssembly#159
- Whare are the next steps here?
- Character encodings
- https://github.com/WebAssembly/WASI/issues/8
- Can we decide on this, or do we need more information?
- Case sensitivity
- https://github.com/WebAssembly/WASI/issues/72
- Can we decide on this, or do we need more information?
- ANSI escape sequences
- WebAssembly#162
- Can we agree on the overall framework proposed here?
- Input sequences
- WebAssembly#163
- What standards govern this space?
- Stack trace API
- Closure
Attendees:
Dan Gohman Pat Hickey Stefan Junker Wouter van Oortmerssen Jan Falkin Andrew Brown Mark McCaskey Peter Huene
Meeting notes:
bytecodealliance/wasmtime#235 (comment)
DG: If people have comments on the process we’re following with these meetings, please give feedback.
PH: I’ll second the agenda
DG: First item, stack trace API. Thomas Lively proposed this, but he’s not in the meeting at this time. Let’s table this until he’s here.
DG: Character encodings. This issue has been open for a while and there’s no easy answer. There’s a shape of a proposal in the issue that most engines are following - we say that everything is UTF-8. Is there any disagreement with that consensus here?
(none)
DG: For things like environment variables, it would simplify things a lot if we could guarantee those are UTF-8 strings.
DG: What about filesystems with file names that are unencodable? Can we find a way to tunnel that through? We currently use pointer+length to pass all filenames. No filesystem permits null in a filename, so we can encode a non-UTF8 string by putting extra information after the null. The trouble comes when you have to manipulate the path.
PH: What about interface types interaction?
DG: The stuff after the null will still be valid UTF-8 which encodes the non-UTF-8 part of the filename. So any library code that expects UTF-8 will still work. DG: We’ll take this as consensus, and go forward with this design for now. We can change it if problems happen or we have objections during implementations.
AB: Can you clarify what is being encoded after the null?
DG: Two things: in the part before the null we want a normal-looking string, so any bytes that cant be translated get the unencodable-character (uffd). Then, after the null, we put the information that was lost. Something like, index of the string, then the missing bytes. We want something that is reversible. I agree it is goofy. Does anyone know how to encode an arbitrary integer in a utf-8 byte pattern? Obviously we could invent something, like using ascii digits. If theres an existing way to encode it, we’d want to follow that.
AB: So we can’t just say we support all unicode strings?
DG: The problem is that there are names out there that can’t be expressed in unicode. And from people I’ve talked to, its rare in practice. So the goal here is that if you get a path from one wasi api, and treat it as a utf-8 string without manipulating it, and then pass it to another wasi api, it works exactly as is.
AB: I misstated my question - is there no unicode format where this reconstruction wouldn't be necessary? Or will there always be some way for there to be an unencodable string?
DG: My understanding is there’s no way out of this.
Peter H: This would be for bytes which are not valid unicode codepoints.
DG: I’m looking for enough agreement that we can keep designing this for now.
Peter H: is there currently a part of the spec that allows non-utf8?
DG: Witx String type is always UTF-8. We don’t have a byte vector type right now, but there are byte pointer-length pairs. For any string passed to WASI we’ll require it to be UTF-8, and any string coming from WASI would be guaranteed to be UTF-8.
DG: Next topic, case sensitivity in filesystems. If you look at issue 72, the big picture strategy is that WASI doesn’t dictate case sensitivity or insensitivity, or what variation of sensitivity it has (because different filesystems have different rules, whether they case map ascii or different versions of unicode). I think the point of the API is to interact with existing filesystems, and we can’t control those, so this is going to be a portability hazard for WASI. We can make tools that help you detect these pitfalls. Does anyone have ideas of how we can make this better?
DG: Ok, so to use the wasm scary word, it is nondeterministic to use different cases to access the same file in the WASI api. DG: Alright, so the next two issues are about virtual terminal handling. Its a space largely defined by compatibility with existing systems like xterm, unix consoles, and windows consoles. It makes sense to have something simple here. You could imagine unifying terminals with GUIs, modernizing terminals, lots of blue sky stuff here. My sense is that the value here is to find a simple thing that will interoperate with lots of different existing systems.
Sbc: Not just to interoperate with existing systems but to allow existing programs to run on many different systems?
DG: Yes my goal is something like vi on wasi just works. It looks like lots of people don’t use terminfo to figure out colors and just hard-code them. And windows support is tricky because it doesn’t have exactly the same features as unix. In cases where the terminals differ, should we expose WASI programs to the differences between them?
DG: The direction I’m going is we don’t try to do terminfo, the WASI implementation will be responsible for remapping escape sequences and all that. You could imagine having an environment variable TERM=WASI. It would look a lot like xterm, which in turn looks a lot like windows.
DG: On that topic, input sequences are a subfield of escape sequences. Input varies way more than output, across platforms. This made me think we could do a cool radical thing like use unicode symbols. But maybe the simplest thing is, just do what xterm does, and if it needs to be remapped by a WASI implementation to fit another console, thats up to them. But there are interesting questions to still ask here like what set of input sequences to support
DG: Its tempting to say, we can go and design a modern system that doesn’t have all the legacy baggage. But my instinct now is to pull back from that and pick a reasonable set of defaults based on some of the successful modern implementations. If someone wants to design a whole new system, I won’t stand in their way.
SJ: Can you clarify about how vim would work on this?
DG: Vim itself wouldnt have to change. Terminfo is a library that gets linked in, but hopefully we could provide a radically simplified version that works on wasi. Inside the wasi implementation, anything like a TTY would have to filter the output coming from WASI program to neutralize all escape sequences, and then encode in terms of whatever its client expects.
SJ: I would have assumed that stdin/out/err just map to whatever the host of the WASI implementation expects
DG: We’re worried about security, because the terminal interface wasn’t designed to deal with untrusted code running on the terminal. So one example is that you could run a program and you think that it exited, but really its just showing you something that looks like your shell. We want to not allow applications to do things like that, by default. So if you wanted to run vim, you’d have to give it permission to rewrite your entire screen, versus just print ordinary text to stdout, which would be the default. So, controlling cursor position and colors and that would be exposed to wasi as a capability.