Request: wasi-icu #590

Open
oovm opened this issue Mar 17, 2024 · 6 comments

@oovm

oovm commented Mar 17, 2024

I ran into difficulties writing features such as string iterators under utf8 and wtf16, parsers with unicode character properties, and time-related formatters, and I realized that I need standard interfaces for internationalization.

Can WASI standardize interfaces for International Components for Unicode (ICU)?

Advantages

A host implementation can greatly reduce the size of the wasm binary: the guest does not need to embed a huge dictionary, there is no data loading time, and performance is better. The host can also keep up with the standard as it evolves, moving to a new version of ICU without the distributed binary being updated.

These advantages are not available with sdk embedding or guest embedding.

Moreover, some rules of the ICU standard are very complex, and non-experts will implement them incorrectly. It is costly for every language to implement them individually.

Disadvantages

This is not quite in line with WASI read as the WebAssembly System Interface, but it is in line with WASI read as WebAssembly Standard Interfaces.

Related

@kaizhu256

+1

it would allow sqlite to easily extend regexp-replace in webassembly (https://github.com/sqlite/sqlite/blob/master/ext/icu/icu.c)

@sunfishcode
Member

I wonder how feasible it would be to adapt ICU4X's language-bindings system to (semi-)automatically produce a Wit API.

@devsnek
Member

devsnek commented Mar 18, 2024

@sffc i think we discussed something like this at one point... do you think diplomat could be up to the task?

@sffc

sffc commented Mar 18, 2024

It makes sense to have bindings to system libraries in order to reduce binary size. ICU4X can of course serve as a polyfill when a platform API is unavailable. For example, Android, Windows, and iOS (and maybe others) have standard APIs that can be wrapped for a subset of i18n functionality without adding any additional dependencies.

On the specifics:

string iterators under utf8 and wtf16

Many modern programming languages give you this for free. In Rust, UTF-8 iteration is built in, and for UTF-16 you can use the lightweight utf16_iter crate. In C++, I like to use the macros in ICU4C utf8.h or utf16.h, which do not require any runtime library dependencies (you can include them at build time only).
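To illustrate the Rust case, here is a minimal sketch that uses only the standard library. It swaps in `std::char::decode_utf16` rather than the utf16_iter crate so that the example has no dependencies; the function names `utf8_scalars` and `utf16_scalars_lossy` are made up for this example. Unpaired surrogates, which WTF-16 input may contain, are replaced with U+FFFD here, which is only one of several possible policies.

```rust
/// Collect the Unicode scalar values of a UTF-8 string.
/// `str::chars` is the built-in UTF-8 iterator.
fn utf8_scalars(s: &str) -> Vec<char> {
    s.chars().collect()
}

/// Decode potentially ill-formed UTF-16 (for example WTF-16 input).
/// Each unpaired surrogate is replaced with U+FFFD.
fn utf16_scalars_lossy(units: &[u16]) -> Vec<char> {
    std::char::decode_utf16(units.iter().copied())
        .map(|unit| unit.unwrap_or(std::char::REPLACEMENT_CHARACTER))
        .collect()
}

fn main() {
    let text = "h\u{e9}llo \u{1F30D}";
    println!("{:?}", utf8_scalars(text));

    // Round-trip through UTF-16, then append an unpaired high surrogate.
    let mut units: Vec<u16> = text.encode_utf16().collect();
    units.push(0xD800);
    println!("{:?}", utf16_scalars_lossy(&units));
}
```

Neither function needs Unicode data tables, which is why this part of the request does not depend on a host-provided ICU.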

parsers with unicode character properties

What are you trying to parse? If you're talking about regular expressions, that is an interesting topic requiring further evaluation, because you'll need not only ICU4* for the properties but also a regex engine.
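For simple parsers, a handful of Unicode properties are already exposed by the Rust standard library. The sketch below is a hypothetical tokenizer (the `Token` type and the `tokenize` and `take_run` functions are invented for this example) built on `char::is_alphabetic`, `char::is_numeric`, and `char::is_whitespace`. Anything beyond these few properties, such as script or general-category queries, would still need a library like ICU4X or a host interface.

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, PartialEq)]
enum Token {
    Word(String),
    Number(String),
    Other(char),
}

/// Consume the longest prefix whose characters all satisfy `pred`.
fn take_run(chars: &mut Peekable<Chars<'_>>, pred: fn(char) -> bool) -> String {
    let mut run = String::new();
    while let Some(&c) = chars.peek() {
        if !pred(c) {
            break;
        }
        run.push(c);
        chars.next();
    }
    run
}

/// Split input into words, numbers, and single other characters,
/// classifying characters by Unicode property rather than ASCII range.
fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // skip whitespace between tokens
        } else if c.is_alphabetic() {
            tokens.push(Token::Word(take_run(&mut chars, char::is_alphabetic)));
        } else if c.is_numeric() {
            tokens.push(Token::Number(take_run(&mut chars, char::is_numeric)));
        } else {
            tokens.push(Token::Other(c));
            chars.next();
        }
    }
    tokens
}

fn main() {
    // Latin with an accent, CJK ideographs, Arabic-Indic digits, punctuation.
    let input = "caf\u{e9} \u{65e5}\u{672c} \u{664}\u{662}!";
    println!("{:?}", tokenize(input));
}
```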

time-related formatters

This is of course ICU4*'s core competency.

Other features not mentioned here are Collator and Normalizer. These are smaller, data-heavy APIs that might be good starting points.

When doing API design, I encourage using the ECMA-402 API surface.
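As a rough illustration of what a guest might see, here is a sketch of a collation interface shaped after the `compare` method of `Intl.Collator` in ECMA-402, with a fallback for when no host implementation is available. Everything here is hypothetical: the `Collator` trait, `CodePointCollator`, and `select_collator` are invented names, not part of WASI, ICU4X, or any existing proposal. The fallback compares by Unicode scalar value and is not locale-aware; a real polyfill would use ICU4X instead.

```rust
use std::cmp::Ordering;

/// Hypothetical guest-side collation interface, shaped after
/// `Intl.Collator.prototype.compare` from ECMA-402.
trait Collator {
    fn compare(&self, a: &str, b: &str) -> Ordering;
}

/// Stand-in fallback that orders strings by Unicode scalar value.
/// This is NOT locale-aware collation; it only marks where a real
/// polyfill would plug in.
struct CodePointCollator;

impl Collator for CodePointCollator {
    fn compare(&self, a: &str, b: &str) -> Ordering {
        a.chars().cmp(b.chars())
    }
}

/// Prefer a host-provided collator when one exists, otherwise use
/// the bundled fallback.
fn select_collator(host: Option<Box<dyn Collator>>) -> Box<dyn Collator> {
    match host {
        Some(collator) => collator,
        None => Box::new(CodePointCollator),
    }
}

fn main() {
    // No host implementation is available in this sketch.
    let collator = select_collator(None);
    let mut words = vec!["pear", "apple", "fig"];
    words.sort_by(|a, b| collator.compare(a, b));
    println!("{:?}", words);
}
```

Keeping the guest-facing surface this small means the same calling code works whether the comparison is backed by a platform API or by data embedded in the binary.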

@sunfishcode
Member

Thanks! The main thing this needs now is for some people to volunteer to be champions, who can put together an API proposal, and ideally also a prototype implementation.

@guybedford
Contributor

Just to note: having a standard interface here, available either as a component or as a host API, would be directly useful for JS component runtimes like ComponentizeJS and Fastly's JS Compute Runtime to support ECMA-402.
