
API Unicode

Florian Nücke edited this page Dec 4, 2013 · 2 revisions

Because all strings pass through Java at some point, it can be useful to handle them with Unicode support. Java strings are Unicode strings (stored internally as UTF-16, not UTF-8), while Lua strings are plain byte sequences, so text generally reaches Lua UTF-8 encoded. In particular, screens display UTF-8 strings, so the related GPU functions expect UTF-8 input. Keyboard input, and clipboard contents in particular, will generally be UTF-8 encoded as well.
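The byte-versus-character distinction can be seen directly in Lua. This is a minimal sketch in plain Lua 5.3 using its standard `utf8` library, not OpenComputers' `unicode` API; the `\u{DC}` escape (for "Ü") is used only so the example does not depend on the source file's encoding.

```lua
-- Sketch in plain Lua 5.3 (standard utf8 library, not OpenComputers' unicode API).
-- "\u{DC}" is the escape for "Ü" (U+00DC); Lua stores it as its UTF-8 bytes.
local s = "\u{DC}"

print(#s)                  -- 2: the length operator counts bytes
print(s:byte(1, -1))       -- 195 and 156: the UTF-8 encoding of U+00DC
print(utf8.codepoint(s))   -- 220: the code point decoded from those bytes
print(utf8.char(220) == s) -- true: encoding 220 yields the same two bytes
```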

However, keep in mind that only a subset of Unicode can actually be displayed on screens: specifically, all glyphs defined in code page 437 are supported.

The following functions are provided to allow basic UTF-8 handling:

  • unicode.char(value: number, ...): string
    UTF-8 aware version of string.char. The values are Unicode code points and are not limited to the ASCII range.
  • unicode.len(value: string): number
    UTF-8 aware version of string.len. For example, for "Ümläüt" it returns 6 (characters), whereas string.len returns 9 (bytes).
  • unicode.lower(value: string): string
    UTF-8 aware version of string.lower.
  • unicode.reverse(value: string): string
    UTF-8 aware version of string.reverse. For example, for "Ümläüt" it returns "tüälmÜ", whereas string.reverse reverses the raw bytes, splitting each multi-byte character and producing invalid UTF-8.
  • unicode.sub(value: string, i: number[, j: number]): string
    UTF-8 aware version of string.sub: i and j index characters rather than bytes.
  • unicode.upper(value: string): string
    UTF-8 aware version of string.upper.
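To make the documented behaviour of `len`, `reverse` and `sub` concrete, here is a minimal sketch of equivalent helpers written in plain Lua 5.3 on top of its `utf8` library. The table `U` and its functions are hypothetical illustrations, not the OpenComputers implementation, and they assume the input is valid UTF-8.

```lua
-- Hypothetical helpers in plain Lua 5.3 that mimic the behaviour described
-- for unicode.len, unicode.reverse and unicode.sub. This is NOT the
-- OpenComputers implementation; it assumes the input is valid UTF-8.
local U = {}

-- Count code points instead of bytes.
function U.len(s)
  local n = utf8.len(s)
  assert(n, "invalid UTF-8")
  return n
end

-- Reverse by code point so multi-byte sequences stay intact.
function U.reverse(s)
  local chars = {}
  for _, cp in utf8.codes(s) do
    chars[#chars + 1] = utf8.char(cp)
  end
  local out = {}
  for i = #chars, 1, -1 do
    out[#out + 1] = chars[i]
  end
  return table.concat(out)
end

-- Like string.sub, but i and j count characters (negative values count from the end).
function U.sub(s, i, j)
  local n = U.len(s)
  j = j or -1
  -- Resolve indices the same way string.sub does.
  if i < 0 then i = math.max(n + i + 1, 1) elseif i == 0 then i = 1 end
  if j < 0 then j = n + j + 1 elseif j > n then j = n end
  if i > j then return "" end
  -- Translate character positions into byte positions.
  local first = utf8.offset(s, i)
  local last = utf8.offset(s, j + 1) - 1
  return s:sub(first, last)
end

-- Usage with "Ümläüt", written with escapes.
local word = "\u{DC}ml\u{E4}\u{FC}t"
print(U.len(word), #word)  -- 6  9
print(U.reverse(word))     -- tüälmÜ
print(U.sub(word, 2, 4))   -- mlä
print(U.sub(word, -2))     -- üt
```

The helpers convert character positions to byte positions with `utf8.offset` and then fall back to the byte-based `string.sub`, which is one straightforward way to get character semantics over byte strings.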

These functions are used, for example, when files are opened in text (non-binary) mode; files opened in binary mode use the original string functions instead.
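This sketch, in plain Lua 5.3, illustrates why text and binary data call for different functions: slicing text by bytes can cut a character in half, while for binary data the raw bytes are exactly what should be sliced. The sample strings (including the PNG file signature) are chosen only for illustration and are not taken from OpenComputers' file handling.

```lua
-- Sketch in plain Lua 5.3: byte-based slicing is wrong for text but right for binary data.
local word = "\u{DC}ml\u{E4}\u{FC}t"  -- "Ümläüt": 6 characters, 9 bytes

-- Text: cutting after byte 1 splits "Ü" (bytes 195, 156) and leaves invalid UTF-8.
local broken = string.sub(word, 1, 1)
print(utf8.len(broken))  -- nil  1 (invalid byte sequence at position 1)

-- Text: cutting at the byte offset of the 2nd character keeps "Ü" intact.
local firstChar = string.sub(word, 1, utf8.offset(word, 2) - 1)
print(firstChar == "\u{DC}")  -- true

-- Binary: the raw bytes are the data, so byte-based string.sub is exactly right.
local pngSignature = "\x89PNG\r\n\x1a\n"
print(#pngSignature, pngSignature:sub(2, 4))  -- 8  PNG
```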
