
Fluent and ICU MessageFormat

Zibi Braniecki edited this page Mar 8, 2017 · 13 revisions

Project Fluent shares much of the philosophy that drove the design of MessageFormat. It provides a similar separation of concerns, leaving variant selection logic in the hands of localizers rather than developers; it aims to support rich grammatical features; and it aims to be a generic, technology-independent solution that can be used in many different environments.

At the same time, it does have a few fundamental differences.

As the authors of a proposal that challenges an industry standard, we believe the burden is on us to justify the effort and the cost it imposes on the whole industry.

Below, we list the significant differences between MessageFormat and Fluent:

|                        | MessageFormat                          | Fluent                                        |
|------------------------|----------------------------------------|-----------------------------------------------|
| API design focus       | C++ and Java                           | the Web, JavaScript, Python and Rust          |
| Syntax                 | focus on conciseness                   | focus on readability                          |
| Syntax scope           | per message                            | per list of messages (resource)               |
| Error recovery         | fragile                                | resilient, strong recovery logic              |
| Compound Messages      | none                                   | value + attributes per message                |
| Multi-variant Messages | none                                   | built-in support                              |
| Multiline              | no explicit solution                   | built-in support                              |
| BiDi                   | none                                   | bidirectional isolation                       |
| Intl Formatters        | explicit                               | explicit and implicit                         |
| Formatting Control     | only the developer controls formatting | localizer has limited control over formatting |

API design focus

MessageFormat was designed with Java and C++ as its primary targets. The result is an API that can at times feel awkward on the modern Web, and a file format that was not designed with the same focus on being readable and writable by hand. Fluent aims to be readable without any prior knowledge of the syntax, editable with minimal risk of introducing an error, with a strong recovery model for both syntax and runtime errors, and with an API that fits well into the Web.

Syntax

In our experience, MessageFormat syntax is hard to read and write. It also makes it trivially easy to introduce a syntax error that it cannot recover from, and it does not help the reader distinguish the parts of the syntax that are to be translated from the logic. Finally, MessageFormat syntax bears no similarity to other technologies, so new users cannot draw on their knowledge of other systems when working with the format.

Fluent syntax has been modeled after TOML, with influence from CSS error recovery. It follows the Principle of Least Power, aiming to let users read, edit and write FTL with as little prior knowledge as possible. It also reuses concepts and syntactic choices from other technologies the user may be familiar with, like Excel functions, JavaScript and algebra.

You can read more about the exact differences between MessageFormat and Fluent syntax in the MessageFormat vs Fluent Syntax article.

Learn more about the Fluent syntax in the Syntax Guide and try editing translations in the online editor.

Syntax scope

MessageFormat's choice to define syntax at the level of a single message goes against our design goal of making the format editable by hand. Messages encoded in MessageFormat must be stored in a separate container format like ResourceBundle, JSON or XML, which forces any reader to perform double parsing: they have to keep both formats, the container and the message, in mind. That makes editing by hand impractical and makes every use of MessageFormat depend on strong tooling.

Fluent syntax operates on a per-resource basis, allowing a single syntax to describe a whole list of messages. This design decision enables additional features like sections and comments that are attached to messages or groups of messages. It also enables referencing one message from another within the syntax.
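To make the per-resource idea concrete, here is a minimal JavaScript sketch, assuming a toy subset of the syntax: one `id = value` entry per line, and `{ other-id }` placeables that reference other messages in the same resource. `parseResource` and `formatMessage` are hypothetical helpers written for this illustration; they are not the fluent.js API and do not implement the full FTL grammar.

```javascript
// Toy per-resource parser: every `id = value` line becomes one message.
function parseResource(source) {
  const messages = new Map();
  for (const line of source.split("\n")) {
    const match = /^([a-zA-Z][\w-]*)\s*=\s*(.*)$/.exec(line.trim());
    if (match) {
      messages.set(match[1], match[2]);
    }
  }
  return messages;
}

// Resolve `{ other-id }` references against the same resource.
function formatMessage(messages, id, seen = new Set()) {
  if (seen.has(id)) {
    throw new Error(`Cyclic reference: ${id}`);
  }
  const value = messages.get(id);
  if (value === undefined) {
    return `{${id}}`; // missing reference: show its id instead of failing
  }
  const nextSeen = new Set(seen).add(id);
  return value.replace(/\{\s*([a-zA-Z][\w-]*)\s*\}/g, (_, ref) =>
    formatMessage(messages, ref, nextSeen)
  );
}

const resource = parseResource(`
brand-name = Firefox
welcome = Welcome to { brand-name }
`);

console.log(formatMessage(resource, "welcome")); // "Welcome to Firefox"
```

Because the resource itself is the unit of syntax, the reference to `brand-name` is resolved without any container format wrapped around the messages.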

Error recovery

MessageFormat is strict about what it accepts as input, and that, combined with a complex syntax, makes the system relatively fragile.

When translations come from non-technical people, any error in the input causes the whole message to be rejected, with no logic for recovery. That means any error at any level has a high-cost consequence: a missing string in the product.

On top of that, MessageFormat requires the user to wrap the messages in another format like JSON or XML. Those formats were not designed with error recovery in mind, which means that any syntax error at that level makes the whole resource file unusable.

Fluent, like CSS, follows the Robustness Principle, aiming to be lenient in what it accepts and strict in what it produces.

It degrades gracefully, throwing away only the minimal amount of syntax and attempting to recover from errors on multiple levels.

That means that most errors will result in degraded quality of the message, but never in an empty string in the UI.

The syntax and API are also designed with tooling support in mind, making it easy for tools to aid the localizer while translating and warn against potential errors.
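The difference in failure modes can be sketched in a few lines of JavaScript. The contrast below assumes a JSON container with a single stray comma, and a hypothetical line-based parser, `parseWithRecovery`, that stands in for Fluent's recovery logic (it is far simpler than the real FTL parser): the broken entry is dropped and reported, and every other message survives.

```javascript
// Toy recovering parser: keeps valid entries, records errors for tooling.
function parseWithRecovery(source) {
  const messages = new Map();
  const errors = [];
  source.split("\n").forEach((line, index) => {
    const text = line.trim();
    if (text === "" || text.startsWith("//")) {
      return; // blank lines and comments carry no messages
    }
    const match = /^([a-zA-Z][\w-]*)\s*=\s*(.+)$/.exec(text);
    if (match) {
      messages.set(match[1], match[2]);
    } else {
      errors.push(`line ${index + 1}: cannot parse "${text}"`);
    }
  });
  return { messages, errors };
}

// A JSON container with one trailing comma: the whole file is rejected.
const json = '{ "ok": "Yes", "cancel": "No", }';
let jsonMessages = null;
try {
  jsonMessages = JSON.parse(json);
} catch {
  // The single stray comma makes every message in the file unavailable.
}

// The same kind of mistake (a missing "=") only costs one message here.
const { messages, errors } = parseWithRecovery(`
ok = Yes
cancel No
close = Close
`);

console.log(jsonMessages);        // null
console.log([...messages.keys()]); // [ 'ok', 'close' ]
console.log(errors);               // [ 'line 3: cannot parse "cancel No"' ]
```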

You can read more about the vision of error recovery in Fluent in the Error Handling document.

Compound Messages

MessageFormat follows the single-string-per-message paradigm, which doesn't fit well into modern UI localization. With the rise of complex widgets that carry multiple related messages, it requires introducing multiple separate messages with no semantic relation between them. That makes it hard to provide comments that apply to all of them, and it leads to inconsistent recovery, for example when some messages from the widget error out and others don't.

Fluent introduces the concept of compound messages, which translate well into HTML elements, React components, Web Components and other complex objects. With its syntax design choices, Fluent allows localizers to work with whole translation units and developers to bind one message to a single widget. That, in turn, fits well with the localization workflow, updates and error recovery, and helps localizers keep translations consistent.

Example:

// main.ftl:

// This is a comment applicable to the whole compound message
confirm = Do you want to delete all your emails?
    .ok     = Yes
    .cancel = No

// {React|Web}Component:

<Dialog l10n-id="confirm" />
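A minimal sketch of how a compound message could be bound to a widget as one unit, assuming the message has already been resolved to a plain object with a `value` and named `attributes`. `localizeWidget` and the plain-object widget are hypothetical stand-ins for the fluent-dom or React bindings, not their actual API.

```javascript
// The resolved form of the `confirm` message: one value plus its attributes.
const confirmMessage = {
  value: "Do you want to delete all your emails?",
  attributes: { ok: "Yes", cancel: "No" },
};

// Apply the whole translation unit to one widget in a single step.
function localizeWidget(widget, message) {
  widget.textContent = message.value;
  for (const [name, text] of Object.entries(message.attributes)) {
    widget.labels[name] = text;
  }
  return widget;
}

const dialog = localizeWidget({ textContent: "", labels: {} }, confirmMessage);
console.log(dialog.textContent); // "Do you want to delete all your emails?"
console.log(dialog.labels.ok);   // "Yes"
```

The developer only needs the single identifier `confirm`; which labels exist and how they read stays inside the localized message.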

Multi-variant Messages

Both MessageFormat and Fluent support branching the message value depending on a selector. In Fluent we call this a SelectExpression, and it may look like this:

portfolio-cta = { $gender ->
    [male]   Take a look at his portfolio
    [female] Take a look at her portfolio
   *[neuter] Take a look at their portfolio
}

Fluent also introduces one more way to branch a message value, without a selector:

brand-name = {
   *[nominative]   Firefox
    [genitive]     Firefoksa
    [dative]       Firefoxu
    [accusative]   Firefox
    [locative]     Firefoxu
    [instrumental] Firefoxom
}

hello = Witaj w { brand-name[locative] }

In this scenario the message has multiple variants of its value, but no selector expression. A message like that is not very useful to the developer, since requesting it will return the default (nominative) variant every time.

But this message can be referenced from another message, and in that case the caller may specify which variant it is asking for. This enables constructing consistent and grammatically accurate interpolations.
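Both kinds of branching can be resolved by the same lookup. In this hypothetical JavaScript sketch (the data layout and `resolveVariant` are assumptions, not fluent.js internals), the key comes either from a selector argument such as `$gender` or from an explicit `[variant]` in a reference, and a missing or unknown key falls back to the variant marked with `*`.

```javascript
// Each multi-variant message: its variants plus the key marked with `*`.
const portfolioCta = {
  defaultKey: "neuter",
  variants: new Map([
    ["male", "Take a look at his portfolio"],
    ["female", "Take a look at her portfolio"],
    ["neuter", "Take a look at their portfolio"],
  ]),
};

const brandName = {
  defaultKey: "nominative",
  variants: new Map([
    ["nominative", "Firefox"],
    ["genitive", "Firefoksa"],
    ["dative", "Firefoxu"],
    ["accusative", "Firefox"],
    ["locative", "Firefoxu"],
    ["instrumental", "Firefoxom"],
  ]),
};

// Pick the requested variant, or fall back to the default one.
function resolveVariant(message, key) {
  return message.variants.get(key) ?? message.variants.get(message.defaultKey);
}

// Selector-driven: the key comes from a developer-provided argument.
console.log(resolveVariant(portfolioCta, "female")); // "Take a look at her portfolio"
// Requested directly without a key: the default variant is returned.
console.log(resolveVariant(brandName, undefined)); // "Firefox"
// Reference-driven: another message asks for a specific grammatical case.
console.log(resolveVariant(brandName, "genitive")); // "Firefoksa"
```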

Multiline / DOM Fragments

MessageFormat's support for multiline values is complex and requires character escaping. That makes multiline values impractical, increases reliance on tooling and makes the syntax more prone to errors.

Fluent has first-class support for multiline values, which is particularly important for cases like localizing whole DOM fragments. That plays well with a fluent-dom feature called DOM Overlays, which enables developers and localizers to easily work on complex semantic fragments.
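As an illustration of why indentation-based multiline values avoid escaping, here is a toy parser in JavaScript. It assumes a simplified rule (lines indented under `id =` continue that message's value, and a blank or unindented line ends it), which is an assumption for this sketch rather than the exact FTL grammar; `parseMultiline` is hypothetical.

```javascript
// Toy rule: indented lines after `id =` are continuation lines of that value.
function parseMultiline(source) {
  const messages = new Map();
  let currentId = null;
  for (const line of source.split("\n")) {
    const start = /^([a-zA-Z][\w-]*)\s*=\s*(.*)$/.exec(line);
    if (start) {
      currentId = start[1];
      messages.set(currentId, start[2] === "" ? [] : [start[2]]);
    } else if (currentId !== null && /^\s+\S/.test(line)) {
      messages.get(currentId).push(line.trim());
    } else {
      currentId = null; // a blank or unindented line ends the value
    }
  }
  // Join the collected lines of each message with newlines.
  return new Map([...messages].map(([id, lines]) => [id, lines.join("\n")]));
}

const fragment = parseMultiline(`
intro =
    <h1>Welcome</h1>
    <p>Read the "quick start" guide.</p>
`);

// Quotes and markup pass through verbatim; nothing had to be escaped.
console.log(fragment.get("intro"));
```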

BiDi

MessageFormat was not designed with bidirectional text support in mind. It does not support scenarios where part of the string, for example an external variable, has a different direction than the string itself.

Fluent wraps placeables in the Unicode isolation control characters FSI (U+2068) and PDI (U+2069), allowing for high-quality bidirectional localization.
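A minimal sketch of placeable isolation: each interpolated value is wrapped in FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069), so the Unicode Bidirectional Algorithm resolves its direction independently of the surrounding text. `interpolate` is a hypothetical helper that only handles `{ $name }` placeables; it is not the fluent.js API.

```javascript
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Replace `{ $name }` placeables, isolating each inserted value.
function interpolate(template, args) {
  return template.replace(/\{\s*\$([a-zA-Z][\w-]*)\s*\}/g, (_, name) =>
    Object.prototype.hasOwnProperty.call(args, name)
      ? FSI + String(args[name]) + PDI
      : `{$${name}}` // unknown variable: show its name instead of failing
  );
}

// A right-to-left name inside a left-to-right sentence.
const greeting = interpolate("Hello, { $userName }!", { userName: "ليلى" });
console.log(JSON.stringify(greeting)); // the name is enclosed by U+2068 ... U+2069
```

The isolates are invisible when rendered; they only stop the direction of the inserted name from reordering the neighboring punctuation.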

You can read more about Fluent's approach to BiDi in the BiDi in Fluent article.

Intl Formatters

MessageFormat provides support for multiple Intl formatters like NUMBER, DATE and TIME, but the syntax for using formatters is unfamiliar to new users and inconsistent with the rest of the format. It is also possible to add more formatters, but the API for that is not fully specified and differs per implementation, and ICU discourages using custom formatters. The cumbersome syntax for using formatters increases the burden on the reader and plays a substantial role in making the format hard to edit by hand.

Fluent not only supports more formatters and has a well-designed API for adding functions, but also has clear logic for applying formatters implicitly, which increases the readability of the format and makes it easier to work with by hand.

The syntax for positional and keyword arguments is similar to Excel's, which allows users to apply their prior knowledge when working with functions.
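A hypothetical JavaScript sketch of implicit formatting: when an argument is a number or a date, it is passed through `Intl.NumberFormat` or `Intl.DateTimeFormat` for the message's locale even if the localizer wrote a bare `{ $count }`, while an explicit function call would supply options. `formatArgument` and its `options` parameter are assumptions used to model this; they are not Fluent's actual function API.

```javascript
// Format an argument according to its type; `options` models an explicit call.
function formatArgument(locale, value, options = {}) {
  if (typeof value === "number") {
    return new Intl.NumberFormat(locale, options).format(value);
  }
  if (value instanceof Date) {
    return new Intl.DateTimeFormat(locale, options).format(value);
  }
  return String(value); // strings are inserted as-is
}

// Implicit: a bare `{ $count }` still gets locale-aware formatting.
console.log(formatArgument("en-US", 1234.5)); // "1,234.5"
console.log(formatArgument("de-DE", 1234.5)); // "1.234,5"
// Explicit: options supplied by a function call in the message.
console.log(formatArgument("en-US", 0.5, { style: "percent" })); // "50%"
```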

You can read more about Fluent's relationship with standards in the Fluent and Standards article.

Formatting Control

MessageFormat relies on the developer to make all decisions about formatting.

Fluent gives localizers limited control over a selected subset of formatter arguments. For example, the developer decides which currency the passed number is in, but the localizer can override the default of whether the currency is displayed as a symbol or as a currency code.

Below are examples of the three models of control over formatting:

Localizer driven

// main.js:
const data = {
  date: new Date() // today's date; the localizer picks the style in main.ftl
}

// main.ftl:
today = Today is { DATETIME($date, style: "long") }

Developer driven

// main.js:
const data = {
  // `value` is the number to display; the developer fixes the currency
  amount: Fluent.NumberArgument(value, { currency: "USD" })
}

// main.ftl:
amount-owed = You owe { $amount }

Mixed

// main.js:
const data = {
  // the developer fixes the currency; the localizer may adjust its display
  amount: Fluent.NumberArgument(value, { currency: "USD" })
}

// main.ftl:
amount-owed = You owe { NUMBER($amount, currencyDisplay: "code", useGrouping: "false") }

In the last case, Fluent controls which parameters can be overridden by the localizer.
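The mixed model can be sketched as an allowlist merge: the developer's options (here the currency) always stand, and the localizer can only set options the allowlist permits. `mergeFormatOptions` and the allowlist contents are hypothetical, chosen for illustration. Note that this sketch uses the boolean `false` for `useGrouping`: `Intl.NumberFormat` does not interpret the string `"false"` as false, so a real implementation would need to convert FTL string values such as `useGrouping: "false"` first.

```javascript
// Options a localizer may override (assumed list, for illustration only).
const LOCALIZER_OVERRIDABLE = new Set(["currencyDisplay", "useGrouping"]);

function mergeFormatOptions(developerOptions, localizerOptions) {
  const merged = { ...developerOptions };
  for (const [key, value] of Object.entries(localizerOptions)) {
    if (LOCALIZER_OVERRIDABLE.has(key)) {
      merged[key] = value;
    }
    // Any other key (e.g. `currency`) is ignored: it stays the developer's call.
  }
  return merged;
}

const developerOptions = { style: "currency", currency: "USD" };
const localizerOptions = { currencyDisplay: "code", useGrouping: false, currency: "EUR" };

const options = mergeFormatOptions(developerOptions, localizerOptions);
const text = new Intl.NumberFormat("en-US", options).format(1234.5);
console.log(options.currency); // "USD"
console.log(text); // shows the code "USD" and "1234.50" without grouping
```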

Summary

We believe that both Fluent's API and its syntax represent a substantial improvement over MessageFormat and justify a separate proposal for standardizing a localization framework.