Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-22907 MF2: Finish updating spec tests and implement required test functions #3216

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

catamorphism
Copy link
Contributor

@catamorphism catamorphism commented Sep 25, 2024

This PR does everything that's needed in order to make the pattern-selection spec tests work.

This includes implementing the test-only standard functions, test:function, test:format and test:select, specified in the test README file in the spec.

This was tricky because the select-only test:select function is used in a composable way in these tests, which isn't something that appeared in previous tests. Previously, the code assumed that a named result of a selector could be used in a .match, but not used as an operand to another function.

So in this PR, I implement a limited form of what will eventually (see draft PR #3228, which is still in design review) be a more coherent mechanism for function composition. I implemented just enough of it to make these tests work.

I added an InternalValue type used in the formatter, which is distinct from the FormattedPlaceholder type that function handlers return. This makes it easier to guarantee that fallback (error) values are not passed to functions. An InternalValue includes a function name, operand, and options. Since at the time when a resolved value is named, it's unknown whether it will be used for formatting or selection, InternalValue has both a formatting and a selection method, which can be called when the resolved value is eventually consumed by either a .match or a pattern.

I had to modify the depstest.py script for the same reasons as before; now, code that manipulates variants is also found in messageformat2_evaluation.cpp, but as before, it's never used in a way that could throw an exception.

I'll be happy to explain more and/or add comments to the code as needed.

Checklist

  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22907
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@catamorphism catamorphism force-pushed the spec-updates-plus-pattern-selection branch from 2680b67 to c3902e3 Compare September 25, 2024 23:49
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/messageformat2_function_registry_internal.h is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@catamorphism catamorphism force-pushed the spec-updates-plus-pattern-selection branch from c3902e3 to 5f55d7c Compare September 25, 2024 23:55
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/messageformat2.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@catamorphism catamorphism force-pushed the spec-updates-plus-pattern-selection branch from 5f55d7c to 38500d3 Compare September 25, 2024 23:58
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/messageformat2.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@catamorphism catamorphism changed the title DRAFT: ICU-22908 MF2: Finish updating spec tests and implement required test functions DRAFT: ICU-22907 MF2: Finish updating spec tests and implement required test functions Sep 27, 2024
@catamorphism catamorphism force-pushed the spec-updates-plus-pattern-selection branch from 51d2915 to 43b0280 Compare November 14, 2024 18:06
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/messageformat2_checker.cpp is different
  • icu4c/source/i18n/messageformat2_checker.h is different
  • icu4c/source/i18n/messageformat2_data_model.cpp is different
  • icu4c/source/i18n/messageformat2_evaluation.cpp is different
  • icu4c/source/i18n/messageformat2_evaluation.h is different
  • icu4c/source/i18n/messageformat2_formatter.cpp is different
  • icu4c/source/i18n/messageformat2_function_registry_internal.h is different
  • icu4c/source/i18n/messageformat2_function_registry.cpp is different
  • icu4c/source/i18n/messageformat2_parser.cpp is different
  • icu4c/source/i18n/messageformat2_parser.h is different
  • icu4c/source/i18n/messageformat2_serializer.cpp is different
  • icu4c/source/i18n/messageformat2.cpp is different
  • icu4c/source/i18n/unicode/messageformat2_data_model.h is different
  • icu4c/source/i18n/unicode/messageformat2_formattable.h is different
  • icu4c/source/i18n/unicode/messageformat2.h is different
  • icu4c/source/test/intltest/messageformat2test_custom.cpp is different
  • icu4c/source/test/intltest/messageformat2test_read_json.cpp is different
  • icu4c/source/test/intltest/messageformat2test_utils.h is different
  • icu4c/source/test/intltest/messageformat2test.cpp is different
  • testdata/message2/normalization.json is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@catamorphism catamorphism changed the title DRAFT: ICU-22907 MF2: Finish updating spec tests and implement required test functions ICU-22907 MF2: Finish updating spec tests and implement required test functions Dec 9, 2024
@catamorphism catamorphism force-pushed the spec-updates-plus-pattern-selection branch from 43b0280 to 076ee46 Compare December 9, 2024 18:24
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/messageformat2_checker.cpp is no longer changed in the branch
  • icu4c/source/i18n/messageformat2_checker.h is no longer changed in the branch
  • icu4c/source/i18n/messageformat2_data_model.cpp is no longer changed in the branch
  • icu4c/source/i18n/messageformat2_parser.cpp is no longer changed in the branch
  • icu4c/source/i18n/messageformat2_parser.h is no longer changed in the branch
  • icu4c/source/i18n/messageformat2_serializer.cpp is no longer changed in the branch
  • icu4c/source/i18n/unicode/messageformat2_data_model.h is no longer changed in the branch
  • icu4c/source/test/intltest/messageformat2test_custom.cpp is no longer changed in the branch
  • icu4c/source/test/intltest/messageformat2test_read_json.cpp is no longer changed in the branch
  • icu4c/source/test/intltest/messageformat2test_utils.h is no longer changed in the branch
  • icu4c/source/test/intltest/messageformat2test.cpp is no longer changed in the branch
  • testdata/message2/alias-selector-annotations.json is no longer changed in the branch
  • testdata/message2/icu-test-selectors.json is no longer changed in the branch
  • testdata/message2/matches-whitespace.json is no longer changed in the branch
  • testdata/message2/more-data-model-errors.json is no longer changed in the branch
  • testdata/message2/normalization.json is no longer changed in the branch
  • testdata/message2/resolution-errors.json is no longer changed in the branch
  • testdata/message2/runtime-errors.json is no longer changed in the branch
  • testdata/message2/spec/data-model-errors.json is no longer changed in the branch
  • testdata/message2/spec/functions/date.json is no longer changed in the branch
  • testdata/message2/spec/functions/datetime.json is no longer changed in the branch
  • testdata/message2/spec/functions/integer.json is no longer changed in the branch
  • testdata/message2/spec/functions/number.json is no longer changed in the branch
  • testdata/message2/spec/functions/string.json is no longer changed in the branch
  • testdata/message2/spec/functions/time.json is no longer changed in the branch
  • testdata/message2/spec/pattern-selection.json is no longer changed in the branch
  • testdata/message2/spec/syntax-errors.json is no longer changed in the branch
  • testdata/message2/spec/syntax.json is no longer changed in the branch
  • testdata/message2/valid-tests.json is no longer changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@catamorphism catamorphism force-pushed the spec-updates-plus-pattern-selection branch from 076ee46 to 31abd17 Compare December 9, 2024 18:24
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@catamorphism catamorphism marked this pull request as ready for review December 9, 2024 18:25
@catamorphism catamorphism requested a review from srl295 December 9, 2024 18:25
… functions

Implement :test:format, :test:select, and :test:function, which are
required by the new `pattern-selection.json` tests.

Change the internal value representation in the formatter in order to
support some of the test cases (binding the results of selectors to a
variable).
@catamorphism catamorphism force-pushed the spec-updates-plus-pattern-selection branch from 31abd17 to 99fe24c Compare December 9, 2024 18:31
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/messageformat2.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@richgillam
Copy link
Contributor

@catamorphism and @srl295 What's going on with this one? Do we need new reviewers? Is this still happening?

@catamorphism
Copy link
Contributor Author

catamorphism commented Jan 20, 2025

@catamorphism and @srl295 What's going on with this one? Do we need new reviewers? Is this still happening?

@richgillam I'm currently on medical leave, hopefully returning Feb. 24. In the meantime, it could be reviewed. I hope it's still happening, as it's necessary for spec compliance!

Copy link
Member

@srl295 srl295 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks reasonable to me— there are suggestions for improvement that are minor - performance, maintainability. I would recommend this is mergable with followups.

}
// 9. If the fails option is set, then
Formattable failsOpt;
if (options.getFunctionOption(UnicodeString("fails"), failsOpt)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the expensive constructor for UnicodeString, using default char* constructor.
I think a followon would be good to make these statics or something.

UnicodeString failsString = failsOpt.getString(status);
if (U_SUCCESS(status)) {
// 9i. If its value resolves to the string 'always', then
if (failsString == u"always") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. won't this construct a string every time?
  2. move these strings into constants?

@@ -211,6 +211,60 @@ namespace message2 {

TextSelector(const Locale& l) : locale(l) {}
};

// See https://github.com/unicode-org/message-format-wg/blob/main/test/README.md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For API (though this is internal?) this should perhaps go to something more permanent in the ICU user guide?

// TODO doc comments
// Encapsulates either a formatted string or formatted number;
// more output types could be added in the future.
// Returns a new FunctionOptions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Returns a new FunctionOptions
/**
* The original FunctionOptions isn't usable after this call.
* @returns A new, merged FunctionOptions.
*/

is 'other' usable after this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like neither are usable

Comment on lines +1468 to +1473
// 3i. The character . U+002E Full Stop.
result += u".";
// 3ii. The single decimal digit character representing the value
// floor((abs(Input) - floor(abs(Input))) * 10)
int32_t val = floor((abs(input) - floor(abs(input)) * 10));
result += digitToChar(val, status);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like it should be a utility function w/ other ICU number formatting. COuld be future.

}

ResolvedSelector::ResolvedSelector(FormattedPlaceholder&& val) : value(std::move(val)) {}
// Options in `this` take precedence
// `this` can't be used after mergeOptions is called
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added a comment, this might be good to note in the function declaration

} else {
formatSelectors(context, *globalEnv, status, result);
// Check for errors/warnings -- if so, then the result of pattern selection is the fallback value
// See https://github.com/unicode-org/message-format-wg/blob/main/spec/formatting.md#pattern-selection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should link to tr35 - maybe current spec

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// See https://github.com/unicode-org/message-format-wg/blob/main/spec/formatting.md#pattern-selection
// See https://www.unicode.org/reports/tr35/tr35-messageFormat.html#pattern-selection

@@ -465,8 +469,42 @@ static double parseNumberLiteral(const FormattedPlaceholder& input, UErrorCode&
return result;
}

static UChar32 digitToChar(int32_t val, UErrorCode errorCode) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be merged with or use other utility function in ICU

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants