ICU-22984 Generate the UAX29 monkeys #3304

Draft
wants to merge 31 commits into base: main

31 commits
23d9a3e
ICU-22986 GL takes CM
eggrobin Dec 10, 2024
ea8748c
Regenerate line.brk (@markusicu to flip the bytes)
eggrobin Dec 10, 2024
b644118
Fix the old monkeys from Java who are still hardcoded
eggrobin Dec 10, 2024
46c29d1
meow
eggrobin Dec 10, 2024
6e992c3
Something that compiles at last
eggrobin Dec 12, 2024
e3a8136
Update the tailorings too
eggrobin Dec 12, 2024
6f6fdde
Update .brk files (to be flipped by @markusicu)
eggrobin Dec 12, 2024
ed3d6d5
Somehow it compiles
eggrobin Dec 12, 2024
71eb398
It seems to work
eggrobin Dec 12, 2024
9befb32
Dumber escaping
eggrobin Dec 12, 2024
d428c8d
🍎.xml
eggrobin Dec 12, 2024
938ef97
(?!.) ftw
eggrobin Dec 12, 2024
c165ab8
Merge branch '22986' into surili
eggrobin Dec 13, 2024
fe75c00
Greedier regices, prevent remap rules from creating surrogate pairs
eggrobin Dec 13, 2024
24ec66f
I’ll be back
eggrobin Dec 13, 2024
a954b16
Merge branch '22986' into surili
eggrobin Dec 13, 2024
29351ce
monkeys
eggrobin Dec 13, 2024
3decf2c
🐪
eggrobin Dec 13, 2024
698633d
Merge branch '22986' into surili
eggrobin Dec 13, 2024
1301f94
Port the surrogate assembly preventer
eggrobin Dec 13, 2024
eb6c9b1
sot last
eggrobin Dec 13, 2024
b102ed7
Joys of 4-space indent
eggrobin Dec 13, 2024
8f78834
Merge branch '22986' into surili
eggrobin Dec 13, 2024
e0332c7
charred monkey
eggrobin Dec 13, 2024
09df098
No exclusion
eggrobin Dec 13, 2024
be3d042
strongly worded monkeys
eggrobin Dec 13, 2024
f0ec9ee
Sentenced monkeys, factor dictionarySet_, known issues on 20 year old…
eggrobin Dec 13, 2024
cfe6d4a
I hallucinated that bug, apparently.
eggrobin Dec 13, 2024
71098c6
Apparently I hallucinated that bug
eggrobin Dec 13, 2024
0ac10b1
Merge branch '22986' into surili
eggrobin Dec 13, 2024
e504948
Merge branch 'surili' into 𒅗𒇻𒋗𒉡
eggrobin Dec 13, 2024
2 changes: 1 addition & 1 deletion icu4c/source/data/brkitr/rules/line.txt
@@ -297,7 +297,7 @@ $LB20NonBreaks = [$LB18NonBreaks - $CB];
# and then to default UAX #14 behaviour (UTC-179-C32).
#
^($HY | $HH) $CM* $ALPlus;
-$GL ($HY | $HH) $CM* $ALPlus;
+$GL $CM* ($HY | $HH) $CM* $ALPlus;
# Non-breaking CB from LB8a:
$CB $CM* $ZWJ ($HY | $HH) $CM* $ALPlus;
# Non-breaking SP from LB14:
2 changes: 1 addition & 1 deletion icu4c/source/data/brkitr/rules/line_cj.txt
@@ -298,7 +298,7 @@ $LB20NonBreaks = [$LB18NonBreaks - $CB];
# and then to default UAX #14 behaviour (UTC-179-C32).
#
^($HY | $HH) $CM* $ALPlus;
-$GL ($HY | $HH) $CM* $ALPlus;
+$GL $CM* ($HY | $HH) $CM* $ALPlus;
# Non-breaking CB from LB8a:
$CB $CM* $ZWJ ($HY | $HH) $CM* $ALPlus;
# Non-breaking SP from LB14:
2 changes: 1 addition & 1 deletion icu4c/source/data/brkitr/rules/line_loose.txt
@@ -306,7 +306,7 @@ $LB20NonBreaks = [$LB18NonBreaks - $CB];
# and then to default UAX #14 behaviour (UTC-179-C32).
#
^($HY | $HH) $CM* $ALPlus;
-$GL ($HY | $HH) $CM* $ALPlus;
+$GL $CM* ($HY | $HH) $CM* $ALPlus;
# Non-breaking CB from LB8a:
$CB $CM* $ZWJ ($HY | $HH) $CM* $ALPlus;
# Non-breaking SP from LB14:
2 changes: 1 addition & 1 deletion icu4c/source/data/brkitr/rules/line_loose_cj.txt
@@ -318,7 +318,7 @@ $LB20NonBreaks = [$LB18NonBreaks - $CB];
# and then to default UAX #14 behaviour (UTC-179-C32).
#
^($HY | $HH) $CM* $ALPlus;
-$GL ($HY | $HH) $CM* $ALPlus;
+$GL $CM* ($HY | $HH) $CM* $ALPlus;
# Non-breaking CB from LB8a:
$CB $CM* $ZWJ ($HY | $HH) $CM* $ALPlus;
# Non-breaking SP from LB14:
2 changes: 1 addition & 1 deletion icu4c/source/data/brkitr/rules/line_loose_phrase_cj.txt
@@ -331,7 +331,7 @@ $LB20NonBreaks = [$LB18NonBreaks - $CB];
# and then to default UAX #14 behaviour (UTC-179-C32).
#
^($HY | $HH) $CM* $ALPlus;
-$GL ($HY | $HH) $CM* $ALPlus;
+$GL $CM* ($HY | $HH) $CM* $ALPlus;
# Non-breaking CB from LB8a:
$CB $CM* $ZWJ ($HY | $HH) $CM* $ALPlus;
# Non-breaking SP from LB14:
2 changes: 1 addition & 1 deletion icu4c/source/data/brkitr/rules/line_normal.txt
@@ -299,7 +299,7 @@ $LB20NonBreaks = [$LB18NonBreaks - $CB];
# and then to default UAX #14 behaviour (UTC-179-C32).
#
^($HY | $HH) $CM* $ALPlus;
-$GL ($HY | $HH) $CM* $ALPlus;
+$GL $CM* ($HY | $HH) $CM* $ALPlus;
# Non-breaking CB from LB8a:
$CB $CM* $ZWJ ($HY | $HH) $CM* $ALPlus;
# Non-breaking SP from LB14:
2 changes: 1 addition & 1 deletion icu4c/source/data/brkitr/rules/line_normal_cj.txt
@@ -304,7 +304,7 @@ $LB20NonBreaks = [$LB18NonBreaks - $CB];
# and then to default UAX #14 behaviour (UTC-179-C32).
#
^($HY | $HH) $CM* $ALPlus;
-$GL ($HY | $HH) $CM* $ALPlus;
+$GL $CM* ($HY | $HH) $CM* $ALPlus;
# Non-breaking CB from LB8a:
$CB $CM* $ZWJ ($HY | $HH) $CM* $ALPlus;
# Non-breaking SP from LB14:
2 changes: 1 addition & 1 deletion icu4c/source/data/brkitr/rules/line_normal_phrase_cj.txt
@@ -317,7 +317,7 @@ $LB20NonBreaks = [$LB18NonBreaks - $CB];
# and then to default UAX #14 behaviour (UTC-179-C32).
#
^($HY | $HH) $CM* $ALPlus;
-$GL ($HY | $HH) $CM* $ALPlus;
+$GL $CM* ($HY | $HH) $CM* $ALPlus;
# Non-breaking CB from LB8a:
$CB $CM* $ZWJ ($HY | $HH) $CM* $ALPlus;
# Non-breaking SP from LB14:
2 changes: 1 addition & 1 deletion icu4c/source/data/brkitr/rules/line_phrase_cj.txt
@@ -310,7 +310,7 @@ $LB20NonBreaks = [$LB18NonBreaks - $CB];
# and then to default UAX #14 behaviour (UTC-179-C32).
#
^($HY | $HH) $CM* $ALPlus;
-$GL ($HY | $HH) $CM* $ALPlus;
+$GL $CM* ($HY | $HH) $CM* $ALPlus;
# Non-breaking CB from LB8a:
$CB $CM* $ZWJ ($HY | $HH) $CM* $ALPlus;
# Non-breaking SP from LB14:
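
Every rules file listed here has the same two forms of the `$GL` rule: `$GL ($HY | $HH) $CM* $ALPlus;` and `$GL $CM* ($HY | $HH) $CM* $ALPlus;`. To illustrate how they differ, here is a toy Python model. It is not the ICU rule compiler or break engine. It assumes, for illustration only, that the text has already been reduced to a list of line break class names, and it uses plain `AL` as a stand-in for `$ALPlus`. The names `OLD_RULE`, `NEW_RULE`, and `matches` are hypothetical. As I read the rules, the form with `$GL $CM*` also matches when combining marks sit between the `$GL` and the hyphen, which the other form does not.

```python
import re

# Toy model only: each character is represented by its line break class
# name, and a sequence is the names joined by single spaces.
# "AL" is an assumed stand-in for the rule files' $ALPlus set.

# $GL ($HY | $HH) $CM* $ALPlus;
OLD_RULE = re.compile(r"GL (?:HY|HH) (?:CM )*AL")

# $GL $CM* ($HY | $HH) $CM* $ALPlus;
NEW_RULE = re.compile(r"GL (?:CM )*(?:HY|HH) (?:CM )*AL")


def matches(rule, classes):
    """Return True if the whole class sequence matches the toy rule."""
    return rule.fullmatch(" ".join(classes)) is not None


if __name__ == "__main__":
    plain = ["GL", "HY", "AL"]
    with_mark = ["GL", "CM", "HY", "AL"]

    print(matches(OLD_RULE, plain))      # True
    print(matches(NEW_RULE, plain))      # True
    print(matches(OLD_RULE, with_mark))  # False
    print(matches(NEW_RULE, with_mark))  # True
```

The sketch checks only whether a single rule matches a whole sequence; the real rules interact with rule chaining and the rest of the rule set, which this model ignores.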