Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identify and translate missing plural keys #596

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
4 changes: 4 additions & 0 deletions config/locales/en.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ en:
nostdin: Do not read from stdin
out_format: 'Output format: %{valid_text}'
pattern_router: 'Use pattern router: keys moved per config data.write'
skip_interpolation: Skip translating strings that are straight interpolations (eg. "%{comment}")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than adding an option, perhaps we should always simply copy such keys verbatim? There is never a reason to translate them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be preferential in our use case but I didn't want to assume...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we don't even copy them, we have Rails configured with fallback which then just works... Not sure that's a safe assumption either though.

Copy link
Owner

@glebm glebm Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is never any reason to pass interpolation-only values through machine translation, so let's copy them verbatim?

Copy link
Author

@markedmondson markedmondson Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me! :)

Edit: Although in our case we don't want to copy them either, I'd prefer to skip and leave them to fallback. Should I leave a mechanism in place for that?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, makes sense to have that as an option.

strict: >-
Avoid inferring dynamic key usages such as t("cats.#{cat}.name"). Takes precedence over
the config setting if set.
Expand Down Expand Up @@ -112,8 +113,11 @@ en:
missing:
details_title: Value in other locales or source
none: No translations are missing.
plural: Missing plural keys
openai_translate:
errors:
invalid_json: invalid JSON returned from backend.
invalid_size: 'Invalid size returned, expected: %{expected}, received: %{actual}.'
no_api_key: >-
Set OpenAI API key via OPENAI_API_KEY environment variable or translation.openai_api_key
in config/i18n-tasks.yml. Get the key at https://openai.com/.
Expand Down
5 changes: 5 additions & 0 deletions config/locales/ru.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@ ru:
nostdin: Не читать дерево из стандартного ввода
out_format: 'Формат вывода: %{valid_text}.'
pattern_router: 'Использовать pattern_router: ключи распределятся по файлам согласно data.write'
skip_interpolation: Пропустить перевод строк, которые являются прямыми интерполяциями (например,
"%{comment}")
strict: Не угадывать динамические использования ключей, например `t("category.#{category.key}")`
translation_backend: Движок перевода [google, deepl, yandex, openai]
value: >-
Expand Down Expand Up @@ -111,8 +113,11 @@ ru:
missing:
details_title: На других языках или в коде
none: Всё переведено.
plural: Отсутствуют ключи множественного числа
openai_translate:
errors:
invalid_json: недействительный JSON, возвращенный из бэкэнда
invalid_size: 'Возвращен неверный размер, ожидалось: %{expected}, получено: %{actual}.'
no_api_key: |-
Установить ключ API Яндекса с помощью переменной среды OPENAI_API_KEY или translation.openai_api_key
в config / i18n-tasks.yml. Получите ключ по адресу https://openai.com/.
Expand Down
3 changes: 3 additions & 0 deletions lib/i18n/tasks/base_task.rb
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,9 @@ class BaseTask
include Data
include Stats

INTERPOLATION_KEY_RE = /%\{[^}]+}/.freeze
INTERPOLATION_ONLY_KEY_RE = /^%\{[^}]+\}$/.freeze

def initialize(config_file: nil, **config)
@config_override = config_file
self.config = config || {}
Expand Down
13 changes: 11 additions & 2 deletions lib/i18n/tasks/command/commands/missing.rb
Original file line number Diff line number Diff line change
Expand Up @@ -40,10 +40,19 @@ def missing(opt = {})
cmd :translate_missing,
pos: '[locale ...]',
desc: t('i18n_tasks.cmd.desc.translate_missing'),
args: [:locales, :locale_to_translate_from, arg(:out_format).from(1), :translation_backend, :pattern]
args: [
:locales, :locale_to_translate_from, arg(:out_format).from(1),
:translation_backend, :pattern, :skip_interpolation
]

def translate_missing(opt = {})
missing = i18n.missing_diff_forest opt[:locales], opt[:from]
missing = i18n.missing_keys(
locales: opt[:locales],
types: %w[diff plural],
base_locale: opt[:from],
skip_interpolation: (opt[:skip_interpolation] != 'false')
)

if opt[:pattern]
pattern_re = i18n.compile_key_pattern(opt[:pattern])
missing.select_keys! { |full_key, _node| full_key =~ pattern_re }
Expand Down
5 changes: 5 additions & 0 deletions lib/i18n/tasks/command/options/common.rb
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,11 @@ module Common
'--config FILE',
t('i18n_tasks.cmd.args.desc.config')

arg :skip_interpolation,
'-s',
'--skip_interpolation',
t('i18n_tasks.cmd.args.desc.skip_interpolation')

def arg_or_pos!(key, opts)
opts[key] ||= opts[:arguments].try(:shift)
end
Expand Down
85 changes: 56 additions & 29 deletions lib/i18n/tasks/missing_keys.rb
Original file line number Diff line number Diff line change
Expand Up @@ -19,12 +19,12 @@ def missing_keys_types

# @param types [:used, :diff, :plural] all if `nil`.
# @return [Siblings]
def missing_keys(locales: nil, types: nil, base_locale: nil)
def missing_keys(locales: nil, types: nil, base_locale: nil, skip_interpolation: false)
locales ||= self.locales
types ||= missing_keys_types
base = base_locale || self.base_locale
types.inject(empty_forest) do |f, type|
f.merge! send(:"missing_#{type}_forest", locales, base)
f.merge! send(:"missing_#{type}_forest", locales, base, skip_interpolation)
end
end

Expand All @@ -35,47 +35,31 @@ def eq_base_keys(opts = {})
end
end

def missing_diff_forest(locales, base = base_locale)
def missing_diff_forest(locales, base = base_locale, skip_interpolation = false)
tree = empty_forest

# present in base but not locale
(locales - [base]).each do |locale|
tree.merge! missing_diff_tree(locale, base)
tree.merge! missing_diff_tree(locale, base, skip_interpolation)
end
if locales.include?(base)
# present in locale but not base
(self.locales - [base]).each do |locale|
tree.merge! missing_diff_tree(base, locale)
tree.merge! missing_diff_tree(base, locale, skip_interpolation)
end
end
tree
end

def missing_used_forest(locales, _base = base_locale)
def missing_used_forest(locales, _base = base_locale, _skip_interpolation = false)
locales.inject(empty_forest) do |forest, locale|
forest.merge! missing_used_tree(locale)
end
end

def missing_plural_forest(locales, _base = base_locale)
locales.each_with_object(empty_forest) do |locale, forest|
required_keys = required_plural_keys_for_locale(locale)
next if required_keys.empty?

tree = empty_forest
plural_nodes data[locale] do |node|
children = node.children
present_keys = Set.new(children.map { |c| c.key.to_sym })
next if ignore_key?(node.full_key(root: false), :missing)
next if present_keys.superset?(required_keys)

tree[node.full_key] = node.derive(
value: children.to_hash,
children: nil,
data: node.data.merge(missing_keys: (required_keys - present_keys).to_a)
)
end
tree.set_root_key!(locale, type: :missing_plural)
forest.merge!(tree)
def missing_plural_forest(locales, base = base_locale, _skip_interpolation = false)
locales.inject(empty_forest) do |forest, locale|
forest.merge! missing_plural_tree(locale, base)
end
end

Expand All @@ -93,9 +77,14 @@ def load_rails_i18n_pluralization!(locale)
end

# keys present in compared_to, but not in locale
def missing_diff_tree(locale, compared_to = base_locale)
data[compared_to].select_keys do |key, _node|
locale_key_missing? locale, depluralize_key(key, compared_to)
def missing_diff_tree(locale, compared_to = base_locale, skip_interpolation = false)
data[compared_to].select_keys do |key, node|
# Skip if the string is just an interpolation, eg: "%{body}"
if skip_interpolation && node.value.to_s.match?(BaseTask::INTERPOLATION_ONLY_KEY_RE)
false
else
locale_key_missing? locale, depluralize_key(key, compared_to)
end
end.set_root_key!(locale, type: :missing_diff).keys do |_key, node|
# change path and locale to base
data = { locale: locale, missing_diff_locale: node.data[:locale] }
Expand All @@ -106,6 +95,44 @@ def missing_diff_tree(locale, compared_to = base_locale)
end
end

def missing_plural_tree(locale, base = base_locale, _skip_interpolation = false) # rubocop:disable Metrics/AbcSize
tree = empty_forest

required_keys = required_plural_keys_for_locale(locale)
return tree if required_keys.empty?

plural_nodes data[base] do |node|
children = node.children

# Get the existing translated node if it exists
node_key = node.full_key(root: false)
translated_node = data[locale].get("#{locale}.#{node_key}")
present_keys = if translated_node
# If it's a single existing (non-plural) translation, skip it
next unless translated_node.value.nil?

# Otherwise get the existing plural keys
Set.new(translated_node.children.map { |c| c.key.to_sym })
else
Set.new
end
# Compare the keys to those existing in base
next if ignore_key?(node.full_key(root: false), :missing, locale)
next if present_keys.superset?(required_keys)

# Mark for removal any existing keys that are not required
base_keys = Set.new(node.children.map { |c| c.key.to_sym })
remove_keys = (present_keys + base_keys) - required_keys

tree[node.full_key] = node.derive(
value: children.to_hash,
children: nil,
data: node.data.merge(missing_keys: (required_keys - present_keys).to_a, remove_keys: remove_keys)
)
end
tree.set_root_key!(locale, type: :missing_plural)
end

# keys used in the code missing translations in locale
def missing_used_tree(locale)
used_tree(strict: true).select_keys do |key, _node|
Expand Down
3 changes: 2 additions & 1 deletion lib/i18n/tasks/reports/terminal.rb
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,8 @@ def missing_key_info(leaf)
when :missing_used
first_occurrence leaf
when :missing_plural
leaf[:data][:missing_keys].join(', ')
"#{Rainbow(I18n.t('i18n_tasks.missing.plural')).cyan} " +
leaf[:data][:missing_keys].join(', ')
else
"#{Rainbow(leaf[:data][:missing_diff_locale]).cyan} " \
"#{format_value(leaf[:value].is_a?(String) ? leaf[:value].strip : leaf[:value])}"
Expand Down
42 changes: 37 additions & 5 deletions lib/i18n/tasks/translators/base_translator.rb
Original file line number Diff line number Diff line change
Expand Up @@ -14,13 +14,46 @@ def initialize(i18n_tasks)
# @return [I18n::Tasks::Tree::Siblings] translated forest
def translate_forest(forest, from)
forest.inject @i18n_tasks.empty_forest do |result, root|
merge_missing_plural_nodes!(root)

translated = translate_pairs(root.key_values(root: true), to: root.key, from: from)
result.merge! Data::Tree::Siblings.from_flat_pairs(translated)
end
end

protected

# Recursively loop through the tree until all :missing_plural nodes have
# been added to the base
# @param [I18n::Tasks::Data::Tree::Node] node
# @return [void]
def merge_missing_plural_nodes!(node) # rubocop:disable Metrics/AbcSize
return unless node.children?

node.children.each do |child|
if child.data[:type] == :missing_plural
child.data.delete(:type)
# Add missing plural keys
child.data.delete(:missing_keys).each do |missing_key|
# If the key is present use it, otherwise use the 'other' value
other_value = child.value[missing_key.to_s] || child.value['other']
new_node = Data::Tree::Node.new(
key: missing_key,
value: other_value,
parent: child.parent,
data: { type: :missing_plural_key, locale: node.root.key }
)
child.append!(new_node)
end
# Remove uneeded plural keys
remove_keys = child.data.delete(:remove_keys) || Set.new
child.select_nodes! { |n| !remove_keys.include?(n.key.to_sym) }
else
merge_missing_plural_nodes!(child)
end
end
end

# @param [Array<[String, Object]>] list of key-value pairs
# @return [Array<[String, Object]>] translated list
def translate_pairs(list, opts)
Expand Down Expand Up @@ -51,7 +84,7 @@ def fetch_translations(list, opts)
# @param [Array<[String, Object]>] list of key-value pairs
# @return [Array<String>] values for translation extracted from list
def to_values(list, opts)
list.map { |l| dump_value(l[1], opts) }.flatten.compact
list.map { |l| dump_value(l[1], opts) }.flatten(1).compact
end

# @param [Array<[String, Object]>] list
Expand Down Expand Up @@ -104,14 +137,13 @@ def parse_value(untranslated, each_translated, opts)
end
end

INTERPOLATION_KEY_RE = /%\{[^}]+}/.freeze
UNTRANSLATABLE_STRING = 'X__'

# @param [String] value
# @return [String] 'hello, %{name}' => 'hello, <round-trippable string>'
def replace_interpolations(value)
i = -1
value.gsub INTERPOLATION_KEY_RE do
value.gsub BaseTask::INTERPOLATION_KEY_RE do
i += 1
"#{UNTRANSLATABLE_STRING}#{i}"
end
Expand All @@ -121,9 +153,9 @@ def replace_interpolations(value)
# @param [String] translated
# @return [String] 'hello, <round-trippable string>' => 'hello, %{name}'
def restore_interpolations(untranslated, translated)
return translated if untranslated !~ INTERPOLATION_KEY_RE
return translated if untranslated !~ BaseTask::INTERPOLATION_KEY_RE

values = untranslated.scan(INTERPOLATION_KEY_RE)
values = untranslated.scan(BaseTask::INTERPOLATION_KEY_RE)
translated.gsub(/#{Regexp.escape(UNTRANSLATABLE_STRING)}\d+/i) do |m|
values[m[UNTRANSLATABLE_STRING.length..].to_i]
end
Expand Down
51 changes: 44 additions & 7 deletions lib/i18n/tasks/translators/openai_translator.rb
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
module I18n::Tasks::Translators
class OpenAiTranslator < BaseTranslator
# max allowed texts per request
BATCH_SIZE = 50
BATCH_SIZE = 25
DEFAULT_SYSTEM_PROMPT = <<~PROMPT.squish
You are a professional translator that translates content from the %{from} locale
to the %{to} locale in an i18n locale array.
Expand All @@ -16,6 +16,13 @@ class OpenAiTranslator < BaseTranslator

HTML markups (enclosed in < and > characters) must not be changed under any circumstance.
Variables (starting with %%{ and ending with }) must not be changed under any circumstance.
Any provided keys must not be changed under any circumstance.

If a "X__" prefixed interpolation is given for the "other" plural key, you should try and
use that in the translation for any missing pluralizations keys that may represent
multiples.
The goal is to conform to the Unicode CLDR plural rules while maintaining a understandable
translation.

Keep in mind the context of all the strings for a more accurate translation.
PROMPT
Expand Down Expand Up @@ -75,19 +82,48 @@ def translate_values(list, from:, to:)
results = []

list.each_slice(BATCH_SIZE) do |batch|
translations = translate(batch, from, to)

results << JSON.parse(translations)
results << translate_batch(batch, from, to)
end

results.flatten
end

def translate(values, from, to)
def translate_batch(batch, from, to, retry_count = 0)
translations = translate(batch, from, to)
result = JSON.parse(remove_json_markdown(translations))

if result.size != batch.size
if retry_count < RETRIES
translate_batch(batch, from, to, retry_count)
elsif retry_count == RETRIES
# Try each string individually
batch.each do |string|
translate_batch([string], from, to, RETRIES + 1)
end
else
error = I18n.t('i18n_tasks.openai_translate.errors.invalid_size', expected: batch.size, actual: result.size)
fail ::I18n::Tasks::CommandError, error
end
end

result
rescue JSON::ParserError
if retry_count < RETRIES
translate_batch([string], from, to, retry_count + 1)
else
raise ::I18n::Tasks::CommandError, I18n.t('i18n_tasks.openai_translate.errors.invalid_json')
end
end

def remove_json_markdown(string)
string.gsub(/^```json\s*(.*?)\s*```$/m, '\1').strip
end

def translate(values, from, to) # rubocop:disable Metrics/MethodLength
messages = [
{
role: 'system',
content: format(system_prompt, from: from, to: to)
content: format(system_prompt, from: from, to: to, size: values.size)
},
{
role: 'user',
Expand All @@ -103,7 +139,8 @@ def translate(values, from, to)
parameters: {
model: model,
messages: messages,
temperature: 0.0
temperature: 0.0,
stream: false
}
)

Expand Down
Loading