translator.create_glossary() forces to remove regional variant #109

Open
CJRzzZ opened this issue Jun 14, 2024 · 3 comments

@CJRzzZ

CJRzzZ commented Jun 14, 2024

I've encountered a problem with the translator.create_glossary() function, where it sets the source language of a glossary object to "EN" despite the argument specifying "EN-US". This behavior seems to stem from the code in "translator.py" at line 302, which attempts to strip regional variants and retain only the base language code.

This leads to an issue because "EN" is deprecated in the DeepL API, which then throws a deepl.exceptions.DeepLException stating "target_lang="EN" is deprecated, please use "EN-GB" or "EN-US" instead." Furthermore, if the glossary is set with "EN" and translator.translate_text() is called with "EN-US" as the source language, a ValueError is raised, stating "source_lang and target_lang must match glossary". This inconsistency makes it impossible to use a matching value for the source language.
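As a rough illustration of the normalization described above, the sketch below reduces a language code to its base code. It assumes "base code" means the part before the first hyphen. `base_language` is a hypothetical helper and is not the code at translator.py line 302, which may work differently.

```python
def base_language(lang: str) -> str:
    """Strip a regional variant, e.g. 'EN-US' -> 'EN' (hypothetical helper)."""
    # Language codes are case-insensitive, so normalize to upper case first,
    # then keep only the part before the first hyphen
    return lang.upper().split("-", 1)[0]


if __name__ == "__main__":
    for code in ("EN-US", "en-gb", "JA"):
        print(code, "->", base_language(code))
```

Under this assumption, a glossary requested with source "EN-US" would end up stored with source "EN", which is the value the report observes.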

Could you please look into this? Thank you for your attention to this matter.

@JanEbbing
Member

Sorry, can you clearly describe (maybe with sample code) what you are doing and what error you get?

  1. Glossaries don't have a regional variant attached to them, so "EN" is correct as the source or target language of a glossary.
  2. It should then be possible to use glossaries for all variants of their associated language.
  3. Regarding "EN-US" as the source language: this sounds like the issue. The source language would have to be "EN", because regional variants are only supported for target languages. The error you get seems to be wrong, though; I can follow up on this.

You can read more on this differentiation in the documentation here
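The rules above can be illustrated with a small offline sketch. It assumes three things: glossaries store base codes only, the source language must match the glossary's source code exactly, and any regional variant of the glossary's target language is accepted. `Glossary` and `glossary_applies` are hypothetical names for this illustration. They are not part of the deepl package, and the real client-side check may be implemented differently.

```python
from typing import NamedTuple


class Glossary(NamedTuple):
    """Hypothetical stand-in for a glossary's language pair (base codes only)."""
    source_lang: str
    target_lang: str


def glossary_applies(glossary: Glossary, source_lang: str, target_lang: str) -> bool:
    """Return True if the glossary may be used for this source/target pair."""
    # Source codes carry no regional variant, so they must equal the glossary's code
    if source_lang.upper() != glossary.source_lang:
        return False
    # Any regional variant of the glossary's target language is accepted
    return target_lang.upper().split("-", 1)[0] == glossary.target_lang


if __name__ == "__main__":
    en_to_ja = Glossary("EN", "JA")
    print(glossary_applies(en_to_ja, "EN-US", "JA"))  # False: 'EN-US' != stored 'EN'
    print(glossary_applies(en_to_ja, "EN", "JA"))     # True
    ja_to_en = Glossary("JA", "EN")
    print(glossary_applies(ja_to_en, "JA", "EN-US"))  # True
    print(glossary_applies(ja_to_en, "JA", "EN-GB"))  # True
```

In this sketch, the first call returning False corresponds to the reported situation where passing "EN-US" as the source language was rejected for not matching the glossary.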

@CJRzzZ
Author

CJRzzZ commented Jun 18, 2024

Sure, here is the sample code:
```python
g = translator.create_glossary("GITCG_en_to_jp", "EN-US", "JA", dict_en_to_jp)
result = translator.translate_text(clean_text, source_lang=source_lang, target_lang=target_lang, glossary=g).text
```
In the first line, I tried to store the glossary with "EN-US" as the source language, but create_glossary automatically converts the source language to "EN". That causes a problem in the second line: when I use "EN-US" as source_lang, it raises the "source_lang and target_lang must match glossary" error, and when I use "EN" as source_lang, it raises the "target_lang="EN" is deprecated, please use "EN-GB" or "EN-US" instead" error. These are the errors I ran into; I hope this makes it clear.

@JanEbbing
Member

Yes, as I said, we differentiate between source and target languages:

  1. "EN" is a valid source language
  2. "EN-US" is an invalid source language
  3. "EN" is an invalid target language
  4. "EN-US" is a valid target language

So in your code, the following should work:

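```python
import deepl

translator = deepl.Translator("YOUR_AUTH_KEY")  # placeholder: use your own DeepL auth key
source_lang = "EN"  # source languages use the base code, without a regional variant
target_lang = "JA"
g = translator.create_glossary("GITCG_en_to_jp", source_lang, target_lang, dict_en_to_jp)
result = translator.translate_text(clean_text, source_lang=source_lang, target_lang=target_lang, glossary=g).text
```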
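The four rules listed in this comment can be captured in a tiny offline validator. This is only a sketch: `is_valid_source`, `is_valid_target`, and `TARGET_VARIANT_REQUIRED` are hypothetical names, and the set encodes just the "EN" case discussed in this thread, not the full list of languages supported by the DeepL API.

```python
# Illustrative subset only: base codes that, per this thread, need a regional
# variant when used as a target language. The authoritative list is the DeepL API's.
TARGET_VARIANT_REQUIRED = {"EN": ("EN-GB", "EN-US")}


def is_valid_source(lang: str) -> bool:
    """Source languages use base codes only, e.g. 'EN' but not 'EN-US'."""
    return "-" not in lang


def is_valid_target(lang: str) -> bool:
    """Reject bare codes such as 'EN' that need 'EN-GB' or 'EN-US' as a target."""
    return lang.upper() not in TARGET_VARIANT_REQUIRED


if __name__ == "__main__":
    for code in ("EN", "EN-US"):
        print(code, "source:", is_valid_source(code), "target:", is_valid_target(code))
```

Run as written, it reports "EN" as a valid source but invalid target, and "EN-US" as the reverse, matching rules 1 to 4 above.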

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants