Links and hashtags seem to change after translation #93
Do you have an example to reproduce?
Absolutely, @Animenosekai! Here is a text I got from a tweet. Notice how the letters in both the hashtags and the tweet link are altered. In the second hashtag, a letter even gets added out of nowhere.
Result: even though the normal text was translated fine, the hashtags and the link were altered:
Any ideas on how to fix this?
Parsing with a regex, maybe?
What do you mean? Are there params I can pass to the GoogleTranslate() instance that let me hide parts of the passed text using a regex?
Nope, not for now, but should I? Here is the major problem with this, and with HTML translation too. TL;DR: it might work for Latin-based languages, but different languages have different structures, and the order of words might need to change from one language to another. (This is also one of the reasons why, when we translate stuff, we don't translate each word individually and then put the pieces back together.)
Yeah, I mean implementing what I said would actually make it way better. The issue you mentioned kind of relates to the topic, and yeah, that's easily fixable by just adding a space after the dots or commas in the final result, if it's missing. But implementing a regex, or any other way to hide certain parts of the text, would be awesome, since it's common for those parts to get altered.
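A minimal sketch of the "hide parts of the text" idea, assuming the translator passes short placeholders such as `[[0]]` through untouched (not guaranteed; it would have to be checked for each service). The `PROTECTED` pattern and the `mask`/`unmask` helpers are hypothetical, not part of translatepy. Because each placeholder travels with its sentence, this would also tolerate word-order changes, as long as the placeholders themselves survive.

```python
import re

# Hypothetical pattern for the parts to hide: http(s) links and hashtags.
PROTECTED = re.compile(r"https?://\S+|#\w+")


def mask(text: str) -> tuple[str, list[str]]:
    """Replace each link/hashtag with a numbered placeholder such as [[0]]."""
    originals: list[str] = []

    def _sub(match: re.Match) -> str:
        originals.append(match.group(0))
        return f"[[{len(originals) - 1}]]"

    return PROTECTED.sub(_sub, text), originals


def unmask(text: str, originals: list[str]) -> str:
    """Put the original links/hashtags back wherever their placeholders ended up."""
    return re.sub(r"\[\[(\d+)\]\]", lambda m: originals[int(m.group(1))], text)
```

Usage: translate the masked text, then call `unmask(translation, saved)`. A leftover `[[n]]` or an `IndexError` would mean the translator mangled the placeholders.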
Yes, this issue might be easier to handle than normal translations, as links don't really mean anything and don't need to be translated. But here is the problem: first, it is not possible to translate things separately, because it might not give the best translation (words have different meanings as a whole than individually). Also, as said before, there is no telling whether the position of the link should change, so we can't just pin the position of the link and put it back after the translation:
Now, if we let the translator translate everything and it ends up having issues with the links, we might want to find each link in the translated text and replace it with the original one. Something like this would be imaginable:

```python
def link_correction(translated_text: str, links: list[str]) -> str:
    """A simple link correction function to keep the same links as before translation"""
    processing_text = translated_text.lower()
    for link in links:
        index = processing_text.find(link.lower())  # try to find the link in the translated text, ignoring case
        if index == -1:
            continue  # the link changed by more than its casing, so leave it alone
        # replace the found span with the link as it was before translation
        translated_text = translated_text[:index] + link + translated_text[index + len(link):]
    return translated_text
```
Now, as you mentioned previously:
So if we have two links that are identical once lowercased, both might be replaced with the same link. Now what should I do?
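One possible answer, sketched under the assumption that the translation keeps the links in the same relative order: search for each link only after the previously restored one, so two links that are equal once lowercased each get their own original casing back. `link_correction_in_order` is a hypothetical name; if the translator swaps two such links, their casings end up swapped as well.

```python
def link_correction_in_order(translated_text: str, links: list[str]) -> str:
    """Hypothetical variant of link_correction that restores links in order.

    Assumes ASCII links (so lower() keeps their length) and that the
    translator does not reorder links relative to each other.
    """
    lowered = translated_text.lower()
    start = 0  # only search after the previously restored link
    for link in links:
        index = lowered.find(link.lower(), start)
        if index == -1:
            continue  # mangled beyond a case change; leave the text as is
        end = index + len(link)
        # same-length replacement, so indices into `lowered` stay valid
        translated_text = translated_text[:index] + link + translated_text[end:]
        start = end
    return translated_text
```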
@reddere, use GoogleTranslateV2 and put all your "static" links/hashtags inside `<span class="notranslate">` tags. For more information, visit: https://cloud.google.com/translate/troubleshooting

```
In [5]: from translatepy.translators.google import GoogleTranslateV2

In [6]: dl = GoogleTranslateV2()

In [9]: dl.translate('Kado Thorne es un Vampiro y viajó en el tiempo desde el año 2020 cuando se presentó a la skin Oro.\n\n<span class="notranslate">#Fortnite</span> <span class="notranslate">#FortniteLastResort</span> <span class="notranslate">https://t.co/m1cE9sSrNb</span>', 'it')
Out[9]: TranslationResult(service=Translator(Google), source='Kado Thorne es un Vampiro y viajó en el tiempo desde el año 2020 cuando se presentó a la skin Oro.\n\n<span class="notranslate">#Fortnite</span> <span class="notranslate">#FortniteLastResort</span> <span class="notranslate">https://t.co/m1cE9sSrNb</span>', source_lang=Language(Spanish), dest_lang=Language(Italian), translation='Kado Thorne è un vampiro e ha viaggiato indietro nel tempo a partire dall\'anno 2020 quando gli è stata presentata la skin Oro.\n\n<span class="notranslate">#Fortnite</span> <span class="notranslate">#FortniteLastResort</span> <span class="notranslate">https://t.co/m1cE9sSrNb</span>')
```
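The manual tagging in this session could be automated. A sketch, assuming links and hashtags are the only parts worth protecting and that the text contains no other HTML-sensitive characters such as `<` or `&`. `wrap_notranslate` and `unwrap_notranslate` are hypothetical helpers, not translatepy functions.

```python
import re

# Hypothetical pattern for the "static" parts: http(s) links and hashtags.
PROTECTED = re.compile(r"https?://\S+|#\w+")
# Matches the exact span markup used in the session, capturing its contents.
SPAN = re.compile(r'<span class="notranslate">(.*?)</span>', re.DOTALL)


def wrap_notranslate(text: str) -> str:
    """Wrap every link/hashtag in a notranslate span, as done by hand above."""
    return PROTECTED.sub(lambda m: f'<span class="notranslate">{m.group(0)}</span>', text)


def unwrap_notranslate(text: str) -> str:
    """Strip the spans from the translated text, keeping their contents."""
    return SPAN.sub(r"\1", text)
```

With the session above, this would look like `unwrap_notranslate(dl.translate(wrap_notranslate(text), 'it').translation)`, relying on the `translation` field shown in the `TranslationResult` output. It only works if the service returns the span markup unchanged, as it did here.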
Thank you so much, @ZhymabekRoman @Animenosekai. I haven't tested the workaround yet: I kept using my old GoogleTranslator until just 2 days ago, when I tried ReversoTranslator, which to me seems to work even better than GoogleTranslator, both lexically and in word choice; in Italian it seems to work decently. Somehow, though, I found an issue with that one as well, as it throws an error when a word like
I was talking with Venom on Discord about possible workarounds and support for
When using GoogleTranslate(), the capital and lowercase letters in links get changed randomly. How can this be fixed?