Some words with umlauts are causing UnicodeEncodeError when using Wiktionary dumps as source #181

Open
east825 opened this issue Jan 5, 2025 · 0 comments
Labels
bug Something isn't working

Describe the bug
Looking up some words that contain umlauts raises a UnicodeEncodeError when a Wiktionary dump is used as the source.

To Reproduce
Steps to reproduce the behavior:

  1. Configure an English Wiktionary dump as the primary source.
  2. Try to download the definition for the word "sihinää".
  3. See an error dialog with a UnicodeEncodeError exception.

Expected behavior
The lookup succeeds, or at least fails gracefully without showing a modal dialog that contains a traceback. The same word can be found in the online Wiktionary without any problem.

Screenshots
image

Logs

2025-01-05 13:53:32.119 | DEBUG    | vocabsieve.main:getKnownDataOnThread:426 - Some data sources aren't available, not getting known data now
2025-01-05 13:53:38.577 | DEBUG    | vocabsieve.ui.searchable_boldable_text_edit:bold:11 - bolding sihinää
2025-01-05 13:53:38.579 | DEBUG    | vocabsieve.ui.multi_definition_widget:lookup:138 - Looking up sihinää in [<vocabsieve.sources.local_dictionary_source.LocalDictionarySource object at 0x335bab590>]
2025-01-05 13:53:38.580 | ERROR    | vocabsieve.uncaught_hook:make_error_box:17 - Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/sources/local_dictionary_source.py", line 14, in _lookup
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/local_dictionary.py", line 97, in define
KeyError: 'Word sihinää not found in raw-wiktextract-data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/main.py", line 827, in lookup
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/ui/multi_definition_widget.py", line 140, in lookup
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/ui/multi_definition_widget.py", line 160, in _lookup_in_source
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/models.py", line 325, in define
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/models.py", line 336, in _fmt_lookup
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/sources/local_dictionary_source.py", line 17, in _lookup
UnicodeEncodeError: 'ascii' codec can't encode characters in position 20-21: ordinal not in range(128)
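
One detail in the traceback above may help narrow this down: the error reports positions 20-21, which does not match where the umlauts sit in either the bare word or the KeyError message. The sketch below is not vocabsieve code; `ascii_failure_span` is a hypothetical helper, and it assumes the word is stored in precomposed (NFC) form. It only uses `str.encode` and the `start`/`end` attributes of `UnicodeEncodeError` to show which positions Python would report for those two strings.

```python
def ascii_failure_span(text):
    """Return the inclusive (first, last) index range of the first run of
    characters that the ASCII codec cannot encode, or None if all encode."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.end is exclusive; the error message prints an inclusive range.
        return exc.start, exc.end - 1
    return None


# "sihinää" written with escapes so the precomposed form is explicit.
word = "sihin\u00e4\u00e4"
key_error_message = f"Word {word} not found in raw-wiktextract-data"

print(ascii_failure_span(word))               # (5, 6)
print(ascii_failure_span(key_error_message))  # (10, 11)
print(ascii_failure_span("plain ascii"))      # None
```

Neither string yields 20-21, so the string that fails to encode on line 17 of `_lookup` is presumably a longer one with 20 characters in front of the "ää". Which string that is cannot be determined from the log alone.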

Desktop (please complete the following information):

  • OS: macOS 14.6.1
  • Vocabsieve version (if nightly, must be latest): 0.12.4
east825 added the bug (Something isn't working) label Jan 5, 2025