Describe the bug
Looking up some words that contain umlauts raises a UnicodeEncodeError when a Wiktionary dump is used as the source.
To Reproduce
Steps to reproduce the behavior:
1. Configure an English Wiktionary dump as the primary source.
2. Try to download the definition for the word "sihinää".
3. See an error dialog with a UnicodeEncodeError exception.
Expected behavior
The lookup succeeds, or at least fails gracefully without a modal dialog showing a traceback. The same word can be found in the online Wiktionary without any problem.
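To make "fails gracefully" concrete, here is a minimal sketch of the behavior I would expect. Everything in it is hypothetical: `ENTRIES`, `define`, and `lookup_or_none` are illustration names and are not VocabSieve's actual code or API. The idea is only that a missing word is reported as "no result" instead of surfacing as an uncaught exception.

```python
from typing import Optional

# Hypothetical stand-in for a local dictionary; not VocabSieve's real data.
ENTRIES = {"sihinä": "(placeholder definition)"}


def define(word: str) -> str:
    """Return the stored definition or raise KeyError, like a plain dict lookup."""
    if word not in ENTRIES:
        raise KeyError(f"Word {word} not found")
    return ENTRIES[word]


def lookup_or_none(word: str) -> Optional[str]:
    """Treat a missing word as 'no result' so the caller can show an empty entry."""
    try:
        return define(word)
    except KeyError:
        # Nothing is encoded or printed here, so non-ASCII words cannot
        # trigger a second error while the first one is being handled.
        return None
```

With something along these lines, `lookup_or_none("sihinää")` would return `None` and the UI could show "not found" rather than an error box.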
Screenshots
Logs
2025-01-05 13:53:32.119 | DEBUG | vocabsieve.main:getKnownDataOnThread:426 - Some data sources aren't available, not getting known data now
2025-01-05 13:53:38.577 | DEBUG | vocabsieve.ui.searchable_boldable_text_edit:bold:11 - bolding sihinää
2025-01-05 13:53:38.579 | DEBUG | vocabsieve.ui.multi_definition_widget:lookup:138 - Looking up sihinää in [<vocabsieve.sources.local_dictionary_source.LocalDictionarySource object at 0x335bab590>]
2025-01-05 13:53:38.580 | ERROR | vocabsieve.uncaught_hook:make_error_box:17 - Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/sources/local_dictionary_source.py", line 14, in _lookup
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/local_dictionary.py", line 97, in define
KeyError: 'Word sihinää not found in raw-wiktextract-data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/main.py", line 827, in lookup
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/ui/multi_definition_widget.py", line 140, in lookup
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/ui/multi_definition_widget.py", line 160, in _lookup_in_source
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/models.py", line 325, in define
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/models.py", line 336, in _fmt_lookup
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/sources/local_dictionary_source.py", line 17, in _lookup
UnicodeEncodeError: 'ascii' codec can't encode characters in position 20-21: ordinal not in range(128)
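The second exception is raised while the KeyError is being handled, which suggests that something on the error path converts a string containing "ää" to ASCII. I have not found the exact statement, so the snippet below is not VocabSieve's code. It is only a sketch of the general failure class, using the message from the log, with a hypothetical helper `describe_ascii_failure`.

```python
message = "Word sihinää not found in raw-wiktextract-data"


def describe_ascii_failure(text):
    """Return (start, end) of the first span ASCII cannot encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.end is exclusive, so the two characters "ää" give end = start + 2.
        return exc.start, exc.end
    return None


span = describe_ascii_failure(message)

# Two ways to avoid the crash: escape what ASCII cannot represent,
# or encode as UTF-8, which can represent every character.
escaped = message.encode("ascii", errors="backslashreplace")
utf8 = message.encode("utf-8")

print(span, message[span[0]:span[1]])
print(escaped)
```

Encoding as UTF-8 keeps the word intact, while `backslashreplace` is a fallback for places where the output really has to be ASCII.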
Desktop (please complete the following information):
OS: macOS 14.6.1
Vocabsieve version (if nightly, must be latest): 0.12.4
![image](https://private-user-images.githubusercontent.com/1007556/400198013-56362b43-f96c-47bd-8510-1dc115391c11.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg3Njk2OTgsIm5iZiI6MTczODc2OTM5OCwicGF0aCI6Ii8xMDA3NTU2LzQwMDE5ODAxMy01NjM2MmI0My1mOTZjLTQ3YmQtODUxMC0xZGMxMTUzOTFjMTEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwNSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDVUMTUyOTU4WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NjExMjQxYTg3MjliYzZhODljZWI5NGM1NTU2ZTRlOTY4NmMxODc1ODY4NThkMWY4YjU5YjhiNGRkNWM1M2ViNyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.dblUUXjKlSnMIsWVFjSpc4xxU3vRtP99rMDOykBWboo)