File Handling for Non-English Alphabets #1029
Comments
Hi @elbre Regards, |
Just to note, a workaround might be to unzip the contents here: Which might point to there being something in the archive handling process in general causing the issue. I'm not sure. It's actually the same pattern in Siegfried too cc. @richardlehane. e.g.
Without extracting:
---
filename : 'Jindrich.Stovicek.zip#Jind²ich µ£ovíƒek/ⁿíƒansk∞ k²iτ£ál/ⁿí'
filesize : 0
modified : 2023-11-29T21:59:26Z
errors : 'empty source'
matches :
- ns : 'pronom'
id : 'UNKNOWN'
format :
version :
mime :
class :
basis :
warning : 'no match'
---
filename : 'Jindrich.Stovicek.zip#Jind²ich µ£ovíƒek/µt╪σátko/µ'
filesize : 0
modified : 2023-11-29T22:00:50Z
errors : 'empty source'
matches :
- ns : 'pronom'
id : 'UNKNOWN'
format :
version :
mime :
class :
basis :
warning : 'no match'
With extracting:
---
filename : 'Říčanský křišťál/Říčanský křišťál.txt'
filesize : 0
modified : 2023-11-29T21:59:26+01:00
errors : 'empty source'
matches :
- ns : 'pronom'
id : 'x-fmt/111'
format : 'Plain Text File'
version :
mime : 'text/plain'
class :
basis : 'extension match txt'
warning : 'match on extension only'
---
filename : 'Štěňátko/Štěňátko.txt'
filesize : 0
modified : 2023-11-29T22:00:50+01:00
errors : 'empty source'
matches :
- ns : 'pronom'
id : 'x-fmt/111'
format : 'Plain Text File'
version :
mime : 'text/plain'
class :
basis : 'extension match txt'
warning : 'match on extension only'
Was just interested to take a look at this as we had problems with earlier DROID releases with the Māori language character set, but I had thought they were resolved. I guess we didn't process a lot of zips back in the day! |
Thank you for the workaround. Unfortunately, we are working on a workflow where ZIP files should also be acceptable. |
I did a little testing with this today. It looks like the file names within the zip file aren't UTF-8 or IBM437 (the default in the zip spec), but rather have the character encoding IBM852. I'm not really sure how you'd go about reliably detecting this during unzipping (though tools like 7-zip and WinZip seem to manage it so perhaps it is possible?): |
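To illustrate the encoding mix-up described above, here is a minimal Python sketch. It is not how DROID or Siegfried handle archives; it only shows the round trip. Python's `zipfile` decodes member names as CP437 when the UTF-8 flag (bit 11 of the general purpose flags) is not set, so a name that was really written as IBM852 can be recovered by re-encoding to CP437 and decoding as CP852. The helper names `redecode_member_name` and `list_member_names` are hypothetical, and the choice of `cp852` is an assumption based on the testing in this thread, not something the archive itself declares.

```python
import zipfile


def redecode_member_name(name: str, actual_encoding: str = "cp852") -> str:
    """Undo a wrong CP437 decode of a ZIP member name (hypothetical helper).

    Assumes the original bytes were written in `actual_encoding`.
    """
    try:
        raw = name.encode("cp437")  # recover the original bytes
    except UnicodeEncodeError:
        # The name contains characters outside CP437, so it was not
        # produced by a CP437 decode; leave it untouched.
        return name
    return raw.decode(actual_encoding, errors="replace")


def list_member_names(zip_path, actual_encoding: str = "cp852") -> list:
    """List member names, re-decoding those not flagged as UTF-8."""
    names = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.flag_bits & 0x800:
                # Bit 11 set: the name is declared as UTF-8, trust it.
                names.append(info.filename)
            else:
                names.append(redecode_member_name(info.filename, actual_encoding))
    return names
```

Passing the mangled names from the Siegfried output through `redecode_member_name` should give back the Czech names seen in the extracted listing. Note that this only works when the encoding is known in advance; detecting it reliably is the open question raised above.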
I can also provide material made on the command line: |
archiv.zip still contains non-UTF-8 filenames. Try the
|
Good day once again. However, as mentioned earlier, the originally provided source is the default method for creating zip files, and it is highly probable in our region to encounter these files. Therefore, I would prefer to keep the issue open. |
Hello,
I would like to bring attention to an issue I encountered while working with DROID, specifically when dealing with files whose names contain characters from alphabets other than English.
Initially, we suspected that the problem might be related to using *.zip files. However, after further investigation, we observed similar issues when generating *.7z files and attempting different export methods.
To assist in resolving this matter, I am attaching the original files, the export, and a screenshot of the application to provide a comprehensive overview.
I am curious to know if there are plans to address these issues in the near future or if there is already a known solution?
aaa.txt
Jindřich Šťovíček.zip