Invalid xml data in planet osm #23
Comments
Did you check whether the MD5 matches (see |
I figured that if the file was corrupt, it would be very unlikely for Unhelpfully, it seems that I started testing the original file on the planet server, but it is taking a very, very long time. I'll update here when it's finished. |
Yes, it did match. |
This is a bit weird - the planet file on the server looks completely fine. I grepped it for the way ID you mention, and the result is:
with no So if the file on the server is OK, and the MD5sum matches, and it matches your downloaded file too, does that mean that whatever problem is occurring must be during or after decompression? How are you decompressing? Using |
I used the 7-Zip file manager, version 19, under Windows 10 x64. I will try another decompressor. Thanks for investigating so far. |
This time I tried decompressing with another tool (https://github.com/philr/bzip2-windows/releases), but got the same result. Any more guesses? |
Looks to me like you (@gartenkralle) might have a problem with your hardware, faulty memory or so. I suggest running a memory tester. |
I think it's unlikely that a hardware fault would affect the decompression in exactly the same way with two different programs (with different memory layouts, etc...). @gartenkralle are you decompressing the whole file? (In other words, you have a file called |
I ran a two-cycle memory check and found no faulty memory. Yes, I decompressed the whole file. I am decompressing it again and will then run MD5sum on the result. I will report the results in a few days... |
Size: 1.542.302.591.588 bytes. MD5 is now running... |
MD5 checksum: dfdff2778d0dfad6569ecc2b3613fbb4 |
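For comparing checksums of a file this large, the hash has to be computed in chunks rather than by loading the file into memory. The following is a minimal sketch of one way to do that with Python's standard `hashlib`; the function name `md5_of_file` and the 1 MiB chunk size are illustrative choices, not something used in this thread.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the file is opened in binary mode and only one chunk is held at a time, memory use stays constant regardless of file size.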
Here's what I got, for the same input file (our MD5s match for the

MD5: 2cf5fcca63685b13440902f0f1fa24e6

We get the same size, but different MD5s. I think something might be going wrong because it's a 1.4 TiB file, and that might be pushing the limits of what the decompression software has been tested with (perhaps some subtle bugs when the file length / offset exceeds 40 bits?) It might be worth trying some other software. I'm using

Alternatively, is it possible to do what you wanted without decompressing the whole file? If whatever is parsing the OSM file is capable of streaming (e.g. a SAX or event parser) then you could

Finally, if all those things won't work, then it might be worth rewriting your parser to use the PBF binary file. The data inside is exactly the same, but the PBF is about half the size of the XML and 10 or more times quicker to parse. @joto's excellent https://github.com/osmcode/libosmium is a well-tested and fast library for parsing PBFs, and there's a suite of utilities (https://github.com/osmcode/osmium-tool) for common tasks such as making geographic extracts and filtering by tags. (I think it builds on Windows, but I don't know enough about Windows to say for sure.) |
Thanks for all your tips. Even with bzip2 under Cygwin I got the wrong MD5 checksum. Maybe it is a very low-level bug or a file system bug. Now I will try decompressing on Linux and transferring the file to Windows. Otherwise I will go with the PBF. |
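As an illustration of the streaming idea, here is a rough sketch that feeds a bzip2-compressed OSM XML file straight into a SAX parser, so the decompressed data never has to be written to disk. It uses only Python's standard `bz2` and `xml.sax` modules. The names `ElementCounter` and `count_osm_elements` are hypothetical, and counting elements merely stands in for whatever processing is actually needed.

```python
import bz2
import xml.sax
from collections import Counter


class ElementCounter(xml.sax.ContentHandler):
    """Counts top-level OSM element types as the parser encounters them."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        if name in ("node", "way", "relation"):
            self.counts[name] += 1


def count_osm_elements(path):
    """Count nodes, ways and relations in a .osm.bz2 file without unpacking it."""
    handler = ElementCounter()
    # bz2.open decompresses lazily, as the parser asks for more bytes.
    with bz2.open(path, "rb") as stream:
        xml.sax.parse(stream, handler)
    return handler.counts
```

A SAX handler is used here rather than a tree-building parser because it keeps no document tree in memory, which matters for an input of this size.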
@gartenkralle : do you have any updates on this? Can this issue be closed now? |
Yes, the issue can be closed. The tool that calculated the checksum after decompression was wrong. I made a mistake in my parsing method: the XML file contains relations that have no members, and I had not considered that case. Additionally, I had not considered that UTF-8 characters have variable byte lengths. After fixing these, it worked fine. |
I don't know if I am right here, but I found the following data in the planet-210524.osm file. The opening tag (way) doesn't match the closing tag (relation). Also, "chaer type" doesn't seem valid.
This is not the only entry where the opening and closing tags don't match.
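The two pitfalls mentioned here can be reproduced on a tiny, made-up sample. This is only a sketch: the `SAMPLE` document below is invented for illustration and is not taken from the planet file. It shows a relation with no members, which appears as a self-closing element, and a tag value whose UTF-8 byte length differs from its character count.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not real planet content.
SAMPLE = (
    '<osm>'
    '<relation id="1"/>'  # a relation with no members is self-closing
    '<relation id="2"><member type="way" ref="7" role=""/></relation>'
    '<node id="3" lat="0" lon="0"><tag k="name" v="Zürich 東京"/></node>'
    '</osm>'
)

root = ET.fromstring(SAMPLE)

# Number of <member> children per relation; relation 1 has none.
member_counts = {rel.get("id"): len(rel.findall("member"))
                 for rel in root.iter("relation")}

name = root.find("node/tag").get("v")
char_len = len(name)                  # 9 characters
byte_len = len(name.encode("utf-8"))  # 14 bytes: "ü" takes 2, "東" and "京" take 3 each
```

A hand-written parser that scans for a closing `</relation>` tag, or that advances a byte offset by a character count, would lose its place on input like this, which could plausibly produce the appearance of mismatched tags.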
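One way to cross-check a suspicion like this independently of a custom parser is to run the data through a standard XML parser and see whether it reports a well-formedness error. The sketch below uses Python's `xml.sax` for that; the function name `first_xml_error` is hypothetical.

```python
import io
import xml.sax


def first_xml_error(stream):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        # A bare ContentHandler ignores all content; only parse errors matter here.
        xml.sax.parse(stream, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as err:
        return err.getLineNumber(), err.getColumnNumber(), err.getMessage()
    return None


# Usage on a deliberately broken, made-up snippet:
broken = io.BytesIO(b"<osm><way id='1'></relation></osm>")
result = first_xml_error(broken)
```

If a standard parser accepts the file while a custom one reports mismatched tags, the custom parser is the more likely source of the problem.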