Floating point numbers are not processed correctly, resulting in worthless output data. #1

Open
Nagorak opened this issue Jul 19, 2015 · 8 comments

Comments

@Nagorak

Nagorak commented Jul 19, 2015

I am trying to recover data from a Wahoo Fitness database, which includes a large number of floating point data entries. Unfortunately, it appears that floating point numbers are not being processed correctly by the program, resulting in all export data of that type being pegged to the following values: 1056964608.000000, 1090519040.000000, 1073741824.000000, 3238002687.000000. These same values repeat over and over again, regardless of the input values, suggesting the variable holding the data is either getting saturated, or some sort of error is occurring during type conversion.
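One way to weigh the type-conversion hypothesis is to write the four pegged values in hexadecimal. The sketch below only does that arithmetic; the helper name `to_hex32` and the `pegged` array are hypothetical and not part of Undark. Three of the four values have their low 24 bits clear, which looks more like a raw bit pattern than a measurement, though this alone does not identify the cause.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write v as 0xXXXXXXXX into buf (needs 11 bytes). */
void to_hex32(uint32_t v, char *buf, size_t n)
{
    snprintf(buf, n, "0x%08" PRIX32, v);
}

/* The four values reported above, taken as unsigned 32-bit integers. */
static const uint32_t pegged[4] = {
    1056964608u, /* 0x3F000000 */
    1090519040u, /* 0x41000000 */
    1073741824u, /* 0x40000000 */
    3238002687u  /* 0xC0FFFFFF */
};
```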

I have run the program against a known good database, in order to demonstrate the difference between the input values and the values that are actually exported. Below is a segment of the database shown in a SQLite viewer, and that same segment of data after being exported by Undark, shown in Excel (a bit easier to read), and in raw text in Notepad (no formatting).

From the Notepad output you can see that the program is correctly recognizing the data format as double, and trying to output it in that format, but the actual value is maxed out and ends up being worthless.

[Screenshots: SQLite input data; Undark output in Excel; Undark output as text]

@inflex

inflex commented Jul 25, 2015

I've uploaded a patched Linux source at http://pldaniels.com/undark/undark-0.7.tar.gz if you want to try that. Basically, the problem was that until now I'd never needed to interpret floating-point data (typical development process :) ). The standard C double floating point in this case was 32-bit, but SQLite3 uses 64-bit big-endian, so I've added the 64-bit ntohll() call and changed the handling of the data dumping to use %fl.
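For anyone following along, here is a minimal sketch of the decoding step described above. It is not the actual Undark code. It assumes the 8 bytes of a floating-point value are stored big-endian, as stated, and that the host `double` is a 64-bit IEEE 754 type. The name `decode_sqlite_real` is made up for illustration, and the sketch assembles the bytes with shifts instead of calling `ntohll()`, so it does not depend on the host byte order or on that function being available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: the host double is exactly 64 bits wide. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical helper: decode an 8-byte big-endian IEEE 754 payload
 * (as stored by SQLite for floating-point values) into a host double. */
double decode_sqlite_real(const unsigned char *p)
{
    uint64_t bits = 0;
    double value;

    /* Most significant byte comes first in the file. */
    for (int i = 0; i < 8; i++) {
        bits = (bits << 8) | p[i];
    }

    /* Copy the bit pattern; a pointer cast would break aliasing rules. */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, the bytes `3F E0 00 00 00 00 00 00` decode to 0.5 with this helper.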

If someone can test and let me know how that goes, thanks.

@Nagorak
Author

Nagorak commented Jul 25, 2015

Thanks for looking into this! The update definitely changed how the data is being processed, but the output is still wrong. A lot of the entries are now set to: -19490628022799998160706764775750376621752450000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000 instead of what they were before.

Here are some updated screen shots of the same section of data above:
[Screenshots: undark1, undark2]

Also, I don't know if this helps or not, but I uploaded a copy of the database I am using to http://www.nagorak.com/workoutdata.sqlite.

This is a good database (not the corrupt one I'm trying to recover), so it can be opened in a database viewer. It's not private or sensitive, just a bunch of Wahoo Fitness workout data.

@inflex

inflex commented Jul 28, 2015

Try the 0.7 source pack again; I changed a couple more things and uploaded it under the same name.

@witwall
Collaborator

witwall commented Jul 29, 2015

@inflex I tried v0.7 under Windows and Linux with workoutdata.sqlite, and the output is wrong.

@inflex

inflex commented Aug 1, 2015

@witwall, try http://pldaniels.com/undark/undark-0.7.1.tar.gz

I'm getting output like:

64,NULL,x9,1,1,x3,x65,201.449142,41.297930,32.787870,-116.981954,x-1,419545700.422913,747.444229
62,NULL,x9,1,1,x3,x65,202.049850,5.956894,32.788083,-116.982225,x-1,419545684.105103,697.684225
61,NULL,x9,1,1,x3,x65,202.007477,8.462074,32.788159,-116.982233,x-1,419545692.259155,706.146299
60,NULL,x9,1,1,x3,x65,202.279846,20.056688,32.788107,-116.982169,x-1,419545675.953064,691.727330
59,NULL,x9,1,1,x3,x65,202.297028,18.457852,32.788001,-116.981996,x-1,419545669.465881,671.670642
58,NULL,x9,1,1,x3,x65,201.869446,9.353317,32.787840,-116.981946,x-1,419545661.295593,653.212790
57,NULL,x9,1,1,x3,x65,202.364227,6.231814,32.787924,-116.981944,x-1,419545654.809082,643.859473
55,NULL,x9,1,1,x3,x65,201.997055,19.874667,32.787955,-116.981968,x-1,419545646.639648,632.268316
56,NULL,x9,1,1,x3,x65,201.516174,5.359343,32.787975,-116.981916,x-1,419545648.320557,637.627659
5

@Nagorak
Author

Nagorak commented Aug 1, 2015

That seems to have (mostly) fixed it! I do notice that there are still some slight differences in the output, but they only occur around the ten thousandths place. There must be some slightly different rounding going on.

In any case, for my purposes it doesn't really matter (I actually have no idea why Wahoo Fitness outputs the time value with such extreme accuracy). I just mention it in case you want to try to make the output match exactly.

Thanks a lot for your help!

@inflex

inflex commented Aug 1, 2015

It will be a rounding issue. Right now the program uses the default/internal IEEE 754 64-bit encoding, and unless you write a specific implementation of it for C, it will definitely show variances compared to SQLite. Interestingly, as far as I know, it's actually a small omission/inaccuracy in SQLite (I recall reading about it when I was digging around the other day).
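One thing that may contribute to small differences between two tools (an assumption, not something confirmed in this thread) is how many digits each one prints. `%f` prints six digits after the decimal point by default, while 17 significant digits are enough to reproduce any IEEE 754 double exactly when read back. The helper below is a hypothetical sketch for checking whether a given precision loses information for a particular value.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if v is recovered exactly after being
 * printed with sig_digits significant digits and parsed back. */
int survives_round_trip(double v, int sig_digits)
{
    char buf[64];

    snprintf(buf, sizeof buf, "%.*g", sig_digits, v);
    return strtod(buf, NULL) == v;
}
```

With this helper, `1.0 / 3.0` survives at 17 significant digits but not at 6, so comparing two exports is only meaningful when both were written with enough digits.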

@inflex

inflex commented Aug 1, 2015

This was the page I was reading -

http://www.exploringbinary.com/incorrect-decimal-to-floating-point-conversion-in-sqlite/

There are also a few different gcc build options regarding floating point that might be worth exploring if required in the future.
