Improved String to Decimal parsing #3943

alexbeattie42 · 2024-10-16T17:06:43Z

Fixes issue #3924

Brief summary of changes

String to Decimal Conversion

The above fixed the reading/writing problems but the code was still failing. Even with the imbuing of the input and output file stream, the parser converting strings to decimals is still using the system locale. After enough digging and exploration I found that std::stod is locale dependent. It is a thinly veiled wrapper of std::strtod which does mention the locale dependance. To fix this issue I did the following:

Create a method with the same signature as std::stod in the IO.h class
Implement a string to decimal conversion that is invariate to the system specified locale (the two potential methods are described below)
Replace all calls to std::stod with the new IO::stod method
Parsing with the isstream method (standard c++):

    double result;
    std::istringstream iss(__str);
    iss.imbue(std::locale(_locale));
    iss >> result;

Testing I've completed

Tested against the sample here and it is working

Looking for feedback on...

CHANGELOG.md (choose one)

Will update once it is reviewed and approved to avoid continually rebasing against changelog conflicts while it is under review.

This change is

dependencies/CMakeLists.txt

halleysfifthinc

If I'm reading/understanding correctly, this corrects the number parsing to be locale aware, but is number writing also locale specific?

I think the best case for locale handling would be to follow the robustness principle aka "be conservative in what you send, be liberal in what you accept". In this context, I think that should look like locale independent parsing (i.e. can parse comma or period decimals), and optionally hard-coded locale writing (.trc/.mot/.sto files with period decimals only). That would ensure that users with any locale can read .trc/.mot/.sto files generated anywhere, and (with hard-coded period decimal writing) ensure maximum compatibility reading .trc/.mot/.sto files by any third-party software/code (which might be less flexible with parsing).

alexbeattie42 · 2024-10-16T18:30:40Z

If I'm reading/understanding correctly, this corrects the number parsing to be locale aware, but is number writing also locale specific?

I think the best case for locale handling would be to follow the robustness principle aka "be conservative in what you send, be liberal in what you accept". In this context, I think that should look like locale independent parsing (i.e. can parse comma or period decimals), and optionally hard-coded locale writing (.trc/.mot/.sto files with period decimals only). That would ensure that users with any locale can read .trc/.mot/.sto files generated anywhere, and (with hard-coded period decimal writing) ensure maximum compatibility reading .trc/.mot/.sto files by any third-party software/code (which might be less flexible with parsing).

This is in general a good idea but not possible in this specific instance. The file (at least all the imu .sto files in the examples are this way) looks like this:

time	torso_imu	pelvis_imu	tibia_r_imu	femur_r_imu	tibia_l_imu	femur_l_imu	calcn_r_imu	calcn_l_imu
0	0.01518343195907421,0.7812957439677948,-0.04770728972005249,0.622149852012617	0.1207747690878542,0.8053315183184858,-0.142819722762797,0.5625452226662099	0.3951845214451953,-0.6053549348947597,-0.3937077025151748,-0.5677753444707826	0.3753907991577196,-0.5658507979303541,-0.3468703522649773,-0.646974173448122	0.5835647133912788,0.4063956405762357,-0.6152428563326652,0.3402514310576951	0.5854606233169395,0.3832895330622857,-0.5480699907177834,0.458196767409859	0.2178287352520044,-0.9667284914115302,-0.1327179739292387,-0.01930298909021503	0.1759460998977403,0.3471632588712789,-0.9102461481857205,-0.1413244187453365
0.01	0.0153210137178647,0.781247219174652,-0.04788784790226691,0.6221935415076403	0.1208796815319564,0.8052840031748407,-0.1430221742909909,0.5625392737263002	0.395100589522564,-0.6054134414094215,-0.3937159987716847,-0.5677656219171235	0.3753901734774389,-0.5658535433909461,-0.3469560968783313,-0.6469261564718519	0.5836601509355528,0.4062299483085272,-0.6153208699774565,0.3401445049935765	0.5856153629154232,0.3831113700639101,-0.5481436844014433,0.4580598499061032	0.2178083403178592,-0.9667443847139103,-0.1326299339363174,-0.01934223710304965	0.1759173816325171,0.3471787034472968,-0.9102653245939488,-0.1411986598877146

This parses as time as a decimal number seperated by a tab, followed by 4 values for each quaternion (separated by commas), then a tab, and repeat. So if it were acceptable to write or read numerics with commas you wouldn't know if the comma was a seperator between quaternions or a decimal place indicator. So in order to be able to accept numerics that use commas the file format would have to be changed. Per this comment it seems reasonable to not allow reading or writing commas in numerics.

alexbeattie42 · 2024-10-19T05:50:45Z

Updated to remove File IO changes and only include the minimal changes tostod which is the root of the problem

adamkewley · 2024-10-21T07:38:11Z

These are great changes, and I'm glad someone's giving it a whack, but:

I don't see a reason to not use the fast library

If I had a euro for every time a developer comes along and writes some version of this, I'd have ezc3d, docopt, spdlog, eigen, colpack, adolc, ipopt, metis, mumps, casadi, fastfloat... etc. euros. Everybody thinks this until they have to maintain a build long-term and deal with issues that have opening paragraphs like "I'm trying to build OpenSim on EsotericLinux with a PowerPC chip and it's failing because tropter isn't fetching from sourceforge".

Imo, drop the fastfloat dependency. The file loaders aren't currently performance limiters for end-users (as opposed to, say, simulation performance). The data files OpenSim uses tend to be very small, and IO is probably a bigger factor when parsing them (even if your OS's page cache is hot). Your benchmark shows this much. Regardless of how many stars it has, or its use in chromium, you should've thought at that point "not worth it, the prio is to fix the locale issues" and dropped it altogether.

adamkewley · 2024-10-23T14:12:02Z

This seems like a reasonable enough refactor!

The only thing I'd change is the static initialization of locale_, but this can be handled post-merge

adamkewley · 2024-10-23T14:12:42Z

Nice work @alexbeattie42 👍

halleysfifthinc reviewed Oct 16, 2024

View reviewed changes

dependencies/CMakeLists.txt Outdated Show resolved Hide resolved

halleysfifthinc reviewed Oct 16, 2024

View reviewed changes

alexbeattie42 force-pushed the main branch from b40c9f9 to 41d8dd2 Compare October 18, 2024 17:38

alexbeattie42 closed this Oct 19, 2024

alexbeattie42 force-pushed the main branch from 41d8dd2 to 37e40e1 Compare October 19, 2024 05:46

alexbeattie42 reopened this Oct 19, 2024

alexbeattie42 closed this Oct 21, 2024

alexbeattie42 force-pushed the main branch from 30a9889 to edbc2e4 Compare October 21, 2024 09:03

alexbeattie42 reopened this Oct 21, 2024

alexbeattie42 changed the title ~~Improved file IO and String to Decimal parsing~~ Improved String to Decimal parsing Oct 21, 2024

alexbeattie42 force-pushed the main branch from 25925af to aee41de Compare October 21, 2024 13:03

Minimal stod method change

33445be

alexbeattie42 force-pushed the main branch from 12c3560 to 33445be Compare October 21, 2024 16:40

Add handling for nan values

4ee0e57

alexbeattie42 requested a review from halleysfifthinc October 21, 2024 19:22

adamkewley merged commit 33f3f59 into opensim-org:main Oct 23, 2024
11 checks passed

adamkewley mentioned this pull request Oct 23, 2024

Refactor static variables in OpenSim::IO #3948

Merged

adamkewley added a commit that referenced this pull request Oct 23, 2024

Add CHANGELOG.md mentioning #3943

e0c2a3c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved String to Decimal parsing #3943

Improved String to Decimal parsing #3943

alexbeattie42 commented Oct 16, 2024 •

edited

Loading

halleysfifthinc left a comment

alexbeattie42 commented Oct 16, 2024

alexbeattie42 commented Oct 19, 2024

adamkewley commented Oct 21, 2024

adamkewley commented Oct 23, 2024

adamkewley commented Oct 23, 2024

Improved String to Decimal parsing #3943

Improved String to Decimal parsing #3943

Conversation

alexbeattie42 commented Oct 16, 2024 • edited Loading

Brief summary of changes

String to Decimal Conversion

Testing I've completed

Looking for feedback on...

CHANGELOG.md (choose one)

halleysfifthinc left a comment

Choose a reason for hiding this comment

alexbeattie42 commented Oct 16, 2024

alexbeattie42 commented Oct 19, 2024

adamkewley commented Oct 21, 2024

adamkewley commented Oct 23, 2024

adamkewley commented Oct 23, 2024

alexbeattie42 commented Oct 16, 2024 •

edited

Loading