Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved String to Decimal parsing #3943

Merged
merged 2 commits into from
Oct 23, 2024
Merged

Conversation

alexbeattie42
Copy link
Contributor

@alexbeattie42 alexbeattie42 commented Oct 16, 2024

Fixes issue #3924

Brief summary of changes

String to Decimal Conversion

The above fixed the reading/writing problems but the code was still failing. Even with the imbuing of the input and output file stream, the parser converting strings to decimals is still using the system locale. After enough digging and exploration I found that std::stod is locale dependent. It is a thinly veiled wrapper of std::strtod which does mention the locale dependance. To fix this issue I did the following:

  • Create a method with the same signature as std::stod in the IO.h class
  • Implement a string to decimal conversion that is invariate to the system specified locale (the two potential methods are described below)
  • Replace all calls to std::stod with the new IO::stod method
  • Parsing with the isstream method (standard c++):
    double result;
    std::istringstream iss(__str);
    iss.imbue(std::locale(_locale));
    iss >> result;

Testing I've completed

  • Tested against the sample here and it is working

Looking for feedback on...

CHANGELOG.md (choose one)

  • Will update once it is reviewed and approved to avoid continually rebasing against changelog conflicts while it is under review.

This change is Reviewable

Copy link

@halleysfifthinc halleysfifthinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading/understanding correctly, this corrects the number parsing to be locale aware, but is number writing also locale specific?

I think the best case for locale handling would be to follow the robustness principle aka "be conservative in what you send, be liberal in what you accept". In this context, I think that should look like locale independent parsing (i.e. can parse comma or period decimals), and optionally hard-coded locale writing (.trc/.mot/.sto files with period decimals only). That would ensure that users with any locale can read .trc/.mot/.sto files generated anywhere, and (with hard-coded period decimal writing) ensure maximum compatibility reading .trc/.mot/.sto files by any third-party software/code (which might be less flexible with parsing).

@alexbeattie42
Copy link
Contributor Author

If I'm reading/understanding correctly, this corrects the number parsing to be locale aware, but is number writing also locale specific?

I think the best case for locale handling would be to follow the robustness principle aka "be conservative in what you send, be liberal in what you accept". In this context, I think that should look like locale independent parsing (i.e. can parse comma or period decimals), and optionally hard-coded locale writing (.trc/.mot/.sto files with period decimals only). That would ensure that users with any locale can read .trc/.mot/.sto files generated anywhere, and (with hard-coded period decimal writing) ensure maximum compatibility reading .trc/.mot/.sto files by any third-party software/code (which might be less flexible with parsing).

This is in general a good idea but not possible in this specific instance. The file (at least all the imu .sto files in the examples are this way) looks like this:

time	torso_imu	pelvis_imu	tibia_r_imu	femur_r_imu	tibia_l_imu	femur_l_imu	calcn_r_imu	calcn_l_imu
0	0.01518343195907421,0.7812957439677948,-0.04770728972005249,0.622149852012617	0.1207747690878542,0.8053315183184858,-0.142819722762797,0.5625452226662099	0.3951845214451953,-0.6053549348947597,-0.3937077025151748,-0.5677753444707826	0.3753907991577196,-0.5658507979303541,-0.3468703522649773,-0.646974173448122	0.5835647133912788,0.4063956405762357,-0.6152428563326652,0.3402514310576951	0.5854606233169395,0.3832895330622857,-0.5480699907177834,0.458196767409859	0.2178287352520044,-0.9667284914115302,-0.1327179739292387,-0.01930298909021503	0.1759460998977403,0.3471632588712789,-0.9102461481857205,-0.1413244187453365
0.01	0.0153210137178647,0.781247219174652,-0.04788784790226691,0.6221935415076403	0.1208796815319564,0.8052840031748407,-0.1430221742909909,0.5625392737263002	0.395100589522564,-0.6054134414094215,-0.3937159987716847,-0.5677656219171235	0.3753901734774389,-0.5658535433909461,-0.3469560968783313,-0.6469261564718519	0.5836601509355528,0.4062299483085272,-0.6153208699774565,0.3401445049935765	0.5856153629154232,0.3831113700639101,-0.5481436844014433,0.4580598499061032	0.2178083403178592,-0.9667443847139103,-0.1326299339363174,-0.01934223710304965	0.1759173816325171,0.3471787034472968,-0.9102653245939488,-0.1411986598877146

This parses as time as a decimal number seperated by a tab, followed by 4 values for each quaternion (separated by commas), then a tab, and repeat. So if it were acceptable to write or read numerics with commas you wouldn't know if the comma was a seperator between quaternions or a decimal place indicator. So in order to be able to accept numerics that use commas the file format would have to be changed. Per this comment it seems reasonable to not allow reading or writing commas in numerics.

@alexbeattie42
Copy link
Contributor Author

Updated to remove File IO changes and only include the minimal changes tostod which is the root of the problem

@alexbeattie42 alexbeattie42 reopened this Oct 19, 2024
@adamkewley
Copy link
Contributor

These are great changes, and I'm glad someone's giving it a whack, but:

I don't see a reason to not use the fast library

If I had a euro for every time a developer comes along and writes some version of this, I'd have ezc3d, docopt, spdlog, eigen, colpack, adolc, ipopt, metis, mumps, casadi, fastfloat... etc. euros. Everybody thinks this until they have to maintain a build long-term and deal with issues that have opening paragraphs like "I'm trying to build OpenSim on EsotericLinux with a PowerPC chip and it's failing because tropter isn't fetching from sourceforge".

Imo, drop the fastfloat dependency. The file loaders aren't currently performance limiters for end-users (as opposed to, say, simulation performance). The data files OpenSim uses tend to be very small, and IO is probably a bigger factor when parsing them (even if your OS's page cache is hot). Your benchmark shows this much. Regardless of how many stars it has, or its use in chromium, you should've thought at that point "not worth it, the prio is to fix the locale issues" and dropped it altogether.

@alexbeattie42 alexbeattie42 reopened this Oct 21, 2024
@alexbeattie42 alexbeattie42 changed the title Improved file IO and String to Decimal parsing Improved String to Decimal parsing Oct 21, 2024
@adamkewley
Copy link
Contributor

This seems like a reasonable enough refactor!

The only thing I'd change is the static initialization of locale_, but this can be handled post-merge

@adamkewley adamkewley merged commit 33f3f59 into opensim-org:main Oct 23, 2024
11 checks passed
@adamkewley
Copy link
Contributor

Nice work @alexbeattie42 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants