Add support for large decimals with >34 digits. #293

nirosys · 2023-10-18T07:45:47Z

Issue #, if available: #159

Description of changes:
This PR fixes the issue identified in #159 where no more than 34 digits were supported in ion decimals.

There were 2 issues in this PR that should be made aware for anyone working with ion-c and decNumber.

The Decimal Context Needs to Allow More Digits

Originally, ion-python's C extension initialized the decContext used for configuring the decNumber library with one of the available presets, DEC_INIT_DECQUAD. This preset configures the library with the expectation that you'll be using decQuads, which are limited to 128bit, or 34 decimal digits. This configuration lead to ion-c returning IERR_NUMERIC_OVERFLOW when trying to load a larger than 34 digit decimal:

>> import amazon.ion.simpleion as ion
>>> ion.loads("99999999999999999999999999999999999999999999999999999999999999999999999999.999999999999999999999999")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/ion-python/amazon/ion/simpleion.py", line 487, in loads
    return load(ion_buffer, catalog=catalog, single_value=single_value, encoding=encoding, cls=cls,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ion-python/amazon/ion/simpleion.py", line 556, in load
    return load_extension(fp, parse_eagerly=parse_eagerly, single_value=single_value,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ion-python/amazon/ion/simpleion.py", line 519, in load_extension
    value = next(iterator)
            ^^^^^^^^^^^^^^
amazon.ion.exceptions.IonException: IERR_NUMERIC_OVERFLOW 
>>>

When creating the decContext, we can use the digits field to specify how many digits we would like to support before returning overflow errors, or when rounding is required. In this PR I've set that field to 10,000.

decNumbers do not have a well defined size

The definition of decNumber defines the lsu field like this:

typedef struct {
   // .. other stuff not important atm.
   decNumberUnit lsu[DECNUMUNITS];
} decNumber;

DECNUMUNITS is the number of units required to store DECNUMDIGITS digits. We define DECNUMDIGITS at build time, so this struct has a well defined size. The compiler recognizes the size, and expects that every decNumber will be this size. Unfortunately for the compiler, and us, this is not true.

The decNumber library, despite defining a known size, does not consider the size of lsu to be static. If more memory is needed to hold more digits, a larger buffer can be allocated for the decNumber and associated operations know to go beyond DECNUMUNITS units to access more digits.

Because of this an operation as innocent as:

read_number = *(decimal_value.value.num_value);

which the compiler treats as a memcpy, using the struct's size, has the potential to stop short of copying all of the digit data. This results in the decNumber library poke around memory that is beyond the size of the struct:

>>> import amazon.ion.simpleion as ion
>>> ion.loads("99999999999999999999999999999999999999999999999999999999999999999999999999.999999999999999999999999")
Decimal('936521992000000000000000764479192000559583245000000999999999999999.999999999999999999999999')
>>>

This PR removes the copy, and just refers to the decNumber through a pointer. In the case of the decimal being a decQuad, a new decNumber is allocated to receive the converted decQuad. The memory is allocated with the default size of the decNumber since the maximum size of a decQuad is 34 digits, which is what we define for DECNUMDIGITS.

With this change, we no longer get garbage digits beyond the default struct size:

>>> import amazon.ion.simpleion as ion
>>> ion.loads("99999999999999999999999999999999999999999999999999999999999999999999999999.999999999999999999999999")
Decimal('99999999999999999999999999999999999999999999999999999999999999999999999999.999999999999999999999999')
>>>

An important callout here is that when dealing with decNumbers, unless we know for sure that the value will not exceed the default number of digits, we should be allocating buffers for them on the heap, and not trying to work with stack allocated structs.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

…nd 34 digits

In order to support 3rd party repos it looks like we can either leave both the revision, and the repo off of the config, or specify both. By not specifying the repo, but specifying the ref/revision, the checkout action will try to pull the revision from the target repo.

nirosys · 2023-10-19T22:48:57Z

Updated the performance regression workflow to update the checkout of PR's source to include the repo. The repo defaults to the target repo if you fill in a rev/ref, which ends up breaking if the PR is made from a fork.

After a successful run, 3 of the 24 perf regression tests failed, with wild numbers (+30-40%), where the passing tests do not reach anywhere near as high. I re-ran the failed tests expecting the spike to be environmental, and 2 of the 3 failed tests succeeded this time showing 1-2% differences vs the original 30-40%, which seems to confirm it is environmental. Going to merge.

nirosys added 2 commits October 17, 2023 17:08

Allow large decimals >34 digits and fix decimal corruption issue beyo…

0e031c4

…nd 34 digits

Free read_member if we've allocated it

d8240e3

tgregg approved these changes Oct 18, 2023

View reviewed changes

cheqianh approved these changes Oct 19, 2023

View reviewed changes

nirosys marked this pull request as ready for review October 19, 2023 22:49

nirosys merged commit fcffdef into amazon-ion:master Oct 19, 2023
32 of 33 checks passed

nirosys mentioned this pull request Oct 19, 2023

C extension Decimal only supports 34 digits. #159

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for large decimals with >34 digits. #293

Add support for large decimals with >34 digits. #293

nirosys commented Oct 18, 2023

nirosys commented Oct 19, 2023

Add support for large decimals with >34 digits. #293

Add support for large decimals with >34 digits. #293

Conversation

nirosys commented Oct 18, 2023

The Decimal Context Needs to Allow More Digits

decNumbers do not have a well defined size

nirosys commented Oct 19, 2023