gh-102471, PEP 757: Add PyLong import and export API #121339

vstinner · 2024-07-03T15:28:00Z

Add PyLong_Export() and PyLong_Import() functions and PyLong_LAYOUT structure.

Issue: The C-API for Python to C integer conversion is, to be frank, a mess. #102471

📚 Documentation preview 📚: https://cpython-previews--121339.org.readthedocs.build/

vstinner · 2024-07-03T15:31:47Z

cc @skirpichev @casevh

Include/cpython/longintrepr.h

vstinner · 2024-07-03T20:22:27Z

See also issue #111415

skirpichev

Just my 2c.

The gmpy2 code, used for benchmarking, can be found in my fork:
https://github.com/skirpichev/gmpy/tree/trying-py-import-export

Doc/c-api/long.rst

skirpichev · 2024-07-04T02:31:36Z

Objects/longobject.c

+PyUnstable_Long_Export(PyObject *obj, PyUnstable_LongExport *long_export)
+{
+    if (!PyLong_Check(obj)) {
+        PyErr_Format(PyExc_TypeError, "expect int, got %T", obj);
+        return -1;
+    }
+    PyLongObject *self = (PyLongObject*)obj;
+
+    long_export->obj = (PyLongObject*)Py_NewRef(obj);
+    long_export->negative = _PyLong_IsNegative(self);
+    long_export->ndigits = _PyLong_DigitCount(self);
+    if (long_export->ndigits == 0) {
+        long_export->ndigits = 1;
+    }
+    long_export->digits = self->long_value.ob_digit;
+    return 0;
+}


As this mostly give a direct access to the PyLongObject - it almost as fast as using private stuff before.

Old code:

$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**2' 'mpz(x)' 1000000 loops, best of 20: 232 nsec per loop $ python -m timeit -r11 -s 'from gmpy2 import mpz;x=10**100' 'mpz(x)' 500000 loops, best of 11: 500 nsec per loop $ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**1000' 'mpz(x)' 100000 loops, best of 20: 2.53 usec per loop

With proposed API:

$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**2' 'mpz(x)' 1000000 loops, best of 20: 258 nsec per loop $ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**100' 'mpz(x)' 500000 loops, best of 20: 528 nsec per loop $ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**1000' 'mpz(x)' 100000 loops, best of 20: 2.56 usec per loop

skirpichev · 2024-07-04T02:38:51Z

Objects/longobject.c

+PyObject*
+PyUnstable_Long_Import(int negative, size_t ndigits, Py_digit *digits)
+{
+    return (PyObject*)_PyLong_FromDigits(negative, ndigits, digits);
+}


But this is something I would like to avoid. This requires allocation of a temporary buffer and using memcpy. Can we offer a writable layout to use it's digits in the mpz_export directly?

Benchmarks for old code:

$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)' 2000000 loops, best of 11: 111 nsec per loop $ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)' 500000 loops, best of 11: 475 nsec per loop $ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)' 100000 loops, best of 11: 2.39 usec per loop

With new API:

$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)' 2000000 loops, best of 20: 111 nsec per loop $ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)' 500000 loops, best of 20: 578 nsec per loop $ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)' 100000 loops, best of 20: 2.53 usec per loop

This requires allocation of a temporary buffer and using memcpy.

Right, PyLongObject has to manage its own memory.

Can we offer a writable layout to use it's digits in the mpz_export directly?

That sounds strange from the Python point of view and make the internals "less opaque". I would prefer to leak less implementation details.

Right, PyLongObject has to manage its own memory.

I'm not trying to change that. More complete proposal: vstinner#4

gmpy2 patch: https://github.com/skirpichev/gmpy/tree/trying-py-import-export-v2

New benchmarks:

$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)' 2000000 loops, best of 20: 111 nsec per loop $ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)' 500000 loops, best of 20: 509 nsec per loop $ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)' 100000 loops, best of 20: 2.44 usec per loop

I would prefer to leak less implementation details.

I don't think this leak anything. It doesn't leak memory management details. PyLong_Import will just do allocation memory. Writting digits will be job for mpz_export, as before.

Without this, it seems - there are noticeable performance regression for integers of intermediate range. Something up to 20% vs 7% on my branch.

Edit: currently, proposed PyUnstable_Long_ReleaseImport() match PyUnstable_Long_ReleaseExport(). Perhaps, it could be one function (say, PyUnstable_Long_ReleaseDigitArray()), but I unsure - maybe it puts some constraints on internals of the PyLongObject.

Objects/longobject.c

Doc/c-api/long.rst

skirpichev · 2024-07-04T03:27:09Z

CC @tornaria, as Sage people might be interested in this feature.

skirpichev · 2024-07-04T05:06:03Z

CC @oscarbenjamin, you may want this for python-flint

vstinner · 2024-07-04T07:08:41Z

I updated my PR:

Use -1 for little endian and +1 for big endian.
Rename PyUnstable_LongExport to PyUnstable_Long_DigitArray.
Add "always succeed" mention in the doc.

Lib/test/test_capi/test_long.py

Misc/NEWS.d/next/C API/2024-07-03-17-26-53.gh-issue-102471.XpmKYk.rst

Doc/whatsnew/3.14.rst

oscarbenjamin · 2024-07-04T11:23:49Z

CC @oscarbenjamin, you may want this for python-flint

Absolutely. Currently python-flint uses a hex-string as an intermediate format when converting between large int and fmpz so anything more direct is an improvement. I expect that python-flint would use mpz_import/mpz_export just like gmpy2.

Objects/longobject.c

vstinner · 2024-07-04T12:56:16Z

@skirpichev: I added a PyLongWriter API similar to what @encukou proposed.

Example:

PyLongObject *
_PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits)
{
    PyLongWriter *writer = PyLongWriter_Create();
    if (writer == NULL) {
        return NULL;
    }

    if (negative) {
        PyLongWriter_SetSign(writer, -1);
    }

    Py_digit *writer_digits = PyLongWriter_AllocDigits(writer, digit_count);
    if (writer_digits == NULL) {
        goto error;
    }
    memcpy(writer_digits, digits, digit_count * sizeof(digit));

    return (PyLongObject*)PyLongWriter_Finish(writer);

error:
    PyLongWriter_Discard(writer);
    return NULL;
}

The PyLongWriter_Finish() function normalizes the number and gets a small number if needed. Example:

>>> import _testcapi; _testcapi.pylong_import(0, [100, 0, 0]) is 100
True

vstinner · 2024-07-04T13:00:48Z

I mark the PR as a draft until we agree on the API.

skirpichev · 2024-07-04T13:20:48Z

I added a PyLongWriter API similar to what @encukou proposed.

Yes, that looks better and should fix speed regression. I'll try to benchmark that, perhaps tomorrow.

But cost is 5 (!) public functions and one new struct, additionally to PyUnstable_Long_Import(), which will be a more slow API. Correct? C.f. just one function in above proposal.

vstinner · 2024-07-04T13:31:12Z

I updated the PR to remove the PyUnstable_ prefix, replace it with the PyLong prefix.

vstinner · 2024-07-04T13:38:57Z

But cost is 5 (!) public functions and one new struct, additionally to PyUnstable_Long_Import(), which will be a more slow API. Correct? C.f. just one function in #121339 (comment) proposal.

My concern is to avoid the problem capi-workgroup/api-evolution#36 : avoid exposing _PyLong_New() object until it's fully initialized. The "writer" API hides the implementation details but also makes sure that the object is not "leaked" to Python before it's fully initialized and valid. By the way, the implementation uses functions which are only safe if the object cannot be seen in Python: if Py_REFCNT(obj) is 1.

vstinner · 2024-07-04T13:43:25Z

@skirpichev: Would it be useful to add a PyLongWriter_SetValue(PyLongWriter *writer, long value) function? It would be similar to PyLong_FromLong(long value) (but may be less efficient), so I'm not sure if it's relevant.

Doc/c-api/long.rst

skirpichev · 2024-09-17T13:03:13Z

On Tue, Sep 17, 2024 at 04:44:15AM -0700, Victor Stinner wrote: Do you have concerns about Windows where long is only 32-bit?

Yes, we support Windows (

For new APIs, I would prefer to use a type with a known size: int32_t, int64_t, etc.

No, I'm not suggesting to change type for the value field. But long value should also fit in int64_t. I'm just not sure that inlined PyLong_AsInt64 is a fastest solution.

oscarbenjamin · 2024-09-17T15:35:08Z

Do you have concerns about Windows where long is only 32-bit?
Yes, we support Windows (

There is always a way to make this work though like:

#if sizeof(int64_t) == sizeof(long)

Modules/_testcapi/long.c

skirpichev · 2024-09-18T06:12:14Z

Here new benchmarks for PyLong_Export with updated gmpy2 implementation (follows PEP).

Current implementation:

Benchmark	master	new-api
1<<7	265 ns	283 ns: 1.07x slower
1<<38	321 ns	376 ns: 1.17x slower
1<<300	511 ns	602 ns: 1.18x slower
1<<3000	2.35 us	2.44 us: 1.04x slower
Geometric mean	(ref)	1.11x slower

With PyUnstable_Long_CompactValue:

int
PyLong_Export(PyObject *obj, PyLongExport *export_long)
{   
    if (!PyLong_Check(obj)) {
        PyErr_Format(PyExc_TypeError, "expect int, got %T", obj);
        return -1;
    }
    PyLongObject *self = (PyLongObject*)obj;

    if (PyUnstable_Long_IsCompact(self)) {
        export_long->value = (int64_t)PyUnstable_Long_CompactValue(self);
        export_long->negative = 0;
        export_long->ndigits = 0;
        export_long->digits = 0;
        export_long->_reserved = 0;
    }
    else {
        export_long->value = 0;
        export_long->negative = _PyLong_IsNegative(self);
        export_long->ndigits = _PyLong_DigitCount(self);
        if (export_long->ndigits == 0) {
            export_long->ndigits = 1;
        }
        export_long->digits = self->long_value.ob_digit;
        export_long->_reserved = (Py_uintptr_t)Py_NewRef(obj);
    }
    return 0;
}

Benchmark	master	new-api
1<<7	265 ns	275 ns: 1.04x slower
1<<38	321 ns	346 ns: 1.08x slower
1<<300	511 ns	539 ns: 1.06x slower
1<<3000	2.35 us	2.37 us: 1.01x slower
Geometric mean	(ref)	1.04x slower

With PyLong_AsLongAndOverflow:

int
PyLong_Export(PyObject *obj, PyLongExport *export_long)
{
    if (!PyLong_Check(obj)) {
        PyErr_Format(PyExc_TypeError, "expect int, got %T", obj);
        return -1;
    }
    PyLongObject *self = (PyLongObject*)obj;

    int overflow;
    long value = PyLong_AsLongAndOverflow(obj, &overflow);

    if (!overflow) { 
        export_long->value = (int64_t)value;
        export_long->negative = 0;
        export_long->ndigits = 0;
        export_long->digits = 0;
        export_long->_reserved = 0;
    }   
    else {
        export_long->value = 0;
        export_long->negative = _PyLong_IsNegative(self);
        export_long->ndigits = _PyLong_DigitCount(self);
        if (export_long->ndigits == 0) {
            export_long->ndigits = 1;
        }
        export_long->digits = self->long_value.ob_digit;
        export_long->_reserved = (Py_uintptr_t)Py_NewRef(obj);
    }
    return 0;
}

Benchmark	master	new-api
1<<7	265 ns	276 ns: 1.04x slower
1<<38	321 ns	284 ns: 1.13x faster
1<<300	511 ns	554 ns: 1.08x slower
1<<3000	2.35 us	2.38 us: 1.02x slower
Geometric mean	(ref)	1.00x slower

Probably, last one seems to be fastest. Current implementation for PyLong_Export() also hardest to port.

I think this should fix warnings.

This reverts commit 5d3e224.

Objects/longobject.c

Doc/c-api/long.rst

Co-authored-by: Sergey B Kirpichev <[email protected]>

vstinner requested review from erlend-aasland and corona10 as code owners July 3, 2024 15:28

bedevere-app bot mentioned this pull request Jul 3, 2024

The C-API for Python to C integer conversion is, to be frank, a mess. #102471

Open

bedevere-app bot added the awaiting core review label Jul 3, 2024

vstinner force-pushed the long_export branch from 21dd5b9 to a087b0a Compare July 3, 2024 15:30

vstinner mentioned this pull request Jul 3, 2024

Add public function PyLong_GetDigits() capi-workgroup/decisions#31

Closed

zooba reviewed Jul 3, 2024

View reviewed changes

Include/cpython/longintrepr.h Outdated Show resolved Hide resolved

vstinner requested a review from ericsnowcurrently as a code owner July 3, 2024 20:13

vstinner mentioned this pull request Jul 3, 2024

Consider restoring _PyLong_New() function as public #111415

Closed

skirpichev reviewed Jul 4, 2024

View reviewed changes

skirpichev mentioned this pull request Jul 4, 2024

Consider adding public PyLong_FlipSign() function #120446

Open

skirpichev reviewed Jul 4, 2024

View reviewed changes

oscarbenjamin mentioned this pull request Jul 4, 2024

More efficient conversion from Python int to fmpz flintlib/python-flint#159

Open

skirpichev reviewed Jul 4, 2024

View reviewed changes

Objects/longobject.c Outdated Show resolved Hide resolved

vstinner marked this pull request as draft July 4, 2024 13:00

bedevere-app bot removed the awaiting core review label Jul 4, 2024

vstinner force-pushed the long_export branch from a191357 to 294f53a Compare July 4, 2024 13:26

vstinner mentioned this pull request Jul 4, 2024

Avoid creating incomplete/invalid objects capi-workgroup/api-evolution#36

Open