SvUV() macro 100% of time calls Perl_sv_2uv_flags() because SvUOK_nog/Perl_sv_setuv/SVf_IVisUV #22653

bulk88 · 2024-10-10T07:49:08Z

Description
I was stepping debugging some XS code, and noticed, if you create a SvUV, with sv_setuv(), then later use SvUV() macro or eqv inline function, %99.999 of the time, Perl_sv_2uv_flags() will execute.

This "SvUV" created with sv_setuv(), or newSVuv()

SV = IV(0x6d2de0) at 0x6d2df0
  REFCNT = 2
  FLAGS = (IOK,pIOK)
  IV = 5369402920

the IV is really a C pointer/opaque pointer from a C library. 100% of the time, SvUV() macro, will execute Perl_sv_2uv_flags(). Because of

SV *
Perl_newSVuv(pTHX_ const UV u)
{
    SV *sv;

    /* Inlining ONLY the small relevant subset of sv_setuv here
     * for performance. Makes a significant difference. */

    /* Using ivs is more efficient than using uvs - see sv_setuv */
    if (u <= (UV)IV_MAX) {
        return newSViv((IV)u);
    }

    new_SV(sv);

    /* We're starting from SVt_FIRST, so provided that's
     * actual 0, we don't have to unset any SV type flags
     * to promote to SVt_IV. */
    STATIC_ASSERT_STMT(SVt_FIRST == 0);

    SET_SVANY_FOR_BODYLESS_IV(sv);
    SvFLAGS(sv) |= SVt_IV;
    (void)SvIOK_on(sv);
    (void)SvIsUV_on(sv);

    SvUV_set(sv, u);
    SvTAINT(sv);

    return sv;
}

that

    if (u <= (UV)IV_MAX) {

turns off SVf_IVisUV in final SvUV, then the SvUOK_nog() in SvUV(), says "NOT AN INTEGER" and calls the HEAVY getter Perl_sv_2uv_flags() each time. Since these are pointers, I think on 99% OSes, highest bit in an address, or negative addresses, are kernel space, and prohibited for user mode. Win32 32b and maybe some Solaris'es use and alloc high bit/negative pointer addresses, but I dont think P5 is compatible with such CPUs/C compilers/build configs.

I traced

+#define SvUOK_nog(sv)		((SvFLAGS(sv) & (SVf_IOK|SVf_IVisUV|SVs_GMG)) == (SVf_IOK|SVf_IVisUV))
@@ -1568,9 +1587,9 @@ Like sv_utf8_upgrade, but doesn't do magic on C<sv>.
 */
 
 /* Let us hope that bitmaps for UV and IV are the same */
-#define SvIV(sv) (SvIOK(sv) ? SvIVX(sv) : sv_2iv(sv))
-#define SvUV(sv) (SvIOK(sv) ? SvUVX(sv) : sv_2uv(sv))
-#define SvNV(sv) (SvNOK(sv) ? SvNVX(sv) : sv_2nv(sv))
+#define SvIV(sv) (SvIOK_nog(sv) ? SvIVX(sv) : sv_2iv(sv))
+#define SvUV(sv) (SvUOK_nog(sv) ? SvUVX(sv) : sv_2uv(sv))
+#define SvNV(sv) (SvNOK_nog(sv) ? SvNVX(sv) : sv_2nv(sv))

to

Revision: 4bac9ae
Author: Chip Salzenberg [email protected]
Date: 6/22/2012 6:18:18 PM
Message:
Magic flags harmonization.

    /* Using ivs is more efficient than using uvs - see sv_setuv */
    if (u <= (UV)IV_MAX) {
        return newSViv((IV)u);
    }

traced to

Revision: 4d55d9a
Author: Steffen Müller [email protected]
Date: 11/28/2014 11:34:00 AM
Message:
Repeat newSViv optimization for newSVuv

and that quotes

Revision: bd30fe8
Author: Eric Herman [email protected]
Date: 11/28/2014 9:08:04 AM
Message:
Speed up newSViv()

see also

Revision: 013abb9
Author: Nicholas Clark [email protected]
Date: 2/1/2012 4:58:14 PM
Message:
Update, correct and clarify the comment in Perl_sv_setuv().

See the correspondence on ticket #36459 for more details.

AKA #8002

So questions, why is #define SvUOK_nog(sv) ((SvFLAGS(sv) & (SVf_IOK|SVf_IVisUV|SVs_GMG)) == (SVf_IOK|SVf_IVisUV)) used? isnt the byte size and bitfields for IV and UV 100% identical all CPUs OSes?

Doesnt SVf_IVisUV only matter for NV FP double conversion and stringify?

Does SV MAGIC somehow pass "want UV" vs "want IV" flags to the CPAN XS vtable or CPAN PP tied SCALARs?

Do we throw truncate/overflow console warnings for SvIV ptrs (SVf_IOK yes SVf_IVisUV no), passed to SvUV() getter macro, with the SVIV being "-1"? I can't find such code (XSUB level, not PP ops) in sv.c, but I will ask anyways.

This is bad enough a fix should go into maint releases.

PS, the problem in found in blead Perl 5 core XS and PP and runtime code. Because SvUV() is being used in some places to fetch pointer type integers.

Dirty fix, CPAN author side, always use setiv() SvIV() SvIVX() or ONLY use SvUVX() in XS code to store/get/read/set pointers and pointer like data, hell, just NEVER use SvUV() macro, since ONLY "(UV)-1" and "(UV)-123456" ever use the "fast path" in SvUV().

There needs to be a P5P fix for this. Vs manual hunting and patching of CPAN XS and core.

Steps to Reproduce

sv_setuv(); sv_dump(); SvUV(); and use a C step debugger inside SvUV(); statement

Expected behavior

Do not do a func call inside SvUV(); for SVIV flagged SV heads.

Perl configuration
NA, blead perl 5.41 and many other versions

The text was updated successfully, but these errors were encountered:

ericherman · 2024-10-10T12:40:14Z

How much does:

    /* Using ivs is more efficient than using uvs - see sv_setuv */
    if (u <= (UV)IV_MAX) {
        return newSViv((IV)u);
    }

actually save?

Even though 4d55d9a states that this is "Pretty much the same change as bd30fe8", but that particular short-cut is in addition to that.

I wonder if it would be better to simply remove that.

ericherman · 2024-10-10T12:42:14Z

I strongly suspect @tsee tested this with a micro-benchmark, but I do not know what benchmark was used.

builtin.c is very new (2021), and while the code works, it is not a shining good example of C or XS code and best practices -actual bug cv_set_call_checker_flags(cv, builtin->checker, newSVuv(PTR2UV(builtin)), 0); leaks refcnt -remove and reorganize all the misaligned fields in "struct BuiltinFuncDescriptor", on x64, it was 6 ptrs big. After this patch, each element is 2 ptrs big. (21*6*8)-(21*2*8) = 672 bytes saved, from interp binary, and it was almost all null bytes. -To make a bitfield take bits from U8 sub_name_len not U16 ckval, which are 0-~400 perl opcodes. less chance of, faster visibility, and less debugging to do, if very long method names get added and broken, vs breaking perl runtime engine impl details if suddenly the opcodes grow in the future. -S_warn_experimental_builtin() will likely never execute, factor the "builtin->name" pointer read the cold separate static func vs at callers. -S_export_lexical() batch ptr r/w, so CC doesn' have to again emit a r/w to ithread interp var PL_curpad after the SvREFCNT_dec() fn call. Do ++ first, in case an almost impossible exception is thrown from SvREFCNT_dec(). Paranoia, no real reason. -also note inefficiency that pad_add_name_pvn() allocates a SV head, AND upgrades it to SVt_PVCV, and right after S_export_lexical() frees that SV *, and puts our own in. No Perl API exists to allocate a pad slot, with an uninit garbage pad slot returned ready to fill. -pad_add_name_sv->pad_add_name_pvn, pad_add_name_sv() has no perf (HEK? COW?) benefits over pad_add_name_pvn(), and is only a thin wrapper. Don't use SVs in non-SV APIs if a cheaper malloc()/cstring can be used. -"SvUV(ckobj)"->"SvUVX(ckobj)", remove the unconditional "2uv()" fn call. That fix locally the major problem in Perl#22653 . Bc we created this SV, nobody knows about it, and just blindly deref with SvUVX skipping magic and stringifying overhead. ck_builtin_*()s execute alot in mktables and modern perl code, its not the BOOT: sub. -'SV *prototype = newSVpvs(""); SAVEFREESV(prototype);' dont make this temp SV over and over. Use MY_CXT to store our 3 high refcnt COW strings. Using COW shared HEK instead other COW types, since HEK code is much older and "stable", and the "new COW" API in sv.c has a very little API, and I think, untested, would assert, panic, or generically corrupt a CV * leading to a SEGV. sv_sethek() is "safer". Not using flag SVppv_STATIC and C strings, since no API exists for SVppv_STATIC. Maybe one day, SVppv_STATIC can be put in here, for now sv_sethek() is simpler. -Perl_die->Perl_die_nocontext saves CC CPU opcodes on ithread perl. -S_import_sym() do not make multiple redundant cycle through SV code for 1 use "builtin::foo" and "&foo" strings. And S_import_sym() can be called with PP ->import(); many times x 21 on script start up, b/c of many .pm's wanting these subs. -S_cv_is_builtin() we set CvFILE() on our newXS call, it would be bizzare if the CV* does NOT have our CvFILE ro string ptrs anymore. Direct ptr comparison skips the call to libc on for streq. Should be a panic if CV_DYNFILE or alien CvFILE RO string ptr is found. But KISS and only do ptr cmp for perf. -pad_findmy_sv->pad_findmy_pvn, left is thin wrap for right, skip SV * alloc and SVPV manip code and go very close to assembly with a C stack buffer and maybe inlined memcpy()s. -BOOT: newXS_flags()->newXS_len_flags(), remove strlen(), also, share the "$" and "@" PVs between the CV*s. There is a lack of perlapi, and other design discussions, why newXS_len_flags() with RO const char * proto, mallloc()s a copy, and remember Perl rounds up to 16 bytes for a PV *, and "2 PTRs" secret malloc header, so, for prototype string "$", 2 bytes. 32 bytes * 21 XSUBS = 672 or more bytes were wasted using proto arg of newXS_len_flags(). sv_sethek() fixes that. Also remember the SV*/HEK* is reused for ck_entersub_args_proto() later on. savings are disk/file size (RO data) of libperl bin, and proc unconditional startup overhead in CPU time, perm malloc() reduction, faster ->import() per package/per yy_parser

bulk88 · 2024-10-13T12:47:39Z

I strongly suspect @tsee tested this with a micro-benchmark, but I do not know what benchmark was used.

Without researching, (correct this if I am wrong) I think pp_add() the pure perl op codes, always do IV math first, then upgrade to UV math on overflow, then upgrade to NV math on overflow, plus negative numbers are popular jkjk. If pp_math() ops are IV priority first, UV math will definitely be slower if that is the 2nd or 3rd branch/test jump in the C code.

Perhaps some CPUs, gaming/assembly/big data back then, signed ints were some nanoseconds faster than unsigned ints, or zero extend vs sign extend was slightly different speeds on CPUs back then.

Or Steffen??? has a biz algo, where negative numbers -2billion to 0 to +2billion, kept his code in IV integer mode, while UV's constantly overflowed and degraded to slower NV/FPUs. But that is very problem specific to one user.

I want to just change the SvUV() macro, so SVIV no UV flag, take fast path (no fnc call), through SvUV() macro, but the git history is so complicated, and changed in the 2010s, it needs the original authors or chip in, or someone who remembers from back then.

I vaguely remember Father C 10 years ago, being very active in poking and writing fixes for the SV XS magic system, since he found and claimed and probably did fix many cases of "data loss" or data degradation (IV/UV/NV/PV casting IDK where, that destroys round trip ability) through the perlapi setters/getters. Including some kind of flaws with private IOK NOK POK vs public IOK NOK POK.

4bac9ae

Revolves around data loss inside Perl Magic API, and that is the specific commit that caused the damage (broke the fast path in SvUV).

A trolling argument I have is, is its "required" to call sv_2uv() fn call on an IV, incase the IV is negative, and that is a programmer error, or security exploit, and P5 must process abort or throw large console messages?

I'd really like to see this fixed, since there is tons of CPAN code storing pointers in sv_setuv()/SvUV(), that is suffering.

Perhaps, if nobody has technical input, it is just best to be adventurous and change SvUV() and see how much of core breaks, and see if CPAN complaints roll in, or its crickets from CPAN, which means the SvUV change was a success.

I'd like a 2nd opinion, since git blame commits, point to repairing "data loss" 10 years ago, and a slower thought process is needed.

related to "SvUV() macro 100% of time calls Perl_sv_2uv_flags" Perl#22653 Until Perl#22653 is solved, clean up newSVuv() and remove branch to "newSViv()" that is unexplained by git blame. BUT, keep original intent and behaviour of "newSViv()" branch for now. Add asserts to guard against 0x8000,0000 == SVf_IVisUV changing. Value of SVf_IVisUV can change in the future, and there might be (I didn't git blame), logic that sign flag and SVf_IVisUV are equal. But these changes depend on SVf_IVisUV being 0x8000,0000 and must be updated if SVf_IVisUV changes. Change SvXXXV_set() to be an explicity bodyless SV head optimization. MSVC 2022 -O1 combined SET_SVANY_FOR_BODYLESS_IV() and SvIV_set(). But instead of hopes and prayers on "UB" or "ISB/IDB" of CCs that could change at random in any previous or future build number of a CC, do it explictly. Bodyless SV head API is defined by P5P, not CC vendors. 9155444 3/20/2022 3:05:10 PM Perl_newSViv: simplify by using (inline) newSV_type Fix deoptimized Perl_newSViv(). In that commit it forgot about Perl_newSVuv(). Since newSV_type() is a inline fn, and "inline" is CC domain UB optimization. And newSV_type() is far more complex than CPP macro new_SV(), and newSV_type() depends on 100% perfection from CC's LTO engine and ".o" disk format, and possibly depends on the CC breaking ISO C spec with -O3 or -O4. Which turn on extreme SEGV inducing C variable aliasing rules that few C code bases tolerate. Quick examples, a reddit comment (not credible), claims "uint8_t *" and "char *" can not be casted since the CC or CPU has 9 bit bytes or a 9 wire data bus, and ECC parity wire is 9 of 9 for "uint8_t" and 8 of 9 for "signed char" and wire 9 for "char" is the ECC parity wire. The platform's libc's fwritef(), hides the secretly converts 9 bit bytes, to standard 8 bit bytes, making the CC "ISO C compliant". My more realistic scenario, inside newSV_type(). How can the CC know, what if Perl_more_sv() or Perl_more_bodies(), calls mprotect(), modifies "static const struct body_details bodies_by_type [];", calls mprotect() again, and returns execution to newSV_type()? Just switch to new_SV(). Its a CPP macro, not subject to CC UB inlining, and new_SV() only has 1 fn call and is super light weight. Old P5P commits/ML/CPAN dev talk about this area of code being crucial to (CPAN XS) deserializing perf in perl, so perf considerations, with proper asserts, has priority over readability. Links to old core commits in Perl#22653 briefly discuss deserializing perf as rational, so this patch also follows that design idea. Perl_vnewSVpvf(), "malloc(1ch);" which in reality is "malloc(16ch)" makes no sense, since almost zero chance fmtstr+args+\0 <= 16, and perl malloc() round up, is semi-UB/a build flag default on anyways. Using guesstimate malloc(pat_len), increases chances far higher, that a realloc() inside sv_vcatpvfn_flags(), OS realloc(), will realloc() in place, not changing the ptr, esp assuming OS malloc() does bucket of power of 2 allocator algo. Assume, 40ch malloc() fmt string, bucket to 64ch by OS malloc(), throw in a %u 32b, that is max +10ch-2ch for "%u". So output is 48ch. realloc(48ch) is inplace, therefore it is a win.

iabyn · 2024-10-14T13:42:33Z

On Sun, Oct 13, 2024 at 05:48:12AM -0700, bulk88 wrote: Without researching, (correct this if I am wrong) I think pp_add() the pure perl op codes, always do IV math first, then upgrade to UV math on overflow, then upgrade to NV math on overflow,

Yes, it was effectively a way to keep an SV as an integer (now unsigned) for values between IV_MAX and UV_MAX, rather than upgrading to NV. The number-handling code in the core is likely to be optimised around this assumption - i.e. it wasn't envisioned that the UV flag would typically be set for values below IV_MAX.

…

-- Technology is dominated by two types of people: those who understand what they do not manage, and those who manage what they do not understand.

tonycoz · 2024-10-21T00:41:53Z

Win32 32b and maybe some Solaris'es use and alloc high bit/negative pointer addresses, but I dont think P5 is compatible with such CPUs/C compilers/build configs.

We had at least one bug #12993 report where a 32-bit Linux system was allocating memory above the 2G mark in a child thread.

bulk88 · 2024-10-22T08:01:05Z

We had at least one bug #12993 report where a 32-bit Linux system was allocating memory above the 2G mark in a child thread.

So are negative pointers or 1.3 GB long scalars on 32b perl officially supported by P5P or not? I remember sisyphus and rurban some years ago both claim /3GB mode and Win32 32b Perl had instant SEGVs on start up and it was unfixable. I asked did they try to fix the SEGVs, they said yes, but abandoned it after the 3 or 5th SEGV fix they did and then deleted their private branch. I've never compiled /3GB mode myself.

bulk88 added the Needs Triage label Oct 10, 2024

bulk88 mentioned this issue Oct 10, 2024

builtin.c cleanup+optimize+bloat #22659

Open

jkeenan removed the Needs Triage label Oct 10, 2024

bulk88 mentioned this issue Oct 13, 2024

Newsvuvnviv taint api speedup #22662

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SvUV() macro 100% of time calls Perl_sv_2uv_flags() because SvUOK_nog/Perl_sv_setuv/SVf_IVisUV #22653

SvUV() macro 100% of time calls Perl_sv_2uv_flags() because SvUOK_nog/Perl_sv_setuv/SVf_IVisUV #22653

bulk88 commented Oct 10, 2024

ericherman commented Oct 10, 2024

ericherman commented Oct 10, 2024

bulk88 commented Oct 13, 2024

iabyn commented Oct 14, 2024 via email

tonycoz commented Oct 21, 2024

bulk88 commented Oct 22, 2024

SvUV() macro 100% of time calls Perl_sv_2uv_flags() because SvUOK_nog/Perl_sv_setuv/SVf_IVisUV #22653

SvUV() macro 100% of time calls Perl_sv_2uv_flags() because SvUOK_nog/Perl_sv_setuv/SVf_IVisUV #22653

Comments

bulk88 commented Oct 10, 2024

ericherman commented Oct 10, 2024

ericherman commented Oct 10, 2024

bulk88 commented Oct 13, 2024

iabyn commented Oct 14, 2024 via email

tonycoz commented Oct 21, 2024

bulk88 commented Oct 22, 2024