Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SvUV() macro 100% of time calls Perl_sv_2uv_flags() because SvUOK_nog/Perl_sv_setuv/SVf_IVisUV #22653

Open
bulk88 opened this issue Oct 10, 2024 · 6 comments

Comments

@bulk88
Copy link
Contributor

bulk88 commented Oct 10, 2024

Description
I was stepping debugging some XS code, and noticed, if you create a SvUV, with sv_setuv(), then later use SvUV() macro or eqv inline function, %99.999 of the time, Perl_sv_2uv_flags() will execute.

This "SvUV" created with sv_setuv(), or newSVuv()

SV = IV(0x6d2de0) at 0x6d2df0
  REFCNT = 2
  FLAGS = (IOK,pIOK)
  IV = 5369402920

the IV is really a C pointer/opaque pointer from a C library. 100% of the time, SvUV() macro, will execute Perl_sv_2uv_flags(). Because of

SV *
Perl_newSVuv(pTHX_ const UV u)
{
    SV *sv;

    /* Inlining ONLY the small relevant subset of sv_setuv here
     * for performance. Makes a significant difference. */

    /* Using ivs is more efficient than using uvs - see sv_setuv */
    if (u <= (UV)IV_MAX) {
        return newSViv((IV)u);
    }

    new_SV(sv);

    /* We're starting from SVt_FIRST, so provided that's
     * actual 0, we don't have to unset any SV type flags
     * to promote to SVt_IV. */
    STATIC_ASSERT_STMT(SVt_FIRST == 0);

    SET_SVANY_FOR_BODYLESS_IV(sv);
    SvFLAGS(sv) |= SVt_IV;
    (void)SvIOK_on(sv);
    (void)SvIsUV_on(sv);

    SvUV_set(sv, u);
    SvTAINT(sv);

    return sv;
}

that

    if (u <= (UV)IV_MAX) {

turns off SVf_IVisUV in final SvUV, then the SvUOK_nog() in SvUV(), says "NOT AN INTEGER" and calls the HEAVY getter Perl_sv_2uv_flags() each time. Since these are pointers, I think on 99% OSes, highest bit in an address, or negative addresses, are kernel space, and prohibited for user mode. Win32 32b and maybe some Solaris'es use and alloc high bit/negative pointer addresses, but I dont think P5 is compatible with such CPUs/C compilers/build configs.

I traced

+#define SvUOK_nog(sv)		((SvFLAGS(sv) & (SVf_IOK|SVf_IVisUV|SVs_GMG)) == (SVf_IOK|SVf_IVisUV))
@@ -1568,9 +1587,9 @@ Like sv_utf8_upgrade, but doesn't do magic on C<sv>.
 */
 
 /* Let us hope that bitmaps for UV and IV are the same */
-#define SvIV(sv) (SvIOK(sv) ? SvIVX(sv) : sv_2iv(sv))
-#define SvUV(sv) (SvIOK(sv) ? SvUVX(sv) : sv_2uv(sv))
-#define SvNV(sv) (SvNOK(sv) ? SvNVX(sv) : sv_2nv(sv))
+#define SvIV(sv) (SvIOK_nog(sv) ? SvIVX(sv) : sv_2iv(sv))
+#define SvUV(sv) (SvUOK_nog(sv) ? SvUVX(sv) : sv_2uv(sv))
+#define SvNV(sv) (SvNOK_nog(sv) ? SvNVX(sv) : sv_2nv(sv))

to

Revision: 4bac9ae
Author: Chip Salzenberg [email protected]
Date: 6/22/2012 6:18:18 PM
Message:
Magic flags harmonization.

    /* Using ivs is more efficient than using uvs - see sv_setuv */
    if (u <= (UV)IV_MAX) {
        return newSViv((IV)u);
    }

traced to

Revision: 4d55d9a
Author: Steffen Müller [email protected]
Date: 11/28/2014 11:34:00 AM
Message:
Repeat newSViv optimization for newSVuv

and that quotes

Revision: bd30fe8
Author: Eric Herman [email protected]
Date: 11/28/2014 9:08:04 AM
Message:
Speed up newSViv()

see also

Revision: 013abb9
Author: Nicholas Clark [email protected]
Date: 2/1/2012 4:58:14 PM
Message:
Update, correct and clarify the comment in Perl_sv_setuv().

See the correspondence on ticket #36459 for more details.

AKA #8002

So questions, why is #define SvUOK_nog(sv) ((SvFLAGS(sv) & (SVf_IOK|SVf_IVisUV|SVs_GMG)) == (SVf_IOK|SVf_IVisUV)) used? isnt the byte size and bitfields for IV and UV 100% identical all CPUs OSes?

Doesnt SVf_IVisUV only matter for NV FP double conversion and stringify?

Does SV MAGIC somehow pass "want UV" vs "want IV" flags to the CPAN XS vtable or CPAN PP tied SCALARs?

Do we throw truncate/overflow console warnings for SvIV ptrs (SVf_IOK yes SVf_IVisUV no), passed to SvUV() getter macro, with the SVIV being "-1"? I can't find such code (XSUB level, not PP ops) in sv.c, but I will ask anyways.

This is bad enough a fix should go into maint releases.

PS, the problem in found in blead Perl 5 core XS and PP and runtime code. Because SvUV() is being used in some places to fetch pointer type integers.

Dirty fix, CPAN author side, always use setiv() SvIV() SvIVX() or ONLY use SvUVX() in XS code to store/get/read/set pointers and pointer like data, hell, just NEVER use SvUV() macro, since ONLY "(UV)-1" and "(UV)-123456" ever use the "fast path" in SvUV().

There needs to be a P5P fix for this. Vs manual hunting and patching of CPAN XS and core.

Steps to Reproduce

sv_setuv(); sv_dump(); SvUV(); and use a C step debugger inside SvUV(); statement

Expected behavior

Do not do a func call inside SvUV(); for SVIV flagged SV heads.

Perl configuration
NA, blead perl 5.41 and many other versions

@ericherman
Copy link
Contributor

How much does:

    /* Using ivs is more efficient than using uvs - see sv_setuv */
    if (u <= (UV)IV_MAX) {
        return newSViv((IV)u);
    }

actually save?

Even though 4d55d9a states that this is "Pretty much the same change as bd30fe8", but that particular short-cut is in addition to that.

I wonder if it would be better to simply remove that.

@ericherman
Copy link
Contributor

I strongly suspect @tsee tested this with a micro-benchmark, but I do not know what benchmark was used.

bulk88 added a commit to bulk88/perl5 that referenced this issue Oct 10, 2024
builtin.c is very new (2021), and while the code works, it is not a
shining good example of C or XS code and best practices

-actual bug
cv_set_call_checker_flags(cv, builtin->checker, newSVuv(PTR2UV(builtin)), 0);
leaks refcnt

-remove and reorganize all the misaligned fields in
 "struct BuiltinFuncDescriptor", on x64, it was 6 ptrs big. After this
 patch, each element is 2 ptrs big. (21*6*8)-(21*2*8) = 672 bytes saved,
 from interp binary, and it was almost all null bytes.
-To make a bitfield take bits from U8 sub_name_len not U16 ckval, which
 are 0-~400 perl opcodes. less chance of, faster visibility, and less
 debugging to do, if very long method names get added and broken, vs
 breaking perl runtime engine impl details if suddenly the opcodes grow
 in the future.

-S_warn_experimental_builtin() will likely never execute, factor the
 "builtin->name" pointer read the cold separate static func vs at callers.

-S_export_lexical() batch ptr r/w, so CC doesn' have to again emit a r/w to
 ithread interp var PL_curpad after the SvREFCNT_dec() fn call. Do ++
 first, in case an almost impossible exception is thrown from
 SvREFCNT_dec(). Paranoia, no real reason.

-also note inefficiency that pad_add_name_pvn() allocates a SV head, AND
 upgrades it to SVt_PVCV, and right after S_export_lexical() frees that
 SV *, and puts our own in. No Perl API exists to allocate a pad slot, with
 an uninit garbage pad slot returned ready to fill.

-pad_add_name_sv->pad_add_name_pvn, pad_add_name_sv() has no perf
 (HEK? COW?) benefits over pad_add_name_pvn(), and is only a thin wrapper.
 Don't use SVs in non-SV APIs if a cheaper malloc()/cstring can be used.

-"SvUV(ckobj)"->"SvUVX(ckobj)", remove the unconditional "2uv()" fn call.
 That fix locally the major problem in
 Perl#22653 . Bc we created this SV, nobody
 knows about it, and just blindly deref with SvUVX skipping magic and
 stringifying overhead. ck_builtin_*()s execute alot in mktables and
 modern perl code, its not the BOOT: sub.

-'SV *prototype = newSVpvs(""); SAVEFREESV(prototype);' dont make this
 temp SV over and over. Use MY_CXT to store our 3 high refcnt COW
 strings. Using COW shared HEK instead other COW types, since HEK code is
 much older and "stable", and the "new COW" API in sv.c has a very
 little API, and I think, untested, would assert, panic, or generically
 corrupt a CV * leading to a SEGV. sv_sethek() is "safer". Not using
 flag SVppv_STATIC and C strings, since no API exists for SVppv_STATIC.
 Maybe one day, SVppv_STATIC can be put in here, for now sv_sethek() is
 simpler.

-Perl_die->Perl_die_nocontext saves CC CPU opcodes on ithread perl.

-S_import_sym() do not make multiple redundant cycle through SV code for
 1 use "builtin::foo" and "&foo" strings. And S_import_sym() can be called
 with PP ->import(); many times x 21 on script start up, b/c of many
 .pm's wanting these subs.

-S_cv_is_builtin() we set CvFILE() on our newXS call, it would be bizzare
 if the CV* does NOT have our CvFILE ro string ptrs anymore. Direct ptr
 comparison skips the call to libc on for streq. Should be a panic if
 CV_DYNFILE or alien CvFILE RO string ptr is found. But KISS and only do
 ptr cmp for perf.

-pad_findmy_sv->pad_findmy_pvn, left is thin wrap for right, skip SV *
 alloc and SVPV manip code and go very close to assembly with a C stack
 buffer and maybe inlined memcpy()s.

-BOOT: newXS_flags()->newXS_len_flags(), remove strlen(), also, share the
 "$" and "@" PVs between the CV*s. There is a lack of perlapi, and other
 design discussions, why newXS_len_flags() with RO const char * proto,
 mallloc()s a copy, and remember Perl rounds up to 16 bytes for a PV *,
 and "2 PTRs" secret malloc header, so, for prototype string "$", 2 bytes.
 32 bytes * 21 XSUBS = 672 or more bytes were wasted using proto arg
 of newXS_len_flags(). sv_sethek() fixes that. Also remember the SV*/HEK*
 is reused for ck_entersub_args_proto() later on.

savings are disk/file size (RO data) of libperl bin, and proc
unconditional startup overhead in CPU time, perm malloc() reduction,
faster ->import() per package/per yy_parser
@bulk88
Copy link
Contributor Author

bulk88 commented Oct 13, 2024

I strongly suspect @tsee tested this with a micro-benchmark, but I do not know what benchmark was used.

Without researching, (correct this if I am wrong) I think pp_add() the pure perl op codes, always do IV math first, then upgrade to UV math on overflow, then upgrade to NV math on overflow, plus negative numbers are popular jkjk. If pp_math() ops are IV priority first, UV math will definitely be slower if that is the 2nd or 3rd branch/test jump in the C code.

Perhaps some CPUs, gaming/assembly/big data back then, signed ints were some nanoseconds faster than unsigned ints, or zero extend vs sign extend was slightly different speeds on CPUs back then.

Or Steffen??? has a biz algo, where negative numbers -2billion to 0 to +2billion, kept his code in IV integer mode, while UV's constantly overflowed and degraded to slower NV/FPUs. But that is very problem specific to one user.

I want to just change the SvUV() macro, so SVIV no UV flag, take fast path (no fnc call), through SvUV() macro, but the git history is so complicated, and changed in the 2010s, it needs the original authors or chip in, or someone who remembers from back then.

I vaguely remember Father C 10 years ago, being very active in poking and writing fixes for the SV XS magic system, since he found and claimed and probably did fix many cases of "data loss" or data degradation (IV/UV/NV/PV casting IDK where, that destroys round trip ability) through the perlapi setters/getters. Including some kind of flaws with private IOK NOK POK vs public IOK NOK POK.

4bac9ae

Revolves around data loss inside Perl Magic API, and that is the specific commit that caused the damage (broke the fast path in SvUV).

A trolling argument I have is, is its "required" to call sv_2uv() fn call on an IV, incase the IV is negative, and that is a programmer error, or security exploit, and P5 must process abort or throw large console messages?

I'd really like to see this fixed, since there is tons of CPAN code storing pointers in sv_setuv()/SvUV(), that is suffering.

Perhaps, if nobody has technical input, it is just best to be adventurous and change SvUV() and see how much of core breaks, and see if CPAN complaints roll in, or its crickets from CPAN, which means the SvUV change was a success.

I'd like a 2nd opinion, since git blame commits, point to repairing "data loss" 10 years ago, and a slower thought process is needed.

bulk88 added a commit to bulk88/perl5 that referenced this issue Oct 13, 2024
related to "SvUV() macro 100% of time calls Perl_sv_2uv_flags"
Perl#22653

Until Perl#22653 is solved, clean up newSVuv() and remove branch
to "newSViv()" that is unexplained by git blame. BUT, keep original
intent and behaviour of "newSViv()" branch for now.

Add asserts to guard against 0x8000,0000 == SVf_IVisUV changing.
Value of SVf_IVisUV can change in the future, and there might be
(I didn't git blame), logic that sign flag and SVf_IVisUV are equal.
But these changes depend on SVf_IVisUV being 0x8000,0000 and must
be updated if SVf_IVisUV changes.

Change SvXXXV_set() to be an explicity bodyless SV head optimization. MSVC
2022 -O1 combined SET_SVANY_FOR_BODYLESS_IV() and SvIV_set(). But instead
of hopes and prayers on "UB" or "ISB/IDB" of CCs that could change at
random in any previous or future build number of a CC, do it explictly.
Bodyless SV head API is defined by P5P, not CC vendors.

9155444 3/20/2022 3:05:10 PM
Perl_newSViv: simplify by using (inline) newSV_type

Fix deoptimized Perl_newSViv(). In that commit it forgot about
Perl_newSVuv(). Since newSV_type() is a inline fn, and "inline" is
CC domain UB optimization. And newSV_type() is far more complex than
CPP macro new_SV(), and newSV_type() depends on 100% perfection from
CC's LTO engine and ".o" disk format, and possibly depends on the CC
breaking ISO C spec with -O3 or -O4. Which turn on extreme SEGV inducing
C variable aliasing rules that few C code bases tolerate. Quick examples,
a reddit comment (not credible), claims "uint8_t *" and "char *" can not
be casted since the CC or CPU has 9 bit bytes or a 9 wire data bus, and
ECC parity wire is 9 of 9 for "uint8_t" and 8 of 9 for "signed char" and
wire 9 for "char" is the ECC parity wire. The platform's libc's fwritef(), hides the
secretly converts 9 bit bytes, to standard 8 bit bytes, making
the CC "ISO C compliant".

My more realistic scenario, inside newSV_type(). How can the CC know,
what if Perl_more_sv() or Perl_more_bodies(), calls mprotect(), modifies
"static const struct body_details bodies_by_type [];",
calls mprotect() again, and returns execution to newSV_type()?

Just switch to new_SV(). Its a CPP macro, not subject to CC UB inlining,
and new_SV() only has 1 fn call and is super light weight.

Old P5P commits/ML/CPAN dev talk about this area of code being crucial to
(CPAN XS) deserializing perf in perl, so perf considerations,
with proper asserts, has priority over readability.

Links to old core commits in Perl#22653
briefly discuss deserializing perf as rational, so this patch also follows
that design idea.

Perl_vnewSVpvf(), "malloc(1ch);" which in reality is "malloc(16ch)" makes
no sense, since almost zero chance fmtstr+args+\0 <= 16, and perl malloc()
round up, is semi-UB/a build flag default on anyways. Using guesstimate
malloc(pat_len), increases chances far higher, that a realloc() inside
sv_vcatpvfn_flags(), OS realloc(), will realloc() in place, not changing
the ptr, esp assuming OS malloc() does bucket of power of 2
allocator algo. Assume, 40ch malloc() fmt string, bucket to 64ch by
OS malloc(), throw in a %u 32b, that is max +10ch-2ch for "%u".
So output is 48ch. realloc(48ch) is inplace, therefore it is a win.
@iabyn
Copy link
Contributor

iabyn commented Oct 14, 2024 via email

@tonycoz
Copy link
Contributor

tonycoz commented Oct 21, 2024

Win32 32b and maybe some Solaris'es use and alloc high bit/negative pointer addresses, but I dont think P5 is compatible with such CPUs/C compilers/build configs.

We had at least one bug #12993 report where a 32-bit Linux system was allocating memory above the 2G mark in a child thread.

@bulk88
Copy link
Contributor Author

bulk88 commented Oct 22, 2024

We had at least one bug #12993 report where a 32-bit Linux system was allocating memory above the 2G mark in a child thread.

So are negative pointers or 1.3 GB long scalars on 32b perl officially supported by P5P or not? I remember sisyphus and rurban some years ago both claim /3GB mode and Win32 32b Perl had instant SEGVs on start up and it was unfixable. I asked did they try to fix the SEGVs, they said yes, but abandoned it after the 3 or 5th SEGV fix they did and then deleted their private branch. I've never compiled /3GB mode myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants