Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A few API functions were (accidentally?) removed in 3.6 #534

Closed
musicinmybrain opened this issue Mar 19, 2024 · 11 comments · Fixed by #545
Closed

A few API functions were (accidentally?) removed in 3.6 #534

musicinmybrain opened this issue Mar 19, 2024 · 11 comments · Fixed by #545

Comments

@musicinmybrain
Copy link
Contributor

Although sleef 3.6 is supposed to be an ABI-compatible update for 3.5.1 (the SONAME version is still 3), comparing the main shared library using fedabipkgdiff from libabigail shows that a handful of symbols were removed. Of these, Sleef_sincospil_u05 and Sleef_sincospil_u35 were documented API functions.

Comparing the ABI of binaries between sleef-3.5.1-33.fc41.x86_64.rpm and sleef-3.6-1.fc41.x86_64.rpm:

================ changes of 'libsleef.so.3.5.1'===============
  Functions changes summary: 4 Removed, 67 Changed (1796 filtered out), 148 Added functions
  Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

  4 Removed functions:

    [D] 'function const Sleef_longdouble2 Sleef_sincospil_u05(long double)'    {Sleef_sincospil_u05}
    [D] 'function const Sleef_longdouble2 Sleef_sincospil_u35(long double)'    {Sleef_sincospil_u35}
    [D] 'function const Sleef_quad2 Sleef_sincospiq_u05(Sleef_quad)'    {Sleef_sincospiq_u05}
    [D] 'function const Sleef_quad2 Sleef_sincospiq_u35(Sleef_quad)'    {Sleef_sincospiq_u35}

This is on x86_64; I haven’t checked other architectures.

In a quick glance at the source code, it wasn’t immediately obvious to me why these symbols disappeared. I do notice that they are all at the end of rename.h, after a comment line:

sleef/src/libm/rename.h

Lines 175 to 181 in a99491a

//
#define xsincospil_u05 Sleef_sincospil_u05
#define xsincospil_u35 Sleef_sincospil_u35
#define xsincospiq_u05 Sleef_sincospiq_u05
#define xsincospiq_u35 Sleef_sincospiq_u35

While nothing in rename.h has changed between 3.5.1 and 3.6, maybe this was affected by changes in mkrename.c?

@musicinmybrain
Copy link
Contributor Author

I bisected this to 1d66bbd.

The changes in sleeflibm_header.h.org.in look suspicious:

1d66bbd#diff-6a6c4d54f858f55978b29ae41b00d4a8b484ddf7908a136abd4f9a003458d2c3L312

@blapie
Copy link
Collaborator

blapie commented Mar 21, 2024

Good point. It seems that scalar long double/quad sincospi implementation was removed along with the long double DFT (where it was used).
I agree that this should have triggered a new major version. I would be keen to re-integrate the function as it can still be useful to users.

Maybe a bit of background from @shibatch could help?

What would be the process to fix 3.6? Would we simply have to create a new patch release?

I will be officially AFK until April 22 (sabbatical), but @joanaxcruz, @joeramsay and @shibatch might be able to follow up on this one.

@musicinmybrain
Copy link
Contributor Author

I’m going to hold off on updating the Fedora package past 3.5.1 for up to another month or two, in the hope that it will still be possible to avoid an ABI break (and update Sleef in existing stable releases).

@blapie
Copy link
Collaborator

blapie commented May 7, 2024

Hi! Sorry about the delay in responding to that. We are planning a patch release including a fix for that in the next month. Will post more in discussions.

@musicinmybrain
Copy link
Contributor Author

Thanks for the update!

blapie added a commit to blapie/sleef that referenced this issue May 31, 2024
Add a new workflow to detect unexpected backward-incompatible
changes at precommit, by comparing current version against a
reference version, both generated on Linux with gcc.
It uses abidiff to compare ELF-format shared libraries.
This is the tool used by Fedora to report a recent breach in
API shibatch#534. Test should fail until it is fixed by shibatch#545.
blapie added a commit to blapie/sleef that referenced this issue May 31, 2024
Add a new workflow to detect unexpected backward-incompatible
changes at precommit, by comparing current version against a
reference version, both generated on Linux with gcc.
It uses abidiff to compare ELF-format shared libraries.
This is the tool used by Fedora to report a recent breach in
API shibatch#534. Test should fail until it is fixed by shibatch#545.

Change-Id: I2e4efa4682e05d351535cf66cfc4adc806f6a67b
blapie added a commit to blapie/sleef that referenced this issue May 31, 2024
Add a new workflow to detect unexpected backward-incompatible
changes at precommit, by comparing current version against a
reference version, both generated on Linux with gcc.
It uses abidiff to compare ELF-format shared libraries.
This is the tool used by Fedora to report a recent breach in
API shibatch#534. Test should fail until it is fixed by shibatch#545.
blapie added a commit to blapie/sleef that referenced this issue May 31, 2024
Add a new workflow to detect unexpected backward-incompatible
changes at precommit, by comparing current version against a
reference version, both generated on Linux with gcc.
It uses abidiff to compare ELF-format shared libraries.
This is the tool used by Fedora to report a recent breach in
API shibatch#534. Test should fail until it is fixed by shibatch#545.
blapie added a commit to blapie/sleef that referenced this issue May 31, 2024
Add a new workflow to detect unexpected backward-incompatible
changes at precommit, by comparing current version against a
reference version, both generated on Linux with gcc.
It uses abidiff to compare ELF-format shared libraries.
This is the tool used by Fedora to report a recent breach in
API shibatch#534. Test should fail until it is fixed by shibatch#545.
blapie added a commit to blapie/sleef that referenced this issue May 31, 2024
Add a new workflow to detect unexpected backward-incompatible
changes at precommit, by comparing current version against a
reference version, both generated on Linux with gcc.
It uses abidiff to compare ELF-format shared libraries.
This is the tool used by Fedora to report a recent breach in
API shibatch#534. Test should fail until it is fixed by shibatch#545.
blapie added a commit to blapie/sleef that referenced this issue May 31, 2024
Add a new workflow to detect unexpected backward-incompatible
changes at precommit, by comparing current version against a
reference version, both generated on Linux with gcc.
It uses abidiff to compare ELF-format shared libraries.
This is the tool used by Fedora to report a recent breach in
API shibatch#534. Test should fail until it is fixed by shibatch#545.
blapie added a commit to blapie/sleef that referenced this issue May 31, 2024
Add a new workflow to detect unexpected backward-incompatible
changes at precommit, by comparing current version against a
reference version, both generated on Linux with gcc.
It uses abidiff to compare ELF-format shared libraries.
This is the tool used by Fedora to report a recent breach in
API shibatch#534. Test should fail until it is fixed by shibatch#545.
blapie added a commit to blapie/sleef that referenced this issue May 31, 2024
Add a new workflow to detect unexpected backward-incompatible
changes at precommit, by comparing current version against a
reference version, both generated on Linux with gcc.
It uses abidiff to compare ELF-format shared libraries.
This is the tool used by Fedora to report a recent breach in
API shibatch#534. Test should fail until it is fixed by shibatch#545.
@blapie blapie linked a pull request Jun 5, 2024 that will close this issue
@blapie
Copy link
Collaborator

blapie commented Jun 5, 2024

@musicinmybrain We are about to merge #545 which should restore the missing symbols. Please re-open issue if needed, it should close automatically on merge.

@musicinmybrain
Copy link
Contributor Author

I can confirm that applying https://github.com/shibatch/sleef/pull/545.patch to the 3.6 release restores the missing functions, at least on x86_64 – I didn’t take the time to test other architectures.

@blapie
Copy link
Collaborator

blapie commented Jun 5, 2024

Thanks! This should be cross platform, tested it on both x86 and aarch64.

@musicinmybrain
Copy link
Contributor Author

Since PyTorch still needed a const-related change for sleef 3.6 (basically pytorch/pytorch@3c2b5d4, but actually implemented and merged as a much larger commit pytorch/pytorch@56451cd), I think I’m still going to keep stable Fedora releases (F40, F39) and EPELs (EPEL9/8/7) at 3.5.1 as a precaution – but I see no obstacle to updating Rawhide to a patched 3.6 or to the upcoming 3.6.1, for inclusion in Fedora 41 and later.

@blapie
Copy link
Collaborator

blapie commented Jun 7, 2024

Sounds sensible!
Do you know if there is anything we can/should do to solve the PyTorch issue? And should that be integrated to 3.6.1 or 3.7?

@musicinmybrain
Copy link
Contributor Author

Sounds sensible! Do you know if there is anything we can/should do to solve the PyTorch issue? And should that be integrated to 3.6.1 or 3.7?

To be honest, I haven’t studied this closely enough to give really high-quality advice. However, at a glance, from the compiler errors without the PyTorch change, e.g.

In file included from /builddir/build/BUILD/python-torch-2.3.0-build/pytorch-v2.3.0/aten/src/ATen/cpu/vec/vec256/vec256.h:12,
                 from /builddir/build/BUILD/python-torch-2.3.0-build/pytorch-v2.3.0/aten/src/ATen/cpu/vec/vec.h:6,
                 from /builddir/build/BUILD/python-torch-2.3.0-build/pytorch-v2.3.0/aten/src/ATen/cpu/vec/functional_base.h:6,
                 from /builddir/build/BUILD/python-torch-2.3.0-build/pytorch-v2.3.0/aten/src/ATen/cpu/vec/functional.h:3,
                 from /builddir/build/BUILD/python-torch-2.3.0-build/pytorch-v2.3.0/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp:12,
                 from /builddir/build/BUILD/python-torch-2.3.0-build/pytorch-v2.3.0/build/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp.AVX2.cpp:1:
/builddir/build/BUILD/python-torch-2.3.0-build/pytorch-v2.3.0/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h: In instantiation of ‘at::vec::AVX2::Vectorized<T> at::vec::AVX2::Vectorized16<T>::log() const [with T = c10::BFloat16]’:
/builddir/build/BUILD/python-torch-2.3.0-build/pytorch-v2.3.0/aten/src/ATen/native/cpu/UnaryOpsKernel.cpp:141:3:   required from here
  156 |                 return (x_vec / (kOneVec - x_vec)).log();
      |                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
/builddir/build/BUILD/python-torch-2.3.0-build/pytorch-v2.3.0/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h:482:16: error: invalid conversion from ‘__m256 (*)(__m256)’ to ‘const __m256 (*)(__m256)’ [-fpermissive]
  482 |     return map(Sleef_logf8_u10);
      |                ^~~~~~~~~~~~~~~
      |                |
      |                __m256 (*)(__m256)
/builddir/build/BUILD/python-torch-2.3.0-build/pytorch-v2.3.0/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h:268:42: note:   initializing argument 1 of ‘at::vec::AVX2::Vectorized<T> at::vec::AVX2::Vectorized16<T>::map(const __m256 (*)(__m256)) const [with T = c10::BFloat16; __m256 = __m256]’
  268 |   Vectorized<T> map(const __m256 (*const vop)(__m256)) const {
      |                     ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~

it looks like using __attribute__ ((const)) instead of declaring the return values const changed the signatures of a number of functions so that function-pointer declarations in PyTorch, e.g.

https://github.com/pytorch/pytorch/blob/8dd1be49b7c17223ed6de18fac4471abc6735aae/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h#L268

became incompatible. Now they look like:

https://github.com/pytorch/pytorch/blob/543a870943120484db547382ed9ca9538a40f284/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h#L308

Presumably this would affect anything that’s taking pointers to sleef functions without using C++/C23 auto or similar to declare their types.

It’s unfortunate that this technically changed the API, because __attribute__ ((const)) makes much more sense.

If you try to fix this in sleef somehow, it’s probably a good idea to make sure to examine the PyTorch workaround and/or coordinate with the PyTorch people to make sure that the fix doesn’t break PyTorch again.

blapie added a commit that referenced this issue Oct 9, 2024
Add a new workflow to detect unexpected backward-incompatible
changes at precommit, by comparing current version against a
reference version, both generated on Linux with gcc.
It uses abidiff to compare ELF-format shared libraries.
This is the tool used by Fedora to report a recent breach in
API #534. Test should fail until it is fixed by #545.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants