CUDA test failure: tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13, but only the _first time_ it is run #3214

Closed
jpivarski opened this issue Aug 14, 2024 · 2 comments
Labels: bug (The problem described is something that must be fixed), gpu (Concerns the GPU implementation, backend = "cuda")

Comments

@jpivarski
Member

Version of Awkward Array

HEAD

Description and code to reproduce

The first time I ran pytest tests-cuda on a new system, I got one test failure:

=============================================================== test session starts ================================================================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
Matplotlib: 3.9.1
Freetype: 2.12.1
rootdir: /home/jpivarski/irishep/awkward
configfile: pyproject.toml
plugins: forked-1.6.0, timeout-2.3.1, reverse-1.7.0, mpl-0.17.0, anyio-4.4.0, mock-3.14.0, cov-5.0.0, xdist-3.6.1
collected 694 items                                                                                                                                

tests-cuda/test_1276_cuda_num.py .........                                                                                                   [  1%]
tests-cuda/test_1276_cuda_transfers.py ................                                                                                      [  3%]
tests-cuda/test_1276_cupy_interop.py .                                                                                                       [  3%]
tests-cuda/test_1276_from_cupy.py .....                                                                                                      [  4%]
tests-cuda/test_1300_same_for_numba_cuda.py .......................                                                                          [  7%]
tests-cuda/test_1381_check_errors.py .                                                                                                       [  7%]
tests-cuda/test_1809_array_cuda_jit.py ..............                                                                                        [  9%]
tests-cuda/test_2327_array_interface.py .                                                                                                    [ 10%]
tests-cuda/test_2649_dlpack_support.py .                                                                                                     [ 10%]
tests-cuda/test_2922a_new_cuda_kernels.py ......................................................................                             [ 20%]
tests-cuda/test_2922b_new_cuda_kernels.py .............................                                                                      [ 24%]
tests-cuda/test_3065a_cuda_kernels.py ...................................................................................................... [ 39%]
.....................................................................................................................................        [ 58%]
tests-cuda/test_3065b_cuda_kernels.py ........................                                                                               [ 61%]
tests-cuda/test_3065c_cuda_kernels.py ...................................................                                                    [ 69%]
tests-cuda/test_3086_cuda_concatenate.py ....................                                                                                [ 72%]
tests-cuda/test_3115_array_typed_cuda_jit.py .                                                                                               [ 72%]
tests-cuda/test_3130_cuda_listarray_getitem_next.py ................                                                                         [ 74%]
tests-cuda/test_3136_cuda_argmin_and_argmax.py sssssss                                                                                       [ 75%]
tests-cuda/test_3136_cuda_reducers.py ..................                                                                                     [ 78%]
tests-cuda/test_3140_cuda_jagged_and_masked_getitem.py ..........................                                                            [ 81%]
tests-cuda/test_3140_cuda_slicing.py ....................                                                                                    [ 84%]
tests-cuda/test_3141_cuda_misc.py ......                                                                                                     [ 85%]
tests-cuda/test_3149_complex_reducers.py ......................F.........ssss                                                                [ 90%]
tests-cuda/test_3150_combinations_n_equal_2.py .....................                                                                         [ 93%]
tests-cuda/test_3162_block_boundary_reducers.py ......ss....                                                                                 [ 95%]
tests-cuda/test_3162_cuda_generic_reducer_operation.py .......................s.......                                                       [100%]

===================================================================== FAILURES =====================================================================
________________________________________________________ test_block_boundary_prod_complex13 ________________________________________________________

    def test_block_boundary_prod_complex13():
        np.random.seed(42)
        array = np.random.randint(50, size=1000)
        complex_array = np.vectorize(complex)(
            array[0 : len(array) : 2], array[1 : len(array) : 2]
        )
        content = ak.contents.NumpyArray(complex_array)
        cuda_content = ak.to_backend(content, "cuda", highlevel=False)
        cpt.assert_allclose(
            ak.prod(cuda_content, -1, highlevel=False),
            ak.prod(content, -1, highlevel=False),
        )
    
        offsets = ak.index.Index64(np.array([0, 5, 996, 1000], dtype=np.int64))
        depth1 = ak.contents.ListOffsetArray(offsets, content)
        cuda_depth1 = ak.to_backend(depth1, "cuda", highlevel=False)
>       cpt.assert_allclose(
            to_list(ak.prod(cuda_depth1, -1, highlevel=False)),
            to_list(ak.prod(depth1, -1, highlevel=False)),
        )

array      = array([38, 28, 14, 42,  7, 20, 38, 18, 22, 10, 10, 23, 35, 39, 23,  2, 21,
        1, 23, 43, 29, 37,  1, 20, 32, 11, ...19, 24,  3,  9,  2, 40, 44, 17, 46, 35, 46, 21, 33, 46,
        7, 39, 48, 43, 18, 41, 40, 36,  5, 25, 33, 44,  5, 36])
complex_array = array([38.+28.j, 14.+42.j,  7.+20.j, 38.+18.j, 22.+10.j, 10.+23.j,
       35.+39.j, 23. +2.j, 21. +1.j, 23.+43.j, 29.+...7.j, 46.+35.j, 46.+21.j,
       33.+46.j,  7.+39.j, 48.+43.j, 18.+41.j, 40.+36.j,  5.+25.j,
       33.+44.j,  5.+36.j])
content    = <NumpyArray dtype='complex128' len='500'>
    [38.+28.j 14.+42.j  7.+20.j 38.+18.j 22.+10.j 10.+23.j 35.+39.j 23. +2.j... 44.+17.j 46.+35.j 46.+21.j 33.+46.j  7.+39.j 48.+43.j 18.+41.j
     40.+36.j  5.+25.j 33.+44.j  5.+36.j]
</NumpyArray>
cuda_content = <NumpyArray dtype='complex128' len='500'>
    [38.+28.j 14.+42.j  7.+20.j 38.+18.j 22.+10.j 10.+23.j 35.+39.j 23. +2.j... 44.+17.j 46.+35.j 46.+21.j 33.+46.j  7.+39.j 48.+43.j 18.+41.j
     40.+36.j  5.+25.j 33.+44.j  5.+36.j]
</NumpyArray>
cuda_depth1 = <ListOffsetArray len='3'>
    <offsets><Index dtype='int64' len='4'>
        [   0    5  996 1000]
    </Index></offse... 7.+39.j 48.+43.j 18.+41.j 40.+36.j
          5.+25.j 33.+44.j  5.+36.j]
    </NumpyArray></content>
</ListOffsetArray>
depth1     = <ListOffsetArray len='3'>
    <offsets><Index dtype='int64' len='4'>
        [   0    5  996 1000]
    </Index></offse... 7.+39.j 48.+43.j 18.+41.j 40.+36.j
          5.+25.j 33.+44.j  5.+36.j]
    </NumpyArray></content>
</ListOffsetArray>
offsets    = <Index dtype='int64' len='4'>
    [   0    5  996 1000]
</Index>

tests-cuda/test_3149_complex_reducers.py:575: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../miniforge3/lib/python3.11/site-packages/cupy/testing/_array.py:24: in assert_allclose
    numpy.testing.assert_allclose(
        actual     = [(-29843744-33672352j), (nan+nanj), 0j]
        atol       = 0
        desired    = [(-29843744-33672352j), (nan+nanj), (1.4641000000000006-0j)]
        err_msg    = ''
        rtol       = 1e-07
        verbose    = True
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<function assert_allclose.<locals>.compare at 0x7072ae3f9580>, array([-29843744.-33672352.j,        nan      +nanj,
 ...      0.       +0.j]), array([-2.9843744e+07-33672352.j,            nan      +nanj,
        1.4641000e+00       -0.j]))
kwds = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=1e-07, atol=0', 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError: 
E           Not equal to tolerance rtol=1e-07, atol=0
E           
E           Mismatched elements: 1 / 3 (33.3%)
E           Max absolute difference: 1.4641
E           Max relative difference: 1.
E            x: array([-29843744.-33672352.j,        nan      +nanj,
E                          0.       +0.j])
E            y: array([-2.984374e+07-33672352.j,           nan      +nanj,
E                   1.464100e+00       -0.j])

args       = (<function assert_allclose.<locals>.compare at 0x7072ae3f9580>, array([-29843744.-33672352.j,        nan      +nanj,
 ...      0.       +0.j]), array([-2.9843744e+07-33672352.j,            nan      +nanj,
        1.4641000e+00       -0.j]))
func       = <function assert_array_compare at 0x7072f1849e40>
kwds       = {'equal_nan': True, 'err_msg': '', 'header': 'Not equal to tolerance rtol=1e-07, atol=0', 'verbose': True}
self       = <contextlib._GeneratorContextManager object at 0x7072f1870790>

../../miniforge3/lib/python3.11/contextlib.py:81: AssertionError
============================================================= short test summary info ==============================================================
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:18: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:40: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:51: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:115: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3136_cuda_argmin_and_argmax.py:138: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [2] tests-cuda/test_3136_cuda_argmin_and_argmax.py:177: awkward_reduce_argmin and awkward_reduce_argmax are not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:773: awkward_reduce_argmax_complex is not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:795: awkward_reduce_argmax_complex is not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:817: awkward_reduce_argmin_complex is not implemented
SKIPPED [1] tests-cuda/test_3149_complex_reducers.py:839: awkward_reduce_argmin_complex is not implemented
SKIPPED [1] tests-cuda/test_3162_block_boundary_reducers.py:121: awkward_reduce_argmin is not implemented
SKIPPED [1] tests-cuda/test_3162_block_boundary_reducers.py:139: awkward_reduce_argmax is not implemented
SKIPPED [1] tests-cuda/test_3162_cuda_generic_reducer_operation.py:847: awkward_reduce_argmin is not implemented
FAILED tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13 - AssertionError: 
==================================================== 1 failed, 679 passed, 14 skipped in 21.50s ====================================================

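The failure is numerical: the CUDA and CPU results disagree in the third element, where the CUDA backend returns 0j and the CPU backend returns (1.4641000000000006-0j).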
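
One detail is visible in the locals printed in the traceback and may be worth checking, though it is not a confirmed diagnosis: `content` has length 500 (the 1000 random integers are paired into complex numbers), while `offsets` ends at 1000. A minimal pure-Python sketch, using a hypothetical helper named `sublists_in_bounds` that is not part of Awkward Array, reports which sublists stay inside the content:

```python
def sublists_in_bounds(offsets, content_length):
    """Return (start, stop, fits) for each sublist described by offsets.

    Hypothetical helper for illustration only; not an Awkward Array API.
    """
    report = []
    for start, stop in zip(offsets[:-1], offsets[1:]):
        fits = 0 <= start <= stop <= content_length
        report.append((start, stop, fits))
    return report


# Values copied from the traceback: 1000 integers are paired into
# complex numbers with array[0:len(array):2] and array[1:len(array):2].
n_integers = 1000
content_length = len(range(0, n_integers, 2))
offsets = [0, 5, 996, 1000]

for start, stop, fits in sublists_in_bounds(offsets, content_length):
    print(f"[{start}:{stop}] within content of length {content_length}: {fits}")
```

If the second and third sublists really do extend past the end of the buffer, the values reduced there would depend on whatever happens to be in memory, which might account for results that differ between runs. That is an assumption and has not been verified here.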

Subsequently re-running this test did not result in any errors. That's very strange. I tried to make a reproducer on Google Colab, but couldn't install CuPy on it.

I also tried uninstalling and reinstalling Awkward:

% pip uninstall awkward 
Found existing installation: awkward 2.6.7
Uninstalling awkward-2.6.7:
  Would remove:
    /home/jpivarski/miniforge3/lib/python3.11/site-packages/_awkward.pth
    /home/jpivarski/miniforge3/lib/python3.11/site-packages/awkward-2.6.7.dist-info/*
    /home/jpivarski/miniforge3/lib/python3.11/site-packages/awkward/juliapkg.json
Proceed (Y/n)? 
  Successfully uninstalled awkward-2.6.7
% pip install -e .
Obtaining file:///home/jpivarski/irishep/awkward
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: awkward-cpp==37 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (37)
Requirement already satisfied: fsspec>=2022.11.0 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (2024.6.1)
Requirement already satisfied: importlib-metadata>=4.13.0 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (8.2.0)
Requirement already satisfied: numpy>=1.18.0 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (1.26.4)
Requirement already satisfied: packaging in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from awkward==2.6.7) (24.1)
Requirement already satisfied: zipp>=0.5 in /home/jpivarski/miniforge3/lib/python3.11/site-packages (from importlib-metadata>=4.13.0->awkward==2.6.7) (3.20.0)
Building wheels for collected packages: awkward
  Building editable for awkward (pyproject.toml) ... done
  Created wheel for awkward: filename=awkward-2.6.7-py3-none-any.whl size=5067 sha256=0ddf47f970c3ab51619d8a5d6b0072a315f422f3883f77c4465ad3900915dd27
  Stored in directory: /tmp/pip-ephem-wheel-cache-acpm0m7u/wheels/56/e1/a6/2c4dae09851e882a1c0d9a375beb305bc10de51cda49eccf35
Successfully built awkward
Installing collected packages: awkward
Successfully installed awkward-2.6.7
% pytest tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13
=============================================================== test session starts ================================================================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
Matplotlib: 3.9.1
Freetype: 2.12.1
rootdir: /home/jpivarski/irishep/awkward
configfile: pyproject.toml
plugins: forked-1.6.0, timeout-2.3.1, reverse-1.7.0, mpl-0.17.0, anyio-4.4.0, mock-3.14.0, cov-5.0.0, xdist-3.6.1
collected 1 item                                                                                                                                   

tests-cuda/test_3149_complex_reducers.py .                                                                                                   [100%]

================================================================ 1 passed in 3.69s =================================================================

But reinstalling didn't bring the failure back; the test passed.

Maybe this has nothing to do with being the first time, and it's a very rare synchronization bug.

I tried running it 100 times:

for x in 0 1 2 3 4 5 6 7 8 9; do for y in 0 1 2 3 4 5 6 7 8 9; do pytest tests-cuda/test_3149_complex_reducers.py::test_block_boundary_prod_complex13; done; done

but that didn't reproduce the failure either; the test passed every time.
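
The repeated run above can also be written as a small Python helper that counts failing runs instead of relying on reading the pytest output by eye. This is only a sketch: `count_failures` is a hypothetical name, and it treats any nonzero exit code from the command as a failure.

```python
import subprocess
import sys


def count_failures(command, runs):
    """Run `command` `runs` times and count the runs with a nonzero exit code."""
    failures = 0
    for _ in range(runs):
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if completed.returncode != 0:
            failures += 1
    return failures


# Test ID copied from the failing run in this issue.
TEST_ID = (
    "tests-cuda/test_3149_complex_reducers.py"
    "::test_block_boundary_prod_complex13"
)

# Example call (needs a CUDA-capable machine and the Awkward test suite):
# count_failures([sys.executable, "-m", "pytest", TEST_ID], runs=100)
```

Passing `[sys.executable, "-m", "pytest", "tests-cuda"]` instead would repeat the whole suite, which is the configuration in which the failure was first seen.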

@ianna, if you can't reproduce it, just close this issue.

jpivarski added the labels bug (unverified) (The problem described would be a bug, but needs to be triaged) and gpu (Concerns the GPU implementation, backend = "cuda") on Aug 14, 2024
ianna self-assigned this on Aug 16, 2024
ianna added the label bug (The problem described is something that must be fixed) and removed bug (unverified) on Aug 16, 2024
@ianna
Collaborator

ianna commented Aug 16, 2024

@jpivarski - yes, I can reproduce it. It did not fail when the test file was first run on its own:

python -m pytest tests-cuda/test_3149_complex_reducers.py

but it did fail when run as part of the full test suite:

python -m pytest tests-cuda

ianna linked a pull request on Aug 16, 2024 that will close this issue
@ianna
Collaborator

ianna commented Sep 24, 2024

Fixed in #3235.

ianna closed this as completed on Sep 24, 2024