Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Questions about merge-path algo! #11

Open
dwwcqu opened this issue Oct 27, 2023 · 1 comment
Open

Questions about merge-path algo! #11

dwwcqu opened this issue Oct 27, 2023 · 1 comment

Comments

@dwwcqu
Copy link

dwwcqu commented Oct 27, 2023

When I ported the merge-spmm from cuda-8.0 to cuda11+ and successful compiled, I got the memory fatal access errors when running gspmm --mode=mergepath XXX.mtx.
The followings are compute-sanitizer tool's partial outputs:

========= Invalid __shared__ read of size 4 bytes
=========     at 0x12a0 in /merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:50:void mgpu::BinarySearchIt<(mgpu::MgpuBounds)1, int, const int *, int, mgpu::less<int>>(T3, int &, int &, T4, int, T5)
=========     by thread (0,0,0) in block (2,0,0)
=========     Address 0xf58 is out of bounds
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasearch.cuh:85:int mgpu::BinarySearch<(mgpu::MgpuBounds)1, int, const int *, mgpu::less<int>>(T3, int, T2, T4) [0x1280]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:92:int mgpu::DeviceExpandCsrRows<(int)128, (int)1>(int, int, const int *, int, int, int *, int *) [0xf80]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/device/ctasegreduce.cuh:166:mgpu::SegReduceTerms mgpu::DeviceSegReducePrepareSpmm<(int)128, (int)1>(const int *, int *, int, int, int, int, bool, int *, int *) [0xf80]
=========     Device Frame:/merge-spmm/ext/moderngpu/include/kernels/spmvcsr.cuh:203:void mgpu::KernelSpmmCsr<(int)32, (int)128, (bool)0, (bool)1, float *, int *, int *, const int *, float *, float *, float, mgpu::multiplies<float>, mgpu::plus<float>>(T5, T6, int, T7, T8, T9, T7, T10, T11 *, T11, T12, T13, int) [0xf20]

The source codes I have modified includes cusparse API, warp shuffle functions from __shfl* to __shfl_sync*, __any to __any_sync.

If you have any suggestions, thanks for helps!

@dwwcqu
Copy link
Author

dwwcqu commented Oct 27, 2023

The split row algo and cusparse API tests work OK and passed the datasets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant