hip mfma tests #246

Open

wants to merge 92 commits into develop

Commits (92)
1594a50
adding initial infrastructure for MFMA test
CRobeck Apr 26, 2022
e3d5aac
working hip mfma with matrix core builtins
CRobeck Apr 27, 2022
332067b
fixing incorrect default problem size
CRobeck Apr 27, 2022
8aa020b
fixing flop calc
CRobeck Apr 27, 2022
bc9770a
fixing some variable names
CRobeck Apr 27, 2022
774d742
setting up problem size infrastructure correctly
CRobeck Apr 28, 2022
83700d6
fixing results array storage
CRobeck Apr 28, 2022
1bbd473
finish multi matrix support
CRobeck Apr 28, 2022
5591849
updating algorithm description
CRobeck Apr 29, 2022
4653c23
adding a few more guard rails for fma builtins. kernel still only imp…
CRobeck Apr 29, 2022
bd47d35
adding some ifdefs to call a separate mfma kernel if hardware support…
CRobeck Apr 29, 2022
fde4b65
adding reference basic mat-mat kernel for comparison when no mfma ins…
May 3, 2022
47925bb
Merge branch 'develop' into test/mfma
May 3, 2022
7174382
adding problem size support to base hip kernel
May 3, 2022
f5fe861
adding problem size support to base hip kernel
CRobeck May 3, 2022
23866ad
Merge branch 'test/mfma' of https://github.com/llnl/RAJAPerf into tes…
CRobeck May 3, 2022
5b7bd12
Merge branch 'test/mfma' of https://github.com/llnl/RAJAPerf into tes…
CRobeck May 3, 2022
5ad1472
Merge branch 'test/mfma' of https://github.com/llnl/RAJAPerf into tes…
CRobeck May 3, 2022
67c02c4
Merge branch 'test/mfma' of https://github.com/llnl/RAJAPerf into tes…
CRobeck May 3, 2022
f3b4488
Merge branch 'test/mfma' of https://github.com/llnl/RAJAPerf into tes…
CRobeck May 3, 2022
799e17c
updating top level cmake file for new mfma test
CRobeck May 3, 2022
451dee2
updating cmake file to add mfma test to basic list
CRobeck May 3, 2022
c015117
adding mfma test to raja perf suite infrastructure
CRobeck May 3, 2022
6a6ccd7
adding mfma base and header files
CRobeck May 3, 2022
43c1c6e
adding mfma seq variant skeleton
CRobeck May 3, 2022
2e75981
adding mfma omp and omp offload variant skeleton structure
CRobeck May 3, 2022
5181219
adding mfma cuda variant skeleton
CRobeck May 3, 2022
ef451af
add initial set of HIP mfma variant with builtin matrix core instructions
CRobeck May 3, 2022
7eab287
Merge branch 'test/mfma' of https://github.com/llnl/RAJAPerf into tes…
CRobeck May 3, 2022
0bd36f4
add mat_fused_mul_add cuda variant kernel
CRobeck May 5, 2022
3c906a0
filling in seq MAT_FUSED_MUL_ADD seq variant
CRobeck May 6, 2022
464380d
cleaning up MAT_FUSED_MUL_ADD macros a bit
CRobeck May 6, 2022
ee3c695
making MAT_FUSED_MUL_ADD cuda and hip variants more consistent
CRobeck May 6, 2022
24279eb
fixing bug in MAT_FUSED_MUL_ADD Cuda kernel
CRobeck May 6, 2022
75c7a07
making MAT_FUSED_MUL_ADD naming consistent
CRobeck May 6, 2022
75875b9
updating mat fused body and adding raja lam variant
CRobeck May 9, 2022
2a46e30
add data set up to mat_fused_lam
CRobeck May 9, 2022
a4328c8
moving MAT_FUSED_MUL_ADD array init into macro
CRobeck May 9, 2022
be2430f
fixing missing ;
CRobeck May 9, 2022
1ff6f93
finish filling in RAJA_HIP MAT_FUSED_MUL_ADD variant
CRobeck May 10, 2022
9dcd3d8
finish filling in RAJA_CUDA MAT_FUSED_MUL_ADD variant
CRobeck May 10, 2022
66d9756
filling in MAT_FUSED_MUL_ADD OMP variants
CRobeck May 10, 2022
73d745a
add MAT_FUSED_MUL_ADD_DATA_INIT
CRobeck May 10, 2022
9c63d3b
add MAT_FUSED_MUL_ADD_DATA_INIT
CRobeck May 10, 2022
ca7b810
fixing some unused vars
CRobeck May 16, 2022
02633ea
fix indexing issue
CRobeck May 16, 2022
f1fb1ab
fix incorrect order of raja_hip segments
CRobeck May 16, 2022
faf046f
fix incorrect order of raja_cuda segments
CRobeck May 16, 2022
73bd85e
update data types
CRobeck May 16, 2022
4f7dac3
fix indexing issue
CRobeck May 19, 2022
8e119d7
fix indexing issue
CRobeck May 19, 2022
cde8323
fix merge conflict
CRobeck May 19, 2022
e7ef2b4
fix one more indexing issue
CRobeck May 19, 2022
78e2590
fix indexing issue
CRobeck May 19, 2022
feadc64
adding tunings for builtin version of mfma hip kernel
CRobeck May 19, 2022
900671a
fix one more indexing issue
CRobeck May 20, 2022
162a855
clean up some comments and unused vars
CRobeck May 20, 2022
fbf2820
hopefully fixing final indexing issue
CRobeck May 20, 2022
e1ebcdd
removing some unused vars
CRobeck May 20, 2022
f02da32
cleaning up block/grid defs
CRobeck May 20, 2022
59f6766
cleaning up block/grid defs further
CRobeck May 20, 2022
89a8c00
fix spacing
CRobeck May 20, 2022
1156a66
fix spacing
CRobeck May 20, 2022
fd9d80a
spacing
CRobeck May 20, 2022
c7b35a9
spacing
CRobeck May 20, 2022
2d591b2
moving function to unnamed namespace
CRobeck May 20, 2022
30cfca4
Merge branch 'test/mfma' of https://github.com/LLNL/RAJAPerf into tes…
CRobeck May 20, 2022
3ca73ff
Merge branch 'develop' into test/mfma
CRobeck May 20, 2022
18622fb
fixing undefined var
CRobeck May 20, 2022
23b9277
fixing omp looping issue for gcc
CRobeck May 20, 2022
5f46c0e
fixing omp looping issue for gcc
CRobeck May 20, 2022
8f0e512
fixing one more omp looping issue for gcc
CRobeck May 20, 2022
8f07996
omp requires canonical loop forms
CRobeck May 20, 2022
c94586f
update default problem size
CRobeck May 21, 2022
43e277b
update spacing
CRobeck Jun 6, 2022
44c36b4
fix spacing
CRobeck Jun 6, 2022
8e3fbf6
formatting clean up
CRobeck Jun 6, 2022
bd2d44c
move getHipArch into common header file as a static free function
CRobeck Jun 7, 2022
49084ef
update out loop for raja hip/cuda variant
CRobeck Jun 7, 2022
5169e6e
update unroll pragma to RAJAPERF_UNROLL
CRobeck Jun 7, 2022
a400d93
fix RAJAPERF_UNROLL def
CRobeck Jun 7, 2022
5bc407f
fix type in cuda variant
CRobeck Jun 7, 2022
3186df1
Update MAT_FUSED_MUL_ADD.hpp
CRobeck Jun 20, 2022
9eb3ce4
cleaning up some variable naming and create a N_Elem var
CRobeck Jul 14, 2022
f8c1f2c
add raja_hiperrchk header
CRobeck Jul 14, 2022
1347434
update lone N_Elem var
CRobeck Jul 14, 2022
cde5c95
update lambda forall
CRobeck Jul 14, 2022
a4ee50d
Merge branch 'test/mfma' of github.com:LLNL/RAJAPerf into test/mfma
CRobeck Jul 14, 2022
5b93123
roll back lambda_hip_forall naming
CRobeck Jul 15, 2022
39fbf15
get rid of some unused static function warning
CRobeck Jul 15, 2022
831e61c
make hipArch more consistent with device naming
CRobeck Jul 15, 2022
4e6a7b3
fixing var naming issue
CRobeck Jul 15, 2022
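
Several commits above refer to the AMD matrix-core ("MFMA") builtins that the HIP variant targets. For orientation, the sketch below shows the single-precision 16x16x4 builtin, following the lane layout AMD documents for v_mfma_f32_16x16x4f32 on gfx908/gfx90a: one 64-lane wavefront, launched as dim3(16, 4), computes a 16x4 by 4x16 product. This is an illustrative sketch only, not the kernel added by this PR.

#include <hip/hip_runtime.h>

typedef float float4_t __attribute__((ext_vector_type(4)));

// threadIdx.x in [0,16) selects the m (or n) index; threadIdx.y in [0,4)
// selects the k index. Each lane feeds one element of A and one of B.
__global__ void sgemm_16x16x4(const float* A, const float* B, float* D)
{
  float4_t d = {0.0f, 0.0f, 0.0f, 0.0f};

  float a = A[threadIdx.x * 4 + threadIdx.y];   // A[m][k], 16x4 row-major
  float b = B[threadIdx.y * 16 + threadIdx.x];  // B[k][n], 4x16 row-major
  d = __builtin_amdgcn_mfma_f32_16x16x4f32(a, b, d, 0, 0, 0);

  // After the builtin, each lane holds four rows of one column of the
  // 16x16 result.
  for (int i = 0; i < 4; ++i) {
    D[(4 * threadIdx.y + i) * 16 + threadIdx.x] = d[i];
  }
}

Commits bd47d35 and fde4b65 gate this kind of code behind architecture ifdefs and fall back to a plain mat-mat kernel where matrix cores are absent.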
3 changes: 3 additions & 0 deletions src/CMakeLists.txt
@@ -97,6 +97,9 @@ blt_add_executable(
basic/INIT_VIEW1D_OFFSET.cpp
basic/INIT_VIEW1D_OFFSET-Seq.cpp
basic/INIT_VIEW1D_OFFSET-OMPTarget.cpp
basic/MAT_FUSED_MUL_ADD.cpp
basic/MAT_FUSED_MUL_ADD-Seq.cpp
basic/MAT_FUSED_MUL_ADD-OMPTarget.cpp
basic/MAT_MAT_SHARED.cpp
basic/MAT_MAT_SHARED-Seq.cpp
basic/MAT_MAT_SHARED-OMPTarget.cpp
6 changes: 6 additions & 0 deletions src/basic/CMakeLists.txt
@@ -56,6 +56,12 @@ blt_add_library(
INIT_VIEW1D_OFFSET-Cuda.cpp
INIT_VIEW1D_OFFSET-OMP.cpp
INIT_VIEW1D_OFFSET-OMPTarget.cpp
MAT_FUSED_MUL_ADD.cpp
MAT_FUSED_MUL_ADD-Seq.cpp
MAT_FUSED_MUL_ADD-Hip.cpp
MAT_FUSED_MUL_ADD-Cuda.cpp
MAT_FUSED_MUL_ADD-OMP.cpp
MAT_FUSED_MUL_ADD-OMPTarget.cpp
MAT_MAT_SHARED.cpp
MAT_MAT_SHARED-Seq.cpp
MAT_MAT_SHARED-Hip.cpp
155 changes: 155 additions & 0 deletions src/basic/MAT_FUSED_MUL_ADD-Cuda.cpp
@@ -0,0 +1,155 @@
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//
// Copyright (c) 2017-20, Lawrence Livermore National Security, LLC
// and RAJA Performance Suite project contributors.
// See the RAJAPerf/LICENSE file for details.
//
// SPDX-License-Identifier: (BSD-3-Clause)
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//

#include "MAT_FUSED_MUL_ADD.hpp"

#include "RAJA/RAJA.hpp"

#if defined(RAJA_ENABLE_CUDA)

#include "common/CudaDataUtils.hpp"

#include <iostream>

namespace rajaperf {
namespace basic {

#define MAT_FUSED_MUL_ADD_DATA_SETUP_CUDA \
const Index_type N = m_N; \
allocAndInitCudaDeviceData(A, m_A, N); \
allocAndInitCudaDeviceData(B, m_B, N); \
allocAndInitCudaDeviceData(D, m_D, N);


#define MAT_FUSED_MUL_ADD_DATA_TEARDOWN_CUDA \
getCudaDeviceData(m_A, A, N); \
getCudaDeviceData(m_B, B, N); \
getCudaDeviceData(m_D, D, N); \
deallocCudaDeviceData(A); \
deallocCudaDeviceData(B); \
deallocCudaDeviceData(D);


template < Index_type block_size >
__launch_bounds__(block_size)
__global__ void mat_fused_mul_add(const Real_ptr A, const Real_ptr B, Real_ptr D,
Index_type N){
constexpr Index_type Ne = 16;
const Index_type N_Elem = N/(Ne*Ne);
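// Loop over the N_Elem independent 16x16 matrices; each thread contributes
// one (row, col) entry per matrix.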
for(Index_type ii = 0; ii != N_Elem; ++ii){
Index_type col = threadIdx.x + blockIdx.x * blockDim.x;
Index_type row = threadIdx.y + blockIdx.y * blockDim.y;
MAT_FUSED_MUL_ADD_BODY;
}
}
template < Index_type block_size, typename Lambda >
__launch_bounds__(block_size)
__global__ void mat_fused_lam(Index_type N, Lambda body)
{
constexpr Index_type Ne = 16;
const Index_type N_Elem = N/(Ne*Ne);
for(Index_type ii = 0; ii != N_Elem; ++ii){
Index_type col = threadIdx.x + blockIdx.x * blockDim.x;
Index_type row = threadIdx.y + blockIdx.y * blockDim.y;
body(ii,col,row);
}
}
template < size_t block_size >
void MAT_FUSED_MUL_ADD::runCudaVariantImpl(VariantID vid)
{
const Index_type run_reps = getRunReps();
const Index_type N = m_N;
constexpr Index_type Ne = 16;
const Index_type N_Elem = N/(Ne*Ne);

constexpr Index_type block_x = gpu_block_size::sqrt(block_size);
constexpr Index_type block_y = gpu_block_size::sqrt(block_size);
dim3 blockDim(block_x, block_y);
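// The grid covers a single Ne x Ne tile; the kernels loop over the
// N_Elem matrices internally.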
dim3 gridDim(static_cast<size_t>(RAJA_DIVIDE_CEILING_INT(Ne, block_x)),
static_cast<size_t>(RAJA_DIVIDE_CEILING_INT(Ne, block_y)),
static_cast<size_t>(1));

MAT_FUSED_MUL_ADD_DATA_SETUP;

MAT_FUSED_MUL_ADD_DATA_INIT;

if (vid == Base_CUDA) {

MAT_FUSED_MUL_ADD_DATA_SETUP_CUDA;

startTimer();
for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
mat_fused_mul_add<block_size><<<gridDim, blockDim>>>(A, B, D, N);
}
stopTimer();

MAT_FUSED_MUL_ADD_DATA_TEARDOWN_CUDA;

} else if (vid == Lambda_CUDA) {

MAT_FUSED_MUL_ADD_DATA_SETUP_CUDA;

startTimer();
for (RepIndex_type irep = 0; irep < run_reps; ++irep) {

auto mat_fused_lambda =
[=] __device__ (Index_type ii, Index_type col, Index_type row) {
MAT_FUSED_MUL_ADD_BODY;
};
mat_fused_lam<block_size, decltype(mat_fused_lambda)>
<<<gridDim, blockDim>>>(N, mat_fused_lambda);
}
stopTimer();

MAT_FUSED_MUL_ADD_DATA_TEARDOWN_CUDA;

} else if (vid == RAJA_CUDA) {

MAT_FUSED_MUL_ADD_DATA_SETUP_CUDA;

RAJA::RangeSegment row_range(0, Ne);
RAJA::RangeSegment col_range(0, Ne);
RAJA::RangeSegment ii_range(0, N_Elem);
using EXEC_POL =
RAJA::KernelPolicy<
RAJA::statement::CudaKernel<
RAJA::statement::For<2, RAJA::cuda_block_z_loop,
RAJA::statement::Tile<1, RAJA::tile_fixed<block_size>, RAJA::cuda_block_y_direct,
RAJA::statement::Tile<0, RAJA::tile_fixed<block_size>, RAJA::cuda_block_x_direct,
RAJA::statement::For<1, RAJA::cuda_thread_y_direct,
RAJA::statement::For<0, RAJA::cuda_thread_x_direct,
RAJA::statement::Lambda<0>
>
>
>
>
>
>
>;
startTimer();
for (RepIndex_type irep = 0; irep < run_reps; ++irep) {
RAJA::kernel<EXEC_POL>(RAJA::make_tuple(row_range, col_range, ii_range),
[=] RAJA_DEVICE (Index_type row, Index_type col, Index_type ii) {
MAT_FUSED_MUL_ADD_BODY;
});
}
stopTimer();

MAT_FUSED_MUL_ADD_DATA_TEARDOWN_CUDA;


} else {
getCout() << "\n MAT_FUSED_MUL_ADD : Unknown Cuda variant id = " << vid
<< std::endl;
}
}

RAJAPERF_GPU_BLOCK_SIZE_TUNING_DEFINE_BIOLERPLATE(MAT_FUSED_MUL_ADD, Cuda)

} // end namespace basic
} // end namespace rajaperf

#endif // RAJA_ENABLE_CUDA
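
Note: the kernels in this file expand MAT_FUSED_MUL_ADD_BODY, which the diff does not show. Judging from how the kernels index (ii selects one of the N_Elem 16x16 matrices; each thread owns one (row, col) entry), the body is presumably a per-entry multiply-accumulate along the following lines. This is a reconstruction for the reader, not the macro from MAT_FUSED_MUL_ADD.hpp:

// Assumed shape of the body macro: D = A * B for matrix ii, one output
// element per (row, col) thread; Ne is the fixed 16x16 tile dimension.
#define MAT_FUSED_MUL_ADD_BODY \
Real_type dot = 0.0; \
for (Index_type k = 0; k < Ne; ++k) { \
dot += A[ii*Ne*Ne + row*Ne + k] * B[ii*Ne*Ne + k*Ne + col]; \
} \
D[ii*Ne*Ne + row*Ne + col] = dot;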