
Multihead arch cutlass int8 qkv average #2035

Draft — wants to merge 89 commits into base: master

Changes from all commits (89 commits)
aa865a2
WIP attention body changes
ankan-ban Mar 22, 2022
d6728dc
more updates to match training code
ankan-ban Mar 23, 2022
49a8143
fix few crashes
ankan-ban Mar 23, 2022
7d96c63
use the right encoder block for body!
ankan-ban Mar 23, 2022
f5fe737
fix output of AttentionBody
ankan-ban Mar 23, 2022
72cbc13
move pos encoding table to common header file
ankan-ban Mar 25, 2022
af2db3d
add hack to match training side bug
ankan-ban Mar 28, 2022
3c2639c
fix build error
ankan-ban Mar 28, 2022
410c4f9
remove hack for plies ply plane training side bug
ankan-ban Mar 29, 2022
b6c8e43
Fix attention body/head size
ankan-ban Sep 12, 2022
e0f94f0
Merge branch 'master' into attentionbody-cuda
almaudoh-1 Dec 14, 2022
844186b
Add input gating kernel.
almaudoh-1 Dec 15, 2022
780d47f
Completed input gating
almaudoh-1 Dec 15, 2022
f25f0bd
Add input gating, smolgen and sqrrelu.
almaudoh-1 Dec 24, 2022
e1cd35e
Fixed unstable softmax implementation.
almaudoh-1 Dec 24, 2022
b5d6930
Remove debug log
almaudoh-1 Dec 24, 2022
e9cda40
Tilp's fix for smolgen gemms.
almaudoh-1 Dec 25, 2022
d7a8adf
Merge branch 'master' into attentionbody-cuda
almaudoh-1 Dec 25, 2022
e98ef8d
Remove debug code
almaudoh-1 Dec 26, 2022
b5afc19
Add tilps perf improvement on existing attention qkv matmuls.
almaudoh-1 Dec 27, 2022
9f9304b
Fix cudnn build failures.
almaudoh-1 Dec 27, 2022
edbd8a8
Add tilps perf patch for fused smolgen weights add / softmax
almaudoh-1 Dec 28, 2022
f1f485a
Merge branch 'master' into attentionbody-cuda
almaudoh-1 Dec 28, 2022
d207abe
Fix errors in non-attentionbody nets.
almaudoh-1 Jan 8, 2023
70b0521
Add multistream support. Allow new attentionbody nets.
almaudoh-1 Jan 9, 2023
ba83ef8
Merge pull request #28 from almaudoh/attentionbody-cuda
ankan-ban Jan 14, 2023
32cf3a4
Merge branch 'LeelaChessZero:master' into attention-opts
ankan-ban Mar 4, 2023
eb184f4
add 8 elements per thread layernorm
ankan-ban Mar 4, 2023
6e0161a
Try fused MHA from cutlass
ankan-ban Mar 4, 2023
c62cf2d
first (somewhat) working INT8 attempt
ankan-ban Apr 24, 2023
c2e6349
clean up some unused/test code
ankan-ban Apr 24, 2023
a2eb7d2
try int8 for more layers
ankan-ban Apr 25, 2023
451cc30
per-column quantization for outputs
ankan-ban May 25, 2023
b9e6711
integrate changes from master for bigger layer norms
ankan-ban May 25, 2023
787090b
Common changes for new multiple head architecture.
almaudoh Sep 7, 2023
ef5f8e3
Merge branch 'master' into multihead-arch-common
almaudoh Sep 7, 2023
9cbc072
Fix typo bug.
almaudoh Sep 7, 2023
bf5d82a
Fix circleci failures.
almaudoh Sep 8, 2023
4157c4f
Remove no-single-branch
almaudoh Sep 8, 2023
9d7b0db
Implement new input encoding architecture.
almaudoh-1 Sep 12, 2023
f17621d
Merge remote-tracking branch 'upstream/master' into multihead-arch-cuda
almaudoh-1 Sep 12, 2023
9746ea4
Fix layer norms and add new multiple heads.
almaudoh-1 Sep 13, 2023
f720573
Add new method to get value error from neural net.
almaudoh-1 Sep 22, 2023
2cdb015
Merge branch 'multihead-arch-common' into multihead-arch-cuda
almaudoh-1 Sep 22, 2023
916b3fa
Refactor value head into separate class. Add short term value error h…
almaudoh-1 Sep 25, 2023
ad5390b
Add error to NodeToProcess.
almaudoh-1 Sep 25, 2023
5c7421a
Merge branch 'multihead-arch-common' into multihead-arch-cuda
almaudoh-1 Sep 25, 2023
bd34a6f
Fix bug in value head bias add.
almaudoh-1 Sep 29, 2023
bfbe14c
Support for multihead architecture in protobuf.
almaudoh-1 Oct 5, 2023
4cdfab3
Add backward compatibility adjustments to old nets to work in multihe…
almaudoh Oct 8, 2023
135c151
Merge branch 'multihead-arch-common' into multihead-arch-cuda
almaudoh-1 Oct 8, 2023
3c0ded9
Add backward compatibility.
almaudoh-1 Oct 8, 2023
c71d885
Merge branch 'attention-opts' into multihead-arch-cuda-cutlass-fmha
almaudoh-1 Jan 6, 2024
4333f15
Fix conflict resolution artifacts.
almaudoh-1 Jan 6, 2024
288a337
Fix omissions.
almaudoh-1 Jan 6, 2024
742567e
Merge branch 'master' into multihead-arch-cuda
almaudoh-1 Mar 1, 2024
316b85c
Remove short-term error value accessor.
almaudoh-1 Mar 2, 2024
284eb27
Remove old artifacts from network_legacy
almaudoh-1 Mar 2, 2024
d446d12
Fix backend to use MultiHeadWeights struct.
almaudoh-1 Mar 2, 2024
c38b568
File formatting.
almaudoh-1 Mar 2, 2024
eb26621
Fix layernorm epsilon for older attentionbody nets.
almaudoh-1 Mar 2, 2024
8a2009d
Minor comment fixes.
almaudoh-1 Mar 2, 2024
4ffee57
Change 'optimistic_st' key to 'optimistic' in policy head map.
almaudoh-1 Mar 3, 2024
ee81336
Switch cudnn to cuda for multiheadformat.
almaudoh-1 Mar 3, 2024
b14ad8b
Merge branch 'multihead-arch-cuda' into multihead-arch-cuda-cutlass-fmha
almaudoh-1 Mar 6, 2024
9a7bba2
Merge remote-tracking branch 'upstream/master' into multihead-arch-cu…
almaudoh-1 Mar 6, 2024
1275584
Merge remote-tracking branch 'upstream/master' into multihead-arch-cu…
almaudoh-1 Mar 7, 2024
e364f2b
Merge remote-tracking branch 'ankan/int8-expts' into multihead-cutlas…
almaudoh-1 Mar 9, 2024
58ae0ae
Fix buffer naming, fix source to build.
almaudoh-1 Mar 11, 2024
d0a8536
Remove value error head inference.
almaudoh-1 Mar 7, 2024
0a70093
Merge branch 'master' into multihead-arch-cutlass-int8
almaudoh Mar 23, 2024
405046a
Fix conflict resolution artefacts.
almaudoh Mar 24, 2024
451fbf3
WIP. Reworked int8 to use scaling factors stored in weights. Added ke…
almaudoh Apr 22, 2024
2682eef
Add additional scaling factor for matmul accumulator. Rename variable…
almaudoh May 1, 2024
0f61c2f
Remove debug outputs.
almaudoh May 3, 2024
71ec58b
Add quantization to embedding layer FFN. Add weights clipping for nor…
almaudoh May 5, 2024
01c24e5
Update gemms to provide int8->fp32 for correct results. Remove old an…
almaudoh May 8, 2024
09fcd56
Change gemms to int32 - wip
almaudoh May 12, 2024
868b9ac
Fix bugs in int8 implementation - ith extra (ssuper) pair of eyes fro…
almaudoh May 13, 2024
30bb640
Fix bugs in int8 implementation - with extra (super) pair of eyes fro…
almaudoh May 13, 2024
3201c0b
Merge branch 'multihead-arch-cutlass-int8' of https://github.com/alma…
almaudoh May 13, 2024
28912e6
Merge remote-tracking branch 'upstream/master' into multihead-arch-cu…
almaudoh May 13, 2024
e82761a
Fix promotion to double for clipMatrix.
almaudoh May 13, 2024
4e3a650
Fix scratch size and change epiloge compute to int32.
almaudoh May 13, 2024
dddc978
Fuse FFN2 quantize to FFN2 dequantize+bias-add. 2% speedup.
almaudoh May 13, 2024
adad545
Implement int8 in all gemms except QKV. Fuse dequant-bias + add+quant…
almaudoh May 16, 2024
870ebba
Remove epsilon from quantize
almaudoh May 16, 2024
80cc673
Split QKV to allow use of int8->int8 cutlass matmul.
almaudoh Jun 2, 2024
7fceeeb
Implement fused QKV with averaged scaling factors.
almaudoh Jun 2, 2024
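The commit titles reference int8 quantization with scaling factors stored in the weights, and the final commit fuses Q, K and V behind a single averaged scaling factor. The sketch below illustrates that general idea only — symmetric per-tensor int8 quantization with one shared (averaged) scale for the three fused matrices. It is not the PR's CUDA implementation; all names and shapes here are illustrative assumptions.

```python
import numpy as np

def int8_scale(x, qmax=127.0):
    """Symmetric per-tensor scale: maps max |x| onto the int8 range."""
    return max(np.abs(x).max() / qmax, 1e-12)

def quantize(x, scale):
    # Round to the nearest int8 step and clamp to the symmetric range.
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Fusing Q, K and V into one int8 matmul forces a single shared scale;
# one simple compromise (as the last commit title suggests) is to
# average the three individual scales.
rng = np.random.default_rng(0)
q_w, k_w, v_w = (rng.standard_normal((64, 64)).astype(np.float32)
                 for _ in range(3))
avg_scale = float(np.mean([int8_scale(w) for w in (q_w, k_w, v_w)]))

q_q = quantize(q_w, avg_scale)
# Reconstruction error: rounding plus any clipping caused by sharing
# a scale that may be smaller than this matrix's own ideal scale.
err = float(np.abs(dequantize(q_q, avg_scale) - q_w).max())
```

The trade-off is visible in `err`: a shared scale smaller than a matrix's own per-tensor scale clips its largest values, which is the accuracy cost paid for fusing the three GEMMs into one.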
13 changes: 8 additions & 5 deletions build.cmd
@@ -2,7 +2,7 @@
 setlocal
 
 rem 1. Set the following for the options you want to build.
-set CUDNN=true
+set CUDNN=false
 set CUDA=true
 set DX12=false
 set OPENCL=false
@@ -11,10 +11,12 @@ set DNNL=false
 set OPENBLAS=false
 set EIGEN=false
 set TEST=false
+set CUTLASS=true
 
 rem 2. Edit the paths for the build dependencies.
-set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0
+set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0
 set CUDNN_PATH=%CUDA_PATH%
+set CUTLASS_INCLUDE_PATH=C:\dev\cutlass-2.11.0\include
 set OPENBLAS_PATH=C:\OpenBLAS
 set MKL_PATH=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries\windows\mkl
 set DNNL_PATH=C:\dnnl_win_1.1.1_cpu_vcomp
@@ -34,13 +36,13 @@ if exist "C:\Program Files\Microsoft Visual Studio\2022" (
   where /q cl
   if errorlevel 1 call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
   set backend=vs2022
-) else if exist "C:\Program Files (x86)\Microsoft Visual Studio\2019" (
+) else if exist "D:\Program Files (x86)\Microsoft Visual Studio\2019" (
   where /q cl
-  if errorlevel 1 call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
+  if errorlevel 1 call "D:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
   set backend=vs2019
 ) else (
   where /q cl
-  if errorlevel 1 call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
+  if errorlevel 1 call "D:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvarsall.bat" amd64
   set backend=vs2017
 )
@@ -63,6 +65,7 @@ meson build --backend %backend% --buildtype release -Ddx=%DX12% -Dcudnn=%CUDNN%
     -Dmkl_include="%MKL_PATH%\include" -Dmkl_libdirs="%MKL_PATH%\lib\intel64" -Ddnnl_dir="%DNNL_PATH%" ^
     -Dopencl_libdirs="%OPENCL_LIB_PATH%" -Dopencl_include="%OPENCL_INCLUDE_PATH%" ^
     -Dopenblas_include="%OPENBLAS_PATH%\include" -Dopenblas_libdirs="%OPENBLAS_PATH%\lib" ^
+    -Dcutlass_include="%CUTLASS_INCLUDE_PATH%" -Dcutlass="%CUTLASS%" ^
     -Ddefault_library=static
 
 if errorlevel 1 exit /b
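build.cmd only covers Windows. As a sketch of the equivalent Linux configure step using the new options this PR introduces (the CUTLASS checkout path is a placeholder assumption, not taken from the PR):

```shell
# Configure lc0 with the CUDA backend and the new CUTLASS options enabled.
# /opt/cutlass/include is a hypothetical path to a CUTLASS checkout.
meson setup build --buildtype=release \
    -Dplain_cuda=true -Dcudnn=false \
    -Dcutlass=true -Dcutlass_include=/opt/cutlass/include
ninja -C build
```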
2 changes: 1 addition & 1 deletion libs/lczero-common
Submodule lczero-common updated 1 file
+21 −0 proto/net.proto
14 changes: 14 additions & 0 deletions meson.build
@@ -485,6 +485,11 @@ if get_option('build_backends')
     cuda_arguments += ['-ccbin=' + get_option('nvcc_ccbin')]
   endif
   cuda_cc = get_option('cc_cuda') # Unfortunately option cuda_cc is reserved.
+  if get_option('cutlass')
+    add_project_arguments('-DUSE_CUTLASS', language : 'cpp')
+    cuda_arguments += ['-DUSE_CUTLASS']
+    cuda_arguments += ['-I', get_option('cutlass_include')]
+  endif
   nvcc_extra_args = []
   if cuda_cc != ''
     nvcc_extra_args = ['-arch=compute_' + cuda_cc, '-code=sm_' + cuda_cc]
@@ -522,6 +527,15 @@ if get_option('build_backends')
     depend_files: 'src/neural/cuda/winograd_helper.inc',
     command : [nvcc, nvcc_extra_args, cuda_arguments]
   )
+
+  if get_option('cutlass')
+    nvcc_cutlass_args = ['-arch=compute_80', '-code=sm_80']
+    files += custom_target('cuda cutlass code',
+      input : 'src/neural/cuda/cutlass_kernels.cu',
+      output : outputname,
+      command : [nvcc, nvcc_cutlass_args, cuda_arguments]
+    )
+  endif
 
   files += custom_target('cuda fp16 code',
     input : 'src/neural/cuda/fp16_kernels.cu',
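The cutlass custom_target above hard-codes SM 8.0 (Ampere) instead of the user-selected `cc_cuda`, consistent with the option description's "Ampere+" restriction. A rough standalone equivalent of what that target runs (include path and output file name are assumptions for illustration):

```shell
# Approximate nvcc invocation matching the 'cuda cutlass code' target above;
# /opt/cutlass/include and the .o name are hypothetical placeholders.
nvcc -c src/neural/cuda/cutlass_kernels.cu \
     -arch=compute_80 -code=sm_80 \
     -DUSE_CUTLASS -I /opt/cutlass/include \
     -o cutlass_kernels.o
```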
10 changes: 10 additions & 0 deletions meson_options.txt
@@ -43,6 +43,11 @@ option('cudnn_include',
        value: ['/opt/cuda/include/', '/usr/local/cuda/include/', '/usr/lib/cuda/include/'],
        description: 'Paths to cudnn include directory')
 
+option('cutlass_include',
+       type: 'string',
+       value: '/usr',
+       description: 'Paths to cutlass include directory')
+
 option('build_backends',
        type: 'boolean',
        value: true,
@@ -78,6 +83,11 @@ option('plain_cuda',
        value: true,
        description: 'Enable CUDA backend')
 
+option('cutlass',
+       type: 'boolean',
+       value: false,
+       description: 'Enable cutlass lib for cuda backend. Only supports Ampere+ right now')
+
 option('opencl',
        type: 'boolean',
        value: true,