-
Notifications
You must be signed in to change notification settings - Fork 5
/
README
executable file
·164 lines (150 loc) · 4.58 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
#!/bin/bash
##############################################################
#
# HPL-AI Mixed-Precision Benchmark v2.3d -- April 23, 2021
#
##############################################################
#
# Check out <https://wu-kan.cn/_posts/2021-03-14-HPL-AI/> for
# the full document and the latest information.
#
##############################################################
#
# A quick start to build and run a few tests via spack:
#
# git clone https://github.com/SYSU-SCC/sysu-scc-spack-repo
# spack repo add --scope=site sysu-scc-spack-repo
# spack install sysu-scc-spack-repo.hpl-ai
# spack load hpl-ai
# cp `spack location --install-dir hpl-ai`/bin/HPL.dat HPL.dat
# mpirun -n 4 xhpl_ai
#
##############################################################
#
# build from source:
#
# First the following softwares are required on your system:
# C&C++ compiler, autoconf, autoconf-archive, automake, mpi,
# blas, blaspp
#
# You can easily install and load the requirements via spack
# <https://github.com/spack/spack/releases/tag/v0.16.1>.
#
# I just tested with the followings, while other versions or
# libraries might work as well:
spack unload -a
spack load [email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected]
spack load [email protected]%[email protected]
spack load blaspp%[email protected]+openmp \
^openblas%[email protected] threads=openmp
# Then boostrap the configuration files by typing:
autoreconf -ivf
# The user is given the opportunity to compile the software
# with some specific compile options:
#
# CPPFLAGS=" -DHPLAI_T_AFLOAT=double "
#
# CPPFLAGS=" -DHPLAI_DEVICE_BLASPP_GEMM "
#
# CPPFLAGS=" -DHPLAI_DEVICE_BLASPP_TRSM "
#
# CPPFLAGS=" -DHPLAI_GEN_BLASPP_TRSM "
# (generic trsm had not been implemented in [email protected]
#
# CPPFLAGS=" -DHPL_COPY_L "
#
# CPPFLAGS=" -DHPL_CALL_CBLAS "
#
# CPPFLAGS=" -DHPL_CALL_VSIPL "
# (deperated
#
# CPPFLAGS=" -DHPL_DETAILED_TIMING "
# (deperated
#
# To configure the build and prepare for compilation run:
if true; then
./configure
else
# Note: to use device blaspp routines, you may need to
# enable device support of blaspp:
#
# spack load blaspp%[email protected]+openmp+cuda
#
# and then:
#
./configure \
LIBS=" -lcudart -lcublas " \
CPPFLAGS=" -DBLASPP_WITH_CUBLAS \
-DHPLAI_DEVICE_BLASPP_GEMM \
-DHPLAI_DEVICE_BLASPP_TRSM "
fi
# Then compile:
make -j
# The configuration file must be called HPL.dat.
#
# You can copy the configuration file from the original HPL,
# or create a configuration file anew.
#
# Most of the performance parameters can be tuned.
if true; then
cat >HPL.dat <<EOF
HPLinpack and HPL-AI benchmark input file
National Supercomputer Center in Guangzhou, Sun Yat-sen University
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
16384 131072 Ns
2 # of NBs
192 384 4096 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
2 Ps
2 Qs
16.0 threshold
1 # of panel fact
2 1 0 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
2 8 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
2 1 0 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 2 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
1 SWAP (0=bin-exch,1=long,2=mix)
192 swapping threshold
1 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
16 memory alignment in HPLAI_T_AFLOAT (> 0)
EOF
else
cp testing/ptest/HPL.dat HPL.dat
fi
# Finally run and compare with the original hpl-2.3:
OMP_NUM_THREADS=2 $(which mpirun) -n 4 testing/xhpl
OMP_NUM_THREADS=2 $(which mpirun) -n 4 testing/xhpl_ai
# If you download HPL-AI via git, you can clean the builds by:
git clean -d -f -q
##############################################################
#
# The newest version of HPL-AI is available at
# <https://github.com/wu-kan/HPL-AI/releases>
#
##############################################################
#
# Bugs are tracked at
# <https://github.com/wu-kan/HPL-AI/issues>
#
##############################################################
#
# The souce code of HPL-AI is licensed under `COPYING`.
#
# The souce code of hpl-2.3 is licensed under `COPYRIGHT`.
#
##############################################################