
tmBLAS (version 0.1)

Overview

Templated mixed-precision Basic Linear Algebra Subprograms (tmBLAS) is a reference BLAS implementation for mixed-precision computation, written with C++ templates on top of BLAS++. To enable mixed precision, tmBLAS decouples the data types of the operator and the operands in each BLAS routine: each operand can take a different data type, and the operation itself can be performed at a higher precision than the operands' data types.
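
As a rough illustration of this decoupling, the following minimal sketch (hypothetical names, not the library's actual interface) shows an axpy-like routine that takes the operator type Td as an extra template parameter, independent of the operand types:

    #include <cstddef>

    // Illustrative only: operand types Ta, Tx, Ty and the operator type Td
    // are independent template parameters; all arithmetic is done in Td.
    template <typename Ta, typename Tx, typename Ty, typename Td>
    void axpy_sketch(std::size_t n, Ta alpha, const Tx *x, Ty *y)
    {
        for (std::size_t i = 0; i < n; ++i) {
            // promote every operand to the operator type Td before computing
            Td t = static_cast<Td>(alpha) * static_cast<Td>(x[i])
                 + static_cast<Td>(y[i]);
            y[i] = static_cast<Ty>(t);  // round back to the output type once
        }
    }

Instantiating this as, say, axpy_sketch<float, float, float, double> keeps the data in float while carrying the multiply-add in double.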

tmBLAS is a template library that you can instantiate with any data type you want. It also ships pre-instantiated routines for half-, single-, double-, quadruple-, and octuple-precision input/output data and operations, including variants that perform the operations at one precision level higher than the data precision.

Because the data types of the operator and operands are decoupled, some routines require an additional working array to store intermediate values. Users can either prepare the working array before calling the routine or ask the routine to allocate and deallocate it dynamically, as sketched below.
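
A minimal sketch of the two conventions, with hypothetical names rather than the actual tmBLAS interface:

    #include <cstddef>

    // If the caller passes a buffer w, it is used as the working array;
    // if w is nullptr, the routine allocates and frees the array itself.
    template <typename T, typename Td>
    void routine_sketch(std::size_t n, const T *x, T *y, Td *w = nullptr)
    {
        bool allocated_here = (w == nullptr);
        if (allocated_here)
            w = new Td[n];                 // dynamic allocation on request
        for (std::size_t i = 0; i < n; ++i)
            w[i] = static_cast<Td>(x[i]);  // intermediates kept in higher precision
        for (std::size_t i = 0; i < n; ++i)
            y[i] = static_cast<T>(w[i]);   // round back on output
        if (allocated_here)
            delete[] w;                    // freed only if allocated here
    }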

Instantiated data types

Mono-precision operations, and mixed-precision operations performed at one precision level higher than the data precision, are instantiated for the following data types:

  • half-precision as half (binary16)
  • single-precision as float (binary32)
  • double-precision as double (binary64)
  • quadruple-precision as dd_real (using QD library, 106-bit mantissa) or mpfr128 (using MPFR, 113-bit mantissa, equivalent to binary128)
  • octuple-precision as qd_real (using QD library, 212-bit mantissa) or mpfr256 (using MPFR, 237-bit mantissa, equivalent to binary256)
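
For example, pairing binary64 data with operations one level higher (double-double) might look like this sketch, assuming the QD library's dd_real:

    #include <qd/dd_real.h>

    // Dot product of double (binary64) data accumulated in dd_real
    // (106-bit mantissa); illustrative, not the actual tmBLAS code.
    double dot_dd_sketch(int n, const double *x, const double *y)
    {
        dd_real acc = 0.0;
        for (int i = 0; i < n; ++i)
            acc += dd_real(x[i]) * dd_real(y[i]);  // products and sums in dd_real
        return to_double(acc);                     // one rounding back to double
    }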

Routines

LEVEL 1

  • scal: x = a * x
  • axpy: y = a * x + y
  • dot: dot product // conjugation is applied only for complex data
  • nrm2: Euclidean norm // non-unit stride mixedpmul -> madd
  • iamax: index of max abs value // no mixed-precision arithmetic involved

LEVEL 2

  • gemv: matrix vector multiply
  • symv: symmetric matrix vector multiply
  • hemv: Hermitian version of symv
  • trmv: triangular matrix vector multiply
  • trsv: solving triangular matrix problems (see the sketch after this list)
  • ger: rank-1 update A := alpha * x * y' + A
  • geru: rank-1 update A := alpha * x * y^T + A
  • syr: symmetric rank-1 update A := alpha * x * x' + A
  • her: Hermitian version of syr with real alpha
  • syr2: symmetric rank-2 update A := alpha * x * y' + alpha * y * x' + A
  • her2: Hermitian version of syr2
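
The trsv sketch referenced above: forward substitution for a lower-triangular, non-unit-diagonal, column-major matrix, with intermediate arithmetic carried in a wider type Td (layout and names are illustrative assumptions, not the actual tmBLAS code):

    #include <cstddef>

    // Solves L * x = b in place (b arrives in x, the solution leaves in x).
    template <typename T, typename Td>
    void trsv_lower_sketch(std::size_t n, const T *L, std::size_t ldl, T *x)
    {
        for (std::size_t j = 0; j < n; ++j) {
            Td xj = static_cast<Td>(x[j]) / static_cast<Td>(L[j + j * ldl]);
            x[j] = static_cast<T>(xj);
            for (std::size_t i = j + 1; i < n; ++i)  // eliminate x[j] below the diagonal
                x[i] = static_cast<T>(static_cast<Td>(x[i])
                                      - static_cast<Td>(L[i + j * ldl]) * xj);
        }
    }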

LEVEL 3

  • gemm: matrix matrix multiply (see the sketch after this list)
  • symm: symmetric matrix matrix multiply
  • hemm: Hermitian version of symm
  • syrk: symmetric rank-k update to a matrix
  • herk: Hermitian version of syrk
  • syr2k: symmetric rank-2k update to a matrix
  • her2k: Hermitian version of syr2k
  • trmm: triangular matrix matrix multiply
  • trsm: solving triangular systems with multiple right-hand sides
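
The gemm sketch referenced above: C := alpha * A * B + beta * C (column-major, no transposes), with each inner product accumulated in the wider operator type Td (hypothetical signature, not the actual tmBLAS one):

    #include <cstddef>

    template <typename T, typename Td>
    void gemm_sketch(std::size_t m, std::size_t n, std::size_t k, T alpha,
                     const T *A, std::size_t lda, const T *B, std::size_t ldb,
                     T beta, T *C, std::size_t ldc)
    {
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i) {
                Td acc = Td(0);  // higher-precision accumulator
                for (std::size_t l = 0; l < k; ++l)
                    acc += static_cast<Td>(A[i + l * lda])
                         * static_cast<Td>(B[l + j * ldb]);
                C[i + j * ldc] = static_cast<T>(static_cast<Td>(alpha) * acc
                                 + static_cast<Td>(beta)
                                 * static_cast<Td>(C[i + j * ldc]));
            }
    }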

Extension

  • gemmt: symmetric result of matrix matrix multiply
  • omatcopy: out-of-place copy of a matrix
  • csrgemv: sparse matrix vector multiply (CSR format; see the sketch below)
  • csrsymv: sparse symmetric matrix vector multiply (CSR format)
  • csrhemv: sparse Hermitian matrix vector multiply (CSR format)
  • csrgemm: sparse matrix matrix multiply (CSR format)
  • csrsymm: sparse symmetric matrix matrix multiply (CSR format)
  • csrhemm: sparse Hermitian matrix matrix multiply (CSR format)

Note: mixing real and complex data types is not supported.
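
The csrgemv sketch referenced in the list above: y := alpha * A * x + beta * y with A in CSR format; the array and parameter names (val, rowptr, colind) are assumptions for illustration, not the actual interface:

    #include <cstddef>

    template <typename T, typename Td>
    void csrgemv_sketch(std::size_t m, T alpha, const T *val, const int *rowptr,
                        const int *colind, const T *x, T beta, T *y)
    {
        for (std::size_t i = 0; i < m; ++i) {
            Td acc = Td(0);
            // accumulate row i's inner product in the wider type Td
            for (int p = rowptr[i]; p < rowptr[i + 1]; ++p)
                acc += static_cast<Td>(val[p]) * static_cast<Td>(x[colind[p]]);
            y[i] = static_cast<T>(static_cast<Td>(alpha) * acc
                                  + static_cast<Td>(beta) * static_cast<Td>(y[i]));
        }
    }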

Requirements

Downloads (tarball)

How to build

  1. Modify "Makefile.generic.inc"
  2. Place "half.hpp" and "real.hpp" in src/
  3. $ make

Test

The algorithms written as templates were validated as follows: we confirmed that the results are bit-level consistent with those of BLAS++, using floating-point inputs truncated so that no rounding errors occur in double-precision operations (see the sketch after the list below). These tests are needed because some routines change the loop order relative to the BLAS++ code, which changes the order of the floating-point operations and so breaks bit-level reproducibility on general inputs. For TRSV and TRSM, we use reduced data obtained by factorization in LAPACK, in the same way as BLAS++, and truncate it as for the other routines. Testing the accuracy of the mixed-precision routines is still under consideration.

  • test-cblas: blaspp's test, comparing with CBLAS (MKL) in single and double precision
  • test-blaspp: check bit-level consistency of results with blaspp's templated codes (included) in double, quadruple, and octuple precision using truncated floating-point input
  • test-self: tests for arithmetic operators and sparse routines
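
One common way to perform such truncation (a sketch of the general technique; the tests in this repository may implement it differently):

    #include <cmath>

    // Keep only the top keep_bits bits of the mantissa so that sums and
    // products of the test inputs remain exactly representable in double.
    double truncate_mantissa(double v, int keep_bits)
    {
        int e;
        double m = std::frexp(v, &e);         // v = m * 2^e with 0.5 <= |m| < 1
        double s = std::ldexp(1.0, keep_bits);
        m = std::trunc(m * s) / s;            // drop the low-order mantissa bits
        return std::ldexp(m, e);
    }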

Publications

  • Atsushi Suzuki, Daichi Mukunoki, Toshiyuki Imamura, tmBLAS: a Mixed Precision BLAS by C++ Template, ISC High Performance (ISC 2023), research poster session, May 2023.
  • Daichi Mukunoki, Atsushi Suzuki, Toshiyuki Imamura, Multiple and Mixed Precision BLAS with C++ Template, 5th R-CCS International Symposium, poster presentation, Feb. 6, 2023.

Contact
