
tmBLAS (version 0.1)

Overview

Templated mixed-precision Basic Linear Algebra Subprograms (tmBLAS) is a reference BLAS implementation for mixed-precision computation, written with C++ templates on top of BLAS++. To enable mixed precision, tmBLAS decouples the data types of the operator and the operands in each BLAS routine: each operand can take a different data type, and the operation itself can be performed at a higher precision than the operands' data types.
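
As a rough illustration of this decoupling, the following minimal sketch (hypothetical names, not the library's actual interface) shows an axpy-like routine that takes the operator type Td as an extra template parameter, independent of the operand types:

    #include <cstddef>

    // Illustrative only: operand types Ta, Tx, Ty and the operator type Td
    // are independent template parameters; all arithmetic is done in Td.
    template <typename Ta, typename Tx, typename Ty, typename Td>
    void axpy_sketch(std::size_t n, Ta alpha, const Tx *x, Ty *y)
    {
        for (std::size_t i = 0; i < n; ++i) {
            // promote every operand to the operator type Td before computing
            Td t = static_cast<Td>(alpha) * static_cast<Td>(x[i])
                 + static_cast<Td>(y[i]);
            y[i] = static_cast<Ty>(t);  // round back to the output type once
        }
    }

Instantiating this as, say, axpy_sketch<float, float, float, double> keeps the data in float while carrying the multiply-add in double.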

tmBLAS is a template library that you can instantiate with any data type you want. It also ships pre-instantiated routines for half-, single-, double-, quadruple-, and octuple-precision input/output data and operations, including variants that perform the operations at one precision level higher than the data precision.

Because the data types of the operator and operands are decoupled, some routines require an additional working array to store intermediate values. Users can either prepare the working array before calling the routine or ask the routine to allocate and deallocate it dynamically, as sketched below.
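
A minimal sketch of the two conventions, with hypothetical names rather than the actual tmBLAS interface:

    #include <cstddef>

    // If the caller passes a buffer w, it is used as the working array;
    // if w is nullptr, the routine allocates and frees the array itself.
    template <typename T, typename Td>
    void routine_sketch(std::size_t n, const T *x, T *y, Td *w = nullptr)
    {
        bool allocated_here = (w == nullptr);
        if (allocated_here)
            w = new Td[n];                 // dynamic allocation on request
        for (std::size_t i = 0; i < n; ++i)
            w[i] = static_cast<Td>(x[i]);  // intermediates kept in higher precision
        for (std::size_t i = 0; i < n; ++i)
            y[i] = static_cast<T>(w[i]);   // round back on output
        if (allocated_here)
            delete[] w;                    // freed only if allocated here
    }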

Instantiated data types

Mono-precision operations, and mixed-precision operations performed at one precision level higher than the data precision, are instantiated for the following data types:

  • half-precision as half (binary16)
  • single-precision as float (binary32)
  • double-precision as double (binary64)
  • quadruple-precision as dd_real (using QD library, 106-bit mantissa) or mpfr128 (using MPFR, 113-bit mantissa, equivalent to binary128)
  • octuple-precision as qd_real (using QD library, 212-bit mantissa) or mpfr256 (using MPFR, 237-bit mantissa, equivalent to binary256)
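
For example, pairing binary64 data with operations one level higher (double-double) might look like this sketch, assuming the QD library's dd_real:

    #include <qd/dd_real.h>

    // Dot product of double (binary64) data accumulated in dd_real
    // (106-bit mantissa); illustrative, not the actual tmBLAS code.
    double dot_dd_sketch(int n, const double *x, const double *y)
    {
        dd_real acc = 0.0;
        for (int i = 0; i < n; ++i)
            acc += dd_real(x[i]) * dd_real(y[i]);  // products and sums in dd_real
        return to_double(acc);                     // one rounding back to double
    }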

Routines

LEVEL 1

  • scal: x = a * x
  • axpy: y = a * x + y
  • dot: dot product // conjugation is applied only for complex data
  • nrm2: Euclidean norm // non-unit stride mixedpmul -> madd
  • iamax: index of max abs value // no mixed-precision arithmetic involved

LEVEL 2

  • gemv: matrix vector multiply
  • symv: symmetric matrix vector multiply
  • hemv: Hermitian version of symv
  • trmv: triangular matrix vector multiply
  • trsv: solving triangular matrix problems (see the sketch after this list)
  • ger: rank-1 update A := alpha * x * y' + A
  • geru: rank-1 update A := alpha * x * y^T + A
  • syr: symmetric rank-1 update A := alpha * x * x' + A
  • her: Hermitian version of syr with real alpha
  • syr2: symmetric rank-2 update A := alpha * x * y' + alpha * y * x' + A
  • her2: Hermitian version of syr2
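
The trsv sketch referenced above: forward substitution for a lower-triangular, non-unit-diagonal, column-major matrix, with intermediate arithmetic carried in a wider type Td (layout and names are illustrative assumptions, not the actual tmBLAS code):

    #include <cstddef>

    // Solves L * x = b in place (b arrives in x, the solution leaves in x).
    template <typename T, typename Td>
    void trsv_lower_sketch(std::size_t n, const T *L, std::size_t ldl, T *x)
    {
        for (std::size_t j = 0; j < n; ++j) {
            Td xj = static_cast<Td>(x[j]) / static_cast<Td>(L[j + j * ldl]);
            x[j] = static_cast<T>(xj);
            for (std::size_t i = j + 1; i < n; ++i)  // eliminate x[j] below the diagonal
                x[i] = static_cast<T>(static_cast<Td>(x[i])
                                      - static_cast<Td>(L[i + j * ldl]) * xj);
        }
    }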

LEVEL 3

  • gemm: matrix matrix multiply (see the sketch after this list)
  • symm: symmetric matrix matrix multiply
  • hemm: Hermitian version of symm
  • syrk: symmetric rank-k update to a matrix
  • herk: Hermitian version of syrk
  • syr2k: symmetric rank-2k update to a matrix
  • her2k: Hermitian version of syr2k
  • trmm: triangular matrix matrix multiply
  • trsm: solving triangular systems with multiple right-hand sides
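
The gemm sketch referenced above: C := alpha * A * B + beta * C (column-major, no transposes), with each inner product accumulated in the wider operator type Td (hypothetical signature, not the actual tmBLAS one):

    #include <cstddef>

    template <typename T, typename Td>
    void gemm_sketch(std::size_t m, std::size_t n, std::size_t k, T alpha,
                     const T *A, std::size_t lda, const T *B, std::size_t ldb,
                     T beta, T *C, std::size_t ldc)
    {
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i) {
                Td acc = Td(0);  // higher-precision accumulator
                for (std::size_t l = 0; l < k; ++l)
                    acc += static_cast<Td>(A[i + l * lda])
                         * static_cast<Td>(B[l + j * ldb]);
                C[i + j * ldc] = static_cast<T>(static_cast<Td>(alpha) * acc
                                 + static_cast<Td>(beta)
                                 * static_cast<Td>(C[i + j * ldc]));
            }
    }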

Extension

  • gemmt: symmetric result of matrix matrix multiply
  • omatcopy: out-of-place copy of a matrix
  • csrgemv: sparse matrix vector multiply (CSR format; see the sketch below)
  • csrsymv: sparse symmetric matrix vector multiply (CSR format)
  • csrhemv: sparse Hermitian matrix vector multiply (CSR format)
  • csrgemm: sparse matrix matrix multiply (CSR format)
  • csrsymm: sparse symmetric matrix matrix multiply (CSR format)
  • csrhemm: sparse Hermitian matrix matrix multiply (CSR format)

Note: mixing real and complex data types is not supported.
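
The csrgemv sketch referenced in the list above: y := alpha * A * x + beta * y with A in CSR format; the array and parameter names (val, rowptr, colind) are assumptions for illustration, not the actual interface:

    #include <cstddef>

    template <typename T, typename Td>
    void csrgemv_sketch(std::size_t m, T alpha, const T *val, const int *rowptr,
                        const int *colind, const T *x, T beta, T *y)
    {
        for (std::size_t i = 0; i < m; ++i) {
            Td acc = Td(0);
            // accumulate row i's inner product in the wider type Td
            for (int p = rowptr[i]; p < rowptr[i + 1]; ++p)
                acc += static_cast<Td>(val[p]) * static_cast<Td>(x[colind[p]]);
            y[i] = static_cast<T>(static_cast<Td>(alpha) * acc
                                  + static_cast<Td>(beta) * static_cast<Td>(y[i]));
        }
    }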

Requirements

Downloads (tarball)

How to build

  1. Modify "Makefile.generic.inc"
  2. Place "half.hpp" and "real.hpp" in src/
  3. $ make

Test

The algorithms written as templates were validated as follows: we confirmed that the results are bit-level consistent with those of BLAS++, using floating-point inputs truncated so that no rounding errors occur in double-precision operations (see the sketch after the list below). These tests are needed because some routines change the loop order relative to the BLAS++ code, which changes the order of the floating-point operations and so breaks bit-level reproducibility on general inputs. For TRSV and TRSM, we use reduced data obtained by factorization in LAPACK, in the same way as BLAS++, and truncate it as for the other routines. Testing the accuracy of the mixed-precision routines is still under consideration.

  • test-cblas: blaspp's test, comparing with CBLAS (MKL) in single and double precision
  • test-blaspp: check bit-level consistency of results with blaspp's templated codes (included) in double, quadruple, and octuple precision using truncated floating-point input
  • test-self: tests for arithmetic operators and sparse routines
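
One common way to perform such truncation (a sketch of the general technique; the tests in this repository may implement it differently):

    #include <cmath>

    // Keep only the top keep_bits bits of the mantissa so that sums and
    // products of the test inputs remain exactly representable in double.
    double truncate_mantissa(double v, int keep_bits)
    {
        int e;
        double m = std::frexp(v, &e);         // v = m * 2^e with 0.5 <= |m| < 1
        double s = std::ldexp(1.0, keep_bits);
        m = std::trunc(m * s) / s;            // drop the low-order mantissa bits
        return std::ldexp(m, e);
    }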

Publications

  • Atsushi Suzuki, Daichi Mukunoki, Toshiyuki Imamura, tmBLAS: a Mixed Precision BLAS by C++ Template, ISC High Performance (ISC 2023), research poster session, May 2023.
  • Daichi Mukunoki, Atsushi Suzuki, Toshiyuki Imamura, Multiple and Mixed Precision BLAS with C++ Template, 5th R-CCS International Symposium, poster presentation, Feb. 6, 2023.

Contact
