Skip to content
This repository has been archived by the owner on Dec 10, 2023. It is now read-only.

Latest commit

 

History

History
103 lines (76 loc) · 3.74 KB

README.md

File metadata and controls

103 lines (76 loc) · 3.74 KB

DoubleDouble.jl

Note: This package is no longer maintained. I suggest using DoubleFloats.jl instead.

DoubleDouble.jl is a Julia package for performing extended-precision arithmetic using pairs of floating-point numbers. This is commonly known as "double-double" arithmetic, as the most common format is a pair of C-doubles (Float64 in Julia), although DoubleDouble.jl will actually work for any floating-point type. Its aim is to provide accurate results without the overhead of BigFloat types.

The core routines are based on the ideas and algorithms of Dekker (1971).

Interface

The main type is Double, with two floating-point fields: hi, storing the leading bits, and lo storing the remainder. hi is stored to full precision and rounded to nearest; hence, for any Double x, we have abs(x.lo) <= 0.5 * eps(x.hi). Although these types can be created directly, the usual interface is the Double function:

julia> using DoubleDouble

julia> x = Double(pi)
Double{Float64}(3.141592653589793,1.2246467991473532e-16)

julia> eps(x.hi)
4.440892098500626e-16

The other type defined is Single, which is simply a wrapper for a floating-point type, but whose results will be promoted to Double.

Examples

Exact products and remainders

By exploiting this property, we can compute exact products of floating point numbers.

julia> u, v = 64 * rand(), 64 * rand()
(15.59263373822506,39.07676672446341)

julia> w = Single(u) * Single(v)
Double{Float64}(609.3097112086186, -5.3107663829696295e-14)

Note that the product of two Singles is a Double: the hi element of this double is equal to the usual rounded product, and the lo element contains the exact difference between the exact value and the rounded.

This can be used to get an accurate remainder

julia> r = rem(w, 1.0)
Double{Float64}(0.309711208618584, 1.6507898617445858e-17)

julia> Float64(r)
0.309711208618584

This is much more accurate than taking ordinary products, and gives the same answer as using BigFloats:

julia> rem(u*v, 1.0)
0.3097112086186371

julia> Float64(rem(big(u) * big(v), 1.0))
0.309711208618584

However, since the DoubleDouble version is carried out using ordinary floating-point operations, it is of the order of 1000x faster than the BigFloat version.

Correct rounding with non-exact floats

If a number cannot be exactly represented by a floating-point number, it may be rounded incorrectly when used later, e.g.

julia> pi * 0.1
0.3141592653589793

julia> Float64(big(pi) * 0.1)
0.31415926535897937

We can also do this computation using Doubles (note that the promotion rules mean that only one needs to be specified):

julia> Float64(Double(pi) * 0.1)
0.31415926535897937

julia> Float64(pi * Single(0.1))
0.31415926535897937

Emulated FMA

The fused multiply-add (FMA) operation is an intrinsic floating-point operation that allows the evaluation of a * b + c, with rounding occurring only at the last step. This operation is unavailable on 32-bit x86 architecture, and available only on the most recent x86_64 chips, but can be emulated via double-double arithmetic:

julia> 0.1 * 0.1 + 0.1
0.11000000000000001

julia> Float64(big(0.1) * 0.1 + 0.1)
0.11

julia> Base.fma(a::Float64,b::Float64,c::Float64) = Float64(Single(a) * Single(b) + Single(c))
fma (generic function with 1 method)

julia> fma(0.1, 0.1, 0.1)
0.11