The exponential function and specifically the natural exponential function exp is used often in machine learning, simulated annealing and many calculations within the sciences. Part of the reason why the natural exponential is so useful is that it is quite unique in that it is its own derivative. Put differently, the slope of the function is equal to its height. A fast non-table-lookup approximation of the natural exponential function would be very useful to accelerate many applications of exp.

In 1999 Nicol N. Schraudolph published a note in Neural Computation 11, 853–862 (1999) on “A Fast, Compact Approximation of the Exponential Function”. The note presents an approximation to the exponential that exploits the format of the IEEE-754 double precision floating point format.

The Algorithm

IEEE-754 floating point numbers are represented in the form . is the exponent bias. The 64-bit double precision format has 52 mantissa (m) bits, 11 exponent (x) bits, one sign (s) bit, and exponents range from −1022 to 1023. The figure below shows the bit format.

The bits may also be manipulated by accessing the memory of a 64-bit float as two 32-bit integers. One can, for example, raise the exponent of a number by directly setting the x bits in the high 32-bits of the 64-bit float. So setting the high integer to sets the floating point value to for integer .

What is cool about the exponent in IEEE-754 is that the fractional part of an exponent may be allowed to flow over to the mantissa. The fractional part then naturally linearly ‘interpolates’ between and for example.

Now the final pieces of the solution are that for and the constant can be pre-factored into the integer mantissa scale. See in the code below that is used instead of . Note that the same method can similarly be used to approximate other exponential functions.

Some Code and Performance Results.

The code for the double precision floating point version of the algorithm is shown below. The ‘c’ value has been set to zero so that fast_exp(0.0) = 1.0. Schraudolph’s note has more details on how to set this value.

    //! Approximate exp by Schraudolph, 1999 - double precision floating point version.
    ALWAYS_INLINE double fast_exp(const double x) noexcept {
        // Based on Schraudolph 1999, A Fast, Compact Approximation of the Exponential Function.
        // - See the improved fast_exp_64 implementation below!
        // - Valid for x in approx range (-700, 700).
        union{double d_; int32_t i_[2];} uid; //This could be moved to the thread scope.
        //BBBD(sizeof(uid)!=8)
        uid.i_[0] = 0;
        uid.i_[1] = int32_t(double((1<<20) / log(2.0)) * x + double((1<<20) * 1023 - 0)); //c=0 for 1.0 at zero.
        return uid.d_;
    }    

The performance of the fast_exp function was measured on an Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz (turbo boosted to 3.30GHz). The normal double precision exponential from math.h takes around 6ns per call while the fast exponential takes around 0.5ns or approximately two clock cycles on this platform.

The images below compares two sigmoid curves at different scales. The green curve is generated with the fast exponential. As explained in the paper, the global fit is reasonably good. The implicit linear interpolation of the algorithm is evident when one looks closer while the staircase effect is evident when looking at the sixth decimal scale.

It would be interesting to instead of 32-bit integers use a 64-bit integer so that all the mantissa bits may be used for the fractional exponent. This should reduce the staircase effect and 64-bit integers are quite fast on modern CPUs.

[Edit: I tried the above and it works pretty well. The below adaptation uses a 64-bit integer. The performance and accuracy is equivalent and the staircase effect is no longer present :-) ]

    //! Approximate exp adapted from Schraudolph, 1999 - double precision floating point version.
    ALWAYS_INLINE double fast_exp_64(const double x) noexcept {
        // Based on Schraudolph 1999, A Fast, Compact Approximation of the Exponential Function.
        // - Adapted to use 64-bit integer; reduces staircase effect.
        // - Valid for x in approx range (-700, 700).
        union{double d_; int64_t i_;} uid; //This could be moved to the thread scope.
        //BBBD(sizeof(uid)!=8)
        uid.i_ = int64_t(double((int64_t(1) << 52) / log(2.0)) * x + double((int64_t(1) << 52) * 1023 - 0)); //c=0 for 1.0 at zero.
        return uid.d_;
    }

Summary

Schraudolph’s note presents a detailed analysis of the accuracy of the approximation and is quite an interesting read. In summary, the accuracy is reasonable for most applications and my tests show that it is about 10x quicker than the standard C++ math library’s version.

Given the reduced accuracy of the approximation it is likely a good idea to test the impact of the approximation in your solution. The basic recommendation is to start with exp and then to apply fast_exp where it makes sense. The full source with execution timing is available in my Bits-O-Cpp GitHub repo.