Open Source Go conference

Mar 18th - 19th 2019

Seminole Hard Rock Hotel and Casino
Hollywood, FL

Gustavo's IEEE-754 Brain Teaser

Author image

William Kennedy

Back in June, Gustavo Niemeyer posted the following question on his Labix.org blog:


Assume uf is an unsigned integer with 64 bits that holds the IEEE-754 representation for a binary floating point number of that size.

How can you tell if uf represents an integer number?

I can’t talk for you, but I write business applications. I just don’t have the background to quickly knock out an answer for a question like this. Fortunately Gustavo posted the logic for the answer. I thought it would be fun to better understand the question and break down his answer. I am going to work with a 32 bit number just to make everything easier.

How does the IEEE-754 standard store a floating point number in a binary format?  To start I found these two pages:

http://steve.hollasch.net/cgindex/coding/ieeefloat.html
http://class.ece.iastate.edu/arun/CprE281_F05/ieee754/ie5.html

The IEEE-754 specification represents a floating point number in base 2 scientific notation using a special binary format. If you don’t know what I mean by base 2 then look at my post on Understanding Type In Go (http://www.goinggo.net/2013/07/understanding-type-in-go.html).

Scientific notation is an efficient way of writing very large or small numbers. It works by using a decimal format with a multiplier. Here are some examples:

Base 10 NumberScientific NotationCalculationCoefficientBaseExponentMantissa
7007e+27 * 10271020
4,900,000,0004.9e+94.9 * 1094.9109.9
5362.635.36263e+35.36263 * 1035.36263103.36263
-0.003453.45e-33.45 * 10-33.4510-3.45
0.0851.36e-41.36 * 2-41.362-4.36


In normal scientific notation form there is always just one digit on the left side of the decimal point. For base 10 numbers that digit must be between 1 through 9 and for base 2 numbers that digit can only be 1.

The entire number in scientific notation is called the Coefficient. The Mantissa is all the numbers to the right of the decimal point. These terms are important so take the time to study and understand the chart above.

How we move the decimal point to that first position determines the value of the Exponent. If we have to move the decimal point to the left, the Exponent is a positive number, to the right, it is a negative number. Look at the chart above and see the Exponent value for each example.

The Base and the Exponent work together in the notation. The exponent determines the "Power Of" calculation we need to perform on the base. In the first example the number 7 is multiplied by 10 (The Base) to the power of 2 (The Exponent) to get back to the original base 10 number 700. We moved the decimal point to the left two places to convert 700 to 7.00, which made the Exponent +2 and created the notation of 7e+2.

The IEEE-754 standard does not store base 10 scientific notation numbers but base 2 scientific notation numbers. The last example in the chart above represents the base 10 number 0.085 in base 2 scientific notation. Let’s learn how that notation is calculated.

Base 10 NumberScientific NotationCalculationCoefficientBaseExponentMantissa
0.0851.36e-41.36 * 2-41.362-4.36

We need to divide the base 10 number (0.085) by some power of two so we get a 1 + Fraction value. What do I mean by a 1 + Fraction value? We need a number that looks like the Coefficient in the example, 1 + .36. The IEEE-754 standard requires that we have a "1." in the Coefficient. This allows us to only have to store the Mantissa and give us an extra bit of precision.

If we use brute force you will see when we finally get the 1 + Fraction value for 0.085:

0.085 / 2-1 = 0.17
0.085 / 2-2 = 0.34
0.085 / 2-3 = 0.68
0.085 / 2-4 = 1.36   ** We found the 1 + Fraction

An exponent of -4 gives us the 1 + Fraction value we need. Now we have everything we need to store the base 10 number 0.085 in IEEE-754 format.

Let’s look at how the bits are laid out in the IEEE-754 format.

PrecisionSignExponentFraction BitsBias
Single (32 Bits)1 [31]8 [30-23]23 [22-00]127
Double (64 Bits)1 [63]11 [62-52]52 [51-00]1023

The bits are broken into three parts. There is a bit reserved for the sign, bits for the exponent and bits that are called fraction bits. The fractions bits are where we store the mantissa as a binary fraction.

If we store the value of 0.085 using Single Precision (a 32 bit number), the bit pattern in IEEE-754 would look like this:

SignExponent (123)Fraction Bits (.36)
00111 1011010 1110 0001 0100 0111 1011

The Sign bit, the leftmost bit, determines if the number is positive or negative. If the bit is set to 1 then the number is negative else it is positive.

The next 8 bits represent the Exponent. In our example, the base 10 number 0.085 is converted to 1.36 * 2-4 using base 2 scientific notation. Therefore the value of the exponent is -4. In order to be able to represent negative numbers, there is a Bias value. The Bias value for our 32 bit representation is 127. To represent the number -4, we need to find the number, that when subtract against the Bias, gives us -4. In our case that number is 123. If you look at the bit pattern for the Exponent you will see it represents the number 123 in binary.

The remaining 23 bits are the Fraction bits. To calculate the bit pattern for the fraction bits, we need to calculate and sum binary fractions until we get the value of the Mantissa, or a value that is as close as possible. Remember, we only need to store the Mantissa because we always assume that the "1." value exists.

To understand how binary fractions are calculated, look at the following chart. Each bit position from left to right represents a fractional value:

BinaryFractionDecimalPower
0.1120.52-1
0.01140.252-2
0.001180.1252-3

We need to set the correct number of fraction bits that add up to or gets us close enough to the mantissa. This is why we can sometimes lose precision.

011110 0001 0100 0111 1011 = (0.36)
BitValueFractionDecimalTotal
24140.250.25
4161160.06250.3125
5321320.031250.34375
6641640.0156250.359375
112048120480.000488281250.35986328125
138192181920.00012207031250.3599853515625
1713107211310720.000007629394530.35999298095703
1826214412621440.000003814697270.3599967956543
1952428815242880.000001907348630.35999870300293
201048576110485760.000000953674320.35999965667725
224194304141943040.000000238418580.35999989509583
238388608183886080.000000119209290.36000001430512

You can see that setting these 12 bits get us to the value of 0.36 plus some extra fractions.

Let’s sum up what we now know about the IEEE-754 format:

1. Any base 10 number to be stored is converted to base 2 scientific notation.
2. The base 2 scientific notation value we use must follow the 1 + Fraction format.
3. There are three distinct sections in the format.
4. The Sign bit determines if the number is positive or negative.
5. The Exponent bits represent a number that needs to be subtracted against the Bias.
6. The Fraction bits represent the Mantissa using binary fraction summation.

Let’s prove that our analysis is correct about the IEEE-754 format. We should be able to store the number 0.85 and display bit patterns and values for each section that match everything we have seen.

The following code stores the number 0.085 and displays the IEEE-754 binary representation:

package main

import (
    "fmt"
    "math"
)

func main() {
    var number float32 = 0.085

    fmt.Printf("Starting Number: %f\n\n", number)

    // Float32bits returns the IEEE 754 binary representation
    bits := math.Float32bits(number)

    binary := fmt.Sprintf("%.32b", bits)

    fmt.Printf("Bit Pattern: %s | %s %s | %s %s %s %s %s %s\n\n",
        binary[0:1],
        binary[1:5], binary[5:9],
        binary[9:12], binary[12:16], binary[16:20],
        binary[20:24], binary[24:28], binary[28:32])

    bias := 127
    sign := bits & (1 << 31)
    exponentRaw := int(bits >> 23)
    exponent := exponentRaw - bias

    var mantissa float64
    for index, bit := range binary[9:32] {
        if bit == 49 {
            position := index + 1
            bitValue := math.Pow(2, float64(position))
            fractional := 1 / bitValue

            mantissa = mantissa + fractional
        }
    }

    value := (1 + mantissa) * math.Pow(2, float64(exponent))

    fmt.Printf("Sign: %d Exponent: %d (%d) Mantissa: %f Value: %f\n\n",
        sign,
        exponentRaw,
        exponent,
        mantissa,
        value)
}

When we run the program we get the following output:

Starting Number: 0.085000

Bit Pattern: 0 | 0111 1011 | 010 1110 0001 0100 0111 1011

Sign: 0 Exponent: 123 (-4) Mantissa: 0.360000 Value: 0.085000

If you compare the displayed bit pattern with our example above, you will see that it matches. Everything we learned about IEEE-754 is true.

Now we should be able to answer Gustavo’s question. How can we tell if the value being stored is an integer? Here is a function, thanks to Gustavo’s code, that tests if the IEEE-754 stored value is an integer:

func IsInt(bits uint32, bias int) {
    exponent := int(bits >> 23) - bias - 23
    coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
    intTest := (coefficient & (1 << uint32(-exponent) - 1))

    fmt.Printf("\nExponent: %d Coefficient: %d IntTest: %d\n",
        exponent,
        coefficient,
        intTest)

    if exponent < -23 {
        fmt.Printf("NOT INTEGER\n")
        return
    }

    if exponent < 0 && intTest != 0 {
        fmt.Printf("NOT INTEGER\n")
        return
    }

    fmt.Printf("INTEGER\n")
}

So how does this function work?

Let’s start with the first condition which tests if the Exponent is less than -23. If we use the number 1 as our test number, the exponent will be 127 which is the same as the Bias. This means when we subtract the Exponent against the Bias we will get zero.

Starting Number: 1.000000

Bit Pattern: 0 | 0111 1111 | 000 0000 0000 0000 0000 0000

Sign: 0 Exponent: 127 (0) Mantissa: 0.000000 Value: 1.000000


Exponent: -23 Coefficient: 8388608 IntTest: 0
INTEGER

The test function adds an extra subtraction of 23, which represents the starting bit position for the Exponent in the IEEE-754 format. That is why you see -23 for the Exponent value coming from the test function.

PrecisionSignExponentFraction BitsBias
Single (32 Bits)1 [31]8 [30-23]23 [22-00]127

This subtraction is required to help with the second test. So any value that is less than -23 must be less than one (1) and therefore not an integer.

To understand how the second test works, let’s use an integer value. This time we will set the number to 234523 in the code and run the program again.

Starting Number: 234523.000000

Bit Pattern: 0 | 1001 0000 | 110 0101 0000 0110 1100 0000

Sign: 0 Exponent: 144 (17) Mantissa: 0.789268 Value: 234523.000000


Exponent: -6 Coefficient: 15009472 IntTest: 0
INTEGER

The second test looks for two conditions to identify if the number is not an integer. This requires the use of bitwise mathematics. Let’s look at the math we are performing in the function:

    exponent := int(bits >> 23) - bias - 23
    coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
    intTest := coefficient & ((1 << uint32(-exponent)) - 1)

The coefficient calculation is adding the 1 + to the Mantissa so we have the base 2 Coefficient value.

When we look at the first part of the coefficient calculation we see the following bit patterns:

coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)

Bits:                   01001000011001010000011011000000
(1 << 23) - 1:          00000000011111111111111111111111
bits & ((1 << 23) - 1): 00000000011001010000011011000000

The first part of the coefficient calculation removes the bits for the Sign and Exponent from the entire IEEE-754 bit pattern.

The second part of the coefficient calculation adds the "1 +" into the binary bit pattern:

coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)

bits & ((1 << 23) - 1): 00000000011001010000011011000000
(1 << 23):              00000000100000000000000000000000
coefficient:            00000000111001010000011011000000

Now that the coefficient bit pattern is set, we can calculate the intTest value:

exponent := int(bits >> 23) - bias - 23
intTest := (coefficient & ((1 << uint32(-exponent)) - 1))

exponent:                     (144 - 127 - 23) = -6
1 << uint32(-exponent):       000000
(1 << uint32(-exponent)) - 1: 111111

coefficient:                 00000000111001010000011011000000
1 << uint32(-exponent)) - 1: 00000000000000000000000000111111
intTest:                     00000000000000000000000000000000

The value of the exponent we calculate in the test function is used to determine the number of bits we will compare against the Coefficient. In this case the exponent value is -6. That is calculated by subtracting the stored Exponent value (144) against the Bias (127) and then against the starting bit position for the Exponent (23). This gives us a bit pattern of 6 ones (1’s). The final operation takes those 6 bits and AND’s them against the rightmost 6 bits of the Coefficient to get the intTest value.

The second test is looking for an exponent value that is less than zero (0) and an intTest value that is NOT zero (0). This would indicate the number being stored is not an integer. In our example with 234523, the Exponent is less than zero (0), but the value of intTest is zero (0). We have an integer.

I have included the sample code in the Go playground so you can play with it.

http://play.golang.org/p/3xraud43pi

If it wasn’t for Gustavo’s code I could never have identified the solution. Here is a link to his solution:

http://bazaar.launchpad.net/~niemeyer/strepr/trunk/view/6/strepr.go#L160

Here is a copy of the code that you can copy and paste:

package main

import (
    "fmt"
    "math"
)

func main() {
    var number float32 = 234523

    fmt.Printf("Starting Number: %f\n\n", number)

    // Float32bits returns the IEEE 754 binary representation
    bits := math.Float32bits(number)

    binary := fmt.Sprintf("%.32b", bits)

    fmt.Printf("Bit Pattern: %s | %s %s | %s %s %s %s %s %s\n\n",
        binary[0:1],
        binary[1:5], binary[5:9],
        binary[9:12], binary[12:16], binary[16:20],
        binary[20:24], binary[24:28], binary[28:32])

    bias := 127
    sign := bits & (1 << 31)
    exponentRaw := int(bits >> 23)
    exponent := exponentRaw - bias

    var mantissa float64
    for index, bit := range binary[9:32] {
        if bit == 49 {
            position := index + 1
            bitValue := math.Pow(2, float64(position))
            fractional := 1 / bitValue

            mantissa = mantissa + fractional
        }
    }

    value := (1 + mantissa) * math.Pow(2, float64(exponent))

    fmt.Printf("Sign: %d Exponent: %d (%d) Mantissa: %f Value: %f\n\n",
        sign,
        exponentRaw,
        exponent,
        mantissa,
        value)

    IsInt(bits, bias)
}

func IsInt(bits uint32, bias int) {
    exponent := int(bits>>23) - bias - 23
    coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
    intTest := coefficient & ((1 << uint32(-exponent)) - 1)

    ShowBits(bits, bias, exponent)

    fmt.Printf("\nExp: %d Frac: %d IntTest: %d\n",
        exponent,
        coefficient,
        intTest)

    if exponent < -23 {
        fmt.Printf("NOT INTEGER\n")
        return
    }

    if exponent < 0 && intTest != 0 {
        fmt.Printf("NOT INTEGER\n")
        return
    }

    fmt.Printf("INTEGER\n")
}

func ShowBits(bits uint32, bias int, exponent int) {
    value := (1 << 23) - 1
    value2 := (bits & ((1 << 23) - 1))
    value3 := (1 << 23)
    coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)

    fmt.Printf("Bits:\t\t\t%.32b\n", bits)
    fmt.Printf("(1 << 23) - 1:\t\t%.32b\n", value)
    fmt.Printf("bits & ((1 << 23) - 1):\t\t%.32b\n\n", value2)

    fmt.Printf("bits & ((1 << 23) - 1):\t\t%.32b\n", value2)
    fmt.Printf("(1 << 23):\t\t\t%.32b\n", value3)
    fmt.Printf("coefficient:\t\t\t%.32b\n\n", coefficient)

    value5 := 1 << uint32(-exponent)
    value6 := (1 << uint32(-exponent)) - 1
    inTest := (coefficient & ((1 << uint32(-exponent)) - 1))

    fmt.Printf("1 << uint32(-exponent):\t\t%.32b\n", value5)
    fmt.Printf("(1 << uint32(-exponent)) - 1:\t%.32b\n\n", value6)

    fmt.Printf("coefficient:\t\t\t%.32b\n", coefficient)
    fmt.Printf("(1 << uint32(-exponent)) - 1:\t%.32b\n", value6)
    fmt.Printf("intTest:\t\t\t%.32b\n", inTest)
}

I want to thank Gustavo for posting the question and giving me something to really fight through to understand.

Go Training

We have taught Go to thousands of developers all around the world since 2014. There is no other company that has been doing it longer and our material has proven to help jump start developers 6 to 12 months ahead of their knowledge of Go. We know what knowledge developers need in order to be productive and efficient when writing software in Go.

Our classes are perfect for both experienced and beginning engineers. We start every class from the beginning and get very detailed about the internals, mechanics, specification, guidelines, best practices and design philosophies. We cover a lot about "if performance matters" with a focus on mechanical sympathy, data oriented design, decoupling and writing production software.


Capital One
Cisco
Visa
Teradata
Red Ventures

Interested in Ultimate Go Corporate Training and special pricing?

Let’s Talk Corporate Training!

Ultimate Go Programming LiveLessons

Ultimate Go Programming LiveLessons provides an intensive, comprehensive, and idiomatic view of the Go programming language. This course focuses on both the specification and implementation of the language, including topics ranging from language syntax, design, and guidelines to concurrency, testing, and profiling. This class is perfect for anyone who wants a jump-start in learning Go or wants a more thorough understanding of the language and its internals.