BINARY FLOATS WITH HIDDEN BIT


DIGITAL  PDP-11, VAX, ALPHA
PIXEL - SHADING  2.0
EDUCATIONALS
ZUSE  Z1, Z3
IEEE  754


by:   J.H.M. Bonten


First date of publication: 05 october 2006
Large table refurbished: 01 & 09 october 2007
DEC's gap near zero elucidated: 26 december 2007
Improvement of whole text and tables: 8 march 2009
       bit patterns given in examples
       exceptions listed
       legenda improved
Cray removed from table: 6 may 2009 + 22 october 2009

Machines and applications:

   Digital PDP-11 pedigree, viz. the PDP-11, VAX and ALPHA.
   Most modern machines, which use the IEEE-754 formats.
   Modern video-graphics boards using floats for color intensity.
   Educational formats for teaching these floating-point notations.
   Konrad Zuse's Z1 and Z3, built in 1936 - 1941 in Germany.



LAY-OUT OF THE FLOATING-POINT NUMBERS

There are two general classes of hidden-bit floating-point formats in common use: one defined by Digital Equipment Corporation (= DEC) and the other defined by IEEE.  A third class, defined by Konrad Zuse, is no longer in use and is therefore discussed separately.

The floating-point formats defined by Digital Equipment Corporation for their PDP-11 pedigree are very similar to those described by IEEE in standard 754, but not identical. In fact the IEEE definition was derived from the Digital definition by the microprocessor company Intel. The definition for the fairly small color-defining floats in some modern video graphics cards matches the IEEE 754 definition. The tiny floats for educational purposes match it too.

The numbers consist of a sign-bit, an exponent and a mantissa. The exponent is written in excess notation such that its value will never be negative. So the integer value represented by the bit pattern of the exponent is greater than the exponent value it stands for:
           expon_int = expon_value + excess_bias
The zero exponent integer coincides with the minimum exponent value. Exceptions and special numbers are linked to one value of this exponent.

The mantissa is one bit longer than its sequence of bits written in the number. This extra bit is called the 'hidden bit'. This construction is possible only when the base of the exponent is 2, which is the case both in the PDP-11 machines and in the IEEE standard. The hidden bit is always assumed to be 1 when the exponent integer is greater than zero. When the exponent integer equals zero (i.e. the exponent has its minimum value) then this bit is assumed to be 1 or 0, sometimes depending on the 'visible bits' in the mantissa. The visible part of the mantissa is always assumed to be a fractional part.

The hidden bit delivers an extra bit place for free, although at the cost of processing time. It makes normalization of the mantissa obligatory always, even when the arithmetics do not require it. Normalization means that the first non-zero bit is put on the leftmost place in the mantissa field. Here it is obligatory since it must coincide with the hidden bit. If it does not coincide, then the whole contents of the mantissa has to be shifted to the left or right until it does. The value of the exponent has to be updated accordingly. This shifting and updating takes precious processor time.
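
The shifting and exponent update described above can be sketched in Python. This is only an illustration under stated assumptions: the unnormalized mantissa is held as a positive integer, the target is an IEEE-style 1.visman mantissa with m visible bits, and bits shifted out on the right are simply truncated (real hardware may round instead). The name `normalize` and its parameters are hypothetical and not taken from any of the machines discussed.

```python
def normalize(mantissa: int, exponent: int, m: int) -> tuple[int, int]:
    """Normalize the value mantissa * 2**exponent.

    Returns (visman, exponent_value) such that the same value is written as
    (1 + visman / 2**m) * 2**exponent_value, with the leading 1 hidden.
    """
    if mantissa <= 0:
        raise ValueError("only positive, non-zero mantissas can be normalized")
    # Number of places the leading 1 must move to reach the hidden-bit place.
    shift = mantissa.bit_length() - (m + 1)
    if shift > 0:
        mantissa >>= shift        # shift right; the low bits are truncated
    else:
        mantissa <<= -shift       # shift left; no information is lost
    exponent += shift             # keep the represented value unchanged
    visman = mantissa - (1 << m)  # drop the leading 1: it becomes the hidden bit
    return visman, exponent + m


print(normalize(3, 0, 4))
```

With m = 4 the value 3 becomes visman 1000 and exponent value 1, i.e. 1.1000 (binary) times 2^1.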



DIFFERENCES IN NOTATION BETWEEN DEC AND IEEE

Despite the similarities there are important differences between the IEEE notation and the Digital notation of floating-point numbers. (The integer formats of both are identical.)

First, the hidden bit is given a different position. IEEE places this bit before the binary point and Digital places it immediately after that point. According to IEEE the visible part of the mantissa ('visman') starts immediately after the point, whilst according to Digital it starts behind the hidden bit. Thus the value range of the total mantissa is:

    IEEE:      1.0 =<  (1.visman)  < 2.0
    Digital:   0.5 =< (0.1 visman) < 1.0

Second, the excess biases in the notation of the exponent differ. That of Digital is one greater than that of IEEE. For an exponent of n bits Digital uses the excess value 2^(n-1), whilst IEEE uses the value 2^(n-1)-1. For example Digital would give a bias of 2048 to an exponent of 12 bits, whilst IEEE would give it a bias of 2047.

Both effects together mean that, for exactly the same bit pattern, the value IEEE assigns is four times the value Digital assigns: a factor 2 comes from the mantissa and a factor 2 from the bias. (This holds only for regular numbers, which form the bulk of all numbers used, not for special numbers.)
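
The factor of four can be checked numerically. The sketch below assumes the 32-bit layout shared by IEEE single and DEC F (sign, 8-bit exponent, 23-bit visman) written as one integer with the sign bit on the left; the byte order in which a real PDP-11 or VAX stores the word in memory is deliberately ignored. The function names are illustrative only.

```python
import struct


def ieee_single_value(bits: int) -> float:
    """Value of a 32-bit pattern read as an IEEE single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def dec_f_value(bits: int) -> float:
    """Value of the same logical pattern read by the DEC rule (regular numbers)."""
    sign = -1.0 if bits >> 31 else 1.0
    exb = (bits >> 23) & 0xFF
    visman = bits & 0x7FFFFF
    if exb == 0:
        raise ValueError("exb = 0 is not a regular DEC number")
    mantissa = 0.5 + visman / 2**24          # 0.1_visman
    return sign * mantissa * 2.0 ** (exb - 128)


pattern = 0x40490FDB                         # a regular pattern, exb = 128
print(ieee_single_value(pattern), dec_f_value(pattern))
print(ieee_single_value(pattern) / dec_f_value(pattern))
```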

It is perfectly possible to redefine the value of the Digital mantissa as 1.visman and thus make it equal to that of the IEEE mantissa. Then its range also goes from 1.0 to 2.0. To compensate for this redefinition the bias of the Digital exponent must be increased by 1.  For an exponent of 12 bits it becomes 2049.  This redefinition makes the abovementioned factor 4 clearer. It would also simplify some of the discussions in the rest of this document. Nevertheless it is not done, in order to stay close to the company's manuals.

The third difference between the definitions by Digital and IEEE concerns the exponent value to which the special numbers and errors are linked. For this Digital uses the value 0 and IEEE uses the maximum exponent 1111...   The bit pattern of the mantissa determines the type of the special number and its numerical or nonnumerical value.

Fourth, IEEE uses unnormalized mantissas for extremely small values, whilst Digital does not use such mantissas at all.

WARNING:

Although the normalized values are what you mostly see when your program is working with real data, proper handling of the rest of the values (denorms, error-values, infinities) is vitally important; otherwise you'll get all sorts of horrible results that are difficult to understand and usually impossible to fix.



EXPONENT VALUES AND BIT PATTERNS

Next the general definitions of both formats will be given. In these definitions five values related to the exponent size are important: the excess bias, and the minimum and maximum of both the bit-pattern value and the actual exponent value. In the table they are shown for an exponent of n bits, together with examples of 8, 11 and 15 bits.

Because of the excess bias one must discern between the integer value of the bit-pattern of an exponent and the actual exponent value this pattern stands for. The maximum actual value shows clearly its difference with the maximum bit-pattern value.

     DEFINER     VALUE      EXAMPLE 8 AND 11 AND 15 BITS

  Excess bias of exponent for the bulk of numbers
     Digital:   2^(n-1)           128    1024    16384
     IEEE for normalized mantissas:
                2^(n-1) - 1       127    1023    16383
     IEEE for unnormalized mantissas:
                2^(n-1) - 2       126    1022    16382

  Minimum bit-pattern value of exponent for bulk of numbers
     Digital:   1 (0 is not bulk)   1       1        1
     IEEE for normalized mantissas:
                1                   1       1        1
     IEEE for unnormalized mantissas:
                0                   0       0        0

  Minimum actual value of exponent for the bulk of numbers
     Digital:   1-2^(n-1)        -127   -1023   -16383
     IEEE:      2-2^(n-1)        -126   -1022   -16382

  Maximum bit-pattern value of exponent for bulk of numbers
     Digital:   2^n - 1           255    2047    32767
     IEEE:      2^n - 2           254    2046    32766

  Maximum actual value of exponent for the bulk of numbers
     Digital:   2^(n-1)-1         127    1023    16383
     IEEE:      2^(n-1)-1         127    1023    16383
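
The rows of this table follow directly from the exponent width n. A small Python sketch that reproduces them; the function name and the dictionary keys are chosen here for illustration only.

```python
def exponent_limits(n: int) -> dict:
    """Bias and exponent limits for the bulk of numbers, for an n-bit exponent."""
    half = 2 ** (n - 1)
    return {
        "Digital": {
            "bias": half,
            "min_exb": 1,
            "min_exp": 1 - half,
            "max_exb": 2**n - 1,
            "max_exp": half - 1,
        },
        "IEEE": {
            "bias": half - 1,                # normalized mantissas
            "bias_unnormalized": half - 2,   # unnormalized mantissas
            "min_exb": 1,
            "min_exp": 2 - half,
            "max_exb": 2**n - 2,
            "max_exp": half - 1,
        },
    }


for n in (8, 11, 15):
    print(n, exponent_limits(n))
```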



DEFINITIONS FOR THE FLOATS

In the official definitions the expression  (-1)^signbit  is used. Here this is abbreviated to S, for readability. The integer value represented by the bit pattern of the exponent is called 'exb'.

The definition of a Digital floating-point word is:

   CONDITION:                 FLOAT VALUE:
exb=0 and S_visman = +0    0.0    [clean zero]
exb=0 and S_visman > +0    0.0    [dirty zero]
exb=0 and S_visman =< -0   Undefined value
0 < exb =< 2^n-1           S * 2^(exb-2^(n-1)) * (0.1_visman)
                                       [with hidden bit]

The definition of an IEEE floating-point word is:

   CONDITION:                 FLOAT VALUE:
exb=2^n-1 and visman<>0    Not a Number [NaN]
exb=2^n-1 and visman=0     S * Infinity [signed infinity]
exb=0   and visman=0       S * 0.0      [signed zero]
exb=0   and visman<>0      S * 2^(2-2^(n-1)) * (0.visman)
                                       [unnormalized]
0 < exb =< 2^n-2           S * 2^(exb+1-2^(n-1)) * (1.visman)
                                       [with hidden bit]

In the case of an 8-bit exponent one gets:

  255 replaces 2^n-1
  128 replaces 2^(n-1)
  127 replaces 2^(n-1)-1
  126 replaces 2^(n-1)-2

The definition of a Digital floating-point word becomes:

     CONDITION:                 FLOAT VALUE:
  exb=0 and S_visman = +0    0.0    [clean zero]
  exb=0 and S_visman > +0    0.0    [dirty zero]
  exb=0 and S_visman =< -0   Undefined value
  0 < exb =< 255             S * 2^(exb-128) * (0.1_visman)
                                       [with hidden bit]

The definition of an IEEE floating-point word becomes:

     CONDITION:              FLOAT VALUE:
  exb=255 and visman<>0   Not a Number [NaN]
  exb=255 and visman=0    S * Infinity [signed infinity]
  exb=0   and visman=0    S * 0.0      [signed zero]
  exb=0   and visman<>0   S * 2^(-126) * (0.visman)  [unnorm.]
  0 < exb =< 254          S * 2^(exb-127) * (1.visman)
                                       [with hidden bit]

Exb = integer value of exponent bit-pattern.
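
The two definitions can be written out as a pair of decoders. The following Python sketch takes the three fields as separate integers and uses exact fractions, so that no rounding hides the result. The function names are hypothetical, and the non-numerical results are returned as plain strings purely for illustration.

```python
from fractions import Fraction


def decode_ieee(sign: int, exb: int, visman: int, n: int, m: int):
    """IEEE rule for an n-bit exponent and m visible mantissa bits."""
    s = -1 if sign else 1
    half = 2 ** (n - 1)
    if exb == 2**n - 1:                       # special values
        if visman:
            return "NaN"
        return "-Infinity" if sign else "+Infinity"
    frac = Fraction(visman, 2**m)
    if exb == 0:                              # signed zero or unnormalized
        return s * frac * Fraction(2) ** (2 - half)
    return s * (1 + frac) * Fraction(2) ** (exb + 1 - half)


def decode_dec(sign: int, exb: int, visman: int, n: int, m: int):
    """Digital rule for an n-bit exponent and m visible mantissa bits."""
    half = 2 ** (n - 1)
    if exb == 0:                              # zero (clean or dirty) or Undefined
        return "Undefined" if sign else Fraction(0)
    s = -1 if sign else 1
    mantissa = Fraction(1, 2) + Fraction(visman, 2 ** (m + 1))   # 0.1_visman
    return s * mantissa * Fraction(2) ** (exb - half)


print(decode_ieee(0, 127, 0, 8, 23), decode_dec(0, 129, 0, 8, 23))
```

Both calls in the last line describe the value 1: IEEE with exb = 127 and Digital with exb = 129.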

The definition PS2.0 for the small floats for pixel shading in some video-graphics cards obeys the IEEE rules. The tiny educational floats for help in understanding the floating-point bit-patterns obey them too.



MANTISSA AND ACCURACIES

Accuracies

In storing and displaying numeric values two important types of accuracy are in use: the absolute and the relative. The absolute accuracy of a number is the absolute value of the change in the value of that number when the least significant bit (= last bit) of its mantissa is toggled (= inverted).

The relative accuracy is the ratio between the absolute accuracy and the actual numeric value. It is the absolute accuracy divided by the actual value. Therefore it equals the absolute accuracy when the numeric value is 1.

This type of accuracy makes much sense when the number is normalized, since the division blots out the influence of the exponent. It is at its best when the value of the mantissa is at its maximum (= bit pattern 11111...) and at its worst when that value is at its minimum (= bit pattern 10000...).  In a binary computer the difference is a factor of nearly 2.

To blot out this difference the concept of the guaranteed relative accuracy is introduced. This is the relative accuracy at its worst, i.e. when the value of the normalized mantissa is at its minimum. Thus it depends solely on the mantissa length, actually in a binary computer on the number of mantissa bits minus one.

So in every binary representation one mantissa bit should not be taken into account. Thus in the case of DEC and IEEE one simply has to 'forget' the hidden bit. Then only the number of bits in the visible part of the mantissa have to be counted. This number is indicated by m.  Thus the guaranteed relative accuracy becomes 2^(-m).

This relative accuracy determines the number of decimal digits that can be represented reliably by the sequence of bits. The longer the mantissa is, the more digits this sequence can store well. This 'decimal accuracy' is expressed as m * log10(2), i.e. about 0.301 * m.  For example, the number of digits that fit well in a mantissa with 112 visible bits equals 33.7 (note the 'broken digit'!).
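
As a quick check of this formula, the following Python lines compute the guaranteed relative accuracy 2^(-m) and the corresponding number of decimal digits for a few visible-mantissa lengths. The function name is illustrative.

```python
import math


def decimal_accuracy(m: int) -> float:
    """Number of decimal digits carried by m visible mantissa bits."""
    return m * math.log10(2)


for m in (10, 23, 52, 55, 112):
    print(f"m = {m:3d}   relative accuracy = 2^-{m}   digits = {decimal_accuracy(m):.2f}")
```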

The absolute accuracy is at least the guaranteed accuracy multiplied with the 'exponented exponent', i.e. with 2^exponent_value.

Unnormalized numbers

For an unnormalized (and also for a normalized) number the absolute accuracy equals the relative accuracy multiplied with the 'exponented exponent', i.e. with  2^exponent_value.  The minimum non-zero value of an unnormalized number equals its absolute accuracy.

The definition of the IEEE format shows that the exponent bit-patterns 00...000 and 00...001 deliver the same exponent value. They merely indicate a different usage of the mantissa. The pattern 00...000 says that the mantissa is unnormalized and without hidden bit, whilst the pattern 00...001 says the mantissa is normalized with a hidden bit.

Consequently, as long as the exponent bit-pattern does not exceed 00...001, ordinary integer arithmetic can be applied to the mantissa taken together with the last bit of the exponent.
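
This property can be demonstrated with the IEEE single format. In the Python sketch below two unnormalized bit patterns are added as plain integers; the carry runs into the last exponent bit and the result is still the correct sum. The helper names are illustrative, and the check only holds while the exponent bit-pattern of the result stays at 00000000 or 00000001.

```python
import struct


def single_from_bits(bits: int) -> float:
    """Value of a 32-bit pattern read as an IEEE single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def add_as_integers(a_bits: int, b_bits: int) -> int:
    """Add two tiny positive singles by integer addition of their patterns."""
    total = a_bits + b_bits
    if total >= 0x01000000:                  # exponent pattern would exceed 00000001
        raise ValueError("outside the range where integer arithmetic applies")
    return total


a = 0x007FFFFF     # largest unnormalized single
b = 0x00000003     # a very small unnormalized single
s = add_as_integers(a, b)
print(hex(s), single_from_bits(s) == single_from_bits(a) + single_from_bits(b))
```

In this range every bit pattern k stands for the value k * 2^(-149), which is why the integer sum and the floating-point sum agree.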

Deviant E-format

Remarkably the IEEE E-format does not hide the first mantissa bit!  This bit stays visible in the mantissa. This bit is zero when all bits in the exponent are zero. It is one when at least one exponent bit is one. Consequently the integer-like arithmetic on the extremely small values works less simply in the E-format.

Special values

According to the IEEE-754 definition the number represents a special value when the exponent has the bit pattern 1111... . The type of the special value is determined by the bit pattern of the visible part of the mantissa. The types are:

Infinity      <= 00..00 (positive/negative depends on +/- sign)
NaN Signaling <= 00..01 to 01..11
Indeterminate <= 10..00 (or quiet NaN when sign is positive)
NaN Quiet     <= 10..01 to 11..11
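
The list above can be turned into a small classifier. The Python sketch below applies it to the IEEE single format (8-bit exponent, 23-bit visman); the function name and the returned labels are illustrative, and the treatment of the pattern 10..00 follows the list above (Indeterminate for a negative sign, quiet NaN for a positive one).

```python
def classify_special(bits: int) -> str:
    """Classify a 32-bit IEEE single pattern according to the list above."""
    sign = bits >> 31
    exb = (bits >> 23) & 0xFF
    visman = bits & 0x7FFFFF
    top = 0x400000                    # visman pattern 10..00
    if exb != 0xFF:
        return "ordinary"
    if visman == 0:
        return "-Infinity" if sign else "+Infinity"
    if visman < top:                  # 00..01 to 01..11
        return "signaling NaN"
    if visman == top and sign:        # 10..00 with negative sign
        return "indeterminate"
    return "quiet NaN"                # 10..00 (positive) and 10..01 to 11..11


for pattern in (0x7F800000, 0xFF800000, 0x7F800001, 0xFFC00000, 0x7FC00000):
    print(hex(pattern), classify_special(pattern))
```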

In the DEC definition the zero exponent bit-pattern is shared by the ordinary value 0.0 and the nonnumeric values Undefined and DirtyZero. Therefore the unnormalized numbers are absent.

The zero-gap of DEC

The absence of the unnormalized numbers in the definition by DEC causes a gap between the value 0.0 and the smallest non-zero value. This gap is very much larger than the space between the smallest non-zero value and the value immediately next to it.

The following drawing shows the effect of this gap on the series of possible values along the continuous numerical axis. Herein the visible part of the mantissa (= visman) is assumed to have three bits.

DEC:
  expon = ..000   expon = ..001       expon = ..010           e
|<------------->|<------------->|<--------------------->|<-----
 +---------------+-+-+-+-+-+-+-+-+--+--+--+--+--+--+--+--+----+
 0               A
  |_____________|
        gap


IEEE:
  expon = ..000   expon = ..001       expon = ..010           e
|<------------->|<------------->|<--------------------->|<-----
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+--+--+--+--+--+--+--+----+
 0 ^             A
|_________________________________|
   integer arithmetics possible
       (except in E-format)

Each plus sign stands for a value that can be represented by the computer. The minus signs represent all values that cannot be represented. The letter A points to the minimum normalized value. The caret (^) points to the minimum unnormalized nonzero value.   0 is the zero value.

The drawing shows that DEC has no value available between 0 and A, whilst IEEE has many.
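
The gap can also be made visible by listing the values themselves. The Python sketch below enumerates all non-negative ordinary values of a toy format with a three-bit visman, as in the drawing, and an assumed three-bit exponent (the drawing does not fix the exponent size). The function names are illustrative.

```python
from fractions import Fraction


def dec_values(n: int, m: int) -> list:
    """All non-negative DEC values in ascending order, starting with 0."""
    values = [Fraction(0)]
    for exb in range(1, 2**n):
        for v in range(2**m):
            mantissa = Fraction(1, 2) + Fraction(v, 2 ** (m + 1))   # 0.1_visman
            values.append(mantissa * Fraction(2) ** (exb - 2 ** (n - 1)))
    return values


def ieee_values(n: int, m: int) -> list:
    """All non-negative finite IEEE values in ascending order, starting with 0."""
    half = 2 ** (n - 1)
    values = [Fraction(v, 2**m) * Fraction(2) ** (2 - half) for v in range(2**m)]
    for exb in range(1, 2**n - 1):
        for v in range(2**m):
            values.append((1 + Fraction(v, 2**m)) * Fraction(2) ** (exb + 1 - half))
    return values


for name, vals in (("DEC", dec_values(3, 3)), ("IEEE", ieee_values(3, 3))):
    gap = vals[1] - vals[0]           # distance from 0 to the first non-zero value
    step = vals[2] - vals[1]          # distance to the next value
    print(name, "gap/step =", gap / step)
```

For DEC the distance from 0 to the first non-zero value is 2^m times the next step; for IEEE the two distances are equal.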



MINIMUM AND MAXIMUM VALUES

In the calculations of the minimum value of a float the number of bits in the mantissa can be important, i.e. the number of bits in its visible part. Here again this number is indicated by the letter m. In the calculations that approximate the maximum value this number turns out to be unimportant. As above, the number of bits in the exponent is indicated by n.

The mantissa of an unnormalized float is at its minimum non-zero value when all bits in the visible part except the rightmost one are zero. This value is:

     DEC:    unnormalized mantissas are not applied
     IEEE:   2^(-m)
The exponent value of an unnormalized float is:
     IEEE:   -(2^(n-1)-2)  =  2-2^(n-1)
Since the exponent is at its smallest value the minimum value of the whole float is:
     IEEE:   2^(-m) * 2^(2-2^(n-1)) = 4 / 2^(m+2^(n-1))

The mantissa of a normalized float is at minimum value when all bits in its visible part are zero. This value is exactly:

     DEC:    0.5
     IEEE:   1.0
The minimum value of the exponent of a normalized float is:
     DEC:    1-2^(n-1)
     IEEE:   1-(2^(n-1)-1)  =  2-2^(n-1)
A normalized float is at its minimum value when both the mantissa and the exponent have their lowest values. It is:
     DEC:    0.5 * 2^(1-2^(n-1))      =      2^(-2^(n-1))
     IEEE:   1.0 * 2^(1-(2^(n-1)-1))  =  4 * 2^(-2^(n-1))

The mantissa of a float is at maximum value when all bits in its visible part equal one. This value is:

     DEC:    (2^(m+1)-1)/(2^(m+1))  =  1 - 1/(2^(m+1))
     IEEE:   (2^(m+1)-1)/(2^m)      =  2 - 1/(2^m)
The more bits the mantissa has the better it approximates the value 1 (by DEC) or 2 (by IEEE).
The maximum value of the exponent is:
     DEC:    2^n-1-2^(n-1)      =  2^(n-1)-1
     IEEE:   2^n-2-(2^(n-1)-1)  =  2^(n-1)-1
A normalized float is at its maximum value when both the mantissa and the exponent have their highest values. It is:
     DEC:    (1-1/(2^(m+1))) * 2^(2^(n-1)-1)  =
              = (1-1/(2^(m+1))) * 2^(2^(n-1)) / 2 =
              = approximately =  2^(2^(n-1)-1)
     IEEE:   (2-1/(2^m)) * 2^(2^(n-1)-1)  =
              = (1-1/(2^(m+1))) * 2^(2^(n-1))  =
              = approximately =  2^2^(n-1)

The non-numerical values occur at the exponent value:

     DEC:    -2^(n-1)
             = exponent bit-pattern 00...000
     IEEE:   2^n-1-(2^(n-1)-1)  =  +2^(n-1)
             = exponent bit-pattern 11...111

The range of normalized values is not symmetrical around the number 1.  If it were symmetrical, then:
           normalized_minimum = 1 / approximate_maximum
or, written differently:
           normalized_minimum * approximate_maximum = 1

Actually the latter multiplication gives:

     DEC:    2^(-2^(n-1)) * 2^(2^(n-1)-1) = 0.5
     IEEE:   4 * 2^(-2^(n-1)) * 2^2^(n-1) = 4
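
The formulas of this section can be collected in one function and evaluated exactly. This is a sketch in Python using exact fractions; the function name and the dictionary keys are chosen here for illustration.

```python
from fractions import Fraction


def extremes(n: int, m: int) -> dict:
    """Extreme values for an n-bit exponent and m visible mantissa bits."""
    half = 2 ** (n - 1)
    two = Fraction(2)
    result = {
        "ieee_min_unnormalized": two ** (2 - half - m),
        "ieee_min_normalized": two ** (2 - half),
        "ieee_max": (2 - Fraction(1, 2**m)) * two ** (half - 1),
        "dec_min_normalized": two ** (-half),
        "dec_max": (1 - Fraction(1, 2 ** (m + 1))) * two ** (half - 1),
    }
    # Product of the normalized minimum and the approximate maximum.
    result["dec_asymmetry"] = result["dec_min_normalized"] * two ** (half - 1)
    result["ieee_asymmetry"] = result["ieee_min_normalized"] * two ** half
    return result


e = extremes(8, 23)
print(float(e["ieee_min_unnormalized"]), float(e["ieee_min_normalized"]), float(e["ieee_max"]))
print(float(e["dec_min_normalized"]), float(e["dec_max"]))
print(e["dec_asymmetry"], e["ieee_asymmetry"])
```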



REMARKS ABOUT SOME COMPUTERS

Some peculiarities of a few computers are described here in shorthand. More elaborate descriptions about these machines can be found by looking in the main index file of this internet site.

Zuse

ACKNOWLEDGEMENT

Due to the allied air raids on the German cities in 1944 and 1945 the information about the old Zuse computers is sometimes contradictory. Therefore the word structure I describe here is the reconstruction I assume as most probable.
END OF ACKNOWLEDGEMENT

Already before World War 2 the German aeroplane engineer Konrad Zuse created a computer wherein the numbers are stored binarily in hidden-bit notation. The definition of this notation differs from those by IEEE and DEC.  Herein the mantissa is like in IEEE: 1.visman, so its value ranges from 1.0 until 2.0.

The exponent is written in sign+magnitude notation. Infinity is its only special value, occupying the bit pattern for its maximum 01111..., irrespective of the value of the mantissa.

Although the Undefined value can result from an arithmetic operation it is not written in the resulting number. It is noted only in a kind of 'processor status register'. In Zuse's first machines this register is displayed on the user's console.

The exponent bit-pattern 10000... sets the whole numeric value to 0.0, irrespective of the value of the mantissa. The latter implies that the unnormalized values are not used. Consequently around this 0.0 the series of numeric values shows the same near-zero gap as DEC has.

Since the definition by Zuse is not in use anymore, only the numerical data of the memory words in the logically equal Z1 and Z3 computers are given here. A more extensive description of these machines and the data of some later Zuse computers are given elsewhere in this internet site.

DEC not 36 bits

None of the Digital machines with hidden-bit notation ever uses 36-bit words. These words are applied in another, incompatible system-line of Digital: the PDP-10, Decsystem-10 and Decsystem-20, which show all bits in the mantissa. The word formats of these machines are described elsewhere in this internet site.

The Digital machines with the hidden bit use word sizes that are powers of two. They all belong to the PDP-11 pedigree. Their memory structure is shown elsewhere in the internet site.

Cray super-computer

In the 1970s and 1980s Seymour Cray produced the first model range of the supercomputers he designed himself, the Cray-1 and its relatives. The word formats in these machines are described extensively elsewhere in this internet site.

In the 1990s the Cray company abandoned its own word formats in favor of the Digital Alpha processor. At present it uses the formats of Intel's mathematical co-processor 8087, viz. the four 'official' IEEE STEX-formats (see listing).

The format given by Paul Gray in a small section of his course on mathematics cannot be confirmed, so is not listed.

IBM-390

The IBM-390 computer applies these IEEE-STEX formats too, but it also applies the three old formats of the IBM-360 which are described extensively elsewhere in this internet site. Nowadays its successor, the modern Z-series, applies besides the three IEEE and three hexadecimal formats also a new decimal format, the Packed Decimal Encoding (PDE) that is described extensively elsewhere in this internet site.



EXAMPLES OF BIT-PATTERNS

The bit patterns of some numbers in the three definitions are given for an exponent size of 7 bits and a visman size of 14 bits. In these examples the most significant bit is at the left side and the least significant bit is at the right side of the computer word. In the examples the +/- sign of the mantissa is always kept positive for the ordinary numeric non-zero values. So their bit Sm is 0 in these examples.

The first three blocks in the table of each definition show the ordinary normalized numbers. Their mantissas are the same for all three definitions. Only their exponents differ. IEEE and DEC give these in excess-bias notation. These differ by 2.  ZUSE gives the exponent in sign+magnitude notation.

In all three tables the third line of the fourth block shows the minimum normalized bit-pattern and the decimal value it stands for (rounded to four digits). The fifth block in the IEEE table shows some unnormalized bit patterns. This block is absent in the other two tables since the definitions of DEC and ZUSE do not apply unnormalized numbers. Consequently they show the abovementioned zero-gap.

In the IEEE definition integer arithmetic can be applied to 0, to all numbers in the fourth and fifth blocks and to the numbers in between, i.e. to all numbers from 0 up to 4337E-22.  In the other two definitions the zero-gap impedes this.

In all tables the last line in the last block gives the maximum ordinary value. Its bit pattern is always normalized. All mantissa bits are 1.  Here the numeric value of each maximum is:
     ZUSE: max.value = 9223e+15
     IEEE: max.value = 1845e+16
     DEC:  max.value = 9223e+15

Legend for all three tables:
    o = bit value 0
    x = don't care = bit value is irrelevant
    v = at least one bit in this series must be 1

ZUSE

                ,-- exponent sign
                |
decimal     S   V   exponent           m a n t i s s a
 value      m  |-|-----------| |-----------------------------|

1100e-03    o   o o o o o o o   o o o 1 1 o o   1 1 o o 1 1 o
2200e-03    o   o o o o o o 1   o o o 1 1 o o   1 1 o o 1 1 o
3300e-03    o   o o o o o o 1   1 o 1 o o 1 1   o o 1 1 o o 1
4400e-03    o   o o o o o 1 o   o o o 1 1 o o   1 1 o o 1 1 o
5500e-03    o   o o o o o 1 o   o 1 1 o o o o   o o o o o o o
6600e-03    o   o o o o o 1 o   1 o 1 o o 1 1   o o 1 1 o o 1

6060e-02    o   o o o o 1 o 1   1 1 1 o o 1 o   o 1 1 o o 1 1
6006e-01    o   o o o 1 o o 1   o o 1 o 1 1 o   o o 1 o o 1 1
1000e-06    o   1 o o 1 o 1 o   o o o o o 1 1   o o o 1 o o 1
1110e-04    o   1 o o o 1 o o   1 1 o o o 1 1   o 1 o 1 o o 1
2220e-04    o   1 o o o o 1 1   1 1 o o o 1 1   o 1 o 1 o o 1
3000e-03    o   o o o o o o 1   1 o o o o o o   o o o o o o o

8000e-03    o   o o o o o 1 1   o o o o o o o   o o o o o o o
4000e-03    o   o o o o o 1 o   o o o o o o o   o o o o o o o
2000e-03    o   o o o o o o 1   o o o o o o o   o o o o o o o
1000e-03    o   o o o o o o o   o o o o o o o   o o o o o o o
5000e-04    o   1 o o o o o 1   o o o o o o o   o o o o o o o
2500e-04    o   1 o o o o 1 o   o o o o o o o   o o o o o o o
1250e-04    o   1 o o o o 1 1   o o o o o o o   o o o o o o o

2168e-22    o   1 1 1 1 1 1 o   o o o o o o o   o o o o o o o
2168e-22    o   1 1 1 1 1 1 1   1 1 1 1 1 1 1   1 1 1 1 1 1 1
1084e-22    o   1 1 1 1 1 1 1   o o o o o o o   o o o o o o o
5421e-23            [unnormalized is not in this definition]

0 (+0,-0)   o   1 o o o o o o   x x x x x x x   x x x x x x x
+Infinity   o   o 1 1 1 1 1 1   x x x x x x x   x x x x x x x
-Infinity   1   o 1 1 1 1 1 1   x x x x x x x   x x x x x x x
Undefined           [not applied in this definition]
max.value   o   o 1 1 1 1 1 o   1 1 1 1 1 1 1   1 1 1 1 1 1 1

IEEE

                   biased
decimal     S     exponent             m a n t i s s a
 value      m  |-------------| |-----------------------------|

1100e-03    o   o 1 1 1 1 1 1   o o o 1 1 o o   1 1 o o 1 1 o
2200e-03    o   1 o o o o o o   o o o 1 1 o o   1 1 o o 1 1 o
3300e-03    o   1 o o o o o o   1 o 1 o o 1 1   o o 1 1 o o 1
4400e-03    o   1 o o o o o 1   o o o 1 1 o o   1 1 o o 1 1 o
5500e-03    o   1 o o o o o 1   o 1 1 o o o o   o o o o o o o
6600e-03    o   1 o o o o o 1   1 o 1 o o 1 1   o o 1 1 o o 1

6060e-02    o   1 o o o 1 o o   1 1 1 o o 1 o   o 1 1 o o 1 1
6006e-01    o   1 o o 1 o o o   o o 1 o 1 1 o   o o 1 o o 1 1
1000e-06    o   o 1 1 o 1 o 1   o o o o o 1 1   o o o 1 o o 1
1110e-04    o   o 1 1 1 o 1 1   1 1 o o o 1 1   o 1 o 1 o o 1
2220e-04    o   o 1 1 1 1 o o   1 1 o o o 1 1   o 1 o 1 o o 1
3000e-03    o   1 o o o o o o   1 o o o o o o   o o o o o o o

8000e-03    o   1 o o o o 1 o   o o o o o o o   o o o o o o o
4000e-03    o   1 o o o o o 1   o o o o o o o   o o o o o o o
2000e-03    o   1 o o o o o o   o o o o o o o   o o o o o o o
1000e-03    o   o 1 1 1 1 1 1   o o o o o o o   o o o o o o o
5000e-04    o   o 1 1 1 1 1 o   o o o o o o o   o o o o o o o
2500e-04    o   o 1 1 1 1 o 1   o o o o o o o   o o o o o o o
1250e-04    o   o 1 1 1 1 o o   o o o o o o o   o o o o o o o
4337e-22    o   o o o o o 1 o   o o o o o o o   o o o o o o 1

4337e-22    o   o o o o o 1 o   o o o o o o o   o o o o o o o
4337e-22    o   o o o o o o 1   1 1 1 1 1 1 1   1 1 1 1 1 1 1
2168e-22    o   o o o o o o 1   o o o o o o o   o o o o o o o

2168e-22    o   o o o o o o o   1 1 1 1 1 1 1   1 1 1 1 1 1 1
1084e-22    o   o o o o o o o   1 o o o o o o   o o o o o o o
5421e-23    o   o o o o o o o   o 1 o o o o o   o o o o o o o
2711e-23    o   o o o o o o o   o o 1 o o o o   o o o o o o o
6617e-26    o   o o o o o o o   o o o o o o o   o o o o 1 o 1
5294e-26    o   o o o o o o o   o o o o o o o   o o o o 1 o o
3970e-26    o   o o o o o o o   o o o o o o o   o o o o o 1 1
2647e-26    o   o o o o o o o   o o o o o o o   o o o o o 1 o
1323e-26    o   o o o o o o o   o o o o o o o   o o o o o o 1

0 (+0,-0)   o   o o o o o o o   o o o o o o o   o o o o o o o
+Infinity   o   1 1 1 1 1 1 1   o o o o o o o   o o o o o o o
-Infinity   1   1 1 1 1 1 1 1   o o o o o o o   o o o o o o o
NaN signal  x   1 1 1 1 1 1 1   o v v v v v v   v v v v v v v
NaN quiet   x   1 1 1 1 1 1 1   1 v v v v v v   v v v v v v v
Indetermin. x   1 1 1 1 1 1 1   1 o o o o o o   o o o o o o o
max.value   o   1 1 1 1 1 1 o   1 1 1 1 1 1 1   1 1 1 1 1 1 1

DEC

                   biased
decimal     S     exponent             m a n t i s s a
 value      m  |-------------| |-----------------------------|

1100e-03    o   1 o o o o o 1   o o o 1 1 o o   1 1 o o 1 1 o
2200e-03    o   1 o o o o 1 o   o o o 1 1 o o   1 1 o o 1 1 o
3300e-03    o   1 o o o o 1 o   1 o 1 o o 1 1   o o 1 1 o o 1
4400e-03    o   1 o o o o 1 1   o o o 1 1 o o   1 1 o o 1 1 o
5500e-03    o   1 o o o o 1 1   o 1 1 o o o o   o o o o o o o
6600e-03    o   1 o o o o 1 1   1 o 1 o o 1 1   o o 1 1 o o 1

6060e-02    o   1 o o o 1 1 o   1 1 1 o o 1 o   o 1 1 o o 1 1
6006e-01    o   1 o o 1 o 1 o   o o 1 o 1 1 o   o o 1 o o 1 1
1000e-06    o   o 1 1 o 1 1 1   o o o o o 1 1   o o o 1 o o 1
1110e-04    o   o 1 1 1 1 o 1   1 1 o o o 1 1   o 1 o 1 o o 1
2220e-04    o   o 1 1 1 1 1 o   1 1 o o o 1 1   o 1 o 1 o o 1
3000e-03    o   1 o o o o 1 o   1 o o o o o o   o o o o o o o

8000e-03    o   1 o o o 1 o o   o o o o o o o   o o o o o o o
4000e-03    o   1 o o o o 1 1   o o o o o o o   o o o o o o o
2000e-03    o   1 o o o o 1 o   o o o o o o o   o o o o o o o
1000e-03    o   1 o o o o o 1   o o o o o o o   o o o o o o o
5000e-04    o   1 o o o o o o   o o o o o o o   o o o o o o o
2500e-04    o   o 1 1 1 1 1 1   o o o o o o o   o o o o o o o
1250e-04    o   o 1 1 1 1 1 o   o o o o o o o   o o o o o o o

2168e-22    o   o o o o o 1 1   o o o o o o o   o o o o o o o
1084e-22    o   o o o o o 1 o   o o o o o o o   o o o o o o o
5421e-23    o   o o o o o o 1   o o o o o o o   o o o o o o o
2711e-23            [unnormalized is not in this definition]

0 (clean)   o   o o o o o o o   o o o o o o o   o o o o o o o
0 (dirty)   o   o o o o o o o   v v v v v v v   v v v v v v v
Infinity            [not applied in this definition]
Undefined   1   o o o o o o o   x x x x x x x   x x x x x x x
max.value   o   1 1 1 1 1 1 1   1 1 1 1 1 1 1   1 1 1 1 1 1 1
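
The rows of the three tables can be decoded mechanically. The Python sketch below reads a row written with 'o' and '1' (1 sign bit, 7 exponent bits, 14 visman bits) and applies the rules described in this document. The function names are illustrative; the Zuse decoder follows the reconstruction given above, and a NaN is used as a stand-in for DEC's Undefined value.

```python
import math


def fields(row: str) -> tuple[int, int, int]:
    """Split a table row into (sign, exponent pattern, visman) as integers."""
    bits = row.replace(" ", "").replace("o", "0")
    if len(bits) != 22 or set(bits) - {"0", "1"}:
        raise ValueError("expected 22 bits written with 'o' and '1'")
    return int(bits[0]), int(bits[1:8], 2), int(bits[8:], 2)


def zuse_value(row: str) -> float:
    sign, exb, visman = fields(row)
    s = -1.0 if sign else 1.0
    if exb == 0b0111111:                       # infinity, mantissa irrelevant
        return s * math.inf
    if exb == 0b1000000:                       # zero, mantissa irrelevant
        return 0.0
    magnitude = exb & 0b0111111                # sign+magnitude exponent
    exponent = -magnitude if exb >> 6 else magnitude
    return s * (1 + visman / 2**14) * 2.0**exponent


def ieee_value(row: str) -> float:
    sign, exb, visman = fields(row)
    s = -1.0 if sign else 1.0
    if exb == 127:
        return math.nan if visman else s * math.inf
    if exb == 0:                               # zero or unnormalized
        return s * (visman / 2**14) * 2.0**-62
    return s * (1 + visman / 2**14) * 2.0 ** (exb - 63)


def dec_value(row: str) -> float:
    sign, exb, visman = fields(row)
    if exb == 0:
        return math.nan if sign else 0.0       # NaN stands in for Undefined
    s = -1.0 if sign else 1.0
    return s * (0.5 + visman / 2**15) * 2.0 ** (exb - 64)


mantissa = "o o o 1 1 o o   1 1 o o 1 1 o"     # the mantissa of the 1100e-03 rows
print(zuse_value("o   o o o o o o o   " + mantissa))
print(ieee_value("o   o 1 1 1 1 1 1   " + mantissa))
print(dec_value("o   1 o o o o o 1   " + mantissa))
```

All three calls decode the first row of their table and give the same value, close to 1.1.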



LIST OF HIDDEN-BIT FLOAT-FORMATS

The formats actually in use are listed below:

Bits and Machines

de-     name   abbrev. #bytes |---- #bits -----|  |- used in -|
finer                         total expon visman  |- machine -|
                                     (n)   (m)
Education --    s2e3    0.75     6    3     2            ED
Education --    s5e3    1.125    9    3     5            ED
Education --    s3e4     1       8    4     3            ED

PS2.0     --    s10e5    2      16    5    10                NV
PS2.0     --    s16e7    3      24    7    16                AT

IEEE   single     S      4      32    8    23      J I A 9   DX
IEEE   twin       T      8      64   11    52      J I A 9  
IEEE   enhanced   E     10      80   15  1+63        I
IEEE   extended   X     16     128   15   112            9

DEC    float      F      4      32    8    23      P V A
DEC    double     D      8      64    8    55      P V
DEC    grand(?)   G      8      64   11    52        V A
DEC    hyper(?)   H     16     128   15   112        V

Zuse   Zuse-1     Z1    2.75    22    7    14            Z3

Exponent values

de-    abbr.  for spec.  maxexb   |-- excess bias --|  maximum
finer          values    ordin.   unnormal normalized    value

Educa  s2e3       7          6          2         3          3
Educa  s5e3       7          6          2         3          3
Educa  s3e4      15         14          6         7          7

PS2.0 s10e5      31         30         14        15         15
PS2.0 s16e7     127        126         62        63         63

IEEE   S        255        254        126       127        127
IEEE   T       2047       2046       1022      1023       1023
IEEE   E      32767      32766      16382     16383      16383
IEEE   X      32767      32766      16382     16383      16383

DEC    F          0        255         --       128        127
DEC    D          0        255         --       128        127
DEC    G          0       2047         --      1024       1023
DEC    H          0      32767         --     16384      16383

Zuse   Z1        63         62         --   sign+magn.      62

Binary extremes and accuracy

de-    abbr.  |------ binary range of abs.value ------|  relat.
finer         min(unnormal) min(normalized) max(nearly)  accur

Educa  s2e3   2^-(2+2)         2^-2         2^+4          2^-2
Educa  s5e3   2^-(5+2)         2^-2         2^+4          2^-5
Educa  s3e4   2^-(3+6)         2^-6         2^+8          2^-3

PS2.0 s10e5   2^-(10+14)       2^-14        2^+16        2^-10
PS2.0 s16e7   2^-(16+62)       2^-62        2^+64        2^-16

IEEE   S      2^-(23+126)      2^-126       2^+128       2^-23
IEEE   T      2^-(52+1022)     2^-1022      2^+1024      2^-52
IEEE   E      2^-(63+16382)    2^-16382     2^+16384     2^-63
IEEE   X      2^-(112+16382)   2^-16382     2^+16384    2^-112

DEC    F           --          2^-128       2^+127       2^-23
DEC    D           --          2^-128       2^+127       2^-55
DEC    G           --          2^-1024      2^+1023      2^-52
DEC    H           --          2^-16384     2^+16383    2^-112

Zuse   Z1          --          2^-63        2^+63        2^-14
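
The binary extremes can be derived from the mantissa width m and the exponent width n. The sketch below returns the powers of two listed in the table and cross-checks the IEEE-T row against Python's own float type, assuming that type is an IEEE double (true on common platforms, but an assumption here). The function names are illustrative only.

```python
import math
import sys


def ieee_binary_extremes(m, n):
    """Exponents of two bounding an IEEE-style format (m mantissa, n exponent bits)."""
    bias = 2**(n - 1) - 1
    min_unnormal = -(m + bias - 1)    # smallest non-zero unnormalized value
    min_normalized = -(bias - 1)      # smallest normalized value
    max_nearly = bias + 1             # every ordinary value lies below 2**max_nearly
    accuracy = -m                     # relative accuracy
    return min_unnormal, min_normalized, max_nearly, accuracy


def dec_binary_extremes(m, n):
    """Same for a DEC format with mantissa 0.1visman; there are no unnormal numbers."""
    bias = 2**(n - 1)
    return None, -bias, bias - 1, -m


min_un, min_norm, max_nearly, acc = ieee_binary_extremes(52, 11)
print(math.ldexp(1.0, min_un) == 5e-324)                   # smallest unnormal double
print(math.ldexp(1.0, min_norm) == sys.float_info.min)     # smallest normalized double
print(math.ldexp(1.0, acc) == sys.float_info.epsilon)      # relative accuracy
print(sys.float_info.max < 2**max_nearly)                  # maximum stays below 2^1024
```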

Decimal extremes and accuracy

de-    abbr. |---- decimal range of abs.value -------|  decimal
finer        min(unnormal) min(normalized) max(nearly)   accur.

Educa  s2e3    0.0625        0.25          16              0.6
Educa  s5e3    0.0078125     0.25          16              1.5
Educa  s3e4    0.001953125   0.015625      256             0.9

PS2.0 s10e5    5.97E-8       6.11E-5       65536           3.0
PS2.0 s16e7    3.31E-24      2.17E-19      1.84E+19        4.8

IEEE   S       1.41E-45      1.18E-38      3.40E+38        6.9
IEEE   T       4.95E-324     2.23E-308     1.79E+308      15.6
IEEE   E       3.65E-4951    3.37E-4932    1.18E+4932     18.9
IEEE   X       6.48E-4966    3.37E-4932    1.18E+4932     33.7

DEC    F          --        0.294E-38      1.70E+38        6.9
DEC    D          --        0.294E-38      1.70E+38       16.5
DEC    G          --        0.557E-308    0.898E+308      15.6
DEC    H          --        0.841E-4932   0.594E+4932     33.7

Zuse   Z1         --        1.085E-19     0.922E+19        4.2
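
The decimal accuracy column is consistent with m x log10(2), rounded downwards to one decimal, where m is the number of visible mantissa bits. A small Python sketch of that relation; the helper name is made up for this example:

```python
import math


def decimal_accuracy(m):
    """Decimal digits carried by m mantissa bits, rounded downwards to one decimal."""
    return math.floor(m * math.log10(2) * 10) / 10


for name, m in [("s2e3", 2), ("s10e5", 10), ("Z1", 14), ("S", 23),
                ("T", 52), ("D", 55), ("E", 63), ("X", 112)]:
    print(name, decimal_accuracy(m))
```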

Mantissa ranges

The value range of each non-zero mantissa is in decimal:

       IEEE:    1.0 <=  (1.visman)  < 2.0
       DEC:     0.5 <= (0.1 visman) < 1.0
       ZUSE:    1.0 <=  (1.visman)  < 2.0

Special values

IEEE:  Exponent bit-pattern = 111...
       sign and visman bit-pattern gives meaning:
       ----     ------------------       -------
         -       00..00                negative infinity
         +       00..00                positive infinity
        -/+      00..01 to 01..11      signaling NaN
         -       10..00                indeterminate
         +       10..00                quiet NaN (don't use it)
        -/+      10..01 to 11..11      quiet NaN
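
The IEEE table above can be read as a small decision procedure. The following Python sketch classifies a bit pattern given as an integer, assuming the usual field order sign, exponent, visman from the most significant bit downwards. It applies to the formats with a hidden bit only, so not to the E format, and the function name is hypothetical.

```python
def classify_ieee(bits, m=23, n=8):
    """Classify a bit pattern of an IEEE-style format with hidden bit (default: IEEE-S)."""
    sign = (bits >> (m + n)) & 1
    exponent = (bits >> m) & (2**n - 1)
    visman = bits & (2**m - 1)
    if exponent != 2**n - 1:
        return "ordinary value"
    top = 1 << (m - 1)            # leading visman bit: 0 = signaling, 1 = quiet
    if visman == 0:
        return "negative infinity" if sign else "positive infinity"
    if visman < top:
        return "signaling NaN"
    if visman == top and sign:
        return "indeterminate"
    return "quiet NaN"


for pattern in (0x7F800000, 0xFF800000, 0x7F800001, 0xFFC00000, 0x7FC00000, 0x3F800000):
    print(f"{pattern:08X}", classify_ieee(pattern))
```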

DEC:   Exponent bit-pattern = 000...
       sign and visman bit-pattern gives meaning:
       ----     ------------------       -------
         +       00..00                clean zero (value: 0.0)
         +       00..01 to 11..11      dirty zero (value: 0.0)
         -       00..00 to 11..11      Undefined value
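
The DEC rules can be sketched the same way. The Python example below takes the bit pattern as an integer in the logical order sign, exponent, visman; the order in which the machines store these fields in memory is deliberately left out, and the function name is an assumption of this example.

```python
def classify_dec(bits, m=23, n=8):
    """Classify a DEC bit pattern (default: DEC-F) according to the table above."""
    sign = (bits >> (m + n)) & 1
    exponent = (bits >> m) & (2**n - 1)
    visman = bits & (2**m - 1)
    if exponent != 0:
        return "ordinary value"
    if sign:
        return "undefined value"      # any visman pattern
    return "clean zero" if visman == 0 else "dirty zero"


for pattern in (0x00000000, 0x00000001, 0x80000000, 0x807FFFFF, 0x40800000):
    print(f"{pattern:08X}", classify_dec(pattern))
```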

ZUSE:  Exponent is in sign+magnitude notation.
       Exponent bit-pattern = 0000.. (= positive zero value)
                 is the ordinary zero exponent and gives the number
                          a value from 1.0 up to (but not including) 2.0
       Exponent bit-pattern = 1000.. (= negative zero value)
                 gives numeric value 0.0 irrespective of the
                          sign and visman bit-pattern.
       Exponent bit-pattern = 0111.. (= highest exponent value)
       sign and visman bit-pattern gives meaning:
       ----     ------------------       -------
         -       irrelevant            negative infinity
         +       irrelevant            positive infinity
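
The Zuse rules, as described in this text, can be sketched as follows. To avoid any assumption about the order of the fields inside the 22-bit word, the hypothetical function below takes the sign bit, the 7-bit exponent pattern and the 14-bit visman as separate arguments.

```python
import math


def zuse_value(sign, exponent_bits, visman):
    """Value of a Zuse number from its three fields (1, 7 and 14 bits)."""
    exp_sign = (exponent_bits >> 6) & 1
    exp_magnitude = exponent_bits & 0x3F
    if exp_sign == 1 and exp_magnitude == 0:      # pattern 1000000: zero
        return 0.0
    if exp_sign == 0 and exp_magnitude == 63:     # pattern 0111111: infinity
        return -math.inf if sign else math.inf
    exponent = -exp_magnitude if exp_sign else exp_magnitude
    mantissa = 1.0 + visman / 2**14               # 1.visman
    value = math.ldexp(mantissa, exponent)
    return -value if sign else value


print(zuse_value(0, 0b0000000, 0))        # smallest value with the zero exponent
print(zuse_value(0, 0b1000000, 12345))    # zero, whatever the visman
print(zuse_value(1, 0b0111111, 0))        # negative infinity
print(zuse_value(0, 0b1111111, 0))        # smallest normalized value
```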

Undefined values

IEEE:
Both NaNs are the undefined values resulting from floating-point operations.

DEC:
The Undefined value can result from a floating-point operation.

ZUSE:
Although the undefined value can result from a floating-point operation, it is not written into the resulting number. It is only noted in the processor status register.

LEGEND TO THIS LIST

Machine types

   Digital Equipment Corp.: P = PDP-11, V = VAX, A = Alpha
   J = Hewlett-Packard 9000 series, Java Virtual Machine
   I = Intel-PC 8086+8087 and successors, Motorola-68*
   9 = IBM: 390, AS400, Z-series
   Video-graphics cards: NV = NVidia, AT = AMD-ATi, DX = Microsoft DirectX9 specs
   ED = for educational purposes
   Z3 = Konrad Zuse: Z1 and Z3 (in pre-war Germany)

Actual maxima of the small formats

       Educa s2e3:   max = 14
       Educa s5e3:   max = 15.75
       Educa s3e4:   max = 240
       PS2.0 s10e5:  max = 65504

The s2e3 float can be drawn easily on paper with its 55 numeric values marked along the continuous value axis.
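
The maxima above and the count of 55 values can be checked by enumerating every bit pattern. A minimal Python sketch, assuming the IEEE rules for formats with a hidden bit; the function name is invented for this example:

```python
def finite_values(m, n):
    """All distinct finite values of an IEEE-style format with m mantissa and n exponent bits."""
    bias = 2**(n - 1) - 1
    values = set()
    for exponent in range(2**n - 1):              # the all-ones pattern is skipped
        for visman in range(2**m):
            if exponent == 0:                     # unnormalized: no hidden bit
                magnitude = visman / 2**m * 2.0**(1 - bias)
            else:                                 # normalized: hidden bit is 1
                magnitude = (1 + visman / 2**m) * 2.0**(exponent - bias)
            values.update((magnitude, -magnitude))    # +0.0 and -0.0 count once
    return sorted(values)


s2e3 = finite_values(2, 3)
print(len(s2e3), s2e3[-1])
print(finite_values(5, 3)[-1], finite_values(3, 4)[-1], finite_values(10, 5)[-1])
```

Printing the non-negative part of `s2e3` gives the points to mark on the paper axis.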

Mantissa range of DEC

The value range of each non-zero mantissa is in decimal:

    IEEE and Zuse:    1.0 <=  (1.visman)  < 2.0
    DEC:              0.5 <= (0.1 visman) < 1.0

When the DEC-mantissa is defined in the way of IEEE and Zuse, then the exponent values of DEC must be redefined. They become:

       DEC Exponent values for  mantissa = 1.visman

de-    abbr.  for spec.  maxexb   |-- excess bias --|  maximum
finer          values    ordin.   unnormal normalized    value

DEC    F          0        255         --       129        126
DEC    D          0        255         --       129        126
DEC    G          0       2047         --      1025       1022
DEC    H          0      32767         --     16385      16382

All other figures in the table remain unchanged. The special values will not change either.
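
That both conventions describe the same numbers can be verified with exact rational arithmetic. The Python sketch below decodes an ordinary DEC-F pattern in both ways; the pattern is taken as an integer in the logical order sign, exponent, visman, and the function name is an assumption of this example.

```python
from fractions import Fraction


def dec_f_value(bits, ieee_style_mantissa=False):
    """Exact value of an ordinary DEC-F number under either mantissa convention."""
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    visman = bits & 0x7FFFFF
    if exponent == 0:
        raise ValueError("zero or undefined value, not an ordinary number")
    if ieee_style_mantissa:
        mantissa = 1 + Fraction(visman, 2**23)                 # 1.visman
        bias = 129
    else:
        mantissa = Fraction(1, 2) + Fraction(visman, 2**24)    # 0.1visman
        bias = 128
    value = mantissa * Fraction(2) ** (exponent - bias)
    return -value if sign else value


for pattern in (0x40800000, 0x00800000, 0x7FFFFFFF, 0xC1234567):
    both = dec_f_value(pattern), dec_f_value(pattern, ieee_style_mantissa=True)
    print(f"{pattern:08X}", both[0] == both[1], float(both[0]))
```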

Miscellaneous remarks

   In all definitions and formats the sign bit means:    0 = '+', 1 = '-'.
   'Visman' = visible part of the mantissa.
   'Maxexb ordin.' = maximum integer of the exponent bit pattern used for the ordinary values.
   The notation 'smen' (for example s10e5) means: sign bit + n bits for exponent + m bits for mantissa.
   The PS2.0-s23e8 format equals the IEEE-S format.
   Other name for s10e5 format is 'fp16'.
   Other name for s16e7 format is 'fp24'.
   Other name for eXtended (X) format is 'Quadruple (Q)'.
   The E-format does not hide the first mantissa bit: it is stored explicitly, hence the '1+63' in the table.

   The IEEE rules for floats with hidden bit are obeyed by all Educa and all PS2.0 formats.
   This obedience also holds for the special values.
   IEEE and Zuse use the maximum exponent for the special values only.
   Digital uses the zero exponent for both the ordinary zero and its special values.
   Historically the Zuse and the DEC-F and -D formats are the oldest formats in this list.

Decimal table-properties

   The minimum values are rounded upwards, and the maximum values are rounded downwards.
   Rounding to nearest is not applied here. For example:
      Maximum (nearly) of IEEE-T format = 2^1024 = 1.7976931348623159E+308
      is written here as 1.79E+308
      and in many other publications as 1.80E+308
   For all twelve rounding protocols, including these, see elsewhere on this website.
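
The effect of the rounding direction can be reproduced with Python's standard decimal module. A minimal sketch, assuming three significant digits as in the tables; the helper name is illustrative only:

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_UP


def to_digits(value, digits, rounding):
    """Round an exactly known value to the given number of significant digits."""
    # Decimal(value) is exact for integers and binary floats; plus() applies the rounding.
    return Context(prec=digits, rounding=rounding).plus(Decimal(value))


max_t = 2**1024          # upper bound of IEEE-T, as an exact integer
min_fp16 = 2.0**-24      # unnormalized minimum of s10e5, exact as a binary float

print(to_digits(max_t, 3, ROUND_DOWN), to_digits(max_t, 3, ROUND_HALF_EVEN))
print(to_digits(min_fp16, 3, ROUND_UP), to_digits(min_fp16, 3, ROUND_HALF_EVEN))
```

Both values are positive, so rounding downwards equals truncation and rounding upwards equals rounding away from zero.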

   The relative and decimal accuracies hold for numbers in the normalized state, i.e. with value >= min(normalized).
   These accuracies are the so-called guaranteed accuracies; see also elsewhere on this website.
   The absolute accuracy of an unnormalized number equals the unnormalized non-zero minimum.
   The decimal accuracy values are rounded downwards.


PROGRAMMING NAMES

de-    abbr.   type name in language
finer          Fortran       C,C++,Java

IEEE    S      real*4        float
IEEE    T      real*8        double
IEEE    E      real*10       long double
IEEE   X,Q     real*16       long double

'Real' is the older word for 'float'.
Java has no long double, so the E and X rows apply to C and C++ only. There the distinction between the E and X format depends on the type of the computer or on a presetting in the program.
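
Python is not part of the table above. As a side illustration only, its standard struct module offers format codes for three of the formats: 'e' for s10e5 (fp16), 'f' for IEEE S and 'd' for IEEE T. The helper below is a made-up name and prints the size and the big-endian bit pattern of 1.0 in each.

```python
import struct


def pattern(code, value):
    """Hexadecimal big-endian bit pattern of value in the given struct format."""
    return struct.pack(">" + code, value).hex()


for code, name in [("e", "s10e5 (fp16)"), ("f", "IEEE S"), ("d", "IEEE T")]:
    print(name, struct.calcsize(">" + code), "bytes, 1.0 =", pattern(code, 1.0))
```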


LINKS

Various manuals and handbooks, e.g.:
Digital:
   PDP-11 processor handbook + Fortran language reference manual
   VAX Fortran language reference manual
   Alpha Architecture handbook, version 4, EC-QD2KC-TE, oct.1998
   PDP-11 formats
Hewlett-Packard:
   HP-UX Fortran language manual, for HP-9000 series 300 or 500
