First date of publication: 05 october 2006
Large table refurbished:
01 & 09 october 2007
DEC's gap near zero elucidated: 26 december 2007
Improvement of whole text and tables: 8 march 2009
bit patterns given in examples
exceptions listed
legenda improved
Cray removed from table: 6 may 2009 + 22 october 2009
Digital PDP-11 pedigree, viz. the PDP-11, VAX and ALPHA.
Most modern machines, which use the IEEE-754 formats.
Modern video-graphics boards using floats for color intensity.
Scholar education about these floating-point notations.
Konrad Zuse's Z1 and Z3, built in 1936 - 1941 in Germany.
There are two general classes of floating-point formats with a hidden bit in common use: one defined by Digital Equipment Corporation (= DEC) and the other defined by IEEE. The third class, defined by Konrad Zuse, is no longer in use and is therefore discussed separately.
The floating point formats defined by the Digital Equipment Corporation for their PDP-11 pedigree are pretty similar to those described by IEEE in their definition standard 754, but not exactly equal. In fact the IEEE-definition is derived from the Digital definition by the microprocessor company Intel. The definition for the fairly small color-defining floats in some modern video graphics cards matches the IEEE definition 754. The tiny floats for educational purposes match it too.
The numbers consist of a sign-bit, an exponent and a mantissa.
The exponent is written in excess notation such that its value
will never be negative. So the integer value represented by
the bit pattern of the exponent is greater than the exponent
value it stands for:
expon_int = expon_value + excess_bias
The zero exponent integer coincides with the minimum exponent
value. Exceptions and special numbers are linked to one value
of this exponent.
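The excess notation above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the 8-bit field with bias 127 is an assumed example, not a value prescribed at this point in the text.

```python
def encode_exponent(expon_value: int, excess_bias: int, n_bits: int) -> int:
    """Return expon_int, the non-negative integer stored in the exponent field."""
    expon_int = expon_value + excess_bias
    if not 0 <= expon_int < 2 ** n_bits:
        raise ValueError("exponent value does not fit in the field")
    return expon_int


def decode_exponent(expon_int: int, excess_bias: int) -> int:
    """Return the exponent value that the stored integer stands for."""
    return expon_int - excess_bias


# Assumed example: an 8-bit exponent field with excess bias 127.
print(encode_exponent(-3, 127, 8))   # 124
print(decode_exponent(130, 127))     # 3
```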
The mantissa is one bit longer than its sequence of bits written in the number. This extra bit is called 'hidden bit'. This construction is possible only when the base of the exponent is 2, which is the case in both the PDP-11 machines and the IEEE standard. The hidden bit is always assumed to be 1 when the exponent integer is greater than zero. When the exponent integer equals zero (i.e. the exponent has its minimum value) then this bit is assumed to be 1 or 0, sometimes depending on the 'visible bits' in the mantissa. The visible part of the mantissa is always assumed to be a fractional part.
The hidden bit delivers an extra bit place for free, although at the cost of processing time. It makes normalization of the mantissa obligatory always, even when the arithmetics do not require it. Normalization means that the first non-zero bit is put on the leftmost place in the mantissa field. Here it is obligatory since it must coincide with the hidden bit. If it does not coincide, then the whole contents of the mantissa has to be shifted to the left or right until it does. The value of the exponent has to be updated accordingly. This shifting and updating takes precious processor time.
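The shifting and updating described above can be sketched as follows. The representation is an assumption made for this illustration only: the mantissa is held as an integer whose bit number m is the hidden-bit position, so that the value equals mantissa * 2^(exponent - m). Bits shifted out at the right are simply dropped here; real hardware would round.

```python
def normalize(mantissa: int, exponent: int, m: int) -> tuple[int, int]:
    """Shift the mantissa until its first non-zero bit sits at bit m."""
    if mantissa == 0:
        raise ValueError("zero cannot be normalized")
    while mantissa >= (1 << (m + 1)):   # leading bit too far left
        mantissa >>= 1                  # shift right (drops the lowest bit)
        exponent += 1                   # compensate in the exponent
    while mantissa < (1 << m):          # leading bit too far right
        mantissa <<= 1                  # shift left
        exponent -= 1                   # compensate in the exponent
    return mantissa, exponent


# Assumed example with m = 3 visible bits (hidden bit at bit 3).
print(normalize(0b0011, 0, 3))    # (12, -2)
print(normalize(0b11000, 0, 3))   # (12, 1)
```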
Despite the similarities there are important differences between the IEEE notation and the Digital notation of the floating-point numbers. (Their integer numbers are identical.)
First, the hidden bit is given another position. IEEE assumes this bit before the fractional period and Digital assumes it immediately after that period. According to IEEE the visible part of the mantissa ('visman') starts immediately after the period, whilst according to Digital it starts behind the hidden bit. Thus the value range of the total mantissa is:
    IEEE:     1.0 =< (1.visman)    < 2.0
    Digital:  0.5 =< (0.1 visman)  < 1.0
Second, the excess biases in the notation of the exponent differ. That of Digital is one greater than that of IEEE. For an exponent of n bits Digital uses the excess value 2^(n-1), whilst IEEE uses the value 2^(n-1)-1. For example Digital would give a bias of 2048 to an exponent of 12 bits, whilst IEEE would give it a bias of 2047.
Both effects together mean that the value IEEE assigns to a bit pattern is four times the value Digital assigns to exactly the same bit pattern: a factor 2 comes from the position of the hidden bit and another factor 2 from the bias. (This holds only for regular numbers, which are the bulk of all numbers used, not for special numbers.)
It is perfectly possible to redefine the value of the Digital mantissa as 1.visman and thus make it equal to that of the IEEE mantissa. Its range then also goes from 1.0 to 2.0. To compensate for this redefinition the bias of the Digital exponent must be increased by 1. For an exponent of 12 bits it becomes 2049. This redefinition makes the abovementioned factor 4 clearer. It would also simplify some of the discussions later in this document. Nevertheless it is not done here, in order not to stray too far from the company's manuals.
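The factor 4 and the redefinition of the Digital mantissa can be checked with exact fractions. This is a sketch for regular numbers only; the function names are hypothetical, and the choice n = 8, m = 23 is just an example.

```python
from fractions import Fraction


def ieee_value(exb: int, visman: int, n: int, m: int) -> Fraction:
    """Regular IEEE number: 2^(exb - (2^(n-1)-1)) * (1.visman)."""
    mantissa = 1 + Fraction(visman, 2 ** m)
    return mantissa * Fraction(2) ** (exb - (2 ** (n - 1) - 1))


def dec_value(exb: int, visman: int, n: int, m: int) -> Fraction:
    """Regular Digital number: 2^(exb - 2^(n-1)) * (0.1 visman)."""
    mantissa = Fraction(1, 2) + Fraction(visman, 2 ** (m + 1))
    return mantissa * Fraction(2) ** (exb - 2 ** (n - 1))


def dec_value_redefined(exb: int, visman: int, n: int, m: int) -> Fraction:
    """Digital number with mantissa 1.visman and the bias increased by 1."""
    mantissa = 1 + Fraction(visman, 2 ** m)
    return mantissa * Fraction(2) ** (exb - (2 ** (n - 1) + 1))


n, m = 8, 23   # assumed example sizes
for exb, visman in [(1, 0), (128, 0x400000), (254, 2 ** m - 1)]:
    ratio = ieee_value(exb, visman, n, m) / dec_value(exb, visman, n, m)
    same = dec_value(exb, visman, n, m) == dec_value_redefined(exb, visman, n, m)
    print(exb, ratio, same)   # the ratio is 4 and same is True each time
```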
The third difference between the definitions by Digital and IEEE concerns the exponent value to which the special numbers and errors are linked. For this Digital uses the value 0 and IEEE uses the maximum exponent 1111... The bit pattern of the mantissa determines the type of the special number and its numerical or nonnumerical value.
Fourth, IEEE uses unnormalized mantissas for extremely small values, whilst Digital does not use such mantissas at all.
Although the normalized values are what you mostly see when your program is working with real data, proper handling of the rest of the values (denorms, error-values, infinities) is vitally important; otherwise you'll get all sorts of horrible results that are difficult to understand and usually impossible to fix.
Next the general definitions of both formats will be given. In these definitions five values in relation to the exponent size are important: the excess bias, and the minimum and maximum of both the integer bit-pattern value and the actual exponent value. In the table they are shown for an exponent of n bits, together with examples of 8 bits, 11 bits and 15 bits.
Because of the excess bias one must discern between the integer value of the bit-pattern of an exponent and the actual exponent value this pattern stands for. The maximum actual value shows clearly its difference with the maximum bit-pattern value.
DEFINER                             VALUE              EXAMPLE 8 AND 11 AND 15 BITS

Excess bias of exponent for the bulk of numbers
  Digital:                          2^(n-1)             128     1024    16384
  IEEE for normalized mantissas:    2^(n-1) - 1         127     1023    16383
  IEEE for unnormalized mantissas:  2^(n-1) - 2         126     1022    16382

Minimum bit-pattern value of exponent for bulk of numbers
  Digital:                          1 (0 is not bulk)     1        1        1
  IEEE for normalized mantissas:    1                     1        1        1
  IEEE for unnormalized mantissas:  0                     0        0        0

Minimum actual value of exponent for the bulk of numbers
  Digital:                          1-2^(n-1)          -127    -1023   -16383
  IEEE:                             2-2^(n-1)          -126    -1022   -16382

Maximum bit-pattern value of exponent for bulk of numbers
  Digital:                          2^n - 1             255     2047    32767
  IEEE:                             2^n - 2             254     2046    32766

Maximum actual value of exponent for the bulk of numbers
  Digital:                          2^(n-1)-1           127     1023    16383
  IEEE:                             2^(n-1)-1           127     1023    16383
In the official definitions the expression (-1)^signbit is used. Here this is abbreviated to S, for readability. The integer value represented by the bit pattern of the exponent is called 'exb'.
The definition of a Digital floating-point word is:
CONDITION:                  FLOAT VALUE:
exb=0 and S_visman = +0     0.0                                  [clean zero]
exb=0 and S_visman > +0     0.0                                  [dirty zero]
exb=0 and S_visman =< -0    Undefined value
0 < exb =< 2^n-1            S * 2^(exb-2^(n-1)) * (0.1_visman)   [with hidden bit]
The definition of an IEEE floating-point word is:
CONDITION:                  FLOAT VALUE:
exb=2^n-1 and visman<>0     Not a Number                         [NaN]
exb=2^n-1 and visman=0      S * Infinity                         [signed infinity]
exb=0 and visman=0          S * 0.0                              [signed zero]
exb=0 and visman<>0         S * 2^(2-2^(n-1)) * (0.visman)       [unnormalized]
0 < exb =< 2^n-2            S * 2^(exb+1-2^(n-1)) * (1.visman)   [with hidden bit]
In the case of an 8-bits exponent one gets:
    255 replaces 2^n-1
    128 replaces 2^(n-1)
    127 replaces 2^(n-1)-1
    126 replaces 2^(n-1)-2
The definition of a Digital floating-point word becomes:
CONDITION:                  FLOAT VALUE:
exb=0 and S_visman = +0     0.0                                  [clean zero]
exb=0 and S_visman > +0     0.0                                  [dirty zero]
exb=0 and S_visman =< -0    Undefined value
0 < exb =< 255              S * 2^(exb-128) * (0.1_visman)       [with hidden bit]
The definition of an IEEE floating-point word becomes:
CONDITION:                  FLOAT VALUE:
exb=255 and visman<>0       Not a Number                         [NaN]
exb=255 and visman=0        S * Infinity                         [signed infinity]
exb=0 and visman=0          S * 0.0                              [signed zero]
exb=0 and visman<>0         S * 2^(-126) * (0.visman)            [unnorm.]
0 < exb =< 254              S * 2^(exb-127) * (1.visman)         [with hidden bit]
Exb = integer value of exponent bit-pattern.
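The two definitions for an 8-bit exponent can be turned into a small decoder. This is a sketch under stated assumptions: a 32-bit word with the sign in the top bit, then 8 exponent bits, then 23 visman bits (the logical bit order, not the byte order in which any particular machine stores the word). The function names are hypothetical, and raising ValueError for the Digital 'Undefined value' is merely an illustrative choice.

```python
import math
import struct


def decode_ieee_s(word: int) -> float:
    """Decode a 32-bit pattern by the IEEE definition (8-bit exponent)."""
    sign = -1.0 if word >> 31 else 1.0
    exb = (word >> 23) & 0xFF
    visman = word & 0x7FFFFF
    if exb == 255:
        return math.nan if visman else sign * math.inf
    if exb == 0:                                   # 2^(-126) * (0.visman)
        return sign * math.ldexp(visman, -126 - 23)
    # 2^(exb-127) * (1.visman)
    return sign * math.ldexp((1 << 23) | visman, exb - 127 - 23)


def decode_dec_f(word: int) -> float:
    """Decode a 32-bit pattern by the Digital definition (8-bit exponent)."""
    sign_bit = word >> 31
    exb = (word >> 23) & 0xFF
    visman = word & 0x7FFFFF
    if exb == 0:
        if sign_bit:
            raise ValueError("Undefined value")   # illustrative choice
        return 0.0                                 # clean or dirty zero
    sign = -1.0 if sign_bit else 1.0
    # 2^(exb-128) * (0.1_visman)
    return sign * math.ldexp((1 << 23) | visman, exb - 128 - 24)


word = 0x40800000
print(decode_ieee_s(word), decode_dec_f(word))     # 4.0 1.0
# Cross-check of the IEEE decoder against the platform's own conversion.
print(decode_ieee_s(0x40490FDB)
      == struct.unpack(">f", (0x40490FDB).to_bytes(4, "big"))[0])   # True
```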
The definition PS2.0 for the small floats for pixel shading in some video-graphics cards obeys the IEEE rules. The tiny educational floats for help in understanding the floating-point bit-patterns obey them too.
In storing and displaying numeric values two important types of accuracy are in use: the absolute and the relative. The absolute accuracy of a number is the absolute value of the change in the value of that number when the least significant bit (= last bit) of its mantissa is toggled (= inverted).
The relative accuracy is the ratio between the absolute accuracy and the actual numeric value. It is the absolute accuracy divided by the actual value. Therefore it equals the absolute accuracy when the numeric value is 1.
This type of accuracy makes much sense when the number is normalized, since the division blots out the influence of the exponent. It is at its best when the value of the mantissa is at its maximum (= bit pattern 11111...) and at its worst when that value is at its minimum (= bit pattern 10000...). In a binary computer the difference is a factor of nearly 2.
To blot out this difference the concept of the guaranteed relative accuracy is introduced. This is the relative accuracy at its worst, i.e. when the value of the normalized mantissa is at its minimum. Thus it depends solely on the mantissa length, actually in a binary computer on the number of mantissa bits minus one.
So in every binary representation one mantissa bit should not be taken into account. Thus in the case of DEC and IEEE one simply has to 'forget' the hidden bit. Then only the number of bits in the visible part of the mantissa have to be counted. This number is indicated by m. Thus the guaranteed relative accuracy becomes 2^(-m).
This relative accuracy determines the number of decimal digits that can be represented reliably by the sequence of bits. The longer the mantissa is, the more digits this sequence can store well. This 'decimal accuracy' is expressed as m * log10(2), i.e. m times the base-10 logarithm of 2. For example, the number of digits that fit well in a mantissa with 112 visible bits equals 33.7 (note the 'broken digit'!).
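The two accuracy figures can be computed directly from m. A minimal sketch with hypothetical function names; the logarithm is taken to base 10 and the result is rounded downwards to one decimal, which reproduces the 33.7 quoted for 112 visible bits.

```python
import math


def guaranteed_relative_accuracy(m: int) -> float:
    """Relative accuracy at the minimum normalized mantissa: 2^(-m)."""
    return 2.0 ** -m


def decimal_accuracy(m: int) -> float:
    """Number of decimal digits that fit in m visible mantissa bits."""
    return m * math.log10(2)


def round_down_1(x: float) -> float:
    """Round downwards to one decimal place."""
    return math.floor(x * 10) / 10


for m in (23, 52, 112):
    print(m, guaranteed_relative_accuracy(m), round_down_1(decimal_accuracy(m)))
```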
The absolute accuracy is at least the guaranteed accuracy multiplied with the 'exponented exponent', i.e. with 2^exponent_value.
For an unnormalized (and also for a normalized) number the absolute accuracy equals the relative accuracy multiplied with the 'exponented exponent', i.e. with 2^exponent_value. The minimum non-zero value of an unnormalized number equals its absolute accuracy.
The definition of the IEEE format shows that the exponent bit-patterns 00...000 and 00...001 deliver the same exponent value. They merely indicate a different usage of the mantissa. The pattern 00...000 says that the mantissa is unnormalized and without hidden bit, whilst the pattern 00...001 says the mantissa is normalized with a hidden bit.
Consequently when the exponent bit-pattern will never exceed 00...001 the ordinary integer arithmetic can be applied on the mantissa together with the last bit of the exponent.
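This integer-like behaviour can be observed on any machine with IEEE single floats. The sketch below assumes the 32-bit IEEE single format and uses Python's struct module only to reinterpret bit patterns; the helper name is hypothetical. As long as the exponent bit-pattern stays at 00000000 or 00000001, the float value is simply the integer value of the word times the smallest unnormalized value.

```python
import struct


def single_from_bits(word: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE single float."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


STEP = 2.0 ** -149            # smallest unnormalized IEEE single value

a, b = 0x000123, 0x7FFF00     # both have exponent bit-pattern 00000000
total = a + b                 # ordinary integer addition; exponent becomes 00000001
print(single_from_bits(total) == single_from_bits(a) + single_from_bits(b))  # True
print(single_from_bits(0xFFFFFF) == 0xFFFFFF * STEP)                         # True
```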
Remarkably the IEEE E-format does not hide the first mantissa bit! This bit stays visible in the mantissa. This bit is zero when all bits in the exponent are zero. It is one when at least one exponent bit is one. Consequently the integer-like arithmetic on the extremely small values works less simply in the E-format.
According to the IEEE-754 definition the number represents a special value when the exponent has the bit pattern 1111... . The type of the special value is determined by the bit pattern of the visible part of the mantissa. The types are:
    Infinity        <= 00..00             (positive/negative depends on +/- sign)
    NaN Signaling   <= 00..01 to 01..11
    Indeterminate   <= 10..00             (or quiet NaN when sign is positive)
    NaN Quiet       <= 10..01 to 11..11
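The classification of the special values by the visible mantissa bits can be written out as a small function. A sketch only: the function name and the returned strings are hypothetical, and the rules are exactly those listed above for an exponent bit-pattern of all ones.

```python
def classify_special(sign_bit: int, visman: int, m: int) -> str:
    """Classify an IEEE word whose exponent bit-pattern is 1111...

    sign_bit: 0 for '+', 1 for '-'; visman: integer value of the
    m visible mantissa bits.
    """
    top = 1 << (m - 1)                 # pattern 10..00
    if visman == 0:                    # 00..00
        return "-infinity" if sign_bit else "+infinity"
    if visman < top:                   # 00..01 to 01..11
        return "signaling NaN"
    if visman == top:                  # 10..00
        return "indeterminate" if sign_bit else "quiet NaN"
    return "quiet NaN"                 # 10..01 to 11..11


print(classify_special(1, 0, 23))          # -infinity
print(classify_special(0, 1, 23))          # signaling NaN
print(classify_special(1, 1 << 22, 23))    # indeterminate
```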
In the DEC definition the zero exponent bit-pattern is shared by the ordinary value 0.0 and the nonnumeric values Undefined and DirtyZero. Therefore the unnormalized numbers are absent.
The absence of the unnormalized numbers in the definition by DEC causes a gap between the value 0.0 and the smallest non-zero value. This gap is very much larger than the space between the smallest non-zero value and the value immediately next to it.
The following drawing shows the effect of this gap on the series of possible values along the continuous numerical axis. Herein the visual part of the mantissa (= visman) is assumed to have three bits.
DEC:
   expon = ..000   expon = ..001       expon = ..010        e
 |<------------->|<------------->|<--------------------->|<-----
 +---------------+-+-+-+-+-+-+-+-+--+--+--+--+--+--+--+--+----+
 0               A
 |_______________|
        gap

IEEE:
   expon = ..000   expon = ..001       expon = ..010        e
 |<------------->|<------------->|<--------------------->|<-----
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--+--+--+--+--+--+--+--+----+
 0 ^             A
 |_______________________________|
 integer arithmetics possible (except in E-format)
Each plus sign stands for a value that can be represented by the computer. The minus signs represent all values that cannot be represented. The letter A points to the minimum normalized value. The caret (^) points to the minimum unnormalized nonzero value. 0 is the zero value.
The drawing shows that DEC has no value available between 0 and A, whilst IEEE has many.
In the calculations of the minimum value of a float the number of bits in the mantissa can be important, i.e. the number of bits in its visible part. Here again this number is indicated by the letter m. In the calculations that approximate the maximum value this number turns out not to be important. As above, the number of bits in the exponent is indicated by n.
The mantissa of an unnormalized float is at minimum non-zero value when all bits except the most right one in the visible part are zero. This value is:
    DEC:  unnormalized mantissas are not applied
    IEEE: 2^(-m)
The exponent value of an unnormalized float is:
    IEEE: -(2^(n-1)-2) = 2-2^(n-1)
Since the exponent is at its smallest value the minimum value of the whole float is:
    IEEE: 2^(-m) * 2^(2-2^(n-1)) = 4 / 2^(m+2^(n-1))
The mantissa of a normalized float is at minimum value when all bits in its visible part are zero. This value is exactly:
    DEC:  0.5
    IEEE: 1.0
The minimum value of the exponent of a normalized float is:
    DEC:  1-2^(n-1)
    IEEE: 1-(2^(n-1)-1) = 2-2^(n-1)
A normalized float is at its minimum value when both the mantissa and the exponent have their lowest values. It is:
    DEC:  0.5 * 2^(1-2^(n-1)) = 2^(-2^(n-1))
    IEEE: 1.0 * 2^(1-(2^(n-1)-1)) = 4 * 2^(-2^(n-1))
The mantissa of a float is at maximum value when all bits in its visible part equal one. This value is:
    DEC:  (2^(m+1)-1)/(2^(m+1)) = 1 - 1/(2^(m+1))
    IEEE: (2^(m+1)-1)/(2^m) = 2 - 1/(2^m)
The more bits the mantissa has the better it approximates the value 1 (by DEC) or 2 (by IEEE).
The maximum value of the exponent of a normalized float is:
    DEC:  2^n-1-2^(n-1) = 2^(n-1)-1
    IEEE: 2^n-2-(2^(n-1)-1) = 2^(n-1)-1
A normalized float is at its maximum value when both the mantissa and the exponent have their highest values. It is:
    DEC:  (1-1/(2^(m+1))) * 2^(2^(n-1)-1)
          = (1-1/(2^(m+1))) * 2^(2^(n-1)) / 2
          = approximately 2^(2^(n-1)-1)
    IEEE: (2-1/(2^m)) * 2^(2^(n-1)-1)
          = (1-1/(2^(m+1))) * 2^(2^(n-1))
          = approximately 2^(2^(n-1))
The non-numerical values occur at the exponent value:
    DEC:  -2^(n-1)                       = exponent bit-pattern 00...000
    IEEE: 2^n-1-(2^(n-1)-1) = +2^(n-1)   = exponent bit-pattern 11...111
The range of normalized values is not symmetrical around the
number 1. If it were symmetrical, then:
normalized_minimum = 1 / approximate_maximum
or written elsewise:
normalized_minimum * approximate_maximum = 1
Actually the latter multiplication gives:
    DEC:  2^(-2^(n-1)) * 2^(2^(n-1)-1)     = 0.5
    IEEE: 4 * 2^(-2^(n-1)) * 2^(2^(n-1))   = 4
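The minimum and maximum formulas, and the products 0.5 and 4, can be verified exactly with rational arithmetic. A sketch with hypothetical function names; n is the number of exponent bits and m the number of visible mantissa bits, as in the text.

```python
from fractions import Fraction


def min_unnormalized_ieee(n: int, m: int) -> Fraction:
    """IEEE: 2^(-m) * 2^(2-2^(n-1)) = 4 / 2^(m+2^(n-1))."""
    return Fraction(4, 2 ** (m + 2 ** (n - 1)))


def min_normalized(n: int) -> tuple[Fraction, Fraction]:
    """Return (DEC, IEEE): 2^(-2^(n-1)) and 4 * 2^(-2^(n-1))."""
    dec = Fraction(1, 2 ** (2 ** (n - 1)))
    return dec, 4 * dec


def approximate_maximum(n: int) -> tuple[Fraction, Fraction]:
    """Return (DEC, IEEE): 2^(2^(n-1)-1) and 2^(2^(n-1))."""
    return Fraction(2) ** (2 ** (n - 1) - 1), Fraction(2) ** (2 ** (n - 1))


for n in (8, 11, 15):
    dec_min, ieee_min = min_normalized(n)
    dec_max, ieee_max = approximate_maximum(n)
    print(n, dec_min * dec_max, ieee_min * ieee_max)   # products: 1/2 and 4
```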
Some peculiarities of a few computers are described here in shorthand. More elaborate descriptions about these machines can be found by looking in the main index file of this internet site.
Due to the allied air raids on the German cities in 1944 and 1945 the information about the old Zuse computers is sometimes contradictory. Therefore the word structure I describe here is the reconstruction I assume as most probable.
END OF ACKNOWLEDGEMENT
Already before World War 2 the German aeroplane engineer Konrad Zuse created a computer in which the numbers are stored in binary, in hidden-bit notation. The definition of this notation differs from those by IEEE and DEC. Herein the mantissa is like in IEEE: 1.visman, so its value ranges from 1.0 to 2.0.
The exponent is written in sign+magnitude notation. Infinity is its only special value, occupying the bit pattern for its maximum 01111..., irrespective of the value of the mantissa.
Although the Undefined value can result from an arithmetic operation it is not written in the resulting number. It is noted only in a kind of 'processor status register'. In Zuse's first machines this register is displayed on the user's console.
The exponent bit-pattern 10000... sets the whole numeric value to 0.0, irrespective of the value of the mantissa. The latter implies that the unnormalized values are not used. Consequently around this 0.0 the series of numeric values shows the same near-zero gap as DEC has.
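The rules described above for the Zuse notation can be sketched as a decoder. This follows the reconstruction given in this document, not a verified machine specification. The function name is hypothetical, the fields are passed separately to avoid assuming a bit layout, and the default sizes (6 magnitude bits, 14 visman bits) are those of the 22-bit word discussed in this document.

```python
import math


def decode_zuse(sign_bit: int, expo_sign_bit: int, expo_magnitude: int,
                visman: int, mag_bits: int = 6, m: int = 14) -> float:
    """Decode a float in Zuse's sign+magnitude exponent notation."""
    sign = -1.0 if sign_bit else 1.0
    if expo_sign_bit == 0 and expo_magnitude == 2 ** mag_bits - 1:
        return sign * math.inf          # exponent pattern 0111...: infinity
    if expo_sign_bit == 1 and expo_magnitude == 0:
        return 0.0                      # exponent pattern 1000...: zero
    exponent = -expo_magnitude if expo_sign_bit else expo_magnitude
    # mantissa 1.visman, held as an integer with the hidden bit at bit m
    return sign * math.ldexp((1 << m) | visman, exponent - m)


print(decode_zuse(0, 0, 0, 0))        # 1.0
print(decode_zuse(0, 1, 63, 0))       # smallest normalized value, 2^-63
print(decode_zuse(1, 0, 63, 5))       # -inf
```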
Since the definition by Zuse is not in use anymore, only the numerical data of the memory words in the logically equal Z1 and Z3 computers are given here. A more extensive description of these machines and the data of some later Zuse computers are given elsewhere in this internet site.
None of the Digital machines with hidden-bit notation ever uses 36-bits words. These words are applied in another, incompatible system-line of Digital: the PDP-10, Decsystem-10 and Decsystem-20, which show all bits in the mantissa. The word formats of these machines are described elsewhere in this internet site.
The Digital machines with the hidden bit use word sizes that are powers of two. They all belong to the PDP-11 pedigree. Their memory structure is shown elsewhere in the internet site.
In the 1970s and 1980s Seymour Cray produced the first model range of the supercomputers designed by himself, the Cray-1 c.s. The word formats in these machines are described extensively elsewhere in this internet site.
In the 1990s the Cray company abandoned its own word formats in favor of the Digital Alpha processor. At present it uses the formats of Intel's mathematical co-processor 8087, viz. the four 'official' IEEE STEX-formats (see listing).
The format given by Paul Gray in a small section of his course on mathematics cannot be confirmed, so is not listed.
The IBM-390 computer applies these IEEE-STEX formats too, but it also applies the three old formats of the IBM-360 which are described extensively elsewhere in this internet site. Nowadays its successor, the modern Z-series, applies besides the three IEEE and three hexadecimal formats also a new decimal format, the Packed Decimal Encoding (PDE) that is described extensively elsewhere in this internet site.
The bit patterns of some numbers in the three definitions are given for an exponent size of 7 bits and a visman size of 14 bits. In these examples the most significant bit is at the left side and the least significant bit is at the right side of the computer word. In the examples the +/- sign of the mantissa is always kept positive for the ordinary numeric non-zero values. So their bit Sm is 0 in these examples.
The first three blocks in the table of each definition show the ordinary normalized numbers. Their mantissas are the same for all three definitions. Only their exponents differ. IEEE and DEC give these in excess-bias notation. These differ by 2. ZUSE gives the exponent in sign+magnitude notation.
In all three tables the third line of the fourth block shows the minimum normalized bit-pattern and the decimal value it stands for (rounded to four digits). The fifth block in the IEEE table shows some unnormalized bit patterns. This block is absent in the other two tables since the definitions of DEC and ZUSE do not apply unnormalized numbers. Consequently they show the abovementioned zero-gap.
In the IEEE definition integer arithmetics can be applied on the 0 and all numbers in the fourth and fifth block and the numbers in between, i.e. on all numbers from 0 up to 4337E-22. In the other two definitions the zero-gap impedes this.
In all tables the last line in the last block gives the maximum ordinary value. Its bit pattern is always normalized. All mantissa bits are 1. Here the numeric value of each maximum is:
ZUSE: max.value = 9223e+18
IEEE: max.value = 1845e+19
DEC: max.value = 9223e+18
Legenda for all three tables:
x = don't care = bit value is irrelevant
v = at least one bit in this series must be 1
ZUSE:
              ,-- exponent sign
              |
decimal    S  V exponent             m a n t i s s a
value      m |-|-----------|  |---------------------------|
1100e-03   o  o o o o o o o    o o o 1 1 o o 1 1 o o 1 1 o
2200e-03   o  o o o o o o 1    o o o 1 1 o o 1 1 o o 1 1 o
3300e-03   o  o o o o o o 1    1 o 1 o o 1 1 o o 1 1 o o 1
4400e-03   o  o o o o o 1 o    o o o 1 1 o o 1 1 o o 1 1 o
5500e-03   o  o o o o o 1 o    o 1 1 o o o o o o o o o o o
6600e-03   o  o o o o o 1 o    1 o 1 o o 1 1 o o 1 1 o o 1
6060e-02   o  o o o o 1 o 1    1 1 1 o o 1 o o 1 1 o o 1 1
6006e-01   o  o o o 1 o o 1    o o 1 o 1 1 o o o 1 o o 1 1
1000e-06   o  1 o o 1 o 1 o    o o o o o 1 1 o o o 1 o o 1
1110e-04   o  1 o o o 1 o o    1 1 o o o 1 1 o 1 o 1 o o 1
2220e-04   o  1 o o o o 1 1    1 1 o o o 1 1 o 1 o 1 o o 1
3000e-03   o  o o o o o o 1    1 o o o o o o o o o o o o o
8000e-03   o  o o o o o 1 1    o o o o o o o o o o o o o o
4000e-03   o  o o o o o 1 o    o o o o o o o o o o o o o o
2000e-03   o  o o o o o o 1    o o o o o o o o o o o o o o
1000e-03   o  o o o o o o o    o o o o o o o o o o o o o o
5000e-04   o  1 o o o o o 1    o o o o o o o o o o o o o o
2500e-04   o  1 o o o o 1 o    o o o o o o o o o o o o o o
1250e-04   o  1 o o o o 1 1    o o o o o o o o o o o o o o
2168e-19   o  1 1 1 1 1 1 o    o o o o o o o o o o o o o o
2168e-19   o  1 1 1 1 1 1 1    1 1 1 1 1 1 1 1 1 1 1 1 1 1
1084e-19   o  1 1 1 1 1 1 1    o o o o o o o o o o o o o o
5421e-20   [unnormalized is not in this definition]
0 (+0,-0)  o  1 o o o o o o    x x x x x x x x x x x x x x
+Infinity  o  o 1 1 1 1 1 1    x x x x x x x x x x x x x x
-Infinity  1  o 1 1 1 1 1 1    x x x x x x x x x x x x x x
Undefined  [not applied in this definition]
max.value  o  o 1 1 1 1 1 o    1 1 1 1 1 1 1 1 1 1 1 1 1 1
IEEE:
               biased
decimal    S   exponent              m a n t i s s a
value      m |-------------|  |---------------------------|
1100e-03   o  o 1 1 1 1 1 1    o o o 1 1 o o 1 1 o o 1 1 o
2200e-03   o  1 o o o o o o    o o o 1 1 o o 1 1 o o 1 1 o
3300e-03   o  1 o o o o o o    1 o 1 o o 1 1 o o 1 1 o o 1
4400e-03   o  1 o o o o o 1    o o o 1 1 o o 1 1 o o 1 1 o
5500e-03   o  1 o o o o o 1    o 1 1 o o o o o o o o o o o
6600e-03   o  1 o o o o o 1    1 o 1 o o 1 1 o o 1 1 o o 1
6060e-02   o  1 o o o 1 o o    1 1 1 o o 1 o o 1 1 o o 1 1
6006e-01   o  1 o o 1 o o o    o o 1 o 1 1 o o o 1 o o 1 1
1000e-06   o  o 1 1 o 1 o 1    o o o o o 1 1 o o o 1 o o 1
1110e-04   o  o 1 1 1 o 1 1    1 1 o o o 1 1 o 1 o 1 o o 1
2220e-04   o  o 1 1 1 1 o o    1 1 o o o 1 1 o 1 o 1 o o 1
3000e-03   o  1 o o o o o o    1 o o o o o o o o o o o o o
8000e-03   o  1 o o o o 1 o    o o o o o o o o o o o o o o
4000e-03   o  1 o o o o o 1    o o o o o o o o o o o o o o
2000e-03   o  1 o o o o o o    o o o o o o o o o o o o o o
1000e-03   o  o 1 1 1 1 1 1    o o o o o o o o o o o o o o
5000e-04   o  o 1 1 1 1 1 o    o o o o o o o o o o o o o o
2500e-04   o  o 1 1 1 1 o 1    o o o o o o o o o o o o o o
1250e-04   o  o 1 1 1 1 o o    o o o o o o o o o o o o o o
4337e-19   o  o o o o o 1 o    o o o o o o o o o o o o o 1
4337e-19   o  o o o o o 1 o    o o o o o o o o o o o o o o
4337e-19   o  o o o o o o 1    1 1 1 1 1 1 1 1 1 1 1 1 1 1
2168e-19   o  o o o o o o 1    o o o o o o o o o o o o o o
2168e-19   o  o o o o o o o    1 1 1 1 1 1 1 1 1 1 1 1 1 1
1084e-19   o  o o o o o o o    1 o o o o o o o o o o o o o
5421e-20   o  o o o o o o o    o 1 o o o o o o o o o o o o
2711e-20   o  o o o o o o o    o o 1 o o o o o o o o o o o
6617e-23   o  o o o o o o o    o o o o o o o o o o o 1 o 1
5294e-23   o  o o o o o o o    o o o o o o o o o o o 1 o o
3970e-23   o  o o o o o o o    o o o o o o o o o o o o 1 1
2647e-23   o  o o o o o o o    o o o o o o o o o o o o 1 o
1323e-23   o  o o o o o o o    o o o o o o o o o o o o o 1
0 (+0,-0)  o  o o o o o o o    o o o o o o o o o o o o o o
+Infinity  o  1 1 1 1 1 1 1    o o o o o o o o o o o o o o
-Infinity  1  1 1 1 1 1 1 1    o o o o o o o o o o o o o o
NaN signal x  1 1 1 1 1 1 1    o v v v v v v v v v v v v v
NaN quiet  x  1 1 1 1 1 1 1    1 v v v v v v v v v v v v v
(unusable) x  1 1 1 1 1 1 1    1 o o o o o o o o o o o o o
max.value  o  1 1 1 1 1 1 o    1 1 1 1 1 1 1 1 1 1 1 1 1 1
DEC:
               biased
decimal    S   exponent              m a n t i s s a
value      m |-------------|  |---------------------------|
1100e-03   o  1 o o o o o 1    o o o 1 1 o o 1 1 o o 1 1 o
2200e-03   o  1 o o o o 1 o    o o o 1 1 o o 1 1 o o 1 1 o
3300e-03   o  1 o o o o 1 o    1 o 1 o o 1 1 o o 1 1 o o 1
4400e-03   o  1 o o o o 1 1    o o o 1 1 o o 1 1 o o 1 1 o
5500e-03   o  1 o o o o 1 1    o 1 1 o o o o o o o o o o o
6600e-03   o  1 o o o o 1 1    1 o 1 o o 1 1 o o 1 1 o o 1
6060e-02   o  1 o o o 1 1 o    1 1 1 o o 1 o o 1 1 o o 1 1
6006e-01   o  1 o o 1 o 1 o    o o 1 o 1 1 o o o 1 o o 1 1
1000e-06   o  o 1 1 o 1 1 1    o o o o o 1 1 o o o 1 o o 1
1110e-04   o  o 1 1 1 1 o 1    1 1 o o o 1 1 o 1 o 1 o o 1
2220e-04   o  o 1 1 1 1 1 o    1 1 o o o 1 1 o 1 o 1 o o 1
3000e-03   o  1 o o o o 1 o    1 o o o o o o o o o o o o o
8000e-03   o  1 o o o 1 o o    o o o o o o o o o o o o o o
4000e-03   o  1 o o o o 1 1    o o o o o o o o o o o o o o
2000e-03   o  1 o o o o 1 o    o o o o o o o o o o o o o o
1000e-03   o  1 o o o o o 1    o o o o o o o o o o o o o o
5000e-04   o  1 o o o o o o    o o o o o o o o o o o o o o
2500e-04   o  o 1 1 1 1 1 1    o o o o o o o o o o o o o o
1250e-04   o  o 1 1 1 1 1 o    o o o o o o o o o o o o o o
2168e-19   o  o o o o o 1 1    o o o o o o o o o o o o o o
1084e-19   o  o o o o o 1 o    o o o o o o o o o o o o o o
5421e-20   o  o o o o o o 1    o o o o o o o o o o o o o o
2711e-20   [unnormalized is not in this definition]
0 (clean)  o  o o o o o o o    o o o o o o o o o o o o o o
0 (dirty)  o  o o o o o o o    v v v v v v v v v v v v v v
Infinity   [not applied in this definition]
Undefined  1  o o o o o o o    x x x x x x x x x x x x x x
max.value  o  1 1 1 1 1 1 1    1 1 1 1 1 1 1 1 1 1 1 1 1 1
The actually applied formats are in listing:
de-        name      abbrev.  #bytes  |---- #bits -----|  |- used in -|
finer                                 total expon visman  |- machine -|
                                             (n)    (m)
Education  --        s2e3     0.75       6     3      2   ED
Education  --        s5e3     1.125      9     3      5   ED
Education  --        s3e4     1          8     4      3   ED
PS2.0      --        s10e5    2         16     5     10   NV
PS2.0      --        s16e7    3         24     7     16   AT
IEEE       single    S        4         32     8     23   J I A 9 DX
IEEE       twin      T        8         64    11     52   J I A 9
IEEE       enhanced  E        10        80    15   1+63   I
IEEE       extended  X        16       128    15    112   9
DEC        float     F        4         32     8     23   P V A
DEC        double    D        8         64     8     55   P V
DEC        grand(?)  G        8         64    11     52   V A
DEC        hyper(?)  H        16       128    15    112   V
Zuse       Zuse-1    Z1       2.75      22     7     14   Z3
de-    abbr.   for spec.  maxexb  |-- excess bias --|   maximum
finer          values     ordin.  unnormal  normalized  value
Educa  s2e3        7          6       2          3          3
Educa  s5e3        7          6       2          3          3
Educa  s3e4       15         14       6          7          7
PS2.0  s10e5      31         30      14         15         15
PS2.0  s16e7     127        126      62         63         63
IEEE   S         255        254     126        127        127
IEEE   T        2047       2046    1022       1023       1023
IEEE   E       32767      32766   16382      16383      16383
IEEE   X       32767      32766   16382      16383      16383
DEC    F           0        255      --        128        127
DEC    D           0        255      --        128        127
DEC    G           0       2047      --       1024       1023
DEC    H           0      32767      --      16384      16383
Zuse   Z1         63         62      --   sign+magn.       63
de-    abbr.   |------ binary range of abs.value ------|   relat.
finer          min(unnormal)   min(normalized) max(nearly) accur
Educa  s2e3    2^-(2+2)        2^-2            2^+4        2^-2
Educa  s5e3    2^-(5+2)        2^-2            2^+4        2^-5
Educa  s3e4    2^-(3+6)        2^-6            2^+8        2^-3
PS2.0  s10e5   2^-(10+14)      2^-14           2^+16       2^-10
PS2.0  s16e7   2^-(16+62)      2^-62           2^+64       2^-16
IEEE   S       2^-(23+126)     2^-126          2^+128      2^-23
IEEE   T       2^-(52+1022)    2^-1022         2^+1024     2^-52
IEEE   E       2^-(63+16382)   2^-16382        2^+16384    2^-63
IEEE   X       2^-(112+16382)  2^-16382        2^+16384    2^-112
DEC    F       --              2^-128          2^+127      2^-23
DEC    D       --              2^-128          2^+127      2^-55
DEC    G       --              2^-1024         2^+1023     2^-52
DEC    H       --              2^-16384        2^+16383    2^-112
Zuse   Z1      --              2^-63           2^+63       2^-14
de-    abbr.   |---- decimal range of abs.value -------|   decimal
finer          min(unnormal)   min(normalized) max(nearly) accur.
Educa  s2e3    0.0625          0.25            16           0.6
Educa  s5e3    0.0078125       0.25            16           1.5
Educa  s3e4    0.001953125     0.015625        256          0.9
PS2.0  s10e5   5.97E-8         6.11E-5         65536        3.0
PS2.0  s16e7   3.31E-24        2.17E-19        1.84E+19     4.8
IEEE   S       1.41E-45        1.18E-38        3.40E+38     6.9
IEEE   T       4.95E-324       2.23E-308       1.79E+308   15.6
IEEE   E       3.65E-4951      3.37E-4932      1.18E+4932  18.9
IEEE   X       6.48E-4966      3.37E-4932      1.18E+4932  33.7
DEC    F       --              0.294E-38       1.70E+38     6.9
DEC    D       --              0.294E-38       1.70E+38    16.5
DEC    G       --              0.557E-308      0.898E+308  15.6
DEC    H       --              0.841E-4932     0.594E+4932 33.7
Zuse   Z1      --              1.085E-19       0.922E+19    4.2
The value range of each non-zero mantissa is in decimal:
    IEEE:  1.0 =< (1.visman)    < 2.0
    DEC:   0.5 =< (0.1 visman)  < 1.0
    ZUSE:  1.0 =< (1.visman)    < 2.0
IEEE: Exponent bit-pattern = 111...
    sign and visman bit-pattern  gives meaning:
    ----     ------------------        -------
     -       00..00                    negative infinity
     +       00..00                    positive infinity
    -/+      00..01 to 01..11          signaling NaN
     -       10..00                    indeterminate
     +       10..00                    quiet NaN (don't use it)
    -/+      10..01 to 11..11          quiet NaN

DEC: Exponent bit-pattern = 000...
    sign and visman bit-pattern  gives meaning:
    ----     ------------------        -------
     +       00..00                    clean zero (value: 0.0)
     +       00..01 to 11..11          dirty zero (value: 0.0)
     -       00..00 to 11..11          Undefined value

ZUSE: Exponent is in sign+magnitude notation.
    Exponent bit-pattern = 0000.. (= positive zero value) is the
    ordinary zero and makes the number to have a value from 1.0 until 2.0
    Exponent bit-pattern = 1000.. (= negative zero value) gives
    numeric value 0.0 irrespective sign and visman bit-pattern.
    Exponent bit-pattern = 0111.. (= highest exponent value)
    sign and visman bit-pattern  gives meaning:
    ----     ------------------        -------
     -       irrelevant                negative infinity
     +       irrelevant                positive infinity
    Educa s2e3:   max = 14
    Educa s5e3:   max = 15.75
    Educa s3e4:   max = 240
    PS2.0 s10e5:  max = 65504
The s2e3 float can be drawn easily on paper with its 55 numeric values marked along the continuous value axis.
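The count of 55 values and the maxima of the small formats can be reproduced by enumerating all bit patterns. A sketch with a hypothetical function name; it applies the IEEE rules (hidden bit, unnormalized numbers at exponent pattern zero, maximum exponent pattern reserved for special values) to n exponent bits and m visman bits.

```python
from fractions import Fraction


def ieee_style_values(n: int, m: int) -> list[Fraction]:
    """All distinct numeric values of an IEEE-style float, sorted."""
    bias = 2 ** (n - 1) - 1
    positives = set()
    for exb in range(2 ** n - 1):          # exb = 2^n - 1 holds special values
        for visman in range(2 ** m):
            if exb == 0:                   # unnormalized: 0.visman
                value = Fraction(visman, 2 ** m) * Fraction(2) ** (1 - bias)
            else:                          # normalized: 1.visman
                value = (1 + Fraction(visman, 2 ** m)) * Fraction(2) ** (exb - bias)
            positives.add(value)
    # +0 and -0 are the same numeric value, so zero is counted once
    return sorted(positives | {-v for v in positives})


values = ieee_style_values(3, 2)           # the s2e3 format
print(len(values), max(values))            # 55 14
print(max(ieee_style_values(3, 5)))        # 63/4, i.e. 15.75 for s5e3
```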
The value range of each non-zero mantissa is in decimal:
    IEEE and Zuse:  1.0 =< (1.visman)    < 2.0
    DEC:            0.5 =< (0.1 visman)  < 1.0
When the DEC-mantissa is defined in the way of IEEE and Zuse, then the exponent values of DEC must be redefined. They become:
DEC Exponent values for mantissa = 1.visman
de-    abbr.   for spec.  maxexb  |-- excess bias --|   maximum
finer          values     ordin.  unnormal  normalized  value
DEC    F           0        255      --        129        126
DEC    D           0        255      --        129        126
DEC    G           0       2047      --       1025       1022
DEC    H           0      32767      --      16385      16382
All other figures in the table remain unchanged. The special values will not change either.
All signbits are in all definitions and formats:
0 = '+', 1 = '-'.
'Visman' = visible part of the mantissa.
'Maxexb ordin.' = maximum integer of the exponent bit
pattern used for the ordinary values.
The notation 'SmEn' means: Sign bit + n bits for exponent
+ m bits for mantissa
The PS2.0-s23e8 format equals the IEEE-S format.
Other name for s10e5 format is 'fp16'.
Other name for s16e7 format is 'fp24'.
Other name for eXtended (X) format is 'Quadruple (Q)'.
The E-format does not hide the first mantissa bit !
The IEEE rules for floats with hidden bit are obeyed by
all Educa and all PS2.0 formats.
This obedience also holds for the special values.
IEEE and Zuse use the maximum exponent for the special
values only.
Digital uses the zero exponent for both the ordinary zero
and its special values.
Historically the Zuse and the DEC-F and -D formats are the
oldest formats in this list.
The relative and decimal accuracies hold for the numbers in
normalized state, i.e. with value > min(normalized).
These accuracies are the so-called guaranteed accuracies,
see also elsewhere in this internet site.
The absolute accuracy of an unnormalized number equals the
unnormalized non-zero minimum.
The decimal accuracy values are rounded downwards.
de-    abbr.   type connotation in language
finer          Fortran     C,C++,Java
IEEE   S       real*4      float
IEEE   T       real*8      double
IEEE   E       real*10     long double
IEEE   X,Q     real*16     long double
The word Real is the old word for Float.
The distinction between the E and X format in C and C++
depends on the type of the computer or on a presetting in
the program.
Various manuals and handbooks, e.g:
Digital:
PDP-11 processor handbook +
Fortran language reference manual
VAX Fortran language reference manual
Alpha Architecture handbook, version 4, EC-QD2KC-TE, oct.1998
PDP-11 formats
Hewlett-Packard:
HP-UX Fortran language manual, for HP-9000 series 300 or 500