IEEE-754-2008

First date of publication: 05 october 2006

First big modification: 13 october 2006

Addition of Chen-Ho code: 06 october 2009

- introduction
- argumentation: saving data-bus width
- the number in the expanded state
- densely packed decimal
- extra digit and exception values
- all listed together
- accuracy and comparison with binary
- links

The IEEE-754-2008 Packed Decimal Encoding (= PDE) is a way of encoding decimal numbers. Its preliminary version was named IEEE-754r. This method is intended to squeeze out (nearly) all unused space in the set of available bit-patterns. Ordinarily this waste occurs when decimal digits are stored.

The PDE-method relies heavily on a method for compressing decimals called Densely Packed Decimal (= DPD). But where in the PDE the DPD cannot be used, another encoding scheme is used, as we shall see. Thus virtually no available bit pattern is left unused. Nearly all represent a numeric value.

A binary computer is binarily oriented. It can handle well the numbers written in binary format, but less well the numbers written in decimal format. Especially the storage of the decimal digits is a problem. These digits do not fit well in the memory words. This leads to a waste of the storage space in the memory and of the data bus. For example, the ASCII-coding and EBCDIC-coding as used in databases and the older versions of the Cobol-language waste more than half of the space. The much more efficient BCD-coding leads to a waste of about only one-sixth of the space.

At first sight it does not make much sense to use an even better compressing code to save space in core memory or hard disk. Squeezing out the last unused bit pattern like the last drop of water out of a sponge seems ridiculous since such a code does not save more than one sixth of storage space, but generally is difficult to comprehend. In the present days core memory and disk space are very cheap. So making the memory or disk larger to compensate this waste does not cost much. Thirty years ago these devices were extremely expensive. Then the savings of one-sixth would have made sense. Now the risk of a programming error is much more costly.

Besides the fair efficiency the Binary Coded Decimal (= BCD) notation is also the most simple way of notating a decimal number. This notation requires a four bit unit for each digit to store the ten values 0 to 9. The BCD is quite concise and simple.

However, it still makes sense to use a better compressing code, not to save memory, but to save another parameter in the computer, the 'bus width'. This parameter is very important since it determines fiercely the computer's 'personality', more than the processing speed and the storage capacity do. This width of the data bus determines the number of bits that can be transported simultaneously between the core memory and the CPU during one memory-access cycle.

Remarkably, despite the hughe progression in computer power, processing speed and storage capacity during the last fourty years the bus width has not changed much. The modern computers have approximately the same width as the old ones had.

Right from the beginning the computers got the bus width they still have today, as the examples show. The world's first useful computer, the Zuze Z3, had 22 bits. The Philips EL-X8 had 27 bits. The Univac 1100 had 36 bits. The Burroughs 6700 had 48 bits (with parity and tag bits 52). The Control-Data 6600 had 60 bits. The world's first supercomputer, the Cray-1, had 64 data bits and 8 check bits.

Although the modern domestic PC and its descendants outperform these behomoths they still have roughly the same buswidth. They have 32 or 64 bits. The modern behomoth Cray-X1E has 64 too.

REMARK:

In early days also narrower computers were made. Konrad Zuse's
experimental Z2 was 16 bits wide, the mini-computer Digital
PDP-11 was 16 bits wide, the PDP-8 was 12 bits wide, and
Intel's microprocessors 8008 and 4004 were 8 bits and 4 bits
wide. At present such narrow computers are not made anymore as
a multi-purpose 'stand-alone' machine. Now they are made at
very low cost mainly to be embedded in a wide variety of
apparatuses for household, industry, transport and recreation.
Example: the single-chip computer 8051.

:END OF REMARK.

So the modern bus widths do not differ significantly from the old bus widths. Apparently the data bus cannot be expanded easily. Therefore its width remains costly, even in the present days.

The reason for this cost is that time is money, so processing time and memory-cycle time are relatively expensive. To save these times one must pour as much useful data as possible through the confined data-bus in as few memory cycles as possible. The delay by a mechanism for compressing the original data into only one memory word and later expanding it back can be much cheaper than storing this data into two words and later retrieving it from these two words, even when the second of these two words is not fully used.

For example: Let us assume a computer with a fairly small bus width of 11 bits. When not further compressed three decimal digits occupy 12 bits and so need two memory fetches for the transport to a CPU register. The first fetch gets eleven bits and the second fetch gets the single twelfth bit. When further compressed the three digits can occupy as few as 10 bits and so require only one memory fetch. Herein one bit is superfluous. When this is occupied, e.g. by a +/- sign, the number of possible values in this single fetch even doubles.

Without compression we need two fetches for one value out of a set of 10*10*10 = 1000 different values, with it we need one fetch for one value out of 2000 values. Without the compression only a two-digit value can be transported in one memory cycle. This is a value out of a set of 200 signed values. Two bits stay unused.

This example clarifies how a small profit in number of bits leads to a giant profit in number of bit patterns and decrease of memory accesses. Therefore it makes sense to compress the digits as tight as possible and to squeeze out all unused bit patterns, even when the new coding mechanism is prone to the costly programming errors. Bit patterns should not be wasted as unusables, even not a few percents of them. All bits in the computer words should be used fully. Numbers should occupy a bus-width as small as possible. If one does not watch this carefully the waste will soar rapidly and virtually out of control.

The four-bits unit wherein a BCD-digit is stored is called a 'nibble'. It is able to store six more values, 10 to 15. These values are never used. This means that a lot of 'value space' (or: 'bit-pattern space' = set of available bit-patterns) is wasted, 37.5 percent for this digit. And matters are even worse. In a series of digits the waste of value space cumulates for every digit. The following table lists this waste depending on the number of digits.

#digits #bits value-space #used_patterns percentage_lost 0 0 1 1 0 1 4 16 10 37.5 2 8 256 100 60.9 3 12 4096 1000 75.6 4 16 65536 10000 84.7 5 20 1048576 100000 90.5 6 24 16777216 1000000 94.0

For example: A sequence of twelve bits can contain 4096 different bit patterns. When three digits are stored in this sequence only 1000 of these patterns are used. The other 3096 patterns are discarded. This gives the hughe number of 75.6 percent! Conclusion: Avoid any waste of value space!

Therefore a compression to save the value space is recommended. For example, a sequence of ten bits can contain 1024 different bit patterns. This is slightly more than 1000, the minimum number required to store three digits. So a compression of the twelve into the ten bits is possible. Also clearly the small waste of bits, only 2 out of 12 bits = 16.7 percent, leads to a great waste of many values. As already stated this waste in values is very expensive. The compression reduces this waste from 75.6 percent to only 2.4 percent. For a series of 30 digits (= 10 groups of 3 digits) this loss accumulates to 26.8 percent. Without the compression the loss would approximate the 100 percent! Wasting a few bits leads to wasting many values.

One question remains: Is such a compression mechanism that squeezes three digits into ten bits not too complex, or does it not slow down processing too much? That nice mechanism exists. It is the Packed Decimal Encoding system. It is described in the four following sections:

- The number in the expanded state,
- The core of the compression: Densely Packed Decimal,
- Storage of additional digit and exceptional values,
- Listing of all properties.

The modern word for mantissa is 'coefficient'. 'Mantissa' is the space werein the coefficient is stored.

In the expanded state the numeric value is written as a
coefficient with an exponent. The exponent is written as a
binary integer number. Its base is ten (= 10). The coefficient
is written as a sequence of digits, each stored in a 4-bits
unit. The +/- sign of the whole number is written in a separate
bit. Thus the value is:

*Value = Sign * DigitsSequence * 10 ^ ExponentInteger*

Note that there are two zero values: +0 and -0.

The digits sequence is stored in a confined space of fixed length, the "mantissa". Shifting the coefficient over the mantissa to the left over one position means multiplying the coefficient value with ten. Shifting to the right means dividing by ten. This holds only when no non-zero digit gets lost by falling out of the mantissa by being pushed over the edge. Otherwise a rounding operation or even an error handler has to be invoked.

When the integer value of the exponent is adjusted appropriately and simultaneously with the shift the numeric value of the total number will remain unchanged. Example: +0583E-10 equals +5830E-11. Sometimes such shift annex adjustment is required to make the lay-out easier for a mathematical operation.

Sometimes an application requires that the coefficient is stored in the mantissa such that the first digit in that space is never zero (except when the whole value is zero). The sequence is stored in the space left-justified with its leftmost non-zero digit. The exponent value is adapted appropriately. After this shift to the utter left together with the adaptation the number is called normalized.

Normalization is the well-known way of storage applied ubiquitously in the binary arithmetic with the hidden bit, like in IEEE-754 or DEC-PDP11. But a hidden digit similar to that hidden bit is not possible, so not present. Therefore in general normalization is not obligatory for a decimal number. The first digit in the mantissa is allowed to be zero.

Actually the mantissa contains a virtual decimal point. One digit location stands before this point. The other locations stand after it and together they form the fractional part. When the coefficient is normalized, then 1 =< coefficient < 10. When not normalized then 0 =< coefficient < 10. The accuracy is the number of all digit positions in the mantissa.

The exponent is written with an excess bias. This means that the integer value written in the exponent area is always greater by a fixed constant than the actual exponent value it represents. By this way negative exponent notations are avoided although the exponent value can be negative. Thus the exponent has got rid of its signbit.

Primarily the bias is choosen such that the smallest normalized
(nonzero!) numeric value is the inverse of the greatest value
(of course with the same +/- sign). Thus:

*SmallestNormalizedValue = 1 / GreatestValue*

Therefore the bias value must be chosen half the size of the
range of the exponent integer.

Inversely stated the expression with the inversion becomes:

*GreatestValue = 1 / SmallestNormalizedValue*

When the exponent bias value is choosen in the right way,
actually an overflow will occur now. The reason is that the
coefficient of the GreatestValue is not 10.00000, but it is
9.99999, so just a very tiny little bit too small. To avoid
this effect the excess bias is decreased by one. Then the
inversion formula becomes:

*GreatestValue = 100 / SmallestNormalizedValue*

This formula also diminishes the chance of an overflow when
dividing by an unnormalized number.

Consequently the normalized numbers are not located symmetrically around the value 1. This non-symmetry occurs also in the binary floats of DEC-PDP11 and of IEEE-754. There the inversion formula gives the value 0.5 or 4.

Example: Suppose the range of the exponent integer is from 0 to 99. So the range size is 100. The bias value primarily would become 50. Then the minimum normalized value would be 1.00000E-50 and the maximum value would be 9.99999E+49 = approximately = 1.00000E+50. Actually the bias is lowered by one to 49. So the minimum normalized value becomes 1.00000E-49 and the maximum value becomes 9.99999E+50 which approximates 1.00000E+51.

The PDE-compression has one drawback for the exponent: the exponent cannot use its full value range. Its first two bits are not allowed to be 1 simultaneously. At least one of them must be 0. So the allowed combinations are: 00, 01 and 10. Thus the greatest exponent integer is 1011111... . The reason will be explained later.

The core of the packed-decimal encoding IEEE-754-2008 is the so-called Densely Packed Decimal (= DPD) compression. As shown above, the BCD-code results in a great loss of 'bit-pattern space'. The DPD compression is one of the methods to use this space for compressing a long string of digits. This compression method shows a brilliant ingenuity.

The compression mechanism must be very fast since it is applied for every decimal number in every calculation. Therefore the compression must be performed by simple and fast straightforward 'flow-through' logic, not by the time-consuming clock-paced arithmetic operators like addition, multiplication and table look-up. Also it must allow the original digits to be retrieved easily.

For both reasons the whole string of digits that represents the value should not be compressed as one whole monolithical entity and also not be translated into a binary number. Therefore it is subdivided into small groups of consecutive digits. These groups are compressed separately.

For the DPD-compression the size of such a group is chosen three digits. So it compresses a group of three digits (= 12 bits in BCD) into 2.5 nibbles (= 10 bits) without any loss of information.

As shown previously the small profit in number of bits leads to a giant profit in number of bit patterns. A sequence of ten bits can contain 1024 different bit patterns. A sequence of twelve bits can contain 4096 different bit patterns. When three digits are stored in this sequence only 1000 of these patterns are used. This is even slightly less than the 1024 possible patterns in the ten-bits sequence. So a compression of the twelve into the ten bits is feasible.

The encoding system relies heavily on the structure of the bit patterns ubiquitously used to store a decimal digit into four bits. This is the so-called Binary Coded Decimal (= BCD) code. The table of storage is:

BCD-storage: Digit Bit-pattern 0 0 0 0 0 1 0 0 0 1 2 0 0 1 0 3 0 0 1 1 4 0 1 0 0 5 0 1 0 1 6 0 1 1 0 7 0 1 1 1 8 1 0 0 0 9 1 0 0 1

Let us look more closely to the ten bit patterns. At first the patterns can be split up in a group of the first three bits and the single last bit. After every group of three the last bit is 0 or 1. So this bit cannot be compressed. Therefore it must be copied directly and unchanged into the compressed code. Thus three bits of the total compressed code are dedicated to store the last bit of the three digits.

When the last bit is removed from every digit new 3-bits mini-digits remain:

mini-digit storage: Digit Bit-pattern 0 0 0 0 1 0 0 1 2 0 1 0 3 0 1 1 4 1 0 0

The three mini-digits require 3*3 = 9 bits to be stored without compression. With compression they only need 7 bits. The bits of these three digits are named a, b, c, e, f, g, i, j and k. The resulting bits are named v, w, x, s, t, p and q. These two naming sequences do not seem logical, but later on they will be.

The bit-pattern scheme of the mini-digit shows there are four of them with the first bit zero: 0 until 3. These are called the small digits. There is one large digit, 4, starting with bit one. This division into four small digits and one large digit is the basis of the DPD-encoding scheme. The first bit of each digit will not be copied directly. Only the middle and last bit of the small digits are copied directly. Those of the large digit also are not.

One out of the receiving bits, v, always becomes a bit with a value 'invented' by the encoding system. The others can become it occasionally. The asterix in the encoding scheme below indicates they have become. The 'non-invented' bits are copied straight from the original digits.

v w x s t p q -- All three digits are small -- * All digits are small 0 j k f g b c -- One of the digits is large -- 1 * * Right digit is large 1 0 0 f g b c Middle digit is large 1 0 1 j k b c Left digit is large 1 1 0 f g j k -- Two of the digits are large -- 1 1 1 * * Right digit is small 1 1 1 0 0 j k Middle digit is small 1 1 1 0 1 f g Left digit is small 1 1 1 1 0 b c -- All three digits are large -- 1 1 1 1 1 * * All digits are large 1 1 1 1 1 0 0 [Unused encoded sequence] 1 1 1 1 1 0 1 [Unused encoded sequence] 1 1 1 1 1 1 0 [Unused encoded sequence] 1 1 1 1 1 1 1

The numbers of the different groups of resulting sequences are:

when all three digits are small: 4 * 4 * 4 = 64 sequences when one of the digits is large: 3 * 4 * 4 = 48 sequences when two of the digits are large: 3 * 4 = 12 sequences when all three digits are large: 1 = 1 sequence ------- + total gained by encoding 125 sequences unavailable by encoding 3 sequences ------- + total possible bit-patterns p,q,...,w 128 sequences

Total number of different original groups of three mini-digits
is:

5 * 5 * 5 = 125 series

The basic idea of the DPD mechanism was given by Tien Chi Chen and Irving T. Ho in 1975. Their compression mechanism is named Chen-Ho code. Its coding table for the mini digits looks like:

+++++++++++++++++++++++++++++++ + C H E N - H O C O D E + p q r t u w x +++++++++++++++++++++++++++++++ -- All three digits are small -- * All digits are small 0 b c f g j k -- One of the digits is large -- 1 * * Left digit is large 1 0 0 f g j k Middle digit is large 1 0 1 b c j k Right digit is large 1 1 0 f g b c -- Two of the digits are large -- 1 1 1 * * Left digit is small 1 1 1 0 0 b c Middle digit is small 1 1 1 0 1 f g Right digit is small 1 1 1 1 0 j k -- All three digits are large -- 1 1 1 1 1 * * All digits are large 1 1 1 1 1 0 0 [Unused encoded sequence] 1 1 1 1 1 0 1 [Unused encoded sequence] 1 1 1 1 1 1 0 [Unused encoded sequence] 1 1 1 1 1 1 1

As one sees, this table looks quite similar to that of the DPD encoder. In fact, both the Chen-Ho encoder and the DPD encoder operate according to the same principles. They differ only in the appearance of their results. So the differences between the two tables of the results are cosmetic only. Transforming the Chen-Ho encoder into the DPD encoder is a cosmetic act only. In spite of this fact the result of the DPD encoder appears to be more friendly and versatile in use.

The cause is that in the Chen-Ho code the two bits j and k of the right digit are copied always into the same bits in the result, viz. into w and x. Never they are moved to other bits, like the bits f and g are. In the DPD-code the bits b and c of the left digit appear to 'stay at home' in the bits p and q. Right this different choice for the home-staying bits makes the DPD encoding more versatile than the Chen-Ho encoding.

Due to this choice in the Chen-Ho table the bits w and x are always 0 when the right digit is zero. In the DPD table the bits p and q are always zero when the left digit is zero. Since actually these two bits are the leftmost ones in the encoded result (as we will see below), they thus become leading zeroes.

Often in a computer leading zeroes can be omitted. When the left mini-digit is zero, then the bits p and q are always zero. If one knows on beforehand that the left digit is zero always, then the two bits p and q can be omitted and the DPD result fits in five bits. Thus the result seems to be made of two digits only.

Since in the similar situation the Chen-Ho system creates trailing zeroes it has been abandoned in favor of the DPD system. The DPD result becomes even nicer when the whole DPD encoding table is hustled, as is shown below.

In 2002 Michael F. Cowlishaw transformed the Chen-Ho encoder into the DPD encoder and also hustled the bits in the result, in order to make that result very user friendly.

He hustled the DPD encoding scheme such that the order of all resulting bits is changed. Actually, the order is reversed more or less. The bits are bundled into three groups of two, and the order of these groups is reversed. However, the leftmost single bit is not moved to the right edge of the table, but like a maverick to somewhere in the midst.

The reason for the transformation and this hustling is that when the left digit is zero or when the left and middle digits both are zero, then the resulting non-zero part of the code is right-aligned in the seven-bits field. Otherwise stated: when a leading digit is zero two leading bits in the resulting DPD code become zero too. (Note that the reverse is not true!)

Consequently a sequence of two mini-digits can be encoded into a five-bit field and one single mini-digit goes into a three-bit field. And the encoded field can be expanded at its left side simply by padding with zeroes, without changing the value it represents.

Moreover, the bit pattern of the encoded sequence equals the pattern of the original sequence (except two leading zeros) when that original sequence represents a small value. Then the original sequence of bits seems to be copied directly into the result. This happens when the left mini-digit is zero and the middle mini-digit is not large, i.e. the sequence of mini-digits is 034 or less. Thus a single mini-digit also seems to be copied directly. So the conversion of the ubiquitously used small numbers is very easy.

The upper part of the conversion table becomes:

p q s t v w x -- All three digits are small -- * All digits are small b c f g 0 j k when left digit is zero, then: 0 0 f g 0 j k when left & middle is 0, then 0 0 0 0 0 j k -- One of the digits is large -- 1 * * Right digit is large b c f g 1 0 0 when left digit is zero, then: 0 0 f g 1 0 0 when left & middle is 0, then 0 0 0 0 1 0 0 Examples: bit-pattern mini-digits original encoded 0 2 3 000 010 011 00 10 0 11 0 1 0 000 001 000 00 01 0 00 0 3 4 000 011 100 00 11 1 00 0 0 3 000 000 011 00 00 0 11 0 0 4 000 000 100 00 00 1 00

The trailing bits of the original three decimal digits are now added to the encoding scheme. Therefore the twelve bits of these digits are named a, b, c, d, e, f, g, h, i, j, k and m. The ten bits resulting from the encoding are named p, q, r, s, t, u, v, w, x and y. The counted numbers for the different groups of original digits and the resulting encoded sequences have to be multiplied by 8. For example, there are 8 * 125 = 1000 different series of 3 digits. The number of unused encoded sequences is 8 * 3 = 24.

The last bits of the decimal digits are simply encoded thus:

(The Chen-Ho encoder copies these three bits in a similar way: s=d, v=h, y=m.)r = d, u = h, y = m

When they are added then the final coding scheme becomes:

when: -------- t h e n : --------- a e i p q r s t u v w x y -- All digits small -- * All digits small 0 0 0 b c d f g h 0 j k m -- One digit large --- 1 * * Right digit large 0 0 1 b c d f g h 1 0 0 m Middle digit large 0 1 0 b c d j k h 1 0 1 m Left digit large 1 0 0 j k d f g h 1 1 0 m -- Two digits large -- * * 1 1 1 Right digit small 1 1 0 j k d 0 0 h 1 1 1 m Middle digit small 1 0 1 f g d 0 1 h 1 1 1 m Left digit small 0 1 1 b c d 1 0 h 1 1 1 m -- All digits large -- * * 1 1 1 1 1 All digits large 1 1 1 0 0 d 1 1 h 1 1 1 m [Unused encode] - 0 1 d 1 1 h 1 1 1 m [Unused encode] - 1 0 d 1 1 h 1 1 1 m [Unused encode] - 1 1 d 1 1 h 1 1 1 m

This final scheme explains why previously the naming sequences were chosen seemingly illogically.

Encoding a single digit results in the original BCD bit-pattern of the digit padded at its left with zeros.

For all three-digit values from 000 to 079 the encoded bit pattern equals the original bit pattern except two leading zeroes.

The values 0 upto 199 result in a code wherein the two leftmost bits p and q are zero. When these two bits are omitted the result fits in an 8-bits byte.

Thus the result of the values 0 upto 99 even fits in a series of 7 bits. So encoding two BCD digits saves one bit.

The bits 'invented' during the encoding process act as controller bits. They steer the decoding process and the 'invention' of some bits for the digits. The other bits are copied directly to the digits. The following decoding scheme (wherein '.' means don't-care) leads to the expansion into the original digits:

w h e n : -------- t h e n : -------- s t v w x a b c d e f g h i j k m -- All digits small -- All digits small . . 0 . . 0 p q r 0 s t u 0 w x y -- One digit large --- Right digit large . . 1 0 0 0 p q r 0 s t u 1 0 0 y Middle digit large . . 1 0 1 0 p q r 1 0 0 u 0 s t y Left digit large . . 1 1 0 1 0 0 r 0 s t u 0 p q y -- Two digits large -- Right digit small 0 0 1 1 1 1 0 0 r 1 0 0 u 0 p q y Middle digit small 0 1 1 1 1 1 0 0 r 0 p q u 1 0 0 y Left digit small 1 0 1 1 1 0 p q r 1 0 0 u 1 0 0 y -- All digits large -- All digits large 1 1 1 1 1 1 0 0 r 1 0 0 u 1 0 0 y

Both the encoding and the decoding can be performed by a fairly simple circuitry of electronic hardware gates. Since it is a straightforward 'flow-through' electro-logic, its speed is very high. The encoding is:

p = (a*f*i) + (a*j) + b q = (a*g*i) + (a*k) + c r = d s = (-a*e*j) + (f*-i) + (-a*f) + (e*i) t = (-a*e*k) + (a*i) + g u = h v = a + e + i w = (-e*j) + (e*i) + a x = (-a*k) + (a*i) + e y = m

The decoding is:

a = (-s*v*w) + (t*v*w*x) + (v*w*-x) b = (p*s*x) + (p*-w) + (p*-v) c = (q*s*x) + (q*-w) + (q*-v) d = r e = (t*v*-w*x) + (s*v*w*x) + (-t*v*x) f = (p*t*v*w*x) + (s*-x) + (s*-v) g = (q*t*w) + (t*-x) + (t*-v) h = u i = (t*v*w*x) + (s*v*w*x) + (v*-w*-x) j = (p*-s*-t*w) + (s*v*-w*x) + (p*w*-x) + (-v*w) k = (q*-s*-t*v*w) + (q*v*w*-x) + (t*v*-w*x) + (-v*x) m = y

Herein is: * = AND, + = OR, - = NOT.

(The logic circuitry for the Chen-Ho en/decoder looks fairly similar. It has the same complexity, size, speed and cost.)

The definers wanted the number of digits in the coefficient to be approximately the same as the accuracy in the binary IEEE-754 format. So they wanted 7 digits for the 32-bits word, 16 digits for the 64-bits word and 34 digits for the 128-bits word. In all three cases this is one digit more than a number dividable by 3. Such digit would be written in BCD code that spoils six out of the sixteen available bit patterns. To save these patterns this digit is intermingled with two bits of the binarily written exponent into a 'Combination Field' of five digits. Also the non-numerical values Infinity and Not-a-Number (NaN) are squeezed into this field.

Thus the combination field has got a fairly artificial encoding scheme (wherein '.' means don't-care):

Type of value of combination exponent coefficient's the total number field bits extra digit Finite a b c d e a b 0 c d e Finite 1 1 a b e a b 1 0 0 e Infinite 1 1 1 1 0 . . . . . . NaN 1 1 1 1 1 . . . . . .

Note the displacement of the exponent bits a and b in the combination field when the coefficient digit becomes large. When the numeric value is finite these two exponent bits are never allowed to be 1 simultaneously. At least one of them must be zero. So the requirement is: a * b = 0, otherwise the value is not finite. The allowed combinations are 00, 01 and 10, not 11. Therefore the two bits can represent only three bit patterns, not four. Consequently the total number of bit patterns required for this encoding scheme is 3 * 10 + 2 = 32.

In case of the 'value' Infinity the +/- sign gives the sign of
the infinity.

In case of the 'value' NaN the bit immediately following the
combination field (i.e. ET-MSB = the most significant bit of
the so-called exponent tail, see later) says the type of the
NaN: 0 = quiet, 1 = signaling. (Note: for this the +/- sign is
not used.)

This leads to the following table that gives the meaning of the first seven bits of the whole number:

--- 7-bits pattern --- -- decoded combi.fld -- Type of +/- combination ET- exponent coefficient's value sign field MSB bits extra digit Finite . a b c d e . a b 0 c d e Finite . 1 1 a b e . a b 1 0 0 e + Infinite 0 1 1 1 1 0 . . . . . . . - Infinite 1 1 1 1 1 0 . . . . . . . NaN quiet . 1 1 1 1 1 0 . . . . . . NaN signal . 1 1 1 1 1 1 . . . . . .

Alltogether only the first byte of the number has to be inspected to determine whether the number is finite or is a special value. And if the latter, what special value it is.

**WARNING:**

Although the ordinary values are what you mostly see when your
program is working with real data, proper handling of the rest
of the values (NANs, Infinities) is vitally important.
Otherwise you'll get all sorts of horrible results that are
difficult to understand and usually impossible to fix.

For the two exponent bits the two most significant bits of the exponent are chosen. Numerically they stand immediately at the left side of the ET-MSB. Similarly the extra digit is appointed to be the most significant digit of the coefficient. Thus it is the digit that stands before the decimal point. So the combination field has become the most significant part of the whole numeric representation.

The remainder of the exponent is put in the exponent tail (= ET) or 'Exponent Continuation Field'. Together with the two bits in the Combination field it forms the integer-valued exponent which is written in binary. Consequently the bit pattern of the greatest exponent value is 1011111... .

The ET-MSB is the leftmost bit in the exponent tail. When the bit-pattern of the total combination field indicates NaN this bit tells whether the NaN is quiet (0) or signaling (1).

The remainder of the coefficient is written in the coefficient tail or 'Coefficient Continuation Field'. It is the fractional part of the coefficient. This tail is encoded fully in Densely Packed Decimal. It consists of groups of three digits which are compressed into groups of ten bits, as described previously.

When the digit in the combination field is zero the whole number is said to be 'unnormalized' or 'denormalized'. When this digit is not zero, i.e. from 1 up to 9, then the whole number is said to be 'normalized'.

The value range of the coefficient is 0 =< coeff =< 9.9999... ,
thus 0 =< coeff < 10.0

The coefficient is normalized when the digit in the combination
field is not zero. Then 1.0 =< coeff =< 9.9999... ,
thus 1.0 =< coeff < 10.0

The coefficient is unnormalized when the digit in this combi-
field equals zero. Then 0.0 =< coeff =< 0.9999... ,
thus 0.0 =< coeff < 1.0

NOTE: Normalization is not obligatory!! It has the advantage of maximum relative accuracy, but it has the disadvantage of slower processing.

The exponent is stored as a binary integer number. Since its first two bits are not allowed to be 1 simultaneously its maximum value has the bit pattern 1011111... . Consequently the greatest allowed integer value of an exponent of n bits is (2^n-1)-2^(n-2) = 3 * 2^(n-2) - 1.

Of course its smallest integer value is 00000... . So the number of different exponent values is 3*2^(n-2). For the normalized values to be in symmetry around 1 the excess bias should be 3*2^(n-3). Actually the normalized values are chosen to be symmetrical around 10, and so the excess bias has become 3*2^(n-3)-1.

Example: For an exponent of 8 bits the greatest integer value is 3*2^(8-2)-1 = 191, and the actual bias becomes 3*2^(8-3)-1 = 95. The resulting exponent-value ranges from -95 to +96.

The leftmost seven bits of the number indicate whether that number represents a special value or an ordinary numeric value. When representing the latter the seven bits are part of it. Then they also show whether that number is normalized or not. The bit patterns for the special and unnormalized numbers are:

NaN signaling . 1 1 1 1 1 1 NaN quiet . 1 1 1 1 1 0 - Infinite 1 1 1 1 1 0 . + Infinite 0 1 1 1 1 0 . Unnormalized Finite * 1 0 0 0 0 * Unnormalized Finite * 0 1 0 0 0 * Unnormalized Finite * 0 0 0 0 0 *

Herein:

. = don't care, not important for the value

* = don't care, although important for the total value

All other bit patterns represent a normalized ordinary value.

The total bit pattern of the whole number written in the
IEEE-754-2008 Packed Decimal Encoding system is subdivided into
four fields, from left to right:

- +/- sign of the whole number (1 bit: 0 ='+', 1 ='-')
- Combination field (5 bits)
- Exponent tail (arbitrary length)
- Coefficient tail (groups of 10 bits)

The drawing of the structure is:

+---+-------------------+-------------+--------------------+ |+/-| combination field | exponent | coefficient tail | |si | for 2 exp.bits, 1 | tail in | groups of 3 digits | | gn| digit and specials| binary | Dense.Pack.Decimal | +---+-------------------+-------------+--------------------+ 1 bit 5 bits ^ groups of 10 bits ^ 7-th bit = ET-MSB, also for NaN quiet/signal

These PDE-numbers are available in three types, each having its own number of bits, viz. 32, 64 and 128 bits. In all three types the combination field and the two bits at its sides (sign and ET-MSB) are the same. Only the two tails differ in length. The properties of all three types are listed here.

Type of number ---> Short Medium Long Small Middle Large --- Field structure in bits --- Length of total number field 32 64 128 Length of sign ( 0 ='+', 1 ='-') 1 1 1 Length of combination field 5 5 5 Length of exponent tail 6 8 12 Length of coefficient tail 20 50 110 --- Exponent (fully binary) --- Total number of bits 8 10 14 Maximum Integer value 191 767 12287 Minimum Integer value 0 0 0 Excess bias offset 95 383 6143 Maximum Actual value 96 384 6144 Minimum Actual value -95 -383 -6143 --- Coefficient (in decimal) --- Number of 10-bit DPD-groups 2 5 11 Number of digits in tail 6 15 33 Number of digits before decimal point 1 1 1 Total number of digits 7 16 34 Accuracy normal.guar. in dec.digits 6 15 33 --- Numeric values (absolute)--- Maximum value, approximately 1.E+97 1.E+385 1.E+6145 Minimum value, normalized 1.E-95 1.E-383 1.E-6143 Minimum nonzero un-normalized 1.E-101 1.E-398 1.E-6176

The abbreviations for the three systems worldwide ubiquitously used for encoding the decimal digits and numbers are:

BCD = Binary Coded Decimal = simplest way to store a digit

DPD = Densely Packed Decimal = packing 3 digits into 10 bits

PDE = Packed Decimal Encoding = IEEE-754-2008 definition for
storing a numeric value in floating-point notation

Comparison with the IEEE-754 definition for binary floats shows that the value range of each of the three types of decimal floats is larger than the the range of the binary float with the same length. Also the maximum value is greater. The reason is the greater exponent range. Consequently the conversion of a binary float to a decimal float with the same length will never result in an overflow. But the reverse conversion may do.

The number of digits in the decimal notation seems to equal approximately that achieved by the binary notation. So the decimal notation would be better in spite of the loss of some bit patterns that are not used by the DPD-encoding? In fact it is not. Adding one to the last digit of the coefficient has relatively more impact to the value of the coefficient when the coefficient's first digit is 1 than when it is 9. This impact is even at its greatest when the other digits are 0.

Therefore the accuracy in the normalized state is lower when the coefficient value is 1.000... than when it is 9.999... . So the guaranteed accuracy ('accuracy normal.guar.' in the table above) is one digit lower than the number of digits in the coefficient. So it equals the number of digits in the tail. Thus it is 6, 15 and 33 for the three types of decimal floats. This is lower than the guaranteed accuracy of the binary floats, which is 6.9, 15.6 and 33.7. The difference is more than half a digit.

This lower accuracy enables the use of a greater exponent base, 10 in stead of 2. Thus the value the exponent stands for becomes greater. Here the trade-off between the accuracy and the exponent range is bended slightly in favor of the bigger exponent-range at the expense of the coefficient's accuracy. In terms of accuracy the decimal float is not as good as the binary float. In terms of value-range it is better.

- IBM laboratories: General decimal arithmetic
- Specification for decimal arithmetic
- Packing the decimals densely
- A summary of Chen-Ho decimal data encoding
- Set-up of the total number (in HTML)
- Set-up of the total number (in PDF)
- Decimal calculation: history, arguments and ideas
- Argumentation for calculating decimally (slides)
- DPD and old ideas about exponent (slides)
- Bibliography of material on decimal arithmetic
- Storage-Efficient Representation of Decimal Data, by Tien Chi Chen & Irving T. Ho, in CACM (18)1, pp.49-52, January 1975.
- Densely Packed Decimal Encoding, by Michael F. Cowlishaw, in IEE Proceedings - Computers and Digital Techniques, Vol. 149 #3, ISSN 1350-2387, pp.102-104, IEE, London, May 2002.