PROPOSALS
FOR FIXED-LENGTH DECIMALS
AND FOR TYPE NAMES

by:   J.H.M. Bonten


First date of publication: 21 December 2006
General modification: 18 July 2007
Corrections in third chapter: 18 September 2007
Renaming 754r into 754-2008: 23 September 2008
Addition of manufacturer length names: 10 February 2009

Contents

The three chapters in this document are fully independent. The two chapters 'unsigned PDE-format' and 'fixed-point numbers' are proposals for types of decimal numbers additional to the three types in the Packed Decimal Encoding IEEE-754-2008. The third chapter proposes a redefinition of the names of numeric data types for a better inter-language communication.




UNSIGNED PDE-FORMAT

The fourth mode of the hypothetical multi-decimal computer leads to a proposal for a second extra small PDE-format. It is unsigned and can be used in video-graphics cards. It can also serve in the education of students.

Video tones (colors and luminances) are never negative. A negative value would give the same result as the zero value: black. So the value that expresses the intensity of a tone is always zero or positive and needs no sign. This allows the sign bit of the extra small PDE-format to be dropped in favor of one bit in the exponent tail.

The data-word structures of both extra small PDE-formats are:

Fourth mode of hypothetical decimal computer:

   +---+-------------------+-----------------------------+
   |+/-| combination field |         10 bits for         |
   |si | for exponent, one |         3 digits in         |
   | gn| digit and specials|    Densely Packed Decimal   |
   +---+-------------------+-----------------------------+
    1 b.      5 bits                 10 bits

Unsigned extra small:

   +-------------------+---+-----------------------------+
   | combination field |exp|         10 bits for         |
   | for 2 exp.bits, 1 |ta |         3 digits in         |
   | digit and specials| il|    Densely Packed Decimal   |
   +-------------------+---+-----------------------------+
          5 bits        1 b.         10 bits

In the latter format the single bit in the exponent tail also serves for indicating whether a NaN is quiet or signaling. In the hypothetical decimal computer this is done by the first bit of the DPD-group.

Listed side by side, the two formats are:

Type of number --->               Fourth     Un-
                                    mode     signed

--- Field structure in bits ---
Length of total number field          16         16
Length of sign                         1          0
Length of combination field            5          5
Length of exponent tail                0          1
Length of coefficient tail            10         10

--- Exponent (fully binary) ---
Total number of bits                   2          3
Maximum Integer value                  2          5
Minimum Integer value                  0          0
Excess bias offset                     1          2
Maximum Actual value                   1          3
Minimum Actual value                  -1         -2

--- Coefficient (in decimal) ---
Number of 10-bit DPD-groups            1          1
Number of digits in tail               3          3
Number of digits before decimal point  1          1
Total number of digits                 4          4
Accuracy normal.guar. in dec.digits    3          3

--- Numeric values (absolute)---
Maximum value, approximately         100      10000
Minimum value, normalized            0.1       0.01
Minimum nonzero un-normalized     0.0001    0.00001
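
The derived rows of this listing follow from the field sizes alone. The lines below are a minimal Python sketch of that arithmetic, not part of the proposal itself; the function name pde_ranges and its parameters are hypothetical, and the decimal point is assumed to stand after the first coefficient digit, as in the listing.

```python
from decimal import Decimal


def pde_ranges(max_exp_code, bias, tail_digits=3):
    """Derive the value limits of a small PDE format.

    max_exp_code: largest stored (biased) exponent value
    bias:         excess bias subtracted from the stored exponent
    tail_digits:  number of digits in the DPD coefficient tail
    """
    max_exp = max_exp_code - bias
    min_exp = 0 - bias
    # Largest coefficient: leading digit 9 plus an all-nines tail, e.g. 9.999
    max_coeff = Decimal(10) - Decimal(10) ** -tail_digits
    # Smallest nonzero coefficient: only the last tail digit set, e.g. 0.001
    min_coeff = Decimal(10) ** -tail_digits
    return {
        "max_exp": max_exp,
        "min_exp": min_exp,
        "max_value": max_coeff * Decimal(10) ** max_exp,
        "min_normalized": Decimal(10) ** min_exp,
        "min_unnormalized": min_coeff * Decimal(10) ** min_exp,
    }


fourth_mode = pde_ranges(max_exp_code=2, bias=1)
unsigned = pde_ranges(max_exp_code=5, bias=2)
print(fourth_mode["max_value"], unsigned["max_value"])
```

The exact maxima are 99.99 and 9999, which the listing rounds to 100 and 10000.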




FIXED-POINT NUMBERS

Introduction

Floating-point numbers are not always the desired result of a numeric calculation. Often fixed-point numbers are wanted, especially in monetary applications like invoices. Therefore the programming language Cobol often works with numbers in fixed-point notation. First the notation used by humans is described, then how it can be stored economically in a modern decimal computer.

Human notation and precision

The human notation of a fixed-point number is a sequence of digits, a period and a second sequence of digits. Generally an exponent is not added to this notation. Example: 3456.789.

The most important property of this human notation is the length of the series of digits at the right of the decimal period. This length is fixed for every individual fixed-point number. It will not change during the run of the program.

This length gives an indication of the tiniest value that must be added or subtracted to change the number. In fact it determines the difference between two consecutive values of the number. Therefore let us call this length the 'precision'. In the example it is three digits, so the precision is 3.

The difference itself is called the 'absolute accuracy' of the number. In the example 345.67 this accuracy is 0.01, and the precision is 2.  The better (higher) the precision the smaller the absolute accuracy is. In the number 3.45678 the precision is 5 and the absolute accuracy is 0.00001.  In the integer number 345 the precision is 0 and the absolute accuracy is 1.

The precision can be negative in which case the floating-point like exponent-notation has to be used to discard the trailing zeros. In the number 340000 the precision is 0, whilst in the equally valued numbers 34E4 and 3.4E5 and 0.34E6 the precision is negative: -4.
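
The definitions of precision and absolute accuracy can be tried out with Python's decimal module, which keeps the written exponent of a number. This is only an illustrative sketch; the function names precision and absolute_accuracy are hypothetical helpers, not an existing API.

```python
from decimal import Decimal


def precision(text):
    """Digits at the right of the decimal period; negative when an
    exponent discards trailing zeros."""
    return -Decimal(text).as_tuple().exponent


def absolute_accuracy(text):
    """Difference between two consecutive values: 10 ** -precision."""
    return Decimal(1).scaleb(-precision(text))


for sample in ("3456.789", "345.67", "3.45678", "345", "340000",
               "34E4", "3.4E5", "0.34E6"):
    print(sample, precision(sample), absolute_accuracy(sample))
```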

The concept of the 'relative accuracy' does not make much sense for fixed-point numbers. It belongs to the realm of the floating-point numbers, where it is very important. The relative accuracy indicates the total number of digits used in the coefficient of the number, whereas the precision counts only the digits at the right of the decimal period.

This precision is very important in banking applications. All amount-figures in a chain-addition must be aligned at the right side of the column (e.g. on an invoice), and then the periods of all figures must stand in a straight vertical line. So all figures must have the same precision.

Very often this precision is 2 (=> the absolute accuracy is 1 cent). The relative accuracy is much less important. An amount of ten dollars (10.00 = 4 digits) can stand on an invoice form together with an amount of one million dollars (1000000.00 = 9 digits).

Fixed-point decimals into the computer

In a computer, floating-point numbers can be used to store fixed-point numbers, but only when they are under no obligation to be 'normalized'. Therefore the binary numbers of DEC and IEEE-754 are not suited for storing fixed-point numbers: their hidden-bit system always requires normalization.

In the fixed-point mode of operation the exponent of the floating-point number is kept constant. Only the coefficient and the sign are changed. Consequently another word for "floating-point number" is "normalizable number". And another word for "fixed-point number" is "un-normalizable number".

The permanently fixed exponent may give a theoretical problem: a floating-point number with such an exponent is a contradictio in terminis. The discussion below shows that the fixed exponent still is a practical application.

Besides working with the fixed-point notation, humans also like to work with decimal numbers. When they enter a value into the computer and retrieve it later, they assume that the computer has not changed it, not even in the last digit. It should never change, not even in one out of a billion cases. This can be achieved in two ways:
- use a binarily operating computer with a very high accuracy,
- use a decimally operating computer.

In every binary computer a rounding occurs when a non-integral decimal value is transformed into a binary value. When this binary value is transformed back into a decimal value a second rounding occurs. When the computer is very accurate (= uses very many bits to store its binary numbers) then the second rounding is always in the direction opposite to the first rounding, and so the original decimal number always returns.

But in a very long series of calculations, e.g. the addition of a thousand or a million numbers, the first rounding might become visible. When the sum is re-transformed into a decimal value the second rounding may go into the wrong direction and the last digit becomes too low or too high by one.

A decimal computer does not have this problem. It stores the values and operates on them in a decimal way. So the humans can better retrace the way of its operations and thus get more confidence in it. Therefore our discussion is confined to this type of computers.
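
The accumulated binary rounding can be observed on any present-day machine. The sketch below adds the value 0.1 ten times, once with Python's binary floats and once with its software decimal type; the latter merely stands in for the decimal hardware discussed here.

```python
from decimal import Decimal

binary_total = 0.0
decimal_total = Decimal("0.0")
for _ in range(10):
    binary_total += 0.1              # 0.1 is rounded when converted to binary
    decimal_total += Decimal("0.1")  # kept exactly as decimal digits

# The binary sum misses 1.0 by one unit in its last place; the decimal sum is exact.
binary_is_exact = (binary_total == 1.0)
decimal_is_exact = (decimal_total == Decimal("1.0"))
print(binary_total, binary_is_exact)
print(decimal_total, decimal_is_exact)
```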

Storage apt for Cobol

The Picture definition of Cobol describes the structure of the storage for a decimal fixed-point number. It tells the total length in digit positions assigned to a number and the length of the part at the right side of the decimal period (= the fractional part). Example:
     PIC 9999V999 means: total length = 7, length of fraction = 3

In fact the picture equals the maximum value that can be stored in the number. In our example 9999.999.  Therefore in the following examples the letter V (= virtual decimal period) will be replaced by an ordinary decimal point. The picture also tells the precision of the number: the length of the fraction.

The picture definition can fairly easily be translated into a floating-point number with fixed exponent. First the importance of the precision is stressed by right-justifying the series of digit positions (without the decimal period) in the coefficient of the floating-point number. Then the exponent is adjusted appropriately to keep the stored values right. Hereafter it is kept unchanged during all future arithmetic operations, like it is frozen.

All digit positions thus assigned by the picture definition to the fixed-point number (without the virtual or real decimal period) must fit in the coefficient of the decimal floating-point storage. If there are fewer positions than fit in the coefficient, the number is padded at the left with zeroes (which can be overwritten by future operations). If there are more, an overflow error occurs. Positions in the fraction are not removed (which would mean rounding off) to make the number fit when the number is defined.

Example: Let us assume a computer wherein the coefficient of the floating-point numbers contains six decimal digits. A virtual decimal point is between the first and second digit. Then the resulting storage of some numbers will become:

    picture       stored as
    -------       ---------
    999.999       9.99999E2
    999.99        0.99999E3
    999.9         0.09999E4
    999. = 999    0.00999E5
    99. = 99      0.00099E5
    9. = 9        0.00009E5
    9.9           0.00099E4
    99.9          0.00999E4
    999.9         0.09999E4
    9999.9        0.99999E4
    99.9999       9.99999E1
    999.9999       overflow
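
The translation in this table is mechanical. The following Python sketch reproduces it under the same assumptions as the example (six coefficient digits, decimal point after the first digit); the function name store_picture is hypothetical, and the picture is written with a real period instead of the letter V.

```python
COEFF_DIGITS = 6  # coefficient width assumed in the example above


def store_picture(picture):
    """Return the stored form of the largest value of a picture like '999.99'."""
    integer_part, _, fraction_part = picture.partition(".")
    digits = integer_part + fraction_part
    if len(digits) > COEFF_DIGITS:
        raise OverflowError("picture needs more digits than the coefficient has")
    # Right-justify the digit positions and pad with zeroes at the left.
    coeff = digits.rjust(COEFF_DIGITS, "0")
    # The last coefficient digit has weight 10**(exponent - 5), so the
    # exponent that keeps the value right depends only on the fraction length.
    exponent = (COEFF_DIGITS - 1) - len(fraction_part)
    return f"{coeff[0]}.{coeff[1:]}E{exponent}"


for pic in ("999.999", "999.99", "999.9", "999.", "9.9", "99.9999"):
    print(pic.ljust(10), store_picture(pic))
```

Note that the exponent is fixed by the length of the fraction only, which is why it can stay frozen afterwards.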

During the calculations the leading zeroes in the numeric storage need not stay zero. Any digit can be stored in their place. So a number with the picture 9999.99 can be stored in the place originally intended for a number with the picture 9.99.  However, a number with the picture 9.999 or the picture 9.9 cannot be stored in that place since the fraction has a different length.

Because of the padding with zeroes the original length of the number gets lost, or must be stored somewhere outside the number. But the much more important length of the fraction is preserved by the exponent.

Note the importance of trailing zeroes in an actual value. They cannot be discarded by shifting the coefficient to the right, since the frozen exponent cannot be changed.

    picture      value      stored as
    -------      -----      ---------
      9           3         0.00003E5
      9.9         3.0       0.00030E4
      9.99        3.00      0.00300E3
     99.9        34.5       0.00345E4
     99.99       34.50      0.03450E3
    999.99      345.67      0.34567E3
    999.999      34.5678      error
   9999.99    34567.80       overflow

Note that in these examples the stored values 0.00003E5, 0.00030E4 and 0.00300E3 represent the same value, but they represent different precisions, viz. 0, 1 and 2.  The same holds for the values 0.00345E4 and 0.03450E3 with the precisions 1 and 2.   A special assignment operator which will be described later will retain the precision of the receiving number.

Equalities, fixing floats

The computer must have two different relational operators for testing equality. One operator tests the bitwise equality and the other tests the arithmetic equality. For example:
  0.00999E4 = 0.09990E3
         => false at test on bit-equality
         => true at test on numeric equality

The old Burroughs 6700 had two similar tests. In its language Extended-Algol the test on bit-equality was written as "IS" and the test on numerical equality was written as "=" or "EQL".   Thus "A IS B" and "A=B" or "A EQL B".   Of course bit-equality always implies numerical equality. The reverse is not true.
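
The two tests can be imitated in Python with the decimal module. This is only an approximation: Python's Decimal is not a fixed-width bit pattern, so comparing the stored (sign, digits, exponent) triple is used here as a stand-in for the bitwise test.

```python
from decimal import Decimal

a = Decimal("0.00999E4")  # value 99.9, precision 1
b = Decimal("0.09990E3")  # value 99.90, precision 2

numeric_equal = (a == b)                    # compares the values only
bit_equal = (a.as_tuple() == b.as_tuple())  # compares digits and exponent too

print(numeric_equal, bit_equal)
```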

In some modern computers a variable goes into the fixed-point mode when the exponent of a floating-point number is at its lowest value and should go below it. The computer stops shifting the coefficient to the left to save the rightmost digits of the value, since otherwise the exponent would fall below its minimum. The coefficient has become un-normalizable and so its relative accuracy may decrease. As soon as the leftmost non-zero digit of the coefficient would go too far to the left, the number returns to the floating-point mode. (A similar thing happens in the binary floating-point system IEEE-754, which is used by nearly all modern computers.)

Saving the exponent

Note that the fixed-point notation has a vile snag. It works fine only when an ordinary value is stored in the receiving number. When an overflow or error value arises this value must be stored perhaps in the same way as in the floating-point notation. Then in many computers the exponent value is damaged and thus lost. The exponent needs to be re-initialized before another value is stored in the number.

This re-initialization can be performed either by the software or by the hardware. The latter is not always possible. Luckily it is possible in the PDE-definition IEEE-754-2008 when the hardware is extended slightly. According to this definition the special value is stored in the first byte of the number. The extension is that immediately before this storage the first byte is saved by copying it into the last byte of the number. The damage thus done to the coefficient is harmless. The other bytes of the number must remain unscathed and thus keep their old information, to preserve the other bits of the exponent. Prior to the storage of an ordinary value the saved byte is copied back into the first byte. Thus the exponent value is restored.

To keep the floating-point numbers and the fixed-point numbers in the PDE-definition IEEE-754-2008 compatible, the hardware should always save the first byte when a special value is stored, irrespective of its operation mode, float or fixed.
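
The byte-saving extension can be sketched in Python on a byte array. Several details are assumptions made for this illustration only: the function names, the sample word, the first-byte value 0x7C for a quiet NaN, the test for a special value (combination field starting with 1111), and the guard that avoids saving a byte that already holds a special value.

```python
QUIET_NAN = 0x7C  # assumption: sign 0, combination field 11111, quiet


def is_special(first_byte):
    # Assumption: a combination field starting with 1111 marks infinity or NaN.
    return (first_byte & 0x78) == 0x78


def store_special(word, special_first_byte=QUIET_NAN):
    """Store a special value; the first byte is saved into the last byte."""
    if not is_special(word[0]):
        word[-1] = word[0]        # damages the coefficient, which is harmless
    word[0] = special_first_byte  # all other bytes stay untouched


def restore_exponent(word):
    """Copy the saved byte back before an ordinary value is stored."""
    if is_special(word[0]):
        word[0] = word[-1]


# Hypothetical 64-bit word; only its first byte matters for the sketch.
word = bytearray([0x22, 0x38, 0x00, 0x00, 0x00, 0x00, 0x03, 0x45])
store_special(word)
print(word.hex())
restore_exponent(word)
print(word.hex())
```

After the restore the first seven bytes are as before; the last byte still holds the saved copy until an ordinary coefficient is written over it.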

Three assignment operators

One might assume that the hardware for the mathematical calculations becomes very complex since it should contain both versions of each elementary arithmetic operator, the float-version and the fixed-version. Actually only the assignment operator needs to be doubled, or even be tripled.

All arithmetic operators like +, -, *, /, sin, log and so on accept all data, float and fixed. They do not discern between them. Their resulting output is always a float number. Its exponent can have any possible value, not only a predefined one. Generally the addition and subtraction of two values with equal exponents will result in an output with the same exponent, but even that is not obligatory.

There must be two different versions of the assignment operator B=A, one with floating-point output and the other with fixed-point output. The first one works in the well-known way: it simply stores the bit-pattern of its output into the receiving number and thus destroys completely the original contents in that number.

The fixed-point version works differently. First it looks at the exponent notated in the receiving number. It will not change that exponent. But it shifts the coefficient of its output such that the position fits to the exponent of the receiver. Then this shifted coefficient is copied to the receiving number. So only the coefficient is rewritten, not the exponent. (Of course the +/- sign is rewritten too.)

Examples:
The old value in the receiver B is 0.00090E4.  The input value A is 0.00007E5.  The result of the assignment B=A is 0.00070E4.  This is a shift of the coefficient to the left.
Old value of B is 0.00999E4.  Input value A is 0.07770E3.  Result in B becomes 0.00777E4.  This is a shift to the right.

When because of a shift to the left a non-zero digit passes the left edge of the coefficient the receiving number cannot store properly the value, and an error will occur. When because of a shift to the right one or more non-zero digits pass beyond the right edge of the coefficient, a rounding operation is invoked. The right-protruding digits are discarded and sometimes the remaining part of the coefficient is increased by one. The assignment operator must be told by its calling program which rounding operation to apply, e.g. round-off or cut-off (= 'truncation') or banking-rounding, etc.

When the exponent has to be kept constant during the full length of a chain addition (e.g. the list of the financial data on an invoice) then after each single addition the fixed-point assignment has to be applied.

Actually the computer must have three different types of the assignment operation B=A.  They are in an example with the starting values  B = 9.00090E4  and  A = 0.00777E3:

     operator type                         resulting B
     -------------                         -----------
   bitwise assignment (= copy exactly)      0.00777E3
   fixed assignment (e.g. cut-off mode)     0.00077E4
                    (e.g. round-off mode)   0.00078E4
   float assignment gives one of
            its possible results:           7.77000E0
                                            0.77700E1
                                            0.07770E2
                                            0.00777E3
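
The three operators can be sketched in Python as follows. This is a simplified illustration, not a hardware description: a number is modelled as a pair (coefficient, exponent) with a six-digit coefficient, the sign is left out, 'round-off' is taken to mean rounding halves upward, and the float assignment returns only the fully normalized one of its possible results. All function names are hypothetical.

```python
DIGITS = 6  # coefficient width used in the examples


def parse(text):
    """Read e.g. '0.00777E3' into (777, 3): coefficient digits and exponent."""
    mantissa, exp = text.split("E")
    digits = mantissa.replace(".", "")
    if len(digits) != DIGITS:
        raise ValueError("expected six coefficient digits")
    return int(digits), int(exp)


def show(num):
    coeff, exp = num
    digits = f"{coeff:0{DIGITS}d}"
    return f"{digits[0]}.{digits[1:]}E{exp}"


def bitwise_assign(receiver, source):
    """Copy exactly: the old contents of the receiver are destroyed."""
    return source


def fixed_assign(receiver, source, mode="cut-off"):
    """Keep the receiver's exponent and shift the source coefficient to fit."""
    shift = receiver[1] - source[1]  # positive means a shift to the right
    if shift >= 0:
        coeff, dropped = divmod(source[0], 10 ** shift)
        if mode == "round-off" and 2 * dropped >= 10 ** shift:
            coeff += 1
    else:
        coeff = source[0] * 10 ** (-shift)
    if coeff >= 10 ** DIGITS:
        raise OverflowError("a non-zero digit passes the left edge")
    return coeff, receiver[1]


def float_assign(receiver, source):
    """Return the normalized form: leading coefficient digit non-zero."""
    coeff, exp = source
    while coeff and coeff < 10 ** (DIGITS - 1):
        coeff *= 10
        exp -= 1
    return coeff, exp


b, a = parse("9.00090E4"), parse("0.00777E3")
print(show(bitwise_assign(b, a)))
print(show(fixed_assign(b, a)), show(fixed_assign(b, a, "round-off")))
print(show(float_assign(b, a)))
```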

Conclusion

The addition of only one operator to the hardware of a computer operating with decimal floating-point numbers makes that computer much more versatile. This operator is the 'fixed-point assignment' that does not change the output exponent.

Thus the modern Packed Decimal Encoding IEEE-754-2008 can be used also for many fixed-point applications simply by extending the hardware with that operator. Then when a special value is stored the hardware should always save the exponent by copying the number's first byte into its last byte.




UNIVERSAL NAMES FOR DATA TYPES

Introduction

This text proposes new conventions for naming the arithmetic data in order to get rid of the present-day mess of names. This proposal is based on a series of computer-word lengths that often are powers of two, like in the IBM-360 (see elsewhere in this internet site), and on the storage of the numeric values according to the definitions IEEE-754 and IEEE-754-2008.

Discarding the old word-length names

The arithmetic data are stored in bit groups that are called 'computer words'. Each of these units has a number of bits which is called length. The possible length is one out of a set of lengths predefined by the computer hardware and thus by its manufacturer. The name of such a word length should depend only on the length itself, never on the use of the word and thus independent from the arithmetic categories.

The length-naming proposal is intended primarily to discard the many length indicators ubiquitously used in the present-day realm of programming, like ShortWord, LongWord, HalfWord, FullWord, SingleWord, DoubleWord, QuadrupleWord (= QuadWord), TwinWord. These indicators are often used in a fuzzy way since their meaning changes over time and depends on the length of the word, which varies between computer models.

Often the fairly small word length of Digital's PDP-11 is taken as the reference, which is 16 bits. Therefore the ordinary 32-bit word in a PC is called a DoubleWord!  The computer world seems to stick to this small word while the arithmetic data become ever bigger.

Length indications like 128, 64 or 32 are equally clumsy for many users. The program text gets filled with such numbers that actually are not numbers, and thus becomes less clear. Often these 'numbers' are connected to a preceding name by an underscore or a hyphen (= minus-sign), sometimes by an asterisk. These symbols, meant to improve readability, actually make it worse.

Therefore this proposal first advocates the use of names instead of such numbers. These new names must be accompanied by clear and uniform definitions. Ideally every programming language uses the same name for the same length of data item.

A Cobol committee still advises the use of numbers. But these numbers are not based on the length of the computer words in bits or bytes, but on the guaranteed accuracy with which the numeric value can be stored. This advice is described at the end of the proposal.

New length names

Length names like 'long' and 'short' are proposed, but now in the way the manufacturers of clothes (e.g. T-shirts) apply them. Therefore many of the proposed names for the different word lengths are derived from this apparel-size naming, in which X means extra, S = short, M = medium and L = long.  The letter E can mean extra too, but it is not used here. It is reserved for later use.

The names Small and Large should not be used as synonyms for Short and Long, since these names can also give an indication about the value of the number. A 'small number' can mean a number with a tiny value, e.g. 1.E-80.  The word 'short' can mean only a small number of bits. The same holds for the words 'large' (e.g. 1.E+800) and 'long'. Therefore the programming folks should agree to use the words 'small' and 'large' solely for the value in the number, and the words 'short' and 'long' only for the length in bits of the number.

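The new length names are divided into three groups. The lengths in each group fulfill a mathematical formula. The groups are:
- the basic lengths, which are powers of two: 1, 2, 4, 8, 16, ... bits,
- the enhanced lengths, which are 5/4 times a basic length: 5, 10, 20, 40, ... bits,
- the triple-half lengths, which are 3/2 times a basic length: 3, 6, 12, 24, ... bits.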

Each length indicator can be written either by its full name or by an abbreviation. The indicator for an enhanced length equals that of the corresponding basic length with a prefix. This prefix is ENHANCED or A (from enhAnced). Similarly the triple-half length has the prefix TRIPLEHALF (no '-') or T.

So the list of the new length names becomes very long:

Possible length indicators:
#bits    abbrev.    f u l l   n a m e   comments
-----    -------    -----------------   --------
   1         B           BIT            B means bit, not byte
   2        DB           DUOBIT         name 'quit' is not used
   3       TDB      TRIPLEHALF DUOBIT
   4         N           NIBBLE
   5        AN       ENHANCED NIBBLE
   6        TN      TRIPLEHALF NIBBLE   sixbit
   8         Y           BYTE           Y also from babY-size
  10        AY       ENHANCED BYTE
  12        TY      TRIPLEHALF BYTE
  16        DY           DUOBYTE        this equals exactly XS
  20       ADY       ENHANCED DUOBYTE   this equals exactly AXS
  24       TDY      TRIPLEHALF DUOBYTE  this equals exactly TXS

  16        XS           EXTRA SHORT
  20       AXS       ENHANCED EXTRA SHORT
  24       TXS      TRIPLEHALF EXTRA SHORT
  32         S           SHORT
  40        AS       ENHANCED SHORT
  48        TS      TRIPLEHALF SHORT
  64         M           MEDIUM
  80        AM       ENHANCED MEDIUM
  96        TM      TRIPLEHALF MEDIUM
 128         L           LONG
 160        AL       ENHANCED LONG
 192        TL      TRIPLEHALF LONG
 256        XL           EXTRA LONG
 320       AXL       ENHANCED EXTRA LONG
 384       TXL      TRIPLEHALF EXTRA LONG

Several notes:
-- When needed this list can be extended easily with the giant XXL, AXXL and TXXL formats (resp. 512, 640 and 768 bits).
-- The 16-bit word has two names: duobyte and extra short. For the computer there is no difference between the two, but for a human reader there can be. To help the program documentation it is advisable to use the name XS for numerics and the name DY for non-numerics. The same holds for the enhanced and triple-half versions.
-- The medium length is fairly long already: 64 bits, not 32 or 16 bits. And 'long' is already 128 bits, not 64 like often now.
-- The following three lengths do not exist, since they would result in broken bits:

Forbidden length indicators:
#bits    abbrev.    f u l l   n a m e   comments
-----    -------    -----------------   --------
 1.25       AB       ENHANCED BIT       not existing
 1.50       TB      TRIPLEHALF BIT      not existing
 2.50      ADB       ENHANCED DUOBIT    not existing

Length names for actual computers

The lengths of the (partial) word formats in most computers can be described by this naming system, but alas not in all computers. For example, the lengths used by the old Univac-1100 (36 bits), the old Digital PDP-10 (36 bits) and the old Philips EL-X8 (27 bits) cannot. But all lengths in the old Burroughs 6700 and its relatives can. According to this proposal their names would be:

Burroughs 6700, 7700, 7900 and Unisys-A:
#bits    abbrev.      used for
-----    -------      --------
   1         B        shortest unit of information
   3       TDB        octade = octal digit in 'binary' number
   4         N        decimal 'text'-digit in BCD
   6        TN        text character in BCL
   8         Y        text character in EBCDIC
  48        TS        full word
  96        TM        double word

The families of the Digital PDP-11 and the IBM-360 use basic lengths only:

DEC-PDP-11 and IBM-360:
#bits  abbrev.  full name     comments
-----  -------  ---------     --------
   1      B     BIT           shortest unit of information
   4      N     NIBBLE        bin.coded digit; for commerce
   8      Y     BYTE          text character in ASCII / EBCDIC
  16     DY     DUOBYTE      (text character in Unicode)
  16     XS     EXTRA SHORT   DEC's basic word; short integer
  32      S     SHORT         IBM's basic word; long-int; float
  64      M     MEDIUM        double-precision/long float
 128      L     LONG          DEC-octaword; quadr./extend.float

The lengths used by a modern Intel-PC are:

Intel-PC:
#bits  abbrev.  full name              comments
-----  -------  ---------              --------
   1      B     BIT                    shortest unit of info.
   8      Y     BYTE                   for ASCII character
  16     DY     DUOBYTE                this equals exactly XS
  16     XS     EXTRA SHORT            for Unicode character
  24    TXS     TRIPLEHALF EXTRA SHORT   for ATi-video graphics
  32      S     SHORT                  for float + long-integer
  64      M     MEDIUM                 for double-precision
  80     AM     ENHANCED MEDIUM       for 8087 Math.Coprocessor

The computer that calculates according to all IEEE-754(-2008) definitions may handle the lengths in the following list that embraces the Intel-PC list above:

Computer using IEEE-754(-2008):
#bits  abbrev.  full name              comments
-----  -------  ---------              --------
   1      B     BIT                    smallest unit of info.
   4      N     NIBBLE                 for BCD-coded digit
   8      Y     BYTE                   for ASCII character
  10     AY     ENHANCED BYTE          for a 3-digits DPD-unit
  16     DY     DUOBYTE                for Unicode character,
  16     XS     EXTRA SHORT             and for video graphics
  24    TXS     TRIPLEHALF EXTRA SHORT      for video graphics
  32      S     SHORT                  for float + long-integer
  64      M     MEDIUM                 for double + medium-PDE
  80     AM     ENHANCED MEDIUM        for FPP 8087 by Intel
 128      L     LONG                   for long-PDE

Of course, no computer will ever use all lengths in the giant table, otherwise its hardware would become much too complex. Nevertheless all words and abbreviations in that table should be reserved for the length indications. In the language Cobol perhaps they all should become reserved words. The examples illustrate that length indicators not used by one computer are used by another, and that indicators presently not used might be used in the future.

Comparison of length names

Every manufacturer and institution uses its own names for the word lengths. Here is a listing of several of them. Herein [xx] means 'Possible, although not used actually', and (n/a) means 'Not applicable or not applied'.

       Fortran  Microsoft  Digital     Internat.     JHM.Bonten
         e.g.     e.g.       DEC      Busin.Mach.    in this
#bytes  real*i   int__k     PDP-11      IBM-360      proposal
------  ------  --------  ---------  ------------   -----------
 1/8    (n/a)   [ __1 ]      bit          bit         bit
 1/4    (n/a)   [ __2 ]     (n/a)        (n/a)        duobit
 1/2    (n/a)   [ __4 ]     nibble       nibble       nibble
  1      *1       __8        byte         byte        byte
  2      *2       __16       word      half word    extra short
  4      *4       __32    long word   (full) word    short word
  8      *8       __64    quad word   double word   medium word
 16      *16      __128   octa word   extend.word     long word
 32     [*32]   [ __256 ]    (n/a)       (n/a)      extra long

This table shows that every designer uses his own definition for the (medium-)word. DEC gives it 16 bits, IBM gives it 32 bits, and in this proposal it gets 64 bits.

Numeric categories

Besides the length of the data, the way in which they are used is important. All data that are handled in the same way are grouped together into a 'category'. Thus several arithmetic categories exist. Five (or seven) of those exist in the realm of the IEEE-754(-2008) definitions. The names for these categories should become:

abbrev.    full name                comments
-------    ---------                --------
UNSIG      UNSIGNED                 always binary integer
BININT     BINARY INTEGER           generally in 2-complement
BINFLOT    BINARY FLOATING POINT    with the hidden bit
BINFIX     BINARY FIXED POINT       (not defined at present)
DECINT     DECIMAL INTEGER          (not defined at present)
DECFLOT    DECIMAL FLOATING POINT   with the DPD compression
DECFIX     DECIMAL FIXED POINT      see the proposal above

Category and length together

The complete type-name of a word for arithmetic use consists of the name of the arithmetic category followed by the name of the length. This concatenation can be done either with the full names or with the abbreviations. For example one may get:

full name                           abbreviation
---------                           ------------
UNSIGNED BYTE                       UNSIG B
UNSIGNED EXTRA SHORT                UNSIG XS
BINARY INTEGER MEDIUM               BININT M
BINARY FLOAT ENHANCED MEDIUM        BINFLOT AM
DECIMAL FIXED EXTRA LONG            DECFIX XL

Not all combinations of arithmetic categories and length indicators are possible. A few of those which cannot exist are UNSIGNED EXTRA LONG and DECIMAL FLOAT NIBBLE.  Nevertheless, for the sake of security in the future every combination of a category name and a length indicator must be seen as a syntactically legal type name, even when the result is such an 'impossible' data item. In this way future names like BINARY INTEGER EXTRA LONG are already protected now.

The following table gives the combinations that are possible presently:

length ->   Y    XS   TXS     S     M    AM     L    XL

UNSIG       1     1     0     1     1     0     0     0

BININT      1     1     0     1     1     0     0     0

BINFLOT     1     1     1     1     1     1     1     0

BINFIX   (  1     1     0     1     1     1     1     0  ) ??

DECINT   (  0     1     0     1     1     0     1     0  ) ??

DECFLOT     0     1     0     1     1     0     1     0

DECFIX      0     1     0     1     1     0     1     0
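
The distinction between a syntactically legal name and a presently possible combination can be sketched in C as a small lookup table. This is only an illustration: the names LENGTHS, ROWS and combination_possible are hypothetical, and the flags are copied from the table above, including the two rows that are marked with '??'.

```c
#include <assert.h>
#include <string.h>

/* Length abbreviations in the column order of the table above. */
enum { N_LENGTHS = 8 };
static const char *const LENGTHS[N_LENGTHS] =
    { "Y", "XS", "TXS", "S", "M", "AM", "L", "XL" };

/* One row per category; each flag is a 1 or 0 copied from the table. */
struct category_row {
    const char *abbrev;
    unsigned char possible[N_LENGTHS];
};

static const struct category_row ROWS[] = {
    { "UNSIG",   {1, 1, 0, 1, 1, 0, 0, 0} },
    { "BININT",  {1, 1, 0, 1, 1, 0, 0, 0} },
    { "BINFLOT", {1, 1, 1, 1, 1, 1, 1, 0} },
    { "BINFIX",  {1, 1, 0, 1, 1, 1, 1, 0} },  /* marked '??' in the table */
    { "DECINT",  {0, 1, 0, 1, 1, 0, 1, 0} },  /* marked '??' in the table */
    { "DECFLOT", {0, 1, 0, 1, 1, 0, 1, 0} },
    { "DECFIX",  {0, 1, 0, 1, 1, 0, 1, 0} },
};

/* Returns 1 when the combination is presently possible, 0 when the name
   is legal but the data item is 'impossible', and -1 when the category
   or the length abbreviation is unknown. */
int combination_possible(const char *category, const char *length)
{
    size_t r, c;

    assert(category != NULL && length != NULL);

    for (r = 0; r < sizeof ROWS / sizeof ROWS[0]; r++) {
        if (strcmp(ROWS[r].abbrev, category) != 0)
            continue;
        for (c = 0; c < N_LENGTHS; c++) {
            if (strcmp(LENGTHS[c], length) == 0)
                return ROWS[r].possible[c];
        }
        return -1;  /* known category, unknown length */
    }
    return -1;      /* unknown category */
}
```

With this sketch, combination_possible("BINFLOT", "TXS") yields 1, while combination_possible("UNSIG", "XL") yields 0: the name is legal but the item does not exist at present.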

Naming in actual languages

The proposed way of naming should be adopted in every programming language, Cobol, Fortran, Algol, C/C++, Java, and so on, although adapted to the specific language. Thus the names and the usable arithmetic operations become fully compatible between all languages. There will be no confusion about the meaning of the names and the results of the operations.

In the language Cobol both the full name and the abbreviation can be used. The name parts are connected by hyphens. In Algol, Fortran, C/C++ and Java only the abbreviation can be used. In Algol and Fortran the underscore is used for the connection. In C/C++ and Java the connection is performed by the tactical use of uppercase and lowercase letters. Other languages will get their own adaptation of the naming conventions. Thus one gets for example:

Cobol:     BINARY-FLOAT-TRIPLEHALF-EXTRA-SHORT
           BINFLOT-TXS
Fortran:   BINFLOT_TXS
           binflot_txs
C/C++:     binflotTXS

Cobol:     DECIMAL-FIXED-LONG
           DECFIX-L
Fortran:   DECFIX_L
           decfix_l
C/C++:     decfixL
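
The connection rules for the abbreviations can be sketched as one small C function. It is a minimal illustration, assuming that the category and length abbreviations are supplied in uppercase; the names format_type_name and enum language are hypothetical. Only the uppercase Fortran form is produced, and the full Cobol names are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum language { LANG_COBOL, LANG_FORTRAN, LANG_C };

/* Writes the abbreviated type name for the given language into 'out'.
   Returns the length of the name, or -1 when the category is too long
   for the internal buffer or the language is unknown. */
int format_type_name(enum language lang, const char *category,
                     const char *length, char *out, size_t out_size)
{
    char lower[32];
    size_t i;

    assert(category != NULL && length != NULL && out != NULL);

    switch (lang) {
    case LANG_COBOL:    /* parts connected by a hyphen */
        return snprintf(out, out_size, "%s-%s", category, length);
    case LANG_FORTRAN:  /* parts connected by an underscore */
        return snprintf(out, out_size, "%s_%s", category, length);
    case LANG_C:        /* lowercase category, uppercase length */
        if (strlen(category) >= sizeof lower)
            return -1;
        for (i = 0; category[i] != '\0'; i++)
            lower[i] = (char)tolower((unsigned char)category[i]);
        lower[i] = '\0';
        return snprintf(out, out_size, "%s%s", lower, length);
    }
    return -1;
}
```

Called with the category BINFLOT and the length TXS, the three languages give BINFLOT-TXS, BINFLOT_TXS and binflotTXS, as in the examples above.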

Two sets of compilers

The next part of the proposal would be to discard all present compilers of all languages. Of course this is impossible, since it would lead to a giant mess in the world. Instead, two compilers should be made for every language. One compiler uses the old names for the arithmetic data and the other uses the new names. The 'old' compiler will not be updated to handle types of arithmetic data other than those it already handles. The 'new' compiler is not able to handle the old data-type names. Thus in one program text either the old names or the new names can be used, but not both simultaneously. Consequently no confusion between the old and the new names will arise.

Nevertheless the 'old' programs remain compilable and can always be updated. Linkage between them and the 'new' ones also remains possible, since the compatibility between the object codes is not lost. Novice programmers should primarily learn to use the new names.

Example in three languages

The following three subroutines are equivalent:

'new' Cobol

IDENTIFICATION DIVISION.
PROGRAM-ID. SIMPLE-DIVIDER.
DATA DIVISION.
LINKAGE SECTION.
  01  NUMERATOR      BINARY-INTEGER-EXTRA-SHORT.
  01  DENOMINATOR    BINFLOT-S.
  01  QUOTIENT       BINFLOT-M.
PROCEDURE DIVISION USING NUMERATOR, DENOMINATOR, QUOTIENT.
DIVIDE NUMERATOR BY DENOMINATOR GIVING QUOTIENT.
EXIT PROGRAM.

'new' Fortran

SUBROUTINE SIMPLE_DIVIDER (NUMERATOR, DENOMINATOR, QUOTIENT)
BININT_XS NUMERATOR
BINFLOT_S DENOMINATOR
BINFLOT_M QUOTIENT
QUOTIENT = NUMERATOR / DENOMINATOR
END

'new' C/C++

void SimpleDivider ( binintXS* Numerator,
                     binflotS* Denominator,
                     binflotM* Quotient )
{ *Quotient = *Numerator / *Denominator ; }

In the 'old' languages this subroutine is written as:

'old' Fortran

SUBROUTINE SIMPLE_DIVIDER (NUMERATOR, DENOMINATOR, QUOTIENT)
INTEGER*2 NUMERATOR
REAL DENOMINATOR
DOUBLE PRECISION QUOTIENT
QUOTIENT = NUMERATOR / DENOMINATOR
END

'old' C/C++

void SimpleDivider ( short int* Numerator,
                     float* Denominator,
                     double* Quotient )
{ *Quotient = *Numerator / *Denominator ; }

'old' Cobol

not possible.
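
For comparison, the 'new' C/C++ names can be imitated in present-day C with typedefs. This is only a sketch: it assumes that short has 16 bits and that float and double are the IEEE-754 binary formats of 32 and 64 bits, which is common but not guaranteed by the C standard. The three type names are taken from the 'new' C/C++ example above; a real 'new' compiler would provide them as built-in types.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed correspondences, valid on most present platforms only. */
typedef short  binintXS;   /* BINARY INTEGER EXTRA SHORT, 16 bits */
typedef float  binflotS;   /* BINARY FLOAT SHORT, 32 bits */
typedef double binflotM;   /* BINARY FLOAT MEDIUM, 64 bits */

/* Same subroutine as in the examples above, written with the new names. */
void SimpleDivider(binintXS *Numerator,
                   binflotS *Denominator,
                   binflotM *Quotient)
{
    assert(Numerator != NULL && Denominator != NULL && Quotient != NULL);
    *Quotient = *Numerator / *Denominator;
}
```

Because a typedef is only another name for an existing type, the object code is the same as for the 'old' version, which matches the claim that linkage between old and new programs remains possible.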

Complex data

A number of the type COMPLEX is a structure that consists of two ordinary numbers. In theory each of these numbers might belong to its own category and have its own length. Thus a huge bunch of different complex types might arise, in fact far over a hundred types. This is unmanageable. So a restriction must be made.

Complex data are used only for scientific and technical purposes. Therefore both numbers should be binary, floating point and not too short. To make things even easier, both numbers should have exactly the same length. Thus only four complex types are left.

This number of four types is doubled since there are two different ways of using complex data: as a Cartesian system (C-complex) and as a polar system (P-complex). In the Cartesian system the first number is called the REAL part and the second number the IMAGINARY part. In the polar system the numbers are called the RADIUS and the ANGLE respectively; the angle is often called PHI and is always in radians. In this system the radius should always be non-negative.

Thus the table of the categories should be extended by:

abbrev.    full name                comments
-------    ---------                --------
CCOMPLEX   CARTESIAN COMPLEX        structure: {RE, IM}
PCOMPLEX   POLAR COMPLEX            structure: {RAD, PHI}

Both numbers in the complex number are always BINARY FLOATING POINT and always have the same length. Since the complex number consists of two numbers, its length is twice that of one such number. The length name of the complex number is the length name of one composing number.
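
The two structures can be sketched in C as follows, here for the length MEDIUM and assuming that double is the 64-bit binary floating-point format. The type and function names are hypothetical. Multiplication is chosen as the example operation because it shows why both systems are useful: in the polar system it needs only one product and one sum.

```c
#include <assert.h>

/* CCOMPLEX M: structure {RE, IM}, two numbers of equal length. */
typedef struct { double re, im; } ccomplexM;

/* PCOMPLEX M: structure {RAD, PHI}, PHI in radians, RAD non-negative. */
typedef struct { double rad, phi; } pcomplexM;

/* Cartesian product: (a + bi)(c + di) = (ac - bd) + (ad + bc)i. */
ccomplexM ccomplex_mul(ccomplexM x, ccomplexM y)
{
    ccomplexM z;
    z.re = x.re * y.re - x.im * y.im;
    z.im = x.re * y.im + x.im * y.re;
    return z;
}

/* Polar product: the radii are multiplied and the angles are added.
   The resulting angle is not reduced to any particular interval. */
pcomplexM pcomplex_mul(pcomplexM x, pcomplexM y)
{
    pcomplexM z;
    assert(x.rad >= 0.0 && y.rad >= 0.0);
    z.rad = x.rad * y.rad;
    z.phi = x.phi + y.phi;
    return z;
}
```

On the platforms assumed here each structure occupies 128 bits, twice the length of one composing number.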

Consequently, for the complex numbers only, the table with the lengths must be modified into:

#bits  abbrev.  full name              comments
-----  -------  ---------              --------
  64      S     SHORT                  64 bits = 2 x 32 bits
 128      M     MEDIUM                128 bits =... and so on.
 160     AM     ENHANCED MEDIUM        When FPP 8087 is used
 256      L     LONG

Then the table with the possible category-length combinations can be extended by:

length ->   Y    XS   TXS     S     M    AM     L    XL

CCOMPLEX    0     0     0     1     1     1     1     0

PCOMPLEX    0     0     0     1     1     1     1     0

Since complex arithmetic is used only in scientific and some technical applications, it may not be implemented in many languages. Of course Fortran will get it and Cobol will not. Fortran has had the Cartesian complex for over 40 years already. Even today many modern languages do not have it.

Conversely, for nearly 50 years Cobol has handled numeric data of a category that Fortran and C/C++ do not have. It is the category Display, which is described in the Proposals for Cobol.

Names telling the guaranteed accuracy

The Cobol committee SC22-WG4 proposes not to use the length-name suffix in the type names for the floating-point or fixed-point numerics, but to include the number of digits their coefficients can stand for. This number says more to many users than an abstract length-name suffix. It gives a good indication of the applicability of each numeric. For the same reason, using the word length in bits as a suffix (as Microsoft does) should be detested even more than the length name.

The committee wants to mention the maximum number of digits. However, it might be better to mention the number of digits the coefficient can stand for at least, i.e. the guaranteed accuracy. Best of all is to mention the integer truncation of this accuracy. Then the number is the same for the binary and the decimal numerics of equal word length when they are written in the IEEE-754(-2008) format.

The table lists the word lengths and the accuracies:

    -- computer-word --     ---- guaranteed accuracy -----
 [not complex]  length       754     754-2008     after
     # bits      name       binary    decimal   truncation
     ------     ------      ------   --------   ----------
        16        XS          3.0        3           3
        32         S          6.9        6           6
        64         M         15.6       15          15
       128         L         33.7       33          33
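
The binary column of this table can be reproduced with a common rule of thumb, which is an assumption here since the text does not state a formula: a binary coefficient of p bits (hidden bit included) guarantees (p - 1) x log10(2) decimal digits. For the four IEEE-754-2008 binary formats p is 11, 24, 53 and 113. The function names below are hypothetical.

```c
#include <assert.h>

/* log10(2), written out so that no mathematical library is needed. */
#define LOG10_OF_2 0.30102999566398120

/* Guaranteed number of decimal digits of a binary coefficient with
   'precision_bits' bits, the hidden bit included (assumed rule). */
double binary_guaranteed_digits(int precision_bits)
{
    assert(precision_bits >= 1);
    return (precision_bits - 1) * LOG10_OF_2;
}

/* Integer truncation of the accuracy, as used for the name suffix. */
int truncated_accuracy(double guaranteed_digits)
{
    assert(guaranteed_digits >= 0.0);
    return (int)guaranteed_digits;
}
```

For p = 24 the rule gives about 6.9 digits, which truncates to the suffix 6 of the 32-bit word in the table.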

The truncated guaranteed accuracy is applied as a suffix, as the following examples show:

    n u m e r i c   t y p e s   e x a m p l e s   w i t h :

   # bits        length-name     max.# digits    guar.accuracy
 ----------      -----------     ------------    -------------
 BINFLOT-16       BINFLOT-XS      BINFLOT-4       BINFLOT-3
 BINFLOT-32       BINFLOT-S       BINFLOT-7       BINFLOT-6
 DECFLOT-64       DECFLOT-M       DECFLOT-16      DECFLOT-15
  DECFIX-128       DECFIX-L        DECFIX-34       DECFIX-33
            >>-------- more convenient -------->>
