FOR FIXED-LENGTH DECIMALS

AND FOR TYPE NAMES

First date of publication: 21 December 2006

General modification: 18 July 2007

Corrections in third chapter: 18 September 2007

Renaming 754r into 754-2008: 23 September 2008

Addition of manufacturer length names: 10 February 2009

- unsigned PDE-format
- fixed-point numbers
- universal names for data types
- introduction
- discarding the old word-length names
- new length names
- length names for actual computers
- comparison of length names
- numeric categories
- category and length together
- naming in actual languages
- two sets of compilers
- example in three languages
- complex data
- names telling the guaranteed accuracy

The three chapters in this document are fully independent. The two chapters 'unsigned PDE-format' and 'fixed-point numbers' are proposals for types of decimal numbers additional to the three types in the Packed Decimal Encoding IEEE-754-2008. The third chapter proposes a redefinition of the names of numeric data types for a better inter-language communication.

The fourth mode of the hypothetical multi-decimal computer leads to a proposal for a second extra small PDE-format. It is unsigned and can be used in video-graphics cards. It can also serve in the education of students.

Video tones (colors and luminances) are never negative: a negative value would give the same result as the zero value, namely black. So the value that expresses the intensity of a tone is always zero or positive. This allows the sign bit to be dropped in favor of one extra bit in the exponent tail of the extra small PDE-format.

The data-word structures of both extra small PDE-formats are:

    +---+-------------------+-----------------------------+
    |+/-| combination field |         10 bits for         |
    |si | for exponent, one |         3 digits in         |
    | gn| digit and specials|   Densely Packed Decimal    |
    +---+-------------------+-----------------------------+
    1 b.       5 bits                  10 bits

    +-------------------+---+-----------------------------+
    | combination field |exp|         10 bits for         |
    | for 2 exp.bits, 1 |ta |         3 digits in         |
    | digit and specials| il|   Densely Packed Decimal    |
    +-------------------+---+-----------------------------+
           5 bits       1 b.           10 bits

In the latter format the single bit in the exponent tail also serves for indicating whether a NaN is quiet or signaling. In the hypothetical decimal computer this is done by the first bit of the DPD-group.

Listed side by side, the two formats are:

    Type of number --->                      Fourth      Un-
                                             mode      signed

    --- Field structure in bits ---
    Length of total number field                 16         16
    Length of sign                                1          0
    Length of combination field                   5          5
    Length of exponent tail                       0          1
    Length of coefficient tail                   10         10

    --- Exponent (fully binary) ---
    Total number of bits                          2          3
    Maximum Integer value                         2          5
    Minimum Integer value                         0          0
    Excess bias offset                            1          2
    Maximum Actual value                          1          3
    Minimum Actual value                         -1         -2

    --- Coefficient (in decimal) ---
    Number of 10-bit DPD-groups                   1          1
    Number of digits in tail                      3          3
    Number of digits before decimal point         1          1
    Total number of digits                        4          4
    Accuracy normal.guar. in dec.digits           3          3

    --- Numeric values (absolute) ---
    Maximum value, approximately                100      10000
    Minimum value, normalized                   0.1       0.01
    Minimum nonzero un-normalized            0.0001    0.00001
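The exponent and value rows of the listing can be recomputed from the field lengths. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes (as in IEEE-754-2008) that the combination field offers three states for its two exponent bits, which agrees with the maximum integer values 2 and 5 in the listing. The bias values are taken from the listing itself.

```python
from decimal import Decimal

def exponent_range(tail_bits, bias, comb_states=3):
    """Return (max_actual, min_actual) of the exponent.

    comb_states=3 is an assumption: the two exponent bits in the
    combination field take the values 0, 1 and 2 only.
    """
    max_integer = comb_states * 2 ** tail_bits - 1
    return max_integer - bias, 0 - bias

def value_range(max_exp, min_exp, tail_digits=3):
    """Return (largest, smallest normalized, smallest nonzero) value."""
    largest = Decimal("9." + "9" * tail_digits).scaleb(max_exp)
    smallest_normalized = Decimal(1).scaleb(min_exp)
    smallest_nonzero = Decimal(1).scaleb(min_exp - tail_digits)
    return largest, smallest_normalized, smallest_nonzero

# Fourth mode: no exponent tail, bias 1.  Unsigned: 1 tail bit, bias 2.
for name, tail_bits, bias in (("fourth mode", 0, 1), ("unsigned", 1, 2)):
    max_exp, min_exp = exponent_range(tail_bits, bias)
    print(name, max_exp, min_exp, *value_range(max_exp, min_exp))
```

The largest values come out as 99.99 and 9999, which the listing rounds to 100 and 10000.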

Floating-point numbers are not always the desired result of a numeric calculation. Often fixed-point numbers are wanted, especially in monetary applications like invoices. Therefore the programming language Cobol often works with numbers in fixed-point notation. First this notation as used by humans is described, then how it can be stored economically in a modern decimal computer.

The human notation of a fixed-point number is a sequence of digits, a period and a second sequence of digits. Generally an exponent is not added to this notation. Example: 3456.789.

The most important property of this human notation is the length of the series of digits at the right of the decimal period. This length is fixed for every individual fixed-point number. It will not change during the run of the program.

This length gives an indication of the tiniest value that must be added or subtracted to change the number. In fact it determines the difference between two consecutive values of the number. Therefore let us call this length the 'precision'. In the example it is three digits, so the precision is 3.

The difference itself is called the 'absolute accuracy' of the number. In the example 345.67 this accuracy is 0.01, and the precision is 2. The better (higher) the precision the smaller the absolute accuracy is. In the number 3.45678 the precision is 5 and the absolute accuracy is 0.00001. In the integer number 345 the precision is 0 and the absolute accuracy is 1.

The precision can be negative in which case the floating-point like exponent-notation has to be used to discard the trailing zeros. In the number 340000 the precision is 0, whilst in the equally valued numbers 34E4 and 3.4E5 and 0.34E6 the precision is negative: -4.
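The precision and the absolute accuracy of the examples above can be checked with a few lines of Python. This is a sketch only; the function names are hypothetical, and it relies on the fact that Python's decimal module keeps trailing zeros and the written exponent of a literal.

```python
from decimal import Decimal

def precision(text):
    """Number of digits at the right of the decimal period (may be negative)."""
    return -Decimal(text).as_tuple().exponent

def absolute_accuracy(text):
    """Difference between two consecutive values of the number."""
    return Decimal(1).scaleb(-precision(text))

for text in ("3456.789", "345.67", "3.45678", "345", "340000", "34E4", "3.4E5", "0.34E6"):
    print(text, precision(text), absolute_accuracy(text))
```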

The concept of the 'relative accuracy' does not make much sense for the fixed-point numbers. That is left to the realm of the floating-point numbers where it is very important. It indicates the total number of digits used in the coefficient of the number. The precision only counts the number of digits at the right of the decimal period.

This precision is very important in banking applications. All amount-figures in a chain-addition must be aligned at the right side of the column (e.g. on an invoice), and then the periods of all figures must stand in a straight vertical line. So all figures must have the same precision.

Very often this precision is 2 (=> the absolute accuracy is 1 cent). The relative accuracy is much less important: an amount of ten dollars (= 4 digits) can stand on an invoice form together with an amount of one million dollars (= 9 digits).

In a computer, floating-point numbers can be used to store fixed-point numbers, but only when they are not obliged to be 'normalized'. Therefore the binary numbers of DEC and IEEE-754 are not suited to storing fixed-point numbers: their hidden-bit system always requires normalization.

In the fixed-point mode of operation the exponent of the floating-point number is kept constant. Only the coefficient and the sign are changed. Consequently another word for "floating-point number" is "normalizable number". And another word for "fixed-point number" is "un-normalizable number".

The permanently fixed exponent may give a theoretical problem: a floating-point number with such an exponent is a contradictio in terminis. The discussion below shows that the fixed exponent still is a practical application.

Besides working with the fixed-point notation, humans also like to work with decimal numbers. When they enter a value into the computer and later retrieve it, they assume that the computer has not changed it, not even in the last digit. It should never change, not even in one out of a billion cases. This is possible in two ways:

- use a binarily operating computer with a very high accuracy,

- use a decimally operating computer.

In every binary computer a rounding occurs when a non-integral decimal value is transformed into a binary value. When this binary value is transformed back into a decimal value a second rounding occurs. When the computer is very accurate (= uses very many bits to store its binary numbers) then the second rounding is always in the direction opposite to the first rounding, and so the original decimal number always returns.

But in a very long series of calculations, e.g. the addition of a thousand or a million numbers, the first rounding might become visible. When the sum is re-transformed into a decimal value the second rounding may go into the wrong direction and the last digit becomes too low or too high by one.

A decimal computer does not have this problem. It stores the values and operates on them in a decimal way. So the humans can better retrace the way of its operations and thus get more confidence in it. Therefore our discussion is confined to this type of computers.
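The effect of the first rounding in a chain addition can be made visible on any present-day binary machine. The sketch below (hypothetical function name) adds the amount 0.1 ten times, once with binary floating-point numbers and once with Python's software decimal arithmetic, which here merely stands in for a decimally operating computer.

```python
from decimal import Decimal

def add_repeatedly(term, count):
    """Chain addition of the same term, in the arithmetic of that term."""
    total = type(term)(0)
    for _ in range(count):
        total += term
    return total

binary_sum = add_repeatedly(0.1, 10)              # binary floating point
decimal_sum = add_repeatedly(Decimal("0.1"), 10)  # decimal arithmetic

print(binary_sum == 1.0)             # False: the rounding has become visible
print(decimal_sum == Decimal("1"))   # True
```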

The Picture definition of Cobol describes the structure of the storage for a decimal fixed-point number. It tells the total length in digit positions assigned to a number and the length of the part at the right side of the decimal period (= the fractional part). Example:

PIC 9999V999 means: total length = 7, length of fraction = 3

In fact the picture equals the maximum value that can be stored in the number. In our example 9999.999. Therefore in the following examples the letter V (= virtual decimal period) will be replaced by an ordinary decimal point. The picture also tells the precision of the number: the length of the fraction.

The picture definition can fairly easily be translated into a floating-point number with fixed exponent. First the importance of the precision is stressed by right-justifying the series of digit positions (without the decimal period) in the coefficient of the floating-point number. Then the exponent is adjusted appropriately to keep the stored values right. Hereafter it is kept unchanged during all future arithmetic operations, like it is frozen.

All digit positions thus assigned by the picture definition to the fixed-point number (without the virtual or real decimal period) must fit in the coefficient of the decimal floating-point storage. If there are fewer positions than fit in the coefficient, the number is padded at the left with zeroes (which can be overwritten by future operations). If there are more, an overflow error occurs. Removing positions in the fraction (and thus rounding off) to make the number fit is not applied during the definition of the number.

Example: Let us assume a computer wherein the coefficient of the floating-point numbers contains six decimal digits. A virtual decimal point is between the first and second digit. Then the resulting storage of some numbers will become:

    picture       stored as
    -------       ---------
    999.999       9.99999E2
    999.99        0.99999E3
    999.9         0.09999E4
    999.  = 999   0.00999E5
    99.   = 99    0.00099E5
    9.    = 9     0.00009E5
    9.9           0.00099E4
    99.9          0.00999E4
    999.9         0.09999E4
    9999.9        0.99999E4
    99.9999       9.99999E1
    999.9999      overflow
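The translation from a picture to its storage can be sketched in Python as follows. The function name and the textual output form are assumptions made for illustration; the six-digit coefficient with the virtual decimal point after the first digit is the one assumed in the example above.

```python
def store_picture(picture, coeff_digits=6):
    """Return the stored form of the maximum value of a picture like '999.99'."""
    integer_part, _, fraction_part = picture.partition(".")
    total = len(integer_part) + len(fraction_part)
    if total > coeff_digits:
        return "overflow"
    # Right-justify the digit positions in the coefficient, pad with zeroes.
    digits = ("9" * total).rjust(coeff_digits, "0")
    # The frozen exponent compensates for the right-justification.
    exponent = (coeff_digits - 1) - len(fraction_part)
    return f"{digits[0]}.{digits[1:]}E{exponent}"

for picture in ("999.999", "999.99", "999.", "9.9", "99.9999", "999.9999"):
    print(picture, store_picture(picture))
```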

During the calculations the leading zeroes in the numeric storage need not remain zero. Any digit can be stored in their place. So a number with the picture 9999.99 can be stored in the place originally intended for a number with the picture 9.99. However, a number with the picture 9.999 or the picture 9.9 cannot be stored in that place since the fraction has a different length.

So because of the padding with zeroes the original length of the number gets lost, or must be stored somewhere outside the number. But the much more important length of the fraction is preserved by the exponent.

Note the importance of trailing zeroes in an actual value. They cannot be discarded by shifting the coefficient to the right, since the frozen exponent cannot be changed.

    picture    value      stored as
    -------    -----      ---------
    9          3          0.00003E5
    9.9        3.0        0.00030E4
    9.99       3.00       0.00300E3
    99.9       34.5       0.00345E4
    99.99      34.50      0.03450E3
    999.99     345.67     0.34567E3
    999.999    34.5678    error
    9999.99    34567.80   overflow

Note that in these examples the stored values 0.00003E5, 0.00030E4 and 0.00300E3 represent the same value, but they represent different precisions, viz. 0, 1 and 2. The same holds for the values 0.00345E4 and 0.03450E3 with the precisions 1 and 2. A special assignment operator which will be described later will retain the precision of the receiving number.

The computer must have two different relational operators for testing equality. One operator tests the bitwise equality and the other tests the arithmetic equality. For example:

0.00999E4 = 0.09990E3
=> false at test on bit-equality
=> true at test on numeric equality

The old Burroughs 6700 had two similar tests. In its language Extended-Algol the test on bit-equality was written as "IS" and the test on numerical equality was written as "=" or "EQL". Thus "A IS B" and "A=B" or "A EQL B". Of course bit-equality always implies numerical equality. The reverse is not true.
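The difference between the two tests can be mimicked in Python. In this sketch the stored number is simply kept as the text used in the examples above; the function names are hypothetical and the comparison of the texts stands in for the comparison of the bit patterns.

```python
from decimal import Decimal

def bit_equal(a, b):
    """Same stored pattern: same coefficient digits and same exponent."""
    return a.split("E") == b.split("E")

def numeric_equal(a, b):
    """Same arithmetic value, whatever the exponent."""
    return Decimal(a) == Decimal(b)

print(bit_equal("0.00999E4", "0.09990E3"))      # False
print(numeric_equal("0.00999E4", "0.09990E3"))  # True
```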

In some modern computers when the exponent in a floating-point number is at its lowest value and should go below this value the variable goes into the fixed-point mode. It stops trying to shift the coefficient to the left to save the right digits of the value. Otherwise the exponent would fall below its minimum. The coefficient has become un-normalizable and so its relative accuracy may lower. As soon as the leftmost non-zero digit of the coefficient might go too far to the left, the number returns into the floating-point mode. (A similar thing happens in the binary floating-point system IEEE-754 which is used by nearly all modern computers.)

Note that the fixed-point notation has a vile snag. It works fine only when an ordinary value is stored in the receiving number. When an overflow or error value arises this value must be stored perhaps in the same way as in the floating-point notation. Then in many computers the exponent value is damaged and thus lost. The exponent needs to be re-initialized before another value is stored in the number.

This re-initialization can be performed either by the software or by the hardware. The latter is not always possible. Luckily it is possible in the PDE-definition IEEE-754-2008, provided the hardware is extended slightly. According to this definition the special value is stored in the first byte of the number. The extension is that immediately before this storage the first byte is saved by copying it into the last byte of the number. The damage thus done to the coefficient is harmless. The other bytes of the number must remain unscathed and thus keep their old information, in order to save the other bits of the exponent. Prior to the storage of an ordinary value the saved byte is copied back into the first byte. Thus the exponent value is restored.
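The save-and-restore step can be sketched in Python on a number held as a sequence of bytes. Everything here is illustrative: the function names are hypothetical, the byte values in the usage lines are arbitrary, and the guard that prevents a second special value from overwriting the saved byte is an assumption that the text above does not spell out. The bit patterns for the special values are the IEEE-754-2008 combination-field codes 11110 (infinity) and 11111 (NaN) placed directly after the sign bit.

```python
INFINITY = 0b0111_1000  # sign 0, combination field 11110
NAN = 0b0111_1100       # sign 0, combination field 11111

def is_special(word):
    """True when the combination field in the first byte starts with 1111."""
    return (word[0] & 0b0111_1000) == 0b0111_1000

def store_special(word, marker):
    """Store a special value, saving the first byte in the last byte."""
    if not is_special(word):   # assumption: keep the first saved copy
        word[-1] = word[0]
    word[0] = marker

def restore_exponent(word):
    """Copy the saved byte back before an ordinary value is stored."""
    if is_special(word):
        word[0] = word[-1]

word = bytearray([0x22, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x45])
store_special(word, NAN)
restore_exponent(word)
print(hex(word[0]))   # the first byte is back; only the last byte is lost
```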

To keep the compatibility between the floating-point numbers and the fixed-point numbers in the PDE-definition IEEE-754-2008, the hardware should always save the first byte when a special value is stored, irrespective of its operation mode, float or fixed.

One might assume that the hardware for the mathematical calculations becomes very complex since it should contain both versions of each elementary arithmetic operator, the float-version and the fixed-version. Actually only the assignment operator needs to be doubled, or even be tripled.

All arithmetic operators like +, -, *, /, sin, log and so on accept all data, float and fixed. They do not distinguish between them. Their resulting output is always a float number. Its exponent can take any possible value, not only a predefined one. Generally the addition and subtraction of two values with equal exponents will result in an output with the same exponent, but even that is not obligatory.

There must be two different versions of the assignment operator B=A, one with floating-point output and the other with fixed-point output. The first one works in the well-known way: it simply stores the bit-pattern of its output into the receiving number and thus destroys completely the original contents in that number.

The fixed-point version works differently. First it looks at the exponent notated in the receiving number. It will not change that exponent. But it shifts the coefficient of its output such that the position fits to the exponent of the receiver. Then this shifted coefficient is copied to the receiving number. So only the coefficient is rewritten, not the exponent. (Of course the +/- sign is rewritten too.)

Examples:

The old value in the receiver B is 0.00090E4. The input value A is 0.00007E5. The result of the assignment B=A is 0.00070E4. This is a shift of the coefficient to the left.

Old value of B is 0.00999E4. Input value A is 0.07770E3. Result in B becomes 0.00777E4. This is a shift to the right.

When because of a shift to the left a non-zero digit passes the left edge of the coefficient the receiving number cannot store properly the value, and an error will occur. When because of a shift to the right one or more non-zero digits pass beyond the right edge of the coefficient, a rounding operation is invoked. The right-protruding digits are discarded and sometimes the remaining part of the coefficient is increased by one. The assignment operator must be told by its calling program which rounding operation to apply, e.g. round-off or cut-off (= 'truncation') or banking-rounding, etc.

When the exponent has to be kept constant during the full length of a chain addition (e.g. the list of the financial data on an invoice) then the fixed-point assignment has to be applied after each single addition.

Actually the computer must have three different types of the assignment operation B=A. They are in an example with the starting values B = 9.00090E4 and A = 0.00777E3:

    operator type                            resulting B
    -------------                            -----------
    bitwise assignment (= copy exactly)      0.00777E3

    fixed assignment (e.g. cut-off mode)     0.00077E4
                     (e.g. round-off mode)   0.00078E4

    float assignment gives one of
          its possible results:              7.77000E0
                                             0.77700E1
                                             0.07770E2
                                             0.00777E3
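The fixed-point assignment can be sketched in Python on the textual form used in the examples. The helper names are hypothetical, the sign is left out for brevity, and 'round-off' is taken here to mean rounding half up, which is an assumption; the six-digit coefficient is the one of the earlier example.

```python
COEFF_DIGITS = 6  # coefficient of the example computer

def parse(text):
    """'0.00777E3' -> (777, 3): coefficient as an integer, plus exponent."""
    coeff_text, exp_text = text.split("E")
    return int(coeff_text.replace(".", "")), int(exp_text)

def show(coeff, exponent):
    digits = str(coeff).rjust(COEFF_DIGITS, "0")
    return f"{digits[0]}.{digits[1:]}E{exponent}"

def fixed_assign(receiver, source, mode="cut-off"):
    """B=A where the exponent of the receiver B stays frozen."""
    _, target_exp = parse(receiver)
    coeff, source_exp = parse(source)
    shift = source_exp - target_exp
    if shift >= 0:
        coeff *= 10 ** shift                 # shift to the left
    else:
        divisor = 10 ** (-shift)             # shift to the right
        coeff, rest = divmod(coeff, divisor)
        if mode == "round-off" and 2 * rest >= divisor:
            coeff += 1
    if coeff >= 10 ** COEFF_DIGITS:
        raise OverflowError("a non-zero digit passed the left edge")
    return show(coeff, target_exp)

print(fixed_assign("0.00090E4", "0.00007E5"))               # shift left
print(fixed_assign("0.00999E4", "0.07770E3"))               # shift right
print(fixed_assign("9.00090E4", "0.00777E3", "cut-off"))
print(fixed_assign("9.00090E4", "0.00777E3", "round-off"))
```

The bitwise assignment needs no code at all in this model: it simply returns the source text unchanged.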

The addition of only one operator to the hardware of a computer operating with decimal floating-point numbers makes that computer much more versatile. This operator is the 'fixed-point assignment' that does not change the output exponent.

Thus the modern Packed Decimal Encoding IEEE-754-2008 can be used also for many fixed-point applications simply by extending the hardware with that operator. Then when a special value is stored the hardware should always save the exponent by copying the number's first byte into its last byte.

This text proposes new conventions for naming the arithmetic data in order to get rid of the present-day mess of names. This proposal is based on a series of computer-word lengths that often are powers of two, like in the IBM-360 (see elsewhere in this internet site), and on the storage of the numeric values according to the definitions IEEE-754 and IEEE-754-2008.

The arithmetic data are stored in bit groups that are called 'computer words'. Each of these units has a number of bits which is called length. The possible length is one out of a set of lengths predefined by the computer hardware and thus by its manufacturer. The name of such a word length should depend only on the length itself, never on the use of the word and thus independent from the arithmetic categories.

The length-naming proposal is intended primarily to discard the many length indicators ubiquitously used in the present-day realm of programming, like ShortWord, LongWord, HalfWord, FullWord, SingleWord, DoubleWord, QuadrupleWord (= QuadWord), TwinWord. These indicators are often used in a fuzzy way, since their meaning changes over time and depends on the length of the word, which varies between computer models.

Often the fairly small word length of Digital's PDP-11 is taken as the reference, which is 16 bits. Therefore the ordinary 32-bits word in a PC is called a DoubleWord! The computer world seems to stick to this small word while the arithmetic data become ever bigger and bigger.

Length indications like 128, 64 or 32 are equally clumsy to many users. The program text gets many such numbers that actually are not numbers, and thus becomes less clear. Often these 'numbers' are connected with a preceding name by an underscore or a hyphen (= minus-sign), sometimes by an asterisk. These symbols, meant to improve readability, actually make it worse.

Therefore this proposal first advocates the use of names and not of such numbers. These new names must be accompanied by clear and uniform definitions. Ideally the same name is used for the same length of the data items in all programming languages.

A Cobol committee still advises the use of numbers. But these numbers are not based on the length of the computer words in bits or bytes, but on the guaranteed accuracy with which the numeric value can be stored. This advice is described at the end of the proposal.

Length names like 'long' and 'short' are proposed, although now in the way the manufacturers of clothes (e.g. T-shirts) apply them. Therefore many of the proposed names for the different word-lengths are derived from this apparel-size naming. Therein X means extra, S = short, M = medium and L = long. Therein E means extra too. But now this letter will not be used. It is reserved for later use.

The names Small and Large should not be used as synonyms for Short and Long, since these names can also give an indication about the value of the number. A 'small number' can mean a number with a tiny value, e.g. 1.E-80. The word 'short' can mean only a small number of bits. The same holds for the words 'large' (e.g. 1.E+800) and 'long'. Therefore the programming folks should agree to use the words 'small' and 'large' solely for the value in the number, and the words 'short' and 'long' only for the length in bits of the number.

The new length names are divided into three groups. Each group belongs to a group of lengths that fulfill a mathematical formula. The groups are:

- Basic:
  the number of bits is a power of two.
  length = 2^k, where k is a nonnegative integer.
  example: byte = 2^3 = 8 bits.

- Enhanced:
  the number of bits is one quarter more than basic.
  length = (2^k) * 5/4
  example: enhanced byte = 2^3 * 5/4 = 10 bits

- Triple-half:
  the number of bits is one half more than basic.
  length = (2^k) * 3/2
  example: triplehalf byte = 2^3 * 3/2 = 12 bits

Each length indicator can be written either by its full name or by an abbreviation. The indicator for the enhanced length equals that of the corresponding basic length preceded by a prefix. This prefix is ENHANCED or A (from enhAnced). Similarly the triple-half length has the prefix TRIPLEHALF (no '-') or T.

So the list of the new length names becomes rather long:

Possible length indicators:

    #bits  abbrev.  f u l l   n a m e        comments
    -----  -------  -----------------        --------
        1  B        BIT                      B means bit, not byte
        2  DB       DUOBIT                   name 'quit' is not used
        3  TDB      TRIPLEHALF DUOBIT
        4  N        NIBBLE
        5  AN       ENHANCED NIBBLE
        6  TN       TRIPLEHALF NIBBLE        sixbit
        8  Y        BYTE                     Y also from babY-size
       10  AY       ENHANCED BYTE
       12  TY       TRIPLEHALF BYTE
       16  DY       DUOBYTE                  this equals exactly XS
       20  ADY      ENHANCED DUOBYTE         this equals exactly AXS
       24  TDY      TRIPLEHALF DUOBYTE       this equals exactly TXS
       16  XS       EXTRA SHORT
       20  AXS      ENHANCED EXTRA SHORT
       24  TXS      TRIPLEHALF EXTRA SHORT
       32  S        SHORT
       40  AS       ENHANCED SHORT
       48  TS       TRIPLEHALF SHORT
       64  M        MEDIUM
       80  AM       ENHANCED MEDIUM
       96  TM       TRIPLEHALF MEDIUM
      128  L        LONG
      160  AL       ENHANCED LONG
      192  TL       TRIPLEHALF LONG
      256  XL       EXTRA LONG
      320  AXL      ENHANCED EXTRA LONG
      384  TXL      TRIPLEHALF EXTRA LONG

Several notes:

-- When needed this list can be extended easily with the giant XXL, AXXL and TXXL formats (resp. 512, 640 and 768 bits).

-- The 16-bits word has two names: duobyte and extra short. For the computer there is no difference between the two, but for the human reader there can be. To help the program documentation it is advisable to use the name XS for numerics and the name DY for non-numerics. The same holds for the enhanced and triple-half versions.

-- The medium length is fairly long already: 64 bits, not 32 or 16 bits. And 'long' is already 128 bits, not 64 like often now.

-- The following three lengths do not exist, since they would result in broken bits:

Forbidden length indicators:

    #bits  abbrev.  f u l l   n a m e        comments
    -----  -------  -----------------        --------
     1.25  AB       ENHANCED BIT             not existing
     1.50  TB       TRIPLEHALF BIT           not existing
     2.50  ADB      ENHANCED DUOBIT          not existing
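Both tables follow mechanically from the three formulas and the two prefixes. The Python sketch below (hypothetical names) regenerates the abbreviations and their lengths and drops the combinations that would give broken bits. The alias DY and its two derived names are left out for brevity.

```python
# Basic lengths (powers of two) with their proposed abbreviations.
BASIC = {1: "B", 2: "DB", 4: "N", 8: "Y", 16: "XS",
         32: "S", 64: "M", 128: "L", 256: "XL"}

# Prefix, numerator and denominator of the length factor.
DERIVED = (("A", 5, 4),   # enhanced:    one quarter more than basic
           ("T", 3, 2))   # triple-half: one half more than basic

def length_table():
    """Return a list of (bits, abbreviation), skipping fractional lengths."""
    rows = []
    for bits, abbrev in BASIC.items():
        rows.append((bits, abbrev))
        for prefix, numerator, denominator in DERIVED:
            scaled, remainder = divmod(bits * numerator, denominator)
            if remainder == 0:          # no broken bits allowed
                rows.append((scaled, prefix + abbrev))
    return rows

for bits, abbrev in length_table():
    print(f"{bits:5d}  {abbrev}")
```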

The lengths of the (partial) word formats in most computers can be described by this naming system, but alas not in all computers. For example, the lengths used by the old Univac-1100 (36 bits), the old Digital PDP-10 (36 bits) and the old Philips EL-X8 (27 bits) cannot. But all lengths in the old Burroughs 6700 c.s. can. According to this proposal their names would be:

Burroughs 6700, 7700, 7900 and Unisys-A:

    #bits  abbrev.  used for
    -----  -------  --------
        1  B        shortest unit of information
        3  TDB      octade = octal digit in 'binary' number
        4  N        decimal 'text'-digit in BCD
        6  TN       text character in BCL
        8  Y        text character in EBCDIC
       48  TS       full word
       96  TM       double word
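Whether a given word length fits the naming system can be tested by inverting the three formulas. A minimal Python sketch, with a hypothetical function name:

```python
def classify_length(bits):
    """Return 'basic', 'enhanced' or 'triple-half', or None if not nameable."""
    # Each entry inverts one formula: base = bits * multiply / divide.
    for group, multiply, divide in (("basic", 1, 1),
                                    ("enhanced", 4, 5),
                                    ("triple-half", 2, 3)):
        base, remainder = divmod(bits * multiply, divide)
        is_power_of_two = base > 0 and (base & (base - 1)) == 0
        if remainder == 0 and is_power_of_two:
            return group
    return None

for bits in (48, 96, 80, 36, 27):
    print(bits, classify_length(bits))
```

The lengths 48 and 96 of the Burroughs machines come out as triple-half, while 36 and 27 are rejected.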

The pedigrees of the Digital PDP-11 and IBM-360 use basic lengths only:

DEC-PDP-11 and IBM-360:

    #bits  abbrev.  full name     comments
    -----  -------  ---------     --------
        1  B        BIT           shortest unit of information
        4  N        NIBBLE        bin.coded digit; for commerce
        8  Y        BYTE          text character in ASCII / EBCDIC
       16  DY       DUOBYTE       (text character in Unicode)
       16  XS       EXTRA SHORT   DEC's basic word; short integer
       32  S        SHORT         IBM's basic word; long-int; float
       64  M        MEDIUM        double-precision/long float
      128  L        LONG          DEC-octaword; quadr./extend.float

The lengths used by a modern Intel-PC are:

Intel-PC:

    #bits  abbrev.  full name                comments
    -----  -------  ---------                --------
        1  B        BIT                      shortest unit of info.
        8  Y        BYTE                     for ASCII character
       16  DY       DUOBYTE                  this equals exactly XS
       16  XS       EXTRA SHORT              for Unicode character
       24  TXS      TRIPLEHALF EXTRA SHORT   for ATi-video graphics
       32  S        SHORT                    for float + long-integer
       64  M        MEDIUM                   for double-precision
       80  AM       ENHANCED MEDIUM          for 8087 Math.Coprocessor

The computer that calculates according to all IEEE-754(-2008) definitions may handle the lengths in the following list that embraces the Intel-PC list above:

Computer using IEEE-754(-2008):

    #bits  abbrev.  full name                comments
    -----  -------  ---------                --------
        1  B        BIT                      smallest unit of info.
        4  N        NIBBLE                   for BCD-coded digit
        8  Y        BYTE                     for ASCII character
       10  AY       ENHANCED BYTE            for a 3-digits DPD-unit
       16  DY       DUOBYTE                  for Unicode character,
       16  XS       EXTRA SHORT              and for video graphics
       24  TXS      TRIPLEHALF EXTRA SHORT   for video graphics
       32  S        SHORT                    for float + long-integer
       64  M        MEDIUM                   for double + medium-PDE
       80  AM       ENHANCED MEDIUM          for FPP 8087 by Intel
      128  L        LONG                     for long-PDE

Of course, no computer will ever use all lengths in the giant table, otherwise its hardware would become much too complex. Nevertheless all words and abbreviations in that table should be reserved for the length indications. In the language Cobol perhaps they all should become reserved words. The examples illustrate that length indicators not used by the one computer are used by another computer, and that indicators presently not used might be used in the future.

Every manufacturer and institution uses its own names for the word lengths. Here is a listing of several of them. Herein [xx] means 'Possible, although not used actually', and (n/a) means 'Not applicable or not applied'.

             Fortran  Microsoft  Digital     Internat.      JHM.Bonten
             e.g.     e.g.       DEC         Busin.Mach.    in this
    #bytes   real*i   int__k     PDP-11      IBM-360        proposal
    ------   -------  ---------  ---------   ------------   -----------
     1/8     (n/a)    [ __1 ]    bit         bit            bit
     1/4     (n/a)    [ __2 ]    (n/a)       (n/a)          duobit
     1/2     (n/a)    [ __4 ]    nibble      nibble         nibble
      1      *1       __8        byte        byte           byte
      2      *2       __16       word        half word      extra short
      4      *4       __32       long word   (full) word    short word
      8      *8       __64       quad word   double word    medium word
     16      *16      __128      octa word   extend.word    long word
     32      [*32]    [ __256 ]  (n/a)       (n/a)          extra long

This table shows that every designer uses his own definition for the (medium-)word. DEC gives it 16 bits, IBM gives it 32 bits, and in this proposal it gets 64 bits.

Besides the length of the data, the way in which they are used is important. All data that are handled in the same way are grouped together into a 'category'. Thus several arithmetic categories exist. Five (or seven) of those exist in the realm of the IEEE-754(-2008) definitions. The names for these categories should become:

    abbrev.  full name                comments
    -------  ---------                --------
    UNSIG    UNSIGNED                 always binary integer
    BININT   BINARY INTEGER           generally in 2-complement
    BINFLOT  BINARY FLOATING POINT    with the hidden bit
    BINFIX   BINARY FIXED POINT       (not defined at present)
    DECINT   DECIMAL INTEGER          (not defined at present)
    DECFLOT  DECIMAL FLOATING POINT   with the DPD compression
    DECFIX   DECIMAL FIXED POINT      see the proposal above

The complete type-name of a word for arithmetic use consists of the name of the arithmetic category followed by the name of the length. This name concatenation can be done both with the full names and with the abbreviations. For example one may get:

    full name                      abbreviation
    ---------                      ------------
    UNSIGNED BYTE                  UNSIG Y
    UNSIGNED EXTRA SHORT           UNSIG XS
    BINARY INTEGER MEDIUM          BININT M
    BINARY FLOAT ENHANCED MEDIUM   BINFLOT AM
    DECIMAL FIXED EXTRA LONG       DECFIX XL

Not all combinations of arithmetic categories and length indicators are possible. A few of those which cannot exist are UNSIGNED EXTRA LONG and DECIMAL FLOAT NIBBLE. Nevertheless, for the sake of security in the future every combination of a category name and a length indicator must be seen as a syntactically legal type name, even when the result is such an 'impossible' data item. In this way future names like BINARY INTEGER EXTRA LONG are already protected now.

The following table gives the combinations that are possible presently:

    length ->     Y   XS  TXS    S    M   AM    L   XL
    UNSIG         1    1    0    1    1    0    0    0
    BININT        1    1    0    1    1    0    0    0
    BINFLOT       1    1    1    1    1    1    1    0
    BINFIX     (  1    1    0    1    1    1    1    0  ) ??
    DECINT     (  0    1    0    1    1    0    1    0  ) ??
    DECFLOT       0    1    0    1    1    0    1    0
    DECFIX        0    1    0    1    1    0    1    0

The proposed way of naming should be done in every programming language, Cobol, Fortran, Algol, C/C++, Java, and so on, although adapted to the specific language. Thus the naming and useable arithmetic operations are fully compatible between all languages. There will be no confusion about the meaning of the names and the results of the operations.

In the language Cobol both the full name and the abbreviation can be used. The name parts are connected by hyphens. In Algol, Fortran, C/C++ and Java only the abbreviation can be used. In Algol and Fortran the underscore is used for the connection. In C/C++ and Java the connection is performed by the tactical use of uppercase and lowercase letters. Other languages will get their own adaptation of the naming conventions. Thus one gets for example:

    Cobol:    BINARY-FLOAT-TRIPLEHALF-EXTRA-SHORT    BINFLOT-TXS
    Fortran:  BINFLOT_TXS    binflot_txs
    C/C++:    binflotTXS

    Cobol:    DECIMAL-FIXED-LONG    DECFIX-L
    Fortran:  DECFIX_L    decfix_l
    C/C++:    decfixL
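The adaptation per language is mechanical, as the following Python sketch suggests. The function name and the set of language keys are hypothetical; only the abbreviated forms are produced, and the uppercase variant is chosen for Fortran and Algol.

```python
def type_name(category, length, language):
    """Join a category abbreviation and a length abbreviation for a language."""
    language = language.lower()
    if language == "cobol":
        return f"{category}-{length}"          # parts connected by hyphens
    if language in ("fortran", "algol"):
        return f"{category}_{length}"          # connected by an underscore
    if language in ("c", "c++", "java"):
        return category.lower() + length       # lowercase plus uppercase
    raise ValueError(f"no naming rule known for {language!r}")

for language in ("Cobol", "Fortran", "C++"):
    print(language, type_name("BINFLOT", "TXS", language),
          type_name("DECFIX", "L", language))
```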

The following part of the proposal is to discard all present compilers of all languages. Of course this is impossible since it would lead to a giant mess in the world. In reality two compilers should be made for every language. One compiler uses the old names for the arithmetic data and the other uses the new names. The 'old' compiler will not be updated to handle types of arithmetic data other than it already handles. The 'new' compiler is not able to handle the old data-type names. Thus in one program text either the old names or the new names can be used, but not both simultaneously. Consequently no confusion between the old and the new names will arise.

Nevertheless the 'old' programs stay compilable and can always be updated. Also the linkage between them and the 'new' ones remains possible since the compatibility between the object codes is not lost. The novice programmers should learn to use the new names primarily.

Example: The following three subroutines are equivalent:

           IDENTIFICATION DIVISION.
           PROGRAM-ID. SIMPLE-DIVIDER.
           DATA DIVISION.
           LINKAGE SECTION.
           01 NUMERATOR    BINARY-INTEGER-EXTRA-SHORT.
           01 DENOMINATOR  BINFLOT-S.
           01 QUOTIENT     BINFLOT-M.
           PROCEDURE DIVISION USING NUMERATOR, DENOMINATOR, QUOTIENT.
               DIVIDE NUMERATOR BY DENOMINATOR GIVING QUOTIENT.
               EXIT PROGRAM.

          SUBROUTINE SIMPLE_DIVIDER (NUMERATOR, DENOMINATOR, QUOTIENT)
          BININT_XS  NUMERATOR
          BINFLOT_S  DENOMINATOR
          BINFLOT_M  QUOTIENT
          QUOTIENT = NUMERATOR / DENOMINATOR
          END

    void SimpleDivider ( binintXS* Numerator,
                         binflotS* Denominator,
                         binflotM* Quotient )
    {
        *Quotient = *Numerator / *Denominator ;
    }

In the 'old' languages this subroutine is written as:

          SUBROUTINE SIMPLE_DIVIDER (NUMERATOR, DENOMINATOR, QUOTIENT)
          INTEGER*2         NUMERATOR
          REAL              DENOMINATOR
          DOUBLE PRECISION  QUOTIENT
          QUOTIENT = NUMERATOR / DENOMINATOR
          END

    void SimpleDivider ( short int* Numerator,
                         float*     Denominator,
                         double*    Quotient )
    {
        *Quotient = *Numerator / *Denominator ;
    }
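The correspondence between the old and the new names in this example can be checked by looking at the storage lengths. The Python sketch below uses the standard sizes of the struct module as a stand-in for the old C and Fortran types; the mapping table itself is an illustration built from the two versions of the subroutine, not a complete dictionary of names.

```python
import struct

# struct code -> (old C name, old Fortran name, proposed new name)
OLD_AND_NEW = {
    "h": ("short int", "INTEGER*2", "BININT XS"),
    "f": ("float", "REAL", "BINFLOT S"),
    "d": ("double", "DOUBLE PRECISION", "BINFLOT M"),
}

def length_in_bits(code):
    """Standard (platform-independent) length of a struct format code."""
    return 8 * struct.calcsize("<" + code)

for code, (c_name, fortran_name, new_name) in OLD_AND_NEW.items():
    print(f"{length_in_bits(code):3d} bits  {c_name:10s} {fortran_name:17s} {new_name}")
```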

A number of the type COMPLEX is a structure that consists of two ordinary numbers. In theory each of these numbers might belong to its own category and have its own length. Thus a huge bunch of different complex types might arise, in fact far over a hundred. This is unmanageable, so a restriction must be made.

Complex data are used only for scientific and technical purposes. Therefore both numbers should be binary, floating point and not too short. To make things even easier, both numbers should have exactly the same length. Thus only four complex types are left.

This number of four types is doubled since there are two different ways of using complex data: as a Cartesian system (C-complex) and as a polar system (P-complex). In the Cartesian system the first number is called the REAL part and the second number the IMAGINARY part. In the polar system the numbers are called respectively the RADIUS and the ANGLE, which is often called PHI (always in radians). In this system the radius should always be non-negative.

Thus the table of the categories should be extended by:

abbrev.    full name           comments
-------    ---------           --------
CCOMPLEX   CARTESIAN COMPLEX   structure: {RE, IM}
PCOMPLEX   POLAR COMPLEX       structure: {RAD, PHI}

Both numbers in a complex number are always BINARY FLOATING POINT and always have the same length. Since a complex number consists of two numbers, its length is twice that of one component. The length name of the complex number is the length name of one component.

Consequently, for the complex numbers only, the table of lengths must be modified into:

#bits   abbrev.   full name         comments
-----   -------   ---------         --------
  64    S         SHORT             64 bits = 2 x 32 bits
 128    M         MEDIUM            128 bits =... and so on.
 160    AM        ENHANCED MEDIUM   When FPP 8087 is used
 256    L         LONG

Then the table with the possible category-length combinations can be extended by:

length ->   Y   XS  TXS   S   M   AM   L   XL
CCOMPLEX    0    0    0   1   1    1   1    0
PCOMPLEX    0    0    0   1   1    1   1    0
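The two tables for the complex categories can be combined into one lookup, sketched here in C. All identifiers are hypothetical. A single array serves both CCOMPLEX and PCOMPLEX because their rows in the table above are identical; a size of 0 marks a length that is not allowed.

```c
/* Length names in the column order of the table above. */
enum length_name {
    LEN_Y, LEN_XS, LEN_TXS, LEN_S, LEN_M, LEN_AM, LEN_L, LEN_XL, LEN_COUNT
};

/* Total size in bits of a complex number per length name; 0 = not allowed.
   Entries not listed are zero-initialized. */
static const int complex_bits[LEN_COUNT] = {
    [LEN_S] = 64, [LEN_M] = 128, [LEN_AM] = 160, [LEN_L] = 256
};

/* Returns nonzero when the length may be combined with a complex category. */
int complex_length_allowed(enum length_name len)
{
    return complex_bits[len] != 0;
}

/* Size in bits of one of the two components of the complex number. */
int complex_part_bits(enum length_name len)
{
    return complex_bits[len] / 2;
}
```

With this sketch, the length S is accepted and yields two components of 32 bits each, while XS and XL are rejected.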

Since complex arithmetic is used only in scientific and some technical applications, it need not be implemented in every language. Of course Fortran will get it and Cobol will not. Fortran has used the Cartesian complex for over 40 years, yet even today many modern languages do not have it.

Conversely, for nearly 50 years Cobol has handled numeric data of a category that Fortran and C/C++ do not have: the category Display, which is described in the Proposals for Cobol.

The Cobol committee SC22-WG4 proposes not to use the length-name suffix in the type names of the arithmetic floating-point or fixed-point numerics, but to include the number of digits their coefficients can hold. This number tells many users more than an abstract length-name suffix and gives a good indication of the applicability of each numeric. For the same reason, using the word length in bits as a suffix (as Microsoft does) should be rejected even more firmly than the length name.

The committee wants to mention the maximum number of digits. However, it might be better to mention the number of digits the coefficient can hold at least, i.e. the guaranteed accuracy. Best of all is to mention the integral truncation of this accuracy. Then the number is the same for the binary and the decimal numerics of equal word length when they are written in the IEEE-754(-2008) format.

The table lists the word lengths and the accuracies:

-- computer-word --    ---- guaranteed accuracy -----
   [not complex]
 length                  754     754-2008     after
 # bits     name        binary   decimal    truncation
 ------    ------       ------   --------   ----------
   16        XS           3.0        3           3
   32        S            6.9        6           6
   64        M           15.6       15          15
  128        L           33.7       33          33

The truncated guaranteed accuracy is applied as a suffix, as the following examples show:

   n u m e r i c   t y p e s   e x a m p l e s   w i t h :

  # bits      length-name    max.# digits    guar.accuracy
----------    -----------    ------------    -------------
BINFLOT-16    BINFLOT-XS     BINFLOT-4       BINFLOT-3
BINFLOT-32    BINFLOT-S      BINFLOT-7       BINFLOT-6
DECFLOT-64    DECFLOT-M      DECFLOT-16      DECFLOT-15
DECFIX-128    DECFIX-L       DECFIX-34       DECFIX-33

>>-------- more convenient -------->>