Character sets

Information about the Macintosh & Windows character sets


Other pages explain exactly how Unicode characters are displayed in browsers, how to encode them, and which characters have a numeric definition. See the Unicode information index page.


Summary

Overviews of the Macintosh and Windows character sets and the Symbol font.

What type of questions will be answered by this page?
To give a couple of examples:
  • The Spanish n-tilde (ñ) typed on a Mac shows as an en dash (–) on a Windows machine. Why? Does this happen haphazardly?
  • The n-tilde (ñ) typed on a Windows machine shows as a capital O with grave accent (Ò) on a Mac. (Mac users with a TrueType plus bitmap font version of Geneva will see a box; display this page in Times or set an unusual font size such as 13 points.)
  • 16 Windows characters cannot be shown on a Macintosh with one of the standard fonts.
  • 25 Macintosh characters cannot be displayed on a Windows computer with a standard font.
  • Are there any differences between the Windows Latin 1 encoding and the internationally standardized ISO-8859-1 encoding?
  • Where is the Euro currency symbol?
  • Can I display reliably any of the Symbol font characters?
  • What are the names of all these Symbols?
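
The first two questions can be reproduced with a few lines of Python. This is only a sketch: it assumes that Python's built-in `mac_roman` and `cp1252` codecs are acceptable stand-ins for the Macintosh and Windows character sets discussed on this page, and the variable names are purely illustrative.

```python
# n-tilde, written as an escape so the example does not depend on file encoding
n_tilde = "\u00f1"

# A Mac stores n-tilde as one byte according to the MacRoman table...
mac_bytes = n_tilde.encode("mac_roman")
# ...and a Windows machine looks that same byte up in its own table.
seen_on_windows = mac_bytes.decode("cp1252")

# The reverse trip: typed on Windows, read on a Mac.
win_bytes = n_tilde.encode("cp1252")
seen_on_mac = win_bytes.decode("mac_roman")

print(mac_bytes, repr(seen_on_windows))  # byte 0x96 read as an en dash
print(win_bytes, repr(seen_on_mac))      # byte 0xF1 read as O with grave accent
```

So the substitution is not haphazard: the byte stays the same, only the table used to interpret it differs.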

 

Background
If you type some text in a word processor and save it, your computer has to convert your keystrokes to numbers. A big problem is: what numbers will the computer use?
Several decades ago the American Standard Code for Information Interchange, or ASCII, emerged as a standard. Originally only 7 bits, giving 128 numbers, were used to encode A-Z, a-z, 0-9, punctuation marks, some control characters and other signs. The computer industry felt that these were not enough positions to encode all necessary characters, so an extra bit was used, doubling the range to 256. Unfortunately many completely different 8-bit extensions of ASCII and other encoding schemes were developed, and the confusion remains to this day.

Diacritics and such
Why should you worry about encodings? Most people write simple texts for which the 7-bit ASCII encoding gives excellent results. But many people would like to write texts in their native tongue, including characters with diacritics (u-umlaut, e-acute, n-tilde, etc.), or they would like to use special symbols, even ones as simple as currency symbols (yen, florin, peseta, etc.).
Using any special symbol beyond A-Z, a-z, 0-9 or some punctuation marks causes horrible problems. If you write an email on a Macintosh computer and send it to a Windows user, the recipient will complain that your message is not very readable. If a Windows user lays out a beautiful webpage with diacritics in the text, the Macintosh reader complains that the text is filled with funny characters. Matters grow worse if you try to use really special characters, such as the Symbol font in mathematical equations or Russian characters.

Unicode
Wasn't Unicode invented a decade ago just to put an end to all encoding difficulties?
Yes, but some people think Unicode is still not complete, and others are not interested in using Unicode by the rules. Matters would be very simple if every font maker provided one big font file with all 65536 possible Unicode 1.0 characters (and even more in Unicode 3), which would suffice to display all possible characters and symbols in all languages of the world. Of course every piece of software would have to be able to use these Unicode fonts. In that case every user of a Macintosh, Windows, Linux or any other computer would find the Greek delta at the same place in the font, and could communicate with every other user without any shadow of a doubt.
Why the computer industry is not able or not willing to apply this scheme is not clear. Reality is that we are still stuck with many incompatible situations.

How to encode Unicode characters in a web page?
Look up the decimal, not the hexadecimal, value of the character you would like to encode. Put it in the HTML page as follows:
&#8364;
if you would like to encode the Euro sign. Do not forget the semicolon; leaving it out is a common mistake.
Between the <head> and </head> tags in the web page a charset meta instruction has to be placed:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Use only 1 charset meta per page and put it in the head section.
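
If you have the character itself at hand, the decimal value does not have to be looked up by hand. The following Python sketch builds the entity from the character; `decimal_entity` is a hypothetical helper name chosen for this illustration, and `html.unescape` from the standard library is used only to check the result.

```python
import html


def decimal_entity(ch: str) -> str:
    """Return the decimal numeric character reference for a single character."""
    return f"&#{ord(ch)};"  # ord() gives the Unicode value as a decimal integer


euro = "\u20ac"  # the Euro sign, hexadecimal 20AC
ref = decimal_entity(euro)

print(ref)                         # &#8364;
print(html.unescape(ref) == euro)  # True: the reference decodes back to the Euro sign
```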

Tables
There are already several webpages giving a comprehensive overview of the history of all encoding schemes. Below you can find the exact differences between the Macintosh MacRoman encoding, the Windows encoding and the ISO-8859-1 encoding. Furthermore you can find an overview of the Symbol font in several encodings.
The tables will open in a new window and this text will stay on your screen.
Would you like to print the tables? Try Landscape orientation. The Symbol tables can be printed in Portrait orientation; probably the last table has to be scaled by 95%.

From Macintosh to Windows
The tables Macintosh viewpoint 1 and Macintosh viewpoint 2 show all difficult characters. 'Difficult' means every character with a numerical value of 128 or above, that is, beyond the 7-bit ASCII range. These two tables show the 'upper' ASCII characters of any standard Macintosh font.
The meaning of the columns is as follows.
* Column Mac: ASCII value on the Mac.
* Column char: the character itself.
* Column Wnd: the ASCII value of that character on a Windows PC.
* Column PostScript: the PostScript name of the character.
The PostScript name of the character is useful for identifying characters that may look unfamiliar, for instance character number 245. This is not the number 1, but an i without a dot. Why didn't I give the Unicode names? Well, they are much longer. The shorter PostScript names made the tables more compact.
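
Rows of this kind can be approximated in Python. This is a sketch under assumptions, not the method used to build the tables: it relies on Python's `mac_roman` and `cp1252` codecs, and because PostScript glyph names are not available in the standard library it substitutes the (longer) Unicode names. The function name `mac_to_windows_rows` is made up for the example.

```python
import unicodedata


def mac_to_windows_rows(start=128, stop=256):
    """Build (Mac position, character, Windows position or None, Unicode name) rows."""
    rows = []
    for mac_pos in range(start, stop):
        ch = bytes([mac_pos]).decode("mac_roman")
        try:
            win_pos = ch.encode("cp1252")[0]
        except UnicodeEncodeError:
            win_pos = None  # the character has no position in the Windows set
        # Private-use characters have no Unicode name, so fall back to a marker.
        rows.append((mac_pos, ch, win_pos, unicodedata.name(ch, "(unnamed)")))
    return rows


rows = mac_to_windows_rows()
print(rows[150 - 128])  # the n-tilde row
print(rows[245 - 128])  # the dotless i row, with no Windows position
```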

From Windows to Macintosh
The tables Windows viewpoint 1 and Windows viewpoint 2 show all difficult characters. 'Difficult' means every character with a numerical value of 128 or above, that is, beyond the 7-bit ASCII range. These two tables show the 'upper' ASCII characters from any standard Windows font and the standard Latin 1 encoding.
* Column Mac: ASCII value on the Mac. Note that the tables are sorted according to the Wnd column.
* Column char: the character itself.
* Column Wnd: the ASCII value of that character on a Windows PC.
* Column PostScript: the PostScript name of the character.

Characters that Macintosh and Windows have in common
The tables Common 1 and Common 2 show all characters that can be displayed both on a Macintosh and on a Windows PC. Of course the characters have different ASCII values on the two computers. The tables are sorted by character function: first the accented letters, then diacritics, punctuation marks, footnote symbols, currency symbols and other symbols. Note that instead of the round currency symbol, most new Macintosh fonts display the Euro currency symbol.
The Unicode column shows the correct decimal entity-number for encoding the character.
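
The idea behind the common tables can be sketched in Python as an intersection of the two upper halves. The codecs `mac_roman` and `cp1252` are assumed to represent the two platforms, the helper name `upper_half` is hypothetical, and the result is sorted by code point rather than by character function as in the tables.

```python
def upper_half(codec):
    """Map each character at positions 128-255 of a codec to its position."""
    table = {}
    for pos in range(128, 256):
        try:
            table[bytes([pos]).decode(codec)] = pos
        except UnicodeDecodeError:
            continue  # position is undefined in this codec
    return table


mac = upper_half("mac_roman")
win = upper_half("cp1252")

# Characters present in both sets: (char, Mac position, Windows position, code point)
common = sorted((ch, mac[ch], win[ch], ord(ch)) for ch in mac.keys() & win.keys())

for ch, mac_pos, win_pos, code_point in common[:5]:
    print(ch, mac_pos, win_pos, f"&#{code_point};")
```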

Problem characters
Although Macintosh and Windows have many characters in common, albeit not always at the same position, some characters are not displayable on the Macintosh and some are not displayable on a Windows PC, unless you use a special font. In this table the problem characters are displayed.
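
The problem characters are simply the ones that exist in one set but not in the other. A minimal Python sketch, again assuming the `mac_roman` and `cp1252` codecs as stand-ins and using a hypothetical helper `upper_chars`; the counts it produces need not match the figures quoted on this page, which depend on the fonts and table revisions in use.

```python
def upper_chars(codec):
    """Collect the characters defined at positions 128-255 of a codec."""
    chars = set()
    for pos in range(128, 256):
        try:
            chars.add(bytes([pos]).decode(codec))
        except UnicodeDecodeError:
            pass  # undefined position, nothing to collect
    return chars


mac_chars = upper_chars("mac_roman")
win_chars = upper_chars("cp1252")

mac_only = sorted(mac_chars - win_chars)  # not displayable with a standard Windows font
win_only = sorted(win_chars - mac_chars)  # not displayable with a standard Mac font

print("Mac only:", " ".join(mac_only))
print("Windows only:", " ".join(win_only))
```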
All traditional PC and Mac fonts use 1 byte for encoding and that means that there are 256 different positions available for characters. In principle Windows fonts have no characters defined for 9 dispersed positions above 127.
The biggest difference between the standard Windows encoding and the internationally standardized ISO 8859-1 encoding is that ISO defines no printable characters at positions 128 through 159.
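
This difference can be checked position by position. The sketch below assumes Python's `cp1252` and `latin_1` codecs; note that `latin_1` fills positions 128 through 159 with control codes rather than leaving them empty, which is why the Unicode category is inspected. The function name `compare_upper_range` is illustrative only.

```python
import unicodedata


def compare_upper_range():
    """List positions 128-255 where the Windows and ISO 8859-1 tables disagree."""
    differing = []
    for pos in range(128, 256):
        iso = bytes([pos]).decode("latin_1")
        # Undefined Windows positions are shown as the replacement character.
        win = bytes([pos]).decode("cp1252", errors="replace")
        if iso != win:
            differing.append((pos, win, unicodedata.category(iso)))
    return differing


diff = compare_upper_range()
positions = [pos for pos, _, _ in diff]
print(positions[0], positions[-1])       # first and last differing position
print({cat for _, _, cat in diff})       # category of the ISO side ('Cc' means control)
```

From position 160 upward the two tables agree, so only the 128 through 159 block needs special care.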
Another problem is the Euro currency symbol. In most of the new Macintosh fonts the general currency symbol has been replaced by the Euro symbol and in most of the new Windows fonts a formerly unused position got the Euro symbol. There is even a Unicode position defined for the Euro symbol: 20AC hexadecimal, which is the same as 8364 decimal.
The Unicode column shows the correct decimal entity-number for encoding the character.

The Symbol font
Macintosh and Windows computers and PostScript printers can display characters of the Symbol font. Two types of characters can be found in this font: Greek letters and mathematical symbols. Since the encoding under Mac OS and Windows is the same, problems should not occur, but unfortunately they do, and they are severe. If a Windows user produces a nice math page with equations in the Symbol font, then a Mac user will usually not be able to display the page properly, whatever the reader or the author tries.
Only one approach has a chance of success: encode all Symbol characters in your webpage as Unicode entities. Use decimal numbers, not hexadecimal ones. Of course the viewer of your page must have the Symbol font or an equivalent installed.
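
As an illustration of this conversion, here is a small Python sketch. Python has no built-in Symbol codec, so the lookup table `SYMBOL_TO_UNICODE` is a hypothetical, deliberately partial mapping of a few lowercase Greek letters; a real conversion would need the complete tables linked below. The function assumes the whole input string was set in the Symbol font.

```python
# Hypothetical partial table: Symbol font position -> Unicode code point.
SYMBOL_TO_UNICODE = {
    97: 0x03B1,   # position of 'a' -> Greek small letter alpha
    98: 0x03B2,   # position of 'b' -> beta
    100: 0x03B4,  # position of 'd' -> delta
    103: 0x03B3,  # position of 'g' -> gamma
    112: 0x03C0,  # position of 'p' -> pi
}


def symbol_text_to_entities(text):
    """Replace each known Symbol position by a decimal Unicode entity."""
    out = []
    for ch in text:
        code_point = SYMBOL_TO_UNICODE.get(ord(ch))
        # Positions missing from the partial table are passed through unchanged.
        out.append(f"&#{code_point};" if code_point is not None else ch)
    return "".join(out)


print(symbol_text_to_entities("abd"))  # &#945;&#946;&#948;
```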
The following pages display the encodings for the Symbol font, expressed as ASCII-like character positions in decimal and hex, Unicode encoding, PostScript name and Unicode name. Nothing is defined between 127 and 159 decimal or at 255.
The newest Apple fonts have the (old) currency sign on position 160 (Unicode 164 decimal). Note that many math programs install their own Symbol font without the currency sign.
Position 32 - 55
Position 56 - 79
Position 80 - 103
Position 104 - 126
Position 160 - 183
Position 184 - 207
Position 208 - 229
Position 230 - 254

 


© Oscar van Vlijmen, January 2000

URL of this page: http://home.kpn.nl/vanadovv/uni/charsets.html
Page last updated: 2002-03-15
