The place of the character set in a document
To clarify what character sets are for (and what they do not do), it is instructive to consider the
layers of specification of a document stored within a computer.
Take as an example the word processor file for this paper:
- on a physical device (floppy disk) there is a file
- the document is held in the file
- the document contains text
- the text contains formatting codes and characters
- the characters are encoded.
Thus the character set comes right at the bottom of these layers. It is concerned only with capturing
the content of the text of the document. It is not concerned with the physical layout or typography of the document, nor with the actual semantics of the document.
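As a rough illustration of this layering, the following Python sketch assumes a made-up formatting convention in which `*` marks emphasis. Both the marker and the helper `strip_formatting` are hypothetical and do not correspond to any real word processor format. It suggests that two pieces of text with different formatting reduce to the same encoded characters, since the encoding step knows nothing about layout or meaning.

```python
def strip_formatting(text: str) -> str:
    """Remove the hypothetical '*' emphasis markers, leaving only characters."""
    return text.replace("*", "")


# Two versions of the same text: one plain, one with a formatting code.
plain = "character set"
emphasised = "*character* set"

# Bottom layer: the characters alone are encoded (here with ASCII).
plain_bytes = strip_formatting(plain).encode("ascii")
emphasised_bytes = strip_formatting(emphasised).encode("ascii")

# The formatting differs, but the encoded characters are identical.
print(plain_bytes == emphasised_bytes)
print(list(plain_bytes[:3]))  # code values for 'c', 'h', 'a'
```

ASCII is used here only because it is the simplest choice for the sketch; any other encoding that covers these characters would show the same thing.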
For completeness, it is also worth defining the following terms:
- character: a single text element conveying sound, meaning, or both
- character set: a fixed number of code points representing different characters
- alphabet: a fixed number of characters necessary to write a language
- typeface: a rendering of the shapes (or glyphs) for the characters of a character set
- font: a minor visual variation of a typeface (such as size or bold)
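The difference between an alphabet and a character set can be sketched in Python, taking ASCII as the example character set and the unaccented English letters as the example alphabet. Both choices are assumptions made for this illustration, and the helper `in_ascii` is hypothetical.

```python
import string

# Alphabet: the characters needed to write a language
# (here, the 52 unaccented upper- and lower-case English letters).
alphabet = string.ascii_uppercase + string.ascii_lowercase

# Character set: a fixed number of code points. ASCII defines 128 of them.
ascii_code_points = range(128)

# Each character of this alphabet is assigned one code point.
code_points = {ch: ord(ch) for ch in alphabet}


def in_ascii(ch: str) -> bool:
    """Return True if the character has a code point in the ASCII set."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(code_points["A"])  # the code point assigned to 'A'
print(in_ascii("A"), in_ascii("é"))
```

Nothing in the table says how a character should look on screen or paper. That is left to whatever renders the glyphs.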
Having considered a little history and some definitions we are now in a position to worry about
the current state of affairs.