Re: `Word' data type et al.

Mon, 16 Jun 1997 15:25:56 +0200



Peter Gerwinski wrote:

> I noticed a lot of confusion about Integer types in GNU Pascal.

So did I (read: I was confused in the beginning ;-). I was going to
post something about it sometime anyway...

>       32       Integer      Word           [unsigned] int == long
>       64       LongInt      LongWord       [unsigned] long long
>
> (BTW:  The number of bits is the same on all platforms, isn't it?)

Even if it is now, it probably won't be forever, I guess...

(What about the 64 bit platforms (Alpha)? Isn't Integer 64 bits
there? Should it be? What about LongInt there?)

Is it always (on every platform) guaranteed that "Integer" is strictly
bigger than "ShortInt", and "LongInt" is strictly bigger than "Integer"?

BTW: Is 64 bits the biggest possible size (at least for 32 bit platforms),
or would it be possible to make 128 bit types? (If not, this would
probably decide how big the ClassID for objects will be, BTW...)
If so, it could be called "LongLongInt", and then I'd also
suggest something like "LongestInt" or so to get the actually biggest
possible type on every system.

> I was told more than once that `Word' should have 16 bits (like in
> Borland Pascal).  I made it 32 bits because this is the "natural"
> size on a 32-bit system (like GPC), and it has the same size as
> `Integer' (like in Borland Pascal;-).

I think this is a good idea. However, for BP compatibility one might
define Word as 16 bit in the bpcompat system unit or so, but this should
not be the gpc default, IMHO.

Christian Wendt wrote:

> It could be depend on the Compilerswitches? (there's a --borland-pascal
> switch?)
> [...]
> advantage: BP programms can be compiled more identically.
> (maybe someone uses specific type to swap sign by adding.... %-)

Argh! But there are cases where the size is necessary (e.g. data files
that should be read). I'd favour an explicit declaration instead of this
compiler switch, because in many places, even in BP programs, you don't
necessarily want the BP sizes (e.g. one often uses "Integer" without
much thinking about the exact size, just to have "an integer data type",
so it wouldn't hurt to use gpc's 32 bit integer instead).

For a quick BP->gpc conversion, one could use the BP identifiers declared
in system. Then, to improve the program, one could find those places
where the size really matters, and put types like int8 (8 bit integer)
or such there. Those can be declared with gpc and BP.

Peter Gerwinski wrote:

> AFAIK, a "word" is defined to be the "natural" unit of the computer,
> but maybe this definition has changed since I have learned it.

I think this is correct. However, the word "Word" seems inappropriate
to me. In natural language, this word has a different meaning (as this
sentence shows), and in mathematics and computer science it usually has
yet another meaning (a finite sequence of elements of a given "alphabet").
I think "Word" in this sense is only common in assembler, and crept into
Pascal through BP (correct me if I'm wrong).

So, what else? In mathematics, one says "natural numbers", but I don't
think "natural" would be a natural name for this data type... ;-).
"unsigned integer" as in C doesn't seem so good, either, simply because
it's two words.

What about a name like "ordinal" (or "cardinal" -- which one would be
better?)? In the following I'll write "ordinal", and in analogy to
"LongInt" and "ShortInt" also "LongOrd" and "ShortOrd". (The similarity
in name to the Ord function is, of course, not accidental, since ord(x)
will always be "ordinal", except if x is a signed integer, or one of the
"strange" enum values we discussed recently.)

To me, such a name would seem more "high level" (Pascalish) than "word"
(low level, assembler) or "unsigned" (medium level, C).

> It would not be difficult to implement a compiler switch (*$16-bit *) (or
> a command-line option `--16-bit') to make `Integer' and `Word' 16 bits etc.,
> but I am not so sure that this is a good idea since GPC is a true 32-bit
> compiler.  If we want to look out for a different "natural" data size for
> GPC, it should not be 16 bits, but 64 bits!

I don't think such a switch would be a good idea, or even necessary.
If someone wants Word to be 16 bits, one can just redeclare it
(e.g. in system) -- after all, "Word" is not a reserved word...

Orlando Llanes wrote:

> I have an idea, perhaps there could be platform dependant types, for
> example: VAR MyVar : PCByte; This way it would be treated as 8-bits on all
> platforms, the natural types can stay, all one has to do is use the
> platform type to compensate for it if necessary. This way, 1) the data
> works consistently across all platforms because (for example) they know
> that a PCByte is 8 bits, 2) If a byte is a different size on another
> platform, then it doesn't matter because the PCByte is the same no matter
> where it's compiled. I guess this could be accomplished by making the
> PCByte a natural Byte, but in the code, anything above the 8 bits is not
> used.

I agree (I also don't like things like "__byte__ integer", because it
reminds me of C ;-), and I find such identifiers with many underscores
"ugly" -- IMHO, if they're necessary at all, they should be there only
in a few low level modules/units). But I wouldn't like the name "PCByte".
I don't think "PCByte" will be easy to remember...

What about the following (the meanings are obvious):

Int8    Ord8
Int16   Ord16
Int32   Ord32
Int64   Ord64
Int128  Ord128 (?)
[...]

If subrange types automatically had the correct size (currently
they don't -- unless within a packed array or record), and "subranges"
were also allowed for ranges that exceed Integer (like "Ordinal" or
"LongInt"), one could just declare all of these types in the normal way,
like "Int8=-$80..$7F" ... "Ord128=0..$FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF". :-)
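
Just to make the idea concrete, here is a rough sketch of such a set of
types in C (of all languages ;-), only because C's <stdint.h> already
provides fixed-size integers. The names Int8 ... Ord64 are the hypothetical
ones from the list above, and fits_int8 is a helper made up for this
example; none of this is an existing gpc feature. The 128 bit pair is left
out because <stdint.h> doesn't guarantee such a type.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* exact-width integer types */

/* Hypothetical names from the list above, mapped onto C's exact-width types. */
typedef int8_t   Int8;
typedef uint8_t  Ord8;
typedef int16_t  Int16;
typedef uint16_t Ord16;
typedef int32_t  Int32;
typedef uint32_t Ord32;
typedef int64_t  Int64;
typedef uint64_t Ord64;

/* Compile-time checks: each type has exactly the advertised number of bits. */
static_assert(sizeof(Int8)  * CHAR_BIT == 8,  "Int8 must have 8 bits");
static_assert(sizeof(Ord16) * CHAR_BIT == 16, "Ord16 must have 16 bits");
static_assert(sizeof(Int32) * CHAR_BIT == 32, "Int32 must have 32 bits");
static_assert(sizeof(Ord64) * CHAR_BIT == 64, "Ord64 must have 64 bits");

/* Returns nonzero if v lies in the subrange -$80..$7F, i.e. fits into Int8. */
int fits_int8(long v)
{
    return v >= INT8_MIN && v <= INT8_MAX;
}
```

With subranges sized automatically, the compiler would do the job of
fits_int8 itself whenever range checking is on.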

For BP->gpc, as I said above, the types to choose depend on what's really
intended. I think one could use roughly the following rules (please check
them carefully; I made several assumptions about the types, and I'm not
sure they're valid on all platforms):

BP: ShortInt
gpc: Int8

BP: Byte
gpc: Ord8

BP: Integer
gpc:
- If exactly a "16 bit signed integer" is wanted, Int16
- If an integer of at least 16 bits is wanted, ShortInt
- If just "an integer" is wanted (maybe implying that it can be handled
  most efficiently), Integer

BP: Word
gpc:
- If exactly a "16 bit unsigned integer" is wanted, Ord16
- If an unsigned integer of at least 16 bits is wanted, ShortOrd
- If just "an unsigned integer" is wanted (maybe implying that it can be
  handled most efficiently), Ordinal
- If the biggest possible unsigned integer is wanted, LongOrd (this might
  be rare, since in BP one would probably use LongInt for that, even
  though it's signed)

BP: LongInt
gpc:
- If exactly a "32 bit signed integer" is wanted, Int32
- If an integer of at least 32 bits is wanted, Integer
- If an integer type is wanted that's bigger than Integer (e.g. to hold
  the result of a multiplication of two integers), LongInt
- If the biggest possible [signed] integer type is wanted, LongInt
  (LongLongInt? LongestInt?)
- If an unsigned (!) integer of at least 31 bits is wanted, Ordinal
- If the biggest possible unsigned (!) integer is wanted, LongOrd

(Did I miss any possible intentions? ;-)
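
The distinction between "exactly n bits", "at least n bits" and "just an
efficient integer" can be sketched in C, whose <stdint.h> happens to offer
one type for each intention. This is only an analogy, not a statement about
gpc: the typedef names and the function wide_product are invented for this
example.

```c
#include <stdint.h>

/* Three different intentions behind a "16 bit integer". */
typedef int16_t       ExactInt16;    /* exactly 16 bits, e.g. for file formats */
typedef int_least16_t AtLeastInt16;  /* smallest type with at least 16 bits */
typedef int_fast16_t  FastInt16;     /* at least 16 bits, chosen for speed */

/* The "biggest possible signed integer" intention (cf. "LongestInt"). */
typedef intmax_t      BiggestInt;

/* A type bigger than the operands, so the product of two 32 bit
   integers can never overflow. */
int64_t wide_product(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```

Only the first kind of type is needed where the size really matters; the
others leave the choice to the platform.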

Another thing, only partly related to the above: is there any support
for endianness (i.e. byte order of integers that are bigger than 1 byte)
in gpc yet? (This would be necessary e.g. to read binary files on
different platforms.) If not, I'd suggest functions like "SmallEndian" and
"BigEndian", defined for all integer types, which convert a value, given
in the normal machine's endianness, to a small endian or big endian value,
resp. (or vice versa, which is the same conversion). On each given
platform, one half of the functions would do nothing, the other half would
reverse the byte order. If there's a conditional define concerning
endianness, such functions could be implemented in Pascal with {$IFDEF}'s,
otherwise they'd have to be provided by the compiler.
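
A minimal sketch of what I mean, written in C and only for 32 bit unsigned
values. All function names here are made up for the example, and the
endianness is detected at run time, whereas a conditional define would let
the compiler drop the test entirely.

```c
#include <stdint.h>

/* Returns 1 on a small (little) endian machine, 0 on a big endian one. */
int host_is_small_endian(void)
{
    const uint16_t probe = 1;
    /* Look at the first byte in memory: it holds the low byte
       only on a small endian machine. */
    return *(const unsigned char *)&probe == 1;
}

/* Reverses the byte order of a 32 bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Machine order <-> small endian; the same conversion works both ways. */
uint32_t small_endian32(uint32_t v)
{
    return host_is_small_endian() ? v : swap32(v);
}

/* Machine order <-> big endian; the same conversion works both ways. */
uint32_t big_endian32(uint32_t v)
{
    return host_is_small_endian() ? swap32(v) : v;
}
```

On any given machine exactly one of small_endian32 and big_endian32 does
nothing, and the other one reverses the bytes.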

Next topic:-)

While we are talking about integer sizes, what about boolean sizes?
Usually, a boolean is 1 bit, of course, but in some cases (e.g. system
calls) one needs booleans of 1 byte (this would only be a difference
in a packed array or record) or 2 bytes (e.g. Windoze API, cf. BP's
"bool" type), perhaps even more. So it might be useful to have "bool8",
"bool16", ... . The internal value 0 would be false, and any other value
would be interpreted as true, while the standard true value is 1 (or -1?).
Syntactically they would be treated just like booleans (i.e. assignment
compatibility between each other, but not with integers; can be used in
if/while/until conditions; work with and/or/xor/not/...).
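
To show why such types need a bit more than an integer in disguise, here is
a small C sketch. The names Bool8, Bool16 and the helper functions are
invented for this example, and it assumes that the standard true value is 1
(the question of 1 vs. -1 stays open).

```c
#include <stdint.h>

typedef uint8_t  Bool8;   /* boolean stored in 1 byte */
typedef uint16_t Bool16;  /* boolean stored in 2 bytes */

/* Any internal value other than 0 is interpreted as true. */
int bool16_is_true(Bool16 b)
{
    return b != 0;
}

/* Produces the standard true value (assumed to be 1) or 0 for false. */
Bool16 bool16_from(int condition)
{
    return condition ? 1 : 0;
}

/* Assignment between boolean sizes must go through the truth value.
   Simply cutting off the upper byte would turn the true value $0100
   into 0, i.e. false. */
Bool8 bool8_from_bool16(Bool16 b)
{
    return b != 0 ? 1 : 0;
}
```

So the compiler would have to normalize the value on assignment between
different boolean sizes, which is one reason to treat them as booleans and
not as integers.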

And one more topic (also related to some kind of sub-types, at least):

Dr John Stockton wrote once in c.l.p.b.:

> A better language might include sub-types of real,
>         R>0     R>=0    R<0     R<=0    R<>0 ,
> with rangechecking.

...or, more generally, arbitrary intervals. Or even more generally,
sub-types with a user-defined function(super_type):boolean to decide
if an element belongs to the sub-type. (Of course, subranges of
ordinal types are a special case of this, too, at least in theory...)
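
Roughly, the check behind such a sub-type could look like the following C
sketch. The names RealPredicate, is_positive, is_nonzero and checked_assign
are all hypothetical; a real implementation would raise a range check error
where this one just returns false.

```c
#include <stdbool.h>

/* The user-defined function(super_type):boolean, here for reals. */
typedef bool (*RealPredicate)(double);

bool is_positive(double x) { return x > 0.0; }   /* sub-type R>0  */
bool is_nonzero(double x)  { return x != 0.0; }  /* sub-type R<>0 */

/* Assignment with "range checking": the value is stored only if the
   predicate accepts it. Returns true on success, and leaves *target
   unchanged on failure. */
bool checked_assign(double *target, double value, RealPredicate belongs)
{
    if (!belongs(value))
        return false;
    *target = value;
    return true;
}
```

A subrange like 1..10 would then just be the predicate "1 <= x and
x <= 10" on the integers.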

Since all of this would only affect range checking, it's not an urgent
topic for now. With range checking, it might not even be very difficult
to implement, just using the user-defined function instead of the
"standard" range checking function...

BTW: Does the PXSC standard have any ideas about such things?
-- 
Frank Heckenbach, Erlangen, Germany
heckenb@mi.uni-erlangen.de
Turbo Pascal:   http://www.mi.uni-erlangen.de/~heckenb/programs.htm
Internet links: http://www.mi.uni-erlangen.de/~heckenb/links.htm


