OOP_(re)

Mon, 2 Jun 1997 14:24:33 +0200



Peter Gerwinski wrote:

> It's even more trivial:  At the moment, I artificially forbid constructors
> to be virtual.  To enable them, I would just have to take out the error
> message.

Fine. :-)

> The only problem with this:  As I wrote, I am initializing the object
> *outside* the constructor's body.  If I enable virtual constructors, this
> must not change any more.

I don't think it's a problem, because if a constructor is called virtually,
the object must have been initialized before (otherwise the constructor's
address couldn't be found). (I'm assuming, "initializing" means only setting
the VMT link, or does it do anything more?)

But there is another problem, namely to decide whether to call the
constructor virtually (i.e. looking up its address via the VMT) or
"statically". If there's a reliable way to detect if an object has been
initialized (like we discussed earlier), this would be easy (virtually if
initialized, statically otherwise).

Otherwise, I guess one needs more complicated rules, like: always statically
when used with "New" or on variables of object types (in contrast to
pointers). And virtually in the other cases? I'm not so sure -- so it would
be better to have a reliable way to detect if an object has been initialized
-- would be nice anyway.

> Okay for ordinal types, sets and pointers, not sure about Reals.
> Strings initialized to zero this way are BROKEN because they get
> a capacity of zero which makes them useless.  :-( That's one reason
> why I would like to have `ShortString's in GPC.)

This means that Fillchar'ing an object with 0 is a bad idea (even if you
exclude the VMT link)!
So, Pierre, it seems like you'll have to renounce this feature from TV.

> > > `--store-object-names', and switched ON in Delphi compatibility mode.
> >
> > ... and OFF otherwise!
>
> Of course!  That's what I meant. :-)

I thought so -- just wanted to make it really clear.

> Internally, objects are ordinary records anyway, so this makes no difference
> in hacking GPC - except that if I introduce an `interface' type I have to
> remember that something *is* an interface for no other purpose than outputting
> error messages.

But actually this is the essential purpose of typing at all... ;-)

> However, the above was not meant as a suggestion how to implement interfaces
> into GPC.  It was just my attempt to understand what they are.

As a scientific model, it seems good (unifying as much as possible), but
I think in actual programs, it's easier to distinguish by using different
keywords. Perhaps the rules are also more comprehensible this way, and
programmers familiar with Delphi or Java will recognize what they know.

> (-: Great!  It seems that I have got it!  This increases the chance that
> I will be able to implement all this into GPC. :-)

I could perhaps help with designing the structures and algorithms (see
below ;-), if desired, but probably not with hacking the compiler...

> > - Its type is a 64/128 bit integer.
>
> Really?  In David's example it's a string constant!
>
>     const
>         SIID_IActiveScriptSiteWindow = '{D10F6761-83E9-11cf-8F20-00805F2CD064}';

Yes, I saw this too, but didn't give it much thought.
This string constant looks suspiciously like a 32-digit hex number, i.e. a
128 bit integer. Does Delphi convert this to an integer internally (Delphi
users, please!)? And what are the (usual) meanings of the different "parts"
of this string?

Anyway, I think at least in gpc the internal representation should be an
integer, not a string. If necessary for compatibility, it might be a good
feature to convert such strings to numbers automatically -- though in native
gpc code one would rather write it as "$D10F676183E911cf8F2000805F2CD064" --
do/will such big numbers work? (Of course, one would not actually write such
numbers every time, but use some constants and add "small numbers" to them.)
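
To illustrate the string-to-number conversion mentioned above, here is a
minimal sketch, assuming Free Pascal (`Cardinal` as a 32-bit unsigned type,
`HexStr` from its System unit). `TID128`, `HexVal` and `ParseID` are
hypothetical names, and storing the ID as four 32-bit words is only one
possible representation, not a statement about what Delphi or gpc does.

```pascal
program GuidToWords;

type
  { Hypothetical 128-bit container: four 32-bit words, most significant first }
  TID128 = array[0..3] of Cardinal;

{ Value of one hex digit, or -1 if C is not a hex digit }
function HexVal(C: Char): LongInt;
begin
  case C of
    '0'..'9': HexVal := Ord(C) - Ord('0');
    'A'..'F': HexVal := Ord(C) - Ord('A') + 10;
    'a'..'f': HexVal := Ord(C) - Ord('a') + 10;
  else
    HexVal := -1;
  end;
end;

{ Collects the hex digits of S, skipping braces and dashes.
  Returns False unless exactly 32 hex digits are found. }
function ParseID(const S: String; var ID: TID128): Boolean;
var
  I, V, Count: LongInt;
begin
  ParseID := False;
  for I := 0 to 3 do
    ID[I] := 0;
  Count := 0;
  for I := 1 to Length(S) do
  begin
    V := HexVal(S[I]);
    if V >= 0 then
    begin
      if Count >= 32 then
        Exit;                       { too many digits }
      { eight digits per word, shifted in from the right }
      ID[Count div 8] := ID[Count div 8] * 16 + Cardinal(V);
      Inc(Count);
    end
    else if not (S[I] in ['{', '}', '-']) then
      Exit;                         { unexpected character }
  end;
  ParseID := Count = 32;
end;

var
  ID: TID128;
begin
  if ParseID('{D10F6761-83E9-11cf-8F20-00805F2CD064}', ID) then
    WriteLn(HexStr(ID[0], 8), HexStr(ID[1], 8), HexStr(ID[2], 8), HexStr(ID[3], 8))
  else
    WriteLn('not a 128-bit ID');
end.
```

The four words then hold the same 32 digits as the "$D10F..." literal above,
so the question of whether such big integer constants work can be kept
separate from the conversion itself.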

> I agree, *and* having that we could claim more compatibility to Delphi thus
> making GPC more attractive for a lot of possible users.

*If* that's really Delphi-compatible! Things like this "QueryInterface" make
me think things are more complicated in Delphi.

> > No! There aren't any instances of interfaces!
>
> Only if we explicitly forbid them.  There is no technical reason why they
> shouldn't be instantiated.

There's a very simple reason: because they have abstract methods! (Except
for empty interfaces, of course, but I doubt anyone would want to instantiate
an empty interface...)

> > A variable of a pointer-to-an-interface type must point to the actual object
> > (which can be of any type that implements that interface), and (somehow) give
> > the information where in this type's VMT the methods of that interface are
> > located.

To illustrate what I mean, let's say interface I declares methods I1 and I2.
Type T has fields F1, F2, declares methods M1, M2 and implements I (and I's
methods, of course). Variable V of type T. This would look like (assuming
everything is 4 bytes, just for simplicity):

V:
Offset Field
  0    Pointer to VMT of T
  4    F1
  8    F2

VMT of T:
Offset Field
  0    Size
  4    NegSize
  8    [... perhaps ObjID and such]
 20*   M1
 24    M2
 28    I1
 32    I2

* arbitrarily chosen

So the methods of I start at offset 28 in the VMT of type T -- and in the VMT
of all descendants of T (!), and the compiler knows that.

Let P be of type ^I, and let's assign "P:=@V". Then the internal
representation of P would look like

Offset Field
  0    Pointer to the fields of V, i.e. "@V" in the conventional sense
  4    (Pointer to VMT of T) + 28 ("VMT link to I within T")

If we go further and assume that J is a sub-interface of I that has just I2,
then J's methods start at offset 32 in the VMT of T, and generally they start
at n+4 in the VMT of a type that implements I with I's methods starting at n.

So if Q is of type ^J, it's legal to assign "Q:=P". Internally, it would
just copy the first "field" of P (the actual pointer) to Q, and add 4 to
the second one in Q.
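
The layout above can be emulated by hand. The following is only a sketch
under some assumptions: Free Pascal in its default mode, plain records
instead of real object types, and slot numbers 0..3 standing in for the byte
offsets 20..32 (Size, NegSize etc. are left out). All names (`TIPtr`,
`CallSlot`, `T_I1`, ...) are hypothetical.

```pascal
program IPointerSketch;

type
  PObj = ^TObj;
  TMethodProc = procedure(This: PObj);

  { Method slots only; the range is oversized on purpose so that a pointer
    into the middle of a VMT can still be indexed from 0 }
  TMethodTable = array[0..1023] of TMethodProc;
  PMethodTable = ^TMethodTable;

  TObj = record
    Vmt: PMethodTable;
    F1, F2: LongInt;
  end;

  { i-pointer: the conventional pointer plus a link into the VMT }
  TIPtr = record
    Obj: PObj;
    Methods: PMethodTable;   { addresses the first method of the interface }
  end;

const
  SlotI = 2;   { where the methods of I start in the VMT of T }

procedure T_M1(This: PObj); begin WriteLn('T.M1'); end;
procedure T_M2(This: PObj); begin WriteLn('T.M2'); end;
procedure T_I1(This: PObj); begin WriteLn('T.I1, F1 = ', This^.F1); end;
procedure T_I2(This: PObj); begin WriteLn('T.I2, F2 = ', This^.F2); end;

{ Calls method number Slot of the interface that IP refers to }
procedure CallSlot(const IP: TIPtr; Slot: LongInt);
var
  M: TMethodProc;
begin
  M := IP.Methods^[Slot];
  M(IP.Obj);
end;

var
  VmtT: array[0..3] of TMethodProc;   { M1, M2, I1, I2 }
  V: TObj;
  P, Q: TIPtr;                        { P plays ^I, Q plays ^J }
begin
  VmtT[0] := @T_M1;  VmtT[1] := @T_M2;
  VmtT[2] := @T_I1;  VmtT[3] := @T_I2;
  V.Vmt := PMethodTable(@VmtT);
  V.F1 := 10;  V.F2 := 20;

  { P := @V }
  P.Obj := @V;
  P.Methods := PMethodTable(@V.Vmt^[SlotI]);
  CallSlot(P, 0);   { I1 }
  CallSlot(P, 1);   { I2 }

  { Q := P: copy the object pointer, advance the VMT link by one slot }
  Q.Obj := P.Obj;
  Q.Methods := PMethodTable(@P.Methods^[1]);
  CallSlot(Q, 0);   { I2 again, now as the first method of J }
end.
```

Advancing by one slot here corresponds to "add 4" in the table above.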

> What about this:  Instead of an additional VMT field, the interface is
> represented in each instance of the object as an integer field which holds
> the offset of itself inside the object:
>
>     Type
>       MyObj = object ( MyInterface )  (* No "primary ancestor" *)
>         foo, bar: Integer;
>       end (* MyObj *);
>
> is represented as
>
>            [vmt field] [interface offset] [foo] [bar]
>     byte#  0           4                  8     12
>     value  @vmt_MyObj  4                  foo   bar
>           /             \
>          /               \
>     `PointerToObject'     \
>     is pointing here.    `PointerToInterface' is
>                          pointing here and can
>                          look up the address of
>                          the whole object by
>                          subtracting the integer
>                          value pointed to.
>
> [...]
>
> This implies:
>
>   - Each instance of each object gets one additional integer field for each
>     interface the object inherits from.

Including each interface those interfaces inherit from -- which *might* be
quite a few...

>   - Each VMT gets additional pointer fields pointing to the VMTs of the
>     interfaces the object inherits from.

You mean something like the following?

VMT of T:
Offset Field
  0    Size
  4    NegSize
  8    [... perhaps ObjID and such]
 20    Offset of I, i.e. Pointer to I1 in *this* VMT
 24    Offset of J, i.e. Pointer to I2 in *this* VMT
 28    M1
 32    M2
 36    I1
 40    I2

V:
Offset Field
  0    Pointer to VMT of T
  4    4 (Interface offset I)
  8    8 (Interface offset J)
 12    F1
 16    F2

>   - Pointers to interfaces have the same format as all other pointers, but
>     they don't point to the beginning of the object but to an integer field
>     inside the object.  An explicit pointer conversion from "pointer to
>     object" to "pointer to interface" does actually change the value of the
>     pointer.

Also from "pointer to interface" to "pointer to sub-interface".

For sub-interfaces (AFAICS), their offsets must be stored at a fixed
position after (or before) the main interface's offset -- in my example
above, J's offset must immediately follow I's, so any ^I can be assigned
to a ^J by adding 4. However, this seems possible to guarantee.
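
A hand-made emulation of this scheme, again only as a sketch: it assumes
Free Pascal (for the `PChar` arithmetic), uses a plain record in place of a
real object, and computes the offsets at run time instead of hard-coding 4
and 8, because the VMT pointer need not be 4 bytes wide. `TObjV`, `ObjectOf`
and `ToSubInterface` are hypothetical names.

```pascal
program OffsetFieldSketch;

type
  POffset = ^LongInt;   { a "pointer to interface" points at an offset field }
  PObjV = ^TObjV;
  TObjV = record
    Vmt: Pointer;       { VMT link, unused in this sketch }
    OffI: LongInt;      { interface offset I }
    OffJ: LongInt;      { interface offset J, directly after that of I }
    F1, F2: LongInt;
  end;

{ Recovers the address of the whole object by subtracting the stored offset }
function ObjectOf(IP: POffset): PObjV;
begin
  ObjectOf := PObjV(PChar(IP) - IP^);
end;

{ ^I to ^J: step to the next offset field ("adding 4") }
function ToSubInterface(IP: POffset): POffset;
begin
  ToSubInterface := POffset(PChar(IP) + SizeOf(LongInt));
end;

var
  V: TObjV;
  IPtr, JPtr: POffset;
begin
  V.Vmt := nil;
  { each offset field holds its own offset within the object }
  V.OffI := PChar(@V.OffI) - PChar(@V);
  V.OffJ := PChar(@V.OffJ) - PChar(@V);
  V.F1 := 1;  V.F2 := 2;

  IPtr := @V.OffI;                 { "pointer to object" to "pointer to I" }
  JPtr := ToSubInterface(IPtr);    { "pointer to I" to "pointer to J" }

  if (ObjectOf(IPtr) = @V) and (ObjectOf(JPtr) = @V) then
    WriteLn('both interface pointers lead back to V')
  else
    WriteLn('offset fields are inconsistent');
end.
```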

An interesting method; AFAICS it will work, too.

The disadvantage is that instances and VMTs get bigger. In my method,
pointers to interfaces get bigger, so that's a tradeoff.

I think, however, that pointers to interfaces are rarer (and mostly they
appear as parameters, i.e. only temporarily), so my method might be a bit
more economical with memory.

> Yes.  Here's the stone (info -f standards -n Portability):
>
>     You can assume that all pointers have the same format, regardless of
>     the type they point to, and that this is really an integer. There are
>     some weird machines where this isn't true, but they aren't important;

I think I heard about such a "weird machine" recently... ;-)

> This means that we mustn't introduce another format for pointers

I didn't find that node in my info file, but well...

What kind of pointers are they talking about? Pointers in C, I assume.
What effect does this have on what we call a pointer in Pascal? AFAICS, it's just
a matter of definition. So let's call it an "i-pointer" (pointer to an
interface) or whatever, consisting of a (conventional) pointer and a VMT
pointer. What should prevent us from creating such a data structure as an
i-pointer (a structure that can even be declared in plain Pascal as a
record)?

BTW: Don't they also (implicitly or explicitly) assume that a pointer contains
the address of the first byte of the data it references, which is not the
case for your way of interface pointers? So the same argument ("i-pointer")
would apply to this kind of pointers, too, wouldn't it?

> (* BTW, they also say:
>
>     As for systems that are not like Unix, such as MSDOS, Windows, the
>     Macintosh, VMS, and MVS, supporting them is usually so much work that it
>     is better if you don't.
>
> which I strongly recommend to ignore!  I do not like at all some well-known
> "operating system" of a well-known company, but just ignoring it is a nice method
> to commit social suicide among computer users. *)

DJGPP and EMX are not just plain DO$ -- especially concerning pointers,
maybe they meant not supporting x86 real mode...

> > The following variable declarations could all be legal:
> >
> > T
> > ^T
> > ^I1
> > ^I1 T
> > ^I1 I2
> > ^I1 I2 T
> > ...
>
> You mean: legal types for a variable?  A list of types separated by spaces?

The syntax is another issue... - but considering that many interfaces (at
least in the typical Java examples) have names ending in "-able", this could
be a relatively "^readable understandable syntax". ;-)

Of course, I don't think one would need such declarations often, but
perhaps sometimes... -- especially the relatively simple form "^I1 T", as in
"^streamable TObject" could be of some practical use. In your model, this is
possible, too, but it complicates calling methods of T and accessing fields
of T a little.

If someone has better suggestions for the syntax (for this and the other
stuff), please!

(* BTW: Theoretically, this is quite interesting: if one considers
(mathematically) types as sets (of all possible instances of the type and --
in the case of object types -- all possible derived types), the typing
mechanisms correspond to some set operations (subrange: subset;
record/array: set product; variant record: union; "set of": power set;
derived object type: subset). The one above would be the intersection, which
is not present up to now (because there isn't much there that could be
intersected nontrivially without MI/interfaces). *)

> > In general: P can be a variable of type pointer of (n interfaces I1 .. In and
> > optionally one object type T).
> > 
> > Legal assignments to P are objects of any type that implements all I1 .. In
> > (and is T or a descendant of T, if T is given).
> > 
> > The internal representation of P consists of the actual address of the object
> > and n addresses that point to the first method of each of the n interfaces
> > inside the VMT of the actual type of the object.
> > 
> > I hope this was understandable so far -- if not, I can try to explain again.
> 
> I am not sure that I have understood it.  Further explanation cannot hurt.
> 
> How would the virtual method calls
> [see below]
> work with this representation?
>
>     PointerToInterface:= @MyObject;

- Use "@MyObject" as the first "field" of "PointerToInterface"
- For each interface that "PointerToInterface" points to:
  - Get the VMT link of MyObject
  - Add to this the offset of the first method of the interface in the VMT
      of MyObject (known at compile time)
  - Store the result in the corresponding "field" of "PointerToInterface"

>     PointerToInterface^.MyMethod;

- Pass the first "field" of "PointerToInterface" as "Self" parameter
- Get the right "field" of "PointerToInterface" (i.e. the field corresponding
    to the interface that MyMethod belongs to -- there can be several
    matching interfaces (due to inheritance between interfaces), in this
    case it doesn't matter which one is used)
- Add to this the offset of MyMethod relative to the first method of this
    interface within the VMT -- this is constant for any type that
    implements this interface.
- Dereference the result and do the call

(In the case that "PointerToInterface" points not only to interface(s), but
also to a type T, and that MyMethod is a method of T, one could call MyMethod
more easily through T's VMT, of course.)
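
The two lists of steps above can be sketched as follows for a pointer type
that names two interfaces. This assumes Free Pascal in its default mode and
emulates objects with records; `AssignIPtr`, `CallMethod` and the interfaces
A (one method) and B (two methods) are hypothetical, and the table of first
slots stands for what the compiler would know at compile time.

```pascal
program MultiInterfacePointer;

const
  NIntf = 2;   { this pointer type names two interfaces: A and B }

type
  PObj = ^TObj;
  TMethodProc = procedure(This: PObj);
  TMethodTable = array[0..1023] of TMethodProc;
  PMethodTable = ^TMethodTable;

  TObj = record
    Vmt: PMethodTable;
    Value: LongInt;
  end;

  { The object's address plus one VMT link per interface }
  TIPtr = record
    Obj: PObj;
    Link: array[0..NIntf - 1] of PMethodTable;
  end;

  { Slot of each interface's first method in the VMT of one object type }
  TFirstSlots = array[0..NIntf - 1] of LongInt;

{ PointerToInterface := @MyObject }
procedure AssignIPtr(var IP: TIPtr; O: PObj; const First: TFirstSlots);
var
  K: LongInt;
begin
  IP.Obj := O;                                        { first "field" }
  for K := 0 to NIntf - 1 do
    IP.Link[K] := PMethodTable(@O^.Vmt^[First[K]]);   { VMT link + offset }
end;

{ PointerToInterface^.MyMethod, where MyMethod is method number Method
  of interface number Intf }
procedure CallMethod(const IP: TIPtr; Intf, Method: LongInt);
var
  M: TMethodProc;
begin
  M := IP.Link[Intf]^[Method];   { pick the field, add the method's offset }
  M(IP.Obj);                     { pass the object's address as Self }
end;

procedure Own(This: PObj); begin WriteLn('Own ', This^.Value); end;
procedure A1(This: PObj);  begin WriteLn('A1 ', This^.Value); end;
procedure B1(This: PObj);  begin WriteLn('B1 ', This^.Value); end;
procedure B2(This: PObj);  begin WriteLn('B2 ', This^.Value); end;

const
  FirstInMyType: TFirstSlots = (1, 2);   { A starts at slot 1, B at slot 2 }

var
  Vmt: array[0..3] of TMethodProc;       { Own, A1, B1, B2 }
  MyObject: TObj;
  P: TIPtr;
begin
  Vmt[0] := @Own;  Vmt[1] := @A1;  Vmt[2] := @B1;  Vmt[3] := @B2;
  MyObject.Vmt := PMethodTable(@Vmt);
  MyObject.Value := 42;

  AssignIPtr(P, @MyObject, FirstInMyType);
  CallMethod(P, 0, 0);   { A1 }
  CallMethod(P, 1, 1);   { B2 }
end.
```

Both operations take a fixed number of steps, independent of how many
interfaces the object's type implements.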

>     PointerToObject:= @MyObject;
>     PointerToObject^.MyMethod;

Just as it does now. Pointers to object types are not affected in any way.

> I am sceptical about this "multiple pointer" representation because there are
> *many* places in the GPC front-end relying on the fact that all pointers have
> the same format.  I don't even know all of them.

Just give it a different name then internally (like i-pointer, or here
i1-pointer, i2-pointer, ...). This might help to spot all the places where
conversions are necessary (because an i*-pointer can't be accidentally
confused with a conventional pointer then).

I mean: internally, i-pointers are a completely different data type than
pointers. Calling them pointers (and accessing them with ^ and @) is just
a syntactical issue, isn't it?

> Now we have two of them:  The solution with offsets stored in the object is
> O(1) as well.  (-: Implement both? ;-)

I fear it would only increase confusion -- and it seems tricky enough to get
one version right at all. I think we should decide on one way. Any opinions?

> > So with the first solution, AFAICS, the ObjIDs for interfaces are not needed
> > (contrary to what I said above) -- they can be accepted in Delphi
> > compatibility mode, but I see no need for them...
>
> Agreed.  (With both solutions discussed above.)

True (it really seems Delphi uses some kind of searching for interfaces...)
-- but anyway, I like the "[]" syntax for ObjIDs for normal object types.

> > There may be some types that need a persistent ID (i.e. one that cannot
> > simply be regenerated with each storing, for whatever reason), but then again,
> > ID should be a field of these special types only.
>
> I have *lots* of such types,

Really -- their IDs must be the very same in a different program run?
If so, why? I'm asking just out of curiosity.

> > > Use:  Think of a tree of objects holding numerical data.  A method of
> > > an object somewhere in that tree wants to calculate something.  For this
> > > purpose it needs some data stored elsewhere in the tree.  Then the
> > > unique ID can be used to locate that other data object.
> >
> > For this purpose, I'd use a pointer to that other object.
>
> The whole purpose of the ID is to *get* that pointer.

But I still can't understand why you can't simply use the pointer from the
beginning, i.e. when you assign IDs (like "Factor1ID := Factor1^.ID"), you
just do "Factor1ID := @Factor1". Then you have the pointer when you need it.
Is it that the whole object "Factor1" can change (be reallocated), not only
its contents change -- the "aliasing bug"? In this case, I'd use a pointer to
a pointer to "Factor1" (like a handle to a memory block) -- this doesn't save
space, but would eliminate the need to search for a pointer via an ID, and
might be more efficient. But this is getting a bit off-topic for this
thread...
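
What I mean by a pointer to a pointer, as a minimal sketch (assuming Free
Pascal; `TFactor`, `Master` and `Factor1H` are made-up names, and the
Dispose/New pair just stands for whatever reallocates the object):

```pascal
program HandleSketch;

type
  PFactor = ^TFactor;
  TFactor = record
    Value: Real;
  end;
  HFactor = ^PFactor;   { handle: pointer to the master pointer }

var
  Master: PFactor;      { the only place that holds the object's address }
  Factor1H: HFactor;    { what other objects store instead of an ID }
begin
  New(Master);
  Master^.Value := 1.5;
  Factor1H := @Master;

  { the object is reallocated; only the master pointer is updated }
  Dispose(Master);
  New(Master);
  Master^.Value := 2.5;

  { the handle still reaches the current object, without any search }
  WriteLn(Factor1H^^.Value:0:1);
  Dispose(Master);
end.
```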

> This example is one method how to implement the `is' operator.  Another one
> - which seems the most practical for me and will probably be the way to go -
> is to store inheritance information in each VMT.

Yes, I'd like this very much (via a parent class pointer, and a list of
child classes, probably)! And, BTW, there were a few requests in clpb about
such a thing which is missing in BP -- so it will be another advantage
compared to BP, though not Delphi... :-)

After all, replacing "Typeof(V)=Typeof(T)" by "V IS T" is not only shorter,
more readable and looks more like a "clean" language, but doesn't break if
a descendant of T is used for V. In my experience, there are only a few cases
where an exact type match is really wanted; usually one would rather include
(possible) descendants if one could.
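
A sketch of how such a test could walk parent pointers stored in the VMTs,
assuming Free Pascal and hand-made records rather than real object types.
`IsA` and the three VMT variables are hypothetical, and the list of child
classes is left out because the test does not need it.

```pascal
program IsSketch;

type
  PVmt = ^TVmt;
  TVmt = record
    Parent: PVmt;   { nil for a type without ancestor }
  end;

  PObj = ^TObj;
  TObj = record
    Vmt: PVmt;
  end;

{ True if the type of O^ is T or a descendant of T }
function IsA(O: PObj; T: PVmt): Boolean;
var
  V: PVmt;
begin
  V := O^.Vmt;
  while (V <> nil) and (V <> T) do
    V := V^.Parent;
  IsA := V <> nil;
end;

var
  VmtBase, VmtDerived, VmtOther: TVmt;
  V: TObj;
begin
  VmtBase.Parent := nil;
  VmtDerived.Parent := @VmtBase;
  VmtOther.Parent := nil;
  V.Vmt := @VmtDerived;

  WriteLn('V is Base:           ', IsA(@V, @VmtBase));
  WriteLn('exact match on Base: ', V.Vmt = @VmtBase);
  WriteLn('V is Other:          ', IsA(@V, @VmtOther));
end.
```

With V being of the derived type, the exact comparison fails for Base while
the walk along the parent pointers succeeds, which is the difference
described above.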

> > No -- there can't be any instances of MyObj, the compiler should check this.
> > That's the main goal of the whole thing: to make these checks at
> > compile-time, not at run-time.
>
> This is how Delphi behaves?

I don't know about Delphi. This is how Java (and, AFAIR, C++) behaves, and
how it should behave (do all checks at compile-time rather than at run-time
if possible).

> > The OOP way to do this is: if something doesn't suit you, derive a new class,
> > and apply all modifications you want to the new class.
>
> Hmm ... I essentially re-wrote Turbo Vision to run in graphics mode,
> while introducing many extensions.  (* I am calling the result "BO4" -
> Benutzeroberflaeche, 4. Versuch (German) which means user interface,
> 4. try. *)  I paid 600DM (~$400) to get the source because I couldn't stand to
> derive a new class from just *everything*, re-implementing the same extensions
> in each new class everywhere which would have been placed most naturally in
> `tView'.  (* Some weeks later, Borland reduced the price of the source to
> 50DM. *):

This sounds like a bad design of TV (no surprise to me...). If designed well,
changes should only be necessary in relatively few places.

Of course, if changes are made to "gpc's TV" to make it more easily adaptable
to graphics mode (or whatever), these changes could go into the main
distribution. The actual graphics mode changes (which should be rather few
then) should go into a separate package.

If this doesn't work well (i.e. if many diffs between text and graphics
version remain), this might be an indication that TV and TV/graphics are
simply different enough to not share the same classes. (And actually, one
would hardly ever use objects of both packages in the same program.) So they
could be simply two packages with "similar" source then...

> What we can do is to patch Turbo Vision for use with GPC and to distribute
> the patch.  The `GNU patch' program works very reasonably, and 90% of those
> who intend to use TV with GPC will be Borland customers who have the TV
> source anyway.

Probably 100% as soon as there's a "real gpc" TV-like library.

> > Part of the reason Borland has not released Turbo Vision into the public
> > domain is Turbo Vision remains an important and valuable feature of
> > Borland's Turbo Pascal 7.0 for DOS, which is currently available from
> > Borland.
>
> That's a joke?

Borland's kind of humour!:-(

> Turbo Pascal 7.0 is *dead* because it's no longer supported.
> That's one reason why we are working on GPC - a *living* compiler!
> What Borland still sells are parts of the corpse ... |~(  

So they adapted to M$ in yet another sense...

> And let us work on
> something which will make the world forget that TV did ever exist!  :-)

...and Borland as well (if they behave like this) -- sad to say so...

> I have attempted to compile TVision with GPC, but it lacked too much object
> support.

...and x86 real mode assembler support, I suppose... ;-(
-- 
Frank Heckenbach, Erlangen, Germany
heckenb@mi.uni-erlangen.de
Turbo Pascal:   http://www.mi.uni-erlangen.de/~heckenb/programs.htm
Internet links: http://www.mi.uni-erlangen.de/~heckenb/links.htm


