OOP

Sun, 1 Jun 1997 19:45:15 +0200 (MET DST)


According to Frank Heckenbach:
> 
> Peter, what about making virtual constructors soon? Should be relatively
> easy (I hope), just "combining" the properties of constructors and virtual
> methods? ;-)

It's even more trivial:  At the moment, I artificially forbid constructors
to be virtual.  To enable them, I would just have to take out the error
message.

The only problem with this:  As I wrote, I am initializing the object
*outside* the constructor's body.  If I enable virtual constructors, this
must not change any more.

> What can be assumed about data fields after FillChar'ing them with 0?
> I suppose, for all ordinary types, one can assume the value with "ord 0",
> right? For real numbers, probably nothing can be assumed!? What about sets,
> can [] be assumed? Pointers=NIL? ...

Okay for ordinal types, sets and pointers, not sure about Reals.
Strings initialized to zero this way are BROKEN because they get
a capacity of zero, which makes them useless.  :-(  (That's one reason
why I would like to have `ShortString's in GPC.)
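
To make the zero-fill assumptions concrete, here is a minimal sketch in C
(chosen only because it exposes the raw memory).  All record and field names
are hypothetical, and `cap_string' is merely a stand-in for a string type that
stores its own capacity; it is not claimed to be GPC's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string type that stores its own capacity. */
struct cap_string {
    unsigned capacity;   /* maximum number of characters the string may hold */
    unsigned length;     /* current number of characters */
    char     chars[16];
};

/* Hypothetical object record with the kinds of fields asked about. */
struct fields {
    int               ordinal;   /* ordinal type */
    unsigned char     set_bits;  /* small set modelled as a bit mask; 0 = [] */
    void             *pointer;   /* all-zero bits read as null on common platforms */
    struct cap_string text;
};

/* Counterpart of FillChar (f, SizeOf (f), 0). */
static void zero_fill(struct fields *f)
{
    memset(f, 0, sizeof *f);
}

/* Nonzero if `n' more characters fit.  With a capacity of zero this is
   never true for n > 0, which is what makes a zero-filled string useless. */
static int can_append(const struct cap_string *s, unsigned n)
{
    return s->length + n <= s->capacity;
}
```

After `zero_fill', the ordinal, the set and the pointer hold the expected
"empty" values, while `can_append' rejects even a single character.  (On
IEEE 754 hardware all-zero bits happen to read as 0.0, but that is a property
of the number format, not something the language promises.)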

> ... and if it doesn't add misfeatures to other people, or the misfeatures
> can be turned off completely, as with:
> 
> > `--store-object-names', and switched ON in Delphi compatibility mode.
> 
> ... and OFF otherwise!

Of course!  That's what I meant. :-)

> I'm not sure if this notion really helps. Doing it your way, the compiler
> has to distinguish between regular object types and interfaces. And I think
> the code gets clearer if interfaces are clearly recognizable as such.
> I don't see an advantage of your way: interfaces can't be instantiated,
> anyway (since they're abstract), and a (regular) object type that should be
> "inherited" from an interface can simply implement this interface and
> inherit from no type.

Internally, objects are ordinary records anyway, so this makes no difference
in hacking GPC - except that if I introduce an `interface' type I have to
remember that something *is* an interface for no other purpose than outputting
error messages.  In this sense, I would prefer not to introduce interfaces
as another data type but to allow MI for some special cases which I would
have to check anyway.  (See below for a qualification of this.)

However, the above was not meant as a suggestion of how to implement
interfaces in GPC.  It was just my attempt to understand what they are.

> > [...]
>
> Correct. :-)

(-: Great!  It seems that I have got it!  This increases the chance that
I will be able to implement all this in GPC. :-)

> (Of course, there can be conflicting method identifiers with interfaces that
> are implemented, but these should simply generate "duplicate identifier"
> errors.)

(Agreed.)

> As I said above, I'd vote for the second idea ("interface" is a keyword
> anyway). AFAICS, the Delphi syntax as shown in the example by David looks
> like we could adopt most of it -- as I said, I'm not sure about the IUnknown
> bit, but I think it can be optional (just declared as an empty interface for
> compatibility reasons).

Agreed.  If a language dialect already exists which (i) supports what we
want and (ii) does it in a clean way, we should adopt it rather than
inventing a new method.

> Also, I think Delphi's syntax
> 
>    IWhatever = interface(...) [ID];
> 
> looks like a convenient way to declare the ObjID, also for regular object
> types.
> 
> To sum up (again) what I now think about ObjIDs:
> 
> - ObjID (or whatever it will be called) is an object constant of every
>   object type and interface type.
> 
> - Its type is a 64/128 bit integer.

Really?  In David's example it's a string constant!

    const
        SIID_IActiveScriptSiteWindow = '{D10F6761-83E9-11cf-8F20-00805F2CD064}';
    
    type
      IActiveScriptSiteWindow = interface(IUnknown)
          [SIID_IActiveScriptSiteWindow]

> [...]
> 
> This removes all needs for class registration, and perhaps solves some
> problems with interfaces. I think I like that!

I agree, *and* with that we could claim more compatibility with Delphi, thus
making GPC more attractive to a lot of potential users.

> > (* Hmm ... the above rules for "careful use of MI" could be useful for *)
> > (* C++ programmers ... perhaps we should tell them?                   :*)
> 
> Hmm ... I think I know some more rules for "careful use of C[++]". Should
> we tell them? Would they listen to us? Would they laugh at us? ...

Who knows ...

> > > AFAICS, the only thing that really makes problems are variables (or
> > > parameters) of interface types.
> >
> > What's the problem?  An instance of an interface would be an empty object,
> > containing nothing besides the VMT pointer.
> 
> No! There aren't any instances of interfaces!

Only if we explicitly forbid them.  There is no technical reason why they
shouldn't be instantiated.  (And there is no practical reason why they should
be, so it's safe to forbid it.;)

> A variable of a pointer-to-an-interface type must point to the actual object
> (which can be of any type that implements that interface), and (somehow) give
> the information where in this type's VMT the methods of that interface are
> located.

Then the interface must appear as an additional VMT field either in each
instance of the object or in its VMT.  In the first case, the assignment

    PointerToInterface:= PointerToObject;

would add some number to the value of `PointerToObject', so
`PointerToInterface' will point to the VMT of the interface, not that of the
object.  In the second case, the same assignment would implicitly dereference
`PointerToObject', look up the VMT of the interface within the VMT of the
object and let `PointerToInterface' point to that ... ah - no!  This would
not work, because then

    PointerToInterface^.SomeMethod;

would have no chance to locate the implicit `Self' parameter.  Okay, so forget
about the second idea above; each object gets an additional VMT field for each
interface it inherits.  (Even then I am not sure that the above mechanism will
work in all cases ... :-/ )

> > Do you mean:  If an object implements an interface (in Java sense) it must
> > always be accessed through a pointer?  If so, why?
> 
> No, if type T implements interface I, there can be a variable V of type T,
> no problem. But V can't be of type I (since interfaces can't be
> instantiated). You can, however, declare a variable P of type ^I and assign
> @V to P (since V has all the properties that I demands).
> 
> This is no special rule, it follows from the fact that interfaces can't be
> instantiated. The same holds for abstract object types. Assuming TObject is
> abstract, there can't be a variable of type TObject, but there can be
> variables of type PObject, and there can be VAR parameters of type TObject.

Ah - now I understand.  :-)  I tend to believe now that the above mechanism
(assigning the address of an additional VMT field to `PointerToInterface')
can work ...

What about this:  Instead of an additional VMT field, the interface is
represented in each instance of the object as an integer field which holds
the offset of itself inside the object:

    Type
      MyObj = object ( MyInterface )  (* No "primary ancestor" *)
        foo, bar: Integer;
      end (* MyObj *);

is represented as

           [vmt field] [interface offset] [foo] [bar]
    byte#  0           4                  8     12
    value  @vmt_MyObj  4                  foo   bar
          /             \
         /               \
    `PointerToObject'     \
    is pointing here.    `PointerToInterface' is
                         pointing here and can
                         look up the address of
                         the whole object by
                         subtracting the integer
                         value pointed to.
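
The layout above can be modelled as a C struct.  This is only a sketch under
assumptions: the names `my_obj', `iface_offset', `to_interface' and
`to_object' are made up, and the byte offsets 0/4/8/12 in the picture hold for
4-byte pointers and integers, so the code asks the compiler for the offset
instead of hard-coding 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vmt;  /* contents do not matter for the layout */

/* Hypothetical C picture of MyObj. */
struct my_obj {
    const struct vmt *vmt;           /* [vmt field] */
    intptr_t          iface_offset;  /* [interface offset]: its own offset in the object */
    int               foo;
    int               bar;
};

/* Has to run once per instance, e.g. as part of object initialization. */
static void init_my_obj(struct my_obj *obj, const struct vmt *vmt)
{
    obj->vmt = vmt;
    obj->iface_offset = (intptr_t) offsetof(struct my_obj, iface_offset);
}

/* PointerToInterface := PointerToObject;  the pointer value changes. */
static intptr_t *to_interface(struct my_obj *obj)
{
    return &obj->iface_offset;
}

/* Recover the address of the whole object by subtracting the integer
   the interface pointer points to. */
static void *to_object(intptr_t *iface)
{
    return (char *) iface - *iface;
}
```

Converting to the interface pointer and back yields the original object
address, without the code on the interface side knowing the type `my_obj'.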

Now, how to call a virtual method of the interface?  The following steps are 
needed to perform the call "PointerToInterface^.MyMethod"  (each step
corresponds essentially to one instruction on a CISC processor like the
iX86 - or to one "tree node" passed from the GPC front-end to the GNU
compiler's back-end):

  - Dereference `PointerToInterface' and get one integer.

  - Subtract the integer from the pointer and get the address of the object.

  - Pass the new pointer as the `Self' parameter.

  - Dereference the new pointer and get the address of the VMT.

  - Find the address of the VMT of the interface in the VMT of the object at
    an offset which is derived from the integer we got in the first step.

  - Find the address of the virtual method at a fixed place in the VMT of the
    interface, and do the call.

In contrast, the following steps are needed to call an "ordinary" virtual
method "PointerToObject^.MyMethod":

  - Dereference `PointerToObject' and get the address of the VMT.

  - Pass `PointerToObject' as the `Self' parameter.

  - Find the address of the virtual method at a fixed place in the VMT, and do
    the call.

This implies:

  - Calling a virtual method inherited through an interface roughly takes
    twice the time of calling an "ordinary" virtual method, but it's still
    O(1) (no real search required).

  - Each instance of each object gets one additional integer field for each
    interface the object inherits from.

  - Each VMT gets additional pointer fields pointing to the VMTs of the
    interfaces the object inherits from.

  - Pointers to interfaces have the same format as all other pointers, but
    they don't point to the beginning of the object but to an integer field
    inside the object.  An explicit pointer conversion from "pointer to
    object" to "pointer to interface" does actually change the value of the
    pointer.
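
The six steps of the call "PointerToInterface^.MyMethod" can be sketched in C
as follows.  This rests on one assumption that the description leaves open:
how the slot in the object's VMT is "derived from the integer".  Here the
interface offset fields are assumed to follow the vmt field directly, so the
slot number is (offset - size of vmt field) / size of offset field.  All
identifiers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*method_fn)(void *self);

/* VMT of an interface: its virtual methods at fixed places. */
struct iface_vmt {
    method_fn my_method;
};

/* VMT of an object type: pointers to the VMTs of its interfaces. */
struct obj_vmt {
    const struct iface_vmt *ifaces[1];
};

struct my_obj {
    const struct obj_vmt *vmt;
    intptr_t              iface_offset;
    int                   foo, bar;
};

static int my_obj_my_method(void *self)
{
    struct my_obj *obj = self;
    return obj->foo + obj->bar;
}

static const struct iface_vmt my_interface_in_my_obj = { my_obj_my_method };
static const struct obj_vmt   vmt_my_obj = { { &my_interface_in_my_obj } };

static void init_my_obj(struct my_obj *obj, int foo, int bar)
{
    obj->vmt = &vmt_my_obj;
    obj->iface_offset = (intptr_t) offsetof(struct my_obj, iface_offset);
    obj->foo = foo;
    obj->bar = bar;
}

/* PointerToInterface^.MyMethod, step by step. */
static int call_my_method(intptr_t *iface)
{
    intptr_t offset = *iface;                        /* 1: get one integer */
    void *self = (char *) iface - offset;            /* 2, 3: object address = Self */
    const struct obj_vmt *vmt =
        *(const struct obj_vmt *const *) self;       /* 4: address of the VMT */
    size_t slot = ((size_t) offset - sizeof vmt)
                  / sizeof (intptr_t);               /* 5: slot derived from offset */
    const struct iface_vmt *ivmt = vmt->ifaces[slot];
    return ivmt->my_method(self);                    /* 6: fixed place, do the call */
}
```

Every step is a load, an addition or a subtraction, so the call stays O(1);
the extra cost compared with an ordinary virtual call is the additional
loads in steps 1 and 5.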

> > > - A "pointer" to an interface variable consists of two parts: the actual
> > >   pointer to the variable, and the VMT offset of the first method (or,
> > >   alternatively, directly the address of the first method in the VMT).
> > >
> > >   Disadvantage: The pointer gets twice as big. The difference must be
> > >   considered when assigning it to another pointer (this could be an untyped
> > >   pointer or a pointer of one of the "parent" interfaces - in the latter
> > >   case the VMT offset has to be adjusted).
> >
> > I'm afraid we can forget about this for that reason.
> 
> Why? Is there a rule carved in stone that a pointer must consist only of a
> memory address?

Yes.  Here's the stone (info -f standards -n Portability):

    You can assume that all pointers have the same format, regardless of
    the type they point to, and that this is really an integer.  There are
    some weird machines where this isn't true, but they aren't important;
    don't waste time catering to them.  Besides, eventually we will put
    function prototypes into all GNU programs, and that will probably make
    your program work even on weird machines.

This means that we mustn't introduce another format for pointers - and that we
can safely assume that you can cast a pointer to an integer (see my mail
about 8-byte pointers on the DEC Alpha ... @#*!).

(* BTW, they also say:

    As for systems that are not like Unix, such as MSDOS, Windows, the
    Macintosh, VMS, and MVS, supporting them is usually so much work that it
    is better if you don't.

which I strongly recommend ignoring!  I do not like a certain well-known
"operating system" of a well-known company at all, but just ignoring it is a
nice way to commit social suicide among computer users. *)

> Actually, I'm going to take this a bit further (I don't think Java has this,
> but why not):
> 
> Let T be any object type and In be some interfaces.
> 
> The following variable declarations could all be legal:
> 
> T
> ^T
> ^I1
> ^I1 T
> ^I1 I2
> ^I1 I2 T
> ...

You mean: legal types for a variable?  A list of types separated by spaces?

> In general: P can be a variable of type pointer of (n interfaces I1 .. In and
> optionally one object type T).
> 
> Legal assignments to P are objects of any type that implements all I1 .. In
> (and is T or a descendant of T, if T is given).
> 
> The internal representation of P consists of the actual address of the object
> and n addresses that point to the first method of each of the n interfaces
> inside the VMT of the actual type of the object.
> 
> I hope this was understandable so far -- if not, I can try to explain again.

I am not sure that I have understood it.  Further explanation cannot hurt.

How would the virtual method calls

    PointerToInterface:= @MyObject;
    PointerToInterface^.MyMethod;

and

    PointerToObject:= @MyObject;
    PointerToObject^.MyMethod;
   
work with this representation?

I am sceptical about this "multiple pointer" representation because there are
*many* places in the GPC front-end relying on the fact that all pointers have
the same format.  I don't even know all of them.
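
For comparison, here is one possible reading of the "multiple pointer"
representation for the simplest case, a pointer to a single interface.  It is
a guess at what is meant, not a confirmed design: the struct `iface_ptr', the
flat VMT layout and the constant `OWN_METHODS' are all assumptions made for
this sketch.

```c
#include <assert.h>

typedef int (*method_fn)(void *self);

/* Hypothetical "pointer to interface" consisting of two parts. */
struct iface_ptr {
    void            *object;   /* actual address of the object */
    const method_fn *methods;  /* first method of the interface inside the VMT */
};

struct my_obj {
    const method_fn *vmt;
    int              foo, bar;
};

static int my_obj_own_method(void *self)
{
    return ((struct my_obj *) self)->foo;
}

static int my_obj_my_method(void *self)
{
    struct my_obj *obj = self;
    return obj->foo + obj->bar;
}

/* One flat VMT: the type's own virtual methods first, then those of the
   interface it implements. */
enum { OWN_METHODS = 1 };
static const method_fn vmt_my_obj[] = { my_obj_own_method, my_obj_my_method };

/* PointerToInterface := @MyObject; */
static struct iface_ptr to_interface(struct my_obj *obj)
{
    struct iface_ptr p = { obj, obj->vmt + OWN_METHODS };
    return p;
}

/* PointerToInterface^.MyMethod; */
static int call_my_method(struct iface_ptr p)
{
    return p.methods[0](p.object);
}
```

The call itself is short, but `iface_ptr' is twice the size of an ordinary
pointer and cannot be cast to an integer, which is exactly what collides with
the assumption of one uniform pointer format.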

> > That's the problem why I initially asked about MI.
> 
> The problem would not be any easier with MI.
> [Proof: You showed above that interfaces are just a special case of MI. ;-]

I meant:  It is easier with interfaces than with MI because they are just
a special case.

> > Does anybody know how C++ solves that problem?  Or Java?  Or Delphi?
> 
> No, but from Delphi's interface IDs I gather it uses something like the
> second way. (And since it runs under Windoze, efficiency doesn't matter,
> anyway... ;-)

One day, I will compile some Delphi stuff and look at the generated code in
order to figure this out.  (But I wouldn't mind if somebody were faster than
me and told me the result ... ;-)

> > But this is quite an interesting problem - not a technical one, how to
> > implement this-or-that without interfering with that-or-this syntax from
> > another dialect.  Here we have a problem where it is not even clear that
> > a fast (i.e. O(1)) solution exists.  :-(-:
> 
> There is an O(1) solution -- the first one!

Now we have two of them:  The solution with offsets stored in the object is
O(1) as well.  (-: Implement both? ;-)

> It increases the memory needed for (pointer-to-)interface type variables/
> parameters, but this might just be the price we have to pay. It's O(1) in
> size and speed, and it takes more space than now only when one actually uses
> interfaces. Doesn't seem too bad to me!

What I am really afraid of is how much of GPC would break when we change the
size of some pointers ... :-/

> So with the first solution, AFAICS, the ObjIDs for interfaces are not needed
> (contrary to what I said above) -- they can be accepted in Delphi
> compatibility mode, but I see no need for them...

Agreed.  (With both solutions discussed above.)

> > > If you used the address everywhere you use the ID now, you would know,
> > > wouldn't you?
> > >
> > > [...]
> > >
> > > Yes, but what do you need to do with IDs?
> >
> > The unique ID can be stored in a stream; the address cannot.
> 
> A valid point! But I think the IDs should be generated within the storing
> routines, and resolved (to pointers) within the loading routines. This can
> be done quite efficiently, O(n log n), perhaps O(n^2) worst case. No need
> to keep the IDs during the (regular) operation, wasting memory and time.
> 
> There may be some types that need a persistent ID (i.e. one that cannot
> simply be regenerated with each storing, for whatever reason), but then again,
> ID should be a field of these special types only.

I have *lots* of such types, but I agree that only those should have that ID.
There's no need to equip *all* objects with an ID.

> > Use:  Think of a tree of objects holding numerical data.  A method of
> > an object somewhere in that tree wants to calculate something.  For this
> > purpose it needs some data stored elsewhere in the tree.  Then the
> > unique ID can be used to locate that other data object.
> 
> For this purpose, I'd use a pointer to that other object.

The whole purpose of the ID is to *get* that pointer.

> When storing the whole thing to a stream, the pointers can be converted to
> (numerical) IDs that are unique to this data structure in this stream at this
> time. While loading the stream, the IDs can be converted back to pointers.
> (This takes some programming effort, but it's a one-time job! I think I could
> program these conversions if necessary.)

My problem is that I need a numerical value from some object somewhere in the
application where I don't even know whether it already exists.  Having IDs, I
can do a search for the other object.  If it exists, I get a pointer to it;
otherwise I get `Nil' and can do some action to make the other object appear
...

All this is done in a generalized `HandleEvent' method which will be in my
`BaseObj'.  Methods don't use up memory per instance, only per VMT which can
be neglected, so this does not harm performance.
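
The "search by ID, get a pointer or `Nil'" idea might look like this in a C
model.  The tree shape, the field names and `find_by_id' are assumptions made
for the sketch; the real `HandleEvent' mechanism is not shown in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node in a tree of data objects.  Only object types that
   need a persistent ID would carry the `id' field. */
struct data_obj {
    unsigned long    id;
    double           value;
    struct data_obj *first_child;
    struct data_obj *next_sibling;
};

/* Depth-first search through `root', its siblings and all descendants.
   Returns NULL (Pascal: Nil) if no object with this ID exists yet. */
static struct data_obj *find_by_id(struct data_obj *root, unsigned long id)
{
    for (struct data_obj *n = root; n != NULL; n = n->next_sibling) {
        if (n->id == id)
            return n;
        struct data_obj *hit = find_by_id(n->first_child, id);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

A caller that gets NULL back knows that the other object does not exist yet
and can take whatever action makes it appear before asking again.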

> > > Where do you get the SelfIDs from? Perhaps a list of IDs stored in a parent
> > > object? You could put the addresses there instead, couldn't you?
> >
> > The IDs must be arranged in a way that you can read off them what kind of
> > object we have.
> 
> ??? Now I think you lost me!
> 
> I assume, by "kind of object" you mean its type, right?

Yes.

> But the type information is already there (through the VMT link), isn't it?
> Any procedure can check the VMT link (together with the "IS" operator) to
> examine the type of any object it has a pointer to -- and usually, type
> distinctions should not be made by the caller at all, but by the called
> object (through virtual methods).

This example is one way to implement the `is' operator.  Another one
- which seems the most practical to me and will probably be the way to go -
is to store inheritance information in each VMT.
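
A minimal sketch of that second approach, assuming the inheritance
information is simply a pointer to the parent type's VMT (the field name
`parent' and the function `is_a' are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

struct vmt {
    const struct vmt *parent;  /* NULL for a type without ancestor */
    /* ... virtual method pointers would follow here ... */
};

struct object {
    const struct vmt *vmt;
};

/* "obj is T": walk the ancestor chain, starting at the object's own VMT. */
static int is_a(const struct object *obj, const struct vmt *type)
{
    for (const struct vmt *v = obj->vmt; v != NULL; v = v->parent)
        if (v == type)
            return 1;
    return 0;
}

/* Three hypothetical types: two siblings derived from one base. */
static const struct vmt vmt_base  = { NULL };
static const struct vmt vmt_view  = { &vmt_base };
static const struct vmt vmt_other = { &vmt_base };
```

No ID is needed for the test; its cost is proportional to the depth of the
inheritance chain, not to the number of types in the program.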

> > And calls to `Foo' in instances of `MyObj' would yield a run-time error?
> 
> No -- there can't be any instances of MyObj, the compiler should check this.
> That's the main goal of the whole thing: to make these checks at
> compile-time, not at run-time.

This is how Delphi behaves?

> The OOP way to do this is: if something doesn't suit you, derive a new class,
> and apply all modifications you want to the new class.

Hmm ... I essentially re-wrote Turbo Vision to run in graphics mode,
while introducing many extensions.  (* I am calling the result "BO4" -
Benutzeroberflaeche, 4. Versuch (German), which means "user interface,
fourth attempt". *)  I paid 600DM (~$400) to get the source because I couldn't stand to
derive a new class from just *everything*, re-implementing the same extensions
in each new class everywhere which would have been placed most naturally in
`tView'.  (* Some weeks later, Borland reduced the price of the source to
50DM. *):

> [IDs ...]
> 
> > > And what kind of things would you do with an unique integer ID?
> >
> > Inter-process communication and message passing for one.
> 
> No problem with pointers! (Assuming the objects reside in some kind of shared
> memory, but otherwise an integer ID would be quite useless as well.)

I think the point is to get the address of an object when you are not even
sure that it is already in memory.  All this can be done with a common *method* of
all objects (which doesn't waste space, see above).  A unique ID in each
object can be useful in the implementation of that method, but it does waste
space, so I would vote against having it in the "mother of all objects".  But
since we won't restrict GPC to have exactly one class library, that's a matter
of taste, IMHO.  (Just my 2Pf.)

> [... snip ...]

> If things are done as I suggested, you could do things like:
> 
> type t=object
>          const c:integer=2; {stored in VMT of x}
>          var cv:integer=3;  {class variable; stored in VMT of x; syntax???}
>          v:integer;         {stored in data area of o}
>        end;
> 
> var o:t;
> ...
> o.v:=o.cv+o.v;
> ...

This syntax seems reasonable to me.  However:  Other suggestions?
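
What "stored in VMT" versus "stored in data area" would mean at run time can
be pictured with a small C model.  This is only an illustration of the storage
split, with made-up names, and it says nothing about how GPC would actually
lay out a VMT.

```c
#include <assert.h>

/* Per-type data: one copy, shared by all instances of t. */
struct t_vmt {
    const int c;   /* const c: integer = 2 */
    int       cv;  /* class variable cv, initially 3 */
};

/* Per-instance data. */
struct t {
    struct t_vmt *vmt;
    int           v;   /* ordinary field, one per instance */
};

static struct t_vmt vmt_t = { 2, 3 };

static void init_t(struct t *o, int v)
{
    o->vmt = &vmt_t;
    o->v = v;
}

/* o.v := o.cv + o.v */
static void add_class_var(struct t *o)
{
    o->v = o->vmt->cv + o->v;
}
```

Changing `cv' through one instance is visible through every other instance,
while each instance keeps its own `v'.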

> > [...]

> > One of the things I hate about some frameworks that will remain
> > nameless is the multitude of ancestors. It makes it really tedious
> > when you are trying to find out what is really going on in an object
> > (sometimes having to plough through myriads of objects in myriads
> > of units). Sometimes inheritance can be taken too far.
> 
> You seem to be doing the other extreme. The former might be tedious and
> confusing to use, especially at first sight, and it requires good
> documentation, but the latter can lead to real problems if code from
> different sources doesn't fit together.

I agree that some way "in between" is the way to go.  However, there are many
points in Borland's object hierarchies which seem to me as if they were
designed with the goal of avoiding the exchange of source code.  Without
considering myself a "free software extremist", I state that you can forget about many
"hooks" in an object hierarchy (e.g. in Turbo Vision which I know best) if you
are not worrying about someone else reading your source code (which is still
the best documentation you can have for any library anyway).  (After having 
studied TV's source code extensively I understand why Borland first didn't 
want everybody to read it ... ;-)

Phew!  That was a long e-mail ...

Later,

    Peter

  Dipl.-Phys. Peter Gerwinski, Essen, Germany, free physicist and programmer
peter.gerwinski@uni-essen.de - http://home.pages.de/~peter.gerwinski/ [970201]
 maintainer GNU Pascal [970510] - http://home.pages.de/~gnu-pascal/ [970125]


Peter Gerwinski (peter@agnes.dida.physik.uni-essen.de)
