Shevek (shevek) wrote,

More on writing fast compilers and interpreters

Somebody reminded me that people do read LJ sometimes, so behind this cut you will find high speed procedural double-dispatch in Java, also known as how to switch on two class pointers.

(Ignoring the obvious method using casts, exceptions or instanceof in conditionals, it's FAR too slow.)

In an ideal double-dispatch system, we would have control over all the classes in the system, and we could write a dispatch method in our superclass, e.g.

public abstract MyBaseType add(MyBaseType right);
public abstract MyBaseType addToMyIntType(MyIntType left);
public abstract MyBaseType addToMyDoubleType(MyDoubleType left);

and override each method in MyType, for each type we have in the system.

public MyBaseType add(MyBaseType right) { right.addToIntType(this); }
public MyBaseType addToMyIntType(MyIntType left) { return ... left.intValue() + this.intValue(); }

and then call left.add(right). Look! No casts! No instanceof! No conditionals! Thus, we use the combination of the two vtables similarly to a visitor pattern to achieve double dispatch. However, this assumes that we have control over every class in the system, and that rapidly gets restrictive, ugly and inconvenient. So we're going to implement a procedural multiple-dispatch using existing unmodified classes, and we are going to do it fast.

Well, first we need a perfect hash on classes. Class.hashCode() isn't sufficiently constant to use in a switch - actually, it's the identity hash code, and the system is (thankfully) allowed to GC classes. So, we need another hash function. The only relatively constant thing about classes is the name. I could do a perfect string hash on the name, but actually, the length of the name is sufficient:

    public static ObjectType getObjectType(Object o) {
        if (o == null)
            return OT_NULL;
        Class   c = o.getClass();
        switch (c.getName().length()) {
            case 14:
                if (c == Long.class)
                    return OT_LONG;
                if (c == Byte.class)
                    return OT_BYTE;
            case 15:
                if (c == Float.class)
                    return OT_FLOAT;
                if (c == Short.class)
                    return OT_SHORT;
        patch up using slow conditionals;

99% of the time, we will hit a fast case. (Those who are particular about these things will check that the implementation of Class.getName() is as fast as expected. It is.) But what is this OT_*? It's an enum with behaviour:

    protected static enum ObjectType {
        OT_NULL   (                                   OP_STRING),
        OT_LONG   (       OP_LONG         |OP_DOUBLE |OP_STRING),

The OP_* constants form a bitfield, IN PRIORITY ORDER, that is to say, for standard java semantics and the add operation, we have the following declarations:

    private static final int    OP_STRING   = 0x001;
    private static final int    OP_DOUBLE   = 0x010;
    private static final int    OP_FLOAT    = 0x020;
    private static final int    OP_LONG     = 0x040;
    private static final int    OP_INT      = 0x080;

Now, the operation we perform is the highest common 1-bit between the masks in the ObjectTypes of the two operands.

For example, given an Integer and a Float. The (nearly-)perfect hash gives us OT_LONG and OT_FLOAT, which have opmasks (OP_LONG|OP_DOUBLE |OP_STRING) and (OP_FLOAT|OP_DOUBLE |OP_STRING). The combination of these is (OP_DOUBLE|OP_STRING) and the highest 1-bit is OP_DOUBLE. (See Integer.highestOneBit(int) - no jumps, so inlines with no pipeline breaks)

Now have an OP_* which we can switch on. I suppose if we were really clever, we'd compute the number of trailing zeros after the highest one bit, which would compress the lookupswitch into a tableswitch, but we aren't feeling that clever today.

In overview, Class => PerfectHash => ObjectType => OpTypeMask => highestOneBit => LookupSwitch implements double dispatch in about twenty operations, many of which hotspot to sub-cycle hardware opcodes.


Performance tests show that custom RTTI is much faster than single dispatch, never mind double dispatch, so if you do have control over all your classes, that's what to do: switch (unknown.getTypeCode()) {...

Other bootnote:
I'll elide the enum and just make the first switch return the integer before this goes into production, because I'm fairly sure the hotspot doesn't do that. Also, lowestOneBit is much faster than highestOneBit, so I'll use that and reverse the order of the OP_* constants.
Tags: java
  • Post a new comment


    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.