Funnily enough, bytecoding the thing is easy. Interpreting it is fairly easy. Defining accurate semantics for the operators is a total bastard. Semantics which worked perfectly well when I thought evaluation was sequential simply don't work in the parallel case, because the concept of "termination" of an operator has gone away: you don't know when you've finished evaluating an operator. But I need that concept in order to print the result. So I'm redefining the time-extent of an operator in terms of active resources and doing the resource management manually (a bloody pain), so that when an instance has no active resources left, it considers itself terminated.
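To make the "no active resources means terminated" idea concrete, here is a minimal sketch, assuming Python threads stand in for the parallel evaluator. Every name in it (`OperatorInstance`, `acquire`, `release`, `on_terminated`, `run_demo`) is hypothetical and not taken from the actual interpreter. The spawner holds one guard resource while it hands out work, so the count cannot hit zero before all the work has been started.

```python
import threading


class OperatorInstance:
    """Hypothetical operator whose lifetime is its set of active resources."""

    def __init__(self, name, on_terminated):
        self.name = name
        self.terminated = False
        self._on_terminated = on_terminated  # called once, when the count hits zero
        self._active = 0
        self._lock = threading.Lock()

    def acquire(self):
        """Register one more active resource (a worker, a pending input, ...)."""
        with self._lock:
            if self.terminated:
                raise RuntimeError(f"{self.name}: already terminated")
            self._active += 1

    def release(self):
        """Drop one resource; the last release terminates the operator."""
        with self._lock:
            if self._active <= 0:
                raise RuntimeError(f"{self.name}: release without acquire")
            self._active -= 1
            last = self._active == 0
            if last:
                self.terminated = True
        if last:
            # Run the callback outside the lock so it can touch the operator.
            self._on_terminated(self)


def run_demo():
    results = []
    results_lock = threading.Lock()
    finished = threading.Event()

    def on_terminated(op):
        # Only now is it safe to "print the result": nothing is still running.
        results.sort()
        finished.set()

    op = OperatorInstance("squares", on_terminated)
    op.acquire()  # guard held by the spawner while it starts the workers

    def worker(n):
        with results_lock:
            results.append(n * n)
        op.release()  # this worker no longer counts as active

    for n in range(4):
        op.acquire()  # count the worker before it starts
        threading.Thread(target=worker, args=(n,)).start()

    op.release()  # drop the guard; the last worker to finish terminates op
    finished.wait(timeout=5)
    return results
```

Calling `run_demo()` returns the sorted squares once the operator has declared itself terminated. Note that the worker here releases on its normal path only; what happens when it raises is exactly the exception-safety problem.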
This means reference counting, which is hard work and hard to make exception-safe: when you're parallel, you don't necessarily "return" down the same stack you went up, so finally-constructs in the code that took the reference don't work. There has to be a better way.
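One possible way to tame this, sketched here under heavy assumptions, is to make each reference an explicit token that travels with the work and can be released exactly once from any thread. The `finally` then lives in whatever runs a step to completion, not in the code that took the reference. `RefCount`, `Lease`, `run_step` and `demo` are all hypothetical names for illustration, and this may well not fit the real interpreter's control flow.

```python
import threading


class RefCount:
    """Hypothetical shared counter; calls on_zero when the last lease goes."""

    def __init__(self, on_zero):
        self._count = 0
        self._lock = threading.Lock()
        self._on_zero = on_zero

    def lease(self):
        """Take one reference and return the token that owns it."""
        with self._lock:
            self._count += 1
        return Lease(self)

    def _drop(self):
        with self._lock:
            self._count -= 1
            zero = self._count == 0
        if zero:
            self._on_zero()


class Lease:
    """One reference. release() is idempotent, so any path may call it."""

    def __init__(self, owner):
        self._owner = owner
        self._released = False
        self._lock = threading.Lock()

    def release(self):
        with self._lock:
            if self._released:
                return  # already given back; a second release is a no-op
            self._released = True
        self._owner._drop()


def run_step(lease, step, errors):
    """Run one step on whatever thread it lands on, then give the lease back."""
    try:
        step()
    except Exception as exc:
        errors.append(exc)  # record the failure instead of losing it
    finally:
        lease.release()  # runs on success and on failure alike


def demo():
    done = threading.Event()
    errors = []
    refs = RefCount(on_zero=done.set)
    guard = refs.lease()  # keeps the count above zero while spawning

    def ok():
        pass

    def boom():
        raise ValueError("step failed")

    threads = []
    for step in (ok, boom, ok):
        t = threading.Thread(target=run_step, args=(refs.lease(), step, errors))
        t.start()
        threads.append(t)

    guard.release()
    guard.release()  # harmless: the lease remembers it was already released
    finished = done.wait(timeout=5)
    for t in threads:
        t.join()
    return finished, len(errors)
```

In this sketch the failing step still gives its reference back, so the count reaches zero and `done` is set even though one step raised. The cost is that every piece of work has to be started through something like `run_step`, which is still manual bookkeeping, just concentrated in one place.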