Shevek (shevek) wrote,

So, what's the theory? I'll give a brief introduction and if you'll forgive the use of sorting as the classic example, it might make sense.

A map-reduce architecture is a distributed divide-and-conquer approach to data processing. The map is the transformation, and the reduce is the combination of the transformed data. The classic merge sort is an example: divide, sub-process parts, and merge together. In the real world, the transformation might be to divide based on the presence of a feature, e.g. a search term, in the input, and build separate indexes in the map stage. Reduce may do no more than detect termination.

In order to guarantee termination, algorithms of this class are always described as a tree of computation such that an inductive proof of termination is possible. For example, sort(N) = sort(N/2) + sort(N/2) + merge(N), and sort(1) and merge(N) are defined. This structure makes scheduling computation and detecting termination quite easy.

So what happens when computation does not form a tree or DAG? What happens if the structure of the computation contains loops for which an inductive proof is not possible? What happens if insufficient information about the structural decomposition of the algorithm is available to the compiler? A map (e.g. in a cycle) must terminate when it is no longer reachable, but this reachability can no longer be proved by passing an end-of-stream signal, or returning from a function. Without a more advanced static analysis of the program structure to know when a map sub-process must be terminated, we cannot schedule a reduce.

There is an algorithm which can still detect termination if it occurs, furthermore schedule the reduce operations at the earliest provably correct point without waiting for a global condition, and so forth. Intuitively, we felt that the answer existed, and we used a fairly heavyweight logic solver to analyse specifications of the problem until we found a correct formulation.

While we could generate an optimal scheduling of any program using the logic solver, I then spent a little while converting the logical specification into an imperative algorithm, which we now use to schedule programs.

I suspect my next exercise will be in applied calculus, something I have considered little since my first year. I suspect further that this post does not make sense, so I will restrict the readership so as to reduce my own embarassment.

The Exam.

So, did you see the game last night?
  • Post a new comment


    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.