I received this interesting comment on one of my previous blogs:
So, Nikita, you propose two equations:
(1) Grid Computing = Compute Grid + Data Grid
(2) Scale out on grid = Data Partition + Affinity MapReduce
Both are interesting and important statements that should be carefully analyzed. The first equation basically comes down to data-aware scheduling which (as you eloquently explain in your last week’s blog) means that the algorithm that assigns jobs to specific compute resources must take into account the distribution of data over the grid and the affinity between the job and the data partitions. Reasonable data-aware schedulers are still rare and I am very glad to see GridGain coming up with a commercial implementation. We recently built a demo that measures the performance of a “typical” job with and without data affinity. The performance is affected by a factor of 2x to 3x, simply based on data-aware routing being switched on and off. Clearly, this is a very important direction for grid middleware; I am convinced that data-aware, affinity-capable grid middleware will someday become mainstream.
Let us not forget that all this concerns job-centric processing. For throughput computing that operates under a shower of real-time transactions, the equivalent concept to “data-aware scheduling” of jobs would be “data-aware routing” of these transactions. Mainstream Data Grid middleware, like GigaSpaces and Oracle Tangosol, have long been able to handle this scenario. Nowadays, GigaSpaces is moving towards support of “data-aware scheduling” through the concept of Processing Unit on the Service Grid.
Now, your second equation raises a question. Are you talking about scaling out the data grid in a static or dynamic sort of way? In other words, is the objective to allow for an “a-priori” arbitrarily large number of partitions with scalable MapReduce, or to be able to adjust the number of partitions dynamically in response to sporadic jumps in the payloads across the entire grid fabric? If it’s the former, then data partitioning with affinity is the traditional answer. If it’s the latter, well, then we need to solve a lot of hard problems for stateful, data-aware services that are outside of your equation. I see dynamic scaling of stateful services as being an increasingly important area of research and commercial implementations. Sun’s project Hedeby is an interesting step in this direction.
It comes from Victoria Livshitz, who led grid-related efforts at Sun and now runs GridDynamics. I want to concentrate on the dynamic vs. static nature of GridGain’s MapReduce implementation.
In GridGain, every aspect of executing a task on the grid is dynamic by design. “Dynamic” in the following list means that the decision is made at runtime and can be programmed by the developer of the task or SPI with whatever logic is required:
- Dynamic topology management
- Dynamic deployment
- Dynamic failover
- Dynamic early load balancing (“map” in MapReduce)
- Dynamic late load balancing (a.k.a. collision resolution)
- Dynamic reduction (“reduce” in MapReduce)
As you can see, every step of task execution is dynamic and the developer has full control of every detail. At the same time (and that’s the beauty of GridGain) you are not required to develop all that logic if you don’t need it: GridGain works by default out-of-the-box. This allows you to start executing your tasks with all default options and then gradually add or experiment with different strategies, optimizing one aspect of grid task execution or another.
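To make the idea of "runtime decisions with sensible defaults" a bit more concrete, here is a minimal Python sketch. It is not GridGain's API: the names `execute`, `default_map`, `default_failover`, `default_reduce`, and `make_job` are all hypothetical, nodes are plain strings, and a node failure is simulated with a `ConnectionError`. It only illustrates how mapping, failover, and reduction can each be a pluggable step that falls back to a default when the developer supplies nothing.

```python
from typing import Callable, Dict, List, Sequence

Node = str                      # assumption: a node is identified by a plain string
Job = Callable[[Node], int]     # assumption: a job runs on a node and returns a partial result


def default_map(jobs: Sequence[Job], topology: Sequence[Node]) -> Dict[int, Node]:
    """Default early load balancing: round-robin over the current topology."""
    return {i: topology[i % len(topology)] for i in range(len(jobs))}


def default_failover(failed: Node, topology: Sequence[Node]) -> Node:
    """Default failover: retry on the first node other than the failed one."""
    candidates = [n for n in topology if n != failed]
    if not candidates:
        raise RuntimeError("no node left to fail over to")
    return candidates[0]


def default_reduce(results: List[int]) -> int:
    """Default reduction: sum the partial results."""
    return sum(results)


def execute(jobs, topology, map_fn=default_map,
            failover_fn=default_failover, reduce_fn=default_reduce):
    """Run every job; each strategy is chosen at call time, with defaults."""
    assignment = map_fn(jobs, topology)          # "map": decided at runtime
    results = []
    for i, job in enumerate(jobs):
        node = assignment[i]
        try:
            results.append(job(node))
        except ConnectionError:                  # simulated node failure
            results.append(job(failover_fn(node, topology)))
    return reduce_fn(results)                    # "reduce": decided at runtime


def make_job(word: str, down=frozenset({"n2"})) -> Job:
    """Hypothetical job: returns len(word), but fails on nodes listed in `down`."""
    def job(node: Node) -> int:
        if node in down:
            raise ConnectionError(node)
        return len(word)
    return job


jobs = [make_job(w) for w in "enjoy dynamic mapreduce".split()]
nodes = ["n1", "n2", "n3"]

total = execute(jobs, nodes)                     # all defaults
longest = execute(jobs, nodes, reduce_fn=max)    # swap only the reduce step
```

The first call uses every default, including a failover away from the simulated broken node `n2`; the second overrides only the reduction, which mirrors the "start with defaults, then tune one aspect at a time" workflow described above.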
Enjoy dynamic MapReduce with GridGain!