ClusterMonkey: More parallel secrets; speedup, efficiency and your code
In the previous column, we found that parallel programs for clusters have very subtle differences and their efficiency requires careful examination of the code. In this article, we will see what a typical parallel program looks like and how it is executed on a cluster. Be warned, however, there is a bit of gentle mathematics in this column. It will not hurt, we promise.
For simplicity, let us consider that each cluster node contains one processor and each node runs one program. Parallel computation occurs because the programs are communicating with other nodes. Using nodes with two and more processors on each node will work as well as long as the number of programs matches the number of processors. If we ignore memory contention and possible local communication optimizations, then we assume each programs occupies its own memory and we can think about it as running on a separate node with local memory. Therefore, a node with several processors can be considered to contain several virtual uniprocessor nodes. The possibility of programming multiprocessor nodes using threads will be discussed in future columns.