In the field of high performance computing, "peak performance" has been defined as the speed that the manufacturer guarantees you cannot exceed. Although peak performance figures make for good marketing, they provide little insight into actual performance.
To help rectify this, for the past eight years Lawrence Berkeley National Laboratory has been developing new tools and techniques for more accurately assessing the performance of high performance computers, especially when it comes to running real-world scientific applications.
In November, many of these projects in performance characterization, modeling and benchmarking for supercomputers were brought together to form the Berkeley Institute for Performance Studies in Berkeley Lab's Computational Research Division (CRD). Known as BIPS, this umbrella organization will be led by UC Berkeley Professor Kathy Yelick and encompasses several research activities at LBNL and UC Berkeley:
The Performance Evaluation Research Center (PERC), directed by CRD Chief Technologist David Bailey, is one of seven SciDAC Integrated Software Infrastructure Centers (ISICs). PERC involves approximately 25 researchers at eight centers (four labs and four universities). The goal of PERC is to develop a science for understanding performance of scientific applications on high-end computer systems, and develop engineering strategies for improving performance on these systems. The project is integrating several active efforts in the high performance computing community and is forging alliances with application scientists working on DOE Office of Science missions to ensure that the resulting techniques and tools are truly useful to end users. For detailed information about PERC, go to http://perc.nersc.gov/main.htm .
The Berkeley Benchmarking and Optimization Group (BeBOP) is led by Kathy Yelick and James Demmel of UC Berkeley, with substantial participation by Berkeley graduate and undergraduate students. Their research areas include:
* the interaction between application software, compilers, and hardware
* managing trade-offs among the various measures of performance, such as speed, accuracy, power, and storage
* automating the performance tuning process, starting with the computational kernels which dominate application performance in scientific computing and information retrieval
* performance modeling and evaluation of future computer architectures.
The BeBOP Web site can be found at http://bebop.cs.berkeley.edu/ .
BeBOP works closely with the UCB LAPACK/ScaLAPACK project, which focuses on new algorithms for numerical linear algebra and new, more efficient implementations of linear algebra software.
Berkeley Lab's architecture evaluation research project, led by Leonid Oliker and Yelick, is conducted by staff from LBNL's CRD and the NERSC Center Division, as well as collaborators from other institutions. They evaluate emerging architectures, such as processor-in-memory and stream processing, and develop adaptable "probes" to isolate performance-limiting features of architectures. They conducted the first in-depth analysis of state-of-the-art parallel vector architectures, running benchmark studies on the Japanese Earth Simulator System (ESS) and comparison runs on Cray's X1 system. Results on the ESS demonstrated 23 times faster performance than the IBM Power3 in a node-to-node comparison. (See the September issue of CRD Report at http://crd.lbl.gov/html/news/CRDreport0904.pdf for more information on this work.)
NERSC's benchmarking and performance optimization project is carried out by NERSC staff with expertise in performance analysis. They developed the Effective System Performance (ESP) benchmark to measure system-level efficiency and the Sustained System Performance (SSP) benchmark to measure overall system application throughput. SSP resulted in a 30 percent increase in the Seaborg system's capability and is now used in several non-DOE procurements. This team also accelerated several SciDAC application programs running on Seaborg. Read more about ESP at http://www.nersc.gov/projects/esp.php .
Yelick, a professor of computer science at UC Berkeley with a joint appointment in LBNL's Computational Research Division, has been named to lead the newly established BIPS. She will also be leading CRD's Future Technologies Group (FTG). Yelick's appointment, which includes a leave of absence from her teaching position, officially takes effect Jan. 1, 2005.
The main goal of Yelick's research is to develop techniques for obtaining high performance on a wide range of computational platforms and to ease the programming effort required to obtain improved performance. She is perhaps best known for her efforts in global address space languages, which attempt to present the programmer with a shared memory model for parallel programming. These efforts have led to the design of Unified Parallel C (UPC), which merged some of the ideas from three shared address space dialects of C: Split-C, AC and PCP. In recent years, UPC has gained recognition as an alternative to message passing programming for large-scale machines. Compaq, Sun, Cray, HP, and SGI are implementing UPC, and Yelick is currently leading a large effort at LBNL to implement UPC on Linux clusters and IBM machines and to develop new optimizations. The language provides a uniform programming model for both shared and distributed memory hardware. Read more at http://upc.lbl.gov/. She has also worked on other global address space languages such as Titanium, which is based on Java.
Yelick has also done notable work on single-processor optimizations, including techniques for automatically optimizing sparse matrix algorithms for memory hierarchies. These efforts are part of an NSF-funded project called BeBOP (Berkeley Benchmarking and Optimization) that is working on methods to take advantage of special structure such as symmetry and triangular solves.
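To give a sense of what taking advantage of symmetry can mean for a sparse matrix kernel, here is a minimal Python sketch. It is an illustration only, not BeBOP's code or method: the function name, the compressed sparse row (CSR) layout, and the sample matrix are all assumptions made for this example. Only the upper triangle of a symmetric matrix is stored, so each off-diagonal entry is read from memory once and used twice.

```python
def spmv_symmetric_upper(n, row_ptr, col_idx, values, x):
    """Return y = A * x for a symmetric n-by-n matrix A.

    Hypothetical helper for illustration. Only the upper triangle of A
    (including the diagonal) is stored, in CSR form:
      row_ptr[i]..row_ptr[i+1] indexes the stored entries of row i,
      col_idx[k] is the column of entry k, values[k] is its value.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            # Contribution of the stored entry A[i][j].
            y[i] += a * x[j]
            # Contribution of its mirror image A[j][i], which is not stored.
            if j != i:
                y[j] += a * x[i]
    return y


# Sample symmetric matrix (an assumption for this example):
#   [[4, 1, 0],
#    [1, 3, 2],
#    [0, 2, 5]]
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_symmetric_upper(3, row_ptr, col_idx, values, x)
print(y)  # [6.0, 13.0, 19.0]
```

Storing roughly half of the nonzero entries reduces the memory traffic for the matrix, which is typically the bottleneck for this kind of kernel. An automatically tuned implementation would go further than this sketch, for example by choosing data layouts suited to a particular machine's memory hierarchy.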
Yelick's research on architectures for memory-intensive applications has also produced notable results, in particular her work on mixing logic and DRAM on the same chip, which avoids off-chip accesses to DRAM, thereby gaining bandwidth while lowering latency and energy consumption. In the IRAM project, a joint effort with David Patterson, she developed an architecture to take advantage of this technology. The IRAM processor is a single-chip system designed for low power and high performance on multimedia applications, and it achieves an estimated 6.4 GOP/s in a two-watt design. The IRAM architecture is based on vector instructions, historically reserved for expensive vector supercomputers designed for large-scale scientific and engineering applications.
Yelick earned her bachelor's (1985), master's (1985), and Ph.D. (1991) degrees in electrical engineering and computer science from the Massachusetts Institute of Technology. Her research interests include parallel computing, memory hierarchy optimizations, programming languages and compilers. You can read her UC Berkeley Web page at http://www.cs.berkeley.edu/~yelick/ .
Berkeley Lab is a U.S. Department of Energy national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California. Learn more at http://www.lbl.gov .