Evergrid, Inc., a provider of advanced quality of application service management for next generation datacenters, today announced its entry into the high performance computing market, with patent pending high availability and resource management software that lets massively parallelized distributed applications run at near 100 percent reliability on high performance computing (HPC) clusters.
The Evergrid software sits between the operating system and the applications, and captures the collective state of the application and its IO across all processors. By recording the state of the application, Evergrid is able to checkpoint and recover from failures rapidly with minimal overhead. The software also allows data centers to do preemptive scheduling of lower priority applications in favor of running higher priority applications, with little or no data lost. The software installs on Linux systems and requires no modifications to either the OS or application. It is scalable up to thousands of nodes at a time, with less than five percent performance overhead.
“As open source and commodity hardware have become de facto standards, large data centers today are increasingly deploying their mission critical applications on huge clusters of servers,” said Ameet Patel, partner and CTO, Acartha Group, and former technology executive at JPMorgan Chase. “But traditional datacenter configurations are rigid, complex, underutilized and expensive. The market desperately needs a solution that treats commodity servers like we used to treat mainframes. Datacenters want to schedule high priority jobs on pools of commodity servers that can quickly recover from inevitable failures.”
Despite a host of recent advances in hardware and software, downtime for compute intensive applications is an ever-worsening problem in high performance technical computing (HPTC) environments. Expanding clusters of commoditized servers has resulted in higher failure rates and lower mean time between failures (MTBF) because of the large number of nodes and the length of time users want to run parallel applications. Also, in an attempt to meet quality of service objectives, data centers have dedicated individual servers to particular applications, resulting in over-provisioning. Such a situation has created an environment of low utilization, poor reconfiguration flexibility and high cost.
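The link between node count and failure rate can be illustrated with a small calculation. The sketch below is not from Evergrid; it assumes that nodes fail independently at a constant rate (an exponential failure model), and the function names and the sample figures are hypothetical, chosen only for illustration.

```python
import math


def cluster_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """Approximate MTBF of a whole cluster.

    Assumption: nodes fail independently at a constant rate, so the
    failure rates add up and the cluster MTBF is node MTBF / node count.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return node_mtbf_hours / nodes


def job_survival_probability(node_mtbf_hours: float, nodes: int,
                             job_hours: float) -> float:
    """Probability that a job using every node finishes with no failure.

    Under the same exponential assumption this is exp(-t / cluster MTBF).
    """
    return math.exp(-job_hours / cluster_mtbf_hours(node_mtbf_hours, nodes))


if __name__ == "__main__":
    # Hypothetical figures: 50,000-hour node MTBF, 1,000 nodes, 50-hour job.
    print(cluster_mtbf_hours(50_000, 1_000))
    print(job_survival_probability(50_000, 1_000, 50))
```

Under these assumed figures, a node that fails on average once every 50,000 hours yields a 1,000-node cluster that fails on average once every 50 hours, which is why long parallel jobs without checkpointing become harder to complete as clusters grow.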
“When we built System X at Virginia Tech we found that the reliability of large clusters was an important issue,” said Srinidhi Varadarajan, CTO and founder of Evergrid. “Even with excellent hardware the runtime of large jobs was restricted by the mean time between failures of 1000’s of processors. We decided very quickly that we needed to do something about system availability, and that was our impetus for founding Evergrid.”
Evergrid’s new fault tolerant software prevents downtime by automating the checkpointing, migration and recovery of applications, thus offering automatic failover across multiple nodes and tiers. With Evergrid, even failure of multiple processors does not stop an application from functioning continuously. In addition, Evergrid’s efficient and robust management software provisions servers from bare metal up through the application and allows preemptive allocation of resources to high priority applications. This unprecedented level of functionality allows quality of service objectives to be easily met. All this is done with complete transparency to the user.
“Evergrid provides commodity server clusters with the industry’s first and only transparent, fault tolerant system, and also the first and only preemptive scheduler for distributed applications,” said David Anderson, CEO, Evergrid. “Our product is truly massively scalable. The closest competitor can scale to only eight nodes with performance overhead of more than 40 percent. We designed Evergrid to grow to a remarkable 100,000 nodes or more.”
Evergrid’s infrastructure software is designed for demanding, compute-intensive sectors such as aerospace, financial services and petrochemical research. Initially, Evergrid software solutions target High Performance Technical Computing (HPTC) applications that are computationally intensive and use high speed interconnects. In the future, Evergrid will also provide solutions for the High Performance Enterprise Computing (HPEC) and transaction processing database markets.
Evergrid is a spin-off of California Digital, a company that created two of the highest performance supercomputers in the world (now at #14 and #28 on the Top 500 list). The company is funded by a number of private investors, led by the Acartha Group.
Evergrid is demonstrating its new virtualization software technology this week in booth #244 at Supercomputing 2006 (SC ’06), November 13-17, at the Tampa Convention Center in Tampa, Florida. Evergrid is one of only 54 show participants chosen to present poster submissions displaying emerging ideas and early results of advanced research in high performance computing, networking and storage. This special poster exhibit takes place on the second floor of the convention center from 5:15pm to 7:15pm on Tuesday, Nov. 14.
About Evergrid, Inc.
Evergrid, a provider of advanced quality of application service management for next generation datacenters, lets massively parallelized, distributed applications run properly on high performance cluster grids, at near 100 percent reliability. Evergrid’s fault tolerant application virtualization software prevents downtime, automates checkpoint, migration, and recovery of applications, and scales to thousands of nodes, with less than five percent performance overhead.
Evergrid’s leadership team brings extensive management and technology expertise from IBM, Amdahl, VERITAS, Motorola, Tandem Computers and the Virginia Polytechnic Institute and State University. Evergrid is a private company. For more information, visit http://www.Evergrid.com