SpyderByte.com Technical Portals
      
 News & Information Related to Linux High Performance Computing, Linux Clustering and Cloud Computing
RAS-ware Runtime Breakthrough in HPC Cluster
Posted by Kenneth Farmer, Wednesday November 02 2005 @ 03:02PM EST

Today, the eXtreme Computing Research (XCR) group at Louisiana Tech University announced a breakthrough in the RAS-ware runtime that provides transparent job queue fault tolerance in HPC cluster environments.

Dr. Box Leangsuksun, an associate professor of computer science, explains that the XCR group's recent breakthrough combines high availability, self-configuration, and self-healing as enabling solutions. His group of graduate students, led by Anand Tikotekar, has implemented a proof-of-concept Beowulf cluster based on HA-OSCAR 1.1 and a standard HPC resource management/job queue system (e.g., PBS/TORQUE). Preliminary results suggest that MPI jobs can continue executing and that the job queue is preserved despite failures at the head node and at compute nodes. The experiment runs standard MPI jobs, without any modification, under LAM/MPI 7.0. The breakthrough handles both running and queued jobs transparently, and the queue order is maintained even in a catastrophic failure. The HA-OSCAR multi-head solution provides failover capability and transparently recovers the job queue when a head node goes down.
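For readers unfamiliar with head-node failover, the idea of keeping queue order across an outage can be illustrated with a toy model. The sketch below is not HA-OSCAR or PBS/TORQUE code; the class names `HeadNode` and `MirroredQueue`, and the node names in the usage example, are hypothetical, and it assumes a simple active/standby pair in which every queue change is mirrored to the standby.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HeadNode:
    """Toy head node holding an ordered job queue (hypothetical model)."""
    name: str
    queue: deque = field(default_factory=deque)
    alive: bool = True


class MirroredQueue:
    """Hypothetical active/standby pair; not the HA-OSCAR implementation."""

    def __init__(self, primary: HeadNode, standby: HeadNode):
        self.active = primary
        self.standby = standby

    def submit(self, job_id: str) -> None:
        # Record the job on the active head, then mirror it to the standby.
        self.active.queue.append(job_id)
        if self.standby.alive:
            self.standby.queue.append(job_id)

    def next_job(self) -> str:
        # Dispatch the job at the front of the queue and mirror the removal.
        job = self.active.queue.popleft()
        if self.standby.alive:
            self.standby.queue.popleft()
        return job

    def failover(self) -> None:
        # Mark the active head as failed and promote the standby, whose
        # mirrored queue already holds the remaining jobs in the same order.
        self.active.alive = False
        self.active, self.standby = self.standby, self.active


pair = MirroredQueue(HeadNode("head-a"), HeadNode("head-b"))
for job in ["job1", "job2", "job3"]:
    pair.submit(job)
first = pair.next_job()          # dispatched before the outage
pair.failover()                  # simulated head-node failure
remaining = list(pair.active.queue)
```

In this toy model, the jobs still queued after the simulated outage keep their original submission order on the promoted node, which is the property the announcement describes for the real system.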

"This is very exciting for us," said Leangsuksun. "This marks a major milestone in our overarching goal - toward non-stop services in HPC environment. We expect that our breakthrough technology is exactly what the community has been waiting for."

Leangsuksun continued, "Our breakthrough is also expected to be part of the next HA-OSCAR release that will have broad impacts in HPC and telecomm cluster environments, especially for mission critical applications."

The demo will be shown at SC05 in booth #218.

HA-OSCAR is an open source project. Dr. "Box" Chokchai Leangsuksun is the chief architect and project director of the HA-OSCAR research and development program at Louisiana Tech University. The project is a collaboration between the eXtreme Computing Research (XCR) group at Louisiana Tech University and the Network and Cluster Computing (NCC) group at Oak Ridge National Laboratory (ORNL). The research and development program is supported and funded by the Office of Science, Department of Energy, under contract DE-FG02-05ER25659. More information can be obtained at http://xcr.cenit.latech.edu/ha-oscar


     Copyright © 2001-2013 LinuxHPC.org
Linux is a trademark of Linus Torvalds
All other trademarks are those of their owners.
    