SpyderByte.com ;Technical Portals 
      
 News & Information Related to Linux High Performance Computing, Linux Clustering and Cloud Computing
Home About News Archives Contribute News, Articles, Press Releases Mobile Edition Contact Advertising/Sponsorship Search Privacy
HPC Vendors
Cluster Quoter (HPC Cluster RFQ)
Hardware Vendors
Software Vendors
HPC Consultants
Training Vendors
HPC Resources
Featured Articles
Cluster Builder
Beginners
Whitepapers
Documentation
Software
Lists/Newsgroups
Books
User Groups & Organizations
HP Server Diagrams
HPC News
Latest News
Newsletter
News Archives
Search Archives
HPC Links
ClusterMonkey.net
Scalability.org
HPCCommunity.org

Beowulf.org
HPC Tech Forum (was BW-BUG)
Gelato.org
The Aggregate
Top500.org
Cluster Computing Info Centre
Coyote Gultch
Dr. Robert Brown's Beowulf Page
FreshMeat.net: HPC Software
SuperComputingOnline
HPC User Forum
GridsWatch
HPC Newsletters
Stay current on Linux HPC news, events and information.
LinuxHPC.org Newsletter

Other Mailing Lists:
Linux High Availability
Beowulf Mailing List
Gelato.org (Linux Itanium)

LinuxHPC.org
Home
About
Contact
Mobile Edition
Sponsorship

Latest News

New Features in Linux 2.6 - Key Performance Improvements Summarized
Posted by Philip Carinhas, Tuesday March 07 2006 @ 05:36PM EST

Over the last several years, the Linux operating system has gained acceptance as the operating system of choice in many scientific and commercial environments, respectively. Today, the performance aspects of the Linux operating system has improved significantly, as compared to traditional UNIX flavors. This is particularly true for smaller SMP systems with up to 4 processors. Recently, there has been an increased emphasis on Linux performance in mid to high-end enterprise-class environments, consisting of SMP systems that are configured with 64 CPUs. Therefore, scalability and performance of Linux 2.6 are paramount for applications on large systems that are scalable to high CPU counts. This article highlights some of the performance and scalability improvements of the Linux 2.6 kernel.

The Virtual Memory (VM) Subsystem

Most modern computer architectures support more than one memory page size. To illustrate, the IA-32 architecture supports either 4KB or 4MB pages. The 2.4 Linux kernel used to only utilize large pages for mapping the kernel image. In general, large page usage is primarily intended to provide performance improvements for high performance computing applications, as well as database applications that have large working sets. Any memory access intensive application that utilizes large amounts of virtual memory may obtain performance improvements by using large pages. Linux 2.6 can utilize 2MB or 4MB large pages, AIX uses 16MB large pages, whereas Solaris large pages are 4MB in size. The large page performance improvements are attributable to reduced translation lookaside buffer (TLB) misses. Large pages further improve the process of memory prefetching, by eliminating the necessity to restart prefetch operations on 4KB boundaries.

CPU Scheduler

The Linux 2.6 scheduler is a multi queue scheduler that assigns a run-queue to each CPU, promoting a local scheduling approach. The previous incarnation of the Linux scheduler utilized the concept of goodness to determine which thread to execute next. All runnable tasks were kept on a single run-queue that represented a linked list of threads. In Linux 2.6, the single run-queue lock was replaced with a per CPU lock, ensuring better scalability on SMP systems. The new per CPU run-queue scheme decomposes the run-queue into a number of buckets (in priority order) and utilizes a bitmap to identify the buckets that hold runnable tasks. Locating the next task to execute requires a read from the bitmap to identify the first bucket with runnable tasks, and choosing the first task in that bucket's run-queue.

It should be pointed out that the Linux 2.6 environment provides a Non Uniform Memory Access (NUMA) aware extension to the new scheduler. The focus is on increasing the likelihood that memory references are local rather than remote on NUMA systems. The NUMA aware extension augments the existing CPU scheduler implementation via a node-balancing framework. Further, it is imperative to point out that next to the preemptible kernel support in Linux 2.6, the Native POSIX Threading Library (NPTL) represents the next generation POSIX threading solution for Linux, and hence has received a lot of attention from the performance community. The new threading implementation in Linux 2.6 has several major advantages, such as in-kernel POSIX signal handling. In a well-designed multi-threaded application domain, fast user space synchronization (futex) can be utilized. In contrast to the Linux 2.4, the futex framework avoids a scheduling collapse during heavy lock contention among different threads.

I/O Scheduling

The I/O scheduler in Linux is the interface between the generic block layer and the low-level device drivers. The block layer provides functions that are utilized by file systems and the virtual memory manager to submit I/O requests to block devices. As prioritized resource management seeks to regulate the use of a disk subsystem by an application, the I/O scheduler is considered an important kernel component in the I/O path.

It is further possible to tune the disk usage in the kernel layers above and below the I/O scheduler. Adjusting the I/O pattern generated by the file system or the virtual memory manager (VMM) is now an option. Another option is to adjust the way specific device drivers or device controllers handle the I/O requests. Further, a new read-ahead algorithm designed and implemented by Dominique Heger and Steve Pratt for Linux 2.6 significantly boosts read IO throughput for all the discussed IO schedulers below.

The Deadline I/O scheduler available in Linux 2.6 incorporates a per-request expiration based approach, and operates on five I/O queues. The basic idea behind the implementation is to aggressively reorder requests to improve I/O performance while simultaneously ensuring that no I/O request is being starved. More specifically, the scheduler introduces the notion of a per-request deadline, which is used to assign a higher preference to read than write requests. To summarize, the basic idea behind the deadline scheduler is that all read requests are satisfied within a specified time period. On the other hand, write requests do not have any specific deadlines associated. As the block device driver is ready to launch another disk I/O request, the core algorithm of the deadline scheduler is invoked. In a simplified form, the first action being taken is to identify if there are I/O requests waiting in the dispatch queue, and if yes, there is no additional decision to be made on what to execute next. Otherwise, it is necessary to move a new set of I/O requests to the dispatch queue.

The Anticipatory I/O scheduler's design attempts to reduce the per-thread read response time. It introduces a controlled delay component into the dispatching equation. The delay is being invoked on any new request to the device driver, thereby allowing a thread that just finished its I/O request to submit a new request. This basically enhances the chances (based on locality) that this scheduling behavior will result in smaller seek operations. The tradeoff between reduced seeks and decreased disk utilization (due to the additional delay factor in dispatching a request) is managed by utilizing an actual cost-benefit calculation method.

The Completely Fair Queuing (CFQ) I/O scheduler can be considered as representing an extension to the better known stochastic fair queuing (SFQ) scheduler implementation. The focus of both implementations is on the concept of fair allocation of I/O bandwidth among all the initiators of I/O requests. A SFQ based scheduler design was initially proposed for some network subsystems. The goal to be accomplished is to distribute the available I/O bandwidth as equally as possible among the I/O requests.

The Linux 2.6 Noop I/O scheduler can be considered a minimal I/O scheduler that performs basic merging and sorting functionalities. The main usage of the noop scheduler revolves around non disk-based block devices like memory devices, as well as specialized software or hardware environments that incorporate their own I/O scheduling and caching functionality, and hence require only minimal assistance from the kernel. Hence, for large-scale I/O configurations that incorporate RAID controllers and many disk drives, the noop scheduler has the potential to outperform the other three I/O schedulers.

Conclusion

The Linux 2.6 kernel represents another evolutionary step forward, and builds upon its predecessors to boost (application) performance, through enhancements to the VM subsystem, the CPU scheduler and the I/O scheduler. In addition, this new version of the kernel delivers important functional enhancements in security, scalability, and networking. This outline only highlights the major performance features in Linux 2.6. Please visit the Fortuitous Website http://www.fortuitous.com for the full article on Linux 2.6 Performance Enhancements. Fortuitous Technologies provides high quality IT services, focusing on performance tuning and capacity planning.


< Moab Utility/Hosting Suite 4.5.0 Released | PathScale Introduces InfiniPath InfiniBand Adapt >

 

Affiliates

Cluster Monkey

HPC Community


Supercomputing 2010

- Supercomputing 2010 website...

- 2010 Beowulf Bash

- SC10 hits YouTube!

- Louisiana Governor Jindal Proclaims the week of November 14th "Supercomputing Week" in honor of SC10!








Appro: High Performance Computing Resources
IDC: Appro Xtreme-X Supercomputer Blade Solution
Analysis of the Xtreme-X architecture and management system while assessing challenges and opportunities in the technical computing market for blade servers.

Video - The Road to PetaFlop Computing
Explore the Scalable Unit concept where multiple clusters of various sizes can be rapidly built and deployed into production. This new architectural approach yields many subtle benefits to dramatically lower total cost of ownership.
White Paper - Optimized HPC Performance
Multi-core processors provide a unique set of challenges and opportunities for the HPC market. Discover MPI strategies for the Next-Generation Quad-Core Processors.

Appro and the Three National Laboratories
[Appro delivers a new breed of highly scalable, dynamic, reliable and effective Linux clusters to create the next generation of supercomputers for the National Laboratories.

AMD Opteron-based products | Intel Xeon-based products



Home About News Archives Contribute News, Articles, Press Releases Mobile Edition Contact Advertising/Sponsorship Search Privacy
     Copyright © 2001-2013 LinuxHPC.org
Linux is a trademark of Linus Torvalds
All other trademarks are those of their owners.
    
  SpyderByte.com ;Technical Portals