RAID: Redundant Arrays of Independent Disks
Contact: Garth Gibson
This page briefly summarizes RAID technology for readers unfamiliar with it, so that our RAID research can be understood within a larger context. More detailed background can be found in Garth Gibson's Ph.D. thesis, Mark Holland's Ph.D. thesis, and the RAIDframe documentation, which excerpts chapter two of Mark Holland's thesis and draws heavily upon Bill Courtright's thesis. These three documents also refer to a large number of technical papers, including several available from the RAID Publications page, which can provide even more depth on specific variants of RAID technology.
A Brief Overview
Several trends in the computer industry over the past decade have driven the design of the storage subsystem towards increasing parallelism. That is, systems can and will improve I/O performance by increasing the number of disks rather than the performance of individual disks. These trends include a widening gap between the speed of CPUs and that of disks, the shrinking size of disk drives, and new I/O-intensive applications such as digital video, scientific visualization, and spatial databases.
By adding redundancy in storing data, arrays of disks offer the ability to withstand the failure of a single disk. There are several methods for maintaining redundant data, and in 1988 the different methods were categorized into a taxonomy known as RAID (Redundant Arrays of Inexpensive Disks -- Inexpensive was later changed to Independent) by a research group at U.C.-Berkeley headed by David Patterson. Garth Gibson, the head of the Parallel Data Lab, completed his thesis while working on this research project.
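One widely used way to maintain redundant data is byte-wise XOR parity, the idea behind the parity-based RAID levels. The following is a minimal Python sketch of that idea, not code from any PDL project: the helper name xor_blocks and the in-memory byte strings standing in for disks are assumptions made for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data "disks", each holding one equal-sized block.
data = [b"disk0 ok", b"disk1 ok", b"disk2 ok"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing disk 1: XOR the surviving data blocks with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

Because XOR is its own inverse, any single missing block can be recomputed from the remaining blocks and the parity, which is why one extra disk's worth of space is enough to survive one disk failure.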
Originally, there were five RAID levels, but the phrase "RAID Level 0" is now commonly used to refer to a nonredundant disk array and RAID Level 6 has been added to the numbered levels. Additionally, a number of researchers have proposed variations on RAID levels, including several by the PDL: write deferring, parity logging, parity declustering, and log-structured storage.
Today, RAID systems are an extremely profitable product for the storage industry -- the market for RAID exceeded $3 billion in 1994 and is expected to surpass $13 billion by 1997. But they do have their limitations. First, there is the cost of maintaining redundant data, in terms of both disk space and the time it takes to access the disks in the array. Second, redundant arrays must handle transient operating errors as well as tolerate the failure of a disk, which requires both the ability to recover lost data (reliability) and the ability to perform well while the system restores the data on-line (availability). Ensuring this is a complex process that becomes more difficult as each new RAID optimization is proposed. Third, many applications access disk drives serially, meaning that they are unable to take advantage of the parallelism offered by disk arrays. Finally, RAID systems directly attached to a host system bus are inherently not scalable.
Recognizing these limitations, members of the Parallel Data Lab have moved beyond proposing optimizations for specific RAID levels to solving problems common to all redundant arrays. Mark Holland's thesis, On-line Data Reconstruction in Redundant Disk Arrays, targets the reliability and availability of RAID systems. He offers a disk-oriented reconstruction algorithm for restoring data lost during a single disk failure. This algorithm maximizes the efficiency of reconstruction without significantly penalizing response time for the system user.
Hugo Patterson began working on a "smart" disk controller before realizing that the first issue to address was how "smartly" applications use disks. This research, which is really situated within the context of file systems and is independent of the number of disks in the storage subsystem, has had significant implications for the third major limitation on RAID systems: the inability of most applications to access disks in the array in parallel. Hugo's solution to this problem enables applications with serial I/O workloads to mimic parallel applications by fetching needed data in advance. His research in informed prefetching and caching is covered elsewhere in the PDL Web pages.
While Mark Holland's research addressed how the array restores data lost from a specific failed disk without penalizing the system user excessively, Bill Courtright's research focused more broadly upon how the array handles errors that occur while it is operating -- independent of the specific cause of the error. Bill began his research by investigating how to reduce the complexity associated with handling errors in RAID systems. His work was motivated by a very real need in industry: a large fraction (over half) of array code was being devoted to handling architecture-specific errors. Verifying the correctness of this code is difficult, and the code is not easily extended to support new architectures.
In separating error-handling from code specific to RAID architecture, Bill chose to model RAID operations using directed acyclic graphs (DAGs) because they offer an intuitive, visual structuring of sequences of disk operations. Mark then built upon this approach when he began developing a general-purpose RAID controller. The main implication of Mark's new work was that a truly general-purpose controller would allow RAID designers to prototype new RAID designs quickly.
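To make the DAG idea concrete, here is a hedged Python sketch of how one array operation, a small write to a parity-protected array, might be expressed as a graph of primitive steps and then executed in dependency order. It does not reproduce RAIDframe's actual graph format or API: the node names, the run_small_write function, and the list-of-bytes "disks" are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def run_small_write(disks, data_disk, parity_disk, new_data):
    """Update one data block and its parity by walking a small DAG."""
    state = {}

    def read_old_data():
        state["old_data"] = disks[data_disk]

    def read_old_parity():
        state["old_parity"] = disks[parity_disk]

    def xor():
        # new parity = old data XOR new data XOR old parity
        state["new_parity"] = bytes(
            o ^ n ^ p
            for o, n, p in zip(state["old_data"], new_data, state["old_parity"])
        )

    def write_data():
        disks[data_disk] = new_data

    def write_parity():
        disks[parity_disk] = state["new_parity"]

    actions = {
        "read_old_data": read_old_data,
        "read_old_parity": read_old_parity,
        "xor": xor,
        "write_data": write_data,
        "write_parity": write_parity,
    }

    # Each node maps to the set of nodes that must finish before it runs.
    dag = {
        "read_old_data": set(),
        "read_old_parity": set(),
        "xor": {"read_old_data", "read_old_parity"},
        "write_data": {"read_old_data"},
        "write_parity": {"xor"},
    }

    order = list(TopologicalSorter(dag).static_order())
    for node in order:
        actions[node]()
    return order

# Two hypothetical data disks plus one parity disk (parity = disk0 XOR disk1).
disks = [b"\x01\x02", b"\x10\x20", b"\x11\x22"]
order = run_small_write(disks, data_disk=0, parity_disk=2, new_data=b"\xff\x00")
```

In this sketch the graph is plain data, separate from the code that walks it, so a generic executor can in principle decide what to do when a node fails without knowing which RAID architecture produced the graph.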
At this point, the work on error-handling and the general-purpose RAID controller naturally merged into a project to develop an extensible RAID framework, which we have named RAIDframe (in the tradition of RAIDsim, the simulation tool developed at U.C.-Berkeley). RAIDframe offers RAID designers a number of benefits. First, separating error-handling from RAID-specific code lets designers reuse over 90% of the code to build new RAID systems. It also means that error-handling can be automated across all RAID designs. Second, modeling RAID operations as DAGs means that techniques for verifying the correctness of software designs can be applied to new RAID designs -- even before those designs are implemented in code. Essentially, RAIDframe allows designers to address the current limitations of RAID systems by quickly prototyping new designs which are verifiably correct, handle errors transparently, and recover from failed disks with minimal performance degradation.
We thank the members and companies of the PDL Consortium: Actifio, American Power Conversion, EMC Corporation, Emulex, Facebook, Fusion-io, Google, Hewlett-Packard Labs, Hitachi, Huawei Technologies Co., Intel Corporation, NEC Laboratories, NetApp, Inc., Oracle Corporation, Panasas, Samsung Information Systems America, Seagate Technology, STEC, Inc., Symantec Corporation, VMware, Inc., and Western Digital for their interest, insights, feedback, and support.