Speaker: Peter M. Chen, University of Michigan
Date: February 12, 1998
Transparent, Low-Overhead Recovery for Distributed Applications
Reliable main memory enables a simple model for distributed computing, which we call Messages in Local Transactions (MLT). A local transaction in the MLT model may update memory, send messages, and receive messages. By performing a set of these operations atomically and durably, the transaction mechanism keeps the message state and local state consistent. Like other recovery schemes, MLT ensures that the system always recovers from process failures to a globally consistent state. However, MLT suffers from none of the drawbacks of other recovery schemes. Surviving processes are not involved in recovery (i.e., they do not roll back); processes do not coordinate or send extra messages during normal operation; and applications can be non-deterministic. We have implemented MLT as an extension to Vista (Vistagrams), and it adds negligible overhead to an existing protocol.
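As a rough illustration of the local-transaction idea described above, the following minimal Python sketch buffers memory updates, sends, and receives, and then applies or discards them together. It is a toy model only, not the Vista or Vistagrams interface: the names Process, LocalTransaction, update, send, receive, commit, and abort are hypothetical, and an ordinary in-process dictionary stands in for reliable main memory, so the sketch shows all-or-nothing behavior but not real durability.

```python
from collections import deque


class Process:
    """Toy process; its fields stand in for state kept in reliable memory."""

    def __init__(self):
        self.memory = {}      # local state (hypothetical stand-in)
        self.inbox = deque()  # messages delivered but not yet consumed
        self.outbox = []      # messages released to the network


class LocalTransaction:
    """Buffers updates, sends, and receives; applies all of them or none."""

    def __init__(self, process):
        self.process = process
        self.writes = {}
        self.sends = []
        self.received = []

    def update(self, key, value):
        # Memory updates stay private until commit.
        self.writes[key] = value

    def send(self, dest, payload):
        # Outgoing messages are held back until commit.
        self.sends.append((dest, payload))

    def receive(self):
        # Consume a message, remembering it so abort can put it back.
        msg = self.process.inbox.popleft()
        self.received.append(msg)
        return msg

    def commit(self):
        # Assumption: a real system makes this step atomic and durable.
        self.process.memory.update(self.writes)
        self.process.outbox.extend(self.sends)
        self._clear()

    def abort(self):
        # Return consumed messages to the inbox in their original order.
        self.process.inbox.extendleft(reversed(self.received))
        self._clear()

    def _clear(self):
        self.writes, self.sends, self.received = {}, [], []
```

In this sketch, an aborted transaction leaves the memory and outbox untouched and returns the received message to the inbox, so a retry sees the same input; a committed transaction releases its update and its outgoing message together.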
I also describe a checkpointing library (Free Checking) built on top of Vista and Vistagrams. Free Checking maps the entire process state (address space and registers) into Vista. By linking with the Free Checking library, an application becomes recoverable with few source code changes and little run-time overhead.
Bio: Peter M. Chen received a B.S. in Electrical Engineering from the Pennsylvania State University in 1987 and an M.S. and Ph.D. in Computer Science from the University of California at Berkeley in 1989 and 1992, respectively.
He is currently an Assistant Professor in the Department of Electrical Engineering and Computer Science at the University of Michigan at Ann Arbor. His research interests include operating systems, databases, and distributed systems, with a focus on improving the performance and reliability of computer storage systems.