Fault-Tolerant Virtual Machines

 

Introduction

  • state machine approach has low bandwidth requirement.
  • The base technology that allows us to record the execution of a primary and ensure that the backup executes identically is known as deterministic replay.
  • no data is lost if a backup VM takes over after a primary VM fails.
  • recording and replaying the execution of a multi-processor VM not covered, with significant performance issue because nearly every access to shared memory can be a non-deterministic operation.

Basic FT Design

  • backup VM on a different physical server
  • all input that the primary VM receives is sent to backup VM via a network connection known as the logging channel.

Deterministic Replay Implementation

Replicating server (or VM) execution can be modeled as the replication of a deterministic state machine.

Deterministic inputs
e.g. incoming network packets, disk reads, input from the keyboard and mouse
Non-deterministic inputs
e.g. virtual interrupts, reading the clock cycle counter of the processor

Challenges:

  1. correctly capturing all the input and non-determinism necessary to ensure deterministic execution of a backup virtual machine
  2. correctly applying the inputs and non-determinism to the backup virtual machine
  3. does not degrade the perf

Previous work divides the execution of VM into epochs, where non-deterministic events such as interrupts are only delivered at the end of an epoch. However, in this work the deterministic replay has no need to use epochs. Each interrupts is recorded as it occurs and efficiently delivered at the appropriate instruction while being replayed.

FT Protocol

Output Requirement
if the backup VM ever takes over after a failure of the primary, the backup VM will continue executing in a way that is entirely consistent with all outputs that the primary VM has sent to the external world.
Output Rule
the primary VM may not send an output to the external world, until the backup VM has recevied and acknowledged the log entry associated with the operation producing the output.

Can not guarantee that all outputs are produced exactly once in a failover situation. There is no way that the backup can determine if a primary crashed immediately before or after sending its last output.

Detecting and Responding to Failure

To avoid split brain problem, we make use of the shared storage that stores the virtual disks of the VM.

Q: For the test-and-set, how does the previous primary release the lock?

Implementation of FT

Starting and Restarting FT VMs