Lecture #3: Theoretical Foundations -- Clocks in a Distributed Environment

Topics for today

Some inherent limitations of a distributed system and their implications
Lamport logical clocks
Vector clocks

These topics are from Chapter 5-5.4 in Advanced Concepts in OS.

Distributed systems

A collection of computers that do not share a common clock and a common memory
Processes in a distributed system exchange information over communication channels, and the message delay is unpredictable.

Inherent limitations of a distributed system

Absence of a global clock

Distributed processes cannot rely on having an accurate view of global state, due to transmission delays.

Effectively, we cannot talk meaningfully about global state.

The traditional notions of "time" and "state" do not work in distributed systems. We need to develop concepts that correspond to "time" and "state" in a uniprocessor system.

Lamport's logical clocks

The "time" concept in distributed systems -- used to order events in a distributed system.
Assumptions:
- The execution of a process is characterized by a sequence of events. An event can be the execution of one instruction or of one procedure.
- Sending a message is one event; receiving a message is one event.
The events in a distributed system are not total chaos. Under some conditions, it is possible to ascertain the order of the events. Lamport's logical clocks try to capture this.

Lamport's ``happened before'' relation

The "happened before" relation (→) is defined as follows:

A → B if A and B are within the same process (same sequential thread of control) and A occurred before B.
A → B if A is the event of sending a message M in one process and B is the event of receiving M by another process.
If A → B and B → C, then A → C.

Event A causally affects event B iff A → B.

Distinct events A and B are concurrent (A || B) if neither A → B nor B → A holds.
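
The three rules can be checked mechanically on a small trace. The following is a minimal sketch, not a production algorithm: the function names `happened_before` and `concurrent` and the event labels are hypothetical, and the transitive closure is computed by brute force, which is only reasonable for tiny examples.

```python
from itertools import product


def happened_before(process_events, messages):
    """Return the set of pairs (a, b) such that a happened before b.

    process_events: one list of event names per process, in local order.
    messages: (send_event, receive_event) pairs.
    """
    hb = set()
    # Rule 1: events in the same process are ordered by local order.
    for events in process_events:
        for i, a in enumerate(events):
            for b in events[i + 1:]:
                hb.add((a, b))
    # Rule 2: sending a message happens before receiving it.
    hb.update(messages)
    # Rule 3: transitivity, applied until no new pair appears.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(hb), repeat=2):
            if b == c and (a, d) not in hb:
                hb.add((a, d))
                changed = True
    return hb


def concurrent(hb, a, b):
    """Distinct events are concurrent if neither happened before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb


# Hypothetical trace: P1 runs a1, a2; P2 runs b1, b2; a2 sends a message received at b2.
hb = happened_before(
    process_events=[["a1", "a2"], ["b1", "b2"]],
    messages=[("a2", "b2")],
)
```

In this trace a1 happened before b2 by transitivity, while a1 and b1 are concurrent because no chain of local steps and messages connects them.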

Lamport Logical Clocks

are local to each process (processor?)
do not measure real time
only measure ``events''
are consistent with the happened-before relation
are useful for totally ordering transactions, by using logical clock values as timestamps

Logical Clock Conditions

C_i is the local clock for process P_i

if a and b are two successive events in P_i, then
C_i(b) = C_i(a) + d₁, where d₁ > 0
if a is the sending of message m by P_i, then m is assigned timestamp t_m = C_i(a)
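if b is the receipt of m by P_j, then
C_j(b) = max{C_j(b), t_m + d₂}, where d₂ > 0 (on the right-hand side, C_j(b) is the value P_j's clock would assign to b by the first rule alone)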
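
The clock conditions translate almost directly into code. This is a minimal sketch, assuming d₁ = d₂ = 1 by default and assuming that a receive event first ticks the local clock and then takes the maximum with t_m + d₂. The class name `LamportClock` and its method names are illustrative, not taken from the text.

```python
class LamportClock:
    """Scalar logical clock for one process (assumes d1 = d2 = 1 unless overridden)."""

    def __init__(self, d1=1, d2=1):
        self.time = 0
        self.d1 = d1
        self.d2 = d2

    def tick(self):
        """Local event: advance the clock by d1."""
        self.time += self.d1
        return self.time

    def send(self):
        """Send event: tick, then return the timestamp t_m to attach to the message."""
        return self.tick()

    def receive(self, t_m):
        """Receive event: tick, then move past the message timestamp if needed."""
        self.tick()
        self.time = max(self.time, t_m + self.d2)
        return self.time


p1, p2 = LamportClock(), LamportClock()
p1.tick()                   # local event on P1, clock becomes 1
t_m = p1.send()             # send event on P1, clock becomes 2
p2_recv = p2.receive(t_m)   # P2 ticks to 1, then takes max(1, 2 + 1) = 3
```

The receive timestamp is always larger than the send timestamp, which is what keeps the clocks consistent with the happened-before relation.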

Logical Clock Conditions

The value of d could be 1, or it could be an approximation to the elapsed real time. For example, we could take d₁ to be the elapsed local time, and d₂ to be the estimated message transmission time. The latter solves the problem of waiting forever for a virtual time instant to pass.

Total Ordering

We can extend the partial ordering of the happened-before relation to a total ordering on events, by using the logical clocks and resolving any ties by an arbitrary rule based on the processor/process ID.

If a is an event in P_i and b is an event in P_j, then a ⇒ b iff

C_i(a) < C_j(b), or
C_i(a) = C_j(b) and P_i < P_j

where < is an arbitrary total ordering of the processes
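
In code, this total order is a lexicographic comparison on (clock value, process ID). The sketch below uses made-up events and assumes integer process IDs as the arbitrary tie-breaking order; the tuple layout and the name `total_order_key` are illustrative.

```python
# Each event is (name, logical clock value, process id); the values are hypothetical.
events = [
    ("a", 2, 1),
    ("b", 1, 2),
    ("c", 2, 0),
    ("d", 3, 1),
]


def total_order_key(event):
    """Sort by clock value first, then break ties by process id."""
    _name, clock, pid = event
    return (clock, pid)


ordered = [name for name, _clock, _pid in sorted(events, key=total_order_key)]
# ordered == ["b", "c", "a", "d"]: c and a tie on clock value 2, and process 0 comes before process 1
```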

How useful is this? How close to real time?

Example of Lamport Logical Clocks

C(a) < C(b) does not imply a → b

That is, the ordering we get from Lamport's clocks does not guarantee that two events ordered by their clock values are also causally related; they may be concurrent. The following Vector Clock scheme is intended to improve on this.
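
A two-process trace with no messages is enough to see this. The sketch below is a made-up scenario, assuming each event advances the local clock by 1; the event names a, x, and b are hypothetical.

```python
# Two processes that never exchange messages; every event adds d = 1.
p1_clock = 0
p2_clock = 0

p1_clock += 1        # event a on P1
c_a = p1_clock       # C(a) = 1

p2_clock += 1        # event x on P2
p2_clock += 1        # event b on P2
c_b = p2_clock       # C(b) = 2

# The only happened-before pair is x before b (same process, local order).
happened_before = {("x", "b")}

a_before_b_by_clock = c_a < c_b                       # True
a_before_b_causally = ("a", "b") in happened_before   # False
```

The clock values order a before b, yet no message links P1 to P2, so a and b are concurrent.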

Vector Clocks

Clock values are vectors
Vector length is n, the number of processes
C_i[i](a) = local time of P_i at event a
C_i[j](a) = time C_j[j](b) of last event b at P_j that is known to happen before local event a

Vector Clock Algorithm

if a and b are successive events in P_i, then C_i[i](b) = C_i[i](a) + d₁, where d₁ > 0
if a is the sending of m by P_i with vector timestamp t_m, and b is the receipt of m by P_j, then for all k
C_j[k](b) = max{C_j[k](b), t_m[k]}
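
A minimal sketch of these two rules follows, assuming d₁ = 1 and assuming the receive event first advances the receiver's own component and then takes the component-wise maximum. The class name `VectorClock` and its methods are illustrative, not from the text.

```python
class VectorClock:
    """Vector clock for process i in a system of n processes (assumes d1 = 1 by default)."""

    def __init__(self, n, i, d=1):
        self.v = [0] * n
        self.i = i
        self.d = d

    def tick(self):
        """Local event: advance only this process's own component."""
        self.v[self.i] += self.d
        return list(self.v)

    def send(self):
        """Send event: tick, then return a copy of the vector as the timestamp t_m."""
        return self.tick()

    def receive(self, t_m):
        """Receive event: tick, then take the component-wise maximum with t_m."""
        self.tick()
        self.v = [max(own, other) for own, other in zip(self.v, t_m)]
        return list(self.v)


p0 = VectorClock(n=3, i=0)
p1 = VectorClock(n=3, i=1)
p0.tick()                  # P0 is now [1, 0, 0]
t_m = p0.send()            # P0 is now [2, 0, 0], which is sent as t_m
after = p1.receive(t_m)    # P1 ticks to [0, 1, 0], then merges to [2, 1, 0]
```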

Vector Clock Ordering Relation

t = t′ ⇔ ∀i t[i] = t′[i]
t ≠ t′ ⇔ ∃i t[i] ≠ t′[i]
t ≤ t′ ⇔ ∀i t[i] ≤ t′[i]
t < t′ ⇔ (t ≤ t′ and t ≠ t′)
t || t′ ⇔ not (t < t′ or t′ < t)

The relation ≤ defined above is a partial ordering.
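
These comparisons are short enough to write out directly. The sketch below assumes timestamps are equal-length Python lists; the function names `leq`, `less`, and `concurrent` are illustrative.

```python
def leq(t, u):
    """t is less than or equal to u: every component of t is at most the matching component of u."""
    return all(a <= b for a, b in zip(t, u))


def less(t, u):
    """t is strictly less than u: t is less than or equal to u and differs in some component."""
    return leq(t, u) and t != u


def concurrent(t, u):
    """Neither timestamp is strictly less than the other."""
    return not less(t, u) and not less(u, t)


ordered_pair = less([2, 0, 0], [2, 1, 0])           # True
incomparable_pair = concurrent([2, 0, 0], [0, 1, 0])  # True
```

The second pair shows why this is only a partial ordering: [2, 0, 0] and [0, 1, 0] each exceed the other in one component, so neither is less than the other.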

Vector Clocks

a → b if t^a < t^b
b → a if t^b < t^a
otherwise a and b are concurrent

This is not a total ordering, but it is sufficient to guarantee a causal relationship, i.e.,

a → b iff t^a < t^b

How scalable is this?

Figure 5.5 in the book.

Non-causal Ordering of Messages

Message delivery is said to be causal if the order in which messages are received is consistent with the order in which they are sent. That is, if Send(M₁) → Send(M₂), then for every recipient of both messages, M₁ is received before M₂.

Enforcing Causal Ordering of Messages

Basic idea: Buffer each message until the message that immediately precedes it is delivered.

The text describes two protocols for implementing this idea:

Birman-Schiper-Stephenson: assumes all messages are broadcast
Schiper-Eggli-Sandoz: does not have this restriction

Note: These methods serialize the actions of the system. That makes the behavior more predictable, but also may mean loss of performance, due to idle time. That, plus scaling problems, means these algorithms are not likely to be of much use for high-performance computing.

Birman-Schiper-Stephenson Causal Message Ordering

Before P_i broadcasts m, it increments VT_{P_i}[i] and timestamps m. Thus VT_{P_i}[i]-1 is the number of messages from P_i preceding m.
When P_j (j ≠ i) receives message m with timestamp VT_m from P_i, delivery is delayed locally until both of the following are satisfied:

VT_{P_j}[i] = VT_m[i] - 1
VT_{P_j}[k] ≥ VT_m[k] for all k ≠ i
Delayed messages are queued at each process, sorted by their vector timestamps, with concurrent messages ordered by time of receipt.

When m is delivered to P_j, VT_{P_j} is updated as usual for vector clocks.
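
The delivery rule can be tried out in a small simulation. This is a sketch under stated assumptions rather than the protocol as specified in the text: the class `BSSProcess` and its methods are hypothetical, the delayed-message queue is rescanned rather than kept sorted by timestamp, and delivery updates the local vector by component-wise maximum with no extra local tick.

```python
class BSSProcess:
    """One process in a broadcast-only system of n processes (illustrative sketch)."""

    def __init__(self, n, pid):
        self.vt = [0] * n
        self.pid = pid
        self.buffer = []      # delayed messages: (sender, vt_m, payload)
        self.delivered = []   # payloads in delivery order

    def broadcast(self, payload):
        """Increment own entry, then timestamp the outgoing message."""
        self.vt[self.pid] += 1
        return (self.pid, list(self.vt), payload)

    def _deliverable(self, sender, vt_m):
        """Both delivery conditions from the protocol description."""
        if self.vt[sender] != vt_m[sender] - 1:
            return False
        return all(self.vt[k] >= vt_m[k]
                   for k in range(len(self.vt)) if k != sender)

    def receive(self, message):
        """Buffer the message, then deliver everything that has become deliverable."""
        self.buffer.append(message)
        progress = True
        while progress:
            progress = False
            for msg in list(self.buffer):
                sender, vt_m, payload = msg
                if self._deliverable(sender, vt_m):
                    self.buffer.remove(msg)
                    # Assumed update rule: component-wise maximum on delivery.
                    self.vt = [max(a, b) for a, b in zip(self.vt, vt_m)]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (BSSProcess(3, pid) for pid in range(3))
m1 = p0.broadcast("m1")   # timestamp [1, 0, 0]
p1.receive(m1)            # delivered at once; P1 becomes [1, 0, 0]
m2 = p1.broadcast("m2")   # timestamp [1, 1, 0], causally after m1
p2.receive(m2)            # arrives first at P2, held back because m1 is missing
held_back = list(p2.buffer)
p2.receive(m1)            # m1 is delivered, which then releases m2
```

At P2, m2 arrives before m1 but is buffered because VT_{P_2}[0] = 0 is smaller than VT_m[0] = 1; once m1 is delivered the condition holds and m2 follows.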

Schiper-Eggli-Sandoz Protocol

Generalizes the above, so that messages do not need to be broadcast, but are just sent between pairs of processes, and the communication channels do not need to be FIFO.

How would you implement and test the above algorithms?