Lecture #3: Theoretical Foundations --
Clocks in a Distributed Environment
Topics for today
- Some inherent limitations of a distributed system and their implication.
- Lamport logical clocks
- Vector clocks
These topics are from Chapter 5-5.4 in Advanced Concepts in
OS.
Distributed systems
- A collection of computers that do not share a common clock or a common
memory
- Processes in a distributed system exchange information over communication channels, and the message delay is unpredictable.
Inherent limitations of a distributed system
Absence of a global clock
Distributed processes cannot rely on having an accurate
view of global state, due to transmission delays.
Effectively, we cannot talk meaningfully about global state.
The traditional notions of "time" and "state" do not work in distributed
systems. We need concepts that correspond to "time"
and "state" in a uniprocessor system.
Lamport's logical clocks
- the "time" concept in distributed systems -- used to order events in a distributed system.
- assumption:
- the execution of a process is characterized by a sequence of events.
An event can be the execution of one instruction or of one procedure.
- sending a message is one event, receiving a message is one event.
- The events in a distributed system are not total chaos. Under some
conditions, it is possible to ascertain the order of the events. Lamport's
logical clocks aim to capture this ordering.
Lamport's ``happened before'' relation
The ``happened before'' relation (→) is defined as follows:
- A → B if A and B are within the same process
(same sequential thread of control) and A occurred before B.
- A → B if A is the event of sending a message M
in one process and B is the event of receiving M by another
process
- if A → B and B → C then A → C
Event A causally affects event B iff A → B.
Distinct events A and B are concurrent (A || B) if neither
A → B nor B → A holds.
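The three rules above can be sketched in Python as a transitive closure over directly ordered event pairs. This is only an illustration: the event labels (a1, b2, ...) and the helper names happened_before and concurrent are hypothetical, and the direct pairs (program order and send/receive pairs) are assumed to be listed by hand.

```python
from itertools import product

def happened_before(direct_pairs):
    """Close the direct pairs (rules 1 and 2) under transitivity (rule 3)."""
    events = {e for pair in direct_pairs for e in pair}
    hb = set(direct_pairs)
    # Warshall-style closure: product() varies its first item slowest,
    # so the intermediate event k acts as the outermost loop.
    for k, i, j in product(events, repeat=3):
        if (i, k) in hb and (k, j) in hb:
            hb.add((i, j))
    return hb

def concurrent(hb, x, y):
    """Distinct events are concurrent if neither happened before the other."""
    return x != y and (x, y) not in hb and (y, x) not in hb

# Hypothetical run: P1 executes a1, a2; P2 executes b1, b2, b3;
# a1 sends a message that is received at b2.
direct = {("a1", "a2"), ("b1", "b2"), ("b2", "b3"), ("a1", "b2")}
hb = happened_before(direct)
print(("a1", "b3") in hb)          # a1 precedes b3 by transitivity
print(concurrent(hb, "a2", "b3"))  # no chain connects a2 and b3
```

In this run a1 happened before b3 only through the message, while a2 and b3 are concurrent.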
Lamport Logical Clocks
- are local to each process (processor?)
- do not measure real time
- only measure ``events''
- are consistent with the happened-before relation
- are useful for totally ordering transactions,
by using logical clock values as timestamps
Logical Clock Conditions
Ci is the local clock for process Pi
- if a and b are two successive events in Pi, then
Ci(b) = Ci(a) + d1, where d1 > 0
- if a is the sending of message m by Pi, then
m is assigned timestamp tm = Ci(a)
- if b is the receipt of m by Pj,
then
Cj(b) = max{Cj(b), tm + d2}, where d2 > 0 and the Cj(b) on the
right-hand side is the value already produced by the first rule
The value of d could be 1, or it could be an approximation
to the elapsed real time. For example, we could take d1 to
be the elapsed local time, and d2 to be the estimated
message transmission time. The latter solves the problem of
waiting forever for a virtual time instant to pass.
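The clock conditions above can be sketched as a small Python class. This is a minimal sketch, not a definitive implementation: the class and method names are hypothetical, d1 = d2 = 1 is assumed by default, and a receive is treated as an ordinary event that first advances by d1 before the max is taken.

```python
class LamportClock:
    """Logical clock for one process (hypothetical helper class)."""

    def __init__(self, d1=1, d2=1):
        self.time = 0
        self.d1 = d1  # increment between successive local events
        self.d2 = d2  # minimum gap between a send and its receipt

    def tick(self):
        """Local event: C(b) = C(a) + d1."""
        self.time += self.d1
        return self.time

    def send(self):
        """Send event: advance the clock and return the timestamp tm."""
        return self.tick()

    def receive(self, tm):
        """Receive event: C(b) = max(C(b), tm + d2)."""
        self.time = max(self.time + self.d1, tm + self.d2)
        return self.time

# Hypothetical run with two processes.
p1, p2 = LamportClock(), LamportClock()
a = p1.tick()        # local event on P1
tm = p1.send()       # P1 sends m with timestamp tm
b = p2.receive(tm)   # P2 receives m
c = p2.tick()        # later local event on P2
print(a, tm, b, c)
```

The timestamps along the causal chain are strictly increasing, which is what consistency with the happened-before relation requires.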
Total Ordering
We can extend the partial ordering of the happened-before
relation to a total ordering on events, by using the logical
clocks and resolving any ties by an arbitrary rule based on the
processor/process ID.
If a is an event in Pi and b is an event in Pj, then a ⇒ b iff
- Ci(a) < Cj(b), or
- Ci(a) = Cj(b) and Pi < Pj
where < on processes is an arbitrary total ordering
of the processes
How useful is this? How close to real time?
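As an illustration, the total ordering can be obtained by sorting events on the pair (clock value, process ID). This sketch assumes events are represented as hypothetical tuples (clock, pid, label) and that integer process IDs supply the arbitrary ordering of processes.

```python
def total_order(events):
    """Sort events by logical clock value, breaking ties by process ID."""
    return sorted(events, key=lambda e: (e[0], e[1]))

# Hypothetical events: (clock value, process ID, label).
events = [(2, 2, "b"), (2, 1, "a"), (1, 3, "c")]
ordered = total_order(events)
print([label for _, _, label in ordered])
```

Events a and b carry the same clock value, so only the process ID decides which comes first; that tie-break says nothing about which event occurred earlier in real time.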
Example of Lamport Logical Clocks
C(a) < C(b) does not imply a → b
That is, the ordering we get from Lamport's clocks
is not enough to guarantee that if two events precede one
another in the ordering relation they are also causally related.
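A small counterexample, sketched in Python under the assumption that every local event adds d = 1 and that the two processes never exchange messages (the variable names are hypothetical):

```python
# Two processes with independent Lamport clocks and no messages between them.
p1_clock = 0
p2_clock = 0

p1_clock += 1   # event a on P1
c_a = p1_clock

p2_clock += 1   # an earlier event on P2
p2_clock += 1   # event b on P2
c_b = p2_clock

# The clock values are ordered even though a and b are concurrent.
print(c_a, c_b, c_a < c_b)
```

Since no message links P1 and P2, neither event happened before the other, yet the clock values alone suggest an order.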
The following Vector Clock scheme is intended to improve on this.
Vector Clocks
- Clock values are vectors
- Vector length is n, the number of processes
- Ci[i](a) = local time of Pi at event a
- Ci[j](a) = time Cj[j](b) of last event b at Pj
that is known to happen before local event a
Vector Clock Algorithm
- if a and b are successive events in Pi, then
Ci[i](b) = Ci[i](a) + d1
- if a is the sending of m by Pi with vector timestamp tm, and
b is the receipt of m by Pj, then for every k
Cj[k](b) = max{Cj[k](b), tm[k]}
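The two rules above can be sketched as follows. This is a minimal sketch under stated assumptions: the class name and methods are hypothetical, processes are numbered 0..n-1, d1 = 1, and a receive counts as a local event that increments the receiver's own entry before the componentwise max.

```python
class VectorClock:
    """Vector clock for process pid among n processes (hypothetical helper)."""

    def __init__(self, pid, n, d=1):
        self.pid = pid
        self.d = d
        self.v = [0] * n

    def tick(self):
        """Local event: advance only this process's own entry."""
        self.v[self.pid] += self.d
        return tuple(self.v)

    def send(self):
        """Send event: tick and return the vector timestamp tm."""
        return self.tick()

    def receive(self, tm):
        """Receive event: tick, then take the componentwise max with tm."""
        self.v[self.pid] += self.d
        self.v = [max(mine, theirs) for mine, theirs in zip(self.v, tm)]
        return tuple(self.v)

# Hypothetical run with three processes: P0 -> P1 -> P2.
p0, p1, p2 = (VectorClock(i, 3) for i in range(3))
t1 = p0.send()        # P0 sends m1
r1 = p1.receive(t1)   # P1 receives m1
t2 = p1.send()        # P1 sends m2
r2 = p2.receive(t2)   # P2 receives m2
print(t1, r1, t2, r2)
```

After the second receipt, P2's vector records one event of P0 even though P0 never sent anything to P2 directly.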
Vector Clock Ordering Relation
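For vector timestamps ta and tb:
- ta ≤ tb iff ta[k] ≤ tb[k] for all k
- ta < tb iff ta ≤ tb and ta ≠ tb
The relation ≤ defined above is a partial ordering.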
Vector Clocks
- a → b if ta < tb
- b → a if tb < ta
- otherwise a and b are concurrent
This is not a total ordering, but it is sufficient
to guarantee a causal relationship, i.e.,
a → b iff ta < tb
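The comparison can be sketched as below, assuming the usual componentwise order on vectors (ta is at most tb when every entry of ta is at most the matching entry of tb, and strictly less when the vectors also differ). The function names are hypothetical, and a and b are assumed to be distinct events.

```python
def vc_leq(ta, tb):
    """True if every entry of ta is <= the matching entry of tb."""
    return all(x <= y for x, y in zip(ta, tb))

def vc_less(ta, tb):
    """True if ta <= tb componentwise and the vectors differ."""
    return vc_leq(ta, tb) and tuple(ta) != tuple(tb)

def relation(ta, tb):
    """Classify two distinct events by their vector timestamps."""
    if vc_less(ta, tb):
        return "a happened before b"
    if vc_less(tb, ta):
        return "b happened before a"
    return "concurrent"

print(relation((1, 0, 0), (1, 1, 0)))
print(relation((1, 0, 0), (0, 1, 0)))
```

Each comparison touches all n entries, so the cost of both storing and comparing timestamps grows with the number of processes.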
How scalable is this?
Figure 5.5 in the book.
Non-causal Ordering of Messages
Message delivery is said to be causal if the order in which
messages are received is consistent with the order in which they
are sent. That is, if Send(M1) → Send(M2) then
for every recipient of both messages, M1 is received before
M2.
Enforcing Causal Ordering of Messages
Basic idea: Buffer each message until the message that
immediately precedes it is delivered.
The text describes two protocols for implementing this idea:
- Birman-Schiper-Stephenson: requires that all messages be broadcast
- Schiper-Eggli-Sandoz: does not have this restriction
Note: These methods serialize the actions of the
system. That makes the behavior more predictable, but also
may mean loss of performance, due to idle time.
That, plus scaling problems, means these algorithms
are not likely to be of much use for high-performance
computing.
Birman-Schiper-Stephenson Causal Message Ordering
- Before Pi broadcasts m, it increments VTPi[i]
and timestamps m. Thus VTPi[i]-1 is the number of
messages from Pi preceding m.
- When Pj (j ≠ i) receives message
m with timestamp VTm from Pi, delivery is delayed locally
until both of the following are satisfied:
- VTPj[i] = VTm[i] - 1
- VTPj[k] ≥ VTm[k] for all k ≠ i
Delayed messages are queued at each process, sorted by their
vector timestamps, with concurrent messages ordered by time of
receipt.
- When m is delivered to Pj,
VTPj is updated as usual for vector clocks.
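One way to experiment with the steps above is a small in-memory simulation. This is a sketch under stated assumptions, not the protocol as given in the text: the class and method names are hypothetical, processes are numbered 0..n-1, the network is simulated by calling receive directly, the update on delivery is taken to be a componentwise max with no extra increment (so each entry counts broadcasts), and the buffer is rescanned after every delivery instead of being kept sorted by timestamp.

```python
class BSSProcess:
    """One process in a simulated causal broadcast group (hypothetical helper)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n
        self.pending = []    # buffered messages: (sender, vtm, payload)
        self.delivered = []  # payloads in delivery order

    def broadcast(self, payload):
        """Increment own entry, timestamp the message, and return it."""
        self.vt[self.pid] += 1
        return (self.pid, tuple(self.vt), payload)

    def _deliverable(self, sender, vtm):
        """Check the two delivery conditions for a message from sender."""
        return (self.vt[sender] == vtm[sender] - 1 and
                all(self.vt[k] >= vtm[k]
                    for k in range(len(vtm)) if k != sender))

    def receive(self, msg):
        """Buffer msg, then deliver every buffered message that is ready."""
        self.pending.append(msg)
        progress = True
        while progress:
            progress = False
            for m in list(self.pending):
                sender, vtm, payload = m
                if self._deliverable(sender, vtm):
                    self.pending.remove(m)
                    self.vt = [max(a, b) for a, b in zip(self.vt, vtm)]
                    self.delivered.append(payload)
                    progress = True

# Hypothetical run: m1 causally precedes m2, but P2 receives m2 first.
p0, p1, p2 = (BSSProcess(i, 3) for i in range(3))
m1 = p0.broadcast("m1")
p1.receive(m1)
m2 = p1.broadcast("m2")
p2.receive(m2)            # buffered: P2 has not yet delivered m1
held = list(p2.delivered)
p2.receive(m1)            # m1 is delivered, which releases m2
print(held, p2.delivered)
```

A test along these lines (deliver messages in every possible arrival order and check that the delivery order never violates causality) is one possible starting point for exercising an implementation.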
Schiper-Eggli-Sandoz Protocol
Generalizes the above, so that messages do not need to be
broadcast, but are just sent between pairs of processes, and
the communication channels do not need to be FIFO.
How would you implement and test the above algorithms?