Distributed Systems: Coordination

Distributed Systems: Coordination

Distributed Systems Serie

Clock synchronization

Physical clocks

Universal Coordinated Time (UTC)

  • Based on the number of transitions per second of the cesium 133 atom (pretty accurate).

  • At present, the real time is taken as the average of some 50 cesium clocks around the world.

  • Introduces a leap second from time to time to compensate that days are getting longer.

  • Is broadcast through short-wave radio and satellite. Satellites can give an accuracy of about +/-0.5 ms.

Clock synchronization algorithms

  • The goal is to keep the deviation between two clocks on any two machines within a specified bound, known as the precision (internal synchronization).

In the case of accuracy, we aim to keep the clock bound to a value compared to the UTC (External synchronization).

The Berkeley algorithm

An approach for keeping time without UTC is to let the time server scan all machines periodically, calculate an average, and inform each machine how it should adjust its time relative to its present time. This method is suitable for a system in which no machine has a UTC receiver.

Logical clocks

In a seminal paper, Lamport showed that although clock synchronization is possible, it need not be absolute. If two processes do not interact, it is not necessary that their clocks be synchronized because the lack of synchronization wouldn't be observable and thus could not cause problems.

Lamport's logical clocks

What usually matters is not that all processes agree on exactly what time it is, but that they agree on the order in which events occur. Requires a notion of ordering.

Rules of Lamport's logical clocks:

  • If a and b are two events in the same process, and a comes before b, then ab.

  • If a is the sender of the message, and b is the receipt of that message,

    then ab.

  • If ab and bc, then ac.

Vector clocks

Vector clocks are a powerful tool for tracking causality ("happens-before" relationship between operations, which means that a might have some role in causing b) and ordering events in distributed systems.

How does it work

  • Initially, all the clocks are set to zero ("0,0,0" for 3 processes example).

  • Every time, an Internal event occurs in a process, the value of the process’s logical clock in the vector is incremented by 1.

  • Every time a process sends a message, the value of the process’s logical clock in the vector is incremented by 1.

  • Every time, a process receives a message, the value of the process’s logical clock in the vector is incremented by 1, and each element is updated by taking the maximum of the value in its own vector clock and the value in the vector in the received message for every element in the vector (?, ?, ?).

Mutual exclusion

Exclusive access to some resources can be given to processes using basic solutions:

  • Permission-based: A process wanting to enter its critical section, or access a resource, needs permission from other processes.

  • Token-based: A token is passed between processes. The one who has the token may proceed in its critical section, or pass it on when not interested.

A centralized algorithm

  1. Process P1 asks the coordinator for permission to access a shared resource. Permission is granted.

  2. Process P2 then asks permission to access the same resource. The coordinator does not reply.

  3. When P1 releases the resource, it tells the coordinator, which then replies to P2.

A distributed algorithm

  1. Two processes want to access a shared resource at the same moment.

  2. P0 has the lowest timestamp, so it wins.

  3. When process P0 is done, it sends an OK also, so P2 can now go ahead.

A token-ring algorithm

Organize processes in a logical ring, and let a token be passed between them. The one that holds the token is allowed to enter the critical region (if it wants to).

A decentralized algorithm

Assume every resource is replicated N times, with each replica having its own coordinator ⇒ access requires a majority vote from m > N/2 coordinators.

A coordinator always responds immediately to a request.

Election algorithms

The bully algorithm

Consider N processes {P0,...,PN−1} and let id(Pk ) = k. When a process Pk notices that the coordinator is no longer responding to requests, it initiates an election:

  1. Pk sends an ELECTION message to all processes with higher identifiers:

    Pk+1, Pk+2, ....., Pn−1.

  2. If no one responds, Pk wins the election and becomes coordinator.

  3. If one of the higher-ups answers, it takes over and Pk ’s job is done.

A ring algorithm

Process priority is obtained by organizing processes into a (logical) ring. Process with the highest priority should be elected as coordinator.

  • Any process can start an election by sending an election message to its successor. If a successor is down, the message is passed on to the next successor.

  • If a message is passed on, the sender adds itself to the list. When it gets back to the initiator, everyone had a chance to make its presence known.

  • The initiator sends a coordinator message around the ring containing a list of all living processes. The one with the highest priority is elected as coordinator.

Thank you, and goodbye!