Distributed Systems: Communication

Distributed Systems: Communication

Distributed Systems Serie

Foundations

Layered protocols

Due to the absence of shared memory, all communication in distributed systems is based on sending and receiving messages.

When process P wants to communicate with process Q, it first builds a message in its own address space. Then it executes a system call that causes the operating system to send the message over the network to Q.

Although this basic idea sounds simple enough, in order to prevent chaos, P and Q have to agree on the meaning of the bits being sent.

Basic networking model

  • Physical layer: Deals with standardizing how two computers are connected and how 0s and 1s are represented.

  • Data link layer: Provides the means to detect and possibly correct transmission errors, as well as protocols to keep a sender and receiver at the same pace.

  • Network layer: Contains the protocols for routing a message through a computer network, as well as protocols for handling congestion.

  • Transport layer: Mainly contains protocols for directly supporting applications, such as those that establish reliable communication, or support real-time streaming of data.

  • Session layer: Provides support for sessions between applications.

  • Presentation layer: Prescribes how data is represented in a way that is independent of the hosts on which communicating applications are running.

  • Application layer: Essentially, everything else: e-mail protocols, Web access protocols, file-transfer protocols, and so on.

Types of communication

  • Tersistent communication: means that a message that has been submitted for transmission is stored by the communication middleware as long as it takes to deliver it to the receiver. In this case, the middleware will store the message at one or several of the storage facilities.

  • Transient communication: means that a message is stored by the communication system only as long as the sending and the receiving application are executing. So if the middleware cannot deliver a message due to a transmission interrupt, or because the recipient is currently not active, it will simply be discarded.

Remote procedure call

Remote procedure call, or just RPC means to allow programs to call procedures located on other machines. When a process on machine A calls a procedure on machine B, the calling process on A is suspended, and execution of the called procedure takes place on B. Information can be transported from the caller to the callee in the parameters and can come back in the procedure result. No message-passing at all is visible to the programmer.

Basic RPC operation

Remote procedure call steps:

  1. The client procedure calls the client stub in the normal way.

  2. The client stub builds a message and calls the local operating system.

  1. The client’s OS sends the message to the remote OS.

  2. The remote OS gives the message to the server stub.

  3. The server stub unpacks the parameter(s) and calls the server.

  4. The server does the work and returns the result to the stub.

  5. The server stub packs the result in a message and calls its local OS.

  6. The server’s OS sends the message to the client’s OS.

  7. The client’s OS gives the message to the client stub.

  8. The stub unpacks the result and returns it to the client.

Parameter passing

  • Client and server need to properly interpret messages, transforming them into machine-dependent representations.

  • The function of the client stub is to take its parameters, pack them into a message, and send them to the server stub. While this sounds straightforward, it is not quite as simple as it at first appears. Packing parameters into a message is called parameter marshalling.

  • Wrapping a parameter means transforming a value into a sequence of bytes.

Variations on RPC

As in conventional procedure calls, when a client calls a remote procedure, the client will block until a reply is returned. This strict request-reply behaviour is unnecessary when there is no result to return, or may hinder efficiency when multiple RPCs need to be performed.

  1. Asynchronous RPC: to support situations in which there is simply no result to return to the client, RPC systems may provide facilities for what are called asynchronous RPCs.

    The server, in principle, immediately sends a reply back to the client the moment the RPC request is received, after which it locally calls the requested procedure. The reply acts as an Acknowledgment to the client that the server is going to process the RPC. The client will continue without further blocking as soon as it has received the server’s acknowledgment.

  2. Multicast RPC: it means executing multiple RPCs at the same time. Adopting the one-way RPCs (i.e., when a server does not tell the client it has accepted its call request but immediately starts processing it), a multicast RPC boils down to sending an RPC request to a group of servers.

Message-oriented communication

Simple transient messaging with sockets

A socket: is a communication endpoint to which an application can write data that are to be sent out over the underlying network, and from which incoming data can be read. A socket forms an abstraction over the actual port that is used by the local operating system for a specific transport protocol.

The socket operations for TCP/IP

socket

Create a new communication endpoint.

bind

Attach a local address to a socket.

listen

Tell the operating system what the maximum number of pending connection requests should be.

accept

Block the caller until a connection request arrives.

connect

Actively attempt to establish a connection.

send

Send some data over the connection.

receive

Receive some data over the connection.

close

Release the connection.

Advanced transient messaging

  • Messaging needs to be implemented separately by an application programmer. In practice, we often need more advanced approaches for message-oriented communication to make network programming easier, to expand beyond the functionality offered by existing networking protocols, to make better use of local resources, and so on.

  • One approach toward making network programming easier is based on the observation that many messaging applications, or their components, can be effectively organized according to a few simple communication patterns. By subsequently providing enhancements to sockets for each of these patterns, it may become easier to develop a networked, distributed application.

Message-oriented persist communication

  • Message-queuing systems provide extensive support for persistent asynchronous communication. The essence of these systems is that they offer intermediate-term storage capacity for messages, without requiring either the sender or receiver to be active during message transmission.

  • Queues are managed by queue managers. An application can put messages only into a local queue. Getting a message is possible by extracting it from a local queue only ⇒ queue managers need to route messages.

  • Message queuing systems assume a common messaging protocol: all applications agree on message format, and a broker handles application heterogeneity in an MQ system.

    The general organization of a message broker in a message queuing system:

Multicast communication

The essence is to organize nodes of a distributed system into an overlay network and use that network to disseminate data:

  1. Oftentimes a tree, leading to unique paths.

  2. Alternatively, also mesh networks, requiring a form of routing.

Application-level tree-based multicasting

The basic idea in application-level multicasting is that nodes organize into an overlay network, which is then used to disseminate information to its members. An important observation is that network routers are not involved in group membership. As a consequence, the connections between nodes in the overlay network may cross several physical links, and as such, routing messages within the overlay may not be optimal in comparison to what could have been achieved by network-level routing.

Thank you, and goodbye!