Grid'5000 experiment

Jump to: navigation, search

Self Stabilizing Spanning Tree for P2P Systems (Middleware)

Conducted by

Thomas Herault, Olivier Peres

Description

Peer to peer networks and grids are emerging large scale systems that gather thousands of nodes. These networks usually relies on IP to communicate: in such systems, two different nodes are identified by an address and can communicate only by addressing the node. Classical distributed applications need a notion of neighborhood. Large scale systems shall provide such a notion as a basic system feature to address two issues: 1) provide a list of contacts in the network for each node (their neighbours) 2) bound the amount of processes each node has to communicate with. 1) also implies that the virtual topology provided by the system should be fully connected, in order to ensure that although it is communicating only with its neighbours, any node may have any information in the system. It is not practical to multiplex many communication streams today on a single computer, thus it may be desirable to bound the number of neighbours (2). It is not advisable to maintain a complete topology on large scale network as nodes should maintain connectivity and knowledge of thousands connection and address. As it is more likely to maintain only a limited amount of address knowledge and opened connection per nodes, it is necessary to build a bounded degree topology. As a first attempt to address this issue we choose to build a spanning tree which provides nice properties for peer to peer systems and efficient diffusion. Peer to peer systems rely mostly on Internet Protocols. In such system, a node can gently stop to participate to the service, can stop running unexpectedly, or can get back to the system at any time. The network can loose some messages due to congestion (for non-reliable Internet communication protocols, like UDP), or can fall down in a whole geographic region. Maintaining a topology in these systems implies being fault tolerant. On one hand, many different fault tolerant protocols have been studied, including replication and rollback recovery protocols, but those protocols focuses mostly on fail-stop faults. On the other hand, Self-Stabilization addresses a larger kind of behaviors, as long as the system has enough time to converge between the failures. A self-stabilizing system is a system that eventually behaves according its specification whatever the configuration it is initialized in. A set of failures (network failures, topology changes due to process depart or arrival, or even memory corruption) leads the distributed system in a given configuration. Due to its convergence property, the self-stabilizing system will reach from this configuration another configuration that we call legitimate. Continuing from this legitimate configuration, the system behaves accordingly its specification. The idea of self-stabilizing systems was proposed by Dijkstra in 1974. The way self-stabilization abstracts any kind of faults by considering any initialization possible provides a tool to handle the high variety of faults of large scale system. The issue addressed in this work is the following: consider a distributed system over the internet. Every process p has a unique identifier, and may potentially communicate with any other process q, if it has the identifier of q. Every process has moreover access to two services: a resource discovery service that provides identifiers of processes (that were at one point part of the system), and a failure detection service that provides accurate information on the status of processes from their identifiers. We proposes a self-stabilizing algorithm to build a virtual tree which includes all the processes in the system storing only a (small) bounded identifiers. The large scale of the system prevents using solutions that may store the set of all process identifiers. In this context, we use Grid5000 to evaluate the performances of the proposed solutions. This is a challenge, since measuring performances of Self-Stabilizing solutions is not a straightforward process. Indeed, it is not possible for the system to determine if it has converged or not. We use a solution based on trace of executions and post-mortem analysis to compute the convergence time and other meaningful measures. This work provides one of the rare experimental evaluation of self-stabilizing systems, which are useful for large scale distributed systems.

Status

achieved

Resources

  • Nodes involved: 500
  • Sites involved: >3
  • Minimum walltime: 8h
  • Batch mode: no
  • Use kadeploy: yes
  • CPU bound: yes
  • Memory bound: no
  • Storage bound: no
  • Network bound: yes
  • Interlink bound: yes

Tools used

No information

Results

In this work, we propose a new model in which to design distributed algo- rithms for large scale systems. We replace the classical notion of a neighbor list with the combination of two devices, a resource discovery service and a failure detector, in order to avoid making unnecessary assumptions and to improve scalability. We illustrate our model with a self-stabilizing algorithm that builds a spanning tree whose degree is bounded by δ using only δ +1 process identifiers. We present a formal proof of convergence and performance measurements of a prototype implementation of this algorithm and its services for clusters. The experimental results show that the algorithm performs well enough to argue in favor of the application of self-stabilization in practice. Our intended followup on this work is to design other protocols, in the same model, so as to explore its viability and efficiency for different problems. We will also study other topologies suitable for large scale systems. It would also be interesting to try to define the notion of stabilisation time in this model. This would require stronger assumptions on the resource discovery service, but which ones exactly is an open question.

Shared by: Thomas Herault, Olivier Peres
Last update: 2007-02-22 12:23:43
Experiment #187

Personal tools
Namespaces

Variants
Views
Actions
Public Portal
Users Portal
Admin portal
Wiki special pages
Toolbox