Grid'5000 experiment


BitDew (Middleware)

Conducted by

Haiwu He

Description

Desktop Grids use the computing, network and storage resources of idle desktop PCs distributed over multiple LANs or the Internet. Data management in such large-scale, dynamic, heterogeneous, volatile and highly distributed Grids is still a challenging issue that must be addressed to broaden the usage of Desktop Grids. We propose the BitDew framework, a programmable environment for data management and distribution on computational Desktop Grids. This paper presents the BitDew programming interface, the architecture of its runtime components and their performance evaluation. We describe the API, which controls data management operations (life cycle, distribution, placement, replication and fault tolerance) with a high level of abstraction and transparency. Our runtime environment is a flexible distributed service architecture that integrates modular P2P components, such as the DKS DHT for a distributed data catalog and BitTorrent for data distribution. We conduct a performance evaluation of these components and report on the scalability and efficiency of the environment.

Enabling Data Grids is one of the fundamental efforts of the computational science community, as emphasized by projects such as EGEE and PPDG. This effort is driven by the new requirements of e-Science: large communities of researchers collaborate to extract knowledge and information from huge amounts of scientific data. This has led to the emergence of a new class of applications, called data-intensive applications, which require secure and coordinated access to large datasets, wide-area transfers and broad distribution of terabytes of data while keeping track of multiple data replicas. The Data Grid aims at providing the infrastructure and services that enable data-intensive applications. Our project, BitDew, targets a specific class of Grid called Desktop Grids.
Today, Desktop Grids are among the largest distributed computing systems and already provide scientists with tens of TeraFlop/s from hundreds of thousands of hosts. Despite the attractiveness of this platform, little work has been done to support data-intensive applications in this context of massively distributed, volatile, shared and heterogeneous resources. Most Desktop Grid systems, such as BOINC, XtremWeb and OurGrid, rely on a centralized architecture for indexing and distributing data, and thus potentially face scalability and fault-tolerance issues. However, we think that the basic building blocks for BitDew can be found in P2P systems. Research on DHTs (Distributed Hash Tables), collaborative data distribution, storage over volatile resources and wide-area network storage offers various tools to construct data grids. To use these tools effectively, one needs to bring these components together into a comprehensive framework. BitDew serves this purpose by providing an environment for data management and distribution in computational Desktop Grids. BitDew is a subsystem that can easily be integrated into other Desktop Grid systems. It offers programmers a simple API for creating, accessing, storing and moving data, even in highly dynamic and volatile environments.
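A DHT-based data catalog can be pictured with a generic consistent-hashing ring. Each data identifier is hashed onto the ring, and its catalog entry is held by the first node clockwise from that position. The sketch below is a minimal illustration under that assumption. It is not DKS itself, which has its own routing scheme, and the class `CatalogRing` and its methods are hypothetical names that are not part of BitDew.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical catalog ring: generic consistent hashing, NOT the DKS algorithm.
 * Maps a data identifier to the node responsible for its catalog entry.
 */
class CatalogRing {
    // Ring position -> node name, kept sorted so "next clockwise" is a map lookup.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Hash a string to a 64-bit ring position (first 8 bytes of its SHA-1 digest).
    static long position(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] h = md.digest(key.getBytes(StandardCharsets.UTF_8));
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (h[i] & 0xFF);
            }
            return v;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    void addNode(String node) { ring.put(position(node), node); }

    void removeNode(String node) { ring.remove(position(node)); }

    // The responsible node is the first one at or after the key's position,
    // wrapping around to the lowest position when the key falls past the last node.
    String lookup(String dataId) {
        if (ring.isEmpty()) throw new IllegalStateException("no catalog nodes");
        SortedMap<Long, String> tail = ring.tailMap(position(dataId));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

When a node joins or leaves, only the keys between it and its predecessor on the ring change owner. This is one reason a distributed catalog can cope with volatile hosts better than a single central index.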
The BitDew programming model relies on five abstractions to manage data:

  • i) replication indicates how many copies of a data item should be available on the network at the same time;
  • ii) fault tolerance controls the policy applied when a machine crashes;
  • iii) lifetime, an attribute that is either absolute or relative to the existence of other data, determines the life cycle of a data item in the system;
  • iv) placement drives the movement of data according to dependency rules;
  • v) distribution gives the runtime environment hints about the protocol to use for distributing the data.

Programmers define these simple criteria for every data item and let the BitDew runtime environment handle data creation, deletion, movement, replication and fault tolerance. The BitDew runtime environment is a flexible environment that implements the API. It relies on both centralized and distributed protocols for indexing, storage and transfers, which provides reliability, scalability and high performance. In this paper, we present the architecture of the prototype and describe in depth the various mechanisms it uses. We also provide a detailed quantitative evaluation of the runtime environment on the Grid'5000 experimental Grid platform. Through a set of micro-benchmarks, we measure the costs and benefits of the underlying infrastructure, component by component.
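The following toy, single-process Java sketch makes the five abstractions concrete. Java is used because the experiment's tools list Java RMI. The record `DataAttr`, the class `ToyScheduler` and all field names are hypothetical and do not reproduce BitDew's actual API. The fault-tolerance semantics are also an illustrative assumption: copies of fault-tolerant data held by crashed hosts are re-placed elsewhere, while other copies stay assigned to their host. The distribution hint is stored but not acted upon.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-data attributes, one field per abstraction (not BitDew's real API). */
record DataAttr(
        int replicas,          // i) replication: copies wanted at the same time (-1 = on every host)
        boolean faultTolerant, // ii) fault tolerance: re-place copies held by crashed hosts
        long expiresAtMs,      // iii) absolute lifetime: expiry timestamp (0 = never)
        String livesWhile,     // iii) relative lifetime: delete once this data is gone (null = none)
        String affinityTo,     // iv) placement: only place next to this data (null = anywhere)
        String protocol) {}    // v) distribution hint, e.g. "ftp" or "bittorrent" (unused here)

/** Hypothetical toy scheduler: each schedule() call recomputes which hosts hold which data. */
class ToyScheduler {
    final Map<String, DataAttr> data = new HashMap<>();      // data id -> attributes
    final Map<String, Set<String>> placed = new HashMap<>(); // data id -> hosts holding a copy

    void put(String id, DataAttr a) {
        data.put(id, a);
        placed.putIfAbsent(id, new HashSet<>());
    }

    void delete(String id) {
        data.remove(id);
        placed.remove(id);
    }

    private boolean alive(String id, long now) {
        DataAttr a = data.get(id);
        if (a.expiresAtMs() > 0 && now >= a.expiresAtMs()) return false;
        return a.livesWhile() == null || data.containsKey(a.livesWhile());
    }

    void schedule(List<String> hosts, Set<String> crashed, long now) {
        // iii) Lifetime: repeat until stable so that relative lifetimes cascade.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String id : new ArrayList<>(data.keySet())) {
                if (!alive(id, now)) {
                    delete(id);
                    changed = true;
                }
            }
        }
        for (Map.Entry<String, DataAttr> e : data.entrySet()) {
            DataAttr a = e.getValue();
            Set<String> holders = placed.get(e.getKey());
            // ii) Fault tolerance: forget copies on crashed hosts so they are re-placed.
            if (a.faultTolerant()) holders.removeAll(crashed);
            int want = a.replicas() < 0 ? Integer.MAX_VALUE : a.replicas();
            for (String h : hosts) {
                if (holders.size() >= want) break;
                if (crashed.contains(h)) continue;
                // iv) Placement: with an affinity, only hosts already holding the target qualify.
                if (a.affinityTo() != null
                        && !placed.getOrDefault(a.affinityTo(), Set.of()).contains(h)) continue;
                holders.add(h); // v) a real runtime would use a.protocol() to transfer the copy
            }
        }
    }
}
```

Each call to `schedule` is one full pass. A data item with an affinity therefore gets placed on the next pass if its target has not been placed yet. This mirrors the idea that programmers only declare attributes and the runtime repeatedly reconciles the actual placement with them.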

Status

in progress

Resources

  • Nodes involved: 500
  • Sites involved: >3
  • Minimum walltime: 1h
  • Batch mode: yes
  • Use kadeploy: yes

Tools used

Java RMI, Jikes, Ant, XML, HSQLDB

Results

Not yet

Shared by: Haiwu He
Last update: 2008-11-13 21:30:55
Experiment #300
