Storage resources in Grid'5000

This page gives a broad view of the different storage resources available for experiments on Grid'5000. The focus is on Big Data experiments. Advantages and disadvantages are highlighted to help the user decide on the optimal combination of storage resources. This helps in two directions:

  • It helps the user achieve better quality in experiments.
  • It avoids excessive usage of any single type of resource by a user, thereby keeping each resource available to the other users of Grid'5000.

In Grid'5000, there are multiple resources for data storage in experiments. Each has its own characteristics, advantages and disadvantages. They are summarised in the table below. Further details are discussed in the following sub-sections.

Comparison table of storage resources

The following table summarises the comparison of different aspects of storage resources on Grid'5000 - both persistent and non-persistent types:

Storage Resource | Data recoverability? | Protocol used | Persistence period | Provisioning mechanism | Network connectivity | Remarks
/home | No | NFS | long-term | Quota + user account management | Variable (1 Gb/s - 10 Gb/s) | Note-1
OSIRIM | No | NFS | long-term | Quota | 1 Gb/s | Note-1
storage5k | No | NFS | medium-term | OAR | Variable (1 Gb/s - 10 Gb/s) | Note-1
On node local disks reservation | No | - | medium-term | OAR | - | Note-2
Storage Array | No | NFS | long-term | Manual | 10 Gb/s inside LAN, 6 Gb/s between storage array and server | Note-1
Managed Ceph cluster | No | RADOS | medium-term | Temporarily free (until contention for space) | ~n × 10 Gb/s (n = parallelism) | Note-3
  • Note-1: These storage resources use NFS and are accessed by multiple users. Hence, performance depends strongly on the degree of contention at the time of the experiment.
  • Note-2: This storage uses the local hard disks of the reserved node.
  • Note-3: There are 2 factors to consider for performance (a rough worked bound is given after these notes):
    • the aggregated network bandwidth of all the nodes in the cluster:
      • for the managed Ceph cluster at rennes (4 nodes, each with 1 network interface), n = 4,
      • for the managed Ceph cluster at nantes (3 nodes, each with 1 network interface), n = 3,
    • the aggregated disk bandwidth of all Object Storage Devices (OSDs) used in the cluster. The recommendation is to have 1 OSD per physical disk.
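
Putting the two factors together, a rough idealised ceiling on the aggregate throughput can be written as below. This formulation (and the symbol B_disk,i for the bandwidth of the disk backing OSD i) is added here for illustration and does not come from the original notes:

  B_{\max} \;\approx\; \min\left( n \times 10~\text{Gb/s},\ \sum_{i=1}^{N_{\text{OSD}}} B_{\text{disk},i} \right)

For the managed Ceph cluster at rennes, for example, the network term alone gives 4 × 10 Gb/s = 40 Gb/s (about 5 GB/s) as a best-case bound; the disk term can only lower it.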

For further details on the maximum storage capacity and the degree of parallelism in Ceph clusters, see here.

/home

This is the principal storage space when logged in on a Grid'5000 site (site:/home/userid). It is a file system exported over NFS. Each user has a quota of 25 GB of storage on each site, with a reserve of 100 GB. If required, the user can request an increase of the quota using the account management interface.
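
As a quick way to see how close you are to the /home quota, the usage can be estimated by summing file sizes. This is only an approximate sketch (NFS quota accounting counts disk blocks, so the real figure may differ slightly); the path is simply the user's home directory:

  import os

  # Estimate /home usage against the 25 GB quota by walking the home directory.
  home = os.path.expanduser("~")          # e.g. /home/<userid> on a frontend
  used_bytes = 0
  for root, _dirs, files in os.walk(home):
      for name in files:
          try:
              used_bytes += os.path.getsize(os.path.join(root, name))
          except OSError:
              pass                        # skip broken symlinks or vanished files
  print(f"~{used_bytes / 1e9:.1f} GB used out of the 25 GB quota")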

OSIRIM

This storage space is available under the /srv/osirim/<username> directory from frontends and nodes (in the default environment, or in deployed -nfs and -big environments). It is provided by the OSIRIM project (IRIT lab) and exported over NFS (using an autofs mount). Each user has a quota of 200 GB of storage. If required, users can request an increase of the quota by sending an email to support-staff@lists.grid5000.fr.

storage5k

This is another shared storage resource offered on certain sites of Grid'5000 (e.g. rennes, nancy, lyon, sophia, luxembourg). Space on storage5k can be reserved in chunks of 10GB each, over weeks or months.

Advantages: Easy persistent storage across a series of experiments.

Disadvantages: Those of an NFS server with multiple simultaneous users. Also, this tool is not available on every Grid'5000 site (see Storage5k for details).
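
Since space is provisioned in fixed 10 GB chunks, sizing a reservation is just a matter of rounding up. A trivial sketch (the dataset size used here is a made-up example):

  import math

  CHUNK_GB = 10  # storage5k allocates space in chunks of 10 GB

  def chunks_needed(dataset_gb: float) -> int:
      """Number of 10 GB chunks to reserve for a dataset of the given size."""
      return math.ceil(dataset_gb / CHUNK_GB)

  print(chunks_needed(42))   # -> 5 chunks, i.e. 50 GB reserved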

On Node local disks reservation

Disk reservation consists of reserving extra local hard disks of nodes, in order to store large datasets locally between reservations, and thus avoid having to move data to the nodes at the beginning of every node reservation. Disk reservation provides medium-term storage persistence.

Advantages: Storage is available directly on the hard disk of the reserved node.

Disadvantage: You need to reserve a hard disk on a node, and then reserve the same node for carrying out your experiment.
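
Once on a node, the reserved disks show up as additional local block devices. Below is a minimal sketch to list them; device names and sizes vary per cluster, and which disks are actually reserved for you depends on your reservation, so treat it as illustrative only:

  import json
  import subprocess

  # List local block devices via lsblk's JSON output and print the whole disks.
  out = subprocess.run(
      ["lsblk", "-J", "-o", "NAME,SIZE,TYPE,MOUNTPOINT"],
      capture_output=True, text=True, check=True,
  ).stdout
  for dev in json.loads(out)["blockdevices"]:
      if dev.get("type") == "disk":
          print(dev["name"], dev["size"], dev.get("mountpoint") or "(not mounted)")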

Storage Array

This is another shared resource, consisting of a bay of RAID disks located at the rennes site and cumulatively offering ~180 TB of storage space. It provides long-term persistent storage, for needs spanning months (or the duration of a research project). Reservation is manual, not automatic: to reserve this resource, prospective users need to contact the Grid'5000 technical team (email: support-staff@lists.grid5000.fr).

Advantages: Long-term dedicated storage over months. Hence, less time spent in marshalling datasets between experiments (to stay within your usage quotas).

Disadvantages: Those of an NFS server. Network latencies also need to be taken into account: the storage array is currently located at the rennes site, so for experiments running at other sites, be aware that NFS copes poorly with inter-site latencies.

Managed Ceph clusters

These are object-based storage resources (i.e. they do not offer a file system interface). They are located at the rennes site (~9 TB) and at the nantes site (~7 TB), aggregating overall to ~15 TB of storage. For details about managed Ceph resources, see here.

Advantages: Ceph is a distributed object storage system designed to provide excellent performance, reliability and scalability using multiple nodes.
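
Since the data is stored as RADOS objects, programmatic access typically goes through librados. Below is a minimal sketch using the python-rados bindings; the configuration file, keyring and pool name are placeholders to be replaced with the values documented for the managed clusters, not Grid'5000 defaults:

  import rados

  # Connect to the Ceph cluster with a client keyring (paths are placeholders).
  cluster = rados.Rados(conffile="ceph.conf",
                        conf={"keyring": "ceph.client.myuser.keyring"})
  cluster.connect()
  ioctx = cluster.open_ioctx("my_pool")                     # your RADOS pool
  ioctx.write_full("dataset-part-0001", b"some raw bytes")  # store an object
  print(ioctx.read("dataset-part-0001", length=1024))       # read it back
  ioctx.close()
  cluster.shutdown()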

Using virtual block devices with a managed Ceph backend: If experiments can tolerate variable performance, then the managed Ceph clusters offer additional persistent storage resources. They will also be used in this tutorial. Here is an example of using virtual block devices to create a persistent database service on a virtual machine.

Disadvantages: This is object-based storage, hence not accessible through plain Unix filesystem commands; data is accessed through the RADOS API (as in the sketch above) or Ceph's own client tools.

Dedicated storage on reserved nodes

This is storage that resides on the nodes reserved for a single experiment; it is therefore dedicated to that experiment, with no contention from other users. It is not persistent storage: it is available only as long as the node reservation lasts. Some clusters have nodes with multiple hard disks (HDD) and even SSDs. In total, the number of disks per node can go up to 6 (e.g. cluster parasilo in rennes), and their unit capacity varies between 200 GB and 600 GB. This page (https://api.grid5000.fr/3.0/ui/quick-start.html) gives the details of the available resources. Hence, by aggregating storage on a few nodes (6-10), it is easy to reach ~20 TB of dedicated storage for an experiment.

Disadvantages: The primary disadvantage is that the storage is non-persistent. Hence, a clever mechanism has to be used for marshalling (copy-in and copy-out) datasets before and after each experiment.