PDSH

From Grid5000

PDSH (Parallel Distributed SHell) issues commands to groups of hosts in parallel. It was written at LLNL as a rewrite of IBM's dsh.

Installing PDSH

Debian-way

PDSH is part of the official Debian package repository.

Red Hat-way

RPM packages for PDSH are available on its SourceForge project page.

Other ways

Tarballs are available on the PDSH SourceForge project page.

Using PDSH

PDSH: issuing commands

Basics

To perform a shell command:

pdsh [options]... command

To kill a running pdsh instance, send it two SIGINTs (Ctrl-C) within one second:

pdsh@frontale: interrupt (one more within 1 sec to abort)
pdsh@frontale: node-1: command in progress
pdsh@frontale: node-2: command in progress
sending SIGTERM to ssh node-1 pid 8298
sending SIGTERM to ssh node-2 pid 8299
pdsh@frontale: interrupt, aborting.

With no specified command, it runs interactively:

pdsh [options]... 
pdsh> command

RCMD Modules

pdsh knows several methods to run commands on remote hosts. These methods are implemented as dynamically loadable modules, called RCMD modules.

  • Get available RCMD modules list:
pdsh -L
  • Run command via SSH:
pdsh -R ssh [options]... command
  • Run command via RSH:
pdsh -R rsh [options]... command
  • Set up SSH as the default module (this can be done in /etc/environment to apply to all users):
PDSH_RCMD_TYPE=ssh
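
The same default can also be set for a single user instead of system-wide. The following is a minimal sketch, assuming a Bourne-style shell; the startup file mentioned in the comment is an assumption about the local setup, not something pdsh requires.

```shell
# Make ssh the default rcmd module for this shell session and its children.
# To make it persistent, the same line can go in a shell startup file
# (for example ~/.profile, which is an assumption about your environment).
export PDSH_RCMD_TYPE=ssh

# With the variable set, "-R ssh" can be omitted from later calls, e.g.:
#   pdsh -w node-1,node-3 command
echo "default rcmd module: $PDSH_RCMD_TYPE"
```

An explicit -R option on the command line still selects the module for that one call.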

Targeting nodes

Include

With the -w argument, a list of nodes can be specified to pdsh. This argument accepts a comma-separated list:

pdsh -w node-1,node-3 command

The node list may contain hostlist expressions of the form 'node-[1-5,7]'.

Note

A list consisting of a single '-' character causes the target nodes to be read from stdin.
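
Reading targets from stdin is handy when the node list is produced by another command. A minimal sketch, assuming one host name per line on stdin; the host names and the uptime command are placeholders, and the guard only keeps the snippet harmless on machines where pdsh is not installed.

```shell
# Build a node list, one host name per line (placeholder names).
nodes=$(printf '%s\n' node-1 node-3 node-5)

# "-w -" makes pdsh read its targets from stdin instead of the command line.
if command -v pdsh >/dev/null 2>&1; then
    printf '%s\n' "$nodes" | pdsh -w - uptime
fi
```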

Exclude

With the -x argument, a list of nodes can be excluded from the included list. Like -w, the -x argument accepts a comma-separated list:

pdsh -w node-[1-5,7] -x node-2,node-4 command

The excluded node list may also contain hostlist expressions of the form 'node-[2-3,5]'.

Hostlist expressions

As a convenience on clusters with a prefixXXX naming convention, pdsh accepts lists of hosts of the general form prefix[a-b,c-d,e,...]. This is only a shorthand for an explicit list of hosts. Some usage examples follow:

  • Run command on node-1,node-2,...,node-5
pdsh -w node-[1-5] command
  • Run command on node-7,node-9,node-10
pdsh -w node-[7,9-10] command
  • Run command on node-1,node-2,node-6,...,node-9
pdsh -w node-[1-9] -x node-[3-5] command
Warning

Some shells, such as tcsh, interpret brackets as pattern-matching characters, so it may be necessary to enclose ranged lists in quotes.
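
To make the meaning of a hostlist expression concrete, the sketch below expands one in plain shell. It is an illustration only, not how pdsh implements it: the expand_hostlist helper is hypothetical and handles a single trailing [...] group without zero padding.

```shell
# expand_hostlist 'node-[1-3,7]' prints node-1, node-2, node-3, node-7,
# one host name per line. Hypothetical helper, for illustration only.
expand_hostlist() {
    expr_in=$1
    case $expr_in in
        *\[*\])
            prefix=${expr_in%%\[*}      # text before the first "["
            ranges=${expr_in#*\[}       # text after the first "["
            ranges=${ranges%\]}         # drop the closing "]"
            ;;
        *)
            printf '%s\n' "$expr_in"    # plain host name, nothing to expand
            return 0
            ;;
    esac
    old_ifs=$IFS
    IFS=,                               # split the group on commas
    for r in $ranges; do
        lo=${r%-*}                      # start of the range, or the single index
        hi=${r#*-}                      # end of the range (equals lo for "7")
        i=$lo
        while [ "$i" -le "$hi" ]; do
            printf '%s%s\n' "$prefix" "$i"
            i=$((i + 1))
        done
    done
    IFS=$old_ifs
}

# Inclusion, as with: pdsh -w 'node-[7,9-10]' command
expand_hostlist 'node-[7,9-10]'

# Inclusion minus exclusion, as with: pdsh -w 'node-[1-9]' -x 'node-[3-5]' command
expand_hostlist 'node-[1-9]' | grep -vxF "$(expand_hostlist 'node-[3-5]')"
```

Note that the expressions are quoted throughout, for the reason given in the warning above.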

Machines file

pdsh can use a machines file to apply a command to a specific list of nodes:

pdsh -a -F $OAR_NODEFILE command
Warning

PDSH versions prior to 2.12 cannot handle FQDN host names: they are shortened to their first component (the short host name). These PDSH versions therefore cannot launch commands across the grid.

If no machines file is specified, pdsh uses the default /etc/genders:

pdsh -a command

DSHBAK: formatting output

dshbak formats the output of a pdsh command. It prints a header containing the node name before each node's output. On demand (with the -c option), dshbak can merge identical output; the header then contains a hostlist expression of the nodes that produced that output.

  • Format output of the root filesystem location:
$ pdsh -w node-2[1-4] "mount | grep 'on / '" | dshbak
----------------
node-24
----------------
 /dev/sda7 on / type ext3 (rw,errors=remount-ro)
----------------
node-22
----------------
 /dev/sda2 on / type ext3 (rw,errors=remount-ro)
----------------
node-23
----------------
 /dev/sda2 on / type ext3 (rw,errors=remount-ro)
----------------
node-21
----------------
 /dev/sda2 on / type ext3 (rw,errors=remount-ro)
  • Compress output of the root filesystem location:
$ pdsh -w node-2[1-4] "mount | grep 'on / '" | dshbak -c
----------------
node-24
----------------
 /dev/sda7 on / type ext3 (rw,errors=remount-ro)
----------------
node-[21-23]
----------------
 /dev/sda2 on / type ext3 (rw,errors=remount-ro)
Note

dshbak -c is a convenient way to detect heterogeneity across nodes: any node whose output differs from the others appears under its own header.
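
The grouping idea behind dshbak -c can be sketched with awk. This is an illustration under simplifying assumptions, not dshbak's actual implementation: the group_by_output function is hypothetical, it only handles single-line output per node, and it lists host names separated by commas instead of folding them into a hostlist expression.

```shell
# Sample pdsh-style output, "host: text" (values taken from the example above).
sample='node-24: /dev/sda7 on / type ext3 (rw,errors=remount-ro)
node-22: /dev/sda2 on / type ext3 (rw,errors=remount-ro)
node-23: /dev/sda2 on / type ext3 (rw,errors=remount-ro)
node-21: /dev/sda2 on / type ext3 (rw,errors=remount-ro)'

# Group host names by identical output text (hypothetical helper).
group_by_output() {
    awk '{
        host = $1; sub(/:$/, "", host)         # "node-24:" becomes "node-24"
        text = $0; sub(/^[^:]*: /, "", text)   # strip the leading "host: "
        if (text in hosts) {
            hosts[text] = hosts[text] "," host # same output seen before
        } else {
            order[++n] = text                  # remember first-seen order
            hosts[text] = host
        }
    }
    END {
        for (i = 1; i <= n; i++) print hosts[order[i]] "\t" order[i]
    }'
}

printf '%s\n' "$sample" | group_by_output
```

On the sample, the four input lines collapse into two groups: node-24 on its own, and the three nodes that share /dev/sda2.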

PDCP: copying files

Basics

To perform file copy:

pdcp [options]... src [src2...] dest

As with pdsh, to kill a running pdcp instance, send it two SIGINTs (Ctrl-C) within one second:

pdcp@frontale: interrupt (one more within 1 sec to abort)
pdcp@frontale: node-1: command in progress
pdcp@frontale: node-2: command in progress
sending SIGTERM to ssh node-1 pid 24510
sending SIGTERM to ssh node-2 pid 24511
pdcp@frontale: interrupt, aborting.

RCMD Modules

Like pdsh, pdcp can use a specific RCMD module to connect to remote hosts.

  • Copy files via SSH:
pdcp -R ssh [options]... src [src2...] dest
  • Copy files via RSH:
pdcp -R rsh [options]... src [src2...] dest

Targeting nodes

pdcp supports the same inclusion, exclusion and hostlist expression mechanisms as pdsh for targeting nodes.

Limitations

pdcp copies files to multiple remote hosts in parallel, with two limitations:

  • Source files must be on the local host (i.e. where pdcp runs).
  • Each destination node listed must have pdcp installed for the copy to succeed.
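
Because of the second limitation, it can be worth checking the targets before a copy. A minimal sketch, assuming the nodes are reachable with the default RCMD module; the target list, the MISSING marker and the check_pdcp helper are placeholders introduced for this example.

```shell
targets='node-[1-5]'    # placeholder hostlist expression

# Succeeds only if both pdsh and pdcp are available on the local host.
check_pdcp() {
    command -v pdsh >/dev/null 2>&1 && command -v pdcp >/dev/null 2>&1
}

if check_pdcp; then
    # Each node prints the path of its pdcp, or MISSING if it has none;
    # dshbak -c groups the nodes that gave the same answer.
    pdsh -w "$targets" 'command -v pdcp || echo MISSING' | dshbak -c
else
    echo "pdsh or pdcp not found on the local host" >&2
fi
```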
