Debian Diskless Cluster Howto
Initial Setup
This guide will walk you through the diskless cluster install and setup process. The cluster has a head node that serves boot images to the compute nodes and the database node. We’re attempting to present a unified system image across the cluster. For this reason, all nodes are looking at the same root filesystem, served via NFS.
DHCP
I started from a bare-bones netinstall of Debian Squeeze (testing) on the head node. This should work about equally well on any Debian-derived distribution.
First, we need to install some packages we’ll need in a minute.
sudo apt-get install dnsmasq syslinux nfs-kernel-server nfs-common debootstrap tftpd-hpa xinetd
Now, we need to configure dnsmasq, which will serve as our DHCP server for diskless booting.
Replace your existing /etc/dnsmasq.conf with something like this:
dhcp-range=192.168.1.50,192.168.1.150,255.255.255.0,12h
dhcp-boot=pxelinux.0,headnode,192.168.1.1
Replace 192.168.1.x with your preferred IP subnet and “headnode” with the hostname of your head node.
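If you rebuild clusters on different subnets, the substitution can be scripted. This is a minimal sketch, not part of the original setup: make_dnsmasq_conf is a made-up helper name, and it assumes the same .50 to .150 range and a head node at .1 as in the example above.

```shell
# Print a minimal dnsmasq.conf for PXE booting.
# Arguments (parameters of this hypothetical helper, not dnsmasq options):
#   $1 = first three octets of the subnet, e.g. 192.168.1
#   $2 = hostname of the head node
make_dnsmasq_conf() {
    printf 'dhcp-range=%s.50,%s.150,255.255.255.0,12h\n' "$1" "$1"
    printf 'dhcp-boot=pxelinux.0,%s,%s.1\n' "$2" "$1"
}

# Preview the result before installing it as /etc/dnsmasq.conf
make_dnsmasq_conf 192.168.1 headnode
```

Redirect the output to a scratch file first. Running dnsmasq --test --conf-file=<that file> should report whether the syntax is acceptable before you overwrite the real configuration.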
tftp
Our tftp server needs to be configured to launch on command from xinetd. The binary is already installed from our previous apt-get command.
Create a file, /etc/xinetd.d/tftp-hpa, that looks like this:
service tftp
{
disable = no
id = chargen-dgram
socket_type = dgram
protocol = udp
user = root
wait = yes
server = /usr/sbin/in.tftpd
server_args = -s /var/lib/tftpboot/
}
PXE
Now we need to tell the PXE server what to serve our clients.
Let’s set up our pxelinux configuration directory.
sudo cp -r /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot/
sudo mkdir /var/lib/tftpboot/pxelinux.cfg
We’ll need a kernel and an initial ramdisk to give to the diskless clients. Assuming you’re going to be running the same kernel on the head node as on your diskless clients (recommended), you can just copy the kernel from /boot.
sudo cp /boot/vmlinuz-`uname -r` /var/lib/tftpboot/
You’re going to need to create an NFS-root-enabled ramdisk. This is accomplished with the tool mkinitramfs. You should have a configuration directory, /etc/initramfs-tools/. Make a copy of it:
sudo cp -r /etc/initramfs-tools /etc/initramfs-pxe
Note: On Debian Squeeze, the installed /etc/initramfs-tools did not work for unknown reasons; it seems to be missing module configurations. I ended up copying /etc/initramfs-tools from an Ubuntu 8.04 install, which worked fine.
Edit /etc/initramfs-pxe/initramfs.conf. Change BOOT=local to BOOT=nfs.
Now we can create the ramdisk.
sudo mkinitramfs -d /etc/initramfs-pxe -o /var/lib/tftpboot/initrd.img-`uname -r` `uname -r`
We should be ready to create a default boot configuration now. We’ll need to create /var/lib/tftpboot/pxelinux.cfg/default
LABEL linux
KERNEL vmlinuz-2.6.29
APPEND root=/dev/nfs initrd=initrd.img-2.6.29 nfsroot=192.168.1.1:/home/nfsroot ip=dhcp rw
Change 2.6.29 to match your kernel, obviously.
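To avoid typing the kernel version by hand, the entry can be generated from uname -r. This is a rough sketch under the assumptions of this guide (head node at 192.168.1.1, root exported from /home/nfsroot). make_pxe_default is a hypothetical helper name, not a syslinux tool.

```shell
# Print a pxelinux.cfg/default entry for a given kernel release.
#   $1 = kernel release (for example the output of `uname -r`)
#   $2 = IP address of the NFS server
#   $3 = path of the exported NFS root
make_pxe_default() {
    printf 'LABEL linux\n'
    printf 'KERNEL vmlinuz-%s\n' "$1"
    printf 'APPEND root=/dev/nfs initrd=initrd.img-%s nfsroot=%s:%s ip=dhcp rw\n' "$1" "$2" "$3"
}

# Entry matching the kernel currently running on the head node
make_pxe_default "$(uname -r)" 192.168.1.1 /home/nfsroot
```

Pipe the output through sudo tee /var/lib/tftpboot/pxelinux.cfg/default once it looks right.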
If you want to pass different parameters to different machines, you can create individual configuration files in /var/lib/tftpboot/pxelinux.cfg/ based on their MAC addresses. For example, if I create a file, /var/lib/tftpboot/pxelinux.cfg/01-00-21-97-7a-24-0f, then my node with a MAC of 00:21:97:7a:24:0f will load that instead of the default. I like to create softlinks in the configuration directory corresponding to the hostnames of my nodes because if you can remember MAC addresses of individual machines then you’re a better man than I am.
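The per-node file name is "01-" (the hardware type for Ethernet) followed by the MAC address in lowercase, with dashes instead of colons. A small helper can do the conversion. pxe_cfg_name is a name I made up for this sketch.

```shell
# Convert a MAC address into the per-node pxelinux config file name:
# lowercase the hex digits, swap colons for dashes, and prefix "01-".
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr ':' '-')"
}

pxe_cfg_name 00:21:97:7A:24:0F
```

One way to arrange the softlinks, assuming the real configuration file is named after the host: from inside /var/lib/tftpboot/pxelinux.cfg/, run ln -s tumbler "$(pxe_cfg_name 00:21:97:7a:24:0f)". pxelinux then finds the MAC-named link, and you only ever edit the file called tumbler.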
NFS
NFS time! Create a directory to hold the NFS root you’ll be serving to clients.
sudo mkdir /home/nfsroot
Edit /etc/exports. It should look something like this:
/home/nfsroot 192.168.1.0/255.255.255.0(rw,no_subtree_check,async,no_root_squash)
Now we just have to bootstrap a basic Debian install into /home/nfsroot. Luckily for us, there’s a nifty little tool called debootstrap that does just that. For a 64-bit Debian Squeeze environment, I do this:
sudo debootstrap --arch amd64 squeeze /home/nfsroot/
A few minutes later, it’s installed. Now you need to make some modifications to that system you just installed.
Edit /home/nfsroot/etc/fstab to look something like this:
#proc /proc proc defaults 0 0
/dev/nfs / nfs defaults 0 0
none /tmp tmpfs defaults 0 0
none /var/run tmpfs defaults 0 0
none /var/lock tmpfs defaults 0 0
none /var/tmp tmpfs defaults 0 0
none /media tmpfs defaults 0 0
none /var/log tmpfs defaults 0 0
/home/nfsroot/etc/network/interfaces should be:
auto lo
iface lo inet loopback
iface eth0 inet dhcp
Note that auto eth0 isn’t there anymore. That’s because your primary ethernet interface is already up. If you try to initialize it again, it might drop your existing connection and it’ll dump you out of the boot process.
Testing
At this point you’re ready to test. Restart xinetd, dnsmasq and nfs-kernel-server so your new settings take effect. Then check your node’s BIOS to verify that network boot is enabled and give it a shot.
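Before power-cycling a node, it can save time to confirm that the files from the PXE section are where the boot loader will look for them. This is only a sketch: check_tftp_root is a hypothetical helper, and the file list simply mirrors the names used earlier in this guide.

```shell
# Report any file the PXE setup expects but which is missing.
#   $1 = TFTP root directory (defaults to /var/lib/tftpboot)
#   $2 = kernel release (defaults to the running kernel)
# Returns 0 when everything is present, 1 otherwise.
check_tftp_root() {
    root="${1:-/var/lib/tftpboot}"
    kver="${2:-$(uname -r)}"
    missing=0
    for f in pxelinux.0 pxelinux.cfg/default "vmlinuz-$kver" "initrd.img-$kver"; do
        if [ ! -e "$root/$f" ]; then
            echo "missing: $root/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use on the head node:
#   check_tftp_root /var/lib/tftpboot "$(uname -r)" && echo "tftp root looks complete"
```

It only checks that the files exist. It does not verify that the ramdisk was built with BOOT=nfs or that the NFS export is reachable.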
Congratulations! You now have a diskless cluster. Next we’ll make some special modifications to the configuration of the nodes to make them play nicely together and make maintenance easier.
Networking
Each node will receive an IP address from dnsmasq on the head node. You can simply note which IP each node gets, since dnsmasq gives each node a unique IP by default and the assignment persists as long as the node’s MAC address stays the same. Alternatively, you can force each node to a specified IP with a configuration similar to this in /etc/dnsmasq.conf on the head node:
dhcp-host=00:21:97:7d:ad:bf,192.168.1.10
dhcp-host=00:21:97:7a:24:0f,192.168.1.11
dhcp-host=00:21:97:7d:b3:26,192.168.1.12
Either way, you’ll need /etc/hosts on your head node to reflect the IP addresses of your nodes. Mine looks like this:
127.0.0.1 localhost
10.13.99.1 scoop head
192.168.1.10 dizzy db
192.168.1.11 tumbler
192.168.1.12 scrambler
You’ll want to copy that hosts file over to /home/nfsroot/etc/hosts as well.
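Since the dnsmasq reservations and the hosts file have to agree, it may be easier to generate both from one node list. The sketch below is my own addition: the "MAC IP NAME [ALIAS...]" table format and the node_table helper are invented for illustration, and the dhcp-host lines it prints match on the hardware (MAC) address directly, which is the plain dnsmasq form.

```shell
# Read "MAC IP NAME [ALIAS...]" lines on stdin and print either
# dnsmasq dhcp-host lines or /etc/hosts lines.
#   $1 = output mode: "dnsmasq" or "hosts"
node_table() {
    mode="$1"
    while read -r mac ip names; do
        # Skip blank lines and comment lines
        case "$mac" in
            ''|'#'*) continue ;;
        esac
        case "$mode" in
            dnsmasq) printf 'dhcp-host=%s,%s\n' "$mac" "$ip" ;;
            hosts)   printf '%s %s\n' "$ip" "$names" ;;
        esac
    done
}

# Example with two of the nodes from this guide
printf '%s\n' \
    '00:21:97:7d:ad:bf 192.168.1.10 dizzy db' \
    '00:21:97:7a:24:0f 192.168.1.11 tumbler' | node_table hosts
```

Keep the table in a file of your choosing and run it through both modes whenever a node is added. The localhost and head node entries still need to be added to the hosts file by hand.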
Init Tricks
Sometimes you want the nodes to behave just a little bit differently from each other. I wanted my nodes to have different hostnames, fancy that. So, I wrote this bash script to figure out what their hostname should be:
#!/bin/bash
#finds node's hostname based on matching ip in /etc/hosts
grep `ifconfig | grep 'inet addr:'| grep -v '127.0.0.1' | /usr/bin/cut -d: -f2 \
| /usr/bin/awk '{ print $1}'` /etc/hosts | /usr/bin/awk '{print $2}'
Save the script in /home/nfsroot/bin/whereami and make it executable (chmod +x). You’ll need awk in order for it to work. Boot up a node and just apt-get it from the node itself before running the script. Package installation is best accomplished from a booted diskless node. Just try not to install packages from multiple nodes simultaneously, or you might corrupt your apt database.
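One caveat with the grep approach: grep treats the IP as a pattern, so 192.168.1.1 would also match the line for 192.168.1.10. If your addresses overlap like that, comparing the first field exactly is safer. The following is a sketch of the lookup half only. host_for_ip is a hypothetical name, and finding the node's own IP is left to the original script.

```shell
# Print the first hostname listed for an exact IP match in a hosts file.
#   $1 = IP address to look up
#   $2 = hosts file (defaults to /etc/hosts)
host_for_ip() {
    awk -v ip="$1" '$1 == ip { print $2; exit }' "${2:-/etc/hosts}"
}

# Example: look up the loopback address in the local hosts file
host_for_ip 127.0.0.1
```

It prints nothing when the address is not listed, so a caller can fall back to a default hostname in that case.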
Now that we have that taken care of, we can modify /etc/init.d/hostname.sh to set our hostname on boot based on the IP we’ve received. This is as simple as changing this:
[ -f /etc/hostname ] && HOSTNAME="$(cat /etc/hostname)"
To this:
[ -f /etc/hostname ] && HOSTNAME="$(/bin/whereami)"
This also allows us to modify other init scripts so they’ll only run on particular nodes. For example, I wanted MySQL to start only on the database node, dizzy. So I added this to the top of /etc/init.d/mysql:
hostname=$(hostname)
if [ "$hostname" != "dizzy" ]; then
    exit 0
fi
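If several init scripts need this kind of guard, or a service should run on more than one node, the check can be wrapped in a small function. This is just a sketch: on_node and the NODE_NAME override are names I invented, and the function would have to live somewhere each init script can source it.

```shell
# Succeed only when the current hostname is one of the names given.
# NODE_NAME, if set, overrides the detected hostname (handy for testing).
on_node() {
    current="${NODE_NAME:-$(hostname)}"
    for name in "$@"; do
        if [ "$current" = "$name" ]; then
            return 0
        fi
    done
    return 1
}

# At the top of an init script that should only run on dizzy:
#   on_node dizzy || exit 0
```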
Logging
Since we’re not saving local log files on the diskless nodes, it makes sense to centralize our logging on the head node. We’ll need a better logging daemon to accomplish this.
Install syslog-ng on both the head node and a diskless node (only do this on one of your nodes; changes propagate to the others, remember?):
sudo apt-get install syslog-ng
Edit /etc/syslog-ng/syslog-ng.conf on the head node.
## add this to the options section
create_dirs(yes);
long_hostnames(off);
keep_hostname(yes);
## add this to the source section
source s_udp {
udp ( ip(192.168.1.1) ); # replace with your system's IP address
};
## add this to the destination section
destination df_udp {
file ("/var/log/$HOST/$FACILITY");
};
## add this to the log section
log {
source(s_udp);
destination (df_udp);
};
Now edit /etc/syslog-ng/syslog-ng.conf on one of the diskless nodes.
## add this to the destination section
destination remote_udp { udp("192.168.1.1"); }; # replace with your log server's IP address
## add this to the log section
log { source(src); destination(remote_udp); };
Restart syslog-ng on both head and diskless nodes.
That’s It
I hope this was helpful. Feel free to ask questions or leave comments.

anbi
June 19, 2009
What should you do if you already have an LTSP system working with thin clients?
Thanks.
LnddMiles
July 24, 2009
Great post! I’ll subscribe right now wth my feedreader software!
gary
August 7, 2009
Hi, great article. It actually works. Could you please tell me what kind of physical setup you had when you did this? Do you have 2 network cards, and is this possible with just 1 NIC and a switch?
ahs1200
October 28, 2009
Brilliant post. Thanks a lot. Worked like a charm.
Sune Beck
January 28, 2010
With this setup it seems the head server would become single-point-of-failure. Is there some way to load the node OS onto a local disk (or perhaps in-memory) upon boot and then continue from there?
It would be nice to, say, replace a switch or the head machine without affecting the (possibly high number of) nodes.
What is the best solution for this?
gabe
January 28, 2010
That’s an excellent point. The point of this particular setup was to set up a very fast, cheap, diskless cluster. You could very easily implement this in such a way that the disk storage was on a SAN or NAS instead, which would provide a more robust way to implement storage. I am intrigued by the idea of loading the root image into a ramdisk, but it would require an extra couple GB of RAM to do this, which is impractical for the configuration I’ve got. Let me know if you experiment with that particular idea any further.
gabe
January 28, 2010
I had two NICs and a switch. Because it’s its own private network, using a secondary NIC was easiest. You could just as easily use a virtual network adapter and a single NIC, however.
ahs1200
April 9, 2010
Hi Gabriel,
first of all: Nice work putting all this together.
I just want to remind though that, especially, when you are looking at clusters and a few more diskless nodes, you will run into trouble.
At least I did. I have obviously one central image mounted per nfs on 44 nodes. They again get the rest (/home, /data, /srv …) over automount & nfs from same head node.
This leads to daily 1-2 nodes losing the autofs mounts. There seems to be problems in this scale somehow. Can’t say where the limits lie.
Cheers.
gabe
April 10, 2010
Thanks for the input. I have not scaled this beyond 10 nodes, so I have not encountered this issue. I suspect something more robust than NFS would be required for a cluster larger than a handful of nodes.
Anderson
April 18, 2010
Hello Gabriel,
After following these steps I could not run any application that is distributed across the cluster. How do processes get distributed in the cluster automatically, or how can a program take advantage of this type of cluster?
gabe
April 18, 2010
You need a message-passing interface and software which is designed for parallel processing. I suggest looking into http://www.open-mpi.org/