Most computers nowadays support booting from the network. This method is called PXE booting. The BIOS, in combination with the network card ROM, issues a DHCP query and receives information about where to download the files required for booting. These are then transferred via TFTP, a stripped-down file transfer protocol.
Now if a computer breaks (for example, you install Windows and it kills your Grub, or you have a hard drive failure), you usually have to rely on a boot CD or boot USB stick. The problem with that (and this has happened to me often, seriously!) is that you don't need those CDs for a year or two, and when you finally do, they're outdated. On goes the process of downloading images, finding CD-Rs and so on. This annoyed me. Since I have a file server running anyway, I wanted to boot from my file server. And I wanted to boot into a "real" Linux system, where I could download new packages, install custom scripts and configure everything to my liking. This page describes how to do exactly that.
The first step in getting PXE running is to prepare the files that the TFTP server will serve to its clients; this is where PXELINUX comes in. The file structure can first be set up locally with user privileges (on the "joequad" computer). Later, when we move everything to the deployment system, some work has to be done with root privileges on "brick" (our server).
The files which TFTP will serve first and which are automatically requested are from the Syslinux toolchain. They first initialize and display a rudimentary boot menu (or text prompt, depending on the configuration). They also take care that upon a selection of the boot menu, the appropriate kernel files are fetched over the network and booted into.
Therefore, the first step is to download a recent version of Syslinux. I used syslinux-4.04.tar.bz2. Unpack it:
joequad joe [~/pxe]: tar xfvj syslinux-4.04.tar.bz2
There's already a prebuilt pxelinux.0 executable (this is the first file that is fetched via tftp) in the Syslinux package. If you do not want to build it yourself, just use that:
joequad joe [~/pxe]: mkdir tftp
joequad joe [~/pxe]: mv syslinux-4.04/core/pxelinux.0 tftp
If you want a pretty menu, you should also take the menu.c32 file from within the Syslinux package. For your convenience, this file has also been pre-built by the Syslinux gurus.
joequad joe [~/pxe]: mv syslinux-4.04/com32/menu/menu.c32 tftp
In order to display items in a menu, PXELINUX has to know which menu entries there are and what action to perform when one of them is selected. This is done via a configuration file residing in the /pxelinux.cfg directory of the TFTP server. It is possible to customize menus on a per-client basis - we're going to go for the low-hanging fruit of "one configuration fits all", however. This file must be called /pxelinux.cfg/default.
joequad joe [~/pxe/tftp]: cat pxelinux.cfg/default
default menu.c32
timeout 300
prompt 0

MENU TITLE PXE Boot Menu serving from brick.homelan.net

LABEL local
    MENU LABEL Boot locally
    localboot 0

LABEL memtest86
    MENU LABEL Memtest86
    kernel kernel/memtest86/memtest86
The first boot label here will quit the PXE boot process and fall back to local boot methods (i.e. hard disk, etc.). The second label specifies Memtest86. This is an incredibly useful tool which you should definitely have in your PXE boot menu (since you will want to test memory on new computers before you install an operating system). To include the files, first download the latest version of Memtest86; at the time of writing this document that is 4.0a. Download the .tar.gz and extract it. The "precomp.bin" file in that package is a precompiled binary kernel that will run Memtest86.
joequad joe [~/pxe]: tar xfz memtest86-4.0a.tar.gz
joequad joe [~/pxe]: mkdir -p tftp/kernel/memtest86
joequad joe [~/pxe]: cp memtest86-4.0a/precomp.bin tftp/kernel/memtest86/memtest86
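As for the per-client menus mentioned above: before PXELINUX falls back to pxelinux.cfg/default, it looks for more specific files in the same directory. It tries the hardware type plus the client's MAC address first, then the client's IPv4 address in upper-case hex, shortened by one digit per attempt. The following is a minimal sketch that lists these names for one client. The function name and the example MAC and IP address are made up for illustration.

```shell
#!/bin/sh
# pxe_config_candidates MAC IPV4  (hypothetical helper, not part of Syslinux)
# Prints the file names below pxelinux.cfg/ that PXELINUX tries, in order.
# A client UUID, if the firmware supplies one, would be tried before all these.
pxe_config_candidates() {
    mac=$1
    ip=$2
    # ARP hardware type 01 (Ethernet), then the MAC in lower case with dashes
    printf '01-%s\n' "$(printf '%s\n' "$mac" | tr 'A-F:' 'a-f-')"
    # The IPv4 address as eight upper-case hex digits ...
    hex=$(printf '%s\n' "$ip" | {
        IFS=. read -r a b c d
        printf '%02X%02X%02X%02X' "$a" "$b" "$c" "$d"
    })
    # ... which is then shortened by one digit per attempt
    while [ -n "$hex" ]; do
        printf '%s\n' "$hex"
        hex=${hex%?}
    done
    # Finally the catch-all file used throughout this article
    printf 'default\n'
}

pxe_config_candidates 00:11:22:AA:BB:CC 192.168.1.200
```

A file called pxelinux.cfg/C0A801 would therefore apply to every client in 192.168.1.0/24 that has no more specific file of its own.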
So far you have prepared PXELINUX and have one boot option (Memtest86) in the menu configuration. This can all be done on your local system with user privileges. Conveniently, you can immediately test whether your configuration file and menu setup work as you expect by using QEMU. QEMU has an option with which it automatically serves DHCP and emulates TFTP by pointing to a local directory structure (as you should by now have):
joequad joe [~/pxe]: qemu-system-x86_64 -boot n -net nic -net user,tftp=tftp,bootfile=pxelinux.0
This way if you can run Memtest86, you already know that the configuration file layout works smoothly - now you only have to get it on your server.
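Before firing up QEMU (or later, before copying the tree to the server), it can save a round trip to check that every file the menu refers to is really there. The following is a rough sketch under the assumption that the configuration only uses the "default", "kernel" and "append initrd=" forms shown in this article. The function name is made up.

```shell
#!/bin/sh
# check_pxe_tree TFTP_ROOT  (hypothetical helper)
# Lists files that pxelinux.cfg/default refers to but that are missing
# below TFTP_ROOT. Returns 0 if nothing is missing, 1 otherwise.
check_pxe_tree() {
    root=$1
    cfg=$root/pxelinux.cfg/default
    status=0
    if [ ! -r "$cfg" ]; then
        echo "missing: pxelinux.cfg/default"
        return 1
    fi
    # Collect the .c32 module named by "default", every "kernel" argument
    # and every initrd= value found on an "append" line
    files=$(awk '
        tolower($1) == "default" && $2 ~ /\.c32$/ { print $2 }
        tolower($1) == "kernel" { print $2 }
        tolower($1) == "append" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^initrd=/) print substr($i, 8)
        }' "$cfg")
    # pxelinux.0 is always required; paths with spaces are not supported here
    for f in pxelinux.0 $files; do
        if [ ! -r "$root/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}
```

With the layout from above, this would be called as check_pxe_tree ~/pxe/tftp and should print nothing.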
To configure your dhcpd server to advise clients to use a PXE environment, you have to add two statements to your configuration. Suppose you have your file server running at 192.168.1.1 (which does DHCP and also will be the NFS server for the Debian system). You'll have to edit /etc/dhcp/dhcpd.conf and can either add the two statements in the "subnet" part (if it is valid for the whole subnet) or for specific clients (if you only want certain clients to boot PXE).
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.200 192.168.1.220;
    filename "pxelinux.0";
    next-server 192.168.1.1;
}
The two directives that were added here for the whole subnet are "filename" and "next-server". "filename" contains the name of the file which clients will then fetch via TFTP. "next-server" specifies the server it should be fetched from. In our scenario, the NFS server, DHCP server and TFTP server are all the same machine (192.168.1.1).
After you have changed the configuration, restart dhcpd for the new settings to take effect.
First, you have to install the TFTP daemon tftpd-hpa on your server. On Debian or Ubuntu, the package has the same name:
brick [~]: apt-get install tftpd-hpa
During the configuration, specify a tftp root directory of /srv/pxe/tftp. Then create that directory and copy all the structure you've prepared beforehand into it (i.e. the "pxelinux.0" file must reside in /srv/pxe/tftp, etc.). Make sure all files are readable by the tftp daemon (which usually runs as user tftp).
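The copy and the permission fix can be done in one go. This is only a sketch: the function names are made up, and it simply makes the whole tree world-readable, which is what tftpd-hpa insists on.

```shell
#!/bin/sh
# publish_tftp SRC DEST  (hypothetical helper)
# Copies the prepared tree into the TFTP root and makes it world-readable.
# "a+rX" grants read access everywhere, but execute (search) permission only
# on directories and on files that are already executable.
publish_tftp() {
    src=$1
    dest=$2
    mkdir -p "$dest" &&
    cp -R "$src"/. "$dest"/ &&
    chmod -R a+rX "$dest"
}

# unreadable_for_tftp DEST  (hypothetical helper)
# Prints everything below DEST that still lacks world read permission.
unreadable_for_tftp() {
    find "$1" ! -perm -004
}
```

On brick this could be run as root, for example as publish_tftp /path/to/prepared/tftp /srv/pxe/tftp (the source path depends on where you copied the tree to). Afterwards unreadable_for_tftp /srv/pxe/tftp should print nothing.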
The directory which will soon contain the Linux root file system will reside in /srv/pxe/rescue64 -- you will have to export this via NFS. To do this, add the following line to /etc/exports:
/srv/pxe/rescue64 *(rw,no_subtree_check,no_root_squash)
Then, export the shares after saving the changed file:
brick [~]: exportfs -a
Security advice: Note that this will export the whole /srv/pxe/rescue64 directory with full (i.e. root) read/write permissions to the whole network. If you're on an untrusted network, you need to take measures to control this. Note that everyone will also be able to break your netboot setup by removing system files after booting into the image. At the very least (to protect against unwanted changes), make regular backups of that directory on your server.
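For the backups, a dated tar archive is probably the simplest thing that works. The following is a minimal sketch. The function name and the archive naming scheme are my own choice, and the output directory must lie outside the directory being backed up.

```shell
#!/bin/sh
# backup_rescue_root SRC OUTDIR  (hypothetical helper)
# Writes SRC as a dated, compressed tar archive into OUTDIR and prints
# the path of the archive. Run it as root so that tar can read every file.
backup_rescue_root() {
    src=$1
    outdir=$2
    archive=$outdir/$(basename "$src")-$(date +%Y%m%d).tar.gz
    mkdir -p "$outdir" &&
    tar -C "$src" -czf "$archive" . &&
    printf '%s\n' "$archive"
}
```

Called as backup_rescue_root /srv/pxe/rescue64 /srv/backup (the backup location is an assumption), this leaves one archive per day. Run it while no client is changing the image, for example from a nightly cron job.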
Now it's time to prepare the Debian system which will go to /srv/pxe/rescue64. For this we're going to use the tool "debootstrap". Become root and do:
brick [/srv/pxe]: debootstrap --arch amd64 squeeze rescue64 https://ftp.us.debian.org/debian
I: Retrieving Release
I: Retrieving Packages
I: Validating Packages
I: Resolving dependencies of required packages...
[...]
I: Configuring apt-utils...
I: Configuring aptitude...
I: Configuring tasksel-data...
I: Configuring tasksel...
I: Base system installed successfully.
brick [/srv/pxe]: cd rescue64
brick [/srv/pxe/rescue64]: mount --bind /proc proc
brick [/srv/pxe/rescue64]: chroot .
root@brick:/# echo rescue64 >/etc/hostname
Now that the bootstrapping is done, chroot into the newly created system and install some stuff that you might want or need. Obvious things which are missing are kbd, less, sshd, cryptsetup, nfs-client, vim and most importantly: a kernel! YMMV -- it's a custom system, customize it as much as you want!
root@rescue64:/# apt-get install kbd less vim openssh-server dhcpcd cryptsetup nfs-client ddrescue screen linux-image-amd64
[...]
Some things which definitely have to be configured are the boot process (boot from NFS), fstab and the root password (maybe also inittab). Let's do those, one by one:
Edit /etc/initramfs-tools/initramfs.conf and replace the line BOOT=local with BOOT=nfs. Then recreate the initramfs:
root@rescue64:/# vi /etc/initramfs-tools/initramfs.conf
root@rescue64:/# update-initramfs -u
[...]
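If you rebuild the image often, the edit in vi can be scripted. This is a small sketch which assumes that the file contains a plain "BOOT=local" line, as it does after a fresh install. The function name is made up.

```shell
#!/bin/sh
# set_initramfs_boot_nfs FILE  (hypothetical helper)
# Rewrites a "BOOT=local" line in FILE to "BOOT=nfs"; other lines stay as is.
# Succeeds only if FILE contains a BOOT=nfs line afterwards.
set_initramfs_boot_nfs() {
    file=$1
    tmp=$(mktemp) || return 1
    if sed 's/^BOOT=local[[:space:]]*$/BOOT=nfs/' "$file" > "$tmp"; then
        # cat instead of mv keeps the owner and mode of the original file
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
    grep -q '^BOOT=nfs$' "$file"
}
```

Inside the chroot this would be used as set_initramfs_boot_nfs /etc/initramfs-tools/initramfs.conf && update-initramfs -u.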
Remember to set a root password, or you're going to be locked out once you have your login screen:
root@rescue64:/# passwd
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Also disable the NFS client services from within the system or it will lock up upon bootup (when trying to start statd):
root@rescue64:/# update-rc.d -f nfs-common remove
Now if you've come this far you should have a tftp directory in /srv/pxe/tftp and a Debian root image in /srv/pxe/rescue64. Now you need to copy the kernel and initrd from the rescue directory into the tftp directory (so it gets served) and update the configuration a little. Then it should all work already.
brick [/srv/pxe]: mkdir tftp/kernel/rescue64
brick [/srv/pxe]: cp rescue64/boot/vmlinuz* tftp/kernel/rescue64/vmlinuz
brick [/srv/pxe]: cp rescue64/boot/initrd.img* tftp/kernel/rescue64/initrd.img
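The wildcards in the cp commands only work as long as exactly one kernel is installed in the image. Once a kernel update adds a second one, cp refuses to copy several files onto a single target name. A possible way around this is to pick the highest version explicitly. The sketch below assumes GNU sort (for its -V option), and the function names are made up.

```shell
#!/bin/sh
# newest_boot_file BOOTDIR PREFIX  (hypothetical helper)
# Prints the highest-versioned BOOTDIR/PREFIX-* file, e.g. for PREFIX=vmlinuz.
# Assumes GNU sort, whose -V option orders version numbers naturally.
newest_boot_file() {
    newest=$(printf '%s\n' "$1"/"$2"-* | sort -V | tail -n 1)
    # If nothing matched, the unexpanded pattern is left over and does not exist
    [ -e "$newest" ] && printf '%s\n' "$newest"
}

# copy_rescue_kernel BOOTDIR DESTDIR  (hypothetical helper)
# Copies the newest kernel and the newest initrd under fixed names.
copy_rescue_kernel() {
    kernel=$(newest_boot_file "$1" vmlinuz) || return 1
    initrd=$(newest_boot_file "$1" initrd.img) || return 1
    mkdir -p "$2" &&
    cp "$kernel" "$2/vmlinuz" &&
    cp "$initrd" "$2/initrd.img"
}
```

From /srv/pxe this would be copy_rescue_kernel rescue64/boot tftp/kernel/rescue64. Note that the sketch does not verify that the kernel and the initrd carry the same version.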
Then, update the tftp/pxelinux.cfg/default configuration file by adding a new entry:
LABEL rescue64
    MENU LABEL Debian Rescue x86_64
    kernel kernel/rescue64/vmlinuz
    append initrd=kernel/rescue64/initrd.img root=/dev/nfs ip=dhcp nfsroot=192.168.1.1:/srv/pxe/rescue64
Going one step further, say you want to try out the whole PXE configuration, but don't want to reboot your PC constantly to see if the tweaks on your server yielded the expected result. No problem whatsoever, just use QEMU for that as well and use a tap interface that is bridged to your main network card. First prepare the network devices:
joequad [~]: brctl addbr br0
joequad [~]: brctl addif br0 eth0
joequad [~]: brctl setfd br0 0
joequad [~]: ifconfig eth0 0.0.0.0
joequad [~]: ifconfig br0 up
joequad [~]: dhcpcd br0
Now you have basically created a bridge that has only your network card connected to it. You may or may not use DHCP (I do, but YMMV) and your network card may or may not be called "eth0". It is important to set the forwarding delay of the bridge to zero, or the PXE process will time out.
If you have set up your bridge as described above, it's time to fire up QEMU (as root, since we need a tap device).
joequad [/tmp]: qemu-system-x86_64 -boot n -net nic -net tap
QEMU will then first create a tap device and attach that to your bridge br0. Then it will fire up a VM which tries to boot over network (using PXE). All packets the VM sends out will go directly out through the TAP device (and therefore out of your network card, since it's bridged). Very nice to test everything!
If you do not want to log in every time, you can change the default console to not query you for any password. For this, edit /etc/inittab and replace entries like this one
1:2345:respawn:/sbin/getty 38400 tty1
with this variant:
1:2345:respawn:/sbin/mingetty --autologin root /dev/tty1
Note that you need to have mingetty (Debian package mingetty) installed for this to work. Also, mingetty has a known bug (dating back to 2003) in which console fonts are reset when it is used. I worked around that by including this in the ~/.bashrc file:
TTY=`tty`
if [ "${TTY:0:8}" == "/dev/tty" ]; then
    /etc/init.d/console-setup start
fi
If you miss the comfort of your country's locale in your rescue system, simply install it:
root@rescue64:/# apt-get install locales
Then, edit /etc/locale.gen and uncomment the locales that you want to have generated. For me this is de_DE.UTF-8. Afterwards, regenerate the locales:
root@rescue64:/# locale-gen
Generating locales (this might take a while)...
  de_DE.UTF-8... done
Generation complete.
You probably called the kernel "memtest86.bin". If the file extension is ".bin", PXELINUX will try to load the file as an MBR instead of a kernel file. Rename the file and change the configuration accordingly; everything will work fine then.
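Uncommenting the locale can also be scripted, which is handy if you recreate the image from time to time. This sketch assumes the usual "# de_DE.UTF-8 UTF-8" line format of /etc/locale.gen. The function name is made up.

```shell
#!/bin/sh
# enable_locale FILE LOCALE  (hypothetical helper)
# Removes the leading "# " from the locale.gen line that starts with LOCALE.
# Succeeds only if FILE contains an active line for LOCALE afterwards.
enable_locale() {
    file=$1
    locale=$2
    tmp=$(mktemp) || return 1
    # Escape the dots so that they match literally in the pattern
    pattern=$(printf '%s\n' "$locale" | sed 's/\./\\./g')
    if sed "s/^#[[:space:]]*\\(${pattern}[[:space:]]\\)/\\1/" "$file" > "$tmp"; then
        # cat instead of mv keeps the owner and mode of the original file
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
    grep -q "^${pattern}[[:space:]]" "$file"
}
```

Inside the chroot, enable_locale /etc/locale.gen de_DE.UTF-8 && locale-gen would then replace the manual edit.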
Try to mount the NFS share via a separate computer. Are all UIDs 4294967294? Check the "mount" output:
brick:/srv/pxe/rescue64 on /tmp/x type nfs (rw,vers=4,addr=192.168.1.1,clientaddr=192.168.1.200)
The directory has been mounted using NFS4. Try falling back to NFSv3.
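To see at a glance which NFS version a test mount ended up with, the "mount" output can be filtered. This is a small sketch with a made-up function name; it only looks at the vers= option as it appears in the output shown above.

```shell
#!/bin/sh
# nfs_mount_versions  (hypothetical helper)
# Reads "mount" output on stdin and prints "MOUNTPOINT VERSION" for every
# NFS mount that reports a vers= option.
nfs_mount_versions() {
    awk '$4 == "type" && $5 ~ /^nfs/ && match($0, /vers=[0-9.]+/) {
        print $3, substr($0, RSTART + 5, RLENGTH - 5)
    }'
}

# Manual check from a separate computer (as root), forcing NFSv3:
#   mount -t nfs -o vers=3 brick:/srv/pxe/rescue64 /tmp/x
#   mount | nfs_mount_versions
```

If the second command reports version 3 for /tmp/x and the UIDs look sane again, the NFSv4 ID mapping was the culprit.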
The tftpd-hpa has a really annoyingly stupid logging policy. By default it logs nothing. If started with "-v", it logs the requests (but not whether they failed or why). Only if you start it with "-vv" will it also log errors. Did I mention this is stupid? So edit your /etc/default/tftpd-hpa and append "-vv" to the TFTP_OPTIONS variable to get sane output. Then you might see something like:
Sep 18 17:01:31 homelan in.tftpd[19371]: RRQ from 192.168.2.211 filename kernel/memtest86/memtest86
Sep 18 17:01:31 homelan in.tftpd[19371]: sending NAK (2, File must have global read permissions) to 192.168.2.211
or
Sep 18 17:02:02 homelan in.tftpd[19437]: RRQ from 192.168.2.211 filename kernel/memtest86/memtest86
Sep 18 17:02:02 homelan in.tftpd[19437]: sending NAK (1, File not found) to 192.168.2.211
Either of these will enable you to fix the problem. Also, if you still don't see any output at that point, you can be almost sure that the problem occurs before TFTP is even reached (i.e. in your DHCP server configuration).
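With "-vv" enabled, the log gets noisy quickly. The following sketch condenses it to one line per failed request. It assumes the syslog line format shown above, and the function name is made up.

```shell
#!/bin/sh
# tftp_failures  (hypothetical helper)
# Reads syslog lines from in.tftpd on stdin and prints "FILE: REASON" for
# every request that was answered with a NAK. Requests and NAKs are paired
# by the PID in the "in.tftpd[PID]:" tag.
tftp_failures() {
    awk '
        function pid(tag) {
            return match(tag, /\[[0-9]+\]/) ? substr(tag, RSTART + 1, RLENGTH - 2) : ""
        }
        / RRQ from / { requested[pid($5)] = $NF }
        / sending NAK \(/ {
            reason = $0
            sub(/^.* sending NAK \(/, "", reason)
            sub(/\) to [^ ]*$/, "", reason)
            print requested[pid($5)] ": " reason
        }'
}
```

Feed it whatever file your syslog daemon writes the in.tftpd messages to, for example grep in.tftpd /var/log/syslog | tftp_failures (the log file location is an assumption).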
By default when a new device is added, it is first in the "learning" state before it switches to "forwarding". This is usually 10 seconds. Far too long for PXE -- it will by then long have timed out. So change the forwarding delay of the bridge to zero in order to get it working. To do this, do the following:
joequad [~]: brctl setfd br0 0