The Complete Magazine on Open Source

The Way FreeBSD Jails Work

FreeBSD-based computer systems can be partitioned into several independent mini-systems called jails, using the operating system-level virtualisation built into FreeBSD. FreeBSD jails compartmentalise the system (its files and its resources) in such a way that only the right person has access to the right compartments (jails).

Currently, container virtualisation is very popular. It is used for PaaS (Platform as a Service) and IaaS (Infrastructure as a Service), although usage for the former is more common.
PaaS makes application deployment seamless and very easy for developers, who do not have to deal with the intricacies of server set-up and management, which can often be overwhelming.
FreeBSD is an excellent operating system for servers because its network performance (if tuned properly) is among the best of any operating system, and because it has the Z File System (ZFS). ZFS makes the life of a systems administrator extremely easy. There is ZFS on Linux as well, but due to licensing issues, it is unlikely ever to be merged into the mainline Linux kernel. The last time I tried it, it was quite buggy, even though the ZFSOnLinux developers had declared the version I was using as stable!

What is container virtualisation?
To quote Wikipedia, “Operating system–level virtualisation is a server virtualisation method where the kernel of an operating system allows for multiple isolated user space instances, instead of just one. Such instances (often called containers, virtualisation engines (VE), virtual private servers (VPS), or jails) may look and feel like a real server from the point of view of its owners and users.”
If you have used something like KVM, VirtualBox, etc, you might recall that you had to install a complete operating system from the installation disk, just as you would do for a physical machine. This mode is known as full virtualisation – the hardware, like the disk, CPU, etc, is emulated by the hypervisor (the program that runs on the host system) for the guest operating system.
The advantage of this mode is that two guests and the host are completely isolated from each other and it is even possible to run multiple operating systems on a single host machine, i.e., you can have a physical Linux server and run Linux, FreeBSD and Windows as guest operating systems over it.
The disadvantage of full virtualisation is that it is not always beneficial. There are performance issues since a complete machine's resources are emulated, and if you just want to secure a single service (say, an HTTP server or a database server isolated from everything else), it is overkill. You can't use the full potential of your hardware in such cases.
For most use cases like these, container virtualisation is more than sufficient. There is no doubt that container virtualisation is the future of cloud computing with technologies like FreeBSD Jails, Docker (LXC or Linux Containers), etc.
Container virtualisation on UNIX-like systems grew out of the chroot system call, which switches a process to a specific path on the file system and makes your shell and any further programs treat that path as the root. Jails extend this idea by also isolating processes, users and networking.
There is a single kernel, which runs on the host machine. In this kind of virtualisation, it is not possible to run different operating systems (i.e., different kernels) or different CPU architectures; with full virtualisation, by contrast, you can very well have an ARM virtual machine on an x86 host.

FreeBSD Jails
FreeBSD Jails is one type of container virtualisation, which is supported on the FreeBSD operating system. This is pretty old technology with roots that go back to 1998 (as per Wikipedia). Each jail is an isolated FreeBSD machine with limited resources and privileges. A FreeBSD Jail has its own IP addresses and its own process namespace. One jail cannot see or access the other jails or the host’s data and/or processes. The only mode of communication between a process running inside a jail and another jail/host is via the network.
IPC (Inter Process Communication) using the System V primitives (shared memory, semaphores and message queues) is also possible, but these share a single namespace across the host and all jails, and hence it's disabled by default. A widely used tool for managing FreeBSD Jails is ezjail, which can be installed on a FreeBSD system as follows:

(FreeBSD >= 10) pkg install ezjail
OR
(FreeBSD < 10)   pkg_add -r ezjail

If you prefer installing it via ports, then you can find it at sysutils/ezjail.
Open up /usr/local/etc/ezjail.conf in your favourite text editor and take a peek at the settings given there. These are ezjail’s settings (the utility itself). If you are using ZFS (there’s really no reason for not using it), you should turn on the ezjail_use_zfs_for_jails parameter in the configuration file and set ezjail_jailzfs for the dataset under which all the jails should be created.
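As a rough sketch, the ZFS-related part of /usr/local/etc/ezjail.conf might look like the following. The dataset name tank/ezjail is only a placeholder for illustration (use a dataset on your own pool), and ezjail_use_zfs is an additional setting from the same file that puts the basejail itself on ZFS.

```shell
# /usr/local/etc/ezjail.conf (excerpt)
# "tank/ezjail" is a hypothetical dataset name; substitute one from your own pool.
ezjail_use_zfs="YES"             # keep the basejail on ZFS
ezjail_use_zfs_for_jails="YES"   # create every new jail in its own dataset
ezjail_jailzfs="tank/ezjail"     # parent dataset for everything ezjail manages
```

The file is read as shell variable assignments, so there must be no spaces around the equals signs.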
In order to start using ezjail, you first need to install the basejail. The following command does that and, because of the -p option, also fetches a ports tree for the jails:

ezjail-admin install -p

Or, run the following command (if you need the base OS source code – in most cases, you don’t):

ezjail-admin install -s

You can even specify the FreeBSD version you want for the basejail using the -r option. By default, ezjail chooses the same version as that of the host.
Either command fetches the FreeBSD base OS archives from the FTP server and sets them up.
Next, let’s create our first jail, as follows:

ezjail-admin create test 'lo0|10.0.0.2/24'

ezjail will create the jail using your specified settings (mine, for example, is at /usr/jails/test). As I mentioned earlier, the only way to communicate from host to jail and vice versa is via network sockets; the IP address we assigned above, i.e., 10.0.0.2/24, is the IP address of the jail. The host must have an IP address in the same subnet (e.g., 10.0.0.1/24) for the networking to work.
Note that a jail doesn't need its own routing table as a traditional virtual machine does, nor can a jail manipulate the host's firewall and routing tables.
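The host-side address and automatic start-up can be made persistent in /etc/rc.conf. This is a minimal sketch, assuming the host keeps 10.0.0.1/24 as an alias on lo0 as in the example above; the alias index 0 is an assumption and must not clash with aliases you already have.

```shell
# /etc/rc.conf (excerpt)
ezjail_enable="YES"   # start ezjail-managed jails at boot
# Host-side address in the jail subnet; alias0 assumes no other lo0 alias exists.
ifconfig_lo0_alias0="inet 10.0.0.1 netmask 255.255.255.0"
```

With these lines in place, ezjail-admin start test boots the jail and ezjail-admin console test opens a root shell inside it.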
If you have extra public IPs, you can assign one of those to the jail and get direct access to the jail from the Internet. For example, if your public interface is re0 (the name of the interface depends on your NIC; reX is the name for Realtek NICs), then you can assign a public IP to the jail during creation, as follows:

ezjail-admin create test 're0|a.b.c.d/cidr'

There is one problem with the above set-up though. Inside the jail, all traffic to 127.0.0.1 is redirected to the first IP address allocated to the jail. So, if you assigned just one public IP to it, any service inside the jail that binds to the loopback address would actually be listening on the public IP. This causes difficulties in firewalling and it's a security risk as well, because you definitely don't want to open up everything to the outside world.
To allocate two IP addresses to a jail during creation, use the following command:

ezjail-admin create test 'lo0|10.0.0.2/24,re0|a.b.c.d/cidr'

So now, all traffic to 127.0.0.1 inside the jail will be sent to 10.0.0.2. If you already created the jail, you can modify this configuration at /usr/local/etc/ezjail/<jailname>. In this case, the jailname is test.
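For illustration, the relevant line in /usr/local/etc/ezjail/test might end up looking like the sketch below. The file is a set of shell variable assignments; 203.0.113.10/24 is a documentation address standing in for a.b.c.d/cidr, and the other variables that ezjail writes to the file are left out here.

```shell
# /usr/local/etc/ezjail/test (excerpt)
# 203.0.113.10/24 is a placeholder for your real public address.
# The first address listed is the one that loopback traffic inside the jail maps to.
export jail_test_ip="lo0|10.0.0.2/24,re0|203.0.113.10/24"
```

Restart the jail (ezjail-admin restart test) for the change to take effect.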

IPFW firewall and jails
IPFW is FreeBSD's native firewall, created by the FreeBSD team itself (with inspiration from other firewalls, of course). FreeBSD supports OpenBSD's PF and IPFilter (IPF) as well, but I tend to stick to IPFW because it is better integrated with the OS.

NAT
In most cases, you have just one public IP address, but you definitely want to have network access to your jails; otherwise, they’re almost useless (well, there are development-related use cases, but this article isn’t about that). IPFW has a built-in facility for NATing, which has the same syntax as the user space NAT daemon natd. Let’s suppose your public IP address is a.b.c.d, and configure the NAT rule, as follows:

ipfw nat 123 config ip a.b.c.d same_ports unreg_only

With the above command, you'll have NAT instance number 123, which will send all traffic through a.b.c.d (see the IPFW manual page for the same_ports and unreg_only options). This is just the NAT instance configuration; we haven't yet set up the rules for allowing traffic from the jails to the Internet. This part is a bit complex to understand and it took me many days to get it working. So I strongly recommend that you try this out in a virtual machine on your laptop/desktop first, before trying it on a machine to which you are connected over ssh.
I use a stateful firewall for outgoing traffic, so the first rule after permitting local traffic would be as follows:

ipfw add 200 check-state

But we need to add the incoming NAT rule before this one, so that the firewall can forward the NAT traffic correctly – i.e., if some connection was initiated from inside the jail, the packets related to it should go back there and not somewhere else where they don’t belong.

ipfw add 199 nat 123 ip from any to a.b.c.d in

The above rule number 199 comes before our check-state rule number 200, so IPFW will check for NAT traffic arriving on public IP address a.b.c.d before checking for dynamic rules.
Let's suppose you have given the jails IPs in the 10.0.0.0/24 subnet. To provide outgoing Internet access via NAT, you need to add the NAT-related rules before the rules that filter outgoing traffic on the host. As an example, consider a stateful rule that filters outgoing traffic on port 80 (i.e., lets the machine access other HTTP servers):

ipfw add 400 allow tcp from me to any 80 out setup keep-state

Whenever you initiate a connection to an external machine on port 80, this rule first allows the SYN packet to pass (the setup flag), and then inserts a dynamic rule allowing the return traffic from port 80 on that IP address to your machine (the keep-state flag).
Before this rule, you need to add a rule so that traffic coming from 10.0.0.0/24 gets NATed and doesn’t pass out directly. So the two rules (including the previous one) would be (the order is critical!) as shown below:

ipfw add 400 skipto 700 tcp from 10.0.0.0/24 to any 80 out setup keep-state
ipfw add 400 allow tcp from me to any 80 out setup keep-state

IPFW allows the same index numbers for different rules, and when that happens, rules are applied sequentially as they were added. With the ‘skipto 700’ action in the above rule, we tell IPFW to jump to rule number 700 for all packets arriving from 10.0.0.0/24 and trying to leave the machine to port 80. We need the following rule as rule number 700:

ipfw add 700 nat 123 ip4 from 10.0.0.0/24 to not 10.0.0.0/24 out

The above rule applies NAT to all IPv4 traffic leaving the jail subnet. It uses the keyword ip4 because ipfw's NAT does not support IPv6, whereas the keyword ip matches both IPv4 and IPv6.
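Since the order of these rules is critical, it can help to see them together. The following is a minimal sketch that only prints the commands from this section in loading order; the function name jail_nat_rules and the address 203.0.113.10 (a documentation address standing in for a.b.c.d) are assumptions made for illustration, and the rules permitting local traffic and the final deny rule are left out.

```shell
#!/bin/sh
# Print the ipfw commands from this section in the order they must be loaded.
# Nothing is executed here; review the output, then pipe it to sh on the host.

jail_nat_rules() {
    public_ip=$1   # the host's public address (a.b.c.d in the text)
    jail_net=$2    # the subnet the jails live in

    cat <<EOF
ipfw nat 123 config ip ${public_ip} same_ports unreg_only
ipfw add 199 nat 123 ip from any to ${public_ip} in
ipfw add 200 check-state
ipfw add 400 skipto 700 tcp from ${jail_net} to any 80 out setup keep-state
ipfw add 400 allow tcp from me to any 80 out setup keep-state
ipfw add 700 nat 123 ip4 from ${jail_net} to not ${jail_net} out
EOF
}

# 203.0.113.10 is a placeholder, not a value from the article.
jail_nat_rules 203.0.113.10 10.0.0.0/24
```

Printing first and applying later is a deliberate choice: a mistake in a firewall script can lock you out of a remote machine, so it is safer to read the generated commands before running them.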

Resource control
Jails can be resource limited using FreeBSD's resource accounting (RACCT) and resource limits (RCTL) frameworks. For this to work on releases whose GENERIC kernel lacks them, you need to recompile the kernel with the following options:

options RACCT
options RCTL

See the FreeBSD documentation for kernel compilation; it's really easy. RCTL can control a number of resources, all of which are listed in the rctl man page. I'll describe one of them: vmemoryuse. It restricts the address space of a jail, effectively limiting its maximum memory usage.

rctl -a jail:test:vmemoryuse:deny=512M

The above command will add a rule to restrict the maximum address space to 512 megabytes.
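A rule added with rctl -a is lost on reboot. One way to keep it, sketched here on the assumption that the stock rc scripts are in use, is to list the rule in /etc/rctl.conf and enable the rctl service in /etc/rc.conf:

```
# /etc/rc.conf
rctl_enable="YES"

# /etc/rctl.conf (one rule per line, written exactly as after "rctl -a")
jail:test:vmemoryuse:deny=512M
```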