The Complete Magazine on Open Source

The Z File System: It’s Honest and Different

The Z File System, or ZFS, is an advanced file system designed to overcome many of the major problems found in previous designs. Initially developed by Sun Microsystems, its development has now been moved to the OpenZFS Project.

Data is priceless. Everyone from small ventures to Fortune 500 companies spends crores to maintain backups and ensure data integrity. The loss of even a byte of data can hurt a huge MNC and could possibly threaten a nation’s security. Ever since computers and file systems first appeared, experiments and research have been going on in an effort to find an efficient, trustworthy backup system.

What is the Zettabyte File System (ZFS)?
ZFS is an advanced, open source file system and logical volume manager designed by Sun Microsystems for use in its Solaris operating system, and licensed under the Common Development and Distribution License (CDDL). The name ZFS is a registered trademark of the Oracle Corporation.

Why should you use ZFS?
The answer to this question lies in the three main design goals, which have translated into ZFS’s most appealing features.

Data integrity: Integrity is the keyword that any serious user searches for. In ZFS, every block of data is stored with a checksum. When data is written, the checksum is calculated and written along with it. When that data is later read back, the checksum is calculated again. If the two checksums do not match, a data error has been detected. ZFS will attempt to correct such errors automatically when data redundancy is available.
Pooled storage: Physical storage devices are added to a pool, and storage space is allocated from that shared pool. Space is available to all file systems, and can be increased by adding new storage devices to the pool.
Performance and scalability: ZFS is a 128-bit file system that’s capable of managing zettabytes of data (one zettabyte is a billion terabytes). Multiple caching mechanisms provide increased performance. Examples include ARC, an advanced memory-based read cache; L2ARC, a second-level disk-based read cache; and ZIL (the ZFS Intent Log), a disk-based log for synchronous writes.
In addition to these, ZFS supports multiple RAID (redundant array of independent disks) levels, which can further improve performance and add redundancy.
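
To get a feel for the write-then-verify cycle described under data integrity, here is a minimal shell sketch. It is not how ZFS is implemented: it uses the sha256sum utility as a stand-in checksum, and the function names checksum_of and verify_block, as well as the temporary file, are made up for this illustration.

```shell
#!/usr/bin/env bash
# Illustration only: SHA-256 stands in for whatever checksum ZFS is configured to use.

# Print the SHA-256 checksum of a file.
checksum_of() {
    sha256sum "$1" | cut -d' ' -f1
}

# Compare a file against the checksum recorded at write time.
# Prints "ok" on a match and "corrupt" on a mismatch.
verify_block() {
    local file="$1" stored="$2"
    if [ "$(checksum_of "$file")" = "$stored" ]; then
        echo "ok"
    else
        echo "corrupt"
    fi
}

block="$(mktemp)"                   # hypothetical stand-in for a data block
printf 'important data' > "$block"
stored="$(checksum_of "$block")"    # "write": record the checksum
verify_block "$block" "$stored"     # "read": checksum still matches
printf 'X' >> "$block"              # simulate silent corruption
verify_block "$block" "$stored"     # the mismatch is detected
rm -f "$block"
```

The first check prints ok and the second prints corrupt. Detection is all this sketch does. Repairing the damaged block needs a redundant copy or parity, which is where the RAID levels come in.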

Installing ZFS in Ubuntu
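In Ubuntu, ZFS can be installed by running the following commands (these use the zfs-native PPA; on Ubuntu 16.04 and later, the zfsutils-linux package from the standard repositories provides ZFS instead):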

$ sudo add-apt-repository ppa:zfs-native/stable
$ sudo apt-get update
$ sudo apt-get install ubuntu-zfs

Once the installation is over, run the command shown below:

$ sudo zfs list

This should print ‘no datasets available’, which confirms that the installation works.
At this point, our system has just one hard drive.

Configuring ZFS
As mentioned earlier, our installation has just one hard drive. Let’s suppose that we need to add six more hard drives to our system.
Now, let us assume that we have finished adding the six hard drives. But since none of them is partitioned, they are currently unusable.
This is where the benefits of ZFS come in, freeing us from the burden of creating partitions (although we have the liberty of creating these, if we want to).
Let’s now create a storage pool using two of our hard drives:

$ sudo zpool create -f <storage pool name> </path/to/drive1> </path/to/drive2>

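zpool create is the command used to create a new storage pool, and -f forces the operation even when ZFS would otherwise refuse, for example because the disk(s) already have information on them. Use it with care, since existing data on those disks can be overwritten.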
To see the newly created pool, run the following command:

$ sudo zfs list

Now let us look at where RAID levels come into play. A pool created this way stripes data across its drives, much like RAID 0. As a simplified picture, suppose we create a 2KB file in the storage pool that we just made: 1KB would go to the first drive and 1KB to the second one. When we read the 2KB file back, each hard drive would supply 1KB, combining the speed of the two drives. With RAID 0, your multiple disks appear to be one larger and faster hard disk. The disadvantage of RAID 0 is that it is fragile: if any one of the drives dies, we lose all our data. But ZFS supports higher RAID levels that can overcome this disadvantage.
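
The striping idea can be tried out with a toy model in the shell. This is only a sketch under simple assumptions: the two ‘drives’ are plain bash arrays named drive0 and drive1, the chunk size is fixed, and the function names stripe_write and stripe_read are invented here. Real ZFS decides block placement dynamically.

```shell
#!/usr/bin/env bash
# Toy model of two-drive striping; drive0/drive1 are bash arrays, not real disks.

drive0=()
drive1=()

# Split the data into chunks of $2 characters and place them on the two drives alternately.
stripe_write() {
    local data="$1" size="$2" i=0 n=0
    drive0=()
    drive1=()
    while [ "$i" -lt "${#data}" ]; do
        if [ $((n % 2)) -eq 0 ]; then
            drive0+=("${data:i:size}")
        else
            drive1+=("${data:i:size}")
        fi
        i=$((i + size))
        n=$((n + 1))
    done
}

# Reassemble the data by reading chunks from both drives in turn.
stripe_read() {
    local out="" k
    for ((k = 0; k < ${#drive0[@]}; k++)); do
        out+="${drive0[k]}${drive1[k]:-}"
    done
    printf '%s\n' "$out"
}

stripe_write "AAAABBBBCCCCDDDD" 4
echo "drive0: ${drive0[*]}"
echo "drive1: ${drive1[*]}"
stripe_read
```

Here drive0 ends up holding AAAA and CCCC, drive1 holds BBBB and DDDD, and stripe_read returns the original string. Emptying either array before calling stripe_read leaves only every other chunk, which is exactly the fragility described above.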
To get a feel of how ZFS overcomes data loss as in the above case, let us look at another scenario.
First, let’s delete the pool that we have created, as follows:

$ sudo zpool destroy <storage pool name>

Now we are going to create a RAID-Z pool, which is an improved version of a RAID 5 pool. RAID 5 needs at least three drives and uses striping to divide data across all the hard drives, with additional parity data also spread across all the disks. If one of the hard drives dies, you won’t lose any of your data, and RAID 5 offers this redundancy at a lower storage cost than RAID 1. RAID-Z avoids the RAID 5 ‘write hole’ by using copy-on-write. If a single disk in your pool dies, simply replace that disk, and ZFS will rebuild the data based on parity information from the other disks. To lose all the information in your storage pool, two disks would have to die. To make things even more redundant, you can use double parity (RAID-Z2 in ZFS, comparable to RAID 6).
To achieve the above, run the following command:

$ sudo zpool create -f <storage pool name> raidz <drive1> <drive2> <drive3>
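
How can a dead disk be rebuilt from parity? The sketch below shows the idea for single parity, assuming the common XOR-based scheme used by RAID 5-style layouts. It is deliberately tiny: each ‘drive’ holds a single byte, and the function names parity_of and rebuild are hypothetical. RAID-Z itself works on variable-sized blocks, so treat this as the principle rather than the implementation.

```shell
#!/usr/bin/env bash
# Toy single-parity example: each "drive" holds one byte, written as an integer 0-255.

# Parity is the XOR of the two data bytes.
parity_of() {
    echo $(( $1 ^ $2 ))
}

# Rebuild a lost byte from the surviving byte and the parity.
rebuild() {
    local survivor="$1" parity="$2"
    echo $(( survivor ^ parity ))
}

d1=$(( 0x5A ))                # byte stored on drive 1
d2=$(( 0xC3 ))                # byte stored on drive 2
p=$(parity_of "$d1" "$d2")    # parity stored on drive 3

# Suppose drive 2 dies: recover its byte from drive 1 and the parity.
recovered=$(rebuild "$d1" "$p")
printf 'lost=0x%02X recovered=0x%02X\n' "$d2" "$recovered"
```

The script prints lost=0xC3 recovered=0xC3. With only one parity value, losing two of the three ‘drives’ leaves nothing to XOR against, which is why a second parity level tolerates a second failure.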

What makes ZFS different?
ZFS is significantly different from any previous file system because it is more than just a file system. Combining the traditionally separate roles of volume manager and file system provides ZFS with unique advantages. The file system is now aware of the underlying structure of the disks. Traditional file systems could only be created on a single disk — one at a time. If there were two disks, then two separate file systems would have to be created. In a traditional hardware RAID configuration, this problem was avoided by presenting the operating system with a single logical disk made up of the space provided by a number of physical disks, on top of which the operating system placed a file system. ZFS’s combination of the volume manager and the file system solves this, and allows the creation of many file systems — all sharing a pool of available storage.
One of the biggest advantages of ZFS’s awareness of the physical layout of the disks is that existing file systems can be grown automatically when additional disks are added to the pool. This new space is then made available to all the file systems. ZFS also has a number of different properties that can be applied to each file system, giving many advantages to creating a number of different file systems and datasets rather than a single monolithic file system.
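
As a sketch of what this looks like in practice, the script below creates two file systems in one pool, gives each its own properties and then grows the pool. All names are assumptions made for illustration: the pool tank, the datasets home and backups, the property values and the disk /dev/sdX. The script only prints the commands by default (DRY_RUN=1), so it is safe to run on a machine without ZFS. Also note that adding a single disk to a pool built from RAID-Z groups may be refused without extra options, because the redundancy levels would not match.

```shell
#!/usr/bin/env bash
# Hypothetical names throughout: pool "tank", datasets "home" and "backups", disk /dev/sdX.
# With DRY_RUN=1 (the default here) commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create two file systems in one pool and give each its own properties.
setup_datasets() {
    local pool="$1"
    run sudo zfs create "$pool/home"
    run sudo zfs create "$pool/backups"
    run sudo zfs set compression=lz4 "$pool/backups"   # compress only the backups
    run sudo zfs set quota=10G "$pool/home"            # cap the space used by home
}

# Grow the pool with another disk; the new space is shared by every file system.
grow_pool() {
    local pool="$1" disk="$2"
    run sudo zpool add "$pool" "$disk"
}

setup_datasets tank
grow_pool tank /dev/sdX
```

Setting DRY_RUN=0 would execute the commands for real, so check the pool and disk names carefully before doing that.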