The traditional model of data management and hosting has changed, and users now want their data and resources centralised. Virtualisation technology makes this possible: large organisations and small businesses alike can centralise not only their storage but almost everything else in their IT setup. With structured and unstructured data growing rapidly, storage has to be managed well, without wasting resources.
If you are a cloud service provider offering Storage as a Service (STaaS), a lot of your resources are wasted once a fixed amount of storage is allotted to a client. Much of that space sits idle, since the client doesn't use all of it and you are not allowed to use it either. To minimise this problem, it is preferable to use file data storage as a shared storage solution rather than object storage or a storage area network (SAN).
In this article, I will lead you through the procedure of using and managing a file storage solution, along with qemu-img virtualisation. I will also cover some disk virtualisation techniques that can be really useful for your cloud server. The techniques are simple and easy to understand; they are not flashy, but they will help you at different levels of storage system development.
All the tools used are available on most Linux systems; if not, they can be easily installed from your distribution's default repository. The commands were tested on RHEL 7.0, but they will work on other Linux systems in the same way, with a little modification.
File storage: Solving the storage problem
If you are a STaaS provider, you will sooner or later come across the problem of a lot of your storage being wasted. If a user purchases 10GB of storage space from you, you are forced to allocate the full 10GB, regardless of whether the user will ever fill it. Another challenge you will face frequently is keeping enough storage in hand to provide the best service to your customers. For example, if your data storage has 100GB of space left and someone unexpectedly demands 200GB, you may have to refuse. But a good provider should fulfil all user requests and provide 24/7 service. A good solution is to use file storage, with which you can scale your storage according to the data actually inside it and allocate space that you don't even have yet.
File storage relies on the fact that everything in your operating system is a file. In file storage, we create a file and use it as our virtual partition, then format it in the desired file system and mount it. All these operations are done in our real disk partition (/dev/sda*) with some small tricks or manipulation. For this procedure, we use sparse images as our disk.
Sparse files: A sparse file is a type of file that uses file system space more efficiently by using metadata to represent empty blocks. In a sparse file, blocks are allocated dynamically as actual data is written, rather than at the time the file is created. So, if you create a 10GB sparse file, it will not even take 1MB of your disk space, but its reported size will be 10GB, since that is the size it was created with. So let's start the procedure.
A sparse file can be created by either using the truncate or the dd utility in Linux (other tools are also available).
$truncate --size=1GB test.img
The above command will create a 1GB sparse image test.img.
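To see this on-demand allocation in practice, the following sketch creates a small sparse file and compares the space allocated to it before and after some data is written. The file name, the 64MB size and the write offset are arbitrary values chosen for illustration, and the exact allocation figures depend on the underlying file system.

```shell
#!/bin/sh
# Illustrative sketch: watch a sparse file's allocation grow as data is written.
set -eu
workdir=$(mktemp -d)
img="$workdir/sparse_demo.img"      # hypothetical file name

truncate --size=64M "$img"

apparent_bytes=$(stat -c %s "$img")            # size the file claims to have
allocated_before_kb=$(du -k "$img" | cut -f1)  # space really allocated on disk

# Write 1MiB of real data at a 10MiB offset; conv=notrunc keeps the file length.
dd if=/dev/urandom of="$img" bs=1M count=1 seek=10 conv=notrunc iflag=fullblock 2>/dev/null

allocated_after_kb=$(du -k "$img" | cut -f1)

echo "apparent size:    $apparent_bytes bytes"
echo "allocated before: $allocated_before_kb KiB"
echo "allocated after:  $allocated_after_kb KiB"
```

Only the blocks that received data should show up in the second allocation figure, while the apparent size stays the same.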
To get a similar result with dd, use the following command:
$dd if=/dev/zero of=test.img bs=1024 count=0 seek=$((1024*1000))
Here, bs is the block size and seek is the number of blocks to skip before writing, so the file size is bs multiplied by seek.
You can also use the following command for simplicity:
$dd if=/dev/zero of=test.img bs=1 count=0 seek=1G
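The three commands above do not produce exactly the same size, because GNU truncate and dd treat GB as a power of 1000 and G as a power of 1024. The sketch below, which uses throwaway file names of my own choosing, creates one file with each command and prints the resulting sizes in bytes.

```shell
#!/bin/sh
# Illustrative sketch: compare the sizes produced by the three commands.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

truncate --size=1GB a.img                                              # 1000^3 bytes
dd if=/dev/zero of=b.img bs=1024 count=0 seek=$((1024*1000)) 2>/dev/null  # bs * seek
dd if=/dev/zero of=c.img bs=1 count=0 seek=1G 2>/dev/null               # 1024^3 bytes

size_a=$(stat -c %s a.img)
size_b=$(stat -c %s b.img)
size_c=$(stat -c %s c.img)

echo "truncate --size=1GB : $size_a"
echo "dd bs=1024 seek=... : $size_b"
echo "dd bs=1 seek=1G     : $size_c"
```

If the exact size matters, for instance when billing by capacity, it is worth picking one unit convention and using it in every command.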
If you want to allocate the whole disk space at once (which is not a good solution in our case), you can use fallocate, as follows:
$fallocate -l 1G test.img
Note: Here's a little trick. Can I create a 2TB partition on my 1TB hard disk? The answer is, "Yes!", since sparse files do not take up space when they are created.
$truncate --size=2TB mylargefile.img
Create any desired file system on it, as follows:
$mkfs.ext2 mylargefile.img
$mount mylargefile.img /mnt/
You now have a 2TB partition.
After creating the file, you can format it and mount it directly in order to use it as your storage. However, to perform many of the disk operations described below, the file first has to be attached to a loop device using the losetup utility. To attach the file to the first free loop device, use the following command:
$losetup -f test.img
where test.img is the sparse file you created. To see all the files or block devices attached to loop devices, use the following command:
$losetup -a
You can pick out the most recently created loop device by using the command below:
$losetup -a | tail -1
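Relying on the last line of the listing can be fragile when several images are attached at once. As an alternative, losetup can print the device it picked if you combine -f with --show. The helper below is a minimal sketch; the function name attach_image is my own, and it assumes root privileges and an existing test.img in the current directory.

```shell
#!/bin/sh
# Illustrative sketch: attach an image and capture the loop device name directly.

# attach_image FILE: attach FILE to the first free loop device and print its path.
attach_image() {
    losetup -f --show "$1"
}

if [ "$(id -u)" -eq 0 ] && [ -f test.img ]; then
    loopdev=$(attach_image test.img)
    echo "test.img attached to $loopdev"
else
    echo "run this as root with test.img present to attach it"
fi
```

The captured device path can then be passed to the mkfs, mount and losetup -c commands instead of a hard-coded /dev/loop0.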
After creating the loop device, it’s time to format it. You can use any file system, but I recommend using the ext2 file system (the reason will be explained later).
To format, use the mkfs utilities:
$mkfs.ext2 /dev/loop0
If you are working on big data management, the XFS file system is preferred, because it handles large files well and supports larger inode data. Now you can mount the device to use it as your virtual storage, which can be shared, as follows:
$mount /dev/loop0 /mnt/
To share it across the network using a service like NFS, open /etc/exports and add the following line:
/mnt/ *(rw,sync,no_root_squash)
Save the file and restart the NFS service by using the command given below:
$systemctl restart nfs-server
Once you put data inside the device, you can check how much space the image actually occupies by using the qemu-img utility, as follows:
$qemu-img info test.img
or by using du, as shown below:
$du -h test.img
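For a provider, the interesting figure is the gap between the space promised to clients and the space really consumed. The following sketch totals both for a directory of images by running du with and without --apparent-size. The directory, the client file names and the sizes are made up for the example.

```shell
#!/bin/sh
# Illustrative sketch: promised (apparent) versus consumed (allocated) space.
set -eu
store=$(mktemp -d)                       # stand-in for your image directory

truncate --size=10M "$store/client_a.img"
truncate --size=20M "$store/client_b.img"
# Pretend client A has stored 2MiB of data so far.
dd if=/dev/urandom of="$store/client_a.img" bs=1M count=2 conv=notrunc iflag=fullblock 2>/dev/null

# du -c appends a "total" line; keep only its first field (KiB).
promised_kb=$(du -ck --apparent-size "$store"/*.img | tail -n 1 | cut -f1)
used_kb=$(du -ck "$store"/*.img | tail -n 1 | cut -f1)

echo "promised to clients: ${promised_kb} KiB"
echo "actually allocated:  ${used_kb} KiB"
```

Comparing the second figure with the free space on the real partition tells you how close the over-committed storage is to running out.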
General disk operation
Scaling up the size: If a user wants more storage, things can get difficult, since this type of storage doesn't readily support disk operations like resizing (especially online resizing) and you can't unmount the storage or take it offline. With a small trick and a bit of manipulation, however, you can achieve this. Just follow the exact procedure outlined here. Suppose a user with 1GB wants the space increased by 1GB (i.e., 2GB in total). You first need to increase the size of the file by 1GB. For that you can use the same truncate tool, since it can also change the size of a file that already exists.
$truncate --size=2GB test.img
or, with qemu-img (where +1G is the amount to add to the current size):
$qemu-img resize -f raw test.img +1G
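You will find that there is no change in the mounted space, since the loop device has not detected the increased size. Before resizing, the file system first has to be checked by e2fsck, using the following command. Note that the e2fsck man page warns that checking a mounted file system is generally not safe, so run this step when nothing is being written to the storage: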
$e2fsck -f test.img
After that, since the file system is only made on a 1GB partition, it has to be resized. However, we don’t want data to be lost or deleted and can’t handle the break in service by unmounting it for even a few seconds. So we need to do online resizing by using the command given below:
$resize2fs test.img
We have specifically used the ext2 file system because the resize2fs utility is built for the ext2/ext3/ext4 family. Note that, according to the resize2fs man page, growing a mounted file system also depends on the kernel supporting online resizing for that file system. This will make our file partition ready for use. However, these changes have not yet been detected by the loop device, so the same operations have to be done on it.
$losetup -a | grep test.img
$losetup -c /dev/loop0
With the -c option, losetup will detect the increase in size, but you still need to grow the file system on the loop device.
$resize2fs /dev/loop0
With this, the new resized storage is ready to be used.
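The file-level part of this procedure can be tried out safely, without root and without a loop device, on a throwaway image. The sketch below assumes e2fsprogs is installed; the image name, the sizes and the fs_blocks helper are my own choices for illustration, and it works on an unattached file rather than on live, mounted storage.

```shell
#!/bin/sh
# Illustrative sketch: grow an ext2 image file and the file system inside it.
set -eu
PATH="$PATH:/sbin:/usr/sbin"     # e2fsprogs tools are often outside a user's PATH
workdir=$(mktemp -d)
img="$workdir/grow_demo.img"     # hypothetical image name

# fs_blocks FILE: print the block count recorded in the ext2 superblock.
fs_blocks() {
    dumpe2fs -h "$1" 2>/dev/null | awk -F: '/^Block count/ { gsub(/ /, "", $2); print $2 }'
}

truncate --size=64M "$img"
mkfs.ext2 -q -F "$img"           # -F: do not ask questions about a regular file
blocks_before=$(fs_blocks "$img")

truncate --size=128M "$img"         # 1. grow the backing file
e2fsck -f -p "$img" >/dev/null      # 2. check the file system
resize2fs "$img" >/dev/null 2>&1    # 3. grow the file system to fill the file
blocks_after=$(fs_blocks "$img")

echo "blocks before: $blocks_before"
echo "blocks after:  $blocks_after"
```

On real storage, the additional losetup -c and resize2fs steps on the loop device described above are still needed.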
Decreasing the storage: To decrease the partition size, you should not shrink the file itself with the truncate utility, since cutting the file short can leave half-written data chunks or bad blocks, after which you will not be able to use the storage until you format it again. The better solution is to shrink only the file system layer inside the file, using the resize2fs command (resize2fs can shrink a file system only while it is unmounted):
$resize2fs test.img 1G
where 1G is the new, decreased size. You do not need to worry about the size shown in the file's metadata, since the user will only be able to use the formatted space. Since the file is a sparse file, the remaining space will not take up any space on your system. On the loop device, we just need to detect the changed size, as follows:
$losetup -c /dev/loop0
Now, your storage of decreased size is ready.
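As with growing, the shrink step can be rehearsed on a scratch image before it is applied to real data. The following sketch assumes e2fsprogs is installed and uses an image name and sizes picked purely for illustration. It shrinks the file system on an unattached 64MB image to 32MB and prints the resulting file system size next to the size of the backing file.

```shell
#!/bin/sh
# Illustrative sketch: shrink the ext2 file system inside an unattached image.
set -eu
PATH="$PATH:/sbin:/usr/sbin"
workdir=$(mktemp -d)
img="$workdir/shrink_demo.img"   # hypothetical image name

# sb_field FILE NAME: print one numeric field from the superblock summary.
sb_field() {
    dumpe2fs -h "$1" 2>/dev/null | awk -F: -v key="$2" '$1 == key { gsub(/ /, "", $2); print $2 }'
}

truncate --size=64M "$img"
mkfs.ext2 -q -F "$img"

e2fsck -f -p "$img" >/dev/null        # check before resizing
resize2fs "$img" 32M >/dev/null 2>&1  # shrink the file system only

fs_bytes=$(( $(sb_field "$img" "Block count") * $(sb_field "$img" "Block size") ))
file_bytes=$(stat -c %s "$img")

echo "file system size: $fs_bytes bytes"
echo "backing file:     $file_bytes bytes"
```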
Tip: To detach the storage from the loop device use the following command:
$ losetup -d /dev/loop0
Backing up your data: Backup, snapshots and clustering are features that every storage service provider must offer. Snapshots are the less pressing of these, so they are discussed at a later stage; for now, let's concentrate on backups. A backup creates a copy that saves all the useful data for future use.
So let’s take a backup, which can be successfully done by using the rsync utility:
$rsync -avz test.img test_backup.img
This will create a test_backup.img file with the same data blocks. So every time you run this command, new changes made since the last backup will get saved in test_backup.img.
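Warning: If you create the backup image of your file system through rsync as shown, the backup file will not be a sparse file, so it will allocate all of the space at once. rsync's --sparse (-S) option asks it to write the holes in the file sparsely on the destination.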
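The effect of a sparse-aware copy is easy to check on a small test image. The sketch below uses cp --sparse=always from GNU coreutils as a stand-in (it is not the rsync command from this article, though rsync's --sparse option plays the same role) and verifies that the copy is identical to the source while occupying far less space than its apparent size. The file names and sizes are arbitrary.

```shell
#!/bin/sh
# Illustrative sketch: a backup copy that keeps the holes of a sparse image.
set -eu
workdir=$(mktemp -d)
src="$workdir/test.img"
dst="$workdir/test_backup.img"

truncate --size=32M "$src"
# Put 1MiB of real data at the start of the otherwise empty image.
dd if=/dev/urandom of="$src" bs=1M count=1 conv=notrunc iflag=fullblock 2>/dev/null

cp --sparse=always "$src" "$dst"      # write runs of zeros as holes

apparent_kb=$(du -k --apparent-size "$dst" | cut -f1)
allocated_kb=$(du -k "$dst" | cut -f1)
if cmp -s "$src" "$dst"; then identical=yes; else identical=no; fi

echo "backup apparent size: $apparent_kb KiB"
echo "backup allocated:     $allocated_kb KiB"
echo "identical to source:  $identical"
```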
Although backing up the image file looks simple, it is not a very efficient solution, since once data starts being overwritten in some blocks of the original storage image, the same changes get copied exactly into the backup image.
So a better solution can be to rsync the mounted path:
$rsync -avz /mnt/ /mnt_backup
To make the backup process automatic, you can use lsyncd for live synchronisation. Install lsyncd on a RHEL system, as follows:
$yum install -y lsyncd
Edit the configuration file lsyncd.conf, as follows:
$cat /etc/lsyncd.conf
----
-- User configuration file for lsyncd.
--
-- Simple example for default rsync.
--
settings = {
    logfile = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd.stat",
    statusInterval = 2,
}
sync {
    default.rsync,
    source = "/mnt/",
    target = "192.168.1.15:/backup/",
    rsync = { rsh = "/usr/bin/ssh -l root -i /root/.ssh/id_rsa" },
}
Note: You first need to set up key-based SSH access to the backup machine, using a key pair generated with ssh-keygen.
Alternatively, you can trigger the backup whenever files change by using fsmonitor, a tool installed through npm, as follows:
$fsmonitor rsync -azP /mnt/ /mnt_backup
You can even use rsnapshot for incremental backup.
Snapshots of the storage data: A snapshot saves the state of your storage at a particular point in time. It can be a useful feature for STaaS providers, because it lets you revert your storage to any previous state. A snapshot is different from a backup because it doesn't take up storage space until changes are made in the storage; to save space, it keeps only the data that has since been changed or deleted. For our file storage, we will use the qemu-img utility. Note that qemu-img internal snapshots work only on images in the qcow2 format, so a raw image such as the ones created earlier must first be converted to qcow2 (for example, with qemu-img convert). First, create a new snapshot of your storage, as shown below:
$qemu-img snapshot -c backup_snapshot test.img
-c is used for creating a new snapshot. To revert back to a particular state, use the following command:
$qemu-img snapshot -a 5 test.img
where 5 is the snapshot ID. To see all the available snapshots, use the command given below:
$qemu-img snapshot -l test.img
To delete a snapshot, use the command shown below:
$qemu-img snapshot -d 2 /images/sles11sp1.qcow2
Securing your virtual storage: The other advantage of using file storage is that it's easy to ship, like a container. But with shipping comes the responsibility of securing your storage. A good way to secure your virtual data storage is to encrypt it, so that a password is needed every time it is mounted. For encrypting your storage, let's use dm-crypt. We will try encryption on fresh file storage, as follows:
$truncate encrypted.raw --size=2GB
Next, set up a LUKS header, as follows:
$cryptsetup luksFormat encrypted.raw
Warning:
Don’t try this with an already formatted partition because it will delete all the previous data inside the partition.
This will prompt you to enter a fresh password. To gain access to the device, use the command given below:
$cryptsetup open encrypted.raw my_encp.raw
my_encp.raw is the name under which the decrypted device is mapped in /dev/mapper/. Now you can create a file system on top of it (ext2 in this example), as follows:
$mkfs.ext2 /dev/mapper/my_encp.raw
Mount the newly created partition anywhere with the following command:
$mount -t ext2 /dev/mapper/my_encp.raw /mnt/
Once use of the storage is finished, you can unmount it, as follows:
$umount /mnt/
$cryptsetup close my_encp.raw
Note: You can do the same disk scaling and other operations on the disk, but now you need to make the changes on /dev/mapper/my_encp.raw rather than /dev/loop0.
Using a file storage virtual machine: A great benefit of file storage is its use as base storage for virtual machines. Once you decide to use file storage for an OS running on a virtual machine, other operations like scaling and encryption can also be applied to it. To create a virtual machine instance from our already created storage, let's use the qemu-kvm utility, as shown below:
$qemu-kvm -name my_os -m 1024 -smp 2 \
 -drive file=test.img,if=virtio,index=0,media=disk,format=raw \
 -drive file=ubuntu-14.04.iso,index=1,media=cdrom
This will start a new virtual machine with a minimal set of options. Here, -m defines the amount of RAM allocated (in megabytes) and -smp sets the number of virtual CPUs. You can read about more available options in the qemu-kvm man page or by running qemu-kvm --help. To start an already created virtual machine, use the following command:
$qemu-kvm -name my_os -m 1024 -smp 2 \
 -drive file=/images/sles11/hda,if=virtio,index=0,media=disk,format=raw
File storage can be a good solution for your enterprise environment or for personal use, but what matters is how you put it to use at the different levels of your cloud solution. Storage management with file storage doesn't end here, since there is a lot more you can do with it. It can be a powerful as well as flexible solution for storage management, but it requires thorough knowledge to get the job done.
References
[1] For lsyncd: http://www.linuxtechi.com/install-and-use-lsyncd-on-centos-7-rhel-7/
[2] For qemu utility: https://www.suse.com/documentation/sles11/book_kvm/data/book_kvm.html
[3] For Dm-crypt encryption: https://wiki.archlinux.org/index.php/Dm-crypt/Encrypting_a_non-root_file_system