The real threat to stored data is breaches, which of late have affected many cloud service providers. The security vulnerabilities that enable these breaches have resulted in the loss of millions of user credentials. In this article, we explore the prospects of setting up a personal data store or even a private cloud.
The European Organisation for Nuclear Research (CERN), a research collaboration of over 20 countries, has a unique problem: it generates far more data than it can possibly store. We’re talking about petabytes of data per year, where one petabyte equals a million gigabytes. Entire departments of scientists work on data acquisition (DAQ) and filtering, simply to filter out 95 per cent of the experiment-generated data and store only the useful 5 per cent. In fact, it has been estimated that data in the digital universe will amount to 40 zettabytes by 2020, which is about 5,000 gigabytes of data per person.
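The unit conversions behind these figures can be checked with a few lines of Python. This is only a sketch that assumes decimal (SI) units, where each prefix step is a factor of 1,000; the variable names are illustrative and not taken from any library.

```python
# Decimal (SI) storage units expressed in bytes (assumed convention).
GIGABYTE = 10**9
PETABYTE = 10**15
ZETTABYTE = 10**21

# One petabyte expressed in gigabytes.
gigabytes_per_petabyte = PETABYTE // GIGABYTE

# The article's 40-zettabyte estimate, converted to gigabytes.
digital_universe_gb = 40 * ZETTABYTE // GIGABYTE

# Population implied by "about 5,000 gigabytes per person".
implied_population = digital_universe_gb // 5_000

print(f"{gigabytes_per_petabyte:,} GB in one petabyte")
print(f"{digital_universe_gb:,} GB in 40 zettabytes")
print(f"{implied_population:,} people implied at 5,000 GB each")
```

The last figure comes out at eight billion, so the per-person number is consistent with a world population of roughly that size.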
With the recent spate of breaches affecting cloud service providers, setting up a personal data store or even a private cloud becomes an attractive prospect.
Data storage infrastructure is broadly classified into object storage, block storage and file systems, each with its own set of features.
Object storage manages data as objects instead of treating it as a hierarchy of files or blocks. Each object is associated with a unique identifier and comprises not only the data but also, in some cases, the metadata. This storage pattern seeks to enable capabilities such as application programming interfaces and data management features such as replication at the level of individual objects. It is often used to retain massive amounts of data. Examples include the storage of photos, songs and files on a massive scale by Facebook, Spotify and Dropbox, respectively.
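To make the idea concrete, here is a minimal in-memory sketch of an object store. It is a toy, not how any of the services named above work: the class names `StoredObject` and `ToyObjectStore`, the use of random UUIDs as identifiers and the choice of metadata fields are all assumptions made for illustration.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the payload plus descriptive metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Flat map from identifier to object; no directories, no blocks."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        """Store the data and return its newly generated unique identifier."""
        object_id = uuid.uuid4().hex
        # Record size and checksum alongside any caller-supplied metadata.
        metadata = dict(metadata, size=len(data),
                        sha256=hashlib.sha256(data).hexdigest())
        self._objects[object_id] = StoredObject(data, metadata)
        return object_id

    def get(self, object_id):
        """Return the StoredObject for an identifier (KeyError if absent)."""
        return self._objects[object_id]

    def replicate_to(self, other, object_id):
        """Copy a single object to another store, keeping the same identifier."""
        source = self._objects[object_id]
        other._objects[object_id] = StoredObject(source.data, dict(source.metadata))


store = ToyObjectStore()
photo_id = store.put(b"\x89PNG...", content_type="image/png", owner="alice")
replica = ToyObjectStore()
store.replicate_to(replica, photo_id)
print(replica.get(photo_id).metadata["content_type"])
```

Because every object carries its own identifier and metadata, replication can be done one object at a time, without knowing anything about where the object sits relative to others.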
In block storage, data is stored as a sequence of bytes called a block, sometimes termed a physical record. Each block usually holds a whole number of records. The process of putting data into blocks is termed blocking, while the reverse, extracting data from blocks, is called deblocking. Blocking is widely employed when storing data on certain types of magnetic tape, Flash memory and rotating media.
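Blocking and deblocking can be sketched in a few lines. The example below assumes fixed-length records and zero-byte padding for a partly filled final block; the record size and the number of records per block are arbitrary values chosen for illustration.

```python
RECORD_SIZE = 16        # bytes per fixed-length record (assumed)
RECORDS_PER_BLOCK = 4   # blocking factor (assumed)
BLOCK_SIZE = RECORD_SIZE * RECORDS_PER_BLOCK


def block(records):
    """Group fixed-length records into blocks, zero-padding the last block."""
    blocks = []
    for start in range(0, len(records), RECORDS_PER_BLOCK):
        chunk = b"".join(records[start:start + RECORDS_PER_BLOCK])
        blocks.append(chunk.ljust(BLOCK_SIZE, b"\x00"))
    return blocks


def deblock(blocks, record_count):
    """Split blocks back into records and drop the padding slots."""
    records = []
    for blk in blocks:
        for offset in range(0, BLOCK_SIZE, RECORD_SIZE):
            records.append(blk[offset:offset + RECORD_SIZE])
    return records[:record_count]


# Six records, each padded with spaces to exactly RECORD_SIZE bytes.
records = [f"record-{i:02d}".encode().ljust(RECORD_SIZE) for i in range(6)]
blocks = block(records)
print(len(blocks), "blocks of", BLOCK_SIZE, "bytes")
assert deblock(blocks, len(records)) == records
```

Six records at four per block need two blocks, the second of which is half padding. That wasted space is the price paid for transferring data in uniform units.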
File systems follow a hierarchy that controls how data is stored and retrieved. In the absence of a file system, information would simply be a large body of data with no way to isolate individual pieces of information from the whole. A file system encapsulates the complete set of rules and logic used to manage sets of data. File systems can be used on a variety of storage media, most commonly hard disk drives (HDDs), magnetic tapes and optical discs.
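The hierarchy is what lets a single item be picked out of the whole. As a small illustration, the sketch below uses Python's standard `pathlib` and `tempfile` modules to store two files under different branches of a temporary directory tree and then retrieve one by its path; the directory and file names are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root_name:
    root = Path(root_name)

    # Store two pieces of data under different branches of the hierarchy.
    (root / "photos" / "2016").mkdir(parents=True)
    (root / "music").mkdir()
    (root / "photos" / "2016" / "beach.txt").write_text("sunset at the beach")
    (root / "music" / "playlist.txt").write_text("track one\ntrack two")

    # Retrieve one item by its path without touching the rest of the data.
    beach_text = (root / "photos" / "2016" / "beach.txt").read_text()

    # Walk the hierarchy to list every stored file relative to the root.
    stored_files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )

print(beach_text)
print(stored_files)
```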
Building open source storage
Network Attached Storage (NAS) provides a stable and widely employed alternative for data storage and sharing across a network. It provides a centralised repository of data that can be accessed by different members within the organisation. Variations include complete software and hardware packages that serve as out-of-the-box alternatives. Software and file systems in this space include Gluster, Ceph, NAS4Free, FreeNAS, and others. As an example, we will look at the general steps involved in deploying such a system, taking FreeNAS as a popular representative of the set.
With enterprise-grade features, richly supported plugins, and an enterprise-ready ZFS file system, it is easy to see why FreeNAS is one of the most popular operating systems in the market for data storage.
Let’s take a deeper look at file systems since they are widely used in setting up storage networks today. Building your own data storage using FreeNAS involves the following simple steps:
1. You will need to download the disk image suitable for your architecture and burn it onto either a USB stick or a CD-ROM, as per your preference.
2. Since you will be booting your machine from the FreeNAS media, open the BIOS settings at start-up and set the boot preference to USB, so that the system first tries to boot from the USB stick and falls back to other attached media only if it is not found.
3. Once you have created the installation media with the required software, you can boot up your system and install FreeNAS in the designated partition.
4. Once you have set the root password and booted into the installed system, you will have the option of using the Web GUI to log in. Some users may find this option much more intuitive than the console-based login.
5. Using the GUI or console, you can configure and manage your storage options depending on your application(s).
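Before burning the image in step 1, it is worth checking that the download is intact. The sketch below assumes that a SHA-256 checksum is published alongside the disk image; the function names are illustrative, and the file name in the usage note is a placeholder rather than a real release name.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large images never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, expected_hex):
    """Return True when the file's digest equals the published checksum."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A call such as `image_matches("freenas-image.iso", published_checksum)` would then return `True` only if the file on disk matches the published value, where both the file name and `published_checksum` are stand-ins for whatever you actually downloaded.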
Private cloud storage
Another recent trend is private cloud storage, given the sudden reduction in free cloud storage offered by providers like Microsoft and Dropbox. Public clouds have multi-tenant infrastructure and allow for great scalability and flexibility, abstracting away the complexities associated with deploying and maintaining hardware. Open source projects now aim to bring similar functionality to self-hosted setups; for instance, the creators of Gluster recently came out with an open source project called Minio. One of the services we will look at is ownCloud, a Dropbox alternative that offers similar functionality, along with the advantage of being open source.
1. In order to build a private cloud, you require a server running an operating system such as Linux or Windows. The ownCloud server software can be installed on such a Linux server.
2. While installing and running an Apache server on Linux, the upload_max_filesize and post_max_size settings in PHP's configuration file (php.ini) need to be raised above their defaults (upload_max_filesize defaults to 2MB) so that larger files can be uploaded.
3. The system is required to have MySQL, PHP (5.4+), Apache, GD and cURL installed before proceeding with the ownCloud installation. Further, a database must be created with privileges granted to a new user.
4. Once the system is set up, proceed with downloading the ownCloud files and extract them to /var/www/ownCloud.
5. Change the Apache virtual host to point to this ownCloud directory by modifying the document root in /etc/apache2/sites-available/000-default.conf to /var/www/ownCloud.
6. Finally, type in the IP address of the server in your browser and you should be able to arrive at the login screen.
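The two configuration edits in steps 2 and 5 can be sketched as plain text transformations. The helpers below are hypothetical and operate on strings rather than on the live files; the 512M limit is an arbitrary example value, and `upload_max_filesize` and `post_max_size` are the names PHP uses for these directives in php.ini.

```python
import re


def set_ini_directive(ini_text, name, value):
    """Set `name = value`, replacing an existing line or appending a new one."""
    pattern = re.compile(rf"^{re.escape(name)}[ \t]*=.*$", re.MULTILINE)
    line = f"{name} = {value}"
    if pattern.search(ini_text):
        return pattern.sub(lambda match: line, ini_text, count=1)
    return ini_text.rstrip("\n") + "\n" + line + "\n"


def set_document_root(vhost_text, new_root):
    """Point every DocumentRoot line in a virtual host file at new_root."""
    pattern = re.compile(r"^([ \t]*DocumentRoot[ \t]+).*$", re.MULTILINE)
    return pattern.sub(lambda match: match.group(1) + new_root, vhost_text)


# Sample file contents standing in for php.ini and 000-default.conf.
php_ini = "upload_max_filesize = 2M\npost_max_size = 8M\n"
php_ini = set_ini_directive(php_ini, "upload_max_filesize", "512M")
php_ini = set_ini_directive(php_ini, "post_max_size", "512M")

vhost = "<VirtualHost *:80>\n\tDocumentRoot /var/www/html\n</VirtualHost>\n"
vhost = set_document_root(vhost, "/var/www/ownCloud")

print(php_ini)
print(vhost)
```

On a real server you would read the actual files, apply the same changes, write them back with appropriate permissions and restart Apache for them to take effect.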
While there are trade-offs between cloud-based storage and traditional means of storage, the former is a highly flexible, simplified and secure model of data storage. And with providers offering more control over deployments, private clouds may well be the main file storage option in the near future!
The author has worked at Microsoft Research, CERN and startups in AI and cyber security. He is an open source enthusiast who enjoys spending time organising software development workshops for school and college students. You can contact him at https://www.linkedin.com/in/swapneelm; https://github.com/SwapneelM or http://www.ccdev.in.