Disaster may strike anytime, anywhere. And when it does, all is lost for those who are not prepared for it. Disaster preparedness and recovery are basic elements of a well-run enterprise plan, especially one that is based on IT. In this article, the author shows how disaster recovery is different from a mere backup, and highlights a few FOSS disaster recovery tools.
Disaster recovery (DR) is about preparing for and recovering from a disaster. Any event that has a negative impact on a company’s business continuity or finances could be termed a disaster. This includes hardware or software failure, a network outage, a power outage, physical damage to a building like fire or floods, human error or some other significant event. To minimise the impact of a disaster, companies invest time and resources to plan and prepare for it, to train employees, and to document and update processes.
There are a few common industry terms related to disaster planning.
Recovery time objective (RTO) is the time it takes after a disruption to restore a business process to its service level, as defined by the operational level agreement (OLA). For example, if a disaster occurs at 12:00 p.m. (noon) and the RTO is eight hours, the DR process should restore the business process to the acceptable service level by 8:00 p.m.
Recovery point objective (RPO) refers to the acceptable amount of data loss measured in time. For example, if a disaster occurs at 12:00 p.m. (noon) and the RPO is one hour, the system should recover all data that was in the system before 11:00 a.m. Data loss will span only one hour, between 11:00 a.m. and 12:00 p.m. (noon).
Failover is the disaster recovery process of automatically offloading tasks to backup systems in a way that is seamless to users. You might fail over from your primary data centre to a secondary site, with redundant systems that are ready to take over immediately.
Failback is the disaster recovery process of switching back to the original systems. Once the disaster has passed and your primary data centre is up and running, you should be able to fail back seamlessly as well.
Restore is the process of transferring backup data to the primary system or data centre. The restore process is generally considered part of backup rather than disaster recovery.
A company typically decides on an acceptable RTO and RPO based on the financial impact to the business when systems are unavailable. The company determines the financial impact by considering many factors, such as the loss of business and damage to its reputation due to downtime and the lack of systems availability.
IT organisations then plan solutions to provide cost-effective system recovery based on the RPO and on the service level established by the RTO.
The key capabilities of an efficient IT disaster recovery plan should be:
- The life cycle of the recovery, covering the total provision, monitoring, validating, testing, recovery and reporting processes
- Close integration with the network, OS and replication technologies to enable the automation of the monitoring, discovery and recovery processes
- Close integration with a virtual infrastructure to ensure automation of virtual machines for recovery
- Application awareness to ensure best practice templates for various enterprise applications
A disaster recovery plan (DRP) and its importance
To ensure that your systems, data and personnel are protected, and that your business can continue to operate in the event of an actual emergency or disaster, companies can follow the guidelines given below to create a disaster plan.
Inventory hardware and software: Your DR plan should include a complete inventory of [hardware and] applications in priority order.
Define your tolerance for downtime and data loss: Evaluate what an acceptable recovery point objective (RPO) and recovery time objective (RTO) is for each set of applications.
Lay out who is responsible for what – and identify backup personnel: All disaster recovery plans should clearly define the key roles, responsibilities and parties involved during a DR event.
Create a communication plan: Communication is critical when responding to and recovering from any emergency, crisis or disaster.
Top open source FOSS backup and DR tools
AMANDA
AMANDA stands for Advanced Maryland Automatic Network Disk Archiver. It is a simple backup solution that allows the IT administrator to set up a single master backup server to back up multiple hosts over the network to media like tapes, hard disk drives, etc.
AMANDA uses native utilities and formats, and can back up a large number of servers and workstations running multiple versions of Linux or UNIX. It uses a native Windows client to back up Microsoft Windows desktops and servers.
Rsync
A case study of Rsync is given at https://www.osradar.com/use-rsync-to-set-up-remote-disaster-recovery-backup-system/. To ensure data security, an enterprise felt the need for a remote disaster system. So, every morning at 3.00 a.m., data from the site is backed up to a remote server using Rsync. Because the volume of data is very large, the daily backup can only be incremental. In the case of a failure on site, the backed up data can be restored, helping the enterprise to a great extent.
Box Backup
This is a totally automatic, open source online backup system, using which the data that is backed up gets stored as files on the server’s file system. No additional backup devices such as disk drives are needed (https://www.boxbackup.org/).
Burp
Burp is a network backup and restore program. It attempts to reduce network traffic and the amount of space that is used by each backup. There are two independent backup protocols to choose from.
Protocol 1: This optionally uses librsync. It is available in all Burp versions and is stable.
Protocol 2: This uses variable length chunking for inline deduplication and sparse indexing on the server side.
Burp uses VSS (Volume Shadow Copy Service) to make snapshots when backing up Windows computers (https://burp.grke.org/).
Duplicati
Duplicati is a backup client that securely stores encrypted, incremental, compressed backups on cloud storage servers and remote file servers. It provides various options and tweaks like filters, deletion rules as well as transfer and bandwidth options to run backups for specific purposes. It also supports the AES 256 encryption standards.
Bareos
Bareos is a network and client-server based backup tool consisting of computer programs that empower IT administrators to manage the verification, backup and recovery of data through a network of computers. Bareos can also be made to run on a single computer, backing up various kinds of media (https://www.bareos.org/en/).
BackupPC
BackupPC is a fully cross-platform tool that can run on all operating systems, and is designed for enterprise use. The provider supports full file compression and uses a small amount of disk space. Additionally, BackupPC doesn’t need any client-side software to run. The solution is freely available as open source on GitHub and SourceForge.
Clonezilla
Clonezilla is open source backup software from Symantec Ghost Corporate Edition, based on DRBL (diskless remote boot Linux). The primary mechanism of the solution includes image partition and partial cloning, as well as bare metal backup and recovery udpcast.
Two versions of Clonezilla are available — Clonezilla Live for single machine backup and restore, and Clonezilla SE for a server. The tool saves and restores only used blocks in the hard disk.
Back in Time
Back in Time is backup software designed for Linux, and is inspired by the Flyback Project. The solution offers a command line client as well as a GUI, both written in Python. In order to perform backups, users specify where to store snapshots, what folders to back up and the frequency of the backups. The solution is licensed with GPLv2.
Areca Backup
Areca Backup is an open source personal backup solution, released under the General Public License (GPL). Users may select a set of files or directories to back up, choose where and how they will be stored, and configure post-backup actions. The software also provides encryption, compression and splitting functionalities. Areca allows customers to use advanced backup modes, such as delta backup, or produce a basic copy of their source files as a standard directory or Zip archive. The solution is compatible with Linux and Windows (http://www.areca-backup.org/).
Disaster Recovery Linux Manager (DRLM)
DRLM provides disaster recovery to systems (desktops, servers, laptops, etc) using GNU/Linux, with standard tools and protocols. It is also used as a migration tool for all your system migrations. You can recover an entire system in a few minutes by just booting from the network, irrespective of whether your system is virtual or physical (https://drlm.org/).
Efficient disaster recovery management
DR infrastructure must be aligned to meet recovery objectives for the various applications. An effective disaster management plan integrates the processes of reporting, monitoring and testing. It also provides workflow automation, and makes it a scalable and user-friendly solution for the organisation while not compromising on industry standards.
Recovery must be managed across virtual, physical and cloud infrastructure. It must incorporate advanced technologies but be perfectly in line with the needs and resources of the organisation.