Thursday, June 17, 2010

Backup: a practical point of view

Many people think of a backup as a whole bunch of files copied and pasted somewhere. This is partly true. We can copy file by file or bit by bit; which one depends on the level of the backup operation.

Here I would like to sum up a few points so that the people around me can learn more about the backup operations of a system.

There are three kinds of backup operations to carry out in order to keep the system running well. By running well, I mean that the system is running with practically up-to-date data, managed application source code and working operating system files.

Data backup makes sure the data collected by the system is practically up-to-date. The data to back up is that of the master database server. To achieve a high level of service uptime, the backup operation is carried out on the offline secondary (slave) database server, which synchronizes its data with the online master database. Carrying out such a data backup operation still leads to a drop in overall system performance, so we need to schedule it carefully and avoid any peak time of service demand. The principle is to keep as few backup copies as you can, but always the most recent ones.
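The two rules above (run outside peak time, keep only a few recent copies) can be sketched in a few lines of Python. This is only an illustration under assumptions: the peak window, the number of copies to keep and the function names are all hypothetical and are not taken from any real setup.

```python
from datetime import datetime, time
from typing import List

# Hypothetical settings for illustration only.
PEAK_START = time(8, 0)    # assumed start of peak service demand
PEAK_END = time(22, 0)     # assumed end of peak service demand
KEEP_COPIES = 3            # assumed number of recent backups to retain


def is_off_peak(now: datetime) -> bool:
    """Return True when `now` falls outside the assumed peak window."""
    return not (PEAK_START <= now.time() < PEAK_END)


def backups_to_delete(backup_times: List[datetime],
                      keep: int = KEEP_COPIES) -> List[datetime]:
    """Return the backups that are older than the `keep` most recent ones."""
    newest_first = sorted(backup_times, reverse=True)
    return newest_first[keep:]
```

A scheduled job could call `is_off_peak` before starting the dump on the slave server, then remove whatever `backups_to_delete` returns once the new copy is verified.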

Backup of application source code can be done with an SVN server, which provides version control over the source code and maintains the different versions of it across the Production server, the Test server and the Development workstation. It makes sure the source code on the Production server can be reverted to a particular version at any time, assuming that this particular version of the source code is compatible with the current structure of the database.
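The compatibility assumption is worth checking before any revert. As a rough sketch: SVN itself does not record which database structure a revision expects, so the mapping from revision number to schema version below is a hypothetical record that the team would have to maintain by hand. The function names and the working copy path are also made up for illustration.

```python
from typing import Dict, List, Optional


def latest_compatible_revision(revision_schema: Dict[int, int],
                               current_schema: int) -> Optional[int]:
    """Pick the newest revision whose recorded schema version matches
    the schema version currently deployed in the database."""
    compatible = [rev for rev, schema in revision_schema.items()
                  if schema == current_schema]
    return max(compatible) if compatible else None


def revert_command(revision: int, working_copy: str) -> List[str]:
    """Build the `svn update -r REV PATH` command for the working copy.
    The command is only built here, not executed."""
    return ["svn", "update", "-r", str(revision), working_copy]
```

Returning the command as a list keeps the sketch free of side effects; an administrator could review it and then pass it to `subprocess.run` on the Production server.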

Backup of operating system files keeps the up-to-date software and configuration settings of the Production server in its current working state. It mainly covers software updates, custom configuration settings and the scheduled tasks that maintain service uptime and run the data backup operation. This is extremely useful when an incident at the data centre is too minor to trigger disaster recovery. The reason is that disaster recovery is defined in terms of fire, flooding and physical damage to the equipment in the data centre. Lower-level disasters, like an incorrect system configuration or a third-party software installation, would not trigger that recovery process but can still leave the system in a corrupted state that makes the Production server unusable. In this case, the data centre has no choice but to retrieve the initial image copy of the server, which is out of date and needs all the patches, settings and scheduled tasks applied again before it is workable. Regularly taking system snapshots of the Production server can shorten the time needed to recover the service. In a virtual environment like VMware, snapshots of the system state can be taken on a regular schedule while the Production server is running. Reverting to a previous system state is quite handy and can be handled efficiently by any VMware administrator. This is one of the business continuity features that virtualization technology offers. To meet a business continuity requirement, the system administrator should schedule tasks that take a system snapshot of the Production server at regular intervals. This ensures that a reasonable system state can be recovered after the aforementioned incidents.
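The scheduling logic behind regular snapshots is simple enough to sketch. The example below deliberately does not call any VMware API; it only shows the two decisions involved, namely whether a new snapshot is due and which snapshot to go back to after an incident. The 24-hour interval and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical interval; the right value depends on the service.
SNAPSHOT_INTERVAL = timedelta(hours=24)


def snapshot_due(last_snapshot: Optional[datetime], now: datetime,
                 interval: timedelta = SNAPSHOT_INTERVAL) -> bool:
    """Return True if no snapshot exists yet or the last one is too old."""
    return last_snapshot is None or now - last_snapshot >= interval


def snapshot_to_revert_to(snapshots: List[datetime],
                          incident: datetime) -> Optional[datetime]:
    """Return the most recent snapshot taken before the incident,
    or None if every snapshot was taken at or after it."""
    earlier = [taken for taken in snapshots if taken < incident]
    return max(earlier) if earlier else None
```

Choosing the last snapshot taken before the incident matters because a snapshot taken after a bad configuration change would simply preserve the broken state.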

Together, these backups ensure that the system can be brought back to a practically up-to-date state after recovering from most incidents.

