Tuesday, 3 April 2012

How to Backup Fault Tolerant VMs in vSphere 4

A major benefit of deploying VMware's vSphere 4 is the additional options it offers for business continuity and disaster recovery, such as virtual machine level backups and high availability features. vSphere 4 Essentials Plus, Advanced, and Enterprise editions all include VMware Data Recovery.

A more detailed explanation of the potential of vSphere based DR solutions is beyond the scope of this article, but all of these products use VM snapshots to back up live VMs without affecting availability.
[Image: VMware Data Recovery Snapshot Based Backup]
Another major feature in vSphere 4 Advanced and higher editions is Fault Tolerance, which is intended to eliminate VM downtime in the event of a host server failure by creating a live shadow instance of the VM on another host and keeping the two in "lockstep" synchronisation. The effect is similar to clustering, but since it operates at the hypervisor level it does not require any special features at the VM software level. Again, it is beyond the scope of this article to discuss the advantages and disadvantages of this technology; the important thing to note here is that FT enabled VMs cannot be snapshotted.

The Problem with Backing Up FT enabled VMs

As we have already seen, all the vSphere based VM backup solutions rely on snapshot technology to image live VMs, but Fault Tolerant VMs cannot be snapshotted, which precludes backing them up. Unfortunately, it seems that the first time many users discover this is when trying to run their first backup of an FT VM. Depending on the backup application, it will either not allow them to create the backup job in the first place, or the job will fail shortly after starting.
It seems that this situation was exacerbated by conflicting information from VMware when vSphere 4 was originally released. At one point they said that Fault Tolerant VMs would be allowed a single snapshot for backup purposes, but this feature was not included in the released version of vSphere 4.0. Subsequently they implied that it would be enabled in a later update, but although vSphere 4.1 included some major improvements to FT, it appeared that they had given up on the single snapshot feature, at least until the next major version release.

The Solution

To be completely accurate, it is still not possible to back up FT enabled VMs. What is now the commonly accepted method is in fact a workaround, as it involves turning off FT in order to allow a snapshot to be created, and then turning it back on once the backup has completed. This means that for the duration of the backup, the Virtual Machine will no longer have the benefit of FT protection, which may be a problem for some readers. Unfortunately, if that is the case, then you will have to look at alternative backup methods, i.e., either running inside the VM or at the SAN level.
It is quite simple to test the process manually in order to establish that your backup application will be able to image an FT VM; you just need to turn off Fault Tolerance for that VM. Note that you have to "turn off" rather than "disable", otherwise it still won't allow a snapshot to be created.
[Image: vSphere Fault Tolerance: Turning Off to Create Snapshot]

Turning Off Fault Tolerance

In your vSphere Client, right-click on the FT VM and select "Fault Tolerance", then "Turn Off Fault Tolerance" from the sub-menu. The "Disable Fault Tolerance" option will only stop the lockstep synchronisation, but leaves the secondary VM in place. "Turn Off" will actually disable lockstep sync and then remove the secondary VM completely, leaving you with a normal VM that can be snapshotted as usual.
Now start your backup, and if you watch the bottom "Tasks" section in the vSphere Client, you should see the snapshot being created by the backup application. Once the backup has completed, the snapshot should be removed. Once that is done, you can right-click the VM again and select "Turn On Fault Tolerance". This might take a few minutes as it has to create a secondary VM and bring it into lockstep sync with the primary, exactly the same process as when you originally enabled FT.
This process obviously isn't a practical solution for regular scheduled backups. However, it does demonstrate the steps required to allow backups of any Fault Tolerant Virtual Machine, and it will give you an idea of the potential hazards involved. The main problem has already been mentioned; the VM will not be FT protected during the backup process so a host hardware problem would cause downtime, and an error whilst turning FT on or off could leave the VM unprotected, requiring manual intervention. On a more positive note though, in my experience such problems are very rare and are unlikely to cause downtime by themselves, and whilst the VM lacks FT protection, it will still have vSphere High Availability. Therefore, should the host fail, the VM will be started on another host. It will have undergone a "dirty shutdown" so there may be some data loss or even corruption and a short period of downtime, all of which illustrates quite neatly why Fault Tolerance was an attractive option in the first place!

Automating the Procedure

Fortunately vSphere has comprehensive scripting support, allowing for the automation of any process achievable via the vSphere Client GUI, so we can use this to turn Fault Tolerance off and on when required. vSphere CLI scripts are written in Perl, but don't worry if you have no experience of using it. The instructions below will show you how to implement it. However, if you are using a third party application such as Veeam and have an active support agreement with the vendor, then you should contact them first to see if they have their own solution.
  1. To use any scripts you first of all need to install the vSphere CLI (Command Line Interface), which you can download from the VMware website. Browse to the "Downloads" section and select vSphere 4, then click the "Drivers & Tools" tab. Expand the "Automation Tools & SDKs" section, then download the version of the vSphere CLI that matches your vCenter installation. Note that you want the standard CLI, not the "PowerCLI" (the PowerCLI has similar functionality but integrates with the Windows PowerShell).
    [Image: vSphere CLI Download]
  2. Once you have downloaded the CLI, run the file to install it. In theory, you can run the CLI on any Windows system with a network connection to your vCenter Server, but in practice it will be much easier to install it on the same system as your VM backup application. For these instructions I have assumed you will install it to the default recommended folder location, but if you choose a custom folder then just change subsequent file paths appropriately.
  3. Now download the FTcli2.pl script file from http://communities.vmware.com/docs/DOC-10279 and save it to the C:\Program Files (x86)\VMware\VMware vSphere CLI\perl\bin folder.
  4. If you open the Start - Programs - VMware menu you should see an entry for the "VMware vSphere CLI", with just a Command Prompt icon in it. Clicking this will indeed just open a standard command prompt, but with the location changed to the vSphere CLI installation folder. At this point, to simplify things in future, I would recommend adding the C:\Program Files (x86)\VMware\VMware vSphere CLI\perl\bin folder to the system PATH environment variable.
  5. With the path set, you should now find you can execute your scripts from a standard Command Prompt. Try entering FTcli2.pl /? in order to see the online help listing all the options for this script. You will see that you can specify explicit or passthrough authentication. We will assume you are running it on your vCenter Server with sufficient privileges to use passthrough.
  6. Now we need to test running the script with the required options, first to turn off FT and then to turn it back on again. It would be a good idea to create a test FT VM for these procedures rather than using a live production VM, just in case. The command to turn off FT should be something like this: ftCLI2.pl --server vcenter.domain.local --passthroughauth --operation stop --vmname MyTestFTvm , where vcenter.domain.local is the FQDN (or IP) of your vCenter Server. Enter that command at the prompt and run it. It should return some progress information, you should see the task appear in the vSphere Client, and Fault Tolerance will be turned off for that VM.
  7. In the event that the script fails to turn off FT, then the output in the Command Prompt window, or the Task Status in the vSphere Client will usually give a good indication of the cause of the problem. You may also add the --verbose option to the command which should make it return more detailed error messages.
  8. The command to turn on FT should be identical to the turn off command, except with --operation create instead, so now you should be able to make a test VM Fault Tolerant and then remove FT again afterwards.
  9. In order to use these new script commands effectively, they need to be coordinated with the backup application. To facilitate this, you should create two batch files. Open Notepad and enter your commands, starting each separate command on a new line: first a cd to the vSphere CLI perl\bin folder, then the ftCLI2.pl command from step 6 or step 8.
    [Image: Enable Fault Tolerance batch file]
  10. The cd C:\Program Files (x86)\.... line should not be necessary if you have already added the folder location to your system PATH, but it won't do any harm to include it anyway. This example is for turning on FT as it contains the --operation create option.
  11. Now save the file to a suitable location, making sure you change the "Save as type" to "All files" and include a .bat extension at the end of the file name. This tells Windows that it is a batch file, so the commands in it should be executed.
    [Image: saving the Enable Fault Tolerance batch file]
  12. Repeat steps 9-11 but replace create with stop in order to create a batch file to turn off FT for your VM.
  13. Now you have your two batch files. In the future, all you should have to do is change the --vmname MyTestFTvm option to match the name of your FT VM as shown in the vSphere Client.
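
If you have several FT VMs, typing each pair of batch files by hand gets tedious. As a rough sketch only, and assuming Python is available on the machine (it is not part of the vSphere CLI), the text of both files could be generated from the command shown in step 6. The function names here are hypothetical, and the script name and options are taken as written in steps 6 and 8; a VM name containing spaces would need extra quoting that this sketch does not attempt.

```python
# Sketch: build the text of the "turn on" / "turn off" batch files
# described in steps 9-12. Nothing is written to disk here.

# Default vSphere CLI folder from step 3; change it if you installed elsewhere.
CLI_BIN = r"C:\Program Files (x86)\VMware\VMware vSphere CLI\perl\bin"


def ft_command(server, vm_name, operation):
    """Return the ftCLI2.pl command line from steps 6 and 8.

    operation is "stop" (turn FT off) or "create" (turn FT on).
    Passthrough authentication is assumed, as in step 5.
    """
    if operation not in ("stop", "create"):
        raise ValueError("operation must be 'stop' or 'create'")
    return ("ftCLI2.pl --server {0} --passthroughauth "
            "--operation {1} --vmname {2}").format(server, operation, vm_name)


def batch_file_text(server, vm_name, operation):
    """Return batch file contents: a cd line, then the FT command."""
    lines = [
        'cd /d "{0}"'.format(CLI_BIN),  # quoted because the path has spaces
        ft_command(server, vm_name, operation),
    ]
    # Windows batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Print both files for the test VM used in the steps above.
    for op in ("stop", "create"):
        print(batch_file_text("vcenter.domain.local", "MyTestFTvm", op))
```

Saving each printed block with a .bat extension gives the same two files as the Notepad method; test them against a non-production VM first.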

Scheduling your Fault Tolerant VM Backups

The first step for every scheduled FT VM backup needs to be turning off Fault Tolerance, so you can use the Windows Task Scheduler to create a scheduled task to run your batch file at the appropriate time. The Windows Task Scheduler in Windows Server 2008 is quite different to use from the old Windows Server 2003 version.

Note that when you create your task, you can specify what Windows user account it should run under. If you are using the passthrough authentication option, it is essential that you specify an account that has sufficient rights on your vCenter Server to change the Fault Tolerance settings for the VM. Configure the task to run at a suitable time and frequency for your backup schedule.
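
The task can also be created from the command line with the schtasks tool that ships with Windows, rather than through the Task Scheduler GUI. The Python sketch below only builds the argument list for a daily task; it does not run anything. The task name, batch file path, start time and account are all hypothetical values, and because no password is supplied, schtasks would be expected to prompt for one when the command is actually run.

```python
# Sketch: build a "schtasks /Create" argument list for a daily task that
# runs the "turn off FT" batch file. The list is returned, not executed.
from datetime import datetime


def schtasks_create_args(task_name, batch_path, start_time, run_as):
    """Return the argument list for a daily scheduled task.

    start_time is a 24-hour "HH:MM" string; run_as is the Windows account
    the task runs under, which needs rights on vCenter to change FT
    settings when passthrough authentication is used.
    """
    # Validate and normalise the time; raises ValueError if it is not HH:MM.
    start = datetime.strptime(start_time, "%H:%M").strftime("%H:%M")
    return [
        "schtasks", "/Create",
        "/TN", task_name,   # task name shown in Task Scheduler
        "/TR", batch_path,  # program to run
        "/SC", "DAILY",     # schedule type
        "/ST", start,       # start time
        "/RU", run_as,      # account to run the task under
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(schtasks_create_args("Turn Off FT - MyTestFTvm",
                               r"C:\Scripts\turn_off_ft.bat",
                               "22:45",
                               r"DOMAIN\backupsvc"))
```

The list could be passed to subprocess.call from an elevated prompt, or simply used as a reminder of which options to type by hand.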
Next you need to create a backup job for your FT VM, just like any other VM backup, but you should schedule it to run at a suitable period after the "Turn Off FT" task to ensure it has time to complete that before starting the backup, otherwise it will fail. Usually I find allowing a delay of 15 minutes is ample, but you should be able to confirm what is best for your system with some testing. Setting the time for reactivating Fault Tolerance is harder because the duration of the backup job may be quite variable from day to day - set it too early and the task will fail, whilst making it too late will leave your VM unprotected for longer than necessary. The best option, which most backup applications support, is to use the option in the backup job properties to run a command after the job has completed:
[Image: Enable Fault Tolerance Backup Job]
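
If your backup application can be started from the command line, the same ordering could instead be expressed as one wrapper script. This is an alternative to the scheduled task plus post-job command described above, not the method this article configures, and it is only a sketch in Python: the three commands are placeholders you would supply, and it assumes each of them returns a non-zero exit code on failure, which I have not verified for the FT script.

```python
# Sketch: turn FT off, run the backup, then always try to turn FT back on.
import subprocess


def backup_with_ft_toggle(turn_off_cmd, backup_cmd, turn_on_cmd,
                          run=subprocess.call):
    """Run the three commands in order and return the backup exit code.

    Each command is an argument list, e.g. ["C:\\Scripts\\turn_off_ft.bat"].
    `run` executes a command and returns its exit code; it is a parameter
    so the ordering can be tested without touching a real VM.
    """
    # If FT cannot be turned off, the snapshot would fail, so stop here.
    if run(turn_off_cmd) != 0:
        raise RuntimeError("could not turn off FT; backup not started")
    try:
        backup_status = run(backup_cmd)
    finally:
        # Runs whether the backup succeeded, failed, or raised, so the VM
        # is not left without FT longer than necessary.
        turn_on_status = run(turn_on_cmd)
    if turn_on_status != 0:
        raise RuntimeError("FT was not turned back on; VM is unprotected")
    return backup_status


if __name__ == "__main__":
    # Dry run with a stand-in runner that only prints each command.
    def show(cmd):
        print("would run:", cmd)
        return 0

    backup_with_ft_toggle(["turn_off_ft.bat"], ["run_backup.bat"],
                          ["turn_on_ft.bat"], run=show)
```

The point of the try/finally is that a failed backup does not skip the turn-on step, which is the same guarantee you want from a post-job command in the backup application.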


Although it sounds like a complex and significant set of tasks to be running on a perhaps nightly basis, it does in fact usually turn out to be a reliable procedure once it is set up and the schedules are established. The main disadvantage has already been highlighted: Fault Tolerance protection has to be turned off in order to back up the VM, which increases the risk of downtime during the backup window. Depending on the role of the VM in question this may or may not be an issue, but with some planning it should be possible to minimise the risk, for example by combining a weekly VM level backup with a daily OS level backup.