For automated testing and execution of disaster recovery procedures in a VMware infrastructure, VMware Site Recovery Manager (SRM) has long been the only available option. But a relatively young company, VirtualSharp Software, based in the USA with Spanish roots, is going to challenge VMware SRM with a product called ReliableDR. While SRM verifies the recoverability of virtual machines, VirtualSharp Software’s solution verifies recoverability at the application level.
Recovery Point and Time Assurance
This is what the business wants: a recoverable IT-service which is recovered within the agreed Recovery Time Objective (RTO) and with no more data loss than the Recovery Point Objective (RPO) allows. Current backup tools, snapshots and replication technologies create a copy of the production data. But if that production data is corrupt or contains configuration errors, the problem is replicated to the secondary site as well. Two disasters for the price of one if that corrupted data needs to be recovered!
VirtualSharp Software developed an automated disaster recovery solution for a Spanish insurance company in 2007. Two years later this evolved into a commercial product, which in 2011 is at version 2.5. Around 25 to 30 customers worldwide use ReliableDR to verify that the agreed Recovery Point Objectives and Recovery Time Objectives can actually be achieved. VirtualSharp Software prefers to speak of Recovery Time Assurance and Recovery Point Assurance: customers using ReliableDR can be assured that their applications can be made operational in compliance with the RTO and RPO agreed in their Service Level Agreements.
The current 2.5 version of ReliableDR works with VMware ESX 3.5, 4.0 and 4.1 infrastructures. Support for Hyper-V is under development and expected to be released later this year. The ReliableDR software is installed on a Windows Server 2008 virtual machine running in the secondary site. This server acts as the orchestrator for testing and failover.
At setup, the ReliableDR software is connected to the vCenter Server running in the protected site. The virtual machines running in the protected site are grouped into IT-services. An IT-service is a set of virtual machines which depend on each other and together deliver a service or application to the business. Think of a web server, an application server and a database server which together deliver the company website. Next, a runbook is created which defines the startup order of the VMs and the scripts (business rules) used for application-level testing.
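The startup order in a runbook is essentially a dependency ordering: a VM may only be started once the VMs it depends on are running. The Python sketch below illustrates that idea with a topological sort from the standard library (Python 3.9+). It is not ReliableDR's implementation; the VM names `web01`, `app01` and `db01` and the dependency map are hypothetical examples of the website IT-service described above.

```python
from graphlib import TopologicalSorter

# Hypothetical IT-service: each VM maps to the set of VMs it depends on,
# i.e. the VMs that must be running before it can be started.
WEBSITE_SERVICE = {
    "web01": {"app01"},  # the web server calls the application server
    "app01": {"db01"},   # the application server queries the database
    "db01": set(),       # the database server has no dependencies
}


def startup_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a boot order in which every VM starts after its dependencies."""
    # static_order() yields a node only after all of its predecessors and
    # raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(startup_order(WEBSITE_SERVICE))  # ['db01', 'app01', 'web01']
```

A circular dependency (two VMs each waiting for the other) is rejected with `CycleError`, which is the kind of runbook mistake worth catching before a real disaster.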
Supported storage arrays
ReliableDR supports a wide range of storage arrays. The requirement for support is that the array can create snapshots and offers an API to communicate with. All common arrays from HP, Dell, IBM, EMC, NetApp etc. are supported. VirtualSharp Software develops PowerShell scripts (Storage Adapters) which communicate with the storage array. For Site Recovery Manager, the storage vendor needs to develop the Storage Replication Adapter (SRA), which is not based on open standards. A Storage Adapter used by ReliableDR can be adjusted by the customer if needed.
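The adapter approach keeps the orchestrator independent of any particular array: it only needs "create a snapshot" and "list snapshots" operations that every supported array can provide through its API. The Python sketch below shows that plug-in idea as an abstract interface with an in-memory stand-in. ReliableDR's real adapters are PowerShell scripts; the class and method names here (`StorageAdapter`, `create_snapshot`, `list_snapshots`, `snapshot_service`) are hypothetical and only illustrate the design.

```python
from abc import ABC, abstractmethod


class StorageAdapter(ABC):
    """Hypothetical interface an orchestrator could use for any array that
    can create snapshots and exposes an API (method names are illustrative)."""

    @abstractmethod
    def create_snapshot(self, volume: str) -> str:
        """Snapshot a replicated volume and return the snapshot ID."""

    @abstractmethod
    def list_snapshots(self, volume: str) -> list[str]:
        """Return the snapshot IDs of a volume, oldest first."""


class InMemoryAdapter(StorageAdapter):
    """Stand-in adapter for testing; a real one would call the vendor's API."""

    def __init__(self) -> None:
        self._snapshots: dict[str, list[str]] = {}

    def create_snapshot(self, volume: str) -> str:
        snaps = self._snapshots.setdefault(volume, [])
        snap_id = f"{volume}@snap{len(snaps) + 1}"  # simple per-volume sequence
        snaps.append(snap_id)
        return snap_id

    def list_snapshots(self, volume: str) -> list[str]:
        return list(self._snapshots.get(volume, []))


def snapshot_service(adapter: StorageAdapter, volumes: list[str]) -> dict[str, str]:
    """Take one snapshot per volume through whichever adapter is plugged in."""
    return {volume: adapter.create_snapshot(volume) for volume in volumes}
```

Because `snapshot_service` only depends on the interface, supporting another array means writing one more adapter class, which mirrors why a customer can adjust a Storage Adapter without touching the rest of the product.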
The unique feature of ReliableDR is its ability to verify whether an IT-service can be restored according to the RTO and RPO should a disaster occur. It does not only check whether a server is operational by verifying the VMware heartbeat and pinging the operating system; it also runs so-called Business Rules inside the virtual machine which check whether the IT-service/application is operational. VirtualSharp Software supplies a set of predefined business rules, which are scripts that test common applications like SQL Server, Exchange, IIS, SharePoint, MySQL, Apache, Oracle etc. It is also possible to create your own business rules. The test results are shown on a dashboard and can be sent by email or as an SNMP trap. The image below shows the dashboard.
While the test runs, the elapsed time is measured to determine whether the recovery completes within the time defined in the RTO. If the tests succeed, the snapshot is marked as a Certified Recovery Point. Recovery verification jobs can be scheduled to run every hour, every day or on whatever schedule is needed.
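The verification job described in the two paragraphs above can be sketched as: run the checks in layers (heartbeat, ping, then business rules), time the whole run, and certify the recovery point only if every check passed within the RTO. The Python sketch below is a simplified illustration under stated assumptions, not ReliableDR's logic. `RecoveryPoint`, `verify_recovery_point` and `tcp_port_open` are hypothetical names; the heartbeat and ping steps are left as callables you supply, and the TCP probe runs from outside the VM, whereas ReliableDR's business rules run inside it.

```python
import socket
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# A step is a named callable that returns True (passed) or False (failed).
Step = tuple[str, Callable[[], bool]]


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Very simple application-level probe: does the service accept TCP connections?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out or unresolvable
        return False


@dataclass
class RecoveryPoint:
    snapshot_id: str
    certified: bool = False
    recovery_seconds: Optional[float] = None
    results: list[tuple[str, bool]] = field(default_factory=list)


def verify_recovery_point(
    point: RecoveryPoint,
    steps: list[Step],
    rto_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> RecoveryPoint:
    """Run the steps in order, measure the elapsed time and certify the
    recovery point only if all steps passed within the RTO."""
    start = clock()
    all_passed = True
    for name, step in steps:
        passed = step()
        point.results.append((name, passed))
        if not passed:
            all_passed = False
            break  # later layers assume the earlier ones are healthy
    point.recovery_seconds = clock() - start
    point.certified = all_passed and point.recovery_seconds <= rto_seconds
    return point
```

A hypothetical SQL Server step could be `("sql-port", lambda: tcp_port_open("db01.example.local", 1433))`; the `clock` parameter exists so the RTO comparison can be tested without waiting for a real recovery.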
ReliableDR cannot be used for failback once the protected site has been restored to normal operation. VirtualSharp Software believes failback should be a manual procedure and documents how to perform it.
Other noteworthy points:
-ReliableDR can be made operational for three IT-services in a single day.
-The IP configuration of virtual machines can be adjusted automatically when the recovery site uses a different IP subnet than the protected site.
-ReliableDR can be used with any edition of VMware vSphere, including Essentials and Essentials Plus. VMware SRM does not support these two cheapest editions of vSphere. This makes it possible to use a passive site with a limited number of ESX hosts running the Essentials edition, both for testing and for hosting business-critical applications during a disaster recovery.
-ReliableDR supports testing of applications running on physical servers as well. Novell PlateSpin Protect is used to create a virtual machine from the physical server (P2V). PlateSpin Protect is licensed separately.
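The automatic IP adjustment mentioned in the list above can be pictured as moving each address from the protected subnet into the recovery subnet. The Python sketch below assumes one common convention, keeping the host part of the address and swapping the network part between two subnets of equal size, using the standard `ipaddress` module. How ReliableDR actually maps addresses is not described here, so `remap_ip` and the example subnets are hypothetical.

```python
import ipaddress


def remap_ip(address: str, source_net: str, target_net: str) -> str:
    """Move an address from source_net to target_net, keeping its host part.

    Assumption: both subnets have the same prefix length, so every host
    offset in the source subnet also exists in the target subnet.
    """
    src = ipaddress.ip_network(source_net)
    dst = ipaddress.ip_network(target_net)
    ip = ipaddress.ip_address(address)
    if src.version != dst.version or src.prefixlen != dst.prefixlen:
        raise ValueError("source and target subnets must be the same IP version and size")
    if ip not in src:
        raise ValueError(f"{address} is not in {source_net}")
    host_offset = int(ip) - int(src.network_address)  # host part of the address
    return str(dst.network_address + host_offset)


if __name__ == "__main__":
    # Hypothetical protected subnet 10.1.20.0/24, recovery subnet 192.168.120.0/24.
    print(remap_ip("10.1.20.15", "10.1.20.0/24", "192.168.120.0/24"))  # 192.168.120.15
```

Keeping the host part is only one possible design; an explicit per-VM mapping table is the alternative when the two sites do not use subnets of the same size.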
For a demo of how ReliableDR works, see this movie.
Editions, pricing and licensing
ReliableDR is available in two editions: ReliableDR Foundation and ReliableDR Enterprise. The Foundation edition can only be used with one snapshot and does not include automated test scripts (Business Rules). ReliableDR is licensed per protected VM and can be purchased in batches of 10, 25, 50 or 100 VMs. Pricing is about the same as for VMware SRM, roughly between EUR 350 and EUR 400 per virtual machine. The more licenses are purchased, the lower the price per license. Support contracts for 24×7 support are available for terms of 1 to 3 years.