Continuity Software Sets The Bar High For Disaster Recovery Testing

David Hill

September 23, 2010


After a major IT operational problem, or even a disaster, getting applications working again with the data they need is vital to any public, private or government enterprise. Yet how many enterprises could honestly raise their right hands in response to the question: Are you very confident in your business continuity plan? Those that use Continuity Software's RecoverGuard solution (and maybe a few other enterprise solutions) could honestly answer yes.

Enterprises that do not adequately protect themselves run a high financial risk. Even though the probability of a disaster is relatively low, the associated costs can be astronomically high, so the expected value of the loss (the probability of a disaster multiplied by its expected cost) is likely to be substantial. Yet even if the expected value is not sky high, enterprises cannot take a bet that places their business at risk. Auditors should mandate effective disaster recovery planning, not only for risk management but also for compliance reasons.
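
The expected-value argument above can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `expected_loss` and the inputs (a 2 percent annual chance of a disaster costing $50 million) are assumptions chosen for the example, not figures from Continuity Software or any survey.

```python
def expected_loss(probability: float, cost: float) -> float:
    """Return the expected loss: probability of the event times its cost."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if cost < 0:
        raise ValueError("cost must not be negative")
    return probability * cost


# Hypothetical inputs, for illustration only.
annual_probability = 0.02      # assumed 2 percent chance of a disaster per year
disaster_cost = 50_000_000     # assumed total cost of one disaster, in dollars

loss = expected_loss(annual_probability, disaster_cost)
print(f"Expected annual loss: ${loss:,.0f}")
```

With these assumed inputs, the expected loss comes to about $1 million a year, a figure that a DR budget can be weighed against. The calculation still understates the case when a single disaster could put the business itself at risk.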

Still, even IT organizations with the money and people to perform high-availability (HA) and disaster recovery (DR) testing face numerous challenges. Testing tends to be intensive and manual, and it can disrupt online applications. Even annual or quarterly testing is likely to produce disturbing results, because failure rates tend to be high. Continuity Software reports a 75 percent testing failure rate because recovery configurations are "out of sync" with their production configurations. That sounds high, but I suspect that it is fairly accurate.

The reason for this is simple: IT infrastructures are very complicated. Implementing HA at a local operational site requires redundancy through technologies such as RAID, SAN multi-pathing and clustering. For DR, geographical redundancy requires a remote site with the necessary replication and fail-over capabilities.

That sounds fine, and on day one, when everything has been made to work properly, it probably is. However, IT solution architectures are dynamic, not static. The production environment at the local site changes constantly, for example when new applications are added, storage is re-provisioned or virtual machines (VMs) are created on physical servers. These changes are applied manually to both the local production HA systems and the remote DR systems. It's no surprise that some of the changes at the local site are not properly applied at remote sites. Consequently, it should come as no surprise, either, when recovery testing fails. Even if inconsistencies are found and fixed, a change made to the production environment the next day, week or month that is not replicated appropriately renders the DR/HA strategy virtually useless once again.
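
The kind of mismatch described above can be pictured as a comparison of two configuration snapshots. The sketch below is a generic illustration, not RecoverGuard's method: the function `find_drift`, the `MISSING` marker and the setting names are all hypothetical.

```python
MISSING = "<missing>"  # hypothetical marker for a setting absent on one side


def find_drift(production: dict, recovery: dict) -> dict:
    """Return settings that differ between the two sites.

    Each entry maps a setting name to (production value, recovery value).
    """
    drift = {}
    for key in sorted(production.keys() | recovery.keys()):
        prod_value = production.get(key, MISSING)
        dr_value = recovery.get(key, MISSING)
        if prod_value != dr_value:
            drift[key] = (prod_value, dr_value)
    return drift


# Hypothetical snapshots: storage was re-provisioned and a cluster node
# was recorded at the production site, but the DR site was never updated.
production_site = {"lun_count": 12, "multipath_paths": 4, "cluster_nodes": 2}
recovery_site = {"lun_count": 10, "multipath_paths": 4}

drift = find_drift(production_site, recovery_site)
for setting, (prod_value, dr_value) in drift.items():
    print(f"{setting}: production={prod_value} recovery={dr_value}")
```

Run after every production change rather than once a quarter, even a simple comparison like this would surface a mismatch long before the next recovery test.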

For companies struggling with these issues, Continuity Software's RecoverGuard rides to the rescue. It automatically scans and detects HA/DR-affecting configuration changes and identifies HA/DR vulnerabilities before failure. RecoverGuard does its work by matching what Continuity calls "risk signatures" to the type of problems that can occur. How complex is this process? RecoverGuard proactively scans for over 4,000 risk signatures. That level of detail means that problems can often be found right away rather than remaining hidden until the next recovery testing cycle, which could be as long as a year. Naturally, those signatures apply to a wide, heterogeneous range of IT infrastructure products.

Continuity's RecoverGuard includes other valuable capabilities. For ease of use, a recovery and availability dashboard lets the IT administrator know what is going on. The "analytical doors" offer insight into gap analysis, which contrasts what is with what should be. For example, RecoverGuard can reveal incomplete or inconsistent replication of production data to the DR site, invalid cluster configurations, drift between the configurations of production and DR servers and much more. Left uncorrected, a gap such as incomplete replication would result in data loss if a failure occurred and the DR site became the site that production applications run against.

Another sophisticated RecoverGuard feature is the detection of replication age inconsistency, where point-in-time copies are inconsistent between the production and DR systems, a condition that would result in data corruption. In addition, in the latest version of RecoverGuard, Continuity Software added new service level agreement (SLA) management capabilities, such as the ability to track violations.
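
The article does not describe how RecoverGuard performs the replication age check. One plausible reading, sketched here purely as an assumption, is that the point-in-time copies of all volumes belonging to one application should have been taken at (nearly) the same moment. The function names, the tolerance parameter and the volume names below are hypothetical.

```python
from datetime import datetime, timedelta


def copy_time_spread(copy_times: dict[str, datetime]) -> timedelta:
    """Return the gap between the oldest and newest point-in-time copy."""
    if not copy_times:
        raise ValueError("at least one copy timestamp is required")
    return max(copy_times.values()) - min(copy_times.values())


def is_age_consistent(copy_times: dict[str, datetime],
                      tolerance: timedelta = timedelta(0)) -> bool:
    """True when all copies were taken within `tolerance` of one another."""
    return copy_time_spread(copy_times) <= tolerance


# Hypothetical DR-site copies for a single database (illustrative values).
dr_copies = {
    "data_volume": datetime(2010, 9, 23, 2, 0),
    "log_volume": datetime(2010, 9, 23, 2, 15),
}

spread = copy_time_spread(dr_copies)
consistent = is_age_consistent(dr_copies, tolerance=timedelta(minutes=1))
print(f"spread={spread}, consistent={consistent}")
```

Here the log volume is 15 minutes newer than the data volume, so a database restored from this pair could see log records that do not match its data files, which is the sort of corruption the paragraph above warns about.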

Enterprises can't rely upon service providers to implement RecoverGuard. Companies that can benefit most from RecoverGuard tend to have large, topologically complex IT infrastructures. These organizations should be able to derive enough value to justify the cost of the software, and they should have the internal IT administration resources and skills to implement the product. Having those personnel resources is critical because third-party service providers may be reluctant to provide RecoverGuard as a service.

Now on the surface, RecoverGuard would seem to be the ideal product for a third-party IT services firm to employ. Such a firm's experience and knowledge could be used to provide RecoverGuard in numerous scenarios: as part of an ad hoc or ongoing service, or as a transfer-of-knowledge service that trains the customer to take over responsibility for running the product.

The challenge lies in developing a business model that satisfies Continuity Software, as well as the third-party service supplier, while providing a realistic price for the customer. Not surprisingly, Continuity Software wants to receive what it considers fair compensation for the value it provides, but that does not necessarily fit into the business model of service providers that want to receive the lion's share of revenues even though innovative software is a critical part of the deal.

In a heterogeneous world where products in the IT infrastructure are not limited to a very narrow set of vendors, Continuity Software seems to have the broadest solution to apply in the HA/DR world. This point is likely to become even more important as virtualization and cloud computing continue to dominate the conversation in the IT world, and as eliminating configuration errors becomes ever more necessary to providing robustness and reliability. Why so? In virtualized IT environments and cloud infrastructures, hidden IT layers and dependencies mean that configuration errors can happen more easily.

Since IT-as-a-service is the endpoint promulgated by VMware and other virtualization and cloud leaders, capabilities such as those provided by RecoverGuard will become more of a necessity than a nicety. So even though Continuity Software's nominal goal is to support recovery testing that results in no failures, the deeper impact is to enable clients' IT operations to run more smoothly and easily recover from both small and big problems. That scenario will certainly make IT happy, but it will also satisfy end users and the greater organization as IT lives up to its SLA commitments.

At the date of posting, Continuity Software is not a client of the Mesabi Group or David Hill.
