Issue link: https://epubs.iltanet.org/i/4636
www.iltanet.org Infrastructure Technologies
Revisiting Disaster Recovery in a Virtualized Environment

making it possible for the document to be added to the BC plan without any additional work.

DR TESTING

One of the key tenets of a DR plan is regular testing, which helps ensure the plan performs as anticipated in a true DR situation. However, with a near 24-hour operation at Davis, it would be a challenge to shut down primary systems and test the failover process on a regular basis. SRM allows DR plans to be run in test mode. Because SRM is tightly integrated with the SAN, NetApp's FlexClone is required to implement this feature. FlexClone provides a near-instantaneous writable clone of a volume without the need for additional disk space. This allows access to copies of the replicated VMs while leaving the original copies untouched and in their read-only state. Coupled with VMware virtual switches connected to an isolated VLAN, a fully usable test "bubble" can be created in real time. Virtualized Windows XP machines are added to the test network to verify the results of the recovery plan. Because this is a fully isolated network, testing can occur during business hours without affecting the production system.

BENEFITS

The full integration of VMware and NetApp products reduces both the complexity and the execution time of backing up, replicating and restoring VMs. Because testing is easy and nonintrusive, nontechnical individuals can frequently confirm that the DR plan is operational. Having only two vendors provide the end-to-end solution also simplifies troubleshooting when the plan is not operating as expected and minimizes potential finger-pointing on the part of the vendors.

CHALLENGES

Typical DR plans focus on a quick cutover to the DR systems; however, the process of failing back to the primary site is often complex and time-consuming.
Although one is on the roadmap, there is currently no automated or nondisruptive process for failback to the primary data center. Servers that are now running at the DR site must be individually taken offline, replication must be reversed and the servers must be put back into production at the primary site.

LESSONS LEARNED AND SHARED

We discovered a number of potential issues while going through our process that others can avoid with a few simple precautions.

• Have a clear understanding of each tool used in the DR plan, and of each tool's requirements and limitations, before designing the DR infrastructure
• Develop the entire IT DR plan before investing in storage
• Plan for small and large DR scenarios
• Ensure you factor in "usable" capacity when determining SAN storage requirements
• Ensure any SAN can easily scale storage in a cost-effective manner
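
The point about "usable" capacity can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not a vendor sizing method: the function name, its parameters and the sample figures are all hypothetical, and real overheads (parity, spares, reserved space) depend on the specific SAN and its configuration.

```python
def usable_capacity_tb(disk_count, disk_size_tb, parity_disks, spare_disks,
                       reserve_fraction):
    """Estimate usable capacity in TB from a shelf of raw disks.

    Every parameter is an illustrative assumption, not a vendor figure:
      parity_disks      disks consumed by RAID parity
      spare_disks       disks held back as hot spares
      reserve_fraction  share of the remaining space set aside
                        (for example, for snapshots), from 0 up to 1
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be at least 0 and below 1")
    data_disks = disk_count - parity_disks - spare_disks
    if data_disks <= 0:
        raise ValueError("no disks left for data after parity and spares")
    return data_disks * disk_size_tb * (1 - reserve_fraction)


# Hypothetical shelf: 24 disks of 1 TB, 2 parity, 2 spares, 20% reserved.
raw_tb = 24 * 1.0
usable_tb = usable_capacity_tb(24, 1.0, parity_disks=2, spare_disks=2,
                               reserve_fraction=0.2)
print(f"raw: {raw_tb:.1f} TB, usable: {usable_tb:.1f} TB")
```

Under these assumed figures, 24 TB of raw disk yields 16 TB of usable space, which is why storage requirements are better stated in usable terms before any purchase.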