In an article that appeared in The Register on 15th March 2018, sponsored by Fujitsu, Trevor Pott writes that most disaster recovery (DR) systems ‘suck.’ There are, he points out, plenty of them on the market, but he finds they are either “maddening or ruinously expensive.” He also observes that the storage industry is now focusing on ease of use, allowing new options to emerge; ease of use, he claims, is becoming a product dimension again. On that basis, he believes the storage industry will find it easier to compete as these new products appear.
To a degree, these products sound like a step forward, but there is another issue: many organizations don’t have an adequate disaster recovery plan. Many think they have one, but very few have a credible plan, and fewer still have tested it. Given this, people should drop the term ‘back-up and recovery’, because it should be solely about recovery. Backing up is really only a small part of the recovery tool set. For years, I have been preaching that your backup process is all about your recovery requirements.
Imperfect world
Pott adds: “In a perfect world, organizations of all sizes would also hold disaster recovery dear, with both back-ups and disaster recovery being engaged in by all. Unfortunately, many organizations feel that disaster recovery is either too expensive, too time consuming or both. It's neglected and even forgotten.” Sadly, such neglect can lead to financial penalties and the loss of customers, brand reputation and revenue. So, while disaster recovery might seem “too expensive,” the cost of recovering from downtime or a data breach can be far higher. It’s therefore wise to plan ahead for business and service continuity, to ensure that all commercial operations remain undisrupted.
I agree with Pott. Much of the Back-up-as-a-Service (BUaaS) industry is focused on making it easier to back up to the cloud via a cloud gateway. Organizations are under a massive illusion that once the data is handed to the gateway, the backup process is complete – that they are now safe. Yet many of these appliances use deduplication to shrink the files before transporting the data to the cloud. Until that process has completed and the data is in the cloud, the backup is incomplete.
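To illustrate the point, here is a minimal Python sketch of the block-level deduplication such appliances typically perform (the function names are mine, for illustration only, not any vendor’s API). Only chunks the cloud store has not seen before need to be uploaded – which is exactly why the backup is neither complete nor restorable until every unique chunk has actually arrived, and why a restore must fetch and reassemble all of them:

```python
import hashlib

def dedupe_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and identify each by its SHA-256 hash.

    Returns the ordered list of block hashes (the 'recipe' for the file)
    and a dict of the unique blocks that still need to be shipped to the cloud.
    """
    recipe = []          # hashes in file order, used later to rehydrate
    unique_blocks = {}   # hash -> block bytes: the only data sent over the WAN
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        recipe.append(digest)
        unique_blocks.setdefault(digest, block)
    return recipe, unique_blocks

def rehydrate(recipe, block_store):
    """Rebuild the original file from its recipe.

    Every referenced block must be fetched from the (remote) block store;
    if that store is in the cloud, restore speed is bounded by the WAN link.
    """
    return b"".join(block_store[digest] for digest in recipe)
```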
What is worse is that, without WAN data acceleration solutions such as PORTrockIT, recovery times from the cloud aren’t great. Restoring deduplicated files is painfully slow, yet this is precisely when you need speed. The issue is compounded if you have a slow, high-latency WAN link with packet loss – something that is skipped over in many storage vendor and BUaaS presentations.
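The well-known Mathis approximation for steady-state TCP throughput shows why latency and loss matter so much: a single stream is capped at roughly MSS / (RTT × √p), no matter how much bandwidth you have bought. A quick back-of-the-envelope sketch, with purely illustrative figures of my own rather than anything from Pott’s article:

```python
from math import sqrt

def tcp_throughput_mbps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Mathis approximation: max TCP throughput ~= MSS / (RTT * sqrt(p))."""
    return (mss_bytes / (rtt_s * sqrt(loss_rate))) * 8 / 1e6

# An 80 ms round-trip link with just 0.1% packet loss:
print(tcp_throughput_mbps(1460, 0.080, 0.001))  # ~4.6 Mbit/s per TCP stream
```

On that link a single TCP stream tops out at under 5 Mbit/s, even if the underlying pipe is 1 Gbit/s – which is the gap WAN data acceleration sets out to close.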
Having worked for many years with a myriad of software tools and instruments, I have concluded that if a tool is too complicated and confusing to use, people will avoid it – and in some circumstances actively work around it. A tool should be something you want to use because it makes your life easier and better.
Another problem with software such as a backup program is the lack of constant use, and therefore unfamiliarity with the product. Let’s face it: once it is set up, organizations should only be looking at recoveries or exceptions. That infrequent use means the people operating the solution often have to re-learn it over and over again. However, some providers have taken this on board; they have re-designed their product from a user’s perspective, not a developer’s.
WAN-Op dependency
Many organizations also depend on technologies such as WAN optimization and SD-WAN to ensure that, whenever they do have some back-ups in place, they can recover from any kind of disaster that impacts their service and business continuity. Yet these technologies don’t often live up to expectations on their own. Even SD-WANs ideally need a WAN data acceleration overlay to mitigate latency and packet loss over Wide Area Networks (WANs), and so such overlays should become part of any disaster recovery toolkit.
So, to what extent is Pott looking at the cost of disaster recovery wrongly? Well, Pott is cutting across three subjects: BUaaS, business continuity and disaster recovery. Back-up-as-a-Service has two differing requirements: the first is to create a data set for disaster recovery; the second is a data set for the day-to-day saving of files that may require restoring. Then there is business continuity, which some people think is the same as disaster recovery, but it is distinctly different.
Synchronized arrays
Business continuity is all about continuing to conduct business without a break when a service or system goes offline, by switching the workloads to a standby system. This is where most companies use synchronized arrays, either in the same data center or in geographically separated data centers. The type of synchronization determines how far apart these data centers can be.
The arrays must be in step, as trading platforms are, so they use synchronous replication, which imposes a limit of around 2ms of latency. That keeps the sites quite close together - too close for a real disaster recovery facility. Where the data on the two arrays can be a little out of step with each other, organizations can use asynchronous replication instead. Combined with WAN data acceleration to maximize the performance of the WAN, the distance can be extended to thousands of miles, with the data being only the latency of the WAN out of sync.
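The distance limit falls straight out of the speed of light in fibre. A minimal sketch, assuming the 2ms figure above is a round-trip latency budget and using the standard propagation figure of roughly 5 microseconds per kilometre of fibre:

```python
# Light propagates through optical fibre at roughly 200,000 km/s,
# i.e. about 5 microseconds per kilometre, one way.
US_PER_KM_ONE_WAY = 5.0

def max_fibre_distance_km(rtt_budget_ms: float) -> float:
    """Longest fibre path that fits inside a round-trip latency budget."""
    rtt_budget_us = rtt_budget_ms * 1000.0
    return rtt_budget_us / (2 * US_PER_KM_ONE_WAY)  # halve it: out and back

print(max_fibre_distance_km(2.0))  # ~200 km of fibre path
```

In practice, equipment latency and indirect fibre routes shrink that 200 km considerably, which is why synchronously replicated arrays tend to sit within a metro area, while asynchronous replication - accelerated across the WAN - can span continents.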
Optimal DR sites
Having two data centers in synchronicity with each other is not, in itself, a disaster recovery solution. What happens when you are hit with a cyber-attack that corrupts all your data across both sites? This is why you need a disaster recovery plan in place before a disaster occurs.
As part of this plan, DR data should be in three separate places - not the two that Pott advises. If you look at non-stop systems, they always have three nodes. The other aspect of DR that organizations fail to appreciate is the need for an “air gap” between your systems and your DR data. Cyber-criminals are getting clever at detecting online disk-based back-up systems and corrupting the back-ups as well. That last air-gapped copy is therefore your get-out-of-jail card when a cyber-attack strikes.
Denial of access
The last area to address is total disaster, where you lose, or are denied, access to the data center for some reason. Because most systems are now based on virtualized hardware, workloads no longer have to be recovered onto identical physical kit. These days, many companies are planning to use the cloud for data center DR, rather than the traditional DR facilities from the established players. The advantage of using the cloud as a DR facility is that it is a relatively low-cost option, although you still have to overcome the problems of latency and packet loss affecting the performance of your WAN link into the cloud. Duplicating data to two different cloud vendors, and keeping it up to date, improves your chances of riding out a cloud provider outage and lets you choose the lowest-cost option when you run up your DR site.
Insurance policy
Disaster recovery, or more to the point service continuity, needn’t cost the earth. Investing in WAN data acceleration, including WAN data acceleration overlays for SD-WANs, as well as backing up data to at least two disaster recovery sites, could be your organization’s best insurance policy. People often feel there is no need for insurance until an event occurs that leaves them without cover, forcing them to bear the physical, emotional, financial and sometimes even reputational costs of the disaster or incident.
This doesn’t mean that ease of use isn’t important. It is. But service continuity also requires investment in a plan that keeps the organization one step ahead of any event that could disrupt its ability to operate.