In his blog last week, David Chapa of Enterprise Strategy Group (ESG) posed a question: “Is DR the new Backup?” This understandably got a lot of backup people excited and generated a lot of thoughtful discussion. David upped the ante this week with “Stop Backing up Data.” His main point is the need to focus on business protection and to back up only the data that is critical for protecting the business. However, the provocative title of his later post had me thinking about a point of view in the industry that claims backup as a separate process may not be necessary at all.
Like David, I too spent time at Cheyenne Software, and subsequently earned my livelihood from backup at CA, Mimosa and now Iron Mountain. The very mention of this topic is going to give my friends an aneurysm and get them talking about naiveté, the invincibility of youth and so on.
I don’t know the context (I didn’t get invited to Japan!) of the #HDSday notion tweeted by Andrew Reichman (@reichmanIT) of Forrester that “backup has to die because it’s such a high operational cost.” I think many will agree that cloud backup can substantially reduce both the operational and infrastructure costs associated with on-premises backup. But let me take the point at face value and explore whether systems can be developed that eliminate backup.
Eliminating Backup for Exchange 2010
The foremost example of a system designed not to require backups is Microsoft Exchange 2010. For a business-critical system, Exchange 2010 interestingly went against conventional wisdom by advocating two bold ideas:
- Using direct-attached Serial Attached SCSI (SAS) or even SATA storage instead of requiring expensive SANs and disk arrays
- Using native mechanisms to protect Exchange, eliminating the need for third-party backup (or even Microsoft DPM) and the accompanying backup hardware and operational hassles
In hindsight, designing highly scalable storage systems using commodity DAS was similar to what Google, Facebook and others were doing and hence not as controversial, but we will leave that discussion for another time.
As a proof point, Microsoft IT has rolled out Exchange 2010 to support 515 office locations in 102 countries with more than 180,000 users, and their deployment eliminates backups completely (“Exchange Server 2010 Design and Architecture at Microsoft”). The key capabilities that support high availability in Exchange 2010 are:
- Database Availability Groups (DAGs), which replicate databases (at least 3 copies across multiple locations) using log shipping (shameless plug – Iron Mountain NearPoint also uses log shipping to capture data for backup and archiving)
- Lagged Database Copy (a remote copy that lags behind by a specified period before a log is applied to that database)
- Single Item Recovery
- Deleted Item Retention Policies
While a majority of Exchange 2010 customers back up their systems using third-party solutions, some customers seem comfortable adopting this backup-less approach. The feature that enables them to eliminate separate backup of Exchange 2010 is the lagged database copy, which prevents corruption from propagating to a copy if the corruption is detected within the lag period. That, plus a significant investment in an operational and support team.
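To make the lagged-copy idea concrete, here is a minimal sketch, in Python rather than Exchange terms, of log shipping with a replay lag. The class and variable names are my own illustration, not anything from Exchange; the point is only that logs held back for a lag window can be discarded before a detected corruption reaches the copy.

```python
from collections import deque

class LaggedReplica:
    """Toy model of a lagged database copy: shipped log records are
    queued and only applied once they age past the lag window."""
    def __init__(self, lag):
        self.lag = lag            # how many records to hold back
        self.pending = deque()    # shipped but not-yet-applied logs
        self.state = {}           # the replica's database contents

    def ship(self, record):
        self.pending.append(record)
        # apply any record that has aged past the lag window
        while len(self.pending) > self.lag:
            key, value = self.pending.popleft()
            self.state[key] = value

    def discard_pending(self):
        """On detecting corruption at the primary, drop unapplied
        logs so the bad records never reach this copy."""
        self.pending.clear()

primary = {}
replica = LaggedReplica(lag=3)

# normal writes reach the primary and, with a lag, the replica
for i in range(5):
    primary[f"msg{i}"] = f"hello {i}"
    replica.ship((f"msg{i}", f"hello {i}"))

# a corrupting write hits the primary and is shipped onward...
primary["msg0"] = "CORRUPTED"
replica.ship(("msg0", "CORRUPTED"))

# ...but it is still inside the lag window, so discarding the
# pending logs leaves the replica's copy of msg0 clean
replica.discard_pending()
assert replica.state["msg0"] == "hello 0"
```

Note the trade-off the sketch makes visible: discarding the pending logs also throws away the most recent good writes still inside the lag window, which is why the lag period and the monitoring that detects corruption both matter so much.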
Are there applications that are better suited to being Backup-less?
What if I don’t need to get back to an earlier point in time? Will replication suffice?
The challenge with replication is that it replicates errors and corruption just as efficiently as it replicates data. Organizations running applications with built-in asynchronous or time-delayed mechanisms for making copies across sites may be tempted to skip backup, but this is fraught with risk, and much will depend on the quality of the processes and people running operations.
The “belts and suspenders” approach (if the belt breaks, the suspenders will keep the pants on) of running a separate backup process is a best practice that has provided confidence in system recoverability for decades. While this introduces an additional process that can be operationally expensive, a separate and independent backup process improves recoverability. Redundant systems successfully address hardware failures, but a single software bug can take down every system if they all run identical software. Systems with different software implementations or different models protect you from these bugs.
A lot of things can go wrong – administrator mistakes, storage software bugs, propagation of corruption through replication, etc. An investment in an independent process for backup or disaster recovery gives you peace of mind for when that Black Swan event happens.
For applications with write-once-read-many (WORM) data characteristics and a separate process for disposition, one could imagine a replicated system that is backup-less. The expectation would be that a separate storage system can rigorously enforce the write-once semantics. However, a bug in the software enforcing the write-once storage, or in the disposition process, can still make data unrecoverable.
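The write-once enforcement itself is simple to state; a minimal sketch (the `WormStore` class is hypothetical, not a reference to any product) shows the contract, and also why everything hinges on this small piece of code being bug-free:

```python
class WormStore:
    """Toy write-once-read-many store: each key can be written
    exactly once; any attempt to overwrite is refused."""
    def __init__(self):
        self._data = {}

    def write(self, key, value):
        if key in self._data:
            # write-once rule: existing records are immutable
            raise PermissionError(f"{key!r} is immutable once written")
        self._data[key] = value

    def read(self, key):
        return self._data[key]

store = WormStore()
store.write("doc-1", b"original contents")

try:
    store.write("doc-1", b"tampered contents")  # second write refused
except PermissionError:
    pass

assert store.read("doc-1") == b"original contents"
```

If the single `if key in self._data` check (or its real-world equivalent in the storage or disposition software) is wrong, every replica faithfully carries the same corrupted or deleted data, which is exactly the failure mode an independent backup guards against.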
What about Tape?
The recent Gmail failure illustrates how replication can’t protect against storage software that has bugs and runs amok corrupting all the replicas. Disks from the same manufacturing batch can also develop problems and fail at the same time (see “JBOD versus RAID”). Google ultimately had to go to offline tapes, a different medium altogether, to recover user email (“Gmail back soon for everyone”).
Tape is well suited for long-term retention of data. If you need to keep data for a long time for governance, compliance or discovery reasons, tape will be a cost-effective part of your recovery system. Tape is also extremely durable: you can play Frisbee with tapes and still hope to recover data from them!
Can backup or disaster recovery be built into a system? I would say yes. Would it require a different process, technology and people? Yes to that one too. What do you think?
As one commenter on David’s blog said, “This decade will be interesting.”