It is not uncommon to see many different technologies in the data center, including disk, tape, SSD and even optical. Each of these solutions provides unique benefits from a performance, reliability and availability perspective. However, the one thing that frustrates me is when vendors (typically disk-centric ones) point to tape as a technology that is outdated and unreliable. It is clear that all technologies have their strengths and weaknesses, but poor reliability is not a weakness that should be associated with tape. In this blog post, I will walk through four reasons why tape is more reliable than disk.
1. Bit Error Rate
Bit error rate (BER) is a concept that sounds complex, but in reality isn’t. It is simply the number of expected erroneous bits compared to the total number of bits received. In the simplest terms, it quantifies the likelihood of a faulty bit as a fraction of the total number of bits written. The table below is taken from the LTO website and highlights how the BER of tape compares to that of disk storage.
As you can see, tape has at least a 10x better BER than the most expensive enterprise disks. However, since LTO is normally used for backup and archiving, SATA has historically been the disk alternative of choice, and in this case tape’s BER is 100x better! In summary, tape is between 10x and 100x less likely to write an incorrect bit than a comparable disk drive.
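To make those ratios concrete, here is a minimal sketch of the arithmetic. The BER values are assumptions picked only to match the 10x and 100x ratios described above, not figures copied from the LTO table, and the function names are hypothetical.

```python
# Illustrative bit error rates (erroneous bits per bit transferred).
# ASSUMPTION: these values are chosen only to reproduce the 10x / 100x
# ratios discussed in the text; substitute the vendor-published figures.
ASSUMED_BER = {
    "LTO tape": 1e-17,
    "enterprise disk": 1e-16,
    "SATA disk": 1e-15,
}

BITS_PER_TERABYTE = 8 * 10**12  # decimal terabyte


def expected_bit_errors(terabytes, ber):
    """Expected number of erroneous bits when moving `terabytes` of data."""
    return terabytes * BITS_PER_TERABYTE * ber


def terabytes_per_error(ber):
    """Average amount of data (in TB) transferred per erroneous bit."""
    return 1 / (ber * BITS_PER_TERABYTE)


if __name__ == "__main__":
    for medium, ber in ASSUMED_BER.items():
        per_pb = expected_bit_errors(1000, ber)  # 1000 TB = 1 PB
        print(f"{medium}: {per_pb:.4f} expected bit errors per PB, "
              f"about one error every {terabytes_per_error(ber):,.0f} TB")
```

The point of the exercise is that a 100x difference in BER translates directly into a 100x difference in expected errors for the same amount of data, which matters more as archives grow.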
2. Read After Write
There is a risk with any medium that what you want to write does not actually get written. Why would this “silent corruption” occur? Well, it could be caused by a number of things, but it is often related to an unexpected error in the storage hardware. LTO tape incorporates technology to directly address this risk. As described in this primer, LTO drives natively re-read data as soon as it is written to validate accuracy. This is powerful because the error checking occurs with every write without any need for user intervention. Furthermore, any errors experienced are addressed in real time. If you compare this to disk arrays, the situation gets murky; each disk vendor has a unique RAID algorithm which may or may not include verification. Some arrays may go back at a later time to “scrub” the disk for errors, but this delayed analysis lacks the real-time verification and correction that LTO provides. Simply put, when you write data to LTO you can be confident that the tape is accurately storing your information.
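The idea of read-after-write can be sketched in a few lines. This is only a conceptual model: real LTO drives do the verification in hardware and firmware, and everything here (the dictionary standing in for the medium, the `flaky_write` helper, the error rate) is a hypothetical stand-in for illustration.

```python
import random


def flaky_write(medium, index, block, rng, error_rate):
    """Store `block`, occasionally flipping one bit to mimic a silent write error."""
    stored = bytearray(block)
    if stored and rng.random() < error_rate:
        stored[0] ^= 0x01  # corrupt a single bit
    medium[index] = bytes(stored)


def write_with_verify(medium, index, block, rng, error_rate=0.2, max_attempts=10):
    """Write a block, read it back immediately, and rewrite until it matches.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        flaky_write(medium, index, block, rng, error_rate)
        if medium[index] == block:  # the read-after-write check
            return attempt
    raise IOError(f"block {index} failed verification after {max_attempts} attempts")


if __name__ == "__main__":
    rng = random.Random(42)
    medium = {}  # hypothetical stand-in for the storage medium
    blocks = [bytes([i]) * 16 for i in range(100)]
    attempts = [write_with_verify(medium, i, b, rng) for i, b in enumerate(blocks)]
    rewritten = sum(a > 1 for a in attempts)
    print(f"{rewritten} of {len(blocks)} blocks needed at least one rewrite")
```

The contrast with a delayed scrub is that here a bad write never survives past the write call itself: it is either corrected on the spot or reported as a failure.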
3. Data Storage is Abstracted from Data Access
A hard drive is a self-contained device that includes data storage (magnetic platters where information is physically stored) and access (drive heads that move to read/write information and motors to spin the platters and move the heads). A disk drive is presented to the world as a simple block device (through SATA, SAS or FC, typically), but this simplicity masks the complexity of the underlying technology and the microscopic tolerances required when manufacturing these items. All of the components inside the hard drive must operate in perfect harmony to access stored information, and if any of them fails then all the stored data on the medium is inaccessible. A failed drive can be repaired, but the process is extremely complex, time-consuming and costly, so most users bypass drive recovery and instead rely on RAID technology to solve the problem.
Tape is different because it separates data storage from data access. In the tape world, data is stored on removable tape media such as LTO-6, and the information is only readable when the media is inserted into a tape drive. This separation is noteworthy because it changes the reliability dynamics: a failed tape drive can easily be replaced while the underlying data remains intact. It also brings interesting compatibility benefits because you can upgrade the data access device (the tape drive) separately from the underlying data (the tapes). LTO enables this use case by providing read compatibility across three generations of the LTO format (e.g. an LTO-5 drive can read data from LTO-5, LTO-4 and LTO-3 media).
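When planning a drive upgrade, the compatibility rule is easy to check in code. The sketch below assumes the three-generation read rule exactly as stated above (the drive's own generation plus the two before it); the function names are hypothetical, and the rule should be confirmed against the LTO specification for any particular generation.

```python
def readable_generations(drive_generation, span=3):
    """Media generations a drive can read under the assumed rule.

    ASSUMPTION: a drive reads its own generation and the `span - 1`
    generations before it, as in the LTO-5 example in the text.
    """
    oldest = max(1, drive_generation - span + 1)
    return list(range(drive_generation, oldest - 1, -1))


def can_read(drive_generation, media_generation, span=3):
    """True if a drive of `drive_generation` can read the given media generation."""
    return media_generation in readable_generations(drive_generation, span)


if __name__ == "__main__":
    print("LTO-5 drive reads generations:", readable_generations(5))
    print("LTO-6 drive can read LTO-3 media:", can_read(6, 3))
```

A check like this is useful for spotting tapes in an archive that need to be migrated before the drives that can read them are retired.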
4. Bit Rot
Extended data retention is the norm in today’s IT environments. Yet as retention periods extend, users need to be cognizant of the long-term readability of their storage medium. Bit rot refers to the gradual degradation of magnetic media that can happen over time and can result in data corruption. Noted industry analyst W. Curtis Preston discusses this concept in a blog post entitled Tape more reliable than disk for long term storage. The long and short of it is that all magnetic media face the risk of bit rot, but disk faces a higher risk than tape due to its smaller magnetic particles and higher operating temperature. Thus customers can be more confident of the long-term viability of data stored on tape versus disk. This difference also helps explain why LTO tapes are stated to last 30 years while the average disk subsystem is replaced every five years.
In summary, disk and tape are complementary technologies that each provide strong benefits. Disk excels when it comes to random read and write performance, which is why most customers use it as a target for near-term backup and recovery operations. However, it is also clear that tape brings substantial benefits and is the optimal medium for longer-term data storage and preservation. I encourage you, as a user, to think carefully about whether you are using the right technology to solve your business challenges.
Chart source: LTO Website