It is common to hear the terms backup and archive mentioned together, which is not surprising since both technologies support primary data storage. However, the commonalities end there. I often encounter users who treat archive as analogous to backup. Simply put, backup and archive are not the same. In this post I will look at key differences between the two technologies.

First, let's get SNIA's view on backup versus archive. According to SNIA's online dictionary, the terms are defined as follows:
Backup: A collection of data stored on (usually removable) non-volatile storage media for purposes of recovery in case the original copy of data is lost or becomes inaccessible; also called a backup copy.
Archive: A collection of data objects, perhaps with associated metadata, in a storage system whose primary purpose is the long-term preservation and retention of that data.
SNIA states that backup and archive serve different purposes (recovery versus long-term preservation and retention), so let's review three key differences between the two solutions.
1. The Data
Backup: When backing up your data, you are protecting both active and inactive information, which together encompass all of your production data. As part of the process, you copy your vital information to a backup target such as disk or tape. It is critical to recognize that a backup is a copy of production information and that the actual data still resides on the production storage systems. Thus, if your backup system suffers a catastrophic data loss, your operations could still continue normally since your production data would not be impacted; however, you would be operating at an elevated risk.
Archive: Archive solutions solve a different problem. These technologies are typically used to maintain older or inactive data for extended periods of time. Archive systems typically move older or inactive information off of primary storage to dedicated systems that are optimized for low-cost, long-term storage. A key differentiator from backup is that the data stored in an archive is actual production data, and hence the loss of an archive system will result in permanent loss of production information. (To be fair, the information will likely be older and less active, but unlike backup, it is the only copy of the data.)
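To make the copy-versus-move distinction concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions: the function names (`backup_file`, `archive_file`, `is_inactive`) and the age-based inactivity rule are hypothetical and do not correspond to any particular backup or archive product.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_file(src: Path, backup_dir: Path) -> Path:
    """Copy src to the backup target; the original stays on primary storage."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves timestamps
    return dest


def archive_file(src: Path, archive_dir: Path) -> Path:
    """Move src to the archive; afterwards the archive holds the only copy."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / src.name
    shutil.move(str(src), str(dest))
    return dest


def is_inactive(path: Path, max_age_days: float, now: Optional[float] = None) -> bool:
    """Hypothetical policy: a file is inactive if not modified for max_age_days."""
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) > max_age_days * 86400
```

After `backup_file` runs, the data exists in two places, so losing the backup target raises risk but loses nothing. After `archive_file` runs, the data exists only in the archive, which is why the archive itself needs protection.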
2. Access
Backup: Backup applications have historically been optimized for large-scale recoveries. Backup data is written in large blocks to dedicated hardware like tape libraries or deduplication appliances. This format is optimized for accelerated access to large volumes of information. Backup systems are often configured to protect not just individual data objects, but also application and OS files. You can restore objects of all sizes with a backup system, but the process is optimized for larger scale: recovering a single file often takes about the same amount of work as recovering an entire server. In short, a backup application is the right tool to use if you want to recover an application or a complete system.
Archive: Archives are designed with very different access profiles. These systems typically store individual data objects such as files, databases or email messages and usually also capture metadata associated with each item. The result is that an archive can provide immediate, granular access to stored information, so accessing an individual file or email is typically very easy in an archive system. (The metadata component can even include full content search, which can further simplify access.) However, unlike backup systems, archives cannot provide full server or volume-level recoveries since they typically contain only a subset of enterprise data.
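As a rough illustration of why per-object metadata makes granular access easy, the following Python sketch models an archive as a small in-memory index. The class and field names (`ArchivedObject`, `ArchiveIndex`, `owner`, `kind`) are assumptions made for this example only; real archive platforms implement indexing and search in their own ways.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArchivedObject:
    object_id: str
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ArchiveIndex:
    """Toy archive: every object is individually addressable and searchable."""

    def __init__(self) -> None:
        self._objects: Dict[str, ArchivedObject] = {}

    def store(self, obj: ArchivedObject) -> None:
        self._objects[obj.object_id] = obj

    def get(self, object_id: str) -> ArchivedObject:
        # Direct retrieval of one item, with no need to restore a whole volume
        return self._objects[object_id]

    def find_by_metadata(self, **criteria: str) -> List[ArchivedObject]:
        # Return objects whose metadata matches every given key/value pair
        return [
            obj for obj in self._objects.values()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]

    def search_content(self, term: str) -> List[ArchivedObject]:
        # Naive case-insensitive full content search
        needle = term.lower()
        return [o for o in self._objects.values() if needle in o.content.lower()]
```

For example, after storing a few email messages, `index.find_by_metadata(owner="alice", kind="email")` returns just the matching messages. A backup system would typically have to locate and read a much larger backup set to retrieve the same item.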
3. Disaster Recovery
Backup: Disaster Recovery (DR) is a core component of backup, and most IT practitioners consider these two processes closely related. Typically, administrators run a backup job to protect their data and then another process to get their information offsite for DR purposes. (The offsite process typically includes either a copy to tape or a replication of backup data.) These processes are mature and are typically combined automatically into a single streamlined protection process.
Archive: The process of maintaining archive system DR can be complex and costly. Many customers rely on replication functionality embedded inside the archive platform for DR. The challenge is that most replication implementations are proprietary, so organizations are required to purchase two identical and often costly archive systems, one for the production environment and the other for the DR site. Furthermore, the ability to control replication, to roll back data to previous restore points and to manage bandwidth usage can vary widely depending on the archive system. Overall, the DR process for archive systems is very different from traditional DR.
In summary, backup and archive are two processes that solve very different problems. It is not uncommon to find customers using both in a complementary fashion. Backups are used as the primary method to protect corporate data and to enable large-scale recoveries when needed. Archives, in contrast, enable cost-effective retention of, and rapid access to, important information for compliance or cost-saving purposes. These two usage models have led backup and archive system providers to make very different design choices. However, this delineation frequently confuses customers, and I often find users who rely exclusively on backups as an archive. The unfortunate result is that accessing archive data becomes a complex and time-consuming process that still may not deliver what you need when you need it. Fortunately, Iron Mountain's Archival Tape Management solution can significantly ease this process; however, the best solution is










