Why Are You Copying Your Data? Part 1 – Bad Things Happen

Enterprise IT organizations like to have multiple copies of every piece of data, but every copy we store has a cost. It is vital that you know why you are making a copy of your data and that you choose the right place and product to store that copy. Traditionally, we made copies of data because bad things could happen, and that is the focus of this post. There are a few different categories of ways that things can go wrong, each with different requirements for the data copies. The good things that can happen when you make copies of data will be the subject of another post. There are also considerations when you want to use a single platform for all of your data copying; that may end up being another blog post too.

Disclosure: This post is part of my work with Cohesity.

People Make Mistakes

The top reason for making a copy of data is to enable recovery of that data after a human error; this is the backup use case for copying data. We back up our data (or virtual machines) regularly so that we can restore files or VMs after someone accidentally deletes or corrupts the original. When restore time comes, we need to be able to select the broken element from a list of protected objects (files and VMs). Restores are usually for a tiny percentage of the protected items, so a well-indexed copy is essential. Restores are usually also urgent, so an immediately accessible copy is crucial. The final point is that restores are almost always from one particular recent copy, generally one made within the last seven days.

Restores require data copies that are recent, indexed, and rapidly and granularly accessible.

Big Bad Stuff Happens

Sometimes the bad thing that happens is more significant, maybe an act of nature such as a flood or hurricane, and the entire data center is knocked out. The primary copy of our data is inside that data center, so we need a copy in another data center that we can use in place of the damaged one. This is the Disaster Recovery (DR) use case for data copies. The data copy for DR needs to be off-site, outside the primary data center. Only the most recent copy needs to be kept for DR; there is no sense in recovering to last week’s data. The DR copy also needs to be in a state where we can recover a large number of workloads very fast, perhaps a whole data center’s worth of VMs within six hours. Usually, DR data copies need to be on fast storage that can be used as production storage for the recovery. Most often, the DR data copy is made by replicating the production data to an identical storage platform, usually with the storage platform’s built-in tools. It might be storage array replication or the replication in an HCI.

Disaster Recovery requires the latest copy of data that is held off-site and rapidly restorable in bulk.

People Are Bad

There are times when we must prove that a bad thing did or did not happen. We might need to prove whether a staff member made an inappropriate comment in an email, or find the (possibly deleted) financial records of fraudulent activity. Perhaps our company is accused of these things and needs to find the documents that prove us innocent. Archives are copies of data from a past time that we can use to know what happened. Archives are held for a long time, usually between seven and 120 years depending on the industry and legal requirements. Archives must also be immutable; the archive copy should be guaranteed to be unchanged since the moment it was created. Archives are usually made less frequently than backups; often only weekly or monthly copies are required. As with backups, recovery from archives tends to be of a specific small subset, but unlike restores from backups, there is less time pressure for recovery from archives. When the time comes to show proof, it is easier if each item can be shown individually, rather than having to restore entire applications or VMs.

Archives require many complete and protected copies of data taken at specific times, which are indexed or indexable and granularly recoverable.

Why Are You Copying Your Data?

We can help protect our businesses from bad things by using data copies in the form of backups, DR, and archives. These are all data copies that we must have and that we hope we never need to use. In the second blog post about copying data, I will look at some data copies that we want to have because they enable positive activities.
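As a recap, the three requirement summaries in this post can be written down as a small checklist. What follows is only a rough sketch in Python, not any product's API: the names DataCopy, REQUIREMENTS, and unmet are made up for illustration, and reducing each requirement to a yes/no flag is a simplifying assumption (recency and retention periods are left out entirely).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataCopy:
    """Hypothetical description of one copy of data (illustrative only)."""
    off_site: bool          # held outside the primary data center
    indexed: bool           # individual items can be found in the copy
    immutable: bool         # guaranteed unchanged since creation
    granular_restore: bool  # single files or VMs can be restored
    bulk_restore: bool      # many workloads can be recovered quickly


# Assumed mapping of each use case to the properties the post calls for.
REQUIREMENTS = {
    "backup": {"indexed", "granular_restore"},
    "dr": {"off_site", "bulk_restore"},
    "archive": {"indexed", "immutable", "granular_restore"},
}


def unmet(copy: DataCopy, use_case: str) -> List[str]:
    """Return the required properties that this copy does not provide."""
    return sorted(r for r in REQUIREMENTS[use_case] if not getattr(copy, r))


# Example: a replicated storage array at a second site.
replica = DataCopy(off_site=True, indexed=False, immutable=False,
                   granular_restore=False, bulk_restore=True)

print(unmet(replica, "dr"))       # nothing missing for DR
print(unmet(replica, "backup"))   # lacks indexing and granular restore
print(unmet(replica, "archive"))  # also lacks immutability
```

The point of the sketch is that one copy rarely satisfies every use case: the replica above fits DR but falls short as a backup or an archive.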

© 2019, Alastair. All rights reserved.

About Alastair

I am a professional geek, working in IT Infrastructure. Mostly I help to communicate and educate around the use of current technology and the direction of future technologies.