There is a clear divide between what is possible with on-premises IT infrastructure and what is possible with public cloud services. On-premises infrastructure is finite, dedicated, under our direct control, and paid for up front even if we don’t use it. Public cloud infrastructure is effectively infinite but must be shared with other tenants of the public cloud. We have limited control of public cloud infrastructure, but we only pay for what we actually use. These differences mean that most organizations take a hybrid cloud approach, with some IT on-premises and some in one or more public clouds. One of the first infrastructure elements to be outsourced to the public cloud is cold data storage, and it is no coincidence that S3 was one of the first services that AWS offered. The two usual initial adoption models are tape replacement and tiering. Both of these adoption models treat cloud storage as an extension of on-premises backup storage.
Hierarchy of storage media
We usually think of storage media technologies as a hierarchy. RAM and NVRAM are very small, very fast, and very expensive, while fast solid-state drives are moderate-sized, fast, and expensive. Large SSDs are more like hard drives: high capacity and relatively slow. Large SSDs are moderately priced, while large hard drives are cheap. The largest and slowest storage medium is usually tape, which is also usually the cheapest per gigabyte of storage. Cloud storage adds another layer to the hierarchy: it is effectively infinitely large and extremely cheap but can be very slow to access. Cloud storage can often be used in place of tape and hard drives, at lower cost and with simpler operational processes.
Cloud storage as tape replacement
Ever since Iron Mountain was founded in 1951, businesses have been sending data physically off-site for safe storage. Initially, the value was that companies did not need to find space for their physical records. Later it became clear that keeping a second copy of electronic data ensured the records were still available if the first copy was destroyed, for example when an office building burned down. Often the first IT infrastructure use of cloud storage is as a replacement for physical tape. A full copy of a backup is copied to cloud storage, allowing the data to be recovered if the on-premises site is destroyed. As an extension of this tape replacement, cloud storage can also be used for archival storage, keeping compliance copies of production data at specific points in time. In both these use cases, the data is almost never restored from the public cloud to on-premises, but the ability to access the off-site data is crucial. In this model, both the on-premises site and the cloud hold a full copy of the data, so a failure at either location does not prevent full data access.
Cloud as overflow capacity
Public cloud can also be used when there is not enough low-performance capacity on-premises. Tiering to the public cloud is a valid option for large amounts of data that are seldom accessed, and last month’s backups are a classic example. Adding cloud storage as a tier in a backup system can allow effectively infinite capacity, reducing the risk of running out of capacity for backups. The big difference here is that data tiered to the cloud is not retained on-premises. To have access to the complete set of data, we must have access to both the on-premises storage and the cloud storage. If either location fails, then we no longer have complete data access.
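The difference between the two models is easy to see in a toy example. This is a minimal sketch in Python, not anything a backup product actually runs: the Backup class, the readable function, and the site names "onprem" and "cloud" are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    name: str
    locations: frozenset  # sites that hold a full copy of this backup


def readable(backups, failed_site):
    """Return the names of backups that can still be read when failed_site is down."""
    return [b.name for b in backups if b.locations - {failed_site}]


# Tape replacement: every backup exists on-premises AND in the cloud.
replicated = [
    Backup("this-week", frozenset({"onprem", "cloud"})),
    Backup("last-month", frozenset({"onprem", "cloud"})),
]

# Tiering: older backups are moved to the cloud and no longer kept on-premises.
tiered = [
    Backup("this-week", frozenset({"onprem"})),
    Backup("last-month", frozenset({"cloud"})),
]

for site in ("onprem", "cloud"):
    print(site, "down, replicated:", readable(replicated, site))
    print(site, "down, tiered:    ", readable(tiered, site))
```

With the replicated set, losing either site still leaves every backup readable. With the tiered set, losing either site always takes some of the backups with it.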
Mind your blast radius
Combining cloud tiering and cloud archives is a desirable proposition. It minimizes the on-premises storage purchase and maximizes the options for data protection. We can retain a lot of archive copies of data as well as plenty of backups, without needing huge amounts of on-premises storage. The risk comes when we place our archive alongside our cloud tier, in the same service and potentially in the same failure domain. If we choose to use AWS S3 for both tier and archive, then it would be prudent to have each in a different region. Since S3 is a regional service, an outage of S3 in one region should not impact S3 in another region. We might lose access to our archives or to our tiered data, but not to both at the same time. The more paranoid might choose to use a different public cloud provider, such as Azure or Google, rather than just another region.
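A quick sanity check on the blast radius can be written down in a few lines. This is a rough sketch that assumes a failure domain is simply a provider plus a region; the CloudTarget class and the isolation function are hypothetical names, and the regions are only examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudTarget:
    provider: str  # e.g. "aws", "azure"
    region: str    # e.g. "us-east-1"


def isolation(tier, archive):
    """Describe how well the tier and the archive are separated from each other."""
    if tier.provider != archive.provider:
        return "different provider"
    if tier.region != archive.region:
        return "different region"
    return "same failure domain"


tier = CloudTarget("aws", "us-east-1")

# Archive in the same region as the tier: one outage can take out both.
print(isolation(tier, CloudTarget("aws", "us-east-1")))
# Archive in another region of the same provider.
print(isolation(tier, CloudTarget("aws", "us-west-2")))
# Archive with a different provider altogether.
print(isolation(tier, CloudTarget("azure", "westus2")))
```

Anything that reports "same failure domain" is worth a second look before we rely on it for both the tier and the archive.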
Plan for Public Cloud Storage
As you consider your backup solution, expect that public cloud storage will be part of the future and make sure that you select products that allow multiple cloud storage options. More advanced uses of cloud storage include DR to the cloud, cloud migrations, and integration with other cloud services for Dev/Test workflows or reporting. Cloud archiving is a great use case, but make sure that tiering is an option in case of future capacity issues. Most of all, make sure you get a choice of public cloud storage vendors.
Public Cloud Archive and Tier with Cohesity
While I have been thinking about using public cloud with backup software for some time, this week I actually started using AWS as part of the backup for my lab. In this video, I show you the steps I took to get my Cohesity Virtual Platform configured to send archives to AWS Glacier and use an AWS S3 bucket as a tier for backup storage. Cloud archiving is immediate and is controlled through the protection policy. Cloud tiering only moves data that has not been accessed for 60 days and is controlled through the storage domain.
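To make the 60-day rule concrete, here is a small Python sketch of an age-based tiering decision. It is only an illustration of the idea and not how Cohesity implements tiering; the select_for_tiering function, its parameters, and the sample object names are all assumptions of mine.

```python
from datetime import datetime, timedelta


def select_for_tiering(last_access, now, threshold_days=60):
    """Return the names of objects not accessed for at least threshold_days.

    last_access maps an object name to the datetime it was last read.
    """
    threshold = timedelta(days=threshold_days)
    return sorted(name for name, ts in last_access.items() if now - ts >= threshold)


now = datetime(2019, 3, 1)
last_access = {
    "december-full": datetime(2018, 12, 1),     # untouched for 90 days
    "february-incremental": datetime(2019, 2, 20),  # read 9 days ago
}

# Only the backup that has sat idle past the threshold is a tiering candidate.
print(select_for_tiering(last_access, now))
```

Archiving, by contrast, needs no age check at all: the copy is sent to the cloud as soon as the policy says so, and the on-premises copy stays where it is.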
Disclosure: This post is part of my work with Cohesity.
© 2018 – 2019, Alastair. All rights reserved.