In the last two blog posts of this series, I looked at the ways we copy data for protection and the ways we copy it to improve the business. Since we make these copies of the same data for different purposes, it is worth considering whether a single product could make them all without a lot of redundant copying and storage. Every copy we take of production data impacts the production system, so minimizing that impact is a business benefit in itself. The challenge is that the different reasons for copying data have very different requirements, so a single product for all of these needs will have to be flexible and feature-rich.
Disclosure: This post is part of my work with Cohesity.
| Purpose | Data to restore | Restore characteristics | Special Characteristics |
|---|---|---|---|
| Backup & Recovery | Recent | Granular; full VM and application | Rare recovery, mostly for testing |
| Compliance | Many past copies | Very granular | Immutable copy; almost never recovered |
| Reporting | | Daily or weekly restore | |
| Test and Development | Latest | Immediate | Restore controlled by external workflow tools; multiple restores per day |
| Migration | | | Only used for duration of migration process |
To consolidate all of the copying functions into a single platform we will need a few features:
- Indexing with data and application awareness
- Local storage for fast restores
- Replication to another data centre or cloud for DR and migration
- Replication to public cloud storage for compliance
- High performance storage for DR, reporting, and test/dev use
- Low cost storage for compliance and archive
- Public cloud integration for recovery
- An API for integration
- Pre-built integration with common IT and application process automation
I have left out the basics that we need before even considering the platform: reliable, scalable storage and integration with our hypervisor.
Such a diverse set of requirements leads to an interesting platform design that looks a lot like a cost-efficient modern storage array, at least for the deployment into our data centre:
- Tiered Storage
For reporting and test/dev we need performance. A relatively small amount of solid-state storage delivers that performance, while a large amount of hard disk capacity keeps costs down.
- Deduplication
Deduplication keeps both physical capacity and replication bandwidth under control; long-term compliance storage could get very large without it. A side benefit of deduplication is that only the metadata for each compliance point needs to be protected from modification; the deduplicated data is written once and never rewritten, so it is protected by definition.
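To make the deduplication point concrete, here is a minimal sketch of a content-addressed chunk store. The class and chunk size are illustrative, not any product's implementation; the point is that identical chunks are stored once and each backup is just a list of chunk digests, which is the only part that needs immutability for compliance.

```python
import hashlib

class DedupStore:
    """Content-addressed chunk store: identical chunks are stored once."""

    def __init__(self):
        self.chunks = {}   # digest -> chunk bytes (shared, write-once)
        self.backups = {}  # backup name -> ordered digest list (metadata)

    def ingest(self, name, data, chunk_size=4):
        digests = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if it has never been seen before.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        # Only this metadata list needs immutability for a compliance point;
        # the shared chunks are never rewritten, so they are safe by design.
        self.backups[name] = digests

    def restore(self, name):
        return b"".join(self.chunks[d] for d in self.backups[name])

store = DedupStore()
store.ingest("monday", b"AAAABBBBCCCC")
store.ingest("tuesday", b"AAAABBBBDDDD")  # shares two chunks with Monday
print(len(store.chunks))                  # 4 unique chunks stored, not 6
```

Two twelve-byte backups need only four unique chunks here, and either backup can still be restored in full from its digest list.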
- Scale-Out architecture
Even with deduplication we expect capacity to grow over time; a scale-out architecture allows this growth to happen incrementally, one node at a time.
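One common technique behind incremental scale-out is consistent hashing: when a node is added, only the keys that map to the new node move, rather than everything being reshuffled. A minimal sketch, not how any particular product implements it:

```python
import hashlib
from bisect import bisect_right

def ring(nodes, vnodes=64):
    """Build a consistent-hash ring; each node gets many virtual points."""
    points = []
    for node in nodes:
        for v in range(vnodes):
            h = int(hashlib.sha256(f"{node}:{v}".encode()).hexdigest(), 16)
            points.append((h, node))
    return sorted(points)

def owner(points, key):
    """The node owning a key is the first ring point at or after its hash."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    i = bisect_right(points, (h, "")) % len(points)
    return points[i][1]

keys = [f"chunk-{i}" for i in range(1000)]
before = ring(["node-a", "node-b", "node-c"])
after = ring(["node-a", "node-b", "node-c", "node-d"])
moved = sum(owner(before, k) != owner(after, k) for k in keys)
```

Growing from three nodes to four moves only roughly a quarter of the chunks (the new node's share), which is why capacity can be added without a disruptive rebalance.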
No matter where we deploy the platform, we want integration, simplicity, and efficiency.
- Public cloud support
We want flexible options to use public cloud, as a destination for compliance data copies, as a location to run reporting and dev/test workloads, and as a source of business data that needs to be copied from cloud native applications.
- APIs everywhere
To integrate with existing reporting and dev/test tools, the data copying platform needs pre-built integrations with common platforms, plus APIs as a last resort for everything else. This might mean integration with Jenkins for CI/CD using copies of live data.
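As a sketch of what that API-driven integration might look like, here is the kind of request a Jenkins pipeline step could send to clone the latest backup of a VM for a test job. The endpoint path and field names are assumptions for illustration, not any real product's API:

```python
import json

def clone_request(base_url, vm_name, target_env):
    """Build the (hypothetical) REST call a CI job would send to the
    copy platform to get a short-lived logical clone of the latest copy."""
    url = f"{base_url}/v1/clones"      # assumed endpoint, for illustration
    body = {
        "source_vm": vm_name,
        "snapshot": "latest",          # a logical copy, not a full data copy
        "target": target_env,
        "expire_after_hours": 24,      # test copies are short-lived
    }
    return url, json.dumps(body)

url, body = clone_request("https://copyplatform.example", "erp-db-01", "jenkins-ci")
```

A pipeline would POST this body at the start of a test run and let the platform expire the clone afterwards, so test data never lingers.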
- Logical copying
When we bring together all the purposes for copying data, we get a set of requirements that lead us to a virtualized copy platform. Each writable copy is a logical copy of the data rather than a full copy; only the changes are stored, in snapshots or in the deduplication system.
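The logical-copy idea is essentially copy-on-write. A minimal sketch, with block storage reduced to a dictionary for illustration:

```python
class LogicalCopy:
    """Writable clone that stores only changed blocks (copy-on-write)."""

    def __init__(self, base):
        self.base = base    # shared, read-only snapshot: block id -> data
        self.delta = {}     # only the blocks this clone has overwritten

    def read(self, block):
        # Prefer the clone's own changes, fall back to the shared snapshot.
        return self.delta.get(block, self.base[block])

    def write(self, block, data):
        self.delta[block] = data   # the base snapshot is never modified

base = {0: b"boot", 1: b"data", 2: b"logs"}
dev_copy = LogicalCopy(base)
dev_copy.write(1, b"test")
print(dev_copy.read(1))     # b'test' -- served from the clone's delta
print(base[1])              # b'data' -- production snapshot untouched
print(len(dev_copy.delta))  # 1: only the changed block consumes space
```

A test/dev clone of a multi-terabyte dataset can therefore be created in seconds and consumes space only for the blocks it actually changes.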
Could a single product satisfy all of your requirements to copy production data? Would that product deliver additional business benefits that an older backup application cannot provide?
© 2019, Alastair. All rights reserved.