Why Are You Copying Your Data? Part 3 – Unification

In the last two blog posts of this series, I looked at ways that we copy data for protection and ways that we copy data to improve the business. Since we are making these copies of the same data for different purposes, it is worth considering how a single product could make them without a lot of redundant copying and storage. Each time we copy the production data we impact the production system, so minimizing that impact should result in a business benefit. The challenge is that the different reasons for copying data have very different requirements, so a single product for these needs will have to be flexible and feature rich.

Disclosure: This post is part of my work with Cohesity.

Requirements

Purpose	Data to restore	Restore characteristics	Special characteristics
Backup & Recovery	Recent	Granular; fast	On-site storage; frequent restore
Disaster Recovery	Recent	Workflow; full VM and application	Off-site storage; rare recovery, mostly for testing
Compliance	Many past copies	Very granular	Immutable copy; almost never recovered
Reporting	Latest	Immediate; application aware; automated	Daily or weekly restore
Test and Development	Latest	Immediate; data aware; multi-VM	Restore controlled by external workflow tools; multiple restores per day
Migration	Latest	Test workflow; failover workflow	Only used for duration of migration process

Desirable Features

To consolidate all of the copying functions into a single platform, we will need a few features:

Indexing with data and application awareness
Local storage for fast restores
Replication to another data centre or cloud for DR and migration
Replication to public cloud storage for compliance
High performance storage for DR, reporting, and test/dev use
Low cost storage for compliance and archive
Public cloud integration for recovery
An API for integration
Pre-built integration with common IT and application process automation

I have left out the basics that we need before even considering the platform: reliable and scalable storage, and integration with our hypervisor.

Implementation

Such a diverse set of requirements leads to an interesting platform design that looks a lot like a cost-efficient modern storage array, at least for the deployment into our data centre:

Tiered Storage

For reporting and test/dev we need performance. A relatively small amount of solid-state storage delivers that performance, while a large amount of hard disk capacity keeps costs down.
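To make the tiering idea concrete, here is a minimal sketch of a two-tier store. It is only an illustration under my own assumptions: the class name TieredStore, the least-recently-used eviction policy, and the tiny capacity are all hypothetical, and no particular product is being described. Every block lives on the large capacity tier, and recently read blocks are also kept in a small fast tier.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical two-tier store: small fast tier in front of a large slow tier."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()  # stands in for SSD; ordered oldest -> most recent
        self.slow = {}             # stands in for hard disk; holds every block
        self.fast_capacity = fast_capacity

    def write(self, key, block):
        self.slow[key] = block
        if key in self.fast:
            self.fast[key] = block  # keep any cached copy consistent

    def read(self, key):
        """Return (block, tier) so the caller can see which tier served the read."""
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key], "fast"
        block = self.slow[key]
        self.fast[key] = block  # promote on read (an assumed policy)
        if len(self.fast) > self.fast_capacity:
            self.fast.popitem(last=False)  # evict the least recently used block
        return block, "slow"


store = TieredStore(fast_capacity=2)
for key in ("a", "b", "c"):
    store.write(key, key.encode() * 4)

first = store.read("a")   # not cached yet, served from the capacity tier
second = store.read("a")  # promoted by the first read, served from the fast tier
```

The point of the sketch is the ratio: the fast tier only has to hold the working set of the reporting or test/dev copy, not the whole data set.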

Deduplication

Deduplication keeps both physical capacity and replication bandwidth under control. Long-term compliance storage could get very large without deduplication. A side benefit of deduplication is that only the metadata for each compliance point needs to be protected from modification; the deduplicated data is protected by definition.
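As a rough illustration of why deduplication saves capacity, here is a minimal sketch of a content-addressed chunk store. The names (DedupStore, put, get), the fixed-size chunking, and the tiny chunk size are my assumptions for the example; real platforms typically use more sophisticated chunking. Each copy is stored as a list of chunk hashes (the metadata), and each unique chunk is stored once.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, so the example is easy to follow


class DedupStore:
    """Hypothetical content-addressed store using fixed-size chunks."""

    def __init__(self):
        self.chunks = {}   # SHA-256 digest -> chunk bytes, one entry per unique chunk
        self.recipes = {}  # copy name -> ordered list of digests (the metadata)

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if not already present
            digests.append(digest)
        self.recipes[name] = digests

    def get(self, name):
        return b"".join(self.chunks[d] for d in self.recipes[name])


store = DedupStore()
store.put("monday", b"AAAABBBBCCCC")
store.put("tuesday", b"AAAABBBBDDDD")  # shares its first two chunks with monday

unique_chunks = len(store.chunks)
```

Two 12-byte copies end up as four unique chunks rather than six. Because every chunk is looked up by the hash of its content, a copy is defined entirely by its list of hashes, which is one way to read the point about only the metadata needing protection.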

Scale-Out architecture

Even with deduplication we expect capacity to grow over time. A scale-out architecture allows this growth to occur incrementally.

No matter where we deploy the platform, we want integration, simplicity, and efficiency.

Public cloud support

We want flexible options to use public cloud: as a destination for compliance data copies, as a location to run reporting and dev/test workloads, and as a source of business data that needs to be copied from cloud-native applications.

APIs everywhere

In order to integrate with existing reporting and dev/test tools, the data copying platform needs pre-built integrations with common platforms, with APIs as the fallback for anything those integrations do not cover. This might mean integration with Jenkins for CI/CD using copies of live data.

Logical copying

When we integrate all the purposes for copying data, we get a set of requirements that lead us to a virtualized copy platform. Each writable copy is a logical copy of the data rather than a full copy; changes are stored in snapshots or a deduplication system.
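A logical copy can be sketched as a copy-on-write clone over a read-only snapshot. This is a minimal illustration under my own assumptions (the Snapshot and WritableClone names and the block-list model are hypothetical), not a description of how any specific product implements it. Each clone stores only the blocks it has changed and reads everything else from the shared base.

```python
class Snapshot:
    """Read-only point-in-time copy, modelled as an immutable sequence of blocks."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)

    def read(self, index):
        return self._blocks[index]

    def __len__(self):
        return len(self._blocks)


class WritableClone:
    """Hypothetical logical copy: stores only the blocks that differ from the base."""

    def __init__(self, base):
        self.base = base
        self.delta = {}  # block index -> changed block

    def read(self, index):
        # Changed blocks come from the delta, everything else from the base.
        return self.delta.get(index, self.base.read(index))

    def write(self, index, block):
        if not 0 <= index < len(self.base):
            raise IndexError("block index out of range")
        self.delta[index] = block  # the base snapshot is never modified


base = Snapshot([b"orders", b"customers", b"invoices"])
dev = WritableClone(base)
report = WritableClone(base)

dev.write(1, b"masked-customers")  # a test/dev change stays in the dev clone
```

Here the dev clone holds one changed block while the reporting clone holds none, and both still present a full, writable view of the data.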

Could a single product satisfy all of your requirements to copy production data? Would that product deliver additional business benefits that an older backup application cannot provide?
