There are plenty of reasons to copy your production data. In my last blog post I talked about the reasons that protect against things going wrong. Today I want to talk about the more positive reasons to copy data: ways that data copies can make your business more productive and profitable. All of the data copies that we made in the last post were insurance; we are winning if we never need to access those copies. The positive reasons for copying data are all about making the data accessible immediately and getting value out of that immediate access. Insurance copies of data are all about durability and metadata searchability, while production copies are about performance and are often short-lived. There will be value in having a platform for managing these valuable data copies, but it will need some sophisticated capabilities to deliver business value.
Disclosure: This post is part of my work with Cohesity.
Reporting and Analytics
In databases we see optimization for different uses. On-Line Transaction Processing (OLTP) is the usual profile for a database behind a Line of Business (LoB) application. OLTP databases are optimized for new data writes and for random queries to support active business processes like stock control and order fulfillment. OLTP is very different from reporting, or On-Line Analytical Processing (OLAP), where the transaction data is summarized, maybe into daily, weekly or monthly sales performance reports. OLAP databases are copies of OLTP databases that are optimized for sequential read queries and often take no new data writes. It is possible to use the same database for both OLTP and OLAP, but the two usage types compete for resources, which leads to poor performance for the LoB application and then unhappy customers. Far more commonly the OLTP database is copied to the OLAP database server each day or week, before reporting is started.
Like all these positive-use data copies, the reporting copy needs to be available immediately and at full performance. For reporting, the copy needs to give great sequential read performance. The data copy for reporting only needs to contain the application data; it will usually be attached to a reporting server, so there is no need to copy the entire original application database server. Either data files need to be copied into the reporting server or copied data drives need to be attached. The copying and data mounting also need to be automated so that up-to-date application data is available from the reporting server at the same time every day, week or month. Usually we want the data copying to be part of the workflow that generates the new reports or analytics information. At the end of the workflow the copy can be disposed of to free up resources.
The main reporting data copy requirements are:
- Application aware data copy
- Automated mounting and unmounting
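To make the mount, report, dispose workflow concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: a plain directory copy stands in for attaching a snapshot or clone to the reporting server, and the names `mounted_copy`, `total_sales` and the `sales.csv` file with an `amount` column are all hypothetical. The point is the shape of the workflow, where the copy only lives as long as the report run.

```python
import contextlib
import csv
import shutil
import tempfile
from pathlib import Path


@contextlib.contextmanager
def mounted_copy(source_dir: Path):
    """Yield a throwaway copy of source_dir and remove it afterwards.

    Assumption: a directory copy stands in for mounting a data copy on the
    reporting server. A real platform would attach a snapshot or clone.
    """
    workdir = Path(tempfile.mkdtemp(prefix="report-copy-"))
    copy_dir = workdir / "data"
    try:
        shutil.copytree(source_dir, copy_dir)
        yield copy_dir
    finally:
        # Dispose of the copy even if the report fails, to free resources.
        shutil.rmtree(workdir, ignore_errors=True)


def total_sales(data_dir: Path) -> float:
    """Hypothetical report: sum the 'amount' column of sales.csv."""
    total = 0.0
    with open(data_dir / "sales.csv", newline="") as f:
        for row in csv.DictReader(f):
            total += float(row["amount"])
    return total


def run_reporting_workflow(source_dir: Path) -> float:
    """Mount a fresh copy, run the report against it, then drop the copy."""
    with mounted_copy(source_dir) as copy_dir:
        return total_sales(copy_dir)
```

A scheduler would call `run_reporting_workflow` at the same time every day, week or month, so the report always runs against a fresh copy and never touches the production data directly.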
Test and Development
Changes to production applications should be tested before they are deployed to production, to minimize the chance that we will break production. This test environment needs to reflect the production environment as closely as possible for best testing fidelity. The closest we can get is an isolated copy of production, which is a great reason to copy your entire production application environment. If we are copying production for testing, how about we also copy production to provide a development environment? Developing and testing using copies of production definitely improves the quality of application changes when they hit real production. The problem we have is that production tends to hold a lot of information about people inside and outside our organization. To protect those people’s privacy we shouldn’t just copy their information to places that don’t directly help them. For test and development copies of production data we need to do some scrubbing or obfuscation of private data. This definitely separates the beginners from the experienced data copiers. Ideally our data copy process would identify Personally Identifiable Information (PII) in the copied data and obfuscate it according to some rules. We might also want to present only a subset of our data to development to reduce storage cost, although using the full data set helps avoid some performance surprises when changes move from test to production.
The main test/dev data copy requirements are:
- Automated multi-VM copying
- Data cleansing around PII
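As a rough illustration of rule-based PII obfuscation, here is a small Python sketch. Everything in it is an assumption for the example: the column names in `PII_COLUMNS`, the simple email pattern, and the choice of salted SHA-256 hashing as the obfuscation rule. Hashing like this is pseudonymization and not full anonymization, so treat it as a starting point, not a compliance answer.

```python
import hashlib
import re

# Hypothetical rule set: columns that always hold PII.
PII_COLUMNS = {"name", "email", "phone"}

# Deliberately simple pattern for email addresses inside free text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonym(value: str, salt: str) -> str:
    """Replace a value with a short, repeatable token.

    The same input and salt always give the same token, so joins between
    tables still line up in the test copy.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def scrub_row(row: dict, salt: str) -> dict:
    """Return a copy of row with PII obfuscated according to the rules."""
    clean = {}
    for column, value in row.items():
        if column in PII_COLUMNS and value is not None:
            clean[column] = pseudonym(str(value), salt)
        elif isinstance(value, str):
            # Free-text columns: redact anything that looks like an email.
            clean[column] = EMAIL_RE.sub("<redacted-email>", value)
        else:
            clean[column] = value
    return clean
```

Keeping the tokens repeatable is a design choice: a customer that appears in both the orders and invoices tables gets the same token in each, so the application still behaves sensibly against the scrubbed copy.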
Migration
One very common, and difficult, task is to migrate from one infrastructure platform to another. It might be a data center consolidation project where workloads get migrated into a smaller number of data centers. It might be a best-of-breed virtualization platform that is being replaced with a new Hyperconverged Infrastructure. It might even be a lift-and-shift migration to public cloud. Often a Disaster Recovery product will be used to do this migration since it is like a planned DR failover. One important aspect of migration is that we are often copying a very large amount of data to a remote location, so a slow background copy is not uncommon. For slow data copies it is crucial that we do incremental copies, not a whole new data copy every cycle. Usually we also want to test the copied virtual machines and applications on the new platform before committing to migrating the production workload. For the final migration we need some workflow automation: a clean, ordered shutdown on the source side, then a final data copy, and then an ordered startup on the new platform.
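To show what "incremental" means in practice, here is a minimal file-level sketch in Python. It assumes both source and target are reachable as local paths and decides what to copy by comparing SHA-256 digests; the function names are made up for the example. Real migration and DR products usually track changed blocks so they do not need to read the remote side at all, so this only illustrates the idea of sending just what changed.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def incremental_copy(source: Path, target: Path) -> List[str]:
    """Copy only new or changed files; return their relative paths."""
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = target / rel
        # Skip files that already exist on the target with identical content.
        if dst_file.exists() and file_digest(dst_file) == file_digest(src_file):
            continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        copied.append(rel.as_posix())
    return copied
```

The first cycle copies everything, and each later cycle only moves the files that changed since the previous one, which is what keeps a slow background copy to a remote site workable.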
The main migration copy requirements are:
- Incremental copying
- Migration testing workflow
- Real migration workflow
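The real migration workflow boils down to ordering. Here is a tiny Python sketch of a cutover, with the actual work passed in as callbacks. The `cutover` function and its parameters are hypothetical, and shutting down in the reverse of the startup order is my assumption about what a clean ordered shutdown looks like (front ends stop before the database they depend on).

```python
from typing import Callable, List


def cutover(
    startup_order: List[str],
    stop_on_source: Callable[[str], None],
    final_copy: Callable[[], None],
    start_on_target: Callable[[str], None],
) -> None:
    """Run a migration cutover in three ordered phases.

    startup_order lists VMs in dependency order, e.g. database first.
    """
    # Phase 1: clean shutdown on the source, dependents first (assumption).
    for vm in reversed(startup_order):
        stop_on_source(vm)
    # Phase 2: one last copy, now that nothing is writing on the source.
    final_copy()
    # Phase 3: bring the workload up on the new platform in dependency order.
    for vm in startup_order:
        start_on_target(vm)
```

The migration testing workflow can reuse the same startup ordering against an isolated copy on the new platform, without the shutdown phase, so the production workload keeps running while we test.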
There are more great ways to get business value from copying your production data; these are the first few that come to mind. The characteristic that they all share is the need for immediate access to the copied data to deliver business value. We also see a lot of requirements to integrate data copying with workflow or automation to build out a full business process. I think that there is a lot of value in unifying the management of all these data copy functions, but this is quite enough for today.
© 2019, Alastair. All rights reserved.