There seems to be a fashion for renaming backup products as data management products. I think there are significant differences between data protection and data management; some products are not merely being renamed but are fundamentally different. It is worthwhile identifying the difference between products that were made for data protection and those that were designed from the start for data management. As always happens, I expect marketing departments to jump on board with the new name even when data management is not within the capabilities of their products.
What does Data Protection do?
The mission of a data protection product is to allow a data container to be recreated exactly as it was at some point in the past. What counts as a container depends a little on the protection product; it might be a whole VM, a database, or a file system. If the container is recreated in the same physical place as the original container, then we have a restore. If the container is recreated in another physical location, then we have disaster recovery. The implementation of this protection could be storage snapshots, metadata management on a deduplicated platform, copies of the container stored on a different storage system, or even copies of the container on tape that is physically shipped to another location. Data protection encompasses backup, recovery, disaster recovery, archiving, and data retention. All these use cases require the ability to recreate data exactly as it was at some point in the past. Restores and DR actions are always human initiated and probably infrequent; I would expect about one restore activity for every 100 backup operations.
What does Data Management do?
The mission of a data management product is to make data available for different purposes, not necessarily as an exact copy of the original data. When data is made available, it has a transform applied that suits the purpose. Both restore and DR are methods where the original state of the data needs to be retained, so a null transform is used. But there are other methods where the restored data needs to be modified by a transform. One modification is to mask Personally Identifiable Information (PII) in the live data and present the masked data for test and development. The purpose of masking is to allow non-production use without requiring production-level privacy protection. For many non-production uses the volume of data can also be reduced. It is much quicker to test functionality with only 1% of production data, not to mention that non-production systems are often resource constrained and so lower performing. Data management products should also be able to take data inside one type of container and translate it into another container. For example, data in a Microsoft SQL Server database in production might be converted to a MySQL database for testing, or a vSphere VM may be transformed into an AWS EC2 instance. Fundamentally, these transforms all require a more intimate knowledge of what data is inside the container, which is what I think data management is all about.
A result of using data for multiple purposes is that a lot of the access is set up and removed using automation. For example, as part of a Continuous Integration/Continuous Deployment (CI/CD) environment, a copy of the production data may be required several times a day when developers check in a new piece of code, so that the code can be tested against real data. There may be multiple restore events for each backup event. The data management tool needs to work at the speed of the automation tool rather than at the pace of a human operator.
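To make the idea of a transform concrete, here is a minimal sketch in Python. It is my illustration rather than how any particular product works: the record layout, the `PII_FIELDS` column names, and the function names are all hypothetical. It treats a restore as a null transform, and a test/dev copy as a subset followed by PII masking.

```python
import hashlib
from typing import Callable, Dict, List

Record = Dict[str, str]
Transform = Callable[[List[Record]], List[Record]]

# Hypothetical column names; a real product would discover these
# from its knowledge of what is inside the container.
PII_FIELDS = ("name", "email")


def null_transform(records: List[Record]) -> List[Record]:
    """Restore / DR: present the data exactly as it was protected."""
    return [dict(r) for r in records]


def mask_pii(records: List[Record]) -> List[Record]:
    """Replace PII values with a stable token.

    Unsalted hashing is for illustration only; it is not a
    privacy guarantee on its own.
    """
    masked = []
    for record in records:
        copy = dict(record)
        for field in PII_FIELDS:
            if field in copy:
                digest = hashlib.sha256(copy[field].encode("utf-8")).hexdigest()
                copy[field] = "masked-" + digest[:8]
        masked.append(copy)
    return masked


def subset(fraction: float) -> Transform:
    """Build a transform that keeps roughly `fraction` of the records."""
    step = max(1, round(1 / fraction))

    def transform(records: List[Record]) -> List[Record]:
        return records[::step]

    return transform


def present(records: List[Record], transforms: List[Transform]) -> List[Record]:
    """Apply the transforms in order and return the presented copy."""
    for transform in transforms:
        records = transform(records)
    return records
```

Under these assumptions, a restore would be `present(data, [null_transform])`, while a test copy would be something like `present(data, [subset(0.01), mask_pii])`. The only thing that changes between the two uses is the list of transforms, which is the distinction I am drawing between protection and management.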
How much Data Management can you add to Data Protection?
Data protection product development often involves becoming more aware of the contents of data containers, effectively allowing smaller containers. An example is a backup product that is aware of databases inside a disk image backup. Now it is possible to restore individual databases, rather than needing to restore entire VMs to recover a single database. The challenge becomes managing the metadata for the backups and recovery. An approach that suits hundreds of backups and recoveries per month is unlikely to satisfy an environment with millions of files and database tables that might have thousands of restores per week.
Can Data Management do Data Protection?
It should now be clear that data management is largely a superset of data protection, albeit one where protection is a use case that requires a specific implementation of data management. Protection essentially requires a copy of the valuable (production) data that is isolated from the original, so that if whatever holds the primary copy is lost, there is still a copy to restore from. This is why snapshots on the primary storage are not a complete data protection strategy, even though they can be used for data management. Some data management implementations are entirely in the primary storage and so do not inherently provide backup and recovery. Most data management solutions allow two or more copies of data, usually replicating to a second storage device that is either exactly the same as the primary or a dedicated secondary device. Some data management solutions even use a tiered structure: copy first from primary to a secondary device at the same site, then replicate to a secondary device at another location, which might be a public cloud. Once that architecture is set up, data protection is simply another method of data access, one where no transform is applied to the data. There are customers for whom the flexibility of a data management platform is sufficient that using it just for data protection offers value. Other customers derive even more value by using the additional data management functions to enable new data uses such as CI/CD, reporting, or even data warehouse or AI functions.
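The point about isolated copies can be sketched in a few lines of Python. This is a toy model with hypothetical names (`DataCopy`, `pick_restore_source`) and a tier numbering I have assumed for illustration, not a description of any vendor's architecture. A restore source must be a surviving copy that does not live on the primary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCopy:
    location: str           # hypothetical label for where the copy lives
    tier: int               # assumed: 0 = primary, 1 = local secondary, 2 = remote
    available: bool = True  # False once the device holding the copy is lost


def pick_restore_source(copies: List[DataCopy]) -> Optional[DataCopy]:
    """Return the nearest surviving copy that is isolated from the primary."""
    candidates = [c for c in copies if c.available and c.tier > 0]
    return min(candidates, key=lambda c: c.tier, default=None)
```

With only a tier 0 entry, which is the snapshots-on-primary case, the function returns `None`: there is nothing isolated to restore from. Add a local secondary and a remote copy and the restore falls back through the tiers as devices are lost.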
I have heard from data protection companies that they can unlock the value of the data that is inside your backups, but I have seen little evidence that this is possible. I have seen demonstrations of data management products delivering business agility by making the data inside backups available for other purposes. In the future, I expect to see data management companies replace many data protection companies by delivering far higher value to their customers. Do not be surprised if your backup software now calls itself data management, but do ask how the vendor handles the data transformations that make data management more valuable than just a restore of some data container.
Disclosure: This post is part of my work with Cohesity.
© 2018 – 2019, Alastair. All rights reserved.