Is your data protection driven by your data governance? You do have data governance policies, don’t you? Data governance policies should come from the business that generates the data, and they should identify how that data needs to be cared for and protected. That covers things like how often the data needs to be backed up, whether it must be copied off-site, archived for compliance, encrypted, protected from copying, or copied for other uses, and how long it should be kept before it is deleted. Once you have these data governance policies and you know where the governed data resides, you will know how to configure your data protection policies. Allowing the data governance policies to flow through to the data protection policies automatically will help significantly to ensure compliance.
Disclosure: This post is part of my work with Cohesity.
Automate Your Policies
How do you identify where governed data resides? Most often, the governed data is inside some physical server or VM. Commonly, the name of the computer is used to determine its function and the data governance that applies to it. With VMs, you get inventory folders and tags as well, for a more nuanced identification. Using a data protection platform that can decide to protect a server or VM based on name, folder, or tags will allow the governance policy to be applied automatically. By identifying that a VM contains governed data, we can apply the right data protection policy. Those same identifiers should also drive other elements of data management, such as replication for DR or encryption at rest. Ideally, you also control administrative access to the VMs through the same identifiers, so that as much data governance as possible is implemented through simple controls.
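To make the idea concrete, here is a minimal sketch of identifier-driven policy selection. It is not any particular product's API: the `VM` and `Rule` classes, the `select_policy` function, and the policy names (`gold`, `silver`, `bronze`) are all hypothetical, and I am assuming that the first matching rule wins.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence


@dataclass(frozen=True)
class VM:
    """Hypothetical inventory record: just the identifiers discussed above."""
    name: str
    folder: str = ""
    tags: frozenset = frozenset()


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: any criterion left as None is ignored."""
    policy: str
    name_glob: Optional[str] = None
    folder: Optional[str] = None
    tag: Optional[str] = None

    def matches(self, vm: VM) -> bool:
        # Every criterion that is set must match the VM.
        if self.name_glob is not None and not fnmatchcase(vm.name, self.name_glob):
            return False
        if self.folder is not None and vm.folder != self.folder:
            return False
        if self.tag is not None and self.tag not in vm.tags:
            return False
        return True


def select_policy(vm: VM, rules: Sequence[Rule], default: str = "unprotected") -> str:
    """Return the policy of the first matching rule (assumption: order is priority)."""
    for rule in rules:
        if rule.matches(vm):
            return rule.policy
    return default


# Illustrative rules only; real governance policies come from the business.
RULES = [
    Rule(policy="gold", tag="pci"),
    Rule(policy="silver", name_glob="sql-*"),
    Rule(policy="bronze", folder="Finance"),
]

if __name__ == "__main__":
    for vm in (VM("sql-01", tags=frozenset({"pci"})), VM("sql-02"), VM("web-01")):
        print(vm.name, "->", select_policy(vm, RULES))
```

Because the rules key off identifiers rather than a hand-maintained list of machines, a new VM that is named, filed, or tagged correctly picks up its protection policy without anyone editing a backup job.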
Schedule vs. Policy
There has always been a challenge with conventional backup products in aligning data protection with the data governance policy. The two use entirely different ways to express their requirements and capabilities. Data governance may say that no more than four hours of data loss is acceptable, while a backup schedule may only be able to list fixed times at which to start a backup. More recent backup products have moved to policy-based descriptions of data protection, where you can set a maximum interval between protection times and therefore a maximum data loss. The closer the language of your data protection policy is to the language of your data governance, the more likely you are to be compliant with the governance policy.
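As a rough illustration of the translation problem, the sketch below takes a schedule expressed as fixed daily start times and works out the longest gap between starts, which is what the governance policy actually cares about. The function names `max_gap` and `meets_rpo` are my own, and I am assuming the same schedule repeats every day.

```python
from datetime import time, timedelta
from typing import Iterable

MINUTES_PER_DAY = 24 * 60


def max_gap(start_times: Iterable[time]) -> timedelta:
    """Longest interval between consecutive backup starts in a daily schedule."""
    minutes = sorted(t.hour * 60 + t.minute for t in start_times)
    if not minutes:
        raise ValueError("schedule has no start times")
    gaps = [later - earlier for earlier, later in zip(minutes, minutes[1:])]
    # The schedule repeats daily, so include the gap that wraps past midnight.
    gaps.append(minutes[0] + MINUTES_PER_DAY - minutes[-1])
    return timedelta(minutes=max(gaps))


def meets_rpo(start_times: Iterable[time], rpo: timedelta) -> bool:
    """True if no gap between backup starts exceeds the allowed data loss."""
    return max_gap(start_times) <= rpo


if __name__ == "__main__":
    rpo = timedelta(hours=4)
    every_six_hours = [time(0), time(6), time(12), time(18)]
    business_hours = [time(8), time(12), time(16)]
    print(max_gap(every_six_hours), meets_rpo(every_six_hours, rpo))
    print(max_gap(business_hours), meets_rpo(business_hours, rpo))
```

The business-hours schedule looks like a four-hour schedule on paper, but the overnight gap is what decides whether it meets the governance requirement. A policy that states the maximum interval directly removes the need for this translation.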
Watch Out
Be aware that backup schedules do not necessarily account for the time it takes to complete a backup. Consider what happens if the protection action kicks off on time, four hours after the last protection started. If the protection task takes half an hour to complete, we do not have a new restorable protected copy until four and a half hours after the last protection point. A failure during that final half hour would lose more data than the governance policy allows. The good news is that if we take frequent, incremental backups, the extra time is trivial. Watch out for backups that take a significant amount of time to complete; you may need to make your protection policies more aggressive than your data governance suggests.
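The arithmetic is simple enough to sketch. The snippet below assumes that a protected copy reflects the data as it was when the backup started, and that the copy is only restorable once the backup completes. The function names are mine, not from any product.

```python
from datetime import timedelta


def worst_case_loss(interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Worst-case data loss: the gap between starts plus the time to finish the next copy."""
    return interval + backup_duration


def max_safe_interval(allowed_loss: timedelta, backup_duration: timedelta) -> timedelta:
    """Longest interval between backup starts that still honors the allowed data loss."""
    if backup_duration >= allowed_loss:
        raise ValueError("a single backup takes longer than the allowed data loss")
    return allowed_loss - backup_duration


if __name__ == "__main__":
    allowed = timedelta(hours=4)
    duration = timedelta(minutes=30)
    print(worst_case_loss(timedelta(hours=4), duration))  # four and a half hours
    print(max_safe_interval(allowed, duration))           # three and a half hours
```

With the numbers from the example above, a four-hour interval gives a worst case of four and a half hours, so the interval would need to come down to three and a half hours to stay within the governance policy.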
Compliance Reporting
There is no point in having data governance policies if you do not check that they are being implemented. Auditing for and reporting on compliance is central to the value of data governance policies. For the data protection part, you will want a platform that reports on non-compliance with the data protection policies. It should report each occasion that a protection policy is in breach, and also provide statistics on how much policy breach the data protection system experiences. Naturally, data protection breaches are about risk: how much potential for out-of-specification data loss did we have? But there should also be reporting on the tendency of protected systems to be in breach. If there are frequent small breaches, then there is likely to be a system-level issue, maybe an under-specified data protection system or an overloaded network. In any complex system, frequent small problems are often a harbinger of a much larger problem, which we should try to avoid.
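Here is a minimal sketch of what such a report could compute from a history of backup runs. The data shape (a list of start and completion times per protected system) and the function names are assumptions for illustration; a real platform would expose its own reporting.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Run = Tuple[datetime, datetime]                    # (started, completed)
Breach = Tuple[datetime, datetime, timedelta]      # (last point, next restorable, excess)


def find_breaches(runs: Sequence[Run], allowed_loss: timedelta) -> List[Breach]:
    """Return each occasion where the exposure window exceeded the allowed data loss.

    Assumes runs are sorted by start time and that a copy reflects the data
    as of its start but is only restorable once it completes.
    """
    breaches = []
    for (prev_start, _), (_, next_done) in zip(runs, runs[1:]):
        exposure = next_done - prev_start
        if exposure > allowed_loss:
            breaches.append((prev_start, next_done, exposure - allowed_loss))
    return breaches


def summarize(breaches: Sequence[Breach]) -> dict:
    """Aggregate statistics: how often we breached and by how much."""
    excesses = [excess for _, _, excess in breaches]
    return {
        "count": len(excesses),
        "total_excess": sum(excesses, timedelta()),
        "worst_excess": max(excesses, default=timedelta()),
    }


if __name__ == "__main__":
    t0 = datetime(2019, 1, 1)
    history = [
        (t0, t0 + timedelta(minutes=10)),
        (t0 + timedelta(hours=3), t0 + timedelta(hours=3, minutes=10)),
        (t0 + timedelta(hours=7), t0 + timedelta(hours=7, minutes=30)),
        (t0 + timedelta(hours=10), t0 + timedelta(hours=10, minutes=10)),
    ]
    print(summarize(find_breaches(history, timedelta(hours=4))))
```

The per-occasion list answers the risk question, while the summary is what you would trend over time to spot a system that breaches a little, often.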
The role of IT in data governance is to implement the policies defined by the part of the business that generates that data. Having your data protection policies automatically aligned to your data governance will simplify compliance.
© 2019, Alastair. All rights reserved.