One of the central themes of the “Architecting on AWS” course I teach is that the highest value comes from using the specific capabilities of AWS services. Simply lifting your software onto EC2 instances is unlikely to give you a great result. Consequently, I am very interested in stories of how on-premises products have been re-platformed to be cloud-native. A little while ago, I had the opportunity to sit down at the Pure Storage office in Mountain View and hear how their hardware arrays became a cloud platform. We looked at the dual-controller FlashArray product in our Build Day Live event with Pure; you can watch those videos here. Soon you will get all of the goodness of the on-premises FlashArray in a cloud-deployed form. I was very impressed that the Pure Storage team chose to use the native features of the AWS platform to deliver the same features as their on-premises hardware.
In my last blog post, I took a look at using your Cohesity cluster to host file shares, which Cohesity calls Views. I finished with the point that these file shares might hold valuable data and need to be protected against data loss. Your other data is protected by copying it from its original location to your Cohesity cluster, but a failure that affects Views on a Cohesity cluster is likely also to affect the data protection copies on that cluster. The good news is that we already know how to protect against a Cohesity cluster failure, and we need only apply the same protections to Views. We can archive to the public cloud, replicate to another on-premises Cohesity cluster, or replicate to a Cohesity cluster in the public cloud. The Protection Policy for your View should include archiving or replication so that the data survives even if your Cohesity cluster is destroyed. Bear in mind that a cluster outage is likely to result from human error or some significant disruption such as a data center flood or fire.
If you had built a scale-out storage cluster, why would you not make it usable as a filer? One thing that we know for sure is that data storage requirements will keep growing. A scale-out storage platform that allows you to buy capacity progressively as your needs grow might be a great solution. Having integrated data protection will be a great bonus. The Cohesity name for file sharing from their cluster is Views. A View can be an NFS or SMB file share and can also be an S3-compatible object store. The default is for a View to be all three, although in this mode the S3 store is read-only, with objects ingested via the file share interfaces. If a View is set up as S3 only, then it is read/write accessible via S3.
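Because an S3-only View looks like a bucket, any S3-compatible tooling should be able to read from it. Below is a minimal sketch using the AWS Tools for PowerShell; the cluster endpoint, port, bucket name, and keys are placeholders I have invented, and `-EndpointUrl` support should be confirmed against your module version.

```powershell
# Sketch: list objects in an S3-only Cohesity View via the AWS.Tools.S3 module.
# All names, keys, and the endpoint URL/port below are placeholder assumptions.
Install-Module -Name AWS.Tools.S3 -Scope CurrentUser

# Credentials for the View come from the Cohesity UI; these are placeholders
Set-AWSCredential -AccessKey '<view-access-key>' -SecretKey '<view-secret-key>'

# Point the S3 cmdlet at the cluster's S3 endpoint instead of AWS
Get-S3Object -BucketName 'my-view' -EndpointUrl 'https://mycluster.example.com:3000' |
    Select-Object -Property Key, Size
```

The same pattern works from any S3 SDK; only the endpoint URL and credentials change.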
Early in my career, many years ago, I was the sole IT person for a small call center business. As the only IT person, I was also responsible for some of the phone system, and everything else that plugged into the wall (including a mechanical letter folder). Of all the varied tasks I had to complete, the one I hated was preparing the monthly report on IT and phone performance. I had to assemble lots of statistics from a variety of sources into the standard report form. Like all management reports, only the first paragraph was ever read. I would spend a day and a half putting together the report and the graphs with the sure knowledge that my work would have no measurable impact anywhere in the organization. I would have loved a tool that would automate the data compilation process and output the graphs and report document for me. At Tech Field Day 19, I learned that there is a whole category of products dedicated to freeing people like me from the tedium of creating those reports. Tech Field Day disclaimer link. Robotic Process Automation (RPA) is the category of products that automate repetitive manual tasks.
Is your data protection driven by your data governance? You do have data governance policies, don’t you? Data governance policies should come from the business units that generate the data, and should identify how that data needs to be cared for and protected: how often it needs to be backed up, copied off-site, archived for compliance, encrypted, protected from copying, copied for other uses, and how long it should be kept before it is deleted. Once you have these data governance policies and you know where the governed data resides, you will know how to configure your data protection policies. Allowing the data governance policies to flow through to the data protection policies automatically will help significantly to ensure compliance.
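To make the flow from governance to protection concrete, here is a hypothetical sketch in PowerShell. The classification names and settings are invented for illustration; they are not drawn from any specific product, but show how a governance classification can mechanically determine the protection settings applied to the data.

```powershell
# Hypothetical mapping: governance classification -> protection settings.
# Every name and number here is an invented example, not a real policy.
$governance = @{
    'Financial-Records' = @{ BackupEveryHours = 4;  RetainDays = 2555; ArchiveOffsite = $true  }
    'Staff-HomeDrives'  = @{ BackupEveryHours = 24; RetainDays = 90;   ArchiveOffsite = $false }
}

# Derive a protection policy for each governed data class
foreach ($class in $governance.Keys) {
    $settings = $governance[$class]
    Write-Output ("{0}: backup every {1}h, retain {2} days, archive off-site: {3}" -f
        $class, $settings.BackupEveryHours, $settings.RetainDays, $settings.ArchiveOffsite)
}
```

The point of the sketch is that once the mapping exists, the protection policy is a derived artifact; changing a governance rule changes the protection configuration without a human re-interpreting it.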
The best automation is one that is built into the platform or product. However, each business is unique, so there is also a need for custom automation. Today I will dig into the beginnings of writing PowerShell automation against a Cohesity cluster. I use PowerShell for a lot of system administration automation, since I come from a Windows-centered background and use PowerCLI to manage vSphere environments. Cohesity has a PowerShell module that is distributed through the PowerShell Gallery, making installation simple. I spent a while looking at the basics and example scripts, then extended some examples to fit what I wanted a little better. The resulting script is by no means ready to do anything useful; that will need to wait for a later post.
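Getting started looks roughly like the sketch below. The cluster address is a placeholder, and the cmdlet names reflect my reading of the module as published in the PowerShell Gallery; verify them on your system with `Get-Command -Module Cohesity.PowerShell`.

```powershell
# Sketch: first steps with the Cohesity PowerShell module.
# The cluster name is a placeholder; cmdlet names should be verified
# against your installed module version.

# One-time install from the PowerShell Gallery
Install-Module -Name Cohesity.PowerShell -Scope CurrentUser

# Authenticate against the cluster VIP; prompts for credentials
Connect-CohesityCluster -Server mycluster.example.com -Credential (Get-Credential)

# List protection jobs as a starting point for further automation
Get-CohesityProtectionJob |
    Select-Object -Property Name, Environment |
    Format-Table -AutoSize
```

From here, the usual pattern is to pipe the job objects into whatever reporting or remediation logic you need, just as you would with PowerCLI objects.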
It has been over a year since my last full Tech Field Day event, and I’m delighted to be going to Silicon Valley next week for Tech Field Day 19. I am particularly looking forward to a mix of old friends and new ones, both delegates and presenting companies. VMware and Dan Frith are both friends through many events, while Automation Anywhere and Marina Ferreira are both new to me. In between are friends and companies that I know a little and will learn more about next week.
Business process automation is an essential part of what IT delivers, and central to allowing IT to deliver business automation is using IT automation. You may recall lots of vBrownBag content around “Automate all the things” a few years ago: plenty of help using PowerShell to automate your vSphere environment, and more recent content helping you learn to automate with Python. The thing is that the best automation is the one you don’t write. A vendor with an open API that allows customers to develop automation is fantastic, and now an open API is table stakes for products in private and public cloud deployment. Unfortunately, some vendors stop with an API and expect their customers to develop all of their own automation using that API. If a large number of your customers are writing precisely the same automation, then your product is missing the feature that removes the need to write the automation. For example, if most of your customers need to write reports that show the availability and performance of your product, then your product should have those reports built in. Customers are far better served with features built in, rather than lots of duplicated effort to build these features on top of your API.
I had a great chat with a Zerto customer while we were eating lunch at ZertoCon last week. The customer is an MSP that offers Backup and DR as services to their customers; the DR as a Service (DRaaS) offering is delivered with Zerto. MSPs are a large market for Zerto; they put their own brand on the Zerto product and provide DRaaS to their customers. The comment from the MSP’s DRaaS engineer was that when Zerto reports a problem, there is a real problem, rather than a Zerto software issue. The MSP’s Backup as a Service product uses a different vendor’s product, and the MSP frequently has support calls open with that vendor to resolve problems with the backup product. The DRaaS engineer I chatted with could not believe how often the backup team had to open support requests with the backup vendor; he seldom, if ever, opens tickets with Zerto. For enterprise customers, this reliability translates into less staff time needed to deliver consistent DR using Zerto.
The headline feature of Zerto 7.0, released this month, is long-term retention. Previously, Zerto would retain recovery points for a maximum of 30 days; now retention is a policy set by the customer. Recovery for up to 30 days meets most DR and backup/restore requirements, but not compliance and archive. With long-term retention, Zerto can be used for all of the “bad things happen” use cases that I outlined a few months ago. I wonder whether that MSP will deprecate their Backup as a Service offering and rename their DRaaS as “data protection as a service” when they upgrade to Zerto 7.0. Unifying backup and DR makes a lot of sense. A single “copy” action from production can satisfy both requirements; then policies determine which copies go where and how long they are retained.
Hopefully, you understand that cloud-native applications have very different architecture and different databases compared to traditional enterprise applications. There are now modern enterprise applications, which use cloud-native services and databases alongside traditional application constructs. We also see more enterprises having parts of their estate in the public cloud and using cloud-native services. Hybrid cloud is not merely using both on-premises and cloud services, but also often has a melding of cloud-native and enterprise techniques. Against that backdrop, it makes a lot of sense for enterprise data protection platforms to add cloud-native data protection. Cohesity buying Imanis Data adds non-relational database protection to the platform.
Disclosure: This post is part of my work with Cohesity.
I had a briefing with Imanis in the middle of 2018, well before the acquisition. Imanis’s mission was to offer enterprise data management for Hadoop and NoSQL data. Their platform is software-only and uses a scale-out architecture. It provides a distributed file system with deduplication and compression, and it protects data using API-based access rather than agents. So far, so good, so what? The exciting part for me is data awareness and the use of machine learning in the platform. Data awareness means that the Imanis platform knows about the data structures inside the databases and can use this to aid migration and do analytics on the protected data. I particularly liked that the analytics include ransomware detection, a role that I think backup products are ideally placed to fulfill.
I saw Imanis again when Cohesity unveiled their app store. You can deploy Imanis directly onto your Cohesity cluster and use the Cohesity scale-out storage for Imanis. It appears that Cohesity will continue to sell Imanis as a stand-alone product while building more integration into the Cohesity platform. I hope that we see Cohesity’s policy-based protection integrated back into Imanis; I want to apply the same protection policy to the parts of my application that reside in VMs and in cloud-native data stores. Cohesity continues to expand the coverage of their data protection, covering enterprise platforms, Office 365, and now cloud-native databases. The aim is clearly to be a one-stop shop for data protection and management in enterprises, particularly supporting multi-cloud organizations.