Following on from my quick look at using PowerShell with Cohesity, I want to highlight the developer resources page at developer.cohesity.com. The developer portal has a few useful resources for automation around Cohesity: detailed documentation of the REST API that is the core of all developer access to Cohesity, and a repository of sample scripts that you can reuse and repurpose to your needs. There are samples in Python, PowerShell, and Ansible, as well as details of how to build an application to run on a Cohesity cluster.
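As an illustration of working with that REST API directly, here is a minimal Python sketch that builds the authentication request used to obtain an access token. The `/public/accessTokens` path follows Cohesity's documented v1 public API, but verify it against your own cluster's API reference before relying on it; the cluster name and credentials are placeholders.

```python
import json

def build_token_request(cluster, username, password, domain="LOCAL"):
    """Build the POST request used to obtain a Cohesity API access token.

    The endpoint path follows Cohesity's documented v1 public REST API;
    check your cluster's built-in API reference, as paths can change
    between releases.
    """
    return {
        "url": f"https://{cluster}/irisservices/api/v1/public/accessTokens",
        "headers": {"Content-Type": "application/json"},
        "data": json.dumps(
            {"username": username, "password": password, "domain": domain}
        ),
    }

# To send it, pass the pieces to an HTTP client, e.g. the requests library:
#   r = requests.post(**build_token_request("cluster.example.com", "admin", "pw"))
#   token = r.json()["accessToken"]  # then supplied as a Bearer token on later calls
```

Every later call is then just a matter of adding that token to the Authorization header, which is exactly what the SDKs and sample scripts on the portal do for you.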
Born-in-the-cloud companies approach problem-solving differently from on-premises software companies, so Druva looks at the world differently from other enterprise backup vendors. One difference is the expectation that infrastructure is rented from cloud providers rather than purchased from hardware vendors and deployed on-site. Druva offers Backup-as-a-Service and prefers to deploy as little infrastructure as possible inside its customers' environments. Initially, Druva provided a backup solution for distributed endpoints (laptops and PCs) that live outside the corporate offices. Highly mobile staff who generate business data are a prime target for endpoint backup, and backup to the public cloud works extremely well for these use cases. More recently, Druva has added support for enterprise virtualization as a data source for backup to the cloud.
Way back in the 1990s, I was involved in managing large numbers of Windows file servers as a central repository of business data. These file servers grew and grew over time, accumulating more and more files. Many organizations now have years and years of files stored on file servers and high-performance NAS appliances. Over time, knowledge of the value of these files is diluted, but the fear that something important may get lost never fades. IT teams are left as the holders of this business data and must treat every file as if some manager or regulator may demand access at any moment. Back in the '90s, there was also a dream of Hierarchical Storage Management (HSM), which allowed data to move to lower-cost storage when it was not frequently accessed, freeing space on the expensive, fast storage for more frequently accessed data. At the time, there was no built-in support for data mobility in operating systems, so each HSM product had its own custom file-system driver to redirect access to migrated files.
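To make the HSM idea concrete, here is an illustrative Python sketch of the migration half: files untouched for a given period are moved to a cheaper tier, and a stub records where the data went. This is purely a teaching sketch under my own conventions; the `.hsm-stub` marker is invented, and real HSM products of the era intercepted file opens inside a filesystem driver so the recall was transparent to applications.

```python
import shutil
import time
from pathlib import Path

STUB_SUFFIX = ".hsm-stub"  # invented marker; a real HSM hides this inside a filesystem driver

def migrate_cold_files(primary: Path, tier2: Path, max_age_days: float) -> list:
    """Move files not accessed within max_age_days from primary to tier2,
    leaving behind a stub file that records the new location."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for f in list(primary.rglob("*")):  # snapshot the tree before adding stubs
        if not f.is_file() or f.name.endswith(STUB_SUFFIX):
            continue
        if f.stat().st_atime >= cutoff:
            continue  # recently accessed: keep it on fast storage
        dest = tier2 / f.relative_to(primary)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), str(dest))
        # The stub is what a real HSM driver would resolve on open()
        f.with_name(f.name + STUB_SUFFIX).write_text(str(dest))
        moved.append(f)
    return moved
```

A recall step would do the reverse: on access to a stub, copy the file back and remove the marker. The whole point of those custom drivers was to make that step invisible to applications, which is also why every HSM product was so invasive to deploy.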
Enterprise storage delivered on-premises or in the cloud, as a service where you pay only for what you use. That is my one-sentence summary of Zadara. We know that storage management is hard, and multi-cloud storage management is very hard. Zadara's business is to deliver multi-cloud enterprise storage as a service. I was surprised to hear that they have product deployed on five continents, which means hardware shipped to and maintained on five continents. All that hardware is on Zadara's books; their customers pay a monthly fee based on their consumption, and Zadara carries the cost and risk of the hardware.
Zadara is a scale-out storage platform using standard x86 servers, with options for hard drives, SSDs, and NVMe flash, including Optane. The hardware is shipped to customers; however, customers are billed only for the resources that they use: performance and capacity. Within a scale-out cluster, virtual arrays can be defined that have a dedicated subset of the overall cluster's physical resources. These virtual arrays have dedicated hardware, so they perform very much like a physical array and allow multi-tenant consumption of a larger cluster. Zadara maintains a service control plane that manages every deployed device down to the drive level. This global control plane does not have access to customers' data on the arrays; the data is encrypted at rest using customer-controlled keys. One unusual new capability is the ability to run Docker containers directly on the storage cluster; I'm sure that will drive some interesting use cases.
While I’m teaching the course “Architecting on AWS,” one of the central themes is that the highest value comes from using the specific capabilities of AWS services. Simply lifting your software unchanged onto EC2 instances is unlikely to give you a great result. Consequently, I am very interested in stories of how on-premises products have been re-platformed to be cloud-native. A little while ago, I had the opportunity to sit down at the Pure Storage office in Mountain View and hear about how their hardware arrays become a cloud platform. We looked at the dual-controller FlashArray product in our Build Day Live event with Pure; you can watch those videos here. Soon you will get all of the goodness of the on-premises FlashArray in a cloud-deployed form. I was very impressed that the Pure Storage team chose to use the native features of the AWS platform to deliver the same features as their on-premises hardware.
In my last blog post, I took a look at using your Cohesity cluster to host file shares, which are called Views in Cohesity. I finished with the point that these file shares might hold valuable data and need to be protected against data loss. Your other data is protected by copying it from its original location to your Cohesity cluster, but a failure that affects Views on a Cohesity cluster is likely also to affect the data protection copies on that cluster. The good news is that we already know how to protect against a Cohesity cluster failure, and we need only apply those same protections to Views. We can archive to the public cloud, replicate to another on-premises Cohesity cluster, or replicate to a Cohesity cluster in the public cloud. The Protection Policy for your View should include archiving or replication to guard against data loss if your Cohesity cluster is destroyed. Bear in mind that a cluster outage is likely to result from human error or some significant disruption such as a data center flood or fire.
If you have built a scale-out storage cluster, why would you not make it usable as a filer? One thing that we know for sure is that data storage requirements will keep growing. A scale-out storage platform that allows you to buy capacity progressively as your needs grow might be a great solution, and having integrated data protection is a great bonus. The Cohesity name for file sharing from their cluster is Views. A View can be an NFS or SMB file share, and it can also be an S3-compatible object store. The default is for a View to be all three, although in this mode the S3 store is read-only, with object ingestion via the file share interfaces. If a View is set up as S3 only, then it is read/write accessible via S3.
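As a sketch of what S3 access to a View looks like, the snippet below builds a path-style object URL and shows, in comments, how an S3 client would be pointed at the cluster. The endpoint port and the View-name-as-bucket convention are assumptions for illustration; check your cluster's documentation for the actual S3 endpoint details and access keys.

```python
def view_object_url(cluster: str, view_name: str, key: str, port: int = 3000) -> str:
    """Path-style URL for an object in an S3-only View.

    Assumes the View name is used as the bucket name, and that the
    cluster's S3-compatible endpoint listens on `port` (3000 here is an
    assumption; confirm the port on your own cluster).
    """
    return f"https://{cluster}:{port}/{view_name}/{key}"

# With an S3 client such as boto3, you would point the endpoint at the
# cluster and use cluster-issued access keys (all names illustrative):
#   s3 = boto3.client("s3", endpoint_url="https://cluster.example.com:3000",
#                     aws_access_key_id=KEY, aws_secret_access_key=SECRET)
#   s3.put_object(Bucket="my-view", Key="reports/jan.csv", Body=data)
```

The attraction of the S3 interface is that any tool written against S3-compatible object storage can use the View without knowing anything Cohesity-specific.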
Early in my career, many years ago, I was the sole IT person for a small call center business. As the only IT person, I was also responsible for some of the phone system and everything else that plugged into the wall (including a mechanical letter folder). Of all the varied tasks I had to complete, the one I hated was preparing the monthly report on IT and phone performance. I had to assemble lots of statistics from a variety of sources into the standard report form. Like all management reports, only the first paragraph was ever read. I would spend a day and a half putting together the report and the graphs with the sure knowledge that my work would have no measurable impact anywhere in the organization. I would have loved a tool that would automate the data compilation process and output the graphs and report document for me. At Tech Field Day 19, I learned that there is a whole category of products dedicated to freeing people like me from the tedium of creating those reports. Robotic Process Automation (RPA) is the category of products that automate repetitive manual tasks.
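The compilation step I dreaded is exactly the kind of thing a few lines of scripting, or an RPA product, can take over. Here is a small, self-contained Python sketch that rolls raw metric samples up into the text summary that would have opened that monthly report; the record format and the report wording are my own invention.

```python
from collections import defaultdict
from statistics import mean

def monthly_report(records):
    """Summarise (source, metric, value) samples into a plain-text report.

    `records` stands in for the statistics pulled from each system
    (phone switch, helpdesk, servers); an RPA tool would also automate
    the pulling, and the pasting into the report template.
    """
    grouped = defaultdict(list)
    for source, metric, value in records:
        grouped[(source, metric)].append(value)
    lines = ["Monthly IT and Phone Performance", "=" * 32]
    for (source, metric), values in sorted(grouped.items()):
        lines.append(
            f"{source:<12} {metric:<16} avg={mean(values):6.1f} max={max(values):6.1f}"
        )
    return "\n".join(lines)
```

An RPA product goes further than a script like this: it drives the same applications a person would, so it can gather the inputs from systems that have no API and drop the result straight into the document template.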
Is your data protection driven by your data governance? You do have data governance policies, don’t you? Data governance policies should come from the business that generates the data, and they should identify how that data needs to be cared for and protected: how often it needs to be backed up, copied off-site, archived for compliance, encrypted, protected from copying, copied for other uses, and how long it should be kept before it is deleted. Once you have these data governance policies and you know where the governed data resides, you will know how to configure your data protection policies. Allowing the data governance policies to flow through automatically to the data protection policies will help significantly to ensure compliance.
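That flow-through can be as simple as a mechanical translation from governance terms into the knobs a backup product exposes. The sketch below shows the idea in Python; the field names on both sides are illustrative, not any vendor's actual schema.

```python
from dataclasses import dataclass

@dataclass
class GovernancePolicy:
    """Business-language statement of how a data set must be treated."""
    name: str
    backup_every_hours: int
    offsite_copy: bool
    encrypt_at_rest: bool
    retain_days: int

def to_protection_policy(gov: GovernancePolicy) -> dict:
    """Translate governance terms into backup-product settings.

    The output keys are invented for illustration; a real integration
    would emit whatever schema your backup product's API accepts.
    """
    return {
        "policy_name": f"auto-{gov.name}",
        "schedule": {"every_hours": gov.backup_every_hours},
        "retention": {"days": gov.retain_days},
        "replication": {"to_remote_site": gov.offsite_copy},
        "encryption": {"at_rest": gov.encrypt_at_rest},
    }
```

Generating the protection policies from the governance source of truth means an audit can compare the two directly, rather than relying on an administrator having copied the settings across correctly.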
The best automation is one that is built into the platform or product. However, each business is unique, so there is also a need for custom automation. Today I will dig into the beginnings of writing PowerShell automation against a Cohesity cluster. I use PowerShell for a lot of system administration automation, since I come from a Windows-centered background and use PowerCLI to manage vSphere environments. Cohesity has a PowerShell module that is distributed through the PowerShell Gallery, making installation simple. I spent a while looking at the basics and example scripts, then extended some examples to fit what I wanted a little better. The resulting script is by no means ready to do anything useful; that will need to wait for a later post.