We tend not to think a lot about some of the most fundamental IT infrastructure services, yet they come up a lot when we are troubleshooting problems. One rule of thumb is that when there is an application networking problem, it is a DNS issue. Even when it is definitely not a DNS issue, it is a DNS issue. I am being a bit flippant here, but the reality of troubleshooting non-obvious problems is that name resolution in one form or another is a common problem area. When it is change time, the issue of IP address assignment and management will also come up. Many years ago, I was working at a global pharmaceutical company where IP address management was handled by DHCP reservations for servers. Apparently, there had been a slip-up where a new server was assigned the IP address of another server and a highly critical application had gone down.
These two pieces of context came up for me when I was chatting with BlueCat Networks about their product. Their core is in IP addresses and DNS names: providing integrated IPAM, DHCP, and DNS. You may feel that these areas are covered by your Windows or Linux servers, or by your network platform, and you may be right for your organization. At a massive scale, you do need a more coordinated and integrated system that keeps IP addresses attached to hostnames across the whole enterprise. The target market for BlueCat Networks is very large networks with critical applications and high change rates, where security, scalability, and automation are critical. One aspect of the product that I found interesting is the ability to do analytics against DNS requests, noticing if a server suddenly changes its behavior in a way that indicates that it has been compromised. Your print server probably shouldn’t be trying to locate your payroll system. I’m always interested in management tools that gain more insights from the data they have about system behavior.
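To make the DNS analytics idea concrete, here is a minimal sketch of one way such a check could work: build a baseline of the names each client normally looks up, then flag lookups that fall outside it. This is my own illustration, not BlueCat's method, and the hostnames and the `(client, name)` log format are hypothetical.

```python
from collections import defaultdict


def build_baseline(query_log):
    """Map each client to the set of names it has queried in the past."""
    baseline = defaultdict(set)
    for client, name in query_log:
        # Normalize case and the optional trailing dot of a fully qualified name.
        baseline[client].add(name.lower().rstrip("."))
    return baseline


def unusual_queries(baseline, recent_queries):
    """Return (client, name) pairs where the client asks for a name it has never requested."""
    flagged = []
    for client, name in recent_queries:
        normalized = name.lower().rstrip(".")
        if normalized not in baseline.get(client, set()):
            flagged.append((client, normalized))
    return flagged


# Hypothetical sample data: a print server that suddenly looks for payroll.
history = [
    ("print01", "spooler.corp.example"),
    ("print01", "ntp.corp.example"),
    ("web01", "db.corp.example"),
]
recent = [
    ("print01", "ntp.corp.example."),
    ("print01", "payroll.corp.example"),
]

baseline = build_baseline(history)
print(unusual_queries(baseline, recent))
```

A real system would need far more than set membership, such as query volume, timing, and destination reputation, but the baseline-and-compare shape is the core of the idea.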
If you have a large network and if you have an IP address problem, BlueCat might be your savior.
I mentioned the Cohesity REST API when I looked at the developer portal; now I’d like to show you a little about how to access that API and gather information from your Cohesity clusters. For my example I am going to do some very basic work directly with the API using Python. There are language bindings for Python and PowerShell that make accessing the API simpler, but direct access to the API is also worth illustrating. I chose a couple of basic tasks: reporting on capacity and on VMs that are not protected. Below I show how I accessed data via the API; I also posted a video of the same process if you prefer to watch.
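As a rough sketch of the shape of that work, the helpers below build a token request, turn the token into request headers, and do the two small calculations the reports need. Treat the specifics as assumptions to check against the REST API documentation on the developer portal: the base path, the `accessTokens` endpoint, the `LOCAL` default domain, and the `tokenType` and `accessToken` field names are how I recall the v1 API, and the cluster name and VM names are hypothetical. Nothing here sends a request; you would pass the request object to `urllib.request.urlopen` against a real cluster.

```python
import json
import urllib.request

# Assumed base path for the public v1 API; confirm against the API documentation.
API_ROOT = "/irisservices/api/v1/public"


def token_request(cluster, username, password, domain="LOCAL"):
    """Build (but do not send) a POST request asking the cluster for an access token."""
    body = json.dumps(
        {"username": username, "password": password, "domain": domain}
    ).encode("utf-8")
    return urllib.request.Request(
        f"https://{cluster}{API_ROOT}/accessTokens",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def auth_headers(token_response):
    """Turn a parsed token response into headers for later calls (field names assumed)."""
    token = f"{token_response['tokenType']} {token_response['accessToken']}"
    return {"Accept": "application/json", "Authorization": token}


def percent_used(used_bytes, total_bytes):
    """Capacity report helper: percentage of total capacity in use."""
    return round(100.0 * used_bytes / total_bytes, 1)


def unprotected_vms(all_vm_names, protected_vm_names):
    """VMs known to the cluster that are not in any protection job."""
    return sorted(set(all_vm_names) - set(protected_vm_names))


# Hypothetical values standing in for parsed API responses.
request = token_request("cluster.example.com", "admin", "secret")
print(request.get_method(), request.full_url)
print(unprotected_vms(["app01", "db01", "web01"], ["db01"]))
```

Keeping the calculations separate from the HTTP calls means the reporting logic can be tested without a cluster to talk to.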
One of the fun elements of being briefed about a product that is not yet released, and probably has not had its form finalized, is that only part of the product is revealed. This week at Pure Accelerate the Pure Cloud Block Store (CBS) was launched in its production form. The CBS is an implementation of the Flash Array that runs on AWS rather than on-premises. In my earlier post about CBS, I talked about the storage architecture: S3 object storage for persistence, EC2 Instance Store for a read cache, and EBS IO1 for the write buffer. This storage architecture remains in place in the CBS but is not attached to the controller as I thought. The EC2 instances that have the IO1 and Instance Store are called Virtual Disks. The basic CBS has seven of these as a “disk shelf.” The controllers in CBS have boot volumes; all the data and metadata storage is in the Virtual Disks, which is the same architecture as a physical Flash Array. One other element that I did not foresee is a DynamoDB table to store system configuration, rather than having this configuration on the disks.
Following on from my quick look at using PowerShell with Cohesity, I want to highlight the developer resources page at developer.cohesity.com. The developer portal has a few useful resources for automation around Cohesity: detailed documentation of the REST API that is the core of all developer access to Cohesity, and a repository of sample scripts that you can re-use and re-purpose to your needs. There are samples in Python, PowerShell, and Ansible as well as details of how to build an application to run on a Cohesity cluster.
Born-in-the-cloud companies approach problem-solving differently to on-premises software companies, so Druva looks at the world differently to other enterprise backup vendors. One difference is the expectation that infrastructure is rented from cloud providers, rather than purchased from hardware vendors and deployed on-site. Druva offers Backup-as-a-Service and prefers to deploy as little infrastructure as possible inside their customers’ environments. Initially, Druva provided a backup solution for distributed endpoints (laptops and PCs) that live outside the corporate offices. Highly mobile staff who generate business data are a prime target for endpoint backup, and backup to the public cloud works extremely well for these uses. More recently, Druva has added support for enterprise virtualization as a data source for backup to the cloud.
Way back in the 1990s I was involved in managing large numbers of Windows file servers, used as a central repository of business data. These file servers grew and grew over time, with more and more files stored. Many organizations now have years and years of files stored on file servers and high-performance NAS appliances. Over time the knowledge of the value of these files is diluted, but the fear that something important may get lost never fades. IT teams are left as the holders of this business data and must treat every file as if some manager or regulator may demand access at any moment. Back in the ’90s, there was also a dream of Hierarchical Storage Management (HSM), which allowed data to move to lower-cost storage when it was not frequently accessed, freeing space on the expensive and fast storage for more frequently accessed data. At the time, there was no built-in support for data mobility in operating systems, so each HSM product had its own custom file-system driver to redirect access to migrated files.
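The selection half of HSM, deciding which files are cold enough to move, can be sketched in a few lines. This is a simplified illustration rather than how any particular HSM product worked: the function name, the one-year threshold, and the sample paths are all hypothetical, and it only picks candidates without moving anything or leaving a stub behind.

```python
import time

SECONDS_PER_DAY = 86400


def migration_candidates(files, max_idle_days, now=None):
    """Pick files not accessed within max_idle_days, largest first.

    files is an iterable of (path, last_access_epoch_seconds, size_bytes).
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    cold = [(path, size) for path, last_access, size in files if last_access < cutoff]
    # Largest files first, since moving them frees the most fast storage.
    return sorted(cold, key=lambda item: item[1], reverse=True)


# Hypothetical inventory, with a fixed "now" so the result is repeatable.
now = 1_000_000_000
inventory = [
    ("/shares/finance/budget-old.xls", now - 400 * SECONDS_PER_DAY, 10),
    ("/shares/finance/budget-current.xls", now - 5 * SECONDS_PER_DAY, 500),
    ("/shares/archive/scan.tif", now - 1000 * SECONDS_PER_DAY, 200),
]
print(migration_candidates(inventory, max_idle_days=365, now=now))
```

On a live file system the last-access time could come from `os.stat(path).st_atime`, although access times are not always reliable, for example on volumes mounted with access-time updates disabled.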
Enterprise storage delivered on-premises or in the cloud, as a service where you pay only for what you use. That is my one-sentence summary for Zadara. We know that storage management is hard, and multi-cloud storage management is very hard. Zadara’s business is to deliver multi-cloud, enterprise storage as a service. I was surprised to hear that they have product deployed on five continents, which means hardware shipped to and maintained on five continents. All that hardware is on Zadara’s books; their customers pay a monthly fee based on their consumption, and Zadara carries the cost and risk on the hardware.
Zadara is a scale-out storage platform using standard x86 servers with options for hard drives, SSDs, and NVMe flash including Optane. The hardware is shipped to customers; however, customers are only billed for the resources that they use: performance and capacity. Within a scale-out cluster, virtual arrays can be defined that have a dedicated subset of the overall cluster’s physical resources. These virtual arrays have dedicated hardware, so they perform very much like a physical array and allow multi-tenant consumption of a larger cluster. Zadara maintains a service control plane that manages every deployed device down to drive level. This global control plane does not have access to customers’ data on the arrays; the data is encrypted at rest using customer-controlled keys. One unusual new capability is the ability to run Docker containers directly on the storage cluster; I’m sure that will drive some interesting use cases.
While I’m teaching the course “Architecting on AWS,” one of the central themes is that the highest value comes from using the specific capabilities of AWS services. Directly uploading your software into EC2 instances is unlikely to give you a great result. Consequently, I am very interested in stories of how on-premises products have been re-platformed to be cloud-native. A little while ago, I had the opportunity to sit down at the Pure Storage office in Mountain View and hear about how their hardware arrays become a cloud platform. We looked at the dual controller Flash Array product in our Build Day Live event with Pure; you can watch those videos here. Soon you will get all of the goodness of the on-premises Flash Array in a cloud-deployed form. I was very impressed that the Pure Storage team chose to use the native features of the AWS platform to deliver the same features as their on-premises hardware.
In my last blog post, I took a look at using your Cohesity cluster to host file shares, which are called Views in Cohesity. I finished with the point that these file shares might hold valuable data and need to be protected against data loss. Your other data is protected by copying from its location to your Cohesity cluster, but a failure that affects Views on a Cohesity cluster is likely also to affect the data protection copies on that cluster. The good news is that we already know how to protect against a Cohesity cluster failure, and therefore need only apply these same protections to Views. We can archive to the public cloud, replicate to another on-premises Cohesity cluster, and replicate to a Cohesity cluster in the public cloud. The Protection Policy for your View should include archiving or replication to ensure data protection against data loss if your Cohesity Cluster is destroyed. Bear in mind that a Cluster outage is likely to result from a human error or some significant disruption such as a data center flood or fire.
If you had built a scale-out storage cluster, why would you not make it usable as a filer? One thing that we know for sure is that data storage requirements will keep growing. A scale-out storage platform that allows you to buy capacity progressively as your needs grow might be a great solution. Having integrated data protection will be a great bonus. The Cohesity name for file sharing from their cluster is Views. A View can be an NFS or SMB file share and can be an S3-compatible object store. The default is for the View to be all three, although in this mode the S3 store is read-only with object ingestion via the file share interfaces. If a View is set up as S3 only, then it is read/write accessible via S3.
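The S3 access rule is easy to misread, so here it is as a tiny decision function. This models the behavior described above and is not a Cohesity API: the function name and return values are hypothetical, and treating a View with S3 plus only one file protocol as read-only is my assumption, since the text covers just the all-three and S3-only cases.

```python
def s3_access(protocols):
    """Return the S3 access mode for a View given its enabled protocols."""
    enabled = {protocol.upper() for protocol in protocols}
    if "S3" not in enabled:
        return "none"
    # S3 only: objects can be written through S3.
    # S3 alongside file protocols: ingestion happens via the file shares.
    return "read-write" if enabled == {"S3"} else "read-only"


print(s3_access(["NFS", "SMB", "S3"]))
print(s3_access(["S3"]))
```

The first call covers the default View with all three protocols enabled, and the second covers a View dedicated to object storage.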