Size matters, not in absolute terms where bigger or smaller is always better, but in matching a solution to the requirements it needs to fulfill. Scale Computing has transformed over the last few years from a player for budget-conscious small businesses to a scalable solution for distributed enterprises. I see two vital dimensions where Scale Computing has been innovating. The first is multi-cluster management to allow the central management of vast numbers of clusters. The other has been in scaling down the size of the minimum site for which they have a solution. Past Scale Computing hardware platforms have been full-depth rack-mount servers, offering options for dozens of CPU cores and hundreds of gigabytes of RAM. These models fulfill the requirements for medium-sized offices where a few dozen to a few hundred VMs are required. If you are a bank or a big box retail store, you might need this infrastructure at each branch to serve dozens of staff. You also want the same management console to manage all hundred or thousand branches, each with a local cluster. The scale of multi-cluster management that Scale Computing offers has been impressive. Recently, Scale simplified some of the network requirements, removing the need for a physical dedicated cluster network, now using VXLAN to isolate the cluster networking. There is an excellent set of videos on some of Scale’s innovations in their presentations at Tech Field Day 20.
A branch office solution that is deployed to every retail premises of a national or international retailer needs to scale to the requirements of the largest and smallest branch. Often, the smallest branch is a sole-charge staff member and a single till plus the corporate infrastructure like security systems and staff tracking. That small branch might need between three and five VMs, usually with a total size of a few gigabytes of RAM and less than a terabyte of storage. In the past, the only cost-effective way to run these VMs was on a small form-factor desktop PC, very limited in both redundancy and remote management. The latest platform from Scale Computing is the HC150, based on tenth-generation Intel NUC hardware, which can have as little as 4GB per node, allowing 6GB of VMs in a 3-node redundant cluster. Some of the magic is that Scale has optimized their RAM overhead to under 1GB, leaving 3GB per node for VMs in a tiny 4GB NUC configuration and 15GB for VMs in a 16GB NUC configuration. With the tenth-generation NUC, Intel has brought back AMT features for remote management of hardware. Scale Computing uses the AMT capability to allow zero-touch remote deployment; a tiny cluster is shipped to the site with just a diagram of how to install it. The commissioning process for the cluster is managed remotely once the NUCs are connected to the network. With NUCs, a three-node cluster can fit inside a shallow network rack, or on a shelf under the till in a small retail site where space is at a premium. If those NUCs don’t seem enterprise enough for you, Lenovo is a significant partner for Scale. Lenovo has rugged micro-servers that can form a Scale Computing cluster, with no fans and robust metal cases. I also saw some mention of support for the Wi-Fi adapters in both the NUC and Lenovo machines. I imagine that the Scale cluster traffic is still over wired Ethernet, but the VM networking could happen over Wi-Fi, which opens some exciting deployment options.
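The RAM figures above are easy to check with a few lines of Python. This is only a sketch of my reading of the numbers: I assume the overhead is a flat 1GB per node (the stated figure is "under 1GB") and that "redundant" means the cluster keeps one node's worth of capacity in reserve. The function name and its parameters are my own, not anything from Scale Computing.

```python
def usable_vm_ram_gb(nodes, ram_per_node_gb, overhead_gb=1, spare_nodes=1):
    """RAM left for VMs if the cluster must tolerate losing `spare_nodes` nodes.

    Assumptions (mine, not Scale Computing's): a flat per-node overhead and
    one node's capacity held in reserve for failover.
    """
    per_node_for_vms = ram_per_node_gb - overhead_gb
    return per_node_for_vms * (nodes - spare_nodes)


print(usable_vm_ram_gb(nodes=3, ram_per_node_gb=4))   # 6
print(usable_vm_ram_gb(nodes=3, ram_per_node_gb=16))  # 30
```

The first result matches the 6GB quoted for the 4GB nodes. The 30GB figure for 16GB nodes is my own extrapolation under the same assumptions, not a number from Scale.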
I really like the architecture of the Datrium DVX platform. Large (cost-effective) NVMe SSDs inside ESXi hosts provide impressive storage performance, and one or more shared disk shelves provide data persistence and protection. If you remember Pernix Data’s idea of separating performance from capacity, it is applied end-to-end in Datrium rather than bolted on the side as Pernix did. We showed just how simple Datrium is to deploy in a Build Day Live event in 2017. I was impressed that we deployed a DVX (vSphere) cluster, migrated VM workloads, and then added existing hosts to the DVX cluster, all in a four-hour live-streamed activity. In the two years since we were at Datrium, the cloud has driven new features: first as a destination for backups, which are stored on cost-effective object storage (S3), then as a place where DVX-based VMs could be restored. Cloud DVX is the Datrium DVX platform running in VMs on public cloud and presented to ESXi hosts in VMware Cloud on AWS (VMConAWS). The top use for Cloud DVX is DRaaS, cloud DR to VMConAWS.
DR as a Service (DRaaS) to the public cloud has a very compelling value proposition. Protect on-premises VMs at minimum cost and pay for recovery resources only when you practice or execute your DR plan. The magic of Datrium DRaaS is that there is no waiting for data to be rehydrated off S3 before your VMs can be powered on. Most solutions that use S3 for DR storage require the data to be copied from S3 to transactional storage such as EBS or VSAN before VMs can be powered on. These copies from S3 are fast, but with hundreds of GB being copied, it still takes time before you can start recovering applications. Datrium Cloud DVX uses EC2 instances with NVMe SSDs to provide performance while data persistence is on S3. The Cloud DVX storage is presented as an NFS share (with DR VMs) to the VMConAWS cluster. Recovered VMs can be powered on immediately, and later Storage VMotioned to VSAN so that Cloud DVX can be shut down. There is another point of difference: the compute part of Cloud DVX only needs to run while VMs are being recovered to VMConAWS. During protection, and after Storage VMotion, there is no requirement for EC2 resources for Cloud DVX.
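To put a rough number on the rehydration delay, here is a back-of-the-envelope sketch. The data size and throughput are hypothetical values I picked for illustration, not measurements from Datrium or AWS, and the function name is my own.

```python
def rehydration_minutes(data_gb, throughput_mb_per_s):
    """Minutes needed to copy `data_gb` of data at a sustained throughput.

    Uses decimal units (1 GB = 1000 MB). Both inputs are assumptions that
    you would replace with figures from your own environment.
    """
    seconds = data_gb * 1000 / throughput_mb_per_s
    return seconds / 60


# Hypothetical example: 500 GB copied at a sustained 250 MB/s.
print(f"{rehydration_minutes(500, 250):.1f} minutes")  # 33.3 minutes
```

Even with generous throughput, that copy time is added to the RTO before the first VM can power on, and it is the part that presenting the storage directly over NFS avoids.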
The most recent announcement is that Datrium DRaaS is no longer limited to DVX hardware; you can get DRaaS to VMConAWS for any vSphere environment. Datrium DRaaS Connect protects non-DVX clusters; you will need to deploy a virtual appliance that performs the backups using VMware’s VADP. Data protection is stored on S3, and recovery to VMConAWS uses Cloud DVX, just like recovering a DVX system. The primary value here is the shorter RTO by not needing to rehydrate S3 based images before a recovered VM can be powered on and start to deliver applications.
This week I was looking at the Cohesity Developer portal again and decided to see what was on GitHub. One of the repositories that Cohesity has is the Cohesity Management SDK for Python, which got me thinking. Python is a multi-platform programming language, and I have mostly used Python on a Raspberry Pi. Would the Cohesity SDK work on a Pi? What about other Cohesity management? So I downloaded the latest build of Raspbian Buster and got myself a desktop running on a Raspberry Pi 3. Naturally, a desktop on a Pi 3 is nowhere near as powerful as my usual 2013 MacBook Pro, or the newer MacBook that I want to replace this old machine with. You can see the video of me doing this all with my Raspberry Pi desktop here. I could still use Chromium to manage my Cohesity clusters through Helios, or directly through the cluster management page.
It has been a while since the phrase “Pets versus Cattle” was on the top of the conversational pile, but I think that it is a useful tool for approaching application architecture. Originally the phrase referred to on-premises enterprise IT as pets. We would have individual names for our servers and would spend a lot of time troubleshooting issues to return a server to a healthy state. By contrast, cloud-native applications were referred to as cattle. Instances have a numeric reference for a name, and if one stops working, it is destroyed and replaced with a new working instance.
One of the new features in the Cohesity Data Platform version 6.4 is called Data Migration and is part of the Smart Files function. Data Migration automates moving files from a file share to the Cohesity platform and leaving a symbolic link in place of the migrated file. The objective here is to free the file server or NAS from holding old or infrequently accessed files, which then reduces the need for expanding capacity on file servers or NAS devices that are no good at data efficiency or have high-cost storage. I talked about this in a past blog post or two, and you can also read what Dan Frith wrote. Now I have recorded a video of the actual migration.
The Data Migration job is simple to set up, requiring a source share, criteria for migration, and a name for the new Cohesity View to hold the migrated files. The View and its share are created automatically, so don’t use the name of an existing View.
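The general move-and-leave-a-link idea can be illustrated with a few lines of Python. To be clear, this is not how Cohesity implements Data Migration. It is a minimal sketch of the technique on a local POSIX filesystem, with my own function name, a single "not accessed in N days" criterion, and the assumption that the target directory sits outside the source tree.

```python
import shutil
import time
from pathlib import Path


def migrate_old_files(source_dir, target_dir, max_age_days, now=None):
    """Move files not accessed within `max_age_days`, leaving symlinks behind.

    Illustrative only: relies on st_atime, which may be unreliable on
    filesystems mounted with noatime or relatime.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    source, target = Path(source_dir), Path(target_dir)
    migrated = []
    # sorted() materializes the listing before we start changing the tree.
    for path in sorted(source.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue  # skip directories and files that were already migrated
        if path.stat().st_atime >= cutoff:
            continue  # accessed recently enough, leave it in place
        destination = target / path.relative_to(source)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(destination))
        # The link keeps the original path usable for users and applications.
        path.symlink_to(destination.resolve())
        migrated.append(path)
    return migrated
```

Calling migrate_old_files("/srv/share", "/mnt/archive", 365) (both paths hypothetical) would move anything untouched for a year and return the list of paths that are now links.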
File servers (or NAS) with disks that fill up are a constant problem in any organization. Twenty years ago, I spent more than a few weekends swapping to larger hard drives in physical file servers. Now that those file servers are virtual, the virtual disks can grow until a datastore is full, then a SAN LUN needs to be made larger, and it is still a lot of work and a lot of money to store a lot of data, often of questionable value. If you are suffering from overloaded file servers, then you might want to look at a couple of ways that Cohesity can help.
Disclosure: This post is part of my work with Cohesity.
One way is that Cohesity offers a scale-out multi-protocol NAS platform. I made a video walking through creating a NAS share on my Cohesity cluster, and you can also take a look at Theresa Miller’s video here. Cohesity is more than just a scale-out NAS; this video outlines some of the value add offered by smart files and applications that run directly on the Cohesity cluster.
The other way that Cohesity can help is by migrating older (less frequently accessed) files off your file servers or NAS appliances, onto the Cohesity cluster. This is what I was talking about in my post on reviving the HSM dream. You can get more details from this video that Mike Letschin recorded.
We tend not to think a lot about some of the most fundamental IT infrastructure services, yet they come up a lot when we are troubleshooting problems. One rule of thumb is that when it is an application networking problem, then it is a DNS issue. Even when it is definitely not a DNS issue, it is a DNS issue. I am being a bit flippant here, but the reality of troubleshooting non-obvious problems is that name resolution in one form or another is a common problem area. When it is change time, the issue of IP address assignment and management will also come up. Many years ago, I was working at a global pharmaceutical company where IP address management was handled by DHCP reservations for servers. Apparently, there had been a slip-up where a new server was assigned the IP address of another server and a highly critical application had gone down.
These two pieces of context came up for me when I was chatting with BlueCat Networks about their product. Their core is in IP addresses and DNS names: providing integrated IPAM, DHCP, and DNS. You may feel that these areas are covered by your Windows or Linux servers, or by your network platform and you may be right for your organization. At a massive scale, you do need a more coordinated and integrated system that keeps IP addresses attached to hostnames across the whole enterprise. The target market for BlueCat Networks is the very large networks with critical applications and high change rates, where security, scalability, and automation are critical. One aspect of the product that I found interesting is the ability to do analytics against DNS requests, noticing if a server suddenly changes its behavior in a way that indicates that it has been compromised. Your print server probably shouldn’t be trying to locate your payroll system. I’m always interested in management tools that gain more insights from the data they have about system behavior.
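The DNS analytics idea can be illustrated with a toy baseline: remember which names each client normally looks up and flag anything new. This is my own minimal sketch, not BlueCat's method or API, and the class name, host names, and domains are all hypothetical. A real product would weigh frequency, timing, and reputation rather than simple set membership.

```python
from collections import defaultdict


class DnsBaseline:
    """Tracks the set of names each client has queried during a learning period."""

    def __init__(self):
        self._seen = defaultdict(set)

    @staticmethod
    def _normalize(domain):
        # DNS names are case-insensitive and may carry a trailing root dot.
        return domain.lower().rstrip(".")

    def learn(self, client, domain):
        self._seen[client].add(self._normalize(domain))

    def is_unusual(self, client, domain):
        # .get() avoids creating empty entries for clients we never learned.
        return self._normalize(domain) not in self._seen.get(client, set())


baseline = DnsBaseline()
baseline.learn("print01", "spooler.corp.example")
print(baseline.is_unusual("print01", "Spooler.corp.example."))  # False
print(baseline.is_unusual("print01", "payroll.corp.example"))   # True
```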
If you have a large network and if you have an IP address problem, BlueCat might be your savior.
I mentioned the Cohesity REST API when I looked at the developer portal; now I’d like to show you a little about how to access that API and gather information from your Cohesity clusters. For my example, I am going to do some very basic work directly with the API using Python. There are language bindings for Python and PowerShell that make accessing the API simpler, but direct access to the API is also worth illustrating. I chose a couple of basic tasks: reporting on capacity and reporting on VMs that are not protected. Below I show how I accessed data via the API; I also posted a video of the same process if you prefer to watch.
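As a general illustration of working directly with a token-based REST API from Python, here is a minimal sketch using only the standard library. It is not the code from the video. The base path, the accessTokens endpoint, the request body fields, and the bearer-token header are my assumptions about the v1 API and should be checked against the documentation on the developer portal. The cluster name and credentials are placeholders.

```python
import json
import urllib.request

# Assumed base path for the public v1 API; verify against the API docs.
API_ROOT = "/irisservices/api/v1/public"


def token_request(cluster, username, password, domain="LOCAL"):
    """Build (but do not send) the POST that asks the cluster for a token."""
    body = json.dumps(
        {"username": username, "password": password, "domain": domain}
    ).encode("utf-8")
    return urllib.request.Request(
        f"https://{cluster}{API_ROOT}/accessTokens",  # assumed endpoint
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def authorized_get(cluster, path, token):
    """Build a GET for `path` that carries the token as a bearer header."""
    return urllib.request.Request(
        f"https://{cluster}{API_ROOT}{path}",
        headers={"Accept": "application/json", "Authorization": f"Bearer {token}"},
        method="GET",
    )


def call(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In use, you would pass token_request(...) to call(), pull the token out of the response, and then send authorized_get(...) requests for whichever resources hold the capacity and protection details. Keeping request construction separate from sending makes each step easy to inspect before it touches a real cluster.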
One of the fun elements of being briefed about a product that is not yet released, and probably has not had its form finalized, is that only part of the product is revealed. This week at Pure Accelerate, the Pure Cloud Block Store (CBS) was launched in its production form. The CBS is an implementation of the FlashArray that runs on AWS rather than on-premises. In my earlier post about CBS, I talked about the storage architecture: S3 object storage for persistence, EC2 Instance Store for a read cache, and EBS IO1 for the write buffer. This storage architecture remains in place in the CBS but is not attached to the controller as I thought. The EC2 instances that have the IO1 and Instance Store are called Virtual Disks. The basic CBS has seven of these as a “disk shelf.” The controllers in CBS have boot volumes; all the data and metadata storage is in the Virtual Disks, which is the same architecture as a physical FlashArray. One other element that I did not foresee is a DynamoDB table to store system configuration, rather than having this configuration on the disks.
Following on from my quick look at using PowerShell with Cohesity, I want to highlight the developer resources page at developer.cohesity.com. The developer portal has a few useful resources for automation around Cohesity: detailed documentation of the REST API that is the core of all developer access to Cohesity, and a repository of sample scripts that you can re-use and re-purpose to your needs. There are samples in Python, PowerShell, and Ansible, as well as details of how to build an application to run on a Cohesity cluster.