This is the first in a series of posts about things that surprised me when I started to work with AWS services. This particular surprise was one of the first. Back in 2013, I was still teaching VMware courses and was invited to attend some AWS training. On the first day, the trainer explained that you could deploy as much software-defined networking and as many VMs as you wanted. On AWS that was simply a given, at a time when VMware was still struggling to integrate Nicira (now NSX) with vSphere. Then my mind was blown when the instructor said that EC2 (the VM service) is not that interesting; the real fun is in the AWS services that you use to assemble applications. The next two days were spent learning about these services and doing labs where I actually built an application. It was just too easy, even when I started going outside the lines. The lab had the usual prescriptive guidance about how to configure everything to work as intended. I built an autoscaling group of EC2 instances that responded to the number of requests in a queue, took input data from object storage, and placed the results in more object storage. The lab instructions only told us to scale out the cluster; I worked out how to configure scale-in too and managed to test it within the lab time.
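The lab's actual alarms and scaling policies aren't described here, so the following is only a minimal sketch of the idea behind scaling on queue depth, assuming a simple backlog-per-instance rule. The function names and parameters (`desired_capacity`, `msgs_per_instance`, `min_size`, `max_size`) are hypothetical, not part of any AWS API.

```python
import math


def desired_capacity(queue_depth: int, msgs_per_instance: int,
                     min_size: int, max_size: int) -> int:
    """Hypothetical backlog-per-instance rule: run enough instances that
    each one's share of the queue is at most msgs_per_instance, clamped
    to the group's minimum and maximum size."""
    if msgs_per_instance <= 0:
        raise ValueError("msgs_per_instance must be positive")
    needed = math.ceil(queue_depth / msgs_per_instance)
    return max(min_size, min(max_size, needed))


def scaling_action(current: int, desired: int) -> str:
    """Describe which way the group needs to move to reach desired."""
    if desired > current:
        return f"scale out by {desired - current}"
    if desired < current:
        return f"scale in by {current - desired}"
    return "no change"


# A backlog of 250 messages at 100 messages per instance needs 3 instances.
desired = desired_capacity(queue_depth=250, msgs_per_instance=100,
                           min_size=1, max_size=10)
print(desired, scaling_action(current=5, desired=desired))
```

The same rule handles both directions: when the queue drains, it returns the group to `min_size`, which is the scale-in half that the lab instructions left out.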
Now that I teach the AWS courses, I do see more acceptance of applications that run in VMs the way they do on-premises. But I now talk about EC2 instances as the last resort, for when you have no better way to achieve your objectives. Usually, a managed service is a better option because it requires less work from you, and often because it costs less on your AWS bill too. I noticed that the Developing on AWS course is very focussed on serverless application development, meaning no EC2. New applications shouldn't require working with legacy constructs like VMs, but you can keep your VMs for your older applications.
As I started working with and writing about AWS, a few things surprised me about how different AWS is from on-premises vSphere. While teaching the official AWS courses, I have continued to notice things about AWS that surprise me. I plan to write a separate post with more detail about each of these strange things, and this list will get longer as I think of more.
Ansible is an easy tool to start using for declarative configuration. I use Ansible to make sure a small fleet of Linux VMs is configured identically each time I deploy them. In my last Cohesity video and post, I showed you how to deploy the Ansible role for Cohesity and gather information about your Cohesity cluster. Today we get to the real use of Ansible: integrating the protection of a fleet of physical servers with our Cohesity platform. The playbook I created from the samples deploys the Cohesity agent, adds the physical server as a source, and then adds the source to a protection job. You can watch me copy and paste from the samples and run the playbook in this video on YouTube.
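The appeal of declarative configuration is that rerunning the playbook only changes what is out of line. As a rough illustration of that idea, here is a minimal Python sketch of the three playbook steps as a desired-state check. `ServerState`, `plan_actions`, and `apply` are hypothetical names for illustration; they are not tasks or variables from the Cohesity Ansible role.

```python
from dataclasses import dataclass, field


@dataclass
class ServerState:
    """Hypothetical snapshot of one physical Linux server's protection status."""
    hostname: str
    agent_installed: bool = False
    registered_as_source: bool = False
    protection_jobs: set = field(default_factory=set)


def plan_actions(server: ServerState, job_name: str) -> list:
    """List only the steps still needed to reach the desired state, in the
    playbook's order: deploy agent, add source, add source to job."""
    actions = []
    if not server.agent_installed:
        actions.append(f"deploy agent on {server.hostname}")
    if not server.registered_as_source:
        actions.append(f"add {server.hostname} as a physical source")
    if job_name not in server.protection_jobs:
        actions.append(f"add {server.hostname} to protection job '{job_name}'")
    return actions


def apply(server: ServerState, job_name: str) -> None:
    """Pretend to carry out the plan by updating the recorded state."""
    server.agent_installed = True
    server.registered_as_source = True
    server.protection_jobs.add(job_name)


server = ServerState("linux01")
print(plan_actions(server, "Linux-Daily"))  # all three steps on a new server
apply(server, "Linux-Daily")
print(plan_actions(server, "Linux-Daily"))  # nothing left to do on a rerun
```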
When it comes to managing a fleet of Linux boxes with minimal extra infrastructure, I am a fan of Ansible. I have written before about using PowerShell to automate working with Cohesity; that is a good choice for vSphere and Hyper-V environments, where PowerShell is the native automation platform. I have also shown how the AutoProtection feature on Cohesity allows newly created VMs to be protected based on folders, tags, or naming. But what about when you have a bunch of physical Linux boxes that you want to protect? Ansible seems a great fit, and happily, Cohesity has an Ansible role to make everything easy. Here, I look at deploying the role and retrieving information about your Cohesity cluster using the Cohesity facts function. The video of me following this process is right here if you would prefer not to read any further.
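To show why rule-based AutoProtection picks up newly created VMs without extra work, here is a minimal sketch assuming a rule that protects a VM when any configured criterion (folder, tag, or name prefix) matches. The classes and matching logic are hypothetical and only approximate the concept; Cohesity's real matching rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    folder: str
    tags: set = field(default_factory=set)


@dataclass
class AutoProtectRule:
    """Hypothetical rule: a VM is protected if any criterion that is set matches."""
    folder: str = ""
    tag: str = ""
    name_prefix: str = ""

    def matches(self, vm: VM) -> bool:
        by_folder = bool(self.folder) and vm.folder == self.folder
        by_tag = bool(self.tag) and self.tag in vm.tags
        by_name = bool(self.name_prefix) and vm.name.startswith(self.name_prefix)
        return by_folder or by_tag or by_name


def vms_to_protect(inventory: list, rule: AutoProtectRule) -> list:
    """Evaluated against the current inventory on every run, so a VM created
    since the last run is included as soon as it matches."""
    return [vm.name for vm in inventory if rule.matches(vm)]


inventory = [VM("web01", "/Prod", {"gold"}), VM("test01", "/Dev"),
             VM("db01", "/Dev", {"gold"})]
print(vms_to_protect(inventory, AutoProtectRule(tag="gold")))
```

Physical Linux servers don't sit in a hypervisor inventory like this, which is why the Ansible role is the better fit for them.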
I have made a lot of walk-through and demonstration videos as I have learnt about the Cohesity platform over the last year. We also showed you the deployment process and some architecture in the Cohesity Build Day Live event, with plenty of video there too. If you would like a more structured set of presentations about the Cohesity platform and its newer features, I suggest you take a look at their presentations at Tech Field Day. Unfortunately, I did not get to attend any of these events; hopefully, I will see my Cohesity friends at Tech Field Day in 2020.
Disclosure: This post is part of my work with Cohesity.
Storage Field Day 18
Cloud Field Day 5
Size matters, not in absolute terms where bigger or smaller is always better, but in matching a solution to the requirements it needs to fulfill. Scale Computing has transformed over the last few years from a player for budget-conscious small businesses to a scalable solution for distributed enterprises. I see two vital dimensions where Scale Computing has been innovating. The first is multi-cluster management to allow the central management of vast numbers of clusters. The other is scaling down the size of the minimum site for which they have a solution. Past Scale Computing hardware platforms have been full-depth rack-mount servers, offering options for dozens of CPU cores and hundreds of gigabytes of RAM. These models fulfill the requirements for medium-sized offices where a few dozen to a few hundred VMs are required. If you are a bank or a big-box retail store, you might need this infrastructure at each branch to serve dozens of staff. You also want the same management console to manage all of those hundreds or thousands of branches, each with a local cluster. The scale of multi-cluster management that Scale Computing offers has been impressive. Recently, Scale simplified some of the network requirements, removing the need for a dedicated physical cluster network and instead using VXLAN to isolate the cluster networking. There is an excellent set of videos on some of Scale's innovations in their presentations at Tech Field Day 20.
A branch office solution that is deployed to every retail premises of a national or international retailer needs to scale to the requirements of both the largest and the smallest branch. Often, the smallest branch is a sole-charge staff member and a single till, plus corporate infrastructure such as security systems and staff tracking. That small branch might need between three and five VMs, usually with a total of a few gigabytes of RAM and less than a terabyte of storage. In the past, the only cost-effective way to run these VMs was on a small form-factor desktop PC, which is very limited in both redundancy and remote management. The latest platform from Scale Computing is the HC150, based on tenth-generation Intel NUC hardware, which can have as little as 4GB of RAM per node, allowing 6GB of VMs in a 3-node redundant cluster. Some of the magic is that Scale has optimized their RAM overhead to under 1GB, leaving 3GB per node for VMs in a tiny 4GB NUC configuration and 15GB for VMs in a 16GB NUC configuration. With the tenth-generation NUC, Intel has brought back AMT features for remote management of hardware. Scale Computing uses the AMT capability to allow zero-touch remote deployment: a tiny cluster is shipped to the site with just a diagram of how to install it, and the commissioning process for the cluster is managed remotely once the NUCs are connected to the network. With NUCs, a three-node cluster can fit inside a shallow network rack or on a shelf under the till in a small retail site where space is at a premium. If those NUCs don't seem enterprise enough for you, Lenovo is a significant partner for Scale. Lenovo has rugged micro-servers, with no fans and robust metal cases, that can form a Scale Computing cluster. I also saw some mention of support for the Wi-Fi adapters in both the NUC and Lenovo machines. I expect that the Scale cluster traffic still runs over wired Ethernet, but the VM networking could happen over Wi-Fi, which opens some exciting deployment options.
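The 6GB figure for a 3-node cluster of 4GB NUCs appears to follow from two steps: subtract the per-node overhead, then reserve one node's worth of capacity so the VMs still fit after a node failure (N-1 sizing). Here is a quick worked check, assuming a flat 1GB overhead per node; the post only says the overhead is under 1GB, and the N-1 reasoning is my inference rather than a published Scale Computing formula.

```python
def usable_vm_ram_gb(nodes: int, ram_per_node_gb: float,
                     overhead_per_node_gb: float = 1.0) -> float:
    """VM RAM that still fits after losing one node (N-1 sizing), assuming
    a flat per-node overhead for the hypervisor and storage stack."""
    if nodes < 2:
        raise ValueError("need at least two nodes to tolerate a node failure")
    per_node = ram_per_node_gb - overhead_per_node_gb
    return (nodes - 1) * per_node


for ram in (4, 16):
    print(f"{ram}GB NUCs, 3 nodes: {usable_vm_ram_gb(3, ram)}GB for VMs")
```

For the 4GB configuration this gives 2 x 3GB = 6GB, matching the figure above; the 16GB configuration gives 2 x 15GB = 30GB.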
I really like the architecture of the Datrium DVX platform. Large (cost-effective) NVMe SSDs inside ESXi hosts provide impressive storage performance, and one or more shared disk shelves provide data persistence and protection. If you remember PernixData's idea of separating performance from capacity, Datrium applies it end-to-end rather than bolting it on the side as PernixData did. We showed just how simple Datrium is to deploy in a Build Day Live event in 2017. I was impressed that we deployed a DVX (vSphere) cluster, migrated VM workloads, and then added existing hosts to the DVX cluster, all in a four-hour live-streamed activity. In the two years since we were at Datrium, the cloud has driven new features: first, the cloud as a destination for backups, which are stored on cost-effective object storage (S3), and then as a place where DVX-based VMs could be restored. Cloud DVX is the Datrium DVX platform running in VMs on the public cloud and presented to ESXi hosts in VMware Cloud on AWS (VMConAWS). The top use for Cloud DVX is DRaaS: cloud DR to VMConAWS.
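To make "separating performance from capacity" concrete, here is a toy Python model: every write lands on durable shared storage (capacity) and in a host-local cache (performance), and reads are served locally when possible. This is a conceptual sketch only, not Datrium's actual write path or data format; `SplitStorage` and its methods are hypothetical.

```python
class SplitStorage:
    """Toy model: the shared shelf owns persistence, host flash owns speed."""

    def __init__(self):
        self.durable = {}     # stands in for the shared disk shelf
        self.host_cache = {}  # stands in for local NVMe flash in one host
        self.cache_hits = 0
        self.cache_misses = 0

    def write(self, block_id, data):
        self.durable[block_id] = data     # persistence lives on the shelf
        self.host_cache[block_id] = data  # keep a fast local copy too

    def read(self, block_id):
        if block_id in self.host_cache:
            self.cache_hits += 1
            return self.host_cache[block_id]
        self.cache_misses += 1
        data = self.durable[block_id]     # fall back to the shelf
        self.host_cache[block_id] = data  # warm the cache for next time
        return data

    def lose_host(self):
        """Losing a host only loses its cache; no data is lost."""
        self.host_cache.clear()


store = SplitStorage()
store.write("block-1", b"payload")
store.read("block-1")   # served from host flash
store.lose_host()
store.read("block-1")   # re-read from the shelf, then cached again
print(store.cache_hits, store.cache_misses)
```

The design point is that host flash can be cheap and non-redundant, because nothing is lost when a host's cache disappears.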
DR as a Service (DRaaS) to the public cloud has a very compelling value proposition: protect on-premises VMs at minimum cost and pay for recovery resources only when you practice or execute your DR plan. The magic of Datrium DRaaS is that there is no waiting for data to be rehydrated from S3 before your VMs can be powered on. Most solutions that use S3 for DR storage require the data to be copied from S3 to transactional storage such as EBS or vSAN before VMs can be powered on. These copies from S3 are fast, but with hundreds of GB being copied, it still takes time before you can start recovering applications. Datrium Cloud DVX uses EC2 instances with NVMe SSDs to provide performance, while data persistence is on S3. The Cloud DVX storage (holding the DR VMs) is presented as an NFS share to the VMConAWS cluster. Recovered VMs can be powered on immediately and later Storage vMotioned to vSAN so that Cloud DVX can be shut down. There is another point of difference: the compute part of Cloud DVX only needs to run while VMs are being recovered to VMConAWS. During protection, and after the Storage vMotion, there is no requirement for EC2 resources for Cloud DVX.
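A back-of-envelope calculation shows why skipping rehydration matters for RTO. The sketch below estimates how long a copy from object storage delays power-on; the 500 GB and 200 MB/s figures are purely hypothetical examples, not measured S3 or Datrium numbers.

```python
def rehydration_minutes(data_gb: float, throughput_mb_s: float) -> float:
    """Minutes spent copying data_gb out of object storage before VMs can
    power on, at a sustained throughput in MB/s (decimal: 1 GB = 1000 MB)."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return data_gb * 1000 / throughput_mb_s / 60


def time_to_power_on(data_gb: float, throughput_mb_s: float,
                     rehydrate_first: bool) -> float:
    """Minutes of copying before recovered VMs can start (boot time ignored)."""
    if rehydrate_first:
        return rehydration_minutes(data_gb, throughput_mb_s)
    return 0.0  # data is read in place, so power-on can start immediately


# Hypothetical example: 500 GB of protected VMs at 200 MB/s.
print(round(time_to_power_on(500, 200, rehydrate_first=True), 1))   # copy first
print(time_to_power_on(500, 200, rehydrate_first=False))            # in place
```

At those assumed rates the copy alone takes about 42 minutes before the first VM can boot; the later Storage vMotion to vSAN still moves the data, but it happens while applications are already running.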
The most recent announcement is that Datrium DRaaS is no longer limited to DVX hardware; you can get DRaaS to VMConAWS for any vSphere environment. Datrium DRaaS Connect protects non-DVX clusters; you will need to deploy a virtual appliance that performs the backups using VMware's VADP. The protected data is stored on S3, and recovery to VMConAWS uses Cloud DVX, just like recovering a DVX system. The primary value here is the shorter RTO: there is no need to rehydrate S3-based images before a recovered VM can be powered on and start to deliver applications.
This week I was looking at the Cohesity Developer portal again and decided to see what was on GitHub. One of Cohesity's repositories is the Cohesity Management SDK for Python, which got me thinking. Python is a multi-platform programming language, and I have mostly used Python on a Raspberry Pi. Would the Cohesity SDK work on a Pi? What about other Cohesity management tools? So I downloaded the latest build of Raspbian Buster and got myself a desktop running on a Raspberry Pi 3. Naturally, a desktop on a Pi 3 is nowhere near as powerful as my usual 2013 MacBook Pro, or the newer MacBook that I want to replace this old machine with. You can see the video of me doing all this on my Raspberry Pi desktop here. I could still use Chromium to manage my Cohesity clusters through Helios, or directly through the cluster management page.
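A quick first step when trying an SDK on a new platform is to confirm what the Pi reports and whether the package is importable. This sketch uses only the Python standard library; the default module name `cohesity_management_sdk` is my assumption about how the SDK is imported, so check the repository's README if it reports the SDK as missing after installation.

```python
import importlib.util
import platform


def sdk_check(module_name: str = "cohesity_management_sdk") -> dict:
    """Report the CPU architecture, the Python version, and whether a
    top-level module can be found. The default module name is an assumption."""
    return {
        "machine": platform.machine(),  # typically 'armv7l' on 32-bit Raspbian
        "python": platform.python_version(),
        "sdk_installed": importlib.util.find_spec(module_name) is not None,
    }


print(sdk_check())
```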
It has been a while since the phrase “Pets versus Cattle” was on the top of the conversational pile, but I think that it is a useful tool for approaching application architecture. Originally the phrase referred to on-premises enterprise IT as pets. We would have individual names for our servers and would spend a lot of time troubleshooting issues to return a server to a healthy state. By contrast, cloud-native applications were referred to as cattle. Instances have a numeric reference for a name, and if one stops working, it is destroyed and replaced with a new working instance.
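The cattle approach is easy to express in code: instances get numbered names, and an unhealthy one is destroyed and replaced rather than nursed back to health. Here is a minimal sketch; the `Fleet` class, the `web-` prefix, and the naming scheme are hypothetical illustrations, not any cloud provider's API.

```python
import itertools


class Fleet:
    """Cattle-style fleet: numbered instances, replaced rather than repaired."""

    def __init__(self, size: int, prefix: str = "web"):
        self._ids = itertools.count(1)
        self.prefix = prefix
        self.instances = {self._new_name(): True for _ in range(size)}

    def _new_name(self) -> str:
        return f"{self.prefix}-{next(self._ids):04d}"

    def mark_unhealthy(self, name: str) -> None:
        if name not in self.instances:
            raise KeyError(name)
        self.instances[name] = False

    def heal(self) -> list:
        """Destroy every unhealthy instance and launch a fresh one in its
        place; return the names that were destroyed."""
        dead = [n for n, healthy in self.instances.items() if not healthy]
        for name in dead:
            del self.instances[name]
            self.instances[self._new_name()] = True
        return dead


fleet = Fleet(3)
fleet.mark_unhealthy("web-0002")
print(fleet.heal(), sorted(fleet.instances))
```

The replacement gets a new number rather than reusing the old name, which is the point: no individual instance is special enough to troubleshoot.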
One of the new features in the Cohesity Data Platform version 6.4 is called Data Migration and is part of the Smart Files function. Data Migration automates moving files from a file share to the Cohesity platform, leaving a symbolic link in place of each migrated file. The objective is to free the file server or NAS from holding old or infrequently accessed files, which reduces the need to expand capacity on file servers or NAS devices that are poor at data efficiency or have high-cost storage. I talked about this in a past blog post or two, and you can also read what Dan Frith wrote; now I have recorded a video of the actual migration.
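The post doesn't describe how Cohesity implements the migration across SMB or NFS shares, so here is only a local-filesystem analogy of the "move the file, leave a symbolic link" idea. The age-based criterion and the `migrate_cold_files` function are hypothetical; it is a sketch of the concept, not Cohesity's mechanism.

```python
import os
import shutil
import time
from pathlib import Path


def migrate_cold_files(share: Path, target: Path, older_than_days: float,
                       now=None) -> list:
    """Move files not modified in older_than_days from share to target,
    leaving a symbolic link at each original path. Returns migrated paths."""
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    migrated = []
    for path in sorted(share.rglob("*")):  # list is built before any moves
        if path.is_symlink() or not path.is_file():
            continue  # skip directories and files that were already migrated
        if path.stat().st_mtime >= cutoff:
            continue  # still in use, leave it on the file server
        dest = target / path.relative_to(share)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest))
        os.symlink(dest, path)  # users still open the file at its old path
        migrated.append(path)
    return migrated
```

Rerunning the function is safe: files that are already links are skipped, so only newly cold files move.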
The Data Migration job is simple to set up, requiring a source share, criteria for migration, and a name for the new Cohesity View to hold the migrated files. The View and its share are created automatically, so don't use the name of an existing View.
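The three inputs, and the rule about not reusing a View name, can be captured as a small pre-flight check. This is a hypothetical sketch: `MigrationJob`, `validate_job`, and the age-in-days criterion are illustrative assumptions, not fields from the Cohesity UI or API.

```python
from dataclasses import dataclass


@dataclass
class MigrationJob:
    """Hypothetical job definition holding the three inputs the job needs."""
    source_share: str
    older_than_days: int  # assumed stand-in for the migration criteria
    view_name: str        # the View that the job will create


def validate_job(job: MigrationJob, existing_views: set) -> list:
    """Return a list of problems; an empty list means the job looks valid."""
    problems = []
    if not job.source_share:
        problems.append("a source share is required")
    if job.older_than_days <= 0:
        problems.append("the age threshold must be positive")
    if job.view_name in existing_views:
        problems.append(f"View '{job.view_name}' already exists; "
                        "the job creates a new View")
    return problems


job = MigrationJob("//nas01/projects", 365, "archive")
print(validate_job(job, existing_views={"home", "archive"}))
```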