This article is part of my series on how AWS surprised me as I transitioned from an on-premises vSphere specialist to teaching AWS courses. In a previous life, I worked for an IT outsourcing company here in New Zealand; much like any other outsourcing company around the world, we would take on your IT systems for a fee. In my experience, the more problems you get the outsourcer to manage for you, the larger the bill. It is a bit of a surprise to me that a given outcome often costs customers less to achieve with an AWS managed service than with a more basic service. The more problems you hand over to AWS, usually the smaller your bill.
Cohesity started with data protection and management for virtual machines and can also help customers who have valuable data and applications on physical servers. Like every other physical backup product, there is an agent to install and then register with the Cohesity cluster. We will take a quick walk through this simple process. Then we will see that once the Cohesity cluster knows about your physical machines, data protection and recovery are just as simple as for virtual machines. You can watch me walk through the process in this video.
New servers in your data center can have a lot of RAM DIMMs in them, or some RAM and some Persistent MEMory (PMEM) such as Optane DIMMs. Either way, there is a lot of fast capacity that might help solve application performance issues that are caused by slow storage. Formulus Black would like to help you by turning some of those DIMMs into a block device for your application. Their FORSA software takes either RAM or PMEM and presents it as a local storage device with latency and throughput that are better than NVMe SSDs. Your application can run directly on the Linux host where FORSA is installed, or inside VMs under the KVM hypervisor on that host. The VM option allows applications that require Windows to be accelerated with DIMM-based storage. There are other drivers around that turn Optane DIMMs into a disk device, and other RAM disk drivers too, but FORSA has some unique features around management and data services. One feature is the ability to rapidly clone a volume (usually RAM-based) to an SSD to provide data protection. Another is replicating a volume from one FORSA node to another for high availability. There is also a GUI for friendlier management and deduplication to increase the effective capacity of the high-speed volume.
I expect we will see more use of DIMM-based storage when there are cost-effective options for PMEM. Increasing application (particularly database) performance is important, but finding the expertise to fix and optimize applications is expensive. Moving from the fastest NVMe SSD you can get to DIMM-attached storage can offer an order of magnitude better application performance for a relatively small increase in hardware cost. Formulus Black says that they are targeting medium-sized companies and Wall Street. The presentation I saw had some significant improvements in IOPS and 99th percentile latency, which translates into fewer storage bottlenecks. If you are struggling to get enough storage performance with local NVMe SSDs, give them a look.
This is the first in a series of posts about things that surprised me when I started to work with AWS services. This particular surprise was one of the first, back in 2013, when I was still teaching VMware courses and was invited to attend some AWS training. On the first day, the trainer explained that you could deploy as much software-defined network and as many VMs as you wanted. That was all a given at a time when VMware was struggling to integrate Nicira (now NSX) with vSphere. Then my mind was blown when the instructor said that EC2 (the VM service) is not that interesting; the real fun is in the AWS services that you use to assemble applications. The next two days were spent learning about these services, with labs where I actually built an application. It was just too easy, even when I started going outside the lines. The lab had the usual prescriptive guidance about how to configure everything to work as intended. I built an autoscaling group of EC2 instances that would respond to the number of requests in a queue, take input data from object storage, and place results in more object storage. The lab instructions only told us how to scale out the cluster; I worked out how to configure scale-in too and managed to test it within the lab time.
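The scale-out and scale-in logic from that lab exercise can be sketched in a few lines. This is my own illustration of the idea, not the lab's actual code; the function name and thresholds are invented:

```python
# Hypothetical sketch of queue-depth-based autoscaling logic: scale out
# when the backlog per instance is high, scale in when it is low, and
# clamp the result to the group's size limits. All names and numbers
# here are illustrative, not from the AWS lab itself.

def desired_capacity(queue_depth, current_instances,
                     msgs_per_instance=10, min_size=1, max_size=10):
    """Return the instance count a scaling policy would target."""
    if current_instances == 0:
        current_instances = min_size
    backlog_per_instance = queue_depth / current_instances
    if backlog_per_instance > msgs_per_instance:
        target = current_instances + 1      # scale out: too much backlog
    elif backlog_per_instance < msgs_per_instance / 2:
        target = current_instances - 1      # scale in: instances are idle
    else:
        target = current_instances          # backlog is in the healthy band
    return max(min_size, min(max_size, target))
```

In a real deployment, this decision is made by CloudWatch alarms on the queue-depth metric driving the autoscaling group's policies, rather than by your own code; the function just shows the shape of the decision.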
Now that I teach the AWS courses, I do see more acceptance of applications that run in VMs, the way they do on-premises. But I now talk about EC2 instances as the last resort, for when you have no better way to achieve your objectives. Usually, a managed service is a better option because it requires less work from you, and often because it costs less in AWS bills too. I noticed that the Developing on AWS course is very focused on serverless application development, meaning no EC2. New applications shouldn’t require working with legacy constructs like VMs, but you can have your VMs for your older applications.
As I started working with and writing about AWS, a few things surprised me about how different AWS is from on-premises vSphere. As I have been teaching the official AWS courses, I have continued to notice more of these surprises. I’m planning to write a separate blog post with more detail about each one, and as I think of more, this list will get longer.
Ansible is an easy tool to start using for declarative configuration. I use Ansible to make sure a small fleet of Linux VMs is configured exactly the same way each time I deploy them. In my last Cohesity video and post, I showed you how to deploy the Ansible role for Cohesity and gather information about your Cohesity cluster. Today we get to the real use of Ansible: integrating the protection of a fleet of physical servers with our Cohesity platform. The playbook I created from the samples deploys the Cohesity agent, adds the physical server as a source, and then adds the source to a protection job. You can watch me copy and paste from the samples and run the playbook in this video on YouTube.
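The playbook's shape looks roughly like this. The role's task-file names and variable structures below follow the Cohesity sample playbooks as I remember them and may differ in current releases of the role, so treat this as a sketch rather than copy-paste material:

```yaml
# Rough shape of the agent/source/job playbook assembled from the
# Cohesity samples. Task-file names and variables are illustrative;
# check the current cohesity.cohesity_ansible_role documentation.
- hosts: physical_linux
  become: true
  vars:
    cohesity_server: cohesity.example.lab        # hypothetical cluster FQDN
    cohesity_admin: admin
    cohesity_password: "{{ vault_cohesity_password }}"
  tasks:
    - name: Install and register the Cohesity agent
      include_role:
        name: cohesity.cohesity_ansible_role
        tasks_from: agent

    - name: Register this server as a physical protection source
      include_role:
        name: cohesity.cohesity_ansible_role
        tasks_from: source

    - name: Add the server to a protection job
      include_role:
        name: cohesity.cohesity_ansible_role
        tasks_from: job
```

The nice part of this structure is that the same play runs across the whole inventory group, so adding a new physical server to protection is just adding it to the Ansible inventory and re-running the playbook.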
When it comes to managing a fleet of Linux boxes with minimal extra infrastructure, I am a fan of Ansible. I have written before about using PowerShell to automate working with Cohesity; PowerShell will be a good choice for vSphere and Hyper-V environments, where it is the native automation platform. I have also shown how the AutoProtection feature on Cohesity allows newly created VMs to be protected based on folders, tags, or naming. But what about when you have a bunch of physical Linux boxes that you want to protect? Ansible seems like a great fit, and happily, Cohesity has an Ansible role to make everything easy. Here, I look at deploying the role and retrieving information about your Cohesity cluster using the Cohesity facts function. The video of me following this process is right here if you would prefer not to read any further.
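Gathering that cluster information is a one-task play using the role's cohesity_facts module. A minimal sketch, with a hypothetical cluster address, and parameter names as I recall them from the sample playbooks (verify against the current role documentation):

```yaml
# Minimal fact-gathering play; parameter names follow the Cohesity
# sample playbooks I used and may differ in newer role releases.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Collect information about the Cohesity cluster
      cohesity_facts:
        cluster: cohesity.example.lab            # hypothetical cluster FQDN
        username: admin
        password: "{{ vault_cohesity_password }}"
        state: complete                          # full detail; a minimal mode is faster
      register: cohesity

    - name: Show the gathered cluster details
      debug:
        var: cohesity
```

Registering the result means later tasks in the same playbook can make decisions based on what the cluster already knows about, such as existing sources and protection jobs.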
I have made a lot of walk-through and demonstration videos as I have learnt about the Cohesity platform over the last year. We also showed you the deployment process and some architecture in the Cohesity Build Day Live event, with plenty of video there too. If you would like a more structured set of presentations about the Cohesity platform and its newer features, then I suggest you take a look at their presentations at Tech Field Day. Unfortunately, I did not get to attend any of these events; hopefully, I will see my Cohesity friends at Tech Field Day in 2020.
Disclosure: This post is part of my work with Cohesity.
Storage Field Day 18
Cloud Field Day 5
Size matters, not in absolute terms where bigger or smaller is always better, but in matching a solution to the requirements it needs to fulfill. Scale Computing has transformed over the last few years from a player for budget-conscious small businesses to a scalable solution for distributed enterprises. I see two vital dimensions where Scale Computing has been innovating. The first is multi-cluster management, allowing the central management of vast numbers of clusters. The other has been scaling down the size of the minimum site for which they have a solution. Past Scale Computing hardware platforms have been full-depth rack-mount servers, offering options for dozens of CPU cores and hundreds of gigabytes of RAM. These models fulfill the requirements of medium-sized offices where a few dozen to a few hundred VMs are required. If you are a bank or a big-box retail store, you might need this infrastructure at each branch to serve dozens of staff. You also want the same management console to manage all of your hundred or thousand branches, each with a local cluster. The scale of multi-cluster management that Scale Computing offers has been impressive. Recently, Scale simplified some of the network requirements, removing the need for a dedicated physical cluster network and instead using VXLAN to isolate the cluster networking. There is an excellent set of videos on some of Scale’s innovations in their presentations at Tech Field Day 20.
A branch office solution that is deployed to every retail premises of a national or international retailer needs to scale to the requirements of the largest and smallest branch. Often, the smallest branch is a sole-charge staff member and a single till, plus corporate infrastructure like security systems and staff tracking. That small branch might need between three and five VMs, usually with a total size of a few gigabytes of RAM and less than a terabyte of storage. In the past, the only cost-effective way to run these VMs was on a small form-factor desktop PC, very limited in both redundancy and remote management. The latest platform from Scale Computing is the HC150, based on tenth-generation Intel NUC hardware, which can have as little as 4GB of RAM per node, allowing 6GB of VMs in a 3-node redundant cluster. Some of the magic is that Scale has optimized their RAM overhead to under 1GB, leaving 3GB per node for VMs in a tiny 4GB NUC configuration and 15GB for VMs in a 16GB NUC configuration. With the tenth-generation NUC, Intel has brought back AMT features for remote management of hardware. Scale Computing uses the AMT capability to allow zero-touch remote deployment; a tiny cluster is shipped to the site with just a diagram of how to install it. The commissioning process for the cluster is managed remotely once the NUCs are connected to the network. With NUCs, a three-node cluster can fit inside a shallow network rack, or on a shelf under the till in a small retail site where space is at a premium. If those NUCs don’t seem enterprise enough for you, Lenovo is a significant partner for Scale. Lenovo has rugged micro-servers that can form a Scale Computing cluster, with no fans and robust metal cases. I also saw some mention of support for the Wi-Fi adapters in both the NUC and Lenovo machines. I imagine that the Scale cluster traffic is still over wired Ethernet, but the VM networking could happen over Wi-Fi. I imagine that opens some exciting deployment options.
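Those usable-capacity numbers are simple survive-one-node-failure arithmetic: reserve one whole node's worth of VM capacity for failover, and subtract the per-node overhead. A quick sketch of the math (my own illustration, not a Scale Computing sizing tool):

```python
# Usable VM RAM in an N-node cluster that must survive one node
# failure: (nodes - 1) * (per-node RAM - per-node overhead).
# The ~1GB overhead figure comes from the article; everything else
# here is illustrative arithmetic.

def usable_vm_ram_gb(nodes, ram_per_node_gb, overhead_gb=1):
    """RAM available to VMs while reserving one whole node for failover."""
    if nodes < 2:
        raise ValueError("need at least 2 nodes for redundancy")
    return (nodes - 1) * (ram_per_node_gb - overhead_gb)
```

For the 3-node 4GB NUC cluster this gives 6GB of VM capacity, matching the article's figure, and a 3-node 16GB configuration gives 30GB.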
I really like the architecture of the Datrium DVX platform. Large (cost-effective) NVMe SSDs inside ESXi hosts provide impressive storage performance, and one or more shared disk shelves provide data persistence and protection. If you remember PernixData’s idea of separating performance from capacity, it is applied end-to-end in Datrium rather than bolted on the side as PernixData did. We showed just how simple Datrium is to deploy in a Build Day Live event in 2017. I was impressed that we deployed a DVX (vSphere) cluster, migrated VM workloads, and then added existing hosts to the DVX cluster, all in a four-hour live-streamed activity. In the two years since we were at Datrium, the cloud has driven new features: first with the cloud as a destination for backups, which are stored on cost-effective object storage (S3), then as a place where DVX-based VMs could be restored. Cloud DVX is the Datrium DVX platform running in VMs on the public cloud and presented to ESXi hosts in VMware Cloud on AWS (VMConAWS). The top use for Cloud DVX is DRaaS: cloud DR to VMConAWS.
DR as a Service (DRaaS) to the public cloud has a very compelling value proposition: protect on-premises VMs at minimum cost and pay for recovery resources only when you practice or execute your DR plan. The magic of Datrium DRaaS is that there is no waiting for data to be rehydrated off S3 before your VMs can be powered on. Most solutions that use S3 for DR storage require the data to be copied from S3 to transactional storage such as EBS or vSAN before VMs can be powered on. These copies from S3 are fast, but with hundreds of GB being copied, it still takes time before you can start recovering applications. Datrium Cloud DVX uses EC2 instances with NVMe SSDs to provide performance while data persistence is on S3. The Cloud DVX storage is presented as an NFS share (containing the DR VMs) to the VMConAWS cluster. Recovered VMs can be powered on immediately and later Storage vMotioned to vSAN so that Cloud DVX can be shut down. There is another point of difference: the compute part of Cloud DVX only needs to run while VMs are being recovered to VMConAWS; during protection, and after the Storage vMotion, there is no requirement for EC2 resources for Cloud DVX.
The most recent announcement is that Datrium DRaaS is no longer limited to DVX hardware; you can get DRaaS to VMConAWS for any vSphere environment. Datrium DRaaS Connect protects non-DVX clusters; you deploy a virtual appliance that performs the backups using VMware’s VADP. The protection data is stored on S3, and recovery to VMConAWS uses Cloud DVX, just like recovering a DVX system. The primary value here is the shorter RTO from not needing to rehydrate S3-based images before a recovered VM can be powered on and start delivering applications.