Is it just me that gets annoyed when category definitions are arbitrary and fail to match up to real business needs? One example is Gartner’s All-Flash Array (AFA) storage analysis. Any product that can be either AFA or hybrid is excluded, so vendors make unique product IDs that are really just an all-flash configuration of a hybrid array. Gartner’s definition of AFA gets in the way of customers looking for a set of benefits. I have come to realize that I have made the same mistake about HyperConverged Infrastructure (HCI) as a category. The realization arrived as I took part in Tech Field Day 16, particularly this presentation by Adam Carter. Naturally, my standard TFD disclaimer applies. HCI is not really about putting clustered storage inside a bunch of hypervisor hosts; it is far more about the simplicity of operating an environment designed purely to run VMs. There is a range of vendors with products that make it easy to deploy and manage a virtualization platform, which is what HCI is really about. To me, the big surprise is that VMware does not have a general-purpose deployment tool, even for a basic vSphere cluster.
I am in Austin (Texas) this week to be a delegate at Tech Field Day 16; this is my fourteenth Tech Field Day (TFD) event. I want to spend a few moments talking about what I think the events are and share some of “The Rules” that we talk about for the event. My list of rules is by no means definitive. In fact, I hope to have corrections and additions to this list over time. Naturally, this post is covered by my TFD disclaimer post.
I have long believed that success in the public cloud is not just about meeting the NIST definition; it also requires developer enablement. The rampant success of AWS is not driven by EC2 compute instances; it is driven by services that enable developers to rapidly build applications that satisfy business needs. I believe that this is why we have seen IaaS-based public clouds fail: they don’t deliver services that developers want to consume. Is there a parallel in private cloud? It seems that Stratoscale believes that there is; they have pivoted from providing only an on-premises IaaS cloud to delivering familiar AWS services on-premises. To be clear, they do not offer all of the AWS services and don’t give every API for every service. They are awesome, but not miracle workers; more products and more extensive API coverage will come over time. Nonetheless, the services that they offer are pretty amazing. There are clones of AWS networking and load balancing, a database service, Hadoop-as-a-service, object and file services, as well as a Kubernetes-as-a-service offering. All these services are delivered on-premises using a software-only HCI deployment; you can re-use existing physical servers or buy your choice of new servers.
Developing software for AWS services is undoubtedly a popular practice, but it usually locks you into deployment onto the AWS public cloud. With Stratoscale, you can use many of the same services but deploy to on-premises infrastructure by changing one URL in the deployment process. Developers could use AWS for the development phase and then deploy to production on-premises, or the other way around. Applications could also be built with a split between on-premises and public cloud services, using the same architectures in both locations. I think that the strongest enterprise use-case for Stratoscale is organizations that want the agility of public cloud development but have regulatory or compliance requirements to keep their applications and data on-premises. The other strong use-case is for smaller public cloud providers to offer their own AWS-compatible services and service niche requirements.
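To make the "one URL" idea concrete, here is a minimal Python sketch. It assumes the deployment tooling reads a single endpoint setting. The on-premises hostname, the bucket name, and the function name are all hypothetical, and this is not Stratoscale's documented procedure, just an illustration of the pattern.

```python
from urllib.parse import urlparse

# Illustrative endpoints. The on-premises hostname is invented for this sketch.
ENDPOINTS = {
    "aws": "https://s3.us-east-1.amazonaws.com",
    "on_prem": "https://cloud.example.internal",
}


def build_deploy_config(target: str, bucket: str) -> dict:
    """Return deployment settings; only endpoint_url differs between targets."""
    endpoint = ENDPOINTS[target]
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"bad endpoint for {target!r}: {endpoint}")
    # Everything except the endpoint is identical for both targets.
    return {"endpoint_url": endpoint, "bucket": bucket, "region": "us-east-1"}


aws_cfg = build_deploy_config("aws", "app-artifacts")
onprem_cfg = build_deploy_config("on_prem", "app-artifacts")

# List the settings that differ between the two deployments.
changed = sorted(k for k in aws_cfg if aws_cfg[k] != onprem_cfg[k])
print(changed)
```

AWS SDKs generally accept an endpoint override (boto3 calls it `endpoint_url`), which is why a single setting can be enough to point the same code at a compatible service.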
This is a very cool product. I hope to see more from Stratoscale as they expand their product and educate customers about the possibility of AWS-compatible services on-premises.
I am continuing my look at the Rubrik platform. In my previous blog post, I looked at the deployment process for the Rubrik Edge virtual appliance, as well as backups and restores from that Edge appliance. Today I want to dig a little deeper into the backup policies (SLA Domains in Rubrik terms) as well as look at using replication to protect against losing the Edge appliance itself. I will start with replication and then loop round to policies, since replication is driven by these policies.
For every data center full of servers, there are dozens or even hundreds of remote or branch offices. These locations are where businesses actually sell their products and make money. Delivering IT to these ROBO locations is a challenge, in part because there are lots of them, so controlling cost is crucial. While we might hope that all our business processes are run out of cloud applications, the reality is that many of these ROBO locations need to have their own servers. One retail branch I visited ran around 20 VMs, enough to need a virtualization platform but not enough for a SAN and multiple ESXi servers. Since these locations are where the money is made, they are also where the data is generated. Protecting that data in the remote or branch office is what Rubrik Edge is all about.
We are back in the conference season. As I write this, I have been back from VMworld USA for three days and will be leaving for VMworld EMEA in the morning. The TechTalks have taken a big step up this year; sessions are listed in the Schedule Builder application and Content Catalog. The result is nice, live audiences, where previously the audiences were sparse at best. The US TechTalks are all posted in this playlist on YouTube; the EMEA ones will be added as they are made.
That is a roundabout way of saying that I had lots going on and didn’t post about the other things I wrote in July and am surprised I’m writing this now.
Over on TechTarget I continued my series on using VMware Integrated Containers but ran into some significant issues. It seems that not all container platforms are created equal. By the way, did you see any Photon Platform or VIC announcements at VMworld? Maybe in Barcelona.
I also looked at whether a new breed of Workspace products can replace VDI for some customers. There do seem to be benefits from a lighter infrastructure for desktop delivery.
On SearchVMware I covered some of the decisions around choosing the right media for your vSAN deployment. As with all storage, it really does matter what sort of disk or flash you use.
For SearchDataBackup I looked at some of the important but not obvious questions to ask about cloud-based backups. I also looked at aligning backup frequency to business needs, rather than simply IT policy.
A friend of mine has replaced a large amount of a Converged Infrastructure stack with an equally large amount of HyperConverged Infrastructure. I asked him why and got a story of simplification.
Finally, I dug into using NSX and Kubernetes with a DevOps focus to deliver more agility and support microservices-based deployments.
June was another busy month, with HPE Discover in Las Vegas and teaching an online vSphere Operations course for O’Reilly. Now I’m caught up in all the organization for the vBrownBag TechTalks at VMworld. This year we will have far more presentations and will also be listed in Schedule Builder, so I expect an in-person audience too. Remember that there is also only a one-week gap between the US and EMEA conferences; I only get home to NZ for three nights in between. All this means that organization for TechTalks has to happen early, which is now.
On TVP I talked a bit about the new Oracle Cloud, which should be an interesting platform with distinct differences to what AWS offers.
While I was at HPE Discover I talked to IT people from two manufacturing plants that use IoT technology to better manage their maintenance. It is interesting that we think of IoT in a consumer context, IoT fridge or toilet, but the real IoT value will be in industrial applications.
It looks like my massive stint of writing for TechTarget has borne fruit, and there is a heap of my articles published. I expect the next few months will be quieter as I haven’t been writing so much recently.
I think July is going to be a fun month and then conference season starts in earnest. I hope to see lots of my friends in-person at VMworld and to make a few new friends too.
Sometimes it takes a while for a company to come up on the radar, then it keeps coming up. It was at Tech Field Day 11 last year that I first learned of Comtrade and their software development business. Comtrade came up again this year when I was researching the VDI management and monitoring buyer’s guide for TechTarget. This week they reappeared with a new product, a backup product specifically for Nutanix named HYCU. I keep seeing HYCU as HKeyCurrentUser, so it is important to pronounce it like the Japanese poetry, Haiku. HYCU may be a new product, but Comtrade have been developing backup software for a long time, so it has mature thinking behind it. Backup policies require an RPO and retention, as well as an RTO. This last is interesting, as backups don’t usually have restore time objectives. The destinations can be local NFS or SMB shares, or remote AWS or Azure storage. By default, HYCU will select the backup destination to respect your configured RTO. A 6TB VM backed up to S3 is unlikely to be restored inside a 2-hour RTO, but from a local NFS server there is a good chance of meeting that RTO. Policies are applied to VMs, and VMs are discovered from the Nutanix Prism API. Right now, HYCU only supports the Nutanix Acropolis hypervisor (AHV), but ESXi support is sure to be added soon. Restores can be whole VMs or file-level restores directly into the VM, either overwriting the file or redirecting the restore to preserve the current files. There is also an element of application awareness: HYCU can identify VMs that have SQL Server installed and back up the databases, then restore individual databases to a point in time by rolling the SQL logs forward. To speed up the restores, Nutanix snapshots are used and retained on the VM for a day. This means that a restore can happen immediately, while the backup can still be sent to AWS for cheap storage. I like the simplicity of the approach, while still having a fair amount of flexibility.
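The RTO arithmetic behind the 6TB example is easy to sketch. This is not HYCU's actual selection logic, only a back-of-the-envelope model in Python. The throughput figures, destination names, and function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    name: str
    restore_mb_per_s: float  # assumed sustained restore throughput


def restore_hours(size_tb: float, dest: Destination) -> float:
    """Estimated hours to restore size_tb (decimal terabytes) from dest."""
    size_mb = size_tb * 1_000_000
    return size_mb / dest.restore_mb_per_s / 3600


def destinations_meeting_rto(size_tb, rto_hours, destinations):
    """Names of destinations whose estimated restore time fits inside the RTO."""
    return [d.name for d in destinations if restore_hours(size_tb, d) <= rto_hours]


# Throughput figures are illustrative assumptions, not measurements.
local_nfs = Destination("local-nfs", 1000.0)  # roughly a 10GbE LAN link
s3 = Destination("aws-s3", 100.0)             # roughly a 1Gbps WAN link

for dest in (local_nfs, s3):
    print(f"{dest.name}: {restore_hours(6, dest):.1f} h")

print(destinations_meeting_rto(6, 2, [local_nfs, s3]))
```

Under these assumed speeds, the local share restores 6TB in under two hours while the cloud copy takes most of a day, which is the gap an RTO-aware policy has to account for.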
Hyperconverged is all about simplifying infrastructure management. There is built-in backup and replication with the Nutanix product, but there has been some discussion about whether backups should be on the same storage as the original VM. There are a few more things I would like in the product. The ability to do a monthly compliance/eDiscovery backup that is retained indefinitely is essential if object storage is to replace tapes. I would also like to see integration with the Nutanix Prism interface, and I’m sure it will come. If I can make some time, I will have a play with HYCU; there is a trial at tryhycu.com that I imagine will work with Nutanix Community Edition.
I’m in the middle of some crazy travel. Dell EMC World in Las Vegas at the start of May, then home. Silicon Valley for the Ravello/Oracle blogger briefing last week, home this week. On Sunday I head back over the Pacific for HPE Discover, back in Vegas. I don’t plan any more long-haul travel in June, but July, August, and September will all have a lot of miles. This is the result of my choice to live in New Zealand but work largely for US businesses. One of the great things has been seeing so many of my friends on these trips; there is nothing like sharing a meal with a table full of friends.
I wrote for SearchDisasterRecovery about the concept of using Canary files to detect the actions of ransomware, then WannaCry blew up in mass media.
The VDI Management and Monitoring Buyer’s Guide continues. The third article is about what to expect your tools to do and the fourth looks at a few of the top products in the category.
For SearchDataCenter I looked at using data fabrics for cross-cloud mobility.
I continued the theme of getting the benefits of Hyperconverged without using hyperconverged. This article focusses on policy-based management, my favorite part of HCI.
I also looked at how inflexible AWS is as an IT provider; they really are the department of NO.
Every so often a product comes along that works in a new way and we need to re-learn how to think about building an IT infrastructure. I spent some time with Datrium learning about how their solution is different from other solutions. I think of their product as a scale-out controller with a shared storage shelf. Both hyperconverged and scale-out storage have scale-out controllers and scale-out storage. Hyperconverged uses the same scale-out physical servers to run VMs and scale-out storage uses additional servers. Datrium puts the controller with cache and workload VMs in each scale-out host but uses centralized storage shared by all the hosts.
With Datrium, the controllers scale out and are on the compute nodes, alongside the VMs. Each node has some solid-state storage as a cache but does not have “persistent” storage. All persistent storage is in a data node, separate from the compute nodes. The data node has local disks and NVRAM, but is only accessible through the compute nodes. Think of the data node as a disk shelf; a future release will allow multiple data nodes to be joined together. The compute nodes scale out; up to 32 compute nodes can access a single data node. A nice feature is the ability to have non-uniform compute nodes. You might have sixteen general-purpose compute nodes: dual socket, 256GB of RAM, and 1TB of SSD. Then maybe four nodes that are for large database VMs: quad socket, 1TB of RAM, and 8TB of SSD. All these compute nodes can access the same data node.
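The mixed-node example can be written down as a small capacity model. This is a sketch in Python using only the figures quoted above. The class and function names are hypothetical and have nothing to do with Datrium's own tooling.

```python
from dataclasses import dataclass

MAX_COMPUTE_NODES_PER_DATA_NODE = 32  # limit quoted in the text


@dataclass(frozen=True)
class NodeGroup:
    role: str
    count: int
    sockets: int
    ram_gb: int
    cache_ssd_tb: int


def summarize(groups):
    """Total up a non-uniform set of compute nodes sharing one data node."""
    total_nodes = sum(g.count for g in groups)
    if total_nodes > MAX_COMPUTE_NODES_PER_DATA_NODE:
        raise ValueError(f"{total_nodes} compute nodes exceeds the limit")
    return {
        "nodes": total_nodes,
        "ram_gb": sum(g.count * g.ram_gb for g in groups),
        "cache_ssd_tb": sum(g.count * g.cache_ssd_tb for g in groups),
    }


groups = [
    NodeGroup("general", count=16, sockets=2, ram_gb=256, cache_ssd_tb=1),
    NodeGroup("database", count=4, sockets=4, ram_gb=1024, cache_ssd_tb=8),
]

summary = summarize(groups)
print(summary)
# {'nodes': 20, 'ram_gb': 8192, 'cache_ssd_tb': 48}
```

Note that the SSD total is cache only; in this architecture it adds no persistent capacity, which lives entirely on the data node.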
Datrium’s architecture provides a lot of scale-out benefits without some of the challenges. In typical scale-out and hyperconverged architectures there is a lot of east-west network traffic between the storage nodes. Data written to one node must also be written to another node, or two, to provide durability. There are also operational and availability issues with having storage capacity in your compute nodes. Taking an HCI node down for maintenance affects the redundancy of your storage, potentially reducing your failure tolerance. With Datrium, the compute nodes seldom talk to each other; they almost exclusively talk to the data node. Having a compute node shut down or failed does not change your storage availability and resilience. With both HCI and scale-out you must have a minimum quorum of nodes operational before any storage is available. Datrium needs only the data node and one compute node to provide a working storage system.
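A deliberately simplified Python model shows the availability difference. The HCI rules here are assumptions made for illustration (a strict-majority quorum, and one copy of the data per node); real products differ in the details. Only the last function reflects a rule stated above, and all function names are hypothetical.

```python
def hci_storage_available(total_nodes: int, nodes_up: int) -> bool:
    """Assumed rule: a strict majority of nodes must be up for storage to work."""
    return nodes_up > total_nodes // 2


def hci_remaining_failure_tolerance(copies: int, nodes_down: int) -> int:
    """Assumed rule: N copies on N different nodes survive N-1 node losses."""
    return max(copies - 1 - nodes_down, 0)


def split_storage_available(data_node_up: bool, compute_nodes_up: int) -> bool:
    """Rule from the text: the data node plus at least one compute node."""
    return data_node_up and compute_nodes_up >= 1


# One node of a four-node cluster is down for maintenance.
print(hci_storage_available(total_nodes=4, nodes_up=3))          # True
print(hci_remaining_failure_tolerance(copies=2, nodes_down=1))   # 0

# Only one node of a four-node cluster is left running.
print(hci_storage_available(total_nodes=4, nodes_up=1))          # False
print(split_storage_available(data_node_up=True, compute_nodes_up=1))  # True
```

The flip side is visible in the last function too: with the data node down, no number of healthy compute nodes makes storage available in this model.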
Datrium is also designed to be simple to manage, which is a top value proposition for HCI too. Datrium has very few settings to configure; deduplication, erasure coding, and compression are always enabled and cannot be turned off. The only feature that can be turned on and off is full system encryption. The encryption happens in the compute nodes. Data is encrypted after it is deduplicated and compressed but before it leaves the compute node where the VM IO occurs. Data is encrypted across the storage network and at rest on the data node, so there is no need for self-encrypting hard disks.
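The ordering matters because encrypted data looks random, so it neither deduplicates nor compresses. Here is a toy Python sketch of that write path. It is not Datrium's implementation: the block handling is simplified, and `toy_encrypt` is an insecure placeholder keystream cipher used only because the standard library has no real block cipher.

```python
import hashlib
import os
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Placeholder cipher (SHA-256 keystream XOR). NOT secure; illustration only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        keystream = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


def write_path(blocks, key):
    """Dedupe, then compress, then encrypt, in the order the text describes."""
    seen = set()
    stored = []
    for block in blocks:
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint in seen:
            continue  # duplicate block: only the first copy is stored
        seen.add(fingerprint)
        stored.append(toy_encrypt(zlib.compress(block), key))
    return stored


key = os.urandom(32)
block_a = b"A" * 4096
block_b = b"hello world " * 341
blocks = [block_a, block_b, block_a]  # the third block duplicates the first

stored = write_path(blocks, key)
raw_bytes = sum(len(b) for b in blocks)
stored_bytes = sum(len(b) for b in stored)
print(len(stored), raw_bytes, stored_bytes)

# Reverse the order: ciphertext looks random, so compression gains nothing.
encrypted_first = zlib.compress(toy_encrypt(block_a, key))
print(len(encrypted_first) >= len(block_a))
```

Doing all three steps on the compute node is what lets the data stay encrypted on the wire and on disk without giving up the space savings.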
This architecture has some interesting consequences. It is going to take me a while to think through and talk about what the benefits are and what the downsides are; there are always downsides. Hopefully, I will get to do some more work with Datrium and we will all learn more about their cool product.