I am continuing my look at the Rubrik platform. In my previous blog post, I looked at the deployment process for the Rubrik Edge virtual appliance, as well as backups and restores from that Edge appliance. Today I want to dig a little deeper into the backup policies (SLA Domains in Rubrik terms) as well as look at using replication to protect against losing the Edge appliance itself. I will start with replication and then loop round to policies since replication is driven by these policies.
For every data center full of servers, there are dozens or even hundreds of remote or branch offices. These locations are where businesses actually sell their products and make money. Delivering IT to these ROBO locations is a challenge, in part because there are lots of them, so controlling cost is crucial. While we might hope that all our business processes are run out of cloud applications, the reality is that many of these ROBO locations need to have their own servers. One retail branch I visited ran around 20 VMs, enough to need a virtualization platform but not enough for a SAN and multiple ESXi servers. Since these locations are where the money is made, they are also where the data is generated. Protecting that data in the remote or branch office is what Rubrik Edge is all about.
We are back in conference season. As I write this I have been back from VMworld USA for three days and will be leaving for VMworld EMEA in the morning. The TechTalks have taken a big step up this year: sessions are listed in the Schedule Builder application and Content Catalog. The result is nice, live audiences, where previously the audiences were sparse at best. The US TechTalks are all posted in this playlist on YouTube; the EMEA ones will be added as they are made.
That is a roundabout way of saying that I had lots going on and didn’t post about the other things I wrote in July and am surprised I’m writing this now.
Over on TechTarget I continued my series on using VMware Integrated Containers but ran into some significant issues. It seems that not all container platforms are created equal. By the way, did you see any Photon Platform or VIC announcements at VMworld? Maybe in Barcelona.
I also looked at whether a new breed of Workspace products can replace a VDI for some customers. There do seem to be benefits from a lighter infrastructure for desktop delivery.
On SearchVMware I covered some of the decisions around choosing the right media for your VSAN deployment. As with all storage it really does matter what sort of disk or flash you use.
For SearchDataBackup I looked at some of the important but not obvious questions to ask about cloud based backups. I also looked at aligning backup frequency to business needs, rather than simply IT policy.
A friend of mine has replaced a large amount of a Converged Infrastructure stack with an equally large amount of HyperConverged Infrastructure. I asked him why and got a story of simplification.
Finally, I dug into using NSX and Kubernetes with a DevOps focus to deliver more agility and support microservices based deployments.
June was another busy month, with HPE Discover in Las Vegas and teaching an Online vSphere Operations course for O’Reilly. Now I’m caught up in all the organization for the vBrownBag TechTalks at VMworld. This year we will have far more presentations and will also be listed in Schedule Builder, so I expect an in-person audience too. Remember that there is also only a one-week gap between the US and EMEA conferences, so I only get home to NZ for three nights in between. All this means that organization for TechTalks has to happen early, which is now.
On TVP I talked a bit about the new Oracle Cloud, which should be an interesting platform with distinct differences to what AWS offers.
While I was at HPE Discover I talked to IT people from two manufacturing plants that use IoT technology to better manage their maintenance. It is interesting that we think of IoT in a consumer context, IoT fridge or toilet, but the real IoT value will be in industrial applications.
It looks like my massive stint of writing for TechTarget has borne fruit & there is a heap of my articles published. I expect the next few months will be quieter as I haven’t been writing so much recently.
I think July is going to be a fun month and then conference season starts in earnest. I hope to see lots of my friends in-person at VMworld and to make a few new friends too.
Sometimes it takes a while for a company to come up on the radar, then it keeps coming up. It was at Tech Field Day 11 last year that I first learned of Comtrade and their software development business. Comtrade came up again this year when I was researching the VDI management and monitoring buyer’s guide for TechTarget. This week they reappeared with a new product, a backup product specifically for Nutanix named HYCU. I keep seeing HYCU as HKeyCurrentUser, so it is important to pronounce it like the Japanese poetry, Haiku. HYCU may be a new product, but Comtrade have been developing backup software for a long time, so it has mature thinking behind it. Backup policies require an RPO and retention, as well as an RTO. This last is interesting as backups don’t usually have restore time objectives. The destinations can be local NFS or SMB shares, or remote AWS or Azure storage. By default, HYCU will select the backup destination so that your configured RTO can be met. A 6TB VM backed up to S3 is unlikely to be restored inside a 2-hour RTO, but from a local NFS server there is a good chance of meeting that RTO. Policies are applied to VMs, which are discovered from the Nutanix Prism API. Right now, HYCU only supports the Nutanix Acropolis hypervisor (AHV), but ESXi support is sure to be added soon. Restores can be whole VMs, or file-level restores directly into the VM, either overwriting the file or redirecting the restore to preserve the current files. There is also an element of application awareness: HYCU can identify VMs that have SQL Server installed and back up the databases, then restore individual databases to a point in time by rolling the SQL logs forward. To speed up restores, Nutanix snapshots are used and retained on the VM for a day. This means that a restore can happen immediately while the backup can still be sent to AWS for cheap storage. I like the simplicity of the approach, while still having a fair amount of flexibility.
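The RTO reasoning is easy to check with some arithmetic. The sketch below is not HYCU's actual selection logic, which I have not seen; it is a minimal illustration of filtering destinations by estimated restore time. The `Destination` class, the destination names, and the throughput figures are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    name: str
    restore_mb_per_s: float  # ASSUMED sustained restore throughput, not a measured value


def restore_hours(size_tb: float, dest: Destination) -> float:
    """Estimate restore time in hours for a backup of size_tb (decimal TB)."""
    size_mb = size_tb * 1_000_000
    return size_mb / dest.restore_mb_per_s / 3600


def destinations_meeting_rto(size_tb, rto_hours, destinations):
    """Return only the destinations whose estimated restore fits inside the RTO."""
    return [d for d in destinations if restore_hours(size_tb, d) <= rto_hours]


# Hypothetical throughput numbers, chosen only to illustrate the point.
destinations = [
    Destination("local-nfs", 1000.0),  # roughly a well-used 10GbE link
    Destination("aws-s3", 100.0),      # roughly a 1Gb/s internet connection
]

# A 6TB VM needs about 833 MB/s sustained to restore inside 2 hours.
required_mb_per_s = 6 * 1_000_000 / (2 * 3600)

candidates = destinations_meeting_rto(6, 2, destinations)
for d in destinations:
    print(f"{d.name}: {restore_hours(6, d):.2f} hours")
print("meets 2-hour RTO:", [d.name for d in candidates])
```

With these assumed numbers only the local NFS share qualifies, which matches the intuition in the example above. A real product would also need to account for queueing, concurrent restores, and snapshot-based restores that skip the data transfer entirely.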
Hyperconverged is all about simplifying infrastructure management. There is built-in backup and replication with the Nutanix product, but there has been some discussion about whether backups should be on the same storage as the original VM. There are a few more things I would like in the product. The ability to do a monthly compliance/eDiscovery backup that is retained indefinitely is essential if object storage is to replace tapes. I would also like to see integration with the Nutanix Prism interface, and I’m sure it will come. If I can make some time I will have a play with HYCU, there is a trial at tryhycu.com that I imagine will work with Nutanix Community Edition.
I’m in the middle of some crazy travel. Dell EMC World in Las Vegas at the start of May, then home. Silicon Valley for the Ravello/Oracle blogger briefing last week, home this week. On Sunday I head back over the Pacific for HPE Discover back in Vegas. I don’t plan any more long-haul travel in June, but July, August, and September will all have a lot of miles. This is the result of my choice to live in New Zealand but work largely for US businesses. One of the great things has been seeing so many of my friends on these trips; there is nothing like sharing a meal with a table full of friends.
I wrote for SearchDisasterRecovery about the concept of using Canary files to detect the actions of ransomware, then WannaCry blew up in mass media.
The VDI Management and Monitoring Buyer’s Guide continues. The third article is about what to expect your tools to do and the fourth looks at a few of the top products in the category.
For SearchDataCenter I looked at using data fabrics for cross-cloud mobility.
I continued the theme of getting the benefits of Hyperconverged without using hyperconverged. This article focusses on policy-based management, my favorite part of HCI.
I also looked at how inflexible AWS is as an IT provider, they really are the department of NO.
Every so often a product comes along that works in a new way and we need to re-learn how to think about building an IT infrastructure. I spent some time with Datrium learning about how their solution is different from other solutions. I think of their product as a scale-out controller with a shared storage shelf. Both hyperconverged and scale-out storage have scale-out controllers and scale-out storage. Hyperconverged uses the same scale-out physical servers to run VMs and scale-out storage uses additional servers. Datrium puts the controller with cache and workload VMs in each scale-out host but uses centralized storage shared by all the hosts.
With Datrium the controllers scale out and are on the compute nodes, alongside the VMs. Each node has some solid-state storage as a cache but does not have “persistent” storage. All persistent storage is in a data node, separate from the compute nodes. The data node has local disks and NVRAM, but is only accessible through the compute nodes. Think of the data node as a disk shelf; a future release will allow multiple data nodes to be joined together. The compute nodes scale out: up to 32 compute nodes can access a single data node. A nice feature is the ability to have non-uniform compute nodes. You might have sixteen general purpose compute nodes with dual sockets, 256GB of RAM, and 1TB of SSD. Then maybe four nodes for large database VMs with quad sockets, 1TB of RAM, and 8TB of SSD. All these compute nodes can access the same data node.
Datrium’s architecture provides a lot of scale-out benefits without some of the challenges. In typical scale-out and hyperconverged architectures there is a lot of east-west network traffic between the storage nodes. Data written to one node must also be written to another node, or two, to provide durability. There are also operational and availability issues with having storage capacity in your compute nodes. Taking an HCI node down for maintenance affects the redundancy of your storage, potentially reducing your failure tolerance. With Datrium the compute nodes seldom talk to each other; they almost exclusively talk to the data node. Having a compute node shut down or failed does not change your storage availability and resilience. With both HCI and scale-out you must have a minimum quorum of nodes operational before any storage is available. Datrium needs only the data node and one compute node to provide a working storage system.
Datrium is also designed to be simple to manage, which is a top value proposition for HCI too. Datrium has very few settings to configure; deduplication, erasure coding, and compression are always enabled and cannot be turned off. The only feature that can be turned on and off is full system encryption. The encryption happens in the compute nodes. Data is encrypted after it is deduplicated and compressed but before it leaves the compute node where the VM IO occurs. Data is encrypted across the storage network and at rest on the data node, so there is no need for self-encrypting hard disks.
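The order of those steps matters, because encrypted data looks random and random data does not compress. The sketch below is not Datrium's implementation; it is a toy illustration of why data reduction has to come before encryption. The `toy_encrypt` function is a hypothetical stand-in for a real cipher and is not secure, and the block contents are made up.

```python
import hashlib
import os
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Illustration only, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        keystream = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


def dedupe(blocks):
    """Keep one copy of each distinct block, keyed by its content hash."""
    unique = {}
    for block in blocks:
        unique.setdefault(hashlib.sha256(block).hexdigest(), block)
    return list(unique.values())


key = os.urandom(32)
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # made-up data with one duplicate

# Order described in the post: dedupe, compress, then encrypt.
reduced_first = [toy_encrypt(zlib.compress(b), key) for b in dedupe(blocks)]

# Reverse order for comparison: encrypt first, then try to compress.
encrypted_first = [zlib.compress(toy_encrypt(b, key)) for b in blocks]

size_reduced_first = sum(len(b) for b in reduced_first)
size_encrypted_first = sum(len(b) for b in encrypted_first)
print("dedupe + compress, then encrypt:", size_reduced_first, "bytes")
print("encrypt, then compress:", size_encrypted_first, "bytes")
```

Doing the reduction on the compute node, before encryption, is what lets the data stay encrypted on the wire and at rest without giving up the space savings.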
This architecture has some interesting consequences. It is going to take me a while to think through and talk about what the benefits are and what the downsides are, there are always downsides. Hopefully I will get to do some more work with Datrium and we will all learn more about their cool product.
What if I told you that you could fit sixty (60) physical servers in a 4U rack chassis? And that the chassis also included redundant switching with multiple 40Gb uplinks? That is exactly what Aparna Systems are producing. The servers are a cartridge, the same size as a 3.5” hard disk, with an Intel Xeon CPU, 64GB of RAM, two SSDs and two 10GbE network ports. You can install whatever operating system you want on the nodes. Maybe Linux for containers or KVM, ESXi for a vSphere deployment, even Windows if that floats your boat. The chassis that accommodates sixty nodes is a full-size 4U enclosure, designed to go into server racks in a data center. With all the upstream bandwidth, these chassis are designed to be stacked up in a rack and clustered into massive scale-out server farms. There is also a smaller chassis, a mere 15 servers in 4U. This chassis is much shorter and will fit into communications racks or smaller data center racks. The smaller chassis are more suited to geo-dispersed use, service provider PoPs or industrial automation and analytics.
This is a hardware platform from which to build a cloud. It is not an opinionated cloud-in-a-box with a defined operating system and orchestration platform. You get a bunch of servers and networking, and add your own cloud software. The switches do have capabilities to help you deploy your chosen operating system, but you get to choose what and how you deploy. This is some very cool hardware, continuing the progression from tower servers, through large rack servers, to pizza boxes and then blades. A cartridge-based platform is even more dense. Aparna is still very early, no flash offices. I liked that one cubicle had bare circuit boards pinned to the wall; the team is deep in the hardware development. I would love to see an even smaller chassis, with four cartridges and basic 10GbE networking. That would be a great platform for ROBO or even home lab. That is not a market that Aparna are looking at. They are aiming for large analytics farms, NFV for Telcos and IoT edge compute.
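To put that density in perspective, here is a quick back-of-the-envelope calculation using the per-cartridge figures above. The 42U rack height is an assumption, and the rack total ignores the space, power, and cooling that would limit a real deployment.

```python
# Figures quoted for the large chassis.
NODES_PER_CHASSIS = 60
CHASSIS_HEIGHT_U = 4
RAM_PER_NODE_GB = 64
NICS_PER_NODE = 2
NIC_SPEED_GBPS = 10

# ASSUMPTION: a standard 42U rack filled with nothing but these chassis.
RACK_HEIGHT_U = 42

nodes_per_u = NODES_PER_CHASSIS / CHASSIS_HEIGHT_U
ram_per_chassis_gb = NODES_PER_CHASSIS * RAM_PER_NODE_GB
node_bandwidth_gbps = NODES_PER_CHASSIS * NICS_PER_NODE * NIC_SPEED_GBPS

chassis_per_rack = RACK_HEIGHT_U // CHASSIS_HEIGHT_U
nodes_per_rack = chassis_per_rack * NODES_PER_CHASSIS

print(f"{nodes_per_u:.0f} servers per rack unit")
print(f"{ram_per_chassis_gb} GB of RAM per chassis")
print(f"{node_bandwidth_gbps} Gb/s of node-facing bandwidth per chassis")
print(f"{chassis_per_rack} chassis and {nodes_per_rack} servers per rack")
```

Fifteen servers per rack unit compares with one per unit for a typical pizza box server, which is why the integrated switching and uplinks matter so much at this density.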
I recorded two interviews at the Australian VMUG UserCon in Melbourne, back in March. It is way overdue for me to post these. The first was with Josh Atwell who has been a good friend of mine since the start of the vBrownBag days. I asked Josh about what it was like for him to meet customers in Australia and whether they had different things to say and ask.
I think I asked a way too serious question. Josh is hilarious to hang around with. At the 2016 San Diego UserCon he didn’t have the slide deck he wanted to present. So he taught us about Bourbon and drew quite a crowd.
April is already over, that is a bit hard to believe yet I know why it has passed so fast. It was a very busy month. I spent the first week in Houston with HPE. We ran the first vBrownBag Build Day with their HC380 hyperconverged platform. We showed you the end user customer experience of deploying the HC380 and migrating an existing workload onto the platform.
On TechTarget, I had a huge amount published. I wrote about the need for operations teams to understand container technologies. I also wrote a procedural article about deploying your first vSphere Integrated Containers environment.
I also shared some thoughts on what the AWS S3 issues mean for DR products that use cloud services, and considered how using DRaaS may have unexpected costs if you haven’t thought through some consequences of this model.
I also looked some more at policy-based management, which I think will be a standard practice in a few years.
The Buyer’s Guide to VDI Management and Monitoring is being published, articles on what features to expect and how to evaluate products. I also wrote about the complexity of upgrading a VDI environment.
A new and fun format was a quick guide to setting up a basic lab for learning DevOps tools and methods. I will be interested to see how often my GitHub repo gets cloned by people following along.
Over on TVP, I wrote about the different way that hyperscalers operate compared to Enterprise IT. I also expanded on my thoughts about serverless on-premises, it really is only one aspect of developer enablement and not sufficient by itself. Another thought that I have had for a while is that the biggest benefits of Hyperconverged aren’t really from clustering local storage inside hypervisor hosts. The real benefits of HCI can often be had without using an HCI product. This article is about the physical aspects, the next one will be about the policy-based management that I am so keen on.