For every data center full of servers, there are dozens or even hundreds of remote or branch offices. These locations are where businesses actually sell their products and make money. Delivering IT to these ROBO locations is a challenge, in part because there are so many of them that controlling cost is crucial. While we might hope that all our business processes are run out of cloud applications, the reality is that many of these ROBO locations need to have their own servers. One retail branch I visited ran around 20 VMs, enough to need a virtualization platform but not enough for a SAN and multiple ESXi servers. Since these locations are where the money is made, they are also where the data is generated. Protecting that data in the remote or branch office is what Rubrik Edge is all about.
Disclosure: This post is part of some paid work I am doing with Rubrik. Being smart people, they did not select the topic or message in this post. These are my thoughts and experiences, not the Rubrik party line. If you would like to hear more of my thoughts and experiences about Rubrik and ROBO data protection then join me for this Rubrik webinar on November 14th.
My home lab is about the same scale as a small ROBO location. There is a single ESXi host with local storage running half a dozen VMs. I have a domain controller that is also my file server and a full VMware View deployment with an RDSH host and a couple of desktop VMs. While I call it a lab, it is central to my business as I keep all my records on the file server and use a Windows desktop to run my accounting software. I need a good backup of the VMs on this one ESXi server, which is where Rubrik Edge comes into my data shed.
Rubrik is known for a hyperconverged secondary storage product that uses the same hardware appliance approach as the HCI vendors. Four nodes fit into a 2U Brik, and multiple Briks can be clustered to create a large secondary storage solution. You can have a cluster at each data center and have the clusters replicate backups to each other. Brian Suhr has a good post about the hardware and another that walks through the install process. Both posts are detailed but quite old. Anthony Spiteri has a more recent walkthrough post. Not every location needs a cluster of Briks, and there are some situations where even a single physical backup node is too expensive. I have been spending some time with Rubrik Edge, the virtual appliance version of Rubrik’s product. The idea is that you can have a virtual Rubrik at your smaller sites and have backups replicated back to your central cluster, or to the public cloud.
I added a 2TB SATA SSD to my ESXi server and deployed the Rubrik virtual appliance onto that datastore. Using a separate SSD for the Edge appliance means that if the SSDs with my production VMs die, I still have a local backup. The appliance has a 64GB boot disk and a 1-5TB data disk, so the minimum footprint on a datastore is 1.07TB. I would have liked a minimum VM size that fits on a 1TB SSD without a low free space warning; a 700GB data disk would do the trick. The other trap I found is that the appliance has 2 vCPUs and a 4GHz CPU reservation. My ESXi server has 1.9GHz CPUs, so two vCPUs can get at most 3.8GHz and the host cannot deliver 4GHz to the VM. Reducing the reservation to 3GHz allowed me to power on the VM. Once the appliance is deployed and booted, there is a simple wizard that completes the Rubrik deployment. On the current Rubrik Edge build, the wizard runs in text mode on the console of the appliance, not through a web page like the full Brik.
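The CPU reservation trap is simple arithmetic, and it is worth checking before you deploy to a small host. Here is a minimal sketch, assuming that a VM can never be given more CPU than its vCPU count multiplied by the clock speed of one physical core. The function names are my own for illustration and are not part of any Rubrik or VMware tool.

```python
def max_deliverable_ghz(vcpus: int, core_ghz: float) -> float:
    """Upper bound on CPU for a VM, assuming one physical core per vCPU."""
    return vcpus * core_ghz


def reservation_fits(reservation_ghz: float, vcpus: int, core_ghz: float) -> bool:
    """True if the host could ever satisfy the reservation for this VM."""
    return reservation_ghz <= max_deliverable_ghz(vcpus, core_ghz)


if __name__ == "__main__":
    # Figures from my lab: 2 vCPUs on a host with 1.9GHz cores.
    print(max_deliverable_ghz(2, 1.9))    # ceiling for the appliance VM
    print(reservation_fits(4.0, 2, 1.9))  # default 4GHz reservation
    print(reservation_fits(3.0, 2, 1.9))  # reduced 3GHz reservation
```

With my numbers the default reservation does not fit and the reduced one does, which matches what I saw when powering on the appliance.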
Rubrik backups are controlled by applying a policy to VMs, ideally with the same policy applied to groups of VMs. Policies configure the backup interval and the retention period, and a single policy can drive both local and remote (replicated) backups. It only took me a few minutes to assign backup policies to my VMs: the most critical VM to a Gold policy and the others to a Bronze policy. Only my file server VM has fast-changing data; the other infrastructure and desktop VMs change much more slowly.
That one critical VM, called “DC”, has had the Gold SLA applied for three weeks now. I can restore from four-hourly intervals for the last three days and from any day since backups began. Right now, that is around forty possible points in time for restores. In vSphere, the VM is listed as using 143GB of space, while Windows reports 109GB of disk usage. Currently, Rubrik reports that it is using 148GB of storage to protect this one VM.
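To see where a figure of around forty restore points comes from, here is a rough sketch of the count. It assumes a simplified retention model: one backup every few hours kept for a short window, then one backup per day kept for everything older. The function and its parameters are hypothetical, and the real Gold SLA settings may differ from this model.

```python
def restore_points(days_protected: int, frequent_days: int, hours_between: int) -> int:
    """Count restore points under a simplified two-tier retention model."""
    per_day = 24 // hours_between
    # Recent window: every interval is kept.
    frequent = min(days_protected, frequent_days) * per_day
    # Older than the recent window: one restore point per day.
    daily = max(days_protected - frequent_days, 0)
    return frequent + daily


if __name__ == "__main__":
    # Three weeks of protection, four-hourly backups kept for three days.
    print(restore_points(days_protected=21, frequent_days=3, hours_between=4))  # 36
```

Under these assumptions the count is 36, which is in the same range as the roughly forty points I see in the GUI.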
The other six VMs, with around 250GB of data, are only backed up once a day by the Bronze SLA. After three weeks, they are using 207GB of Rubrik capacity, which suggests that deduplication works better across multiple VMs. Overall, after three weeks the Edge appliance is using 355GB of its 1TB capacity.
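The capacity numbers are easier to compare as ratios of backup storage to source data. This is a small sketch using only the figures quoted above; the dictionary keys and helper name are my own labels, and I am treating the 1TB data disk as 1000GB for simplicity.

```python
# Backup capacity used on the Edge appliance after three weeks (GB).
used_gb = {"gold_dc": 148, "bronze_others": 207}
# Source data being protected (GB): vSphere size for DC, approximate for the rest.
source_gb = {"gold_dc": 143, "bronze_others": 250}


def backup_ratio(group: str) -> float:
    """Backup storage consumed per GB of source data for one group."""
    return used_gb[group] / source_gb[group]


total_used_gb = sum(used_gb.values())
fraction_of_capacity = total_used_gb / 1000  # assumed 1TB = 1000GB

if __name__ == "__main__":
    for group in used_gb:
        print(group, round(backup_ratio(group), 2))
    print(total_used_gb, fraction_of_capacity)
```

A ratio above 1 for the Gold VM reflects its many restore points and fast-changing data, while the Bronze group comes in below 1 even with three weeks of daily backups.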
Backups are great, but the whole point is to do a restore. The first kind of restore is when a stupid user deletes a file that they should not. The restore process is to find a suitable restore point and then locate the file in that restore point. The web GUI makes it simple to find the snapshot time and then browse for the file, provided the user knows where the file was and when they deleted it.
Luckily, the file I deleted was in my DC VM, which is protected every four hours, so I can browse the files in a recent restore point to find the text file I deleted. Then I can either download the file to my PC or restore it right back into the VM.
After confirming that I want the file in its original location and waiting a few seconds, there was the file back where it was supposed to be. File level restores from image-based backups are a great capability.
VM Instant Restore
The other kind of restore is where the whole VM is gone, like when I just deleted a VM called Other-Master by accident. Now I need my VM back fast, so I find the backup point and this time rather than clicking browse files I choose Instantly Recover.
Since I am recovering a deleted VM, I recover back to the same ESXi server and keep the network attached. The Rubrik appliance creates a new NFS datastore on my ESXi host, registers the VM, and powers it on.
In a few seconds, I have a running VM. It still lives on the Rubrik appliance’s storage but is running in my inventory. Now I can Storage vMotion the VM back to the production datastore when it suits me.
So, what have we seen today? A virtual appliance-based backup solution that is protecting the VMs in my branch office sized vSphere lab. Capacity efficient backups with different protection policies for VMs of different criticality. The ability to recover whole VMs or individual files in a few seconds.
The main limitation of my current implementation is that all my Rubrik backups are inside the same ESXi server where the VMs reside. If I lose that ESXi server, I have no recovery available. Join me in the second blog post (in a few weeks’ time) where I will set up replication of backups from one site to another and look at recovering all the VMs if I lose the primary site.
© 2017, Alastair. All rights reserved.