The above quote is attributed to both Peter Drucker and W. Edwards Deming, although it appears that neither actually said it. Even so, it is a driver for a lot of what operations teams do in IT: measure and record everything in case we need the information to manage something later. A whole swathe of management products exists to gather measurements into large piles of data and let you see what has happened. Most of these products will let you know when something happened, whether it’s a breach of a static threshold or a departure from “normal” status. This sort of massive data crunching is what computers are great for: a lot of data in and a little analytics out. Then a human must decide what to do with the analyzed data. These products are an open loop; they just measure and report.
Measuring It Isn’t Managing It
Another example of an open loop is a simple heater. It is winter here in New Zealand, so the mornings are cold. We have an old high-energy gas heater in our house. My first job when I get up in the morning is to turn on the heater to chase away the overnight chill. A little later the house is too hot and I need to turn the heater off. This cycle of too cold and too hot can go on all day until bedtime. My US friends will laugh; they have thermostats, which close the loop. A thermostat measures the room temperature and switches the heater on when the room is cool and off when it’s too warm. All day the thermostat does its job, and seldom does the human even notice it or need to take any action.
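To make the thermostat idea concrete, here is a minimal sketch in Python of a closed loop: measure, decide, act, repeat. The setpoint, the dead band around it, and the toy room model (`heat_gain_c`, `heat_loss_c`) are all made-up values for illustration, not a description of any real thermostat.

```python
def thermostat_step(temp_c, heater_on, setpoint_c=20.0, band_c=1.0):
    """Decide the heater state for one pass of the control loop."""
    if temp_c < setpoint_c - band_c:
        return True   # too cold: switch the heater on
    if temp_c > setpoint_c + band_c:
        return False  # too warm: switch the heater off
    return heater_on  # inside the band: leave the heater as it is


def simulate(start_temp_c, steps, heat_gain_c=0.5, heat_loss_c=0.3):
    """Run the loop against a toy room model and return the temperatures."""
    temp_c, heater_on, history = start_temp_c, False, []
    for _ in range(steps):
        # Measure and decide, then let the decision change the room.
        heater_on = thermostat_step(temp_c, heater_on)
        temp_c += heat_gain_c if heater_on else -heat_loss_c
        history.append(round(temp_c, 1))
    return history


if __name__ == "__main__":
    print(simulate(start_temp_c=15.0, steps=40))
```

The open-loop version is the same code with `thermostat_step` replaced by me walking over to the heater. The dead band is there so the heater does not flick on and off every time the temperature crosses the setpoint.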
Datacentre management products need to be closed loop. We need a thermostat for our whole datacentre, not just the temperature. Management products need to be empowered to take action to correct potential issues before the humans notice. We have a little of this in vSphere: DRS and Storage DRS (SDRS) will automatically take action to prevent problems. The problem is that DRS and SDRS each look after only two resource dimensions and can make changes in only one dimension. DRS looks at CPU and RAM utilization and can use vMotion. SDRS looks at storage latency and free space and can use Storage vMotion. Both are great at the limited range of things that they do, but neither provides comprehensive management.
Disclosure: All travel and incidental expenses for attending Virtualization Field Day events were paid for by Gestalt IT. This was the only compensation provided, and it did not influence the content of this post.
I’ve been learning about VMturbo for a while, first at Virtualization Field Day 3 and then at VFD5. Between these events, my buddy Eric went to work for VMturbo. The aim of VMturbo’s product is to have your whole datacentre manage itself to an optimal state. The clever thing that VMturbo realized at the start is that performance is about supply and demand, the same supply and demand that drives free market economics. Rather than build a whole new model with new math, VMturbo took the well-known math of the free market and applied it to the datacentre. Naturally this is all hidden from the user, so you don’t need a degree in economics to deploy VMturbo.
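As a rough illustration of the supply and demand idea, here is a small Python sketch in which each host "charges" more for a resource as it gets busier, and a VM is placed wherever it would pay the least. The price curve, the host and VM dictionaries, and the function names are all assumptions made up for this example; this is not VMturbo's actual model or API.

```python
def price(utilization):
    """Hypothetical price curve: cheap when idle, steep near saturation."""
    if not 0.0 <= utilization < 1.0:
        return float("inf")  # no spare supply at any price
    return 1.0 / (1.0 - utilization) ** 2


def cost_on_host(host, vm):
    """Price the VM would pay on this host, summed over CPU and RAM."""
    total = 0.0
    for resource in ("cpu", "ram"):
        # Utilization the host would have once the VM is running there.
        used = host["used"][resource] + vm[resource]
        total += price(used / host["capacity"][resource])
    return total


def cheapest_host(hosts, vm):
    """Pick the host name where the VM's total cost is lowest."""
    return min(hosts, key=lambda name: cost_on_host(hosts[name], vm))


if __name__ == "__main__":
    hosts = {
        "busy": {"capacity": {"cpu": 32, "ram": 256}, "used": {"cpu": 28, "ram": 128}},
        "quiet": {"capacity": {"cpu": 32, "ram": 256}, "used": {"cpu": 8, "ram": 64}},
    }
    vm = {"cpu": 2, "ram": 16}
    print(cheapest_host(hosts, vm))
```

The appeal of pricing is that adding another dimension, such as storage latency or network bandwidth, only means adding another resource to the sum rather than writing another special-purpose scheduler.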
VMturbo measures a lot of dimensions and can make changes in many dimensions. Along with the usual migration dimensions, there are resize dimensions, such as more vCPUs or less RAM for a VM. VMturbo will even take into account heavy network communication between two VMs and tend to place them on the same host.
I particularly like that VMturbo are going beyond hypervisor management. Their control software also integrates with other datacentre infrastructure. There is a storage control module that talks to a variety of different vendors’ arrays. This way the economic scheduling engine is aware of the capabilities and limitations of the array. There is also a network control module that gathers the same knowledge from the physical network devices. This is the part that identifies VMs which communicate a lot and drives putting them close together. Gathering data from many sources before making decisions is a good idea when it’s a human making the decisions. When computers are making decisions, it is crucial that all the affected resources are instrumented.
Virtualization management needs to extend well past the hypervisor. We really need virtual datacentre management. To achieve massive scale without massive cost, this datacentre management must close the loop. Management at scale must involve trusting your management tool to make decisions and act on them. Humans need to be setting policy and having the datacentre act. This is what software-defined should mean in a datacentre.
© 2015, Alastair. All rights reserved.
The Drucker quote is “What gets measured, gets managed.” Also sometimes attributed to Tom Peters as “What gets measured gets done.”
There’s a subtle difference. People often manage the wrong things, or manage them poorly, because the measurements that get set up are stupid or wrong. Perverse incentives, for example.