VVols, Oh VVols, wherefore art thou VVols?

Some technologies take over the (IT infrastructure) world, some sink without a trace, and others find a niche where they fit a requirement. I suspect VMware’s VVols (Virtual Volumes) fall into the last category. VVols was released as a feature of vSphere 6.0 in March 2015 and updated to version 2.0 with vSphere 6.5 in late 2016. The primary function of VVols is to give a storage array visibility into, and control of, the storage presented to individual VMs, rather than the usual VMFS arrangement where the array sees only a datastore and cannot tell which VMs are using it. There is less storage management in the ESXi hypervisor and more in your storage array. The assumption is that the storage array, or the storage team, is good at managing storage capabilities and can provide a better service to the VMs than the vSphere hypervisor’s native capabilities. For example, storage array-based snapshots rather than vSphere snapshots, or storage replication at the VM level rather than the datastore level. An interesting element is the ability to control performance per VM disk from the array, rather than layering per-datastore performance management from the array with per-disk performance management from the hypervisor. VVols is most often discussed for block storage (iSCSI, Fibre Channel, and NVMe-oF); with NFS-based storage, the NFS server already knows about the individual VM disks because they are just files on the NFS share. However, NFS-based arrays can also benefit from VVols to offload advanced storage features to the NFS array. Thanks, Ben, for the correction.

My thinking about VVols was inspired by the Pure Storage presentation at Tech Field Day Extra at VMworld US 2022. It had been quite a while since I had heard much talk about VVols, so I wondered whether it was still a thing. A quick Google search shows that storage vendors are still talking about VVols, and vSphere 7 brought some updates to VVols, so there must be customers using and benefiting from it; it hasn’t sunk without a trace. But why didn’t VVols take over the world? I suspect it is a combination of easily used features in vSphere and plentiful performance from all-flash arrays. After all, the infrastructure only needs to be good enough not to limit the applications that it hosts. If your applications don’t demand more performance and capability than vSphere, VMFS, and flash together can deliver, then the simplest solution provides the best benefit. The place where VVols has value is where VMFS limits the application. It might be that vSphere snapshots cause VM stuns, which affect application performance; these stuns are part of vSphere’s snapshot behavior and can be a big issue when snapshots happen frequently during working hours. It might be a highly critical application that requires very low and stable disk latency, such as real-time commodity trading. These are use cases where the application requirements demand specific storage capabilities; usually, they are business-critical applications at the core of the business.

Do you need VVols in your vSphere environment? Ask yourself a simple question: does VMFS simplify or complicate your storage design? If VMFS simplifies, you probably spend little time thinking about storage for your VMs and have at least half a dozen VMs on every datastore. If VMFS complicates your storage design, you probably have datastores dedicated to specific VMs or applications and spend significant time tuning datastore and LUN configurations. If VMFS simplifies your storage, keep using VMFS and don’t spend too much time on storage. If VMFS complicates, then look closely at VVols; it will probably be easier to build complicated storage configurations with VVols than with VMFS.


Diversity? How about neurodiversity?

I’ve always used this blog to document and share what I have learned. Lately, I have been learning about myself and my neurodiverse mind, so I plan to share some of this new learning. First off, I am not a psychiatrist, I have not seen any professional, and I have not had any diagnosis; I am going to talk entirely about my own experience. If some of this resonates with you, maybe spend some time researching neurodiversity, because understanding how your brain works can unlock the super-power part of not being typical. Being neurodiverse hasn’t stood in my way. I have been married to my (first) wife for nearly 30 years, and we have two adult children who live independently but still communicate with us and hug us when we see them. I have a fun and successful career where I go to amazing places and do amazing things with awesome people.


Multi-Cloud Mobility – MinIO

Cloud-native applications usually use cloud-native storage, typically a combination of databases and object storage. Some of the agility of cloud-native application development comes from separating the persistence (storage) from the compute. Applications can be rapidly developed using DevOps methodologies while the valuable persistent data remains in the storage services. But what about the portability of those storage services? If you are not all-in with one cloud, you might want a persistence layer that you can use across clouds and on-premises. This is the challenge with multi-cloud: each cloud has its own standard services, and there is little interoperability between those standard services because each cloud provider wants to host all of your IT.

MinIO can help you with multi-cloud object storage, providing S3-compatible storage anywhere you run a Kubernetes cluster, on-premises or on almost any public cloud platform. The S3 compatibility is not simply about the API used to access objects; MinIO has S3 features such as lifecycle policies, versioning, object lock, and replication. MinIO can replicate buckets and objects from cloud-based MinIO buckets to on-premises or other cloud locations. Object storage tends to suit asynchronous replication; most of the time, objects are written once and read many times (WORM), although MinIO offers synchronous replication for other use cases. All your MinIO clusters are managed through a centralized console and API.
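
To illustrate what S3 compatibility means in practice, here is a minimal sketch using Python’s boto3 library pointed at a MinIO endpoint; the endpoint URL, credentials, and bucket name are placeholders for illustration, not a real deployment.

```python
import boto3

# Point a standard S3 client at a MinIO endpoint instead of AWS.
# The endpoint and credentials below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.internal:9000",
    aws_access_key_id="EXAMPLE_ACCESS_KEY",
    aws_secret_access_key="EXAMPLE_SECRET_KEY",
)

s3.create_bucket(Bucket="reports")

# Enable versioning, one of the S3 features MinIO implements.
s3.put_bucket_versioning(
    Bucket="reports",
    VersioningConfiguration={"Status": "Enabled"},
)

# Write and read an object exactly as you would against AWS S3.
s3.put_object(Bucket="reports", Key="2022/summary.txt", Body=b"quarterly numbers")
print(s3.get_object(Bucket="reports", Key="2022/summary.txt")["Body"].read())
```

The point of the sketch is that the same client code works whether the endpoint is AWS S3, an on-premises MinIO cluster, or MinIO running on another cloud.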

MinIO does not provide object storage with the lowest cost per GB. The focus is on performance and solving data management problems for large consumers of object storage. I saw MinIO present at Tech Field Day 25 and earlier at Cloud Field Day 11. At last year’s Cloud Field Day, MinIO also talked about having an interface for RocksDB, which multiple types of database engines can use. Using the same underlying platform for both unstructured data (S3) and structured data (RocksDB) might allow a unified persistence tier to enable multi-cloud deployment of cloud-native applications.


AWS Principles: Use caching

The design principle to use caching is not simply an AWS principle; it is a common application design principle. A cache is a temporary storage location for a small amount of data that improves application performance. Sometimes the cache is distributed around the world to be close to users, in which case it might be called a content delivery network. Other times the cache is simply extra memory in your web or application servers that holds some status information about currently active users. The idea that a cache is temporary is important: it is not the persistent storage location for the data, and if the cache gets lost, the data in the cache can be re-created from a persistent location. The idea that the cache is not the definitive source is also important: data in the cache represents a copy of the persistent data at some point in the past. Some data can stay in the cache for a long time; the temperature recorded at 10 am yesterday will never change. The current temperature will change, so it shouldn’t be kept in the cache for long; it should have a short Time To Live (TTL).
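
As a minimal sketch of the TTL idea, here is a tiny in-process cache in Python; the weather-lookup function and the chosen TTL values are invented for illustration.

```python
import time

class TTLCache:
    """A tiny in-process cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (expiry_time, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None or entry[0] < time.time():
            return None  # missing or expired; caller must fetch from the source
        return entry[1]

    def put(self, key, value):
        self.entries[key] = (time.time() + self.ttl, value)


# Historical readings never change, so they can live in the cache for hours.
history_cache = TTLCache(ttl_seconds=24 * 3600)
# The current temperature changes, so give it a short TTL.
current_cache = TTLCache(ttl_seconds=60)

def current_temperature(city, fetch_from_sensor):
    cached = current_cache.get(city)
    if cached is not None:
        return cached                 # served from the cache
    value = fetch_from_sensor(city)   # the persistent, authoritative source
    current_cache.put(city, value)
    return value
```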

There are a few trigger points for considering adding a cache, usually centered around needing more application performance for transactional (rather than analytics) workloads. If increasing the performance of the database or other persistence tier seems inefficient, or you feel that you are not getting consistent value for money, then caching might be a good option. This can also be a trigger for considering a different database for a subset of the data, which I mentioned in a previous principle. As an infrastructure person, I am used to providing transparent caches, where the application code is unaware of the cache. But software developers often use explicit caches, where the application code makes choices about what data to place in the cache and when to update or remove cached data. On AWS, the ElastiCache service provides RAM-based caching that developers can choose to use within their applications. Because it is an explicit cache, the application developer chooses what data to cache and whether to write to the cache on database updates or only on reads. There is a lot of developer effort required to get the most out of ElastiCache, but the potential performance improvement is huge.
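
Here is a hedged sketch of the explicit cache-aside pattern an application might use with ElastiCache for Redis, written with the Python redis client; the endpoint, key names, TTL, and database functions are placeholders, not a prescribed implementation.

```python
import json
import redis

# Placeholder endpoint for an ElastiCache for Redis cluster.
cache = redis.Redis(host="my-cache.example.internal", port=6379)

def get_user_profile(user_id, query_database):
    """Cache-aside: check the cache first, fall back to the database on a miss."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)               # cache hit

    profile = query_database(user_id)           # cache miss: read the source of truth
    cache.set(key, json.dumps(profile), ex=300) # keep a copy for five minutes
    return profile

def update_user_profile(user_id, profile, write_database):
    """On update, write the database and invalidate the cached copy."""
    write_database(user_id, profile)
    cache.delete(f"profile:{user_id}")
```

Whether the update path invalidates the cached copy (as above) or writes the new value straight into the cache is exactly the kind of choice the application developer makes with an explicit cache.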

Caching is an important tool for improving application performance everywhere from the end access device (a user’s laptop or phone), through the application servers, to the persistent storage at the back. Efficient use of caching does require good design, and the more application awareness you bring to that design, the more efficiently you can use the expensive cache. Allocating excess RAM to application servers is a simple but inefficient way to provide caching, particularly for applications that you cannot get rewritten.


Faster Reaction Time with Hazelcast

Are databases too slow for your application? I don’t mean, is your database slow and do queries take minutes to complete. I mean, is an optimized database still too slow for the rate at which things happen? That brings you into the stream processing world, where data arrives very fast, and you need to make decisions quickly and act on those decisions immediately. One example is credit card processing, where instant fraud identification can prevent transactions from being approved. Another is real-time cyber-threat analytics, where every request to a website or application is validated before acceptance. In both cases, there are a massive number of transactions to monitor and complex scoring that is required within the allowed latency for the transaction. This is the space where Hazelcast plays, unifying large amounts of slower-changing data with fast-arriving streamed data. The slower-changing data might be machine learning models and reference data, which are then used to evaluate the faster-arriving data stream. This is not an infrastructure feature; it is an application platform service. To use Hazelcast, your application is developed using the Hazelcast SDK. There will be fast infrastructure to support your Hazelcast application: fast networks and powerful servers. The architecture is a grid or cluster: several servers working together in a distributed architecture to provide a memory-first database and stream engine.
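
As a hedged sketch of what building on the Hazelcast SDK can look like, here is the Hazelcast Python client loading reference data into a distributed map and using it to score incoming events; the cluster address, map name, and scoring rule are placeholders I have invented for illustration.

```python
import hazelcast

# The cluster address is a placeholder; in production this would be your Hazelcast grid.
client = hazelcast.HazelcastClient(cluster_members=["10.0.0.10:5701"])

# Slower-changing reference data lives in a distributed, memory-first map.
risk_scores = client.get_map("merchant-risk-scores").blocking()
risk_scores.put("merchant-123", 0.82)

def score_transaction(event):
    """Combine a fast-arriving event with in-memory reference data to make a decision."""
    base_risk = risk_scores.get(event["merchant_id"]) or 0.0
    # Placeholder scoring rule: block large transactions from risky merchants.
    return "reject" if base_risk > 0.8 and event["amount"] > 1000 else "approve"

print(score_transaction({"merchant_id": "merchant-123", "amount": 2500}))
client.shutdown()
```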

I heard from Hazelcast at Cloud Field Day 11; they have presented at several Tech Field Day events. My usual Tech Field Day disclaimer applies. If you have a big problem, and if a standard database just isn’t fast enough, maybe Hazelcast can handle your data rate.


AWS Principles: Optimize for cost

This AWS design principle is based on the financial reality of using cloud services. The magic of AWS is that you can use as much or as little resource as you want and only pay for what you use. The tragedy of AWS is that every month you get charged for (more or less) every piece of resource that you use. Optimizing for cost is not about minimizing the amount you spend. Closing your AWS account will reduce the bill, but at what impact on the business? The objective is to get as much business value as possible and only pay for things that deliver business value.


AWS Principles: Understand Your Single Points of Failure

My favourite quote from Werner Vogels is, “Everything fails, all the time.” One of the AWS design principles is to understand where things fail and prevent a failure from causing your application to stop doing its job. The guidance from AWS is to avoid Single Points of Failure (SPOF). I don’t believe you can eliminate every SPOF, so you should understand and accept your remaining SPOFs. This principle is related to the previous principles of designing services, automating, and using disposable resources. It adds awareness of the reality that every AWS service has a scope and may fail at that scope. EC2 is scoped at the Availability Zone (AZ), and a single EC2 instance is susceptible to failure within its AZ. We use autoscaling groups and elastic load balancing to remove the AZ as a SPOF, and now the regional services are our SPOF. While it is unusual for a regionally scoped AWS service to fail, they can and have failed in the past. To eliminate a region as a SPOF, you use a global service like Route53 to distribute application access across multiple regions, with load balancers and autoscaling groups in each region.
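
As a hedged illustration of removing a region as a SPOF, here is a boto3 sketch that creates Route 53 failover records pointing at load balancers in two regions; the hosted zone ID, domain name, health check ID, and load balancer DNS names are placeholders, not a definitive design.

```python
import boto3

route53 = boto3.client("route53")

# Placeholders: an existing hosted zone, a health check on the primary region's
# endpoint, and load balancers already deployed in each region.
HOSTED_ZONE_ID = "Z0000000000EXAMPLE"
PRIMARY_HEALTH_CHECK_ID = "00000000-0000-0000-0000-000000000000"

def failover_record(region_label, failover_role, elb_dns, health_check_id=None):
    record = {
        "Name": "app.example.com",
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": region_label,
        "Failover": failover_role,          # "PRIMARY" or "SECONDARY"
        "ResourceRecords": [{"Value": elb_dns}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}

# If the primary region's health check fails, Route 53 answers with the standby region.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [
            failover_record("us-east-1", "PRIMARY",
                            "primary-elb.us-east-1.elb.amazonaws.com",
                            PRIMARY_HEALTH_CHECK_ID),
            failover_record("us-west-2", "SECONDARY",
                            "standby-elb.us-west-2.elb.amazonaws.com"),
        ]
    },
)
```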

The problem is that each time we eliminate a SPOF, we at least double our cost and complexity. The additional cost and complexity are precisely why we may choose to leave a SPOF in place; eliminating the SPOF may be more expensive than the cost of an outage caused by the SPOF. It may also be that the nature of the business is its own SPOF; a company that operates in one city may not suit failover to another AWS region. For each SPOF, you will need to weigh the cost of elimination against the risk of the failure. Everything fails, all the time. Ensure you know what single points of failure might cause your application to die, and that the business (not IT) accepts the business risk of the possible outage.


AWS Principles: Choose the Right Database Solutions

Some of the AWS design principles highlight that AWS has many services to fulfill many different needs. The guidance to choose the right database solutions is not to say that you must standardize on one database for your application, quite the opposite. In a previous life, with on-premises enterprise IT, I was told that the database platform for critical production was Oracle; for non-critical, you could choose to use Microsoft SQL Server. There were only two database platforms (both relational), no matter what technical requirements came from your application. On AWS, it is easy to choose a suitable database for each set of data that your application requires. There are at least seven different database services on AWS, relational or not, transactional or analytical, so there are plenty of options. There are even options that are specialized for recommendations or transaction immutability. Many of these databases are serverless, so you only pay for what you use rather than paying hourly charges for performance capacity that you may not be using. When the database is delivered as a service, there is a far lower cost to add a different database type to your application; on-premises, you would need a team to support the new platform, which might take months and cost thousands. Database as a service allows application teams to choose the right database platform for their requirements and to have multiple different database platforms within one application.

Before choosing a database solution, you need to understand your data structure and quantity and what you will do with that data. A few dozen gigabytes of data that you will use for ad-hoc monthly reporting (SQL, probably RDS) is a very different proposition to storing user profiles (DynamoDB) and high scores (ElastiCache with Redis) for millions of online gamers. The online game needs both a scale-out SSD-based JSON database for profiles and a RAM-based database for high scores; the application stores different information about the same people in different databases. Without this choice of databases, it is common to bend one database to multiple separate uses and find that it does a poor job. AWS makes it simpler to use the correct database type for the different data that your application requires. Choose the right database solutions.
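
To make the gaming example concrete, here is a hedged sketch using boto3 for the DynamoDB profile store and the Python redis client for the high-score leaderboard; the table name, key schema, endpoint, and field names are placeholders I have chosen for illustration.

```python
import boto3
import redis

# Placeholders: an existing DynamoDB table keyed on user_id, and an
# ElastiCache for Redis endpoint used for the leaderboard.
profiles = boto3.resource("dynamodb").Table("player-profiles")
leaderboard = redis.Redis(host="scores.example.internal", port=6379)

def save_profile(user_id, display_name, country):
    # Document-style profile data suits a scale-out key-value/JSON store.
    profiles.put_item(Item={
        "user_id": user_id,
        "display_name": display_name,
        "country": country,
    })

def record_score(user_id, score):
    # A RAM-based sorted set makes "top N" queries cheap for millions of players.
    leaderboard.zadd("high-scores", {user_id: score})

def top_players(n=10):
    return leaderboard.zrevrange("high-scores", 0, n - 1, withscores=True)
```

The same player appears in both stores, but each store only holds the data it is good at serving.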


AWS Principles: Design Services, not Servers

Most of the AWS design principles are about working with the unique features and limitations of the AWS platform. With on-premises enterprise infrastructure, applications can assume that the infrastructure is perfect and will handle failures without the application knowing. The result is that a single server delivering an application is an acceptable solution; features such as vMotion and vSphere HA will ensure the application remains operational. On AWS, applications must expect the infrastructure to fail and must continue to deliver service when there is a failure. On AWS, there is no equivalent to vMotion or vSphere HA; your application architecture must ensure service availability. It is uncommon, but not unknown, for the EC2 service to fail for an entire AZ or to have network or storage issues that affect some or all of an AZ. If you have a single EC2 instance as a server, any of these outages means your application is offline. The best practice is to have your application spread across multiple AZs and abstracted by a multi-AZ (regional) service.
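
As a hedged sketch of delivering a service rather than a server, here is boto3 creating an Auto Scaling group that spreads instances across subnets in two AZs behind a load balancer target group; the launch template ID, subnet IDs, and target group ARN are placeholders, not a definitive configuration.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Placeholders: a launch template describing the instance, two subnets in
# different AZs, and the target group of an existing load balancer.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-tier",
    LaunchTemplate={"LaunchTemplateId": "lt-0123456789abcdef0", "Version": "$Latest"},
    MinSize=2,                      # at least one instance per AZ
    MaxSize=6,
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-aaa111,subnet-bbb222",  # subnets in two AZs
    TargetGroupARNs=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc123"
    ],
    HealthCheckType="ELB",          # replace instances the load balancer marks unhealthy
    HealthCheckGracePeriod=120,
)
```

Clients then reach the load balancer, not any one instance, so losing an instance or an AZ does not take the application offline.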


Build Day TV with VMware SD-WAN

As I’m sure you know, VMware has been making a big move into networking over the last few years. The acquisition of VeloCloud in 2017 added WAN capabilities to the data center networking of NSX, which came from the Nicira acquisition in 2012. I learned a lot about the newly renamed VMware SD-WAN solution when we did a Build Day TV series last year. I remembered from the original news that there is custom on-premises hardware (the Edge device) and a cloud-based management platform (the Orchestrator). The element I was not aware of is the forwarding plane (the Gateway), which can be a shared-service cloud platform operated by VMware or enabled on a high-spec Edge device, and can be augmented with distributed peer-to-peer connections amongst Edge devices. As you probably know, I like policy-based management, and VMware SD-WAN is all about policies that are applied to groups of Edge devices while still allowing overrides and location-specific configuration for each device. There are a few more advanced use cases covered too: using an AWS EC2 instance as an Edge to provide SD-WAN into your VPC, and using cloud or on-Edge-device network security services.

Here’s the list of Build Day TV videos where Rohan Naggi explains the solution and implementation to Jeffrey and me.

The Beginner’s Guide to VMware SD-WAN

Unbox and Set Up VMware SD-WAN Locations

Cloud VPN and Routing of Your VMware SD-WAN

VMware SD-WAN Application Performance

Intrinsic Security with VMware SD-WAN
