AWS Surprises – One Datacentre Is Not Enough

Most on-premises IT infrastructure designs treat a datacentre as a highly available platform; having an entire datacentre off-line is a disaster. It is a bit of a surprise then that AWS recommends we treat a datacentre as a failure domain and plan to keep our applications operational even if a datacentre fails. AWS doesn’t actually expose individual datacentres in its services; it presents Availability Zones. An Availability Zone (AZ) is the smallest area we can usually select for running applications on AWS and is made up of one or more datacentres that are very close together. As far as customers are concerned, we treat an AZ like one datacentre. The EC2 service and its EBS storage are scoped to the AZ; an EC2 instance in one AZ cannot be powered on in another AZ. AWS recommends that we have multiple EC2 instances spread across multiple AZs for high availability because an AZ or an AZ-scoped service can fail. If you take a look at the AWS Post Event Summaries page, you will see events where specific services were unavailable; usually the EC2 or EBS events impacted only a single AZ.

Multi-AZ is a standard design practice for production applications on AWS; DR is usually considered only for failover from one region to another. Failover between AZs is part of the application design, usually with scale-out EC2 for compute and a regionally scoped decoupling service such as a load balancer or queue. The regionally scoped service continues to operate even when one AZ fails, allowing the surviving EC2 instances to keep delivering application services. The ability to scale out to provide HA is a part of the application design, rather than a feature of the infrastructure.
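As a rough illustration of that failover behaviour, here is a minimal sketch of a regional load balancer that round-robins requests across whichever instances sit in a healthy AZ. It is a toy model, not the AWS API: the instance IDs, AZ names, and the `route` function are all made up for the example.

```python
from itertools import cycle

# Hypothetical instance IDs and AZ names, for illustration only.
INSTANCES = {
    "i-a1": "az-a",
    "i-a2": "az-a",
    "i-b1": "az-b",
    "i-c1": "az-c",
}


def healthy_targets(instances, failed_azs):
    """Return the instances whose AZ has not failed, in a stable order."""
    return sorted(i for i, az in instances.items() if az not in failed_azs)


def route(requests, instances, failed_azs=frozenset()):
    """Round-robin requests across the instances in surviving AZs."""
    targets = healthy_targets(instances, failed_azs)
    if not targets:
        raise RuntimeError("no healthy instances in any AZ")
    rr = cycle(targets)
    return [(req, next(rr)) for req in requests]


# All AZs healthy: requests are spread across every instance.
print(route(range(4), INSTANCES))
# az-a has failed: the same requests are served by az-b and az-c only.
print(route(range(4), INSTANCES, failed_azs={"az-a"}))
```

The application keeps serving because the routing decision lives outside any single AZ, which is the point of using a regionally scoped service in front of the EC2 instances.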

The equivalent design practice on-premises is a highly redundant virtualization platform in a single datacentre; DR is used to recover to another datacentre. All of the redundancy and availability of the virtualization layer is invisible to the application, which is often unaware even of a DR failover, other than as an outage before regular service is restored. There are on-premises designs that have storage and hypervisor clusters that span multiple datacentres, with a scope equivalent to AWS multi-AZ. These Metro-Cluster solutions are usually very expensive and used only for highly critical applications. Metro-Cluster places all of the failover awareness and functionality in the infrastructure; applications are generally still unaware of the failover.

On AWS, a single datacentre is not enough for any production application deployment. Deploying highly available applications on AWS requires that the application be designed with the awareness of the AWS infrastructure. Cloud-native applications are designed with an awareness of the limitations of cloud-native infrastructure. Enterprise applications deployed on enterprise infrastructure expect perfect reliability from the infrastructure. Take a moment to look back at the Post Event Summaries page, think about the number of datacentres AWS operates (currently 76 AZs), and then think about whether your on-premises datacentres experience fewer outages than AWS.

Posted in General | Comments Off on AWS Surprises – One Datacentre Is Not Enough

AWS Surprises – Choose a configuration from a menu

On the surface, there is no surprise here: AWS offers a list of services, and you order what you want from the list. But the devil is always in the detail, or the operational consequence. The actual AWS surprise came when I first played with EC2 instances and looked at changing the configuration of an existing EC2 instance. One does not simply add 4GB of RAM to an instance. The sizes of EC2 instances are fixed by AWS; you choose a size option from the list. For each EC2 instance family, there is a fixed ratio between the number of CPU cores and the amount of RAM. Within the family, there are fixed sizes; most often, the next size up has exactly twice as much of each resource. To get more RAM in an existing EC2 instance, you either double the size of the instance or choose a size from a whole new instance family. The M5 family has 4GB per core, so an M5.Large has two cores and 8GB, while an M5.24XLarge has 96 cores and 384GB of RAM. From the M5.Large ($0.12 per hour in Sydney), the next size up is the M5.XLarge. With four cores and 16GB of RAM, it is exactly twice the size of an M5.Large and, at $0.24 per hour, twice the cost. That is a large increase in price if my application only wants 4GB more RAM. I am probably better off changing to an R5.Large, which has two cores and 16GB of RAM and will only cost me $0.15 per hour in Sydney. The R5 series is more RAM heavy; the R5.24XLarge has 96 cores and 768GB of RAM. It is not just the CPU and RAM that are fixed per instance; the available network bandwidth also depends on the size and family of the instance. Ephemeral local storage, called Instance Store, is also fixed per instance size, and most instance families don’t even have Instance Store.
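To make the menu trade-off concrete, here is a small sketch that picks the cheapest instance type meeting a CPU and RAM requirement. The three entries and their hourly prices are just the figures quoted above for Sydney and will certainly have changed; the `InstanceType` class and `cheapest_fit` function are my own illustration, not anything AWS provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    ram_gb: int
    usd_per_hour: float


# Sizes and prices as quoted in this post (Sydney); illustrative only.
MENU = [
    InstanceType("m5.large", 2, 8, 0.12),
    InstanceType("m5.xlarge", 4, 16, 0.24),
    InstanceType("r5.large", 2, 16, 0.15),
]


def cheapest_fit(menu, min_vcpus, min_ram_gb):
    """Return the cheapest instance type meeting both minimums, or None."""
    candidates = [
        i for i in menu if i.vcpus >= min_vcpus and i.ram_gb >= min_ram_gb
    ]
    return min(candidates, key=lambda i: i.usd_per_hour, default=None)


# The application needs 2 cores and 12GB: 4GB more than an m5.large offers.
choice = cheapest_fit(MENU, min_vcpus=2, min_ram_gb=12)
print(choice.name, choice.usd_per_hour)
```

With these numbers the search lands on the R5 option rather than the doubled M5, which is the same conclusion as the paragraph above: compare across families before simply going up a size.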

While there are a few dozen instance families and a few hundred possible combinations of family and size, for any given application, there will only be a small selection that is suitable. Choosing the wrong compromise of instance resources and cost will seriously affect the viability of your application on AWS. Make sure you don’t simply consider doubling the size of an EC2 instance; choosing another instance family might be a better option. Just remember that you cannot change the resources separately; you can only select an EC2 configuration from the menu.


Zoom Mute/Unmute Using Stream Deck

Zoom is the “new” way we are all doing meetings. Whether it is last week attending Cloud Field Day presentations or this week teaching AWS training, Zoom is the constant for meetings. With Zoom, microphone control is essential. You want to be able to interact immediately but don’t want to interrupt when you need to cough or swear because you spilled your drink on the desk. The result is that we enter Zoom meetings muted and only unmute when we have something to say. The simplest way is to hold down the space bar; if Zoom has focus, then your mic is unmuted while you hold down the space bar. The obvious problem comes when you need to use another application while you are in Zoom: you hold down the space bar and stay muted, as happened to me at least once last week.



Hardware Offloads, Not Everything Is x86

While software is busy eating the world, we do still need hardware to run that software. One of the things that we are learning is that an x86 processor is not always the best way to solve every computing problem. The most obvious demonstration is the absence of x86-based smartphones; there have been a couple of attempts but nothing successful. Of course, mobile is a very different use case to the data center, and most data centers are full of x86-based servers. What we are seeing is that the x86 CPU in these servers is being supplemented by increasing numbers of specialized processors that handle functions that are better suited to different processor architectures. The first was network cards: NICs that could offload a lot of the computing functions for handling Ethernet and TCP packets. Rather than tying up an x86 CPU core for every 1-2Gbps of network throughput, a powerful NIC managing the network allows 10Gb and even 100Gb Ethernet to be utilized without saturating the main CPU. We have also seen GPUs being added to servers for some workloads, in particular workloads that suit parallel compute with moderate amounts of data. Another type of offload is computational storage from NGD Systems, which uses additional ARM cores inside SSDs to process data inside the SSD. Computational storage offload seems to be the reverse of GPU offload: huge data but not as much compute demand, although, with a lot of NGD SSDs, those ARM cores do add up. We have also seen more consolidated offload for virtualization, with Amazon’s Nitro architecture offloading network, storage, and server management into a custom add-in card. What is clear is that general-purpose CPUs are not the right solution for every computational task.
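A quick back-of-the-envelope calculation shows why the NIC offload matters. This sketch takes the rule of thumb above, one x86 core per 1-2 Gbps of throughput, at face value; real numbers depend heavily on packet sizes and the network stack, so treat the output as an order-of-magnitude guide only.

```python
import math


def cores_needed(link_gbps, gbps_per_core):
    """CPU cores consumed by packet handling at full line rate, no offload."""
    return math.ceil(link_gbps / gbps_per_core)


# Assumed range from the rule of thumb: each core handles 1 to 2 Gbps.
for link_gbps in (10, 100):
    best = cores_needed(link_gbps, gbps_per_core=2)
    worst = cores_needed(link_gbps, gbps_per_core=1)
    print(f"{link_gbps}Gb Ethernet: {best} to {worst} cores without offload")
```

Under that assumption, a saturated 100Gb link would need more cores than many servers have, which is why the work moves to the NIC.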

The AWS Nitro card appears to have a cousin in the Pensando Distributed Services Card, which seems to be the hardware magic that delivers Pensando software-defined services. The Pensando web site talks a lot about software-defined edge services. I believe the edge that they mean is a telco point of presence, what used to be a telephone exchange but is now really a datacenter close to the telco’s subscribers. It does appear that the target customer is a cloud provider or telco that delivers cloud-like services, with lots of networking and security. The front page of Pensando’s web site suggested to me that this might be a platform for building business applications, but it appears to be more for building network applications. Next week I will hear more detail from Pensando at Cloud Field Day 7; join me for the live stream, or catch up with the videos afterward.


AWS Surprises – AWS is a developer enablement platform

This AWS surprise took a while for me to understand, and I don’t think I’m alone in not catching on at first. Some of my early thinking about the public cloud was that it was merely a replacement for on-premises virtualization. The value proposition was that cloud providers could rack, stack, and operate servers more efficiently than any on-premises enterprise. We saw public cloud platforms offered by HP, Cisco, and even VMware through partners, which were centered on offering VMs as a service. With that in mind, let’s then step back and look at the first two services that AWS offered: SQS and S3. Neither service offers a VM; both are simply reusable components for building an application. The audience for these services is not IT infrastructure. The audience is developers. Also, think about how shadow IT on AWS developed. It was not infrastructure teams looking for places to run VMs; it was dev teams looking for resources to deliver their applications. The primary purpose of AWS is to enable developers to build applications that reside on AWS. Most of the AWS services are built to enable applications to be developed rapidly, with most of the effort spent on the features and functionality that are specific to the customer’s business. This is why Cisco and HP both got out of providing their own public cloud platforms: they only offered VMs to infrastructure teams and not application services to developers. Customers didn’t simply want a faster way to get VMs; they wanted a faster way to build applications that deliver business value. AWS remains a developer enablement platform.


Cloud Field Day 7 – Who will be presenting?

It’s April already, and that means I will be attending Cloud Field Day 7 this month. Usually, this means a trip to the US and a few non-stop days of jetlag and learning. Due to the travel bans currently in place, Cloud Field Day 7 will be an online-only event, so I won’t be jetlagged, just up early, although not as early as Justin Warren. There is an interesting list of presenters; as usual, there are some familiar faces and some new companies.

Covid-19 responses mean that some of these presenters will be at other Tech Field Day events, and the final lineup is still subject to change.

Aruba has been a frequent presenter at Tech Field Day events since 2012, mostly at Network and Wireless Field Days, as well as hosting Tech Field Day at their own conference AirHeads. It looks like their recent past presentations have been around modern wireless networks and SD-WAN. I imagine Aruba will talk about hybrid cloud connectivity, probably with an emphasis on policy-based management. Aruba has its own Tech Field Day event, Aruba Round Table with Tech Field Day.

Igneous presented at Tech Field Day 12, quite a few years ago. It looks like their focus has changed a little; they are now about data protection and management. I am interested in how they achieve petabyte-scale data protection and what tangible business value they deliver with DataDiscover, which is their data indexing/management product. Igneous will now present at Cloud Field Day 8.

Illumio has presented at both Network and Security Field Day events over the past few years. Illumio has a set of products that are about network microsegmentation and policy-based control of network traffic. I wonder whether we will see some of their special sauce applied to Kubernetes as a replacement for Envoy in a service mesh. I’m also interested in hearing about unified policy-based network security management in a multi-cloud environment and hope Illumio will have something to say.

Pensando is making their first Tech Field Day appearance, so I can only judge from their web site. The web site is full of buzzwords like edge and 5G. I was a bit concerned about the lack of any concrete information until I found that there is custom hardware in their software-defined services. It looks like part of the solution is a PCIe card that is similar to the AWS Nitro card, delivering hardware-accelerated storage and network functions virtualization. The web site talks about software-defined services at the edge. I want to hear about the operational deployment of this platform and the developer experience for creating applications that run on the platform.

SolarWinds are old friends for Tech Field Day, and for me. I’ve been to the SolarWinds HQ in Austin for at least two Tech Field Day events and always get good barbeque for lunch when we visit. With the breadth of management tools that SolarWinds has, it is hard to know what they will be showing us.

Stellus is new to Tech Field Day this year, having presented at Storage Field Day 19 in January. Their product is a ridiculously high-performance NAS. A cursory look at their web site shows that they separate the persistent storage layer from the throughput/data mover layer. Capacity runs to over 1PB while throughput is up to 80GBps, with the two numbers scaling independently. I will be interested to hear what the cloud angle is on a hardware product; maybe we will learn about a whole new product.

VMware is also an old friend of Tech Field Day, with 19 event badges on their page, including the very first Tech Field Day event. I want to hear more about Tanzu and the suite of products for multi-cloud application deployment and management.


AWS Surprises – No VMotion

In my on-premises VMware experience, VMotion was a game-changing technology, so I was very surprised to find that there is no equivalent for EC2 instances on AWS. The basic premise with VMotion is that it divorces a Virtual Machine (VM) from the underlying physical server. VMware’s vSphere goes even further, using VMotion to provide mobility within a cluster of physical hosts and abstract away the individual hosts into a cluster. On AWS, the VM service is EC2, and it offers no way to move an EC2 instance (VM) to another physical host without powering the instance off. The crucial architectural difference here is that vSphere wants us to stop thinking about individual physical servers, and AWS wants us to stop thinking about individual VMs. On-premises it is common to have a single VM that offers a service, e.g., this is the CRM server. The CRM server VM is critical and must remain operational at all times, so we want to be able to migrate the running VM to a new physical server. On AWS, we build services rather than individual servers; the service should remain operational even if one of its servers has a performance problem or an outage. Rather than one single server for CRM, we might have five EC2 instances and a load balancer to deliver the CRM application. If one instance is overloaded or fails, the load balancer uses the instances that are still operational. When we use EC2 autoscaling, the instances are created automatically and can even be destroyed and replaced automatically if they fail. A single EC2 instance is a disposable resource, so there is no need to migrate one between physical servers. This disposability of the compute resource is a common characteristic of cloud-native applications. If you are looking for VMotion on AWS, then you are probably building or bringing a legacy architecture to the public cloud. Aim to move away from on-premises style architecture as soon as possible in your public cloud journey.
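The disposable-instance idea can be sketched as a simple reconcile loop: throw away anything unhealthy and launch replacements until the fleet is back at its desired size. This is a toy model of what an autoscaling group does conceptually, not the EC2 Auto Scaling API; the `launch` and `reconcile` functions and the instance IDs are invented for the example.

```python
import itertools

_ids = itertools.count(1)


def launch():
    """Create a new, healthy instance record with a made-up ID."""
    return {"id": f"i-{next(_ids):04d}", "healthy": True}


def reconcile(fleet, desired):
    """Discard unhealthy instances and launch replacements up to `desired`."""
    survivors = [i for i in fleet if i["healthy"]]
    while len(survivors) < desired:
        survivors.append(launch())
    return survivors


fleet = reconcile([], desired=5)     # initial scale-out to five instances
fleet[2]["healthy"] = False          # one instance fails
fleet = reconcile(fleet, desired=5)  # the failed one is replaced, not moved
print([i["id"] for i in fleet])
```

Nothing in the loop tries to rescue the failed instance or move it to another host. The service is defined by the desired count, and individual instances come and go.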


AWS Surprises – Change is Constant

Being comfortable with change is an integral part of a career in IT. I like the saying that “if you don’t like change, you will have to accept being irrelevant.” The thing is that the rate of change varies a lot. VMware releases a new major product version every three to four years for each product. Between major version releases, there are usually minor version releases every year or two, and between those are point or update releases every few months. As a VMware trainer, I might teach a particular version of a course for a year or more. Similarly, customers would run a particular release, with a fixed feature set, for years.

On AWS, (surprise) there are no exposed versions. Products update and gain new features every week. In fact, AWS publishes a weekly newsletter of the new features and capabilities in the last week, and there are usually at least 20 items each week. For me, this means that courses are also changing constantly; about every two months there is a new release of Architecting on AWS with new slides. Every second week there is an update to the labs because of AWS console changes for one service or another. Because of these changes, the courses (and certifications) tend to focus on basic principles rather than details such as speeds and specifications.

With all of these new services, new features, and new capabilities, there is change all of the time. In fact, I like to paraphrase Werner Vogels with “Everything changes, all the time.” Another perspective might be the growth in AWS service numbers. When I first attended AWS training in about 2014, there were 42 services; as of early 2020, there are over 200 services.  There is no way for a single person to stay completely up to date on every AWS service, or even know every aspect of any significant service. A result is that on AWS, more than any other platform, knowing where to find the answer is more important than knowing the right answer since the answer often changes. There is a psychological shift in not expecting to know the details off the top of your head but expecting to look up the answer. AWS expertise is not about knowing every product fact and feature.  There is another angle, too; architectural design patterns change, so you should be slow to judge old architectures. As an example, before the Transit Gateway service, it was very painful to join a lot of VPCs together into a routed network using only VPC peering. As soon as Transit Gateway was released, it became the standard for new VPC connectivity. However, the older networks using peering and routers in EC2 instances did not disappear because they still work.
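To see why joining many VPCs with peering alone was painful, count the connections. VPC peering is not transitive, so a fully routed network needs a peering between every pair of VPCs, while a hub such as Transit Gateway needs one attachment per VPC. The sketch below just does that arithmetic; the function names are mine, and it ignores service quotas and route table entries.

```python
def full_mesh_peerings(vpcs: int) -> int:
    """Peering connections needed so every VPC reaches every other directly."""
    return vpcs * (vpcs - 1) // 2


def hub_attachments(vpcs: int) -> int:
    """Attachments needed when every VPC connects to one shared hub."""
    return vpcs


for n in (5, 20, 50):
    print(f"{n} VPCs: {full_mesh_peerings(n)} peerings vs "
          f"{hub_attachments(n)} hub attachments")
```

The peering count grows with the square of the number of VPCs, which is why the hub model became the default for new designs even though existing peered networks carry on working.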

When everything changes all of the time, you need to check your assumptions and validate whether you should keep on doing what you have always done.


Vendor Briefing – Retrospect

I associate Retrospect with end-point backup and have the Dantz brand attached in my head. I am about a decade out of date on both the product and the company. After changing ownership a few times (including EMC and Roxio ownership), Retrospect is now owned by StorCentric, along with Drobo, Vexata, and Nexsan. Retrospect is now able to protect servers as well as end-point devices such as laptops and desktops. It can also use public cloud as a backup destination and has a SaaS management console. This month Retrospect announced new versions of both their Retrospect Backup and Retrospect Virtual products. JG Heithcock briefed me about both the company and the updates. StorCentric has assembled a portfolio of storage from high-end NVMe all-flash to SMB focussed, with Retrospect in the SMB data protection category.

One of the key new features in both Retrospect Backup 17 and Retrospect Virtual 2020 is simple onboarding: essentially a single, Internet-accessible URL for deploying a pre-configured agent and license. Simple onboarding is essential for end-point protection, where a laptop may never connect to the corporate LAN and so cannot easily get updates from the on-premises corporate servers. For on-premises resources such as servers and desktops, the simple onboarding can integrate with your chosen software deployment tool.

I like the SaaS console to manage across multiple Retrospect servers, although complete management is still available at each server. The web console provides a holistic view of your data protection status for the entire organization. I also like that restores can happen from local storage on the Retrospect server or from lower-cost storage on a public cloud. Licensing is flexible, either a monthly subscription covering all version updates or a perpetual license for a specific version.

You can also read what Dan Frith wrote about the Retrospect announcement.


A Phase Complete, Learning about Cohesity

Today marks the end of documenting my journey of learning about Cohesity, so I thought it might be useful to recap some of the things I learned. Probably the most significant thing is that simplicity is the ultimate sophistication. Making a product that is easy to use for complex requirements requires focus; it is easy to get caught up in the minute details and end up missing the ease of use. With Cohesity, I found that features are easy to use, and the amount of time I spent with the Cohesity console was less than I expected. I liked that I could reuse the Protection Policies across different data sources. Even restores are simple due to the universal search feature, which is especially helpful when users only know the name of a file, not the directory where they saved it before deleting their important version. I also found a lot of breadth in Cohesity; for a single-product company, the product does a lot of different things: data protection, as well as data storage with protection, and protection for VMs, SaaS (Office365), and physical servers. I barely scratched the surface of using the public cloud with Cohesity since I only used AWS as storage expansion for my Cohesity cluster. I haven’t done their migration from on-premises, DevOps integration, or DR to the cloud. You can find all of the videos and blog posts about my Cohesity learning experience on my Cohesity page.
