Demitasse

AWS Surprises – AWS Has Virtually Infinite Resources

Posted on July 15, 2020 by Alastair

Sometimes the AWS surprises are not so much about how AWS is different, but how you design solutions differently on AWS than on-premises. One of the significant differences is that you have a near-infinite amount of resources available on AWS, while on-premises, you are always aware of a finite resource limit. On-premises your workload must fit inside those limited resources; on AWS, you can rent as much resource as your workload requires. One typical pattern on-premises is to defer reporting or bulk processing until off-peak hours, overnight when the office is empty. The office is never empty at AWS, so you might as well do that reporting or processing right away. The only time you might defer is if the spot price for the EC2 instance you want is too high.

As an example, there are plenty of problems that we solve by using a lot of compute resources to get a timely answer. On-premises we will have a limited quantity of CPU time and RAM, and these resources (servers) have a lifespan of 3-5 years, so more resources that will only be used for part of their life are expensive. On-premises it is common to consume all these limited resources for a long time to complete some complex tasks; we may have to wait hours or days for an answer. On AWS, we rent CPU time and RAM as EC2 instances and pay by the hour for what we use. On AWS, we can scale out just for the duration of the job and use maybe 50x as much resource to get an answer faster. There is no cost difference between using 5 EC2 instances for 100 hours and 250 EC2 instances for two hours, so scaling out massively is an option.
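The arithmetic behind that last claim is just instance-hours multiplied by price. Here is a minimal sketch, assuming the job parallelises perfectly and using a made-up hourly price; `job_cost` and `PRICE` are illustrative names, not anything AWS provides.

```python
def job_cost(instances: int, hours: int, price_per_hour: float) -> float:
    """Cost of running `instances` identical EC2 instances for `hours` each."""
    instance_hours = instances * hours
    return instance_hours * price_per_hour


# Hypothetical on-demand price per instance-hour; substitute a real price.
PRICE = 0.24

small_and_slow = job_cost(instances=5, hours=100, price_per_hour=PRICE)
wide_and_fast = job_cost(instances=250, hours=2, price_per_hour=PRICE)

# Both jobs consume 500 instance-hours, so the bill is the same.
print(small_and_slow, wide_and_fast)
```

In practice, few jobs scale out perfectly, so treat the equality as the best case rather than a guarantee.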

Other near-infinite resources include storage, networking, and even application services. The Simple Storage Service (S3) allows you unlimited storage capacity and only charges you for what you actually store. The VPC network and its supporting features such as ELB provide colossal capacity that is available on-demand, and you are billed for consumption, not capacity. Even application services such as the Simple Queue Service (SQS) offer near-unlimited messages per second in a queue and only charge you for the transactions on that queue. There are a lot of AWS services that allow you to draw from a nearly limitless pool of resources and only pay for the resources that you use.

Capacity Is Never Infinite

One caveat is that while AWS has near-infinite capacity, there is always a finite amount, and, in some situations, that limited amount may not be as large as you might hope. When you start deploying unusual and new EC2 instance types, and particularly when you use them in their largest configurations, you may get Insufficient Compute Capacity Errors (ICCE, pronounced ice). Remember that each EC2 family and generation runs on its own dedicated physical servers; M5 instances only run on M5 servers, which in turn only run M5 instances. The larger the size within the family and the more instances you request, the more previously unused capacity is required. So, if you decide to deploy a cluster of six X1e.32XLarge across three availability zones, you may find that one of those AZs does not have two whole X1e hosts to dedicate to your cluster immediately. Hopefully, you have a good relationship with your local AWS team and can get this information before it causes you a problem. They may suggest that you use smaller instances and more of them, or that you will have a better result with a different region or a different EC2 family.
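One way to plan for a capacity error is to keep an ordered list of acceptable fallbacks and try each in turn. The sketch below is only an illustration of that idea. `InsufficientCapacity` is a stand-in for the EC2 `InsufficientInstanceCapacity` error, `fake_launch` is a hypothetical launcher used purely for the demonstration, and the AZ names and instance IDs are invented.

```python
from typing import Callable, Iterable, Optional, Tuple

Placement = Tuple[str, str]  # (instance_type, availability_zone)


class InsufficientCapacity(Exception):
    """Stand-in for the EC2 InsufficientInstanceCapacity error."""


def launch_with_fallback(
    launch: Callable[[str, str], str],
    candidates: Iterable[Placement],
) -> Optional[Tuple[Placement, str]]:
    """Try each (instance type, AZ) pair in order until one launches."""
    for instance_type, az in candidates:
        try:
            instance_id = launch(instance_type, az)
        except InsufficientCapacity:
            continue  # no capacity here; move on to the next candidate
        return (instance_type, az), instance_id
    return None  # every candidate was out of capacity


def fake_launch(instance_type: str, az: str) -> str:
    """Hypothetical launcher: pretends az-c has no room for the largest size."""
    if instance_type == "x1e.32xlarge" and az == "az-c":
        raise InsufficientCapacity(f"no {instance_type} capacity in {az}")
    return f"i-{instance_type}-{az}"  # invented ID format


result = launch_with_fallback(
    fake_launch,
    [("x1e.32xlarge", "az-c"), ("x1e.16xlarge", "az-c")],
)
print(result)
```

A real launcher would wrap whatever SDK call you use and translate its capacity error into the exception above; the ordering of the candidate list is where your design decisions live.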

If you had on-demand access to a virtually infinite amount of computing resources, how would your IT and business operate differently? On AWS, resources are available, and you pay for what you use each month. To get the best out of AWS, you should deploy the resources you need as you need them, and cast off the implicit assumptions of purchased on-premises IT.

Posted in General | 2 Comments

AWS Surprises – You still need infrastructure architecture on AWS

Posted on June 30, 2020 by Alastair

It is a popular idea that “the cloud means I don’t have to care”; however, nothing could be further from the truth. It isn’t really an AWS Surprise to me that infrastructure architecture is still essential for many customers on AWS. Naturally, there are many infrastructure elements that AWS manages; you don’t need to worry about racking and cabling servers or power and cooling. You do still need to choose VM resources (EC2 instance families and sizes) for each application component. You do need to design the network connectivity and isolation when you put together a VPC. Applications that ran on-premises, which you migrate to AWS, will require cloud infrastructure that replicates the on-premises infrastructure.

Similarly, applications built to on-premises architectures will require similar infrastructure on AWS. On-premises infrastructure architects can augment their skills to design infrastructure on AWS. Like any new platform, you will need to learn the capabilities and limitations of the AWS platform. You can find a few of the things I learned on my AWS Surprises page. One thing to prepare for: moving up the stack. Expect to learn more about application and integration architecture as the infrastructure becomes more of a commodity.

No Infrastructure

Not everything on AWS requires conventional infrastructure; more serverless application components mean less infrastructure. It is entirely possible to build large and complex applications on AWS without requiring a single EC2 instance or subnet. You can assemble services like Lambda, DynamoDB, and API Gateway, and even older services like S3, SQS, and SNS, into a microservices-based application without a single VM. These services do not exist in on-premises enterprise datacentres. Only applications developed specifically on AWS will use these services. With a fully serverless application, there is a large amount of application architecture to design rather than infrastructure architecture.
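To give a feel for how little infrastructure is involved, here is a minimal sketch of a Python Lambda function sitting behind an API Gateway proxy integration. The greeting logic and the `name` query parameter are invented for illustration; the handler signature and the shape of the response follow the usual proxy-integration conventions.

```python
import json


def handler(event, context):
    """Entry point that Lambda calls for each API Gateway request."""
    # queryStringParameters may be missing or None when no query string is sent.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }


# Local smoke test with a hand-built event; no AWS account is needed for this.
response = handler({"queryStringParameters": {"name": "Alastair"}}, None)
print(response["body"])
```

There is no instance, subnet, or operating system anywhere in that code; everything left to design is application architecture.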

Assumed Infrastructure

One thing to watch for is elements that are provided by on-premises infrastructure that are not automatically delivered by AWS. One example is data protection for backup/recovery, compliance, and disaster recovery. On AWS, these capabilities must be added to or configured for the services, where on-premises, they are often just a fundamental part of the infrastructure. Even if there is no infrastructure to design to support functional requirements, often there are non-functional requirements that the infrastructure team would usually handle.


New Zealand Is like the Boy in a Bubble

Posted on June 17, 2020 by Alastair

You may have seen the news: New Zealand has no active COVID-19 cases; the coronavirus has been eliminated from New Zealand. As of Monday, 8 June, the last infected person had recovered, and it has been over three weeks since the last new case was diagnosed. We have moved from having some of the strictest lockdown rules to totally relaxed, at least within the country. There is almost no risk of COVID-19 transmission inside New Zealand, so we are now protecting ourselves at the border. Anybody arriving in New Zealand is subject to a two-week, government-controlled quarantine and a COVID test. We have very little immunity to COVID in New Zealand, only 1,100 or so confirmed cases out of five million people. We now live in a bubble, surrounded by countries that still have active transmission, and any breach of our bubble will cause us to go back to lockdown. We will not be safe to leave the bubble until other countries eliminate COVID or a vaccine is widespread.



I Want Network Integration, I’m Not Getting It

Posted on June 4, 2020 by Alastair

I like having consistent management interfaces and having a single operational model across as much of my IT estate as possible. I don’t like point solutions that function or are managed differently; they add up to more problems. With this in mind, I would like to see far deeper network integration between AWS and VMware Cloud on AWS (VMC) even though I know why I won’t get this integration for a while. At Cloud Field Day 7, we had two sessions that focussed on network connectivity between AWS (AWS presentation) and VMC (VMware presentation); neither said it works the same as everything else they offer.



AWS Surprises – One Datacentre Is Not Enough

Posted on May 22, 2020 by Alastair

Most on-premises IT infrastructure designs treat a datacentre as a highly available platform; having an entire datacentre off-line is a disaster. It is a bit of a surprise then that AWS recommends we treat a datacentre as a failure domain and plan to keep our applications operational even if a datacentre fails. AWS doesn’t actually expose individual datacentres in its services; they present Availability Zones. An Availability Zone (AZ) is the smallest area we can usually select for running applications on AWS and is made up of one or more datacentres that are very close together. As far as customers are concerned, we treat an AZ like one datacentre. The EC2 service, and its storage EBS, is scoped at the AZ; an EC2 instance in one AZ cannot be powered on in another AZ. AWS recommends that we have multiple EC2 instances spread across multiple AZs for high availability because an AZ or an AZ-scoped service can fail. If you take a look at the AWS Post Event Summaries page, you will see events where specific services were unavailable; usually the EC2 or EBS events impacted only a single AZ.

Multi-AZ is a standard design practice for production applications on AWS; DR is usually considered for region-to-region failure. Failover between AZs is part of the application design, usually with scale-out EC2 for compute and a decoupling service like a load balancer or queue that is regionally scoped. The regionally scoped service continues to operate even when one AZ fails, allowing the surviving EC2 instances to keep delivering application services. The ability to scale out to provide HA is a part of the application design, rather than a feature of the infrastructure.
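The reasoning about surviving instances can be sketched with a few lines of Python. This is only a model of the idea, not how any AWS service places instances: the AZ names are placeholders, and `spread_across_azs` and `surviving_capacity` are hypothetical helpers.

```python
from collections import Counter
from itertools import cycle
from typing import List, Sequence


def spread_across_azs(instance_count: int, azs: Sequence[str]) -> List[str]:
    """Assign instances to AZs round-robin; returns one AZ name per instance."""
    az_cycle = cycle(azs)
    return [next(az_cycle) for _ in range(instance_count)]


def surviving_capacity(placement: Sequence[str], failed_az: str) -> int:
    """Count the instances that keep running when one AZ fails."""
    return sum(1 for az in placement if az != failed_az)


placement = spread_across_azs(6, ["az-a", "az-b", "az-c"])
print(Counter(placement))                       # two instances in each AZ
print(surviving_capacity(placement, "az-b"))    # 4
```

With six instances over three AZs, losing one AZ still leaves two thirds of the capacity, which is why the application needs to be sized and designed to run on what survives.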

The equivalent design practice on-premises is a highly redundant virtualization platform in a single datacentre; DR is used to recover to another datacentre. All of the redundancy and availability of the virtualization layer is invisible to the application, which is often even unaware of a DR failover other than as an outage before regular service is restored. There are on-premises designs that have storage and hypervisor clusters that span multiple datacentres with the equivalent scope of AWS multi-AZ. These Metro-Cluster solutions are usually very expensive and used only for highly critical applications. Metro-Cluster places all of the failover awareness and functionality in the infrastructure; applications are generally still unaware of the failover.

On AWS, a single datacentre is not enough for any production application deployment. Deploying highly available applications on AWS requires that the application be designed with the awareness of the AWS infrastructure. Cloud-native applications are designed with an awareness of the limitations of cloud-native infrastructure. Enterprise applications deployed on enterprise infrastructure expect perfect reliability from the infrastructure. Take a moment to look back at the Post Event Summaries page, think about the number of datacentres AWS operates (currently 76 AZs), and then think about whether your on-premises datacentres experience fewer outages than AWS.


AWS Surprises – Choose a configuration from a menu

Posted on May 12, 2020 by Alastair

On the surface, there is no surprise here: AWS offers a list of services, and you order what you want from the list. But the devil is always in the detail, or the operational consequence. The actual AWS surprise came when I first played with EC2 instances and looked at changing the configuration of an existing EC2 instance. One does not simply add 4GB of RAM to an instance. The sizes of EC2 instances are fixed by AWS; you choose a size option from the list. For each EC2 instance family, there is a fixed relationship between the number of CPU cores and the amount of RAM. Within the family, there are fixed sizes; most often, the next size up is exactly twice as much resource in each dimension. To get more RAM in an existing EC2 instance, you either double the size of the instance or choose a size from a whole new instance family. The M5 family has 4GB per core, so an M5.Large has two cores and 8GB, while an M5.24XLarge has 96 cores and 384GB of RAM. From the M5.Large ($0.12 per hour in Sydney), the next size up is M5.XLarge; with four cores and 16GB of RAM, it is exactly twice the size of an M5.Large and twice the cost per hour at $0.24 per hour. That is a large increase in price if my application only wants 4GB more RAM. I am probably better off changing to an R5.Large, which has two cores and 16GB of RAM and will only cost me $0.15 per hour in Sydney. The R5 series is more RAM heavy; the R5.24XLarge has 96 cores and 768GB of RAM. It is not just the CPU and RAM that are fixed per instance; the available network bandwidth is related to the size and family of instance. Ephemeral local storage called Instance Store is also fixed per instance size, and most instance families don’t even have Instance Store.

While there are a few dozen instance families and a few hundred possible combinations of family and size, for any given application, there will only be a small selection that are suitable. Choosing the wrong compromise of instance resources and cost will seriously affect the viability of your application on AWS. Make sure you don’t simply consider doubling the size of an EC2 instance; choosing another instance family might be a better option. Just remember that you cannot change the resources separately; you can only select an EC2 configuration from the menu.
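Choosing from the menu is really a small search problem: filter to the configurations that fit, then take the cheapest. Here is a minimal sketch using only the three sizes and Sydney prices quoted in this post; the `InstanceType` class and `cheapest_fit` function are illustrative names, and a real menu would have hundreds of entries with current prices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    ram_gb: int
    price_per_hour: float  # USD per hour, Sydney, as quoted in this post


# Only the sizes mentioned above; prices will have changed since writing.
MENU = [
    InstanceType("m5.large", 2, 8, 0.12),
    InstanceType("m5.xlarge", 4, 16, 0.24),
    InstanceType("r5.large", 2, 16, 0.15),
]


def cheapest_fit(
    menu: List[InstanceType], vcpus_needed: int, ram_gb_needed: int
) -> Optional[InstanceType]:
    """Return the cheapest menu entry that meets both requirements."""
    fits = [
        i for i in menu
        if i.vcpus >= vcpus_needed and i.ram_gb >= ram_gb_needed
    ]
    return min(fits, key=lambda i: i.price_per_hour) if fits else None


# An application that has outgrown 8GB and wants 12GB on two cores.
choice = cheapest_fit(MENU, vcpus_needed=2, ram_gb_needed=12)
print(choice.name)  # r5.large
```

The same filter shows the trade-off the other way: if the application needs four cores as well, only the m5.xlarge in this tiny menu fits.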


Zoom Mute/Unmute Using Stream Deck

Posted on April 27, 2020 by Alastair

Zoom is the “new” way we are all doing meetings. Whether it is last week attending Cloud Field Day presentations or this week teaching AWS training, Zoom is the constant for meetings. With Zoom, microphone control is essential. You want to be able to interact immediately but don’t want to interrupt when you need to cough or swear because you spilled your drink on the desk. The result is that we enter Zoom meetings muted and only unmute when we have something to say. The simplest way is to hold down the space bar; if Zoom has focus, then your mic is unmuted while you hold down the space bar. The obvious problem comes when you need to use another application while you are in Zoom; then you hold down the space bar and stay muted, as happened to me at least once last week.



Hardware Offloads, Not Everything Is x86

Posted on April 16, 2020 by Alastair

While software is busy eating the world, we do still need hardware to run that software. One of the things that we are learning is that an x86 processor is not always the best way to solve every computing problem. The most obvious demonstration is the absence of x86-based smartphones; there have been a couple of attempts but nothing successful. Of course, mobile is a very different use case to the data center, and most data centers are full of x86-based servers. What we are seeing is that the x86 CPU in these servers is being supplemented by increasing numbers of specialized processors that handle functions that are better suited to different processor architectures. The first was network cards: NICs that could offload a lot of the computing functions for handling Ethernet and TCP packets. Rather than tying up an x86 CPU core for every 1-2Gbps of network throughput, a powerful NIC managing the network allows 10Gb and even 100Gb Ethernet to be utilized without saturating the main CPU. We have also seen GPUs being added to servers for some workloads, in particular workloads that suit parallel compute with moderate amounts of data. Another type of offload is computational storage from NGD Systems, which uses additional ARM cores inside SSDs to process data inside the SSD. Computational storage offload seems to be the reverse of GPU offload: huge data but not as much compute demand, although, with a lot of NGD SSDs, those ARM cores do add up. We have also seen more consolidated offload for virtualization with Amazon’s Nitro architecture offloading network, storage, and server management into a custom add-in card. What is clear is that general-purpose CPUs are not the right solution for every computational task.

The AWS Nitro card appears to have a cousin in the Pensando Distributed Services Card, which seems to be the hardware magic that delivers Pensando software-defined services. The Pensando web site talks a lot about software-defined edge services. I believe the edge that they mean is a telco point of presence, what used to be a telephone exchange, but is now really a datacenter close to the telco’s subscribers. It does appear that the target customer is a cloud provider or telco that delivers cloud-like services, with lots of networking and security. The front page of Pensando’s web site suggested to me that this might be a platform for building business applications; it appears to be more for building network applications. Next week I will hear more detail from Pensando at Cloud Field Day 7; join me for the live stream, or catch up with the videos afterward.


AWS Surprises – AWS is a developer enablement platform

Posted on April 8, 2020 by Alastair

This AWS surprise took a while for me to understand, and I don’t think I’m alone in not catching on at first. Some of my early thinking about the public cloud was that it was merely a replacement for on-premises virtualization. The value proposition was that cloud providers could rack, stack, and operate servers more efficiently than any on-premises enterprise. We saw public cloud platforms offered by HP, Cisco, and even VMware through partners, which were centered on offering VMs as a service. With that in mind, let’s then step back and look at the first two services that AWS offered: SQS and S3. Neither service offers a VM; both are simply reusable components for building an application. The audience for these services is not IT infrastructure. The audience is developers. Also, think about how shadow IT on AWS developed. It was not infrastructure teams looking for places to run VMs; it was dev teams looking for resources to deliver their applications. The primary purpose of AWS is to enable developers to build applications that reside on AWS. Most of the AWS services are built to enable applications to be developed rapidly and with most of the effort spent on the features and functionality that are specific to the customer business. This is why Cisco and HP both got out of providing their own public cloud platforms: they only offered VMs to infrastructure teams and not application services to developers. Customers didn’t simply want a faster way to get VMs; they wanted a faster way to build applications that deliver business value. AWS remains a developer enablement platform.


Cloud Field Day 7 – Who will be presenting?

Posted on April 3, 2020 by Alastair

It’s April already, and that means I will be attending Cloud Field Day 7 this month. Usually, this means a trip to the US and a few non-stop days of jetlag and learning. Due to the travel bans currently in place, Cloud Field Day 7 will be an online-only event, so I won’t be jetlagged, just up early, although not as early as Justin Warren. There is an interesting list of presenters; as usual, there are some familiar faces and some new companies.

COVID-19 responses mean that some of these presenters will be at other Tech Field Day events; the final lineup is still subject to change.

Aruba has been a frequent presenter at Tech Field Day events since 2012, mostly at Network and Wireless Field Days, as well as hosting Tech Field Day at their own conference AirHeads. It looks like their recent past presentations have been around modern wireless networks and SD-WAN. I imagine Aruba will talk about hybrid cloud connectivity, probably with an emphasis on policy-based management. Aruba has its own Tech Field Day event, Aruba Round Table with Tech Field Day.

Igneous presented at Tech Field Day 12, quite a few years ago. It looks like their focus has changed a little; now they are about data protection and management. I am interested in how they achieve petabyte-scale data protection and what tangible business value they deliver with DataDiscover, which is their data indexing/management product. Igneous will now present at Cloud Field Day 8.

Illumio has presented at both Network and Security Field Day events over the past few years. Illumio has a set of products that are about network microsegmentation and policy-based control of network traffic. I wonder whether we will see some of their special sauce applied to Kubernetes as a replacement for Envoy in a service mesh. I’m also interested in hearing about unified policy-based network security management in a multi-cloud environment and hope Illumio will have something to say.

Pensando is making their first Tech Field Day appearance, so I can only judge from their web site. The web site is full of buzzwords like edge and 5G. I was a bit concerned about the lack of any concrete information until I found that there is custom hardware in their software-defined services. It looks like part of the solution is a PCIe card that is similar to the AWS Nitro card, delivering hardware-accelerated storage and network functions virtualization. The web site talks about software-defined services at the edge. I want to hear about the operational deployment of this platform and the developer experience for creating applications that run on the platform.

SolarWinds are old friends for Tech Field Day and for me. I’ve been to the SolarWinds HQ in Austin for at least two Tech Field Day events and always get good barbeque for lunch when we visit. With the breadth of management tools that SolarWinds has, it is hard to know what they will be showing us.

Stellus is new to Tech Field Day this year, having presented at Storage Field Day 19 in January. Their product is a ridiculously high-performance NAS. A cursory look at their web site shows that they separate the persistent storage layer from the throughput/data mover layer. Capacity runs to over 1PB while throughput is up to 80GBps, with the two numbers scaling independently. I will be interested to hear what the cloud angle is on a hardware product; maybe we will learn about a whole new product.

VMware is also an old friend of Tech Field Day, with 19 event badges on their page, including the very first Tech Field Day event. I want to hear more about Tanzu and the suite of products for multi-cloud application deployment and management.

