AWS Design – Automate your environment

In a long-distant former life, I looked after a farm of around a hundred Citrix servers. It was so long ago that they were physical servers, and we built or rebuilt servers following dozens of pages of written instructions. You can imagine that there were plenty of helpdesk calls for faults in the build of individual servers. This environment typifies the “handcrafted perfection” that Enterprise IT operations teams used to deliver. Even back in 2001, I created an automated build process for these servers to avoid manual builds. Manual processes work at the speed of humans, and they are full of human errors. To work faster and smarter, we need methods that are protected from human error. Operational processes must be executed precisely the same way every time. With manual processes, you must get it right every time. With automation, you only need to get it right once. Automating build and deployment is an excellent start, as it will deliver consistent and reliable infrastructure on demand.

Ideally, infrastructure automation should be more like software development and use a declarative configuration service to implement the specification in a design file. The design file is the source code for the built environment and is stored in version control, just like the source code for the application. Once you can deploy an environment automatically, the environment is automatically recreatable. You might recreate an environment that broke rather than troubleshoot the problem. You might recreate rather than restore from backups. You should also create a copy for testing, for both changes to the environment and changes to the deployed software.
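The idea of a design file as source code can be made concrete with a small sketch. The template below is a hypothetical, minimal CloudFormation-style document built in Python and serialized to JSON; the AMI ID, resource name, and tags are placeholders, not a real deployment.

```python
import json

# A hypothetical design file expressed as data: a CloudFormation-style
# template describing one EC2 instance. The template, not the running
# server, is the artifact you store in version control.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Web tier for the example application",
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "InstanceType": "t3.micro",
                "ImageId": "ami-12345678",  # placeholder AMI ID
                "Tags": [{"Key": "Environment", "Value": "test"}],
            },
        }
    },
}

# Serialize to JSON; this file is what a declarative service such as
# CloudFormation would consume to create, update, or recreate the
# environment on demand.
design_file = json.dumps(template, indent=2)
```

Because the whole environment is described in one reviewable file, recreating it is a redeploy of the file rather than a troubleshooting exercise.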

With deployment automated, it becomes easy to build an environment for testing, then dispose of that environment when the testing is complete. I don’t mean at the end of the project, but at the end of the test. When we want to implement Continuous Integration and Continuous Delivery or Deployment (CI/CD), we need to run tests for each code change made by a developer. Without build automation, these tests simply cannot happen at the pace demanded by CI/CD methodologies. There is a strong argument to be made for keeping the infrastructure design file alongside the application source code, a single source location for both the application and the infrastructure that the application requires.

Infrastructure automation aims to increase the consistency and velocity of operations. Once the infrastructure lifecycle is automated, you unlock the ability to automate application lifecycles. With the ability to innovate in applications safely, business agility is easier and more reliable, providing greater value for overall IT spend.

Posted in General | Comments Off on AWS Design – Automate your environment

Build Day TV – AWS Networking Fundamentals

If you are just starting out with AWS, you might find the networking a little different from what you are used to on-premises. Take a look at this video series we recently ran on Build Day Live; it was the first series of Build Day TV episodes. Most of the episodes are in two parts: a theory video and a hands-on demonstration.

AWS VPC Networking Basics Series

  1. VPC Fundamentals
  2. VPC Fundamentals – Hands on Demo
  3. VPC Firewalls
  4. VPC Firewalls – Hands on Demo
  5. Elastic Load Balancing
  6. Elastic Load Balancing – Hands on Demo
  7. VPC High Availability
  8. VPC High Availability – Hands on Demo
  9. Linking VPCs in AWS
  10. Linking VPCs in AWS – Hands on Demo
  11. VPC to on-premises VPN
  12. VPC to on-premises VPN – Hands on Demo
  13. Transit Gateway
  14. Transit Gateway – Hands on Demo
  15. VPC to on-premises Direct Connect
  16. VPC to on-premises Direct Connect – Hands on Demo
  17. Controlling VPC costs

Build Day TV publishes regular video episodes, usually as a series on a single topic. The second series was our coverage of the Oracle Cloud VMware Solution (OCVS). The latest series is about the VMware SD-WAN solution, formerly known as VeloCloud.


AWS Design – Enable Scalability

One of the defining capabilities of public cloud is elasticity, the ability to use more or less resource over time to meet the load requirements of your application. When your application is quiet, you should consume and pay for fewer resources than when your application is busy. Not all AWS services have scalability built in; many require that you manage your own scalability. Fully managed services like Lambda and Fargate manage capacity for you, delivering the resources your workload requires. More lightly managed services, such as EC2 and RDS, leave scalability up to you, although they may provide tooling, such as Auto Scaling, that you can use.
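The scaling decision that those lightly managed services leave to you boils down to arithmetic like the following sketch. The request rates and per-instance capacity are hypothetical, and real EC2 Auto Scaling target tracking adds cooldowns and smoothing on top of this.

```python
import math

def desired_capacity(current_load, per_instance_capacity, minimum=1, maximum=10):
    """Return how many instances are needed for the current load.

    A simplified sketch of the arithmetic behind target-tracking
    autoscaling: enough instances to cover the load, clamped to the
    group's minimum and maximum sizes.
    """
    needed = math.ceil(current_load / per_instance_capacity)
    return max(minimum, min(maximum, needed))

# 450 requests/sec, each instance handling 100 req/sec -> 5 instances
print(desired_capacity(450, 100))  # 5
# In a quiet period the load drops, and so does the count (and the bill)
print(desired_capacity(80, 100))   # 1
```

The clamp matters: the maximum caps your spend during a traffic spike, and the minimum keeps the application available when the load is near zero.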



Ten Design Principles on AWS

Having previously looked at some surprises I discovered as I learned about AWS, I’m going to take a look at some of the basic architectural design principles on AWS. As in the last series, there will be blog posts for each principle that go into some basic details. Here are the ten principles:

  1. Enable scalability: What happens if demand increases? Or doesn’t increase? What if demand goes up and down over time?
  2. Automate your environment: Computers are good at doing things the same way every time; humans are not.
  3. Use disposable resources: “Everything fails, all the time,” as Werner Vogels says. Replace broken things with brand-new things rather than spending a lot of time fixing them.
  4. Loosely couple your components: When one element of your application changes or has an issue, the rest of the application should still work.
  5. Design services, not servers: An EC2 instance should not be a single point of failure. Use several instances and a load balancer or a queue.
  6. Choose the right database solutions: I don’t mean Microsoft SQL Server vs Oracle. I mean use the right database for the data you need to store; some data fits better in non-relational databases.
  7. Understand your single points of failure: There are always SPOFs; make sure you know where they are and try to eliminate as many as possible.
  8. Optimize for cost: Your AWS bill will arrive every month, and you will pay for what you use. Make sure you are getting value for every dollar spent on that AWS bill.
  9. Use caching: Your data does not all have the same value or location, nor do resources all have the same cost. Caching uses small amounts of fast or nearby resources to serve frequently accessed data.
  10. Secure your infrastructure at every layer: “Dance like nobody’s watching, encrypt like everyone is,” as Werner Vogels says. By now, we should all understand that defense in depth is the only viable strategy.

AWS Surprises – AWS Has Virtually Infinite Resources

Sometimes the AWS surprises are not so much about how AWS is different, but how you design solutions differently on AWS than on-premises. One of the significant differences is that you have a near-infinite amount of resources available on AWS, while on-premises, you are always aware of a finite resource limit. On-premises your workload must fit inside those limited resources; on AWS, you can rent as much resource as your workload requires. One typical pattern on-premises is to defer reporting or bulk processing until off-peak hours, overnight when the office is empty. The office is never empty at AWS, so you might as well do that reporting or processing right away. The only time you might defer is if the spot price for the EC2 instance you want is too high.

As an example, there are plenty of problems that we solve by using a lot of compute resources to get a timely answer. On-premises, we have a limited quantity of CPU time and RAM, and these resources (servers) have a lifespan of 3-5 years, so capacity that will only be used for part of that life is expensive. On-premises, it is common to consume all these limited resources for a long time to complete a complex task; we may have to wait hours or days for an answer. On AWS, we rent CPU time and RAM as EC2 instances and pay by the hour for what we use. On AWS, we can scale out just for the duration of the job and use maybe 50x as much resource to get an answer faster. There is no cost difference between using 5 EC2 instances for 100 hours and 250 EC2 instances for two hours, so scaling out massively is an option.
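That trade-off is easy to verify with back-of-the-envelope arithmetic. The hourly rate below is a placeholder; the point is that the instance-hours, and therefore the bill, are identical.

```python
# Hypothetical on-demand rate; the comparison holds at any price.
rate_per_instance_hour = 0.12

scale_up_slow = 5 * 100    # 5 instances for 100 hours = 500 instance-hours
scale_out_fast = 250 * 2   # 250 instances for 2 hours = 500 instance-hours

# Same instance-hours either way, so the same bill -- but the scaled-out
# job returns its answer 50x sooner.
print(scale_up_slow, scale_out_fast)  # 500 500
cost_slow = scale_up_slow * rate_per_instance_hour
cost_fast = scale_out_fast * rate_per_instance_hour
```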

Other near-infinite resources include storage, networking, and even application services. The Simple Storage Service (S3) allows you unlimited storage capacity and only charges you for what you actually store. The VPC network and its supporting features, such as ELB, provide colossal capacity that is available on demand, and you are billed for consumption, not capacity. Even application services such as the Simple Queue Service (SQS) offer near-unlimited messages per second in a queue and only charge you for the transactions on that queue. There are a lot of AWS services that allow you to draw from a nearly limitless pool of resources and only pay for the resources that you use.

Capacity Is Never Infinite

One caveat is that while AWS has near-infinite capacity, there is always a finite amount, and, in some situations, that limited amount may not be as large as you might hope. When you start deploying unusual and new EC2 instance types, and particularly when you use them in their largest configurations, you may get Insufficient Compute Capacity Errors (ICCE, pronounced ice). Remember that each EC2 family and generation runs on its own dedicated physical servers: M5 instances only run on M5 servers, which in turn only run M5 instances. The larger the size within the family and the more instances you request, the more previously unused capacity is required. So, if you decide to deploy a cluster of six X1e.32XLarge instances across three availability zones, you may find that one of those AZs does not have two whole X1e hosts to dedicate to your cluster immediately. Hopefully, you have a good relationship with your local AWS team and can get this information before it causes you a problem. They may suggest that you use smaller instances and more of them, or that you will have a better result with a different region or a different EC2 family.

If you had on-demand access to a virtually infinite amount of computing resources, how would your IT and business operate differently? On AWS, resources are available, and you pay for what you use each month. To get the best out of AWS, you should deploy the resources you need as you need them, and cast off the capacity limits implicit in purchased on-premises IT.


AWS Surprises – You still need infrastructure architecture on AWS

It is a popular idea that “the cloud means I don’t have to care”; however, nothing could be further from the truth. It isn’t really an AWS Surprise to me that infrastructure architecture is still essential for many customers on AWS. Naturally, there are many infrastructure elements that AWS manages; you don’t need to worry about racking and cabling servers or power and cooling. You do still need to choose VM resources (EC2 instance families and sizes) for each application component. You do need to design the network connectivity and isolation when you put together a VPC. Applications that ran on-premises, which you migrate to AWS, will require cloud infrastructure that replicates the on-premises infrastructure.

Similarly, applications built to on-premises architectures will require similar infrastructure on AWS. On-premises infrastructure architects can augment their skills to design infrastructure on AWS. Like any new platform, you will need to learn the capabilities and limitations of the AWS platform. You can find a few of the things I learned on my AWS Surprises page. One thing to prepare for: moving up the stack. Expect to learn more about application and integration architecture as the infrastructure becomes more of a commodity.

No Infrastructure

Not everything on AWS requires conventional infrastructure; more serverless application components mean less infrastructure. It is entirely possible to build large and complex applications on AWS without requiring a single EC2 instance or subnet. You can assemble services like Lambda, DynamoDB, and API Gateway, and even older services like S3, SQS, and SNS, into a microservices-based application without a single VM. These services do not exist in on-premises enterprise datacentres. Only applications developed specifically on AWS will use these services. With a fully serverless application, there is a large amount of application architecture to design rather than infrastructure architecture.

Assumed Infrastructure

One thing to watch for is elements that are provided by on-premises infrastructure that are not automatically delivered by AWS. One example is data protection for backup/recovery, compliance, and disaster recovery. On AWS, these capabilities must be added to or configured for the services, where on-premises, they are often just a fundamental part of the infrastructure. Even if there is no infrastructure to design to support functional requirements, often there are non-functional requirements that the infrastructure team would usually handle.


New Zealand Is like the Boy in a Bubble

You may have seen the news: New Zealand has no active COVID-19 cases; the coronavirus has been eliminated from New Zealand. As of Monday, 8 June, the last infected person had recovered, and it has been over three weeks since the last new case was diagnosed. We have moved from having some of the strictest lockdown rules to totally relaxed, at least within the country. There is almost no risk of COVID-19 transmission inside New Zealand, so we are now protecting ourselves at the border. Anybody arriving in New Zealand is subject to a two-week, government-controlled quarantine and a COVID test. We have very little immunity to COVID in New Zealand, with only 1,100 or so confirmed cases out of five million people. We now live in a bubble, surrounded by countries that still have active transmission, and any breach of our bubble will cause us to go back to lockdown. We will not be safe to leave the bubble until other countries eliminate COVID or a vaccine is widespread.



I Want Network Integration, I’m Not Getting It

I like having consistent management interfaces and having a single operational model across as much of my IT estate as possible. I don’t like point solutions that function or are managed differently; they add up to more problems. With this in mind, I would like to see far deeper network integration between AWS and VMware Cloud on AWS (VMC) even though I know why I won’t get this integration for a while. At Cloud Field Day 7, we had two sessions that focussed on network connectivity between AWS (AWS presentation) and VMC (VMware presentation); neither said it works the same as everything else they offer.



AWS Surprises – One Datacentre Is Not Enough

Most on-premises IT infrastructure designs treat a datacentre as a highly available platform; having an entire datacentre off-line is a disaster. It is a bit of a surprise, then, that AWS recommends we treat a datacentre as a failure domain and plan to keep our applications operational even if a datacentre fails. AWS doesn’t actually expose individual datacentres in its services; they present Availability Zones. An Availability Zone (AZ) is the smallest area we can usually select for running applications on AWS and is made up of one or more datacentres that are very close together. As far as customers are concerned, we treat an AZ like one datacentre. The EC2 service, and its EBS storage, is scoped to the AZ; an EC2 instance in one AZ cannot be powered on in another AZ. AWS recommends that we spread multiple EC2 instances across multiple AZs for high availability because an AZ, or an AZ-scoped service, can fail. If you take a look at the AWS Post Event Summaries page, you will see events where specific services were unavailable; usually, the EC2 or EBS events impacted only a single AZ.

Multi-AZ is a standard design practice for production applications on AWS; DR is usually considered for region-to-region failover. Failover between AZs is part of the application design, usually with scale-out EC2 for compute and a decoupling service, like a load balancer or queue, that is regionally scoped. The regionally scoped service continues to operate even when one AZ fails, allowing the surviving EC2 instances to keep delivering application services. The ability to scale out to provide HA is a part of the application design, rather than a feature of the infrastructure.
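The multi-AZ placement behind this practice can be sketched as a simple round-robin calculation. The AZ names are illustrative, and in a real deployment an Auto Scaling group spanning subnets in each AZ would handle the placement for you.

```python
from itertools import cycle

def spread_across_azs(instance_count, azs):
    """Round-robin instances across Availability Zones so that losing
    one AZ removes only its share of the capacity. A sketch of the
    placement logic only; it does not launch anything."""
    placement = {az: 0 for az in azs}
    for _, az in zip(range(instance_count), cycle(azs)):
        placement[az] += 1
    return placement

azs = ["ap-southeast-2a", "ap-southeast-2b", "ap-southeast-2c"]
print(spread_across_azs(6, azs))  # each AZ gets 2 of the 6 instances
# If one AZ fails, 4 of the 6 instances survive, still reachable
# behind the regionally scoped load balancer.
```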

The equivalent design practice on-premises is a highly redundant virtualization platform in a single datacentre; DR is used to recover to another datacentre. All of the redundancy and availability of the virtualization layer is invisible to the application, which is often unaware of even a DR failover, other than as an outage before regular service is restored. There are on-premises designs with storage and hypervisor clusters that span multiple datacentres, the equivalent scope of AWS multi-AZ. These Metro-Cluster solutions are usually very expensive and used only for highly critical applications. Metro-Cluster places all of the failover awareness and functionality in the infrastructure; applications are generally still unaware of the failover.

On AWS, a single datacentre is not enough for any production application deployment. Deploying highly available applications on AWS requires that the application be designed with the awareness of the AWS infrastructure. Cloud-native applications are designed with an awareness of the limitations of cloud-native infrastructure. Enterprise applications deployed on enterprise infrastructure expect perfect reliability from the infrastructure. Take a moment to look back at the Post Event Summaries page, think about the number of datacentres AWS operates (currently 76 AZs), and then think about whether your on-premises datacentres experience fewer outages than AWS.


AWS Surprises – Choose a configuration from a menu

On the surface, there is no surprise here: AWS offers a list of services, and you order what you want from the list. But the devil is always in the detail, or the operational consequence. This particular AWS surprise came when I first played with EC2 instances and looked at changing the configuration of an existing EC2 instance. One does not simply add 4GB of RAM to an instance. The sizes of EC2 instances are fixed by AWS; you choose a size option from the list. For each EC2 instance family, there is a fixed relationship between the number of CPU cores and the amount of RAM. Within the family, there are fixed sizes; most often, the next size up is exactly twice as much resource in each dimension. To get more RAM in an existing EC2 instance, you either double the size of the instance or choose a size from a whole new instance family.

The M5 family has 4GB per core, so an M5.Large has two cores and 8GB, while an M5.24XLarge has 96 cores and 384GB of RAM. From the M5.Large ($0.12 per hour in Sydney), the next size up is M5.XLarge; with four cores and 16GB of RAM, it is exactly twice the size of an M5.Large and twice the cost per hour at $0.24 per hour. That is a large increase in price if my application only needs 4GB more RAM. I am probably better off changing to an R5.Large, which has two cores and 16GB of RAM and will only cost me $0.15 per hour in Sydney. The R5 series is more RAM-heavy; the R5.24XLarge has 96 cores and 768GB of RAM. It is not just the CPU and RAM that are fixed per instance; the available network bandwidth is related to the size and family of the instance. Ephemeral local storage, called Instance Store, is also fixed per instance size, and most instance families don’t even have Instance Store.
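Choosing from the menu is a constrained lookup, not a dial you turn. A small sketch using the core, RAM, and Sydney price figures quoted above (with the lowercase names the EC2 API uses); the 12GB requirement is a hypothetical application need.

```python
# Instance options, using the vCPU/RAM figures and Sydney prices
# quoted in the text above.
options = {
    "m5.large":  {"cores": 2, "ram_gb": 8,  "price": 0.12},
    "m5.xlarge": {"cores": 4, "ram_gb": 16, "price": 0.24},
    "r5.large":  {"cores": 2, "ram_gb": 16, "price": 0.15},
}

def cheapest_fit(options, min_cores, min_ram_gb):
    """Pick the cheapest instance meeting both resource minimums --
    you select a configuration from the menu; you cannot size
    CPU and RAM independently."""
    fits = {name: o for name, o in options.items()
            if o["cores"] >= min_cores and o["ram_gb"] >= min_ram_gb}
    return min(fits, key=lambda name: fits[name]["price"])

# Need 2 cores and 12GB: doubling up to m5.xlarge costs $0.24/hr,
# while switching family to r5.large costs only $0.15/hr.
print(cheapest_fit(options, min_cores=2, min_ram_gb=12))  # r5.large
```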

While there are a few dozen instance families and a few hundred possible combinations of family and size, for any given application, there will only be a small selection that are suitable. Choosing the wrong compromise of instance resources and cost will seriously affect the viability of your application on AWS. Make sure you don’t simply consider doubling the size of an EC2 instance; choosing another instance family might be a better option. Just remember that you cannot change the resources separately; you can only select an EC2 configuration from the menu.
