One of the defining capabilities of public cloud is elasticity: the ability to use more or fewer resources over time to meet the load on your application. When your application is quiet, you should consume and pay for fewer resources than when it is busy. Not all AWS services have scalability built in; many require that you manage scaling yourself. Fully managed services like Lambda and Fargate manage capacity for you, delivering the resources your workload requires. More lightly managed services, such as EC2 and RDS, leave scalability up to you, although they provide tooling, such as auto scaling, that you can use.
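As a concrete illustration of the EC2 tooling, here is a sketch of a target-tracking Auto Scaling policy built as plain data. The group name and target value are hypothetical, and the actual boto3 call is shown commented out because it needs real AWS credentials:

```python
# Sketch: an EC2 Auto Scaling target-tracking policy that aims to keep
# average CPU utilisation near 50%. Names and values are hypothetical.
policy = {
    "AutoScalingGroupName": "web-asg",          # hypothetical group name
    "PolicyName": "keep-cpu-near-50",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,    # scale out above this, scale in below it
    },
}

# With credentials configured, this would apply the policy:
# import boto3
# boto3.client("autoscaling").put_scaling_policy(**policy)
print(policy["PolicyType"])
```

With a policy like this, AWS handles the "when" of scaling for you; your remaining job is choosing a metric and target that actually reflect load on your application.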
Part of the objective with scaling is to control cost. You could simply deploy enough resources for the maximum load you can imagine, just as we did on-premises, but on AWS we get a bill every month for the resources we use, so we want to be getting business value for the money spent. Even with services that manage scalability for you, scaling has a cost impact. For example, if you store data in the S3 service, you pay monthly per GB stored; retaining a lot of useless data in S3 can add up to a significant bill. DynamoDB also charges you per GB stored per month, at roughly ten times the price of S3, so you should clean up DynamoDB tables too. Make sure to clean up temporary data when it is no longer useful.
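To see why cleanup matters, compare the per-GB cost of leaving stale data behind. The prices below are illustrative round numbers in the spirit of the ten-times ratio above, not a quote; check current AWS pricing for your region:

```python
# Illustrative per-GB monthly storage prices (USD); not current AWS pricing.
S3_STANDARD_PER_GB = 0.023
DYNAMODB_PER_GB = 0.25   # roughly ten times the S3 price

def monthly_storage_cost(gigabytes: float, price_per_gb: float) -> float:
    """Monthly cost of keeping this much data at this per-GB price."""
    return gigabytes * price_per_gb

# 500 GB of stale temporary data left in each service:
stale_gb = 500
print(round(monthly_storage_cost(stale_gb, S3_STANDARD_PER_GB), 2))  # 11.5
print(round(monthly_storage_cost(stale_gb, DYNAMODB_PER_GB), 2))     # 125.0
```

Half a terabyte of forgotten data is pocket change in S3 but real money in DynamoDB, and both grow every month until someone deletes it. S3 lifecycle rules and DynamoDB TTL can automate the cleanup.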
One of the challenges of scaling is that scaling one element sometimes simply moves the overload elsewhere. There is no sense scaling the web servers at the front of your application if the back-end database is already overwhelmed; more web servers will only spend their time waiting for access to the database. It is crucial to know how far you can scale one element before another element becomes the bottleneck, then scale that new limiting element.
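The idea is simple enough to express as a min() over tier capacities. The throughput figures below are made up for illustration:

```python
# Hypothetical sustainable throughput of each tier, in requests per second.
capacities = {"web": 1200, "app": 900, "database": 400}

# End-to-end throughput is capped by the slowest tier, so adding web
# servers does nothing until the database tier is scaled.
end_to_end = min(capacities.values())
bottleneck = min(capacities, key=capacities.get)

print(bottleneck, end_to_end)  # database 400
```

Knowing these numbers for each tier, usually from load testing, tells you which element to scale next and when further scaling of the front end stops paying off.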
Scaling in is harder
Adding more nodes is usually relatively easy. With EC2 Auto Scaling, more instances are deployed on demand, usually when a CloudWatch alarm is breached. The new instances are added to the Elastic Load Balancer, or start retrieving messages from an SQS queue. Once these instances are doing work, how do you know when they are finished? Scaling in means deleting instances, so make sure you aren't interrupting them while they are doing real work.
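One common pattern is to have each worker finish its in-flight message and only check a stop flag between messages, so a scale-in signal never interrupts work mid-message. The sketch below is local Python, not AWS code: in a real deployment the queue would be SQS and the stop flag would typically come from an Auto Scaling lifecycle hook:

```python
import queue
import threading

def drain_worker(q, stop, results):
    """Process messages until told to stop, never abandoning a message
    mid-flight. Stand-in for an SQS consumer honouring a scale-in signal."""
    while not stop.is_set():
        try:
            msg = q.get(timeout=0.1)   # short poll so we re-check the flag
        except queue.Empty:
            continue
        results.append(msg.upper())    # stand-in for real work
        q.task_done()

q = queue.Queue()
for m in ("order-1", "order-2", "order-3"):
    q.put(m)

stop = threading.Event()
results = []
t = threading.Thread(target=drain_worker, args=(q, stop, results))
t.start()
q.join()     # wait until all queued work is finished...
stop.set()   # ...then signal the worker to exit cleanly
t.join()
print(results)  # ['ORDER-1', 'ORDER-2', 'ORDER-3']
```

The key detail is the ordering at the end: drain first, then stop. Terminating the instance the moment the alarm clears would throw away whatever message was in flight.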
Scaling up is harder still
Scaling out is great when you can have many identical, disposable instances, but what happens when you have a single instance, such as a database server? Then you must scale that single instance up and down based on demand and cost. To scale an EC2 instance up or down, you need to shut down the instance and its applications. The RDS database service avoids this outage by clustering a new, larger or smaller, instance with the old instance and failing the service over before terminating the old instance. If you run an application on a single EC2 instance, can you use the same pattern, or does scaling up or down mean application downtime?
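For a Multi-AZ RDS instance, the resize is a single modify request and RDS handles the standby-then-failover choreography behind the scenes. The instance identifier and class below are hypothetical, and the boto3 call is commented out since it needs real credentials:

```python
# Hypothetical resize request for a Multi-AZ RDS instance.
resize_params = {
    "DBInstanceIdentifier": "orders-db",   # hypothetical instance name
    "DBInstanceClass": "db.r5.xlarge",     # the new, larger size
    "ApplyImmediately": True,              # resize now, not in the window
}

# With credentials configured, this would trigger the resize:
# import boto3
# boto3.client("rds").modify_db_instance(**resize_params)
print(resize_params["DBInstanceClass"])
```

Note that ApplyImmediately matters: without it, RDS queues the change for the next maintenance window, which may be exactly what you want for a scale-down but not for an urgent scale-up.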
Failure is always an option
There are occasions and applications where you might choose not to scale to meet demand. If there is no link between application experience and business value, you might let performance suffer to control cost. For example, you would scale out the web server farm that handles customer orders even while it was under DDoS attack, so that your customers can still place orders. But you might allow that same attack to overwhelm the web servers that host your customer community discussions, which seldom lead to sales.
Scaling is important
A vital part of getting the advantages of public cloud is having applications that scale based on demand. AWS managed services can handle the scaling for you, or you can build your own scaling using less managed services. Always remember to work out how you are going to scale in after scaling out, and always validate that you are getting value from your monthly AWS bill.
© 2020, Alastair. All rights reserved.