Building Blocks for Highly Available Systems on AWS
High availability describes a system that operates with almost no downtime. Even if a failure occurs, the system recovers automatically, although a short service interruption may be necessary. When talking about high availability, we refer to the AEC-2 classification of the Harvard Research Group: “minimally interrupted computing services … during most hours of the day and most days of the week throughout the year”.
Uptime Matters
Downtime is painful. Customers can't place orders in your online shop, access your content, or interact with your customer support team. An outage causes direct costs (e.g., fewer orders) and indirect costs (e.g., loss of trust). It is also a stressful situation for everyone involved in operating the system.
Building highly available systems used to be expensive. Distributing a system across isolated data centers, for example, meant paying for a second data center, including floor space, racks, hardware, and software.
Thanks to the services and infrastructure AWS offers, operating highly available systems has become much more affordable. That's why high availability is becoming the new standard even for start-ups and small and mid-sized companies.
A highly available system reduces risk for your business and keeps your operations engineers from burning out.
High Availability On AWS
AWS offers more than 70 different services. Some of them offer high availability by default:
SQL database (RDS with Multi-AZ deployment)
NoSQL database (DynamoDB)
Object storage (S3)
Message queue (SQS)
Load balancer (ELB)
DNS (Route 53)
Content Delivery Network (CloudFront)
...
The availability of your system depends on its weakest part. Whenever you add a new AWS service to your architecture, ask yourself the following questions:
Is this service highly available by default?
If not, how can I use the service so that it recovers from a failure automatically?
For at least one of these two questions, there should always be a positive answer at an acceptable cost.
EC2 is Not Highly Available By Default
High availability is a key principle of AWS. Nevertheless, Werner Vogels, CTO of Amazon.com, is quoted as saying, "Everything fails, all the time." This doesn't imply that AWS offers unreliable services; the opposite is true. The quote illustrates how AWS treats failure: by planning for it.
An important part of AWS is not highly available by default: virtual machines (EC2 instances). A virtual machine might fail because of problems with its host system or because of networking issues. By default, a failed EC2 instance is not replaced, but AWS offers tools to recover from failures automatically.
AWS offers data centers in different regions worldwide. Each region consists of at least two isolated data centers, called Availability Zones. Distributing your workload on multiple servers in at least two Availability Zones allows you to recover even if a whole data center fails.
Launching EC2 instances in multiple Availability Zones is easy and comes at no extra cost. You can do so manually or by using a tool called an Auto Scaling Group. An Auto Scaling Group automates the creation of multiple EC2 instances based on the same blueprint and lets you distribute a fleet of servers evenly across multiple Availability Zones.
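To make this concrete, here is a minimal sketch using the AWS SDK for Python (boto3) that keeps two instances running across two Availability Zones. The region, AMI ID, instance type, and resource names are placeholder assumptions, not values from this article.

```python
import boto3

# Placeholder region; use the region you actually deploy to.
autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Blueprint for the instances the group will launch (assumed AMI and instance type).
autoscaling.create_launch_configuration(
    LaunchConfigurationName="web-server",
    ImageId="ami-12345678",
    InstanceType="t2.micro",
)

# Keep two instances running, evenly spread across two Availability Zones.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-servers",
    LaunchConfigurationName="web-server",
    MinSize=2,
    MaxSize=2,
    DesiredCapacity=2,
    AvailabilityZones=["us-east-1a", "us-east-1b"],
)
```

Because the group's minimum size is two, a failed instance is replaced automatically, and losing a single Availability Zone does not take the whole fleet down.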
1st Prerequisite for High Availability: Stateless Servers
To be able to spread your workload across multiple servers, you need to get rid of state stored on a single server. If your EC2 instances are stateless, you can replace them whenever necessary without losing data.
Your applications probably need to store data, but where should that data live if not on your EC2 instances? Outsourcing it to managed storage services helps. AWS offers the following data stores:
SQL database (RDS with Multi-AZ deployment)
NoSQL database (DynamoDB)
Object storage (S3)
All of these storage services offer high availability by default, so you can use them as building blocks without introducing a single point of failure.
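As an illustration, the following sketch stores a file in S3 instead of on an instance's local disk, so a replacement instance sees exactly the same data. It again uses boto3; the bucket name, object key, and local file name are hypothetical.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-application-uploads"  # hypothetical bucket name

# Instead of writing the upload to the instance's local disk,
# store it in S3, which every instance can reach.
with open("avatar.png", "rb") as f:
    s3.put_object(Bucket=bucket, Key="avatars/user-42.png", Body=f)

# Any instance, including a freshly launched replacement, can read it back.
obj = s3.get_object(Bucket=bucket, Key="avatars/user-42.png")
data = obj["Body"].read()
```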
If your legacy applications store data on disk, none of these three storage options will work for you. Currently, there are two alternatives:
Use Elastic File System, a service still in beta, which offers storage accessible through NFS.
Synchronize state between your EC2 instances (e.g., with a distributed file system like GlusterFS).
Unfortunately, there is no production-ready, out-of-the-box service available from AWS for this right now.
2nd Prerequisite for High Availability: Loose Coupling
Another prerequisite is decoupling your virtual machines from incoming requests. To distribute your workload across multiple, automatically replaced EC2 instances, you need a reliable and static entry point into your system.
Depending on whether incoming requests need to be processed synchronously or asynchronously, two different AWS services can act as that entry point: a load balancer (ELB) or a queue (SQS).
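For the asynchronous case, here is a minimal boto3 sketch of SQS as the entry point; the queue name and message body are hypothetical.

```python
import boto3

sqs = boto3.client("sqs")

# Create (or look up) the queue that serves as the static entry point.
queue_url = sqs.create_queue(QueueName="incoming-orders")["QueueUrl"]

# Producers push work into the queue instead of calling a specific instance.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"orderId": 42}')

# Any worker instance in the fleet can pick up and process the message.
response = sqs.receive_message(
    QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10
)
for message in response.get("Messages", []):
    print("processing", message["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```

If a worker crashes before deleting a message, the message becomes visible again after its visibility timeout and another instance picks it up, which fits the automatic-recovery behavior described above.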