All about the AWS Elastic Load Balancing service

To make your workloads highly available on AWS, you can use the AWS Elastic Load Balancing service. This service allows you to balance incoming traffic between EC2 instances in different Availability Zones (AZs) within the same region*. It scales automatically based on request demand, so you do not have to worry about provisioning extra capacity in your load balancers.

* If you want to load balance your traffic between AWS regions, look into Route53's Latency-Based Routing. Besides DNS, Route53 also offers other routing policies that you might find useful.

Load Balancer Types

Amazon's Elastic Load Balancers fit into two types:

Classic ELB: routes based on network information (layer 3/4) using listeners that check for connection requests from clients and forward those requests to registered instances.
Application ELB: routes based on content of the request (think iRules in F5 BigIP) using listeners that forward traffic from clients to target groups (registered targets grouped together). Each target group has its own health checks. Target groups only support HTTP and HTTPS (you can change the ports though).
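
To make the content-based routing idea concrete, here is a minimal sketch of how a listener might map a request path to a target group. This is not how AWS implements it; the rule list, the target group names and the pick_target_group function are all hypothetical, and the wildcard matching is borrowed from Python's fnmatch module.

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules: (path pattern, target group name), checked in order.
RULES = [
    ("/api/*", "api-targets"),
    ("/static/*", "static-targets"),
]
# Hypothetical default target group used when no rule matches.
DEFAULT_TARGET_GROUP = "web-targets"


def pick_target_group(path, rules=RULES, default=DEFAULT_TARGET_GROUP):
    """Return the target group of the first rule whose pattern matches the path."""
    for pattern, target_group in rules:
        if fnmatchcase(path, pattern):
            return target_group
    return default


if __name__ == "__main__":
    for path in ("/api/users", "/static/logo.png", "/index.html"):
        print(path, "->", pick_target_group(path))
```

A Classic ELB has no equivalent of this step: it forwards whatever arrives on a listener port to the registered instances without looking at the request path.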

You can have external/Internet-facing load balancers with external DNS names or internal load balancers with internal DNS names. You can also choose between different subnets and Availability Zones. It is highly recommended to always pick at least 2 AZs in case one of the AZs goes down. Do note that the nodes of an internal ELB have only private IP addresses, while the nodes of an Internet-facing ELB have public IP addresses. In both cases the load balancer routes traffic to your instances over their private IPs.

An interesting thing to note is that since AWS manages and scales ELB for you based on your traffic needs, the load balancer will be placed on multiple nodes within AWS. The ELB service does this by creating a node in each AZ behind a DNS hostname that you then forward your traffic to. The IP addresses of the nodes may change without notice (Amazon sets the DNS TTL to 60 seconds).

Because of this, you should not use the IP address of your ELB endpoint that you get through dig, nslookup, and similar tools, and you are strongly encouraged to use the DNS hostname that AWS gives you. On the other hand, you can assign an Elastic IP to the load balancer, but why would you tie yourself to a pricey IP address?
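
If your client code caches DNS answers, one way to respect the changing node IPs is to re-resolve the hostname once the TTL has passed. The sketch below assumes a 60-second TTL as mentioned above; the TtlResolver class and its parameters are made up for illustration and are not part of any AWS SDK.

```python
import socket
import time


class TtlResolver:
    """Cache resolved addresses for at most ttl seconds, then look them up again."""

    def __init__(self, ttl=60, lookup=None, clock=time.monotonic):
        self.ttl = ttl
        # lookup and clock are injectable so the class can be exercised offline.
        self.lookup = lookup or self._dns_lookup
        self.clock = clock
        self._cache = {}  # hostname -> (expires_at, addresses)

    @staticmethod
    def _dns_lookup(hostname):
        # Ask the system resolver for every address behind the hostname.
        infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
        return sorted({info[4][0] for info in infos})

    def resolve(self, hostname):
        now = self.clock()
        cached = self._cache.get(hostname)
        if cached and cached[0] > now:
            return cached[1]
        addresses = self.lookup(hostname)
        self._cache[hostname] = (now + self.ttl, addresses)
        return addresses
```

You would call resolver.resolve() with the DNS hostname of your ELB each time you open a new connection, instead of storing the IP addresses anywhere.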

The really cool thing though is that Route53 supports Alias records for ELB endpoints, so you can pick ELB DNS hostnames directly from Route53 when creating DNS records. This lets you take any traffic destined for your domain name and forward it to your ELB. Cool!

Health Checking

Amazon's Elastic Load Balancing will do standard health checking of registered EC2 instances for you. ELB will send periodic pings or requests to EC2 instances. Any instance that is healthy will show up as InService and an unhealthy instance will show up as OutOfService.

You can specify different ping protocols such as HTTP/HTTPS, SSL, and TCP. The EC2 Console default is HTTP and the CLI/API default is TCP. You can also set the ping port and the path of HTTP requests (for example, if you wanted to check whether /healthcheck.html is accessible).

Other health check settings that you should watch out for are:

Response Timeout: amount of time to wait when receiving a response from the instance
Health Check Interval: amount of time between health checks
Unhealthy Threshold: number of consecutive failed health checks before an instance is declared unhealthy
Healthy Threshold: number of consecutive successful health checks before an instance is declared healthy

AWS ELB documentation has a table of defaults and valid values for the above settings.
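
To see how the two thresholds interact, here is a small simulation of the state changes described above. It is a sketch of the idea only, not AWS's actual implementation; the track_health function is hypothetical and the threshold values in the usage example are arbitrary.

```python
def track_health(results, *, healthy_threshold, unhealthy_threshold,
                 state="InService"):
    """Return the instance state after each health check result (True = passed)."""
    streak = 0  # consecutive results that contradict the current state
    states = []
    for passed in results:
        if state == "InService":
            contradicts, needed = (not passed), unhealthy_threshold
        else:
            contradicts, needed = passed, healthy_threshold
        # A result that agrees with the current state resets the streak.
        streak = streak + 1 if contradicts else 0
        if streak >= needed:
            state = "OutOfService" if state == "InService" else "InService"
            streak = 0
        states.append(state)
    return states


if __name__ == "__main__":
    checks = [True, False, False, True, True, True]
    print(track_health(checks, healthy_threshold=3, unhealthy_threshold=2))
```

With a health check interval of 30 seconds and an unhealthy threshold of 2, for example, an instance that stops responding would take roughly a minute to be marked OutOfService.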

Sticky Sessions

Sticky sessions allow the load balancer to keep routing a user's traffic to the same back-end instance for the duration of that user's session with the application or website. This ensures that the EC2 instance that originally handled a session keeps serving its requests. By default, sticky sessions are not enabled and ELB routes each request independently to the instance with the smallest load. You can enable sticky sessions from the AWS CLI or the AWS Console. There are two types of sticky sessions:

Duration-based session stickiness: the load balancer itself generates a session cookie called AWSELB. If the cookie is not found in the client request, ELB will pick an instance based on the existing load balancing algorithm. If the cookie is found, the traffic will be routed to the instance specified in the cookie. Cookie expiration is also configured in the stickiness policy.
Application-controlled session stickiness: the load balancer uses a cookie to associate the session with the original server that handled the request, but the timeout is specified by the application. You have to tell ELB the name of this application cookie.
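
The duration-based flow can be sketched in a few lines. Keep in mind that the real AWSELB cookie is opaque; the "instance|expiry" cookie format, the route function and the 300-second lifetime below are assumptions made purely for illustration.

```python
def route(cookies, load_by_instance, now, ttl=300):
    """Return (instance_id, cookie_value) for one request.

    cookies: dict of cookies sent by the client.
    load_by_instance: dict mapping instance id -> current load.
    now: current time in seconds; ttl: hypothetical cookie lifetime in seconds.
    """
    sticky = cookies.get("AWSELB")
    if sticky:
        instance, _, expires = sticky.partition("|")
        try:
            still_valid = float(expires) > now
        except ValueError:
            still_valid = False
        # Honor the cookie only if it has not expired and the instance still exists.
        if still_valid and instance in load_by_instance:
            return instance, sticky
    # No usable cookie: fall back to the least-loaded instance and issue a new cookie.
    instance = min(load_by_instance, key=load_by_instance.get)
    return instance, f"{instance}|{now + ttl}"


if __name__ == "__main__":
    load = {"i-aaaa": 5, "i-bbbb": 2}
    instance, cookie = route({}, load, now=100)
    print(instance, cookie)
```

Application-controlled stickiness would follow the same lookup, except that the lifetime comes from the application's own cookie rather than from a value stored in the stickiness policy.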

Monitoring

AWS ELBs can be monitored in AWS CloudWatch. Metrics are sent from ELB at 60-second intervals, and only when requests are flowing through the load balancer. ELB is one of the few AWS services where you do not get charged extra for detailed monitoring in CloudWatch. You will see those metrics in the AWS/ELB namespace inside of CloudWatch.

Some of the more interesting metrics that ELB sends to CloudWatch are:

RequestCount
BackendConnectionErrors
Number of HTTPCode_Backend_{2,3,4,5}XX response codes
Number of HTTPCode_ELB_{4,5}XX error codes
Latency (time from when the ELB sent a request to the instance until the instance started sending response headers)
SurgeQueueLength
SpilloverCount
Healthy and unhealthy host counts

It is recommended to pay special attention to the SurgeQueueLength and SpilloverCount metrics. SurgeQueueLength is the total number of requests that are pending submission to a registered instance. Likewise, SpilloverCount is the number of requests rejected because the surge queue was full. These two metrics can show you whether your backend instances are overloaded or whether you're reaching max connections in your applications. Remember that AWS will scale the load balancer itself based on traffic, but your instances will also need to scale using Auto Scaling Groups or some other custom mechanism within your application.

This Amazon Premium Support article has some good tips on lowering your SurgeQueueLength and SpilloverCount metrics in your environment. Some of the tips include:
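
Here is a toy model of how those two metrics relate to each other. It assumes a fixed backend capacity per time step and a surge queue limit of 1,024 requests (the figure I recall for Classic ELB; treat it as an assumption). The simulate_surge_queue function is hypothetical and only meant to show the relationship.

```python
def simulate_surge_queue(arrivals, capacity_per_tick, max_queue=1024):
    """Return (queue length after each tick, total rejected requests)."""
    queue = 0
    spillover = 0
    lengths = []
    for arriving in arrivals:
        # Backends take what they can from the queued plus newly arrived requests.
        pending = queue + arriving
        pending -= min(pending, capacity_per_tick)
        # Anything that does not fit into the surge queue is rejected.
        rejected = max(0, pending - max_queue)
        spillover += rejected
        queue = pending - rejected
        lengths.append(queue)
    return lengths, spillover


if __name__ == "__main__":
    # Small numbers so the effect is easy to follow by hand.
    print(simulate_surge_queue([10, 10, 10], capacity_per_tick=4, max_queue=8))
```

In this model the queue only drains when capacity exceeds arrivals, which is why adding or resizing backend instances is the usual fix for both metrics.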

use of Auto Scaling Groups within EC2
monitoring your EC2 instances to see where the bottleneck is (CPU vs. memory, database contention, application-specific metrics)
increasing the number of child processes on your instances' operating system (tweaking /etc/security/limits.conf on RHEL/CentOS can help you there)
lastly, when all else fails, resizing instance types (m4.large to m4.10xlarge)

Special Considerations

If you delete your load balancer, your instances will keep running.
Each ELB needs a Security Group. It is recommended to allow outbound traffic from the ELB's Security Group to both the health check port and the instance listener port. Also pay special attention to any Network ACLs if you happen to use them in your VPC.
ELB Access Logs can be used (if enabled) to capture detailed request information from the clients. Information such as time of request, client's IP address, latency, request paths and server responses can be found in Access Logs. This log information is forwarded to Amazon S3 at set intervals of either 5 or 60 minutes.
ELBs can be prewarmed by AWS on request if you expect your traffic to surge or prior to a load test. AWS will need to know the start/end time of the flash traffic, the expected request rate per second, and the total size of requests and responses.
Please read through Best Practices in Evaluating Elastic Load Balancing
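
Once Access Logs land in S3, you will probably want to parse them. The sketch below assumes the space-delimited Classic ELB entry format with quoted request and user agent fields; the field names and their order are written from memory, and the sample line is made up, so check both against the AWS documentation before relying on them.

```python
import shlex

# Assumed field order for a Classic ELB access log entry.
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]
TIMINGS = ("request_processing_time", "backend_processing_time",
           "response_processing_time")


def parse_access_log_line(line):
    """Split one log entry into a dict, honoring the double-quoted fields."""
    entry = dict(zip(FIELDS, shlex.split(line)))
    for name in TIMINGS:
        entry[name] = float(entry[name])
    return entry


if __name__ == "__main__":
    # Illustrative entry, not taken from a real load balancer.
    sample = ('2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
              '10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 '
              '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -')
    print(parse_access_log_line(sample)["backend_processing_time"])
```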
