A load balancer is a device that distributes network or application traffic across a cluster of servers.
Aim of Load Balancing
- Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource.
- Load balancing improves responsiveness and increases availability of applications.
How Do Load Balancers Work?
A load balancer sits between the client and the server farm, accepting incoming network and application traffic and distributing it across multiple backend servers using various methods.
By balancing application requests across multiple servers, a load balancer reduces the load on each individual server and prevents any one application server from becoming a single point of failure, thus improving overall application availability, responsiveness, and load time.
Categories of Load Balancer
Layer-4 Load Balancing
Layer 4 load balancing is widely used because it is a simple way to balance network traffic across multiple servers.
Layer 4 load balancers act on data found in network and transport layer protocols (IP, TCP, UDP), without inspecting the contents of each message.
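Because a Layer 4 balancer sees only network and transport information, its routing decision can use nothing more than addresses and ports. A minimal sketch of such a decision, using a stable hash of the client's IP and port (the backend addresses are hypothetical placeholders):

```python
import zlib

backends = ["10.0.0.1:80", "10.0.0.2:80"]  # hypothetical backend addresses

def pick_backend(client_ip: str, client_port: int) -> str:
    """Layer-4 decision: uses only network/transport info (IP and port),
    never the request payload. A stable hash keeps each client on the
    same backend across connections."""
    key = f"{client_ip}:{client_port}".encode()
    return backends[zlib.crc32(key) % len(backends)]
```

Note that the same client always lands on the same backend, which is why hash-based Layer 4 schemes are often used when session stickiness matters.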
Layer-7 Load Balancing
Layer 7 load balancers distribute requests based upon data found in application layer protocols such as HTTP.
Layer-7 load balancing is a more refined and sophisticated way of balancing network traffic than Layer 4.
In this mode, the load balancer inspects the content of the user's request (for example, the URL or HTTP headers) and routes the request to a web server based on that content.
This is advantageous because it lets you run multiple kinds of web servers behind the same domain and port.
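A minimal sketch of content-based (Layer-7) routing: requests are sent to one of two hypothetical backend pools depending on the URL path, with simple round-robin inside each pool. The pool names and addresses are assumptions for illustration only.

```python
from itertools import count

# Hypothetical backend pools; the addresses are placeholders.
API_POOL = ["10.0.0.1:8080", "10.0.0.2:8080"]
STATIC_POOL = ["10.0.1.1:8080", "10.0.1.2:8080"]

_rr = count()  # shared round-robin counter

def route(path: str) -> str:
    """Pick a pool based on the request path, then round-robin within it."""
    pool = API_POOL if path.startswith("/api/") else STATIC_POOL
    return pool[next(_rr) % len(pool)]
```

For example, a request for `/api/users` would go to an API server, while `/index.html` would go to a static-content server, even though both arrive on the same domain and port.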
Load Balancing Algorithms
- Round robin
- Weighted round robin
- A per-node weight setting determines how transactions are allocated to clustered nodes.
- New connections are forwarded in proportion to each node’s assigned weight.
- As a result, traffic is distributed more efficiently to the servers that you rank as being more capable of handling requests.
- Least connections
  - The server with the fewest active connections is selected.
  - Recommended when sessions are long-lived.
- Least response time
  - Selects the server with the fewest active connections and the lowest average response time.
  - The response time, also called Time to First Byte (TTFB), is the interval between sending a request packet to a server and receiving the first response packet back.
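The four algorithms above can be sketched as small selection functions. This is a simplified illustration, not a production implementation: the server names, weights, connection counts, and TTFB values are hypothetical, and weighted round robin is approximated here by weighted random choice rather than a deterministic rotation.

```python
import itertools
import random

servers = ["s1", "s2", "s3"]            # hypothetical backend names
weights = {"s1": 5, "s2": 3, "s3": 1}   # hypothetical capacity ranking
active = {"s1": 12, "s2": 4, "s3": 9}   # example active-connection counts
ttfb = {"s1": 0.08, "s2": 0.03, "s3": 0.12}  # example avg response times (s)

# Round robin: cycle through the servers in order.
_cycle = itertools.cycle(servers)
def round_robin():
    return next(_cycle)

# Weighted round robin (sketched as weighted random choice): servers with
# higher weights receive proportionally more new connections.
def weighted_choice():
    return random.choices(servers, weights=[weights[s] for s in servers])[0]

# Least connections: pick the server with the fewest active connections.
def least_connections():
    return min(servers, key=lambda s: active[s])

# Least response time: fewest active connections, then lowest average TTFB.
def least_response_time():
    return min(servers, key=lambda s: (active[s], ttfb[s]))
```

With the example numbers above, both `least_connections()` and `least_response_time()` would select `s2`, since it has the fewest open connections.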
Thanks for reading!
Post your comments below for any queries related to this post.