Weighted Round Robin (WRR) is an intelligent load balancing algorithm designed to address the limitations of the basic Round Robin algorithm. While Round Robin treats all servers equally, WRR acknowledges that servers may have varying capacities or processing power. By assigning weights to each server, WRR ensures a fairer distribution of requests proportional to each server’s capabilities.
How Weighted Round Robin Works
WRR builds upon the Round Robin concept but adds a crucial element – weights. Here’s how it works:
- Server Weights: Each server in the pool is assigned a numerical weight representing its relative capacity or performance. For instance, a powerful server might have a weight of 3, while a less powerful one might have a weight of 1.
- Virtual Queues: The load balancer conceptually maintains virtual queues (slots) for each server, in proportion to its weight. For example, in a pool of three servers, server A (weight 3) would have three queues, server B (weight 2) would have two, and server C (weight 1) would have one.
- Request Distribution: The load balancer assigns each incoming request to the next unused queue, starting with the server that has the highest weight. Once all of that server's queues have been used in the current cycle, it moves on to the server with the next highest weight. Practical implementations often interleave servers rather than sending consecutive requests to one server, but the per-cycle totals are the same.
- Cyclical Pattern: This process continues in a cyclical manner, ensuring that each server receives a number of requests proportional to its weight.
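The steps above can be sketched as a slot list in which each server appears once per unit of weight. This is a minimal illustration rather than a production balancer; the server names A, B and C and the helper `build_schedule` are assumptions made for the example.

```python
from itertools import cycle


def build_schedule(weights):
    """Return one cycle of server names, highest weight first.

    Each server appears `weight` times, mirroring the virtual queues
    described above. `weights` maps a server name to a positive integer.
    """
    ordered = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    schedule = []
    for name, weight in ordered:
        schedule.extend([name] * weight)
    return schedule


# Hypothetical pool: A is the most capable server, C the least.
weights = {"A": 3, "B": 2, "C": 1}
schedule = build_schedule(weights)
picker = cycle(schedule)  # repeat the same pattern indefinitely

first_twelve = [next(picker) for _ in range(12)]
print(first_twelve)
# ['A', 'A', 'A', 'B', 'B', 'C', 'A', 'A', 'A', 'B', 'B', 'C']
```

Out of every six requests, A receives three, B two and C one. The cost of this simple ordering is that A gets its share in a burst, which is why many implementations interleave the servers instead.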
Benefits of Weighted Round Robin
- Fairness and Efficiency: WRR overcomes the main limitation of Round Robin by accounting for server capacities. This results in a fairer distribution of workload, where more powerful servers handle a larger share of requests, maximizing resource utilization.
- Flexibility: Weights can be easily adjusted to reflect changes in server capacity or to prioritize certain servers based on specific needs.
- Simplicity: Despite the added complexity of weights, WRR is still relatively straightforward to implement and understand.
Considerations and Best Practices
- Weight Assignment: Carefully consider the relative capacities of your servers when assigning weights. You can use metrics like CPU cores, RAM, or benchmark results to guide your decision.
- Monitoring and Adjustment: Regularly monitor server performance and adjust weights as needed to maintain optimal load distribution.
- Session Persistence: If your application requires session persistence, combine WRR with techniques like IP hash or cookie-based persistence to ensure that requests from the same client are directed to the same server.
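As one possible way to turn a capacity metric into weights, the sketch below derives integer weights from CPU core counts and reduces them by their greatest common divisor. The function name `weights_from_cores` and the inventory values are hypothetical; real deployments would likely blend several metrics or benchmark results.

```python
from functools import reduce
from math import gcd


def weights_from_cores(cores_by_server):
    """Derive integer WRR weights from CPU core counts.

    Dividing by the greatest common divisor keeps the ratios intact
    while making one full cycle as short as possible.
    """
    if not cores_by_server or any(c <= 0 for c in cores_by_server.values()):
        raise ValueError("every server needs a positive core count")
    divisor = reduce(gcd, cores_by_server.values())
    return {name: cores // divisor for name, cores in cores_by_server.items()}


# Hypothetical inventory: the core counts are made up for illustration.
inventory = {"web-1": 16, "web-2": 8, "web-3": 4}
weights = weights_from_cores(inventory)
print(weights)  # {'web-1': 4, 'web-2': 2, 'web-3': 1}
```

Keeping the weights small matters mainly for readability and for slot-based implementations, where the cycle length equals the sum of the weights.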
Implementation
Python
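from functools import reduce
from math import gcd


class WeightedRoundRobin:
    """Interleaved weighted round robin over a {server_name: weight} mapping."""

    def __init__(self, servers):
        if not servers or any(w <= 0 for w in servers.values()):
            raise ValueError("servers must be non-empty with positive integer weights")
        self.servers = servers
        self.names = list(servers)  # fixed ordering, built once
        self.gcd = reduce(gcd, servers.values())
        self.max_weight = max(servers.values())
        self.current_weight = 0
        self.index = -1  # start before the first server so the first call checks index 0

    def get_server(self):
        while True:
            self.index = (self.index + 1) % len(self.names)
            if self.index == 0:
                # Start of a new pass: lower the threshold by one gcd step,
                # wrapping back to the maximum weight once it reaches zero.
                self.current_weight -= self.gcd
                if self.current_weight <= 0:
                    self.current_weight = self.max_weight
            server = self.names[self.index]
            # Only servers whose weight reaches the current threshold are eligible.
            if self.servers[server] >= self.current_weight:
                return server


# Example usage
servers = {'server1': 5, 'server2': 3, 'server3': 1}
wrr = WeightedRoundRobin(servers)
for _ in range(20):
    server = wrr.get_server()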
    print(f"Request sent to {server}")
Real-World Examples
- Heterogeneous Server Environments: In a web server cluster with servers of varying specifications (e.g., some with more powerful CPUs or RAM), WRR ensures that more capable servers handle a larger portion of the traffic, preventing bottlenecks.
- Prioritizing Certain Services: In an application with multiple services, WRR can be used to give priority to critical services by assigning them higher weights. For instance, a payment processing service might have a higher weight than a product recommendation service.
- Cloud Environments: Cloud-based load balancers often offer WRR as a standard feature, allowing users to easily distribute traffic across virtual machines or containers with varying resources.
Hypothetical Real-World Interview Examples
Example 1: Load Balancing for a Database Cluster
- Interviewer: “Design a load balancing system for a database cluster where some database servers have higher storage capacity and processing power than others.”
- Candidate: “I’d use the Weighted Round Robin algorithm so that the more powerful servers handle a larger share of the database queries, which keeps the weaker servers from becoming bottlenecks. The weights would be assigned based on factors like storage capacity, CPU cores, and memory. I’d also implement health checks to ensure that only active and healthy servers receive queries.”
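The candidate's answer combines weighting with health checks. A minimal sketch of that combination might look like the following; the class `HealthAwareWRR`, its `mark` method and the `db-*` names are illustrative assumptions, and the health check itself (ping, query probe, and so on) is left out.

```python
class HealthAwareWRR:
    """Weighted round robin that skips servers marked unhealthy."""

    def __init__(self, weights):
        # One slot per unit of weight, e.g. {"db-1": 2, "db-2": 1} -> db-1, db-1, db-2
        self.slots = [name for name, w in weights.items() for _ in range(w)]
        self.healthy = set(weights)
        self.position = -1

    def mark(self, name, is_healthy):
        """Record the result of a health check (the check itself is out of scope)."""
        if is_healthy:
            self.healthy.add(name)
        else:
            self.healthy.discard(name)

    def get_server(self):
        # Try each slot at most once so an all-down pool cannot loop forever.
        for _ in range(len(self.slots)):
            self.position = (self.position + 1) % len(self.slots)
            candidate = self.slots[self.position]
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


# Hypothetical database pool.
pool = HealthAwareWRR({"db-1": 3, "db-2": 2, "db-3": 1})
pool.mark("db-2", False)  # pretend a health check just failed
picks = [pool.get_server() for _ in range(8)]
print(picks)
```

With `db-2` marked down, its slots are skipped and the remaining servers keep their 3:1 ratio relative to each other.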
 
Example 2: Load Balancing for a Content Delivery Network (CDN)
- Interviewer: “How would you design a load balancing strategy for a CDN with servers distributed across multiple geographic regions, some with higher bandwidth capacity than others?”
- Candidate: “I’d combine Weighted Round Robin with Global Server Load Balancing (GSLB). GSLB would direct user requests to the nearest region based on their geographic location. Within each region, I’d use Weighted Round Robin to distribute traffic among the CDN servers, giving higher weights to servers with greater bandwidth capacity. This would ensure optimal content delivery speed and reduce latency for users.”
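The two-stage design in this answer can be sketched roughly as follows. The region lookup here is only a stand-in for GSLB, which in practice is usually DNS- or anycast-based; the region codes, server names, weights and the `route` function are all hypothetical.

```python
from itertools import cycle

# Hypothetical regions, servers and bandwidth-based weights.
REGION_POOLS = {
    "eu": {"eu-edge-1": 3, "eu-edge-2": 1},
    "us": {"us-edge-1": 2, "us-edge-2": 2},
}


def make_picker(weights):
    """Cycle through a slot list with one entry per unit of weight."""
    slots = [name for name, w in weights.items() for _ in range(w)]
    return cycle(slots)


# One independent WRR rotation per region.
pickers = {region: make_picker(pool) for region, pool in REGION_POOLS.items()}


def route(client_region, default_region="us"):
    """Stage 1 (stand-in for GSLB): choose the region. Stage 2: WRR inside it."""
    region = client_region if client_region in pickers else default_region
    return next(pickers[region])


eu_picks = [route("eu") for _ in range(4)]
print(eu_picks)     # ['eu-edge-1', 'eu-edge-1', 'eu-edge-1', 'eu-edge-2']
print(route("ap"))  # unknown region falls back to the default: us-edge-1
```

Keeping a separate rotation per region means a burst of traffic in one region does not disturb the weighted distribution in another.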