TL;DR (Too long; didn't read)
Any production service must be protected from overload, especially if you cannot control your clients. Overload can often be avoided by structuring systems to scale proactively before they run out of headroom, but scaling alone does not protect every server from heavy resource utilization, which can lead to crashes. So we need multiple layers of defense: automatic scaling as well as techniques for gracefully shedding excess load.
You might have heard about Rate Limiting, and some of you may have used it too. Today I learned a new concept: Load Shedding. You might already be familiar with it, but for me it was new and interesting.
So, you might ask: what is Load Shedding, and why do we need it when we already have Rate Limiting?
Let's learn about both concepts in detail; if you are already familiar with them, feel free to skip ahead.
Rate Limiting:
Pros:
Rate Limiting protects your servers from some common problems, such as the Noisy Neighbour problem or attackers trying to bombard your API. It helps ensure that our servers keep working without failing under overload.
Rate limiting can be applied per API endpoint (restricting the number of hits to a particular endpoint), per IP of the calling server/system, or per some unique ID provided in the request headers.
It is a must-have if your APIs are exposed as open (public) APIs, since anyone can try to bombard the system with a script.
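To make the idea concrete, here is a minimal sketch of a per-client token bucket in plain Java. The class name, the capacity of 10 tokens, and the refill rate of 5 tokens per second are illustrative assumptions, not a reference to any particular library; in practice you would more likely rely on a battle-tested rate limiting library or the limits built into your load balancer or API gateway.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal per-client token bucket. The key could be an API key, the caller's IP,
// or a unique ID taken from a request header, as described above.
public class SimpleRateLimiter {

    private static final int CAPACITY = 10;               // illustrative burst size
    private static final double REFILL_PER_SECOND = 5.0;  // illustrative sustained rate

    private static final class Bucket {
        double tokens = CAPACITY;
        long lastRefillNanos = System.nanoTime();
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    /** Returns true if the request for clientKey is allowed, false if it should be rejected (HTTP 429). */
    public boolean tryConsume(String clientKey) {
        Bucket bucket = buckets.computeIfAbsent(clientKey, k -> new Bucket());
        synchronized (bucket) {
            long now = System.nanoTime();
            double elapsedSeconds = (now - bucket.lastRefillNanos) / 1_000_000_000.0;
            // Refill tokens proportionally to the time elapsed, capped at the bucket capacity.
            bucket.tokens = Math.min(CAPACITY, bucket.tokens + elapsedSeconds * REFILL_PER_SECOND);
            bucket.lastRefillNanos = now;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                return true;
            }
            return false;
        }
    }
}
```

A servlet filter or interceptor could call tryConsume with the client's key and respond with HTTP 429 (Too Many Requests) whenever it returns false.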
Cons:
There are no cons as such, but there are limitations. Every company tries to implement dynamic scaling to optimize cost and keep the system always available.
Since Rate Limiting is generally applied at the load balancer, some servers behind it may still end up using more than 70-80% of their resources, depending on how the load balancer is configured and distributes traffic.
Once the upper limit of scaling is reached, additional traffic cannot be served even though it is within the defined rate limit. This can lead to server crashes, and bringing a crashed server back online may take a significant amount of time.
This is where Load Shedding comes into the picture.
Load Shedding:
Pros:
Load Shedding works at the server level. The main struggle is determining how many connections the server should allow clients to hold open at the same time; this limit is designed to prevent a server from taking on too much work and becoming overloaded.
It is used by many big tech companies, and different companies take different approaches to adaptive load shedding:
Google claims it mostly uses CPU to determine the cost of a request:
In platforms with garbage collection, memory pressure naturally translates into increased CPU consumption.
In other platforms, it is possible to maintain the remaining resources in such a way that they are very unlikely to run out before the CPU runs out.
Facebook uses concurrent (in-flight) requests to determine the cost of a request (a sketch of this approach appears after this list).
Amazon (AWS Lambda) predicts the resources needed per call, combined with resource isolation.
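As a minimal sketch of the concurrent-request approach mentioned above, the servlet filter below rejects excess requests with a 503 once a fixed number of requests are already in flight. The filter name and the limit of 100 concurrent requests are assumptions for illustration, and it assumes the Jakarta Servlet API (Spring Boot 3.x); older stacks would use the javax.servlet packages instead. A production implementation would tune or adapt this limit rather than hard-code it.

```java
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.concurrent.Semaphore;

// Sheds load by capping the number of requests processed concurrently.
public class LoadSheddingFilter implements Filter {

    // Hypothetical limit on in-flight requests; tune to your server's capacity.
    private static final int MAX_INFLIGHT_REQUESTS = 100;

    private final Semaphore permits = new Semaphore(MAX_INFLIGHT_REQUESTS);

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // tryAcquire() never blocks: if no permit is free, shed the request immediately.
        if (!permits.tryAcquire()) {
            HttpServletResponse httpResponse = (HttpServletResponse) response;
            httpResponse.setStatus(503);                 // Service Unavailable: retryable
            httpResponse.setHeader("Retry-After", "1");  // hint: retry after ~1 second
            return;
        }
        try {
            chain.doFilter(request, response);
        } finally {
            permits.release(); // always free the slot, even if the handler throws
        }
    }
}
```

Because tryAcquire() never blocks, rejecting excess requests costs almost nothing, which is exactly what keeps the server responsive while it is saturated.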
As you can guess, the server returns a 503 (Service Unavailable) status once your shedding criteria are met, which is a retryable error, and the load balancer will mark that server node as unhealthy based on its health check (performed at a configured interval).
This results in a DoS (Denial of Service) that is temporary and self-repairing, because as soon as resource consumption returns to normal, the server is marked healthy again on the load balancer.
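To tie shedding into the load balancer's health check, a sketch like the following could report the node as unhealthy while it is overloaded and healthy again once it recovers. It assumes Spring Boot Actuator is on the classpath; the 0.8 load-per-core threshold and the use of the JVM's system load average are illustrative choices, not a prescribed signal.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Reports DOWN while system load per core exceeds a (hypothetical) threshold,
// so the load balancer's health check takes this node out of rotation until it recovers.
@Component
public class LoadAwareHealthIndicator implements HealthIndicator {

    private static final double MAX_LOAD_PER_CORE = 0.8; // illustrative shedding threshold

    private final OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

    @Override
    public Health health() {
        double loadAverage = os.getSystemLoadAverage();
        if (loadAverage < 0) {
            // Load average is not available on this platform; don't shed based on it.
            return Health.up().withDetail("loadPerCore", "unavailable").build();
        }
        double loadPerCore = loadAverage / os.getAvailableProcessors();
        if (loadPerCore > MAX_LOAD_PER_CORE) {
            return Health.down().withDetail("loadPerCore", loadPerCore).build();
        }
        return Health.up().withDetail("loadPerCore", loadPerCore).build();
    }
}
```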
Cons:
This may not be a permanent solution if such a DoS is observed very frequently. In that case we need to improve system performance further, for example by monitoring thread and heap dumps.
This may also increase system latency, which is still a better trade-off than complete failure.
Load Shedding is just a preventative defense that protects our servers from complete failure, which could otherwise require a significant amount of time to bring them back online.
There are real-world examples where Load Shedding is crucial. Here are some articles you can go through:
I will also try to provide an implemented solution for Load Shedding in Spring Boot soon.
Noisy Neighbour
Noisy neighbour is a term often applied to general architecture patterns and strategies. The idea is that one user of a system can place enough load on the system's resources to adversely affect other users, so that one user ends up degrading another user's experience. The concept is especially relevant in multi-tenant environments where tenants consume shared resources, and workloads in such environments can be unpredictable. That combination heightens the need for SaaS architectures to employ their own strategies to manage and minimize the potential impact of noisy tenants.