rate limiting

A look at rate limiting

Written by: Marcel Koert B.S.E.E. | Posted on: | Category:

Rate Limiting & Throttling


police man rate limiting speed
Rate Limiting:

Controlling the incoming and outgoing messages from your application so that they do not cross over a predetermined limit.


Controlling the incoming and outgoing messages on bases of resource measurements. These can be CPU/Disk IO/Network IO. But also, Threads used or functional measurements.


Rate limiting & Throttling are used to protect you application from none agreed high loads. But you not only protect your application. You also protect your back ends because they get the load you send to them. The load to the back ends is not always a 1 to 1. The load can be a 1:N relation so for every call you get in you do multiple calls to your back ends. This can sometimes give strange behaviours where your applications is able to handle it but your back ends are not. So with rate limiting & Throttling your protect not just your application but also the Chain in which your applications is located.


Token bucket

When you know the maximum amount of a resource (TPS/Bytes received/etc..) you have per second you can create a bucket and every second you add the amount of resources (tokens) you want to spend in that bucket. Now when a request comes in you determine how many resources (tokens) this will use and you take that out of the bucket. If the bucket is empty you have some chooses.

• You can Drop the request. • Or queue it for later processing when you have space in your bucket again.

leaky bucket rate limiting speed

Leaky bucket

This is almost the same as the Token bucket with a few major changes. • The bucket has a Fixed capacity  associated with each virtual connection or user. • It leaks at a standard rate. • When the bucket is empty leaking stops. • When the bucket is full and a token is added the bucket will stay on the same amount.

Fixed window counting

A very easy way of implementing rate limiting. For every window of time you have a certain limit of resources if in that time frame you break the limit you stop processing the requests. So, for instance you agree on 100 transaction a minute and then when the one hundred and first transaction comes in within the same minute you stop processing the requests. When that minute is over you start processing again. And then again you have the same options for the requests that you do not preform: • You can Drop the request. • Or queue it for later processing when you have space in your bucket again. The Fixed window has a problem. If your request comes in latter stages of the fixed window it will have a higher likely hood of being not processed. Because the bucket of that windows can be already filled up. So this can be solved by doing a sliding window.

Sliding window log

leaky bucket rate limiting speed

The sliding window log is a more memory hungry way of doing rate limiting. What you do is you say in 1 second I can do 100 transactions. When you execute a transaction, you log the time you did this transaction in a list that you sort on time. Then you count how many transactions you can do on a sliding window. So, every few milliseconds. Don’t take it to small otherwise you end up only working on this and not what you wrote you API for. Now you will have to keep track of the times that you did the transactions and transactions that you did but now fall outside the sliding window need to be removed. This orchestration will take cpu time and memory.

Sliding window count

This option has a little of everything but it is a little harder to understand. The best way to explain it is this. Let’s take the rate limiting to be 120 transactions per minute. Now we created a variable for every second so over 1 minute you will have 60 values. Now when you do a transaction you do a +1 on the value of the second in where you did the transaction. Now you can easily check how many transactions you did in the last 60 seconds and see if you hit your rate limiting. If a value is older than 61 seconds it can be removed and you create new variables for the next second. The common wisdom is that the variables should span 1/60th of rate limiting time. So if your rate limiting time is a hour then the variable should span 1 minute.


I hope this helps you in deciding what to do about rate limiting. Rate limiting is important to protect your own environments. It is not something that you should think about after the fact it is something that needs to be thought about before open up your environment for users to use.

© 2019 Marcel Koert for MeloMar IT BV