I’ve said many times before how much I love Heroku. Though they’ve had an unusually large amount of downtime this week (~1 hour), I’m still loving their service. One of the applications that we’re building is expecting a large wave of traffic shortly following a blitz PR campaign. The golden question of the hour is: How much traffic can we handle?
Currently we’re on the Koi 1 plan, which gives us a 20GB database (shared) and 1 dyno. A “dyno” basically handles 1 request at a time. It’s a lot like a supermarket checkout: you get in line and you’re served in the order in which you arrived. If some jackass in front of you has a cartload of Natural Light & Cheez Whiz, you’re gonna have to wait a while. Adding another dyno is like opening a 2nd lane.
The key to good performance on Heroku is to watch the queue depth. If too many requests pile up in the queue at once, your app will stop serving requests past that mark and you’ll get a Backlog Too Deep error. I’m not sure what the limit is, but I’m guessing it’s around 100 (which is HUGE).
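To make the checkout-lane picture concrete, here's a toy sketch of the idea. It is not how Heroku's router actually works; it just assumes every request is already waiting in line and each dyno serves one request at a time, first come, first served. The function name and the service times are made up for illustration.

```python
import heapq

def drain_time(service_times, dynos):
    """Time at which the last request finishes, assuming all requests are
    already queued at t=0 and each dyno handles one request at a time."""
    if dynos < 1:
        raise ValueError("need at least one dyno")
    # Each heap entry is the time at which a dyno next becomes free.
    free_at = [0.0] * dynos
    heapq.heapify(free_at)
    finish = 0.0
    for t in service_times:
        start = heapq.heappop(free_at)  # the next dyno to free up takes the request
        end = start + t
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

# One slow "cartload" request (2.0 s) in front of nine quick ones (0.1 s each).
requests = [2.0] + [0.1] * 9
one_lane = drain_time(requests, dynos=1)   # about 2.9 s: everyone waits behind the cart
two_lanes = drain_time(requests, dynos=2)  # 2.0 s: quick requests use the second lane
```

With a single lane the quick requests are stuck behind the slow one; with two, they finish in under a second while the slow request is still being rung up.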
For more information on Heroku performance, see their excellent docs.
To perform my tests I’m using 2 tools: Apache Bench & New Relic RPM. Both are stellar.
Test 1: Serving up the landing page (1 Dyno, 5000 requests, 20 at a time)
This is a simple GET request that the Heroku routing mesh likely caches heavily. There are no database queries on the homepage and no logic.
Requests: 5000
Concurrency: 20
Requests/sec: 85
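For reference, a run like this can be launched with Apache Bench's `-n` (total requests) and `-c` (requests in flight at once) flags. The small helper below is only a sketch for building that command line; the helper name and the URL are placeholders, not the app from this post.

```python
import shlex

def ab_command(url, requests=5000, concurrency=20):
    """Build an Apache Bench invocation.
    -n is the total number of requests, -c is the concurrency level."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]

# Placeholder URL; swap in your own app. ab wants a path, hence the trailing slash.
cmd = ab_command("http://example.com/")
print(shlex.join(cmd))
# prints: ab -n 5000 -c 20 http://example.com/

# To actually run it (requires ab to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```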
This is pretty darn good. 50% of the requests are served in under 300ms and the max request took less than 1 second. No requests failed.
Test 2: A simple API read operation (1 Dyno, 5000 requests, 20 at a time)
Again, great results. This is due to caching of course, but at least we know we can handle lots of requests on this API.
Test 3: A simple API write operation (1 Dyno, 5000 requests, 20 at a time)
This API operation inserts 2 records in the database. Again, 5000 requests, 20 at a time.
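Apache Bench can drive a write test too: `-p` points at a file holding the POST body and `-T` sets its content type. The sketch below shows one way to assemble such a run. The endpoint, the JSON payload, and the choice of JSON at all are assumptions for illustration; the post doesn't say what the real API accepts.

```python
import json
import tempfile

def ab_post_command(url, payload, requests=5000, concurrency=20):
    """Build an Apache Bench POST run. ab reads the request body from a
    file (-p) and sends it with the given content type (-T)."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(payload, f)
        body_path = f.name
    return ["ab", "-n", str(requests), "-c", str(concurrency),
            "-p", body_path, "-T", "application/json", url]

# Hypothetical endpoint and body; substitute whatever your API expects.
cmd = ab_post_command("http://example.com/api/items", {"name": "test"})
```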
You can see here that we failed 4165 requests. That’s not good! Let’s try adding a dyno and do it again.
Test 4: A simple API write operation (2 Dynos, 5000 requests, 20 at a time)
We’re getting a pretty healthy 36 requests per second now, and look, no errors!
Test 5: A simple API write operation (2 Dynos, 10000 requests, 20 at a time)
We slipped a tad on overall throughput (down to 30 reqs/sec) but we still served up all of the requests in a reasonable time without any failures.
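To put those throughput numbers in perspective, here's a quick back-of-the-envelope sketch. It assumes every request completed and that Apache Bench kept all 20 requests in flight the whole time, in which case the total run time is requests divided by requests/sec, and the mean time per request is concurrency divided by requests/sec. The function name is just for illustration.

```python
def run_stats(total_requests, concurrency, requests_per_sec):
    """Estimate total run time (s) and mean time per request (ms)
    from the throughput figure Apache Bench reports."""
    duration_s = total_requests / requests_per_sec
    mean_ms = concurrency / requests_per_sec * 1000
    return duration_s, mean_ms

# (total requests, concurrency, requests/sec) for the runs with no failures
runs = {
    "Test 1 (landing page, 1 dyno)": (5000, 20, 85),
    "Test 4 (API write, 2 dynos)": (5000, 20, 36),
    "Test 5 (API write, 2 dynos)": (10000, 20, 30),
}
for name, args in runs.items():
    duration_s, mean_ms = run_stats(*args)
    print(f"{name}: ~{duration_s:.0f} s total, ~{mean_ms:.0f} ms per request")
# Test 1 (landing page, 1 dyno): ~59 s total, ~235 ms per request
# Test 4 (API write, 2 dynos): ~139 s total, ~556 ms per request
# Test 5 (API write, 2 dynos): ~333 s total, ~667 ms per request
```

So the 10k run took somewhere around five and a half minutes, with each write taking roughly two thirds of a second from the client's point of view.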
Let’s check out New Relic and see what it says.
In this graph you can see the overall HTTP Throughput (higher is better) against the Heroku backlog depth. The more dynos you have, the quicker your backlog is cleared out, equating to more throughput. You can clearly see that we are now hitting a limit on the throughput we can handle because the queue depth is increasing. We might choose to optimize the site or add another dyno to squeeze some more perf out of this.
If you were to mouse over the queue stacks in the graph on the New Relic site, you’d also see that my peak queue depth was 29. That’s pretty high, and you can definitely see a cap in the total # of requests that we can serve.
In the end, I’m able to pretty reliably say that we can handle a 10k request spike when running on 2 dynos. I’d suggest more dynos if the client wanted to handle more than that.
Gotta love tools that make investigation like this easy. Props to Heroku & New Relic.