Load Testing our Heroku app
I’ve said many times before how much I love Heroku. Though they’ve had an unusually large amount of downtime this week (~1 hour), I’m still loving their service. One of the applications that we’re building is expecting a large wave of traffic shortly following a blitz PR campaign. The golden question of the hour is: How much traffic can we handle?
Currently we’re on the Koi 1 plan, which gives us a 20GB shared database and 1 dyno. A “dyno” is basically one concurrent request handler. It’s very similar to a supermarket: you get in line and you’re served in the order in which you arrived. If some jackass has a cartload of Natural Light & Cheez Whiz in front of you, you’re gonna have to wait a while. Adding another dyno is like opening a second lane.
The key to good performance on Heroku is to watch the queue depth. If too many requests stack up in the queue at once, your app will stop serving requests past that mark and you’ll start getting a Backlog Too Deep error. I’m not sure what the exact limit is, but I’m guessing it’s around 100 (which is HUGE).
For more information on Heroku performance, see their excellent docs.
To perform my tests I’m using 2 tools: Apache Bench & New Relic RPM. Both are stellar.
Test 1: Serving up the landing page (1 Dyno, 5000 requests, 20 at a time)
This is a simple GET request that the Heroku routing mesh likely caches heavily. There are no database queries on the homepage and no logic.
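For reference, here’s roughly what the Apache Bench invocation looks like for this test (and for the read test that follows). The hostname is a placeholder, not our actual app; -n sets the total number of requests and -c the concurrency:

ab -n 5000 -c 20 http://myapp.heroku.com/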
Requests: 5000
Concurrency: 20
# of Requests/Sec: 85
This is pretty darn good. 50% of the requests are served in under 300ms and the max request took less than 1 second. No requests failed.
Test 2: A simple API read operation (1 Dyno, 5000 requests, 20 at a time)
Concurrency Level: 20
Time taken for tests: 68.067 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Total transferred: 4276315 bytes
HTML transferred: 2665000 bytes
Requests per second: 73.46 [#/sec] (mean)
Time per request: 272.267 [ms] (mean)
Time per request: 13.613 [ms] (mean, across all concurrent requests)
Transfer rate: 61.35 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       68   75    4.2     74     97
Processing:    85  197   73.9    180    607
Waiting:       85  197   73.9    180    607
Total:        157  272   73.8    255    685
Percentage of the requests served within a certain time (ms)
50% 255
66% 291
75% 313
80% 325
90% 365
95% 407
98% 477
99% 535
100% 685 (longest request)
Again, great results. This is due to caching of course, but at least we know we can handle lots of requests on this API.
Test 3: A simple API write operation (1 Dyno, 5000 requests, 20 at a time)
This API operation inserts 2 records into the database. Again, 5000 requests, 20 at a time.
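The write tests use ab’s POST support. Here’s a sketch of the invocation; the endpoint and payload file are placeholders, and -p points at a file containing the request body while -T sets its Content-Type:

ab -n 5000 -c 20 -p post_data.json -T 'application/json' http://myapp.heroku.com/api/records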
Concurrency Level: 20
Time taken for tests: 320.981 seconds
Complete requests: 5000
Failed requests: 4165
(Connect: 0, Receive: 0, Length: 4165, Exceptions: 0)
Write errors: 0
Total transferred: 4022365 bytes
Total POSTed: 1935387
HTML transferred: 2651618 bytes
Requests per second: 15.58 [#/sec] (mean)
Time per request: 1283.923 [ms] (mean)
Time per request: 64.196 [ms] (mean, across all concurrent requests)
Transfer rate: 12.24 [Kbytes/sec] received
               5.89 kb/s sent
               18.13 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       68   71    3.9     70    104
Processing:   134 1210  865.8    975   5559
Waiting:      134 1210  865.8    975   5558
Total:        202 1281  865.7   1048   5627
Percentage of the requests served within a certain time (ms)
50% 1048
66% 1364
75% 1638
80% 1819
90% 2430
95% 3176
98% 3898
99% 4248
100% 5627 (longest request)
You can see here that 4,165 of the 5,000 requests failed. That’s not good! Let’s try adding a dyno and running it again.
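Adding a dyno is a one-liner with the Heroku command-line client. Depending on which version of the client you’re running, it’s one of these:

heroku ps:scale web=2
heroku dynos 2

The first form is the newer ps:scale syntax; the second is the older heroku gem equivalent.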
Test 4: A simple API write operation (2 Dynos, 5000 requests, 20 at a time)
Concurrency Level: 20
Time taken for tests: 135.882 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Total transferred: 4045448 bytes
Total POSTed: 1935774
HTML transferred: 2675000 bytes
Requests per second: 36.80 [#/sec] (mean)
Time per request: 543.530 [ms] (mean)
Time per request: 27.176 [ms] (mean, across all concurrent requests)
Transfer rate: 29.07 [Kbytes/sec] received
               13.91 kb/s sent
               42.99 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       67   95   32.7     72    979
Processing:   121  448  162.4    420   1481
Waiting:      121  448  162.5    419   1481
Total:        199  543  154.4    515   1549
Percentage of the requests served within a certain time (ms)
50% 515
66% 580
75% 624
80% 654
90% 751
95% 840
98% 946
99% 1034
100% 1549 (longest request)
We’re getting a pretty healthy 36 requests per second now, and look, no errors!
Test 5: A simple API write operation (2 Dynos, 10000 requests, 20 at a time)
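Same invocation as the previous write test, with the request count doubled (placeholders as before):

ab -n 10000 -c 20 -p post_data.json -T 'application/json' http://myapp.heroku.com/api/records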
Concurrency Level: 20
Time taken for tests: 326.317 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 8091062 bytes
Total POSTed: 3870000
HTML transferred: 5350000 bytes
Requests per second: 30.65 [#/sec] (mean)
Time per request: 652.634 [ms] (mean)
Time per request: 32.632 [ms] (mean, across all concurrent requests)
Transfer rate: 24.21 [Kbytes/sec] received
               11.58 kb/s sent
               35.80 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       68   72   13.8     70   1004
Processing:   113  580  233.1    539   2145
Waiting:      113  580  233.1    539   2145
Total:        183  652  233.5    610   2213
Percentage of the requests served within a certain time (ms)
50% 610
66% 701
75% 771
80% 822
90% 971
95% 1103
98% 1254
99% 1374
100% 2213 (longest request)
We slipped a tad on overall throughput (down to 30 reqs/sec), but we still served up all of the requests in a reasonable time without any failures. Let’s check out New Relic & see what it says.
In this graph you can see the overall HTTP throughput (higher is better) plotted against the Heroku backlog depth. The more dynos you have, the quicker your backlog is cleared out, which translates to more throughput. You can clearly see that we’re now hitting a limit on the throughput we can handle, because the queue depth keeps increasing. We might choose to optimize the site or add another dyno to squeeze some more perf out of it.
If you moused over the queue stacks on the live graph, you’d see that my peak queue depth was 29. That’s pretty high, and you can definitely see a cap on the total number of requests we can serve.
In the end, I can pretty reliably say that we can handle a 10k-request spike running on 2 dynos. I’d suggest more dynos if the client wanted to handle more than that. Gotta love tools that make investigations like this easy. Props to Heroku & New Relic.