Some time ago we changed how Bootzooka handles HTTP requests.

My goal is to compare two great tools:

  1. scalatra 2.3.1 (latest stable), a complete web framework
  2. akka-http 2.4.2, which one of its project leaders calls a web toolkit

from a performance perspective. I find both of them very helpful, but they work in different ways, so the idea is to see how much that difference impacts the application and its users.

Whenever you buy a car, you look at its specification: how fast it is, how well it accelerates, how much petrol it burns, how much load it can carry. I’d like to know such metrics for the tools I use to build web applications. Authors don’t always publish them, and often Google doesn’t find any either. Published benchmarks would be one source of such metrics, but I found none for these two tools, so let’s make our own comparison.

The definition of app performance comes from the introduction to chapter 9 of the great Jez Humble’s Continuous Delivery book:

First of all, let’s clear up some confusion around the terms. We’ll use the same terminology as Michael Nygard. To paraphrase, performance is a measure of the time taken to process a single transaction, and can be measured either in isolation or under load. Throughput is the number of transactions a system can process in a given timespan. It is always limited by some bottleneck in the system. The maximum throughput a system can sustain, for a given workload, while maintaining an acceptable response time for each individual request, is its capacity. Customers are usually interested in throughput or capacity. In real life, “performance” is often used as a catch-all term; we will try to be rather more careful in this chapter.

So I’ll try to find the throughput and capacity of each version of bootzooka.
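To make the two terms concrete, here is a minimal sketch of how they can be computed from run summaries. The class and method names are made up for illustration, the three akka-http rows are copied from the gatling-reports tables further down in this post, and the 1500 ms limit is only an assumed notion of an "acceptable" response time. Using p95 and zero errors as the acceptability test is my simplification of the definition quoted above.

```java
public class ThroughputSketch {

    /** Throughput: transactions processed per second over a whole run. */
    static double throughput(int completedRequests, double durationSeconds) {
        return completedRequests / durationSeconds;
    }

    /**
     * Capacity: the highest throughput among the runs that stayed acceptable,
     * here taken to mean no failed requests and a p95 at or below the limit.
     * Each run is {successCount, errorCount, durationSeconds, p95Millis}.
     */
    static double capacity(double[][] runs, double maxAcceptableP95Millis) {
        double best = 0.0;
        for (double[] run : runs) {
            boolean acceptable = run[1] == 0 && run[3] <= maxAcceptableP95Millis;
            if (acceptable) {
                best = Math.max(best, throughput((int) run[0], run[2]));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // akka-http rows from the comparison tables in this post.
        double[][] akkaHttpRuns = {
            {400, 0, 1.98, 581},    // 200 users
            {800, 0, 3.41, 1390},   // 400 users
            {1564, 36, 8.69, 3095}  // 800 users
        };
        // 1500 ms is an assumed limit, not a value used in the tests.
        System.out.println("capacity (req/s): " + capacity(akkaHttpRuns, 1500));
    }
}
```

With a stricter limit the answer changes: the capacity is whatever the best run is that still meets the response-time requirement, so it only makes sense together with that requirement.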

Brief details of tested tools internals

The version of Scalatra that I’ve tested works in a synchronous, blocking way, while akka-http is “reactive” by default (asynchronous and non-blocking). The “reactive” trait is easily lost when the implementation blocks, so our own code also needs to process requests asynchronously.

Scalatra is thread-based: it handles requests as servlets and requires a servlet container such as jetty to run. According to its docs it can delegate work to akka, but I have never tried that. Akka-http is actor per request, doesn’t require a container, and writing actor-model code is the more natural pattern to follow with it. Both have comparably pleasant DSLs for writing routes.

Setup

Equipped with gatling.io (for generating fake traffic from user scenarios) and GCViewer (for watching memory consumption; this is influenced by Bartek’s post, to see if something spectacular is going on), I can measure what happens to my beloved framework (bootzooka) after switching web frameworks.

The test was performed on a MacBook Pro with a 2.5 GHz Intel Core i7 and 16 GB of RAM. I use the default JVM flags, which means Xss = 1 MB (thread stack size), InitialHeapSize = 256 MB and Xmx = 4 GB (note: heroku 1 dyno standard is 350 MB).

$ java -XX:+PrintFlagsFinal -version
    uintx InitialHeapSize                          := 268435456                           
    uintx MaxHeapSize                              := 4294967296                          
    uintx MaxNewSize                               := 1431306240                          
    uintx NewSize                                  := 89128960                            
    uintx OldSize                                  := 179306496                           
     intx ThreadStackSize                           = 1024                                
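The heap flags above are printed in bytes and ThreadStackSize in kilobytes, so a quick conversion helps when reading them. This is only a small sketch; the class name is made up, and the last line simply asks the running JVM what heap limit it sees.

```java
public class HeapFlagsSketch {

    static final long BYTES_PER_MIB = 1024L * 1024L;

    /** Converts a byte count, as printed by -XX:+PrintFlagsFinal, to mebibytes. */
    static long toMiB(long bytes) {
        return bytes / BYTES_PER_MIB;
    }

    public static void main(String[] args) {
        System.out.println("InitialHeapSize = " + toMiB(268435456L) + " MiB");  // 256
        System.out.println("MaxHeapSize     = " + toMiB(4294967296L) + " MiB"); // 4096
        // The limit the current JVM actually runs with can be checked at runtime too.
        System.out.println("maxMemory()     = " + toMiB(Runtime.getRuntime().maxMemory()) + " MiB");
    }
}
```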

I’m running a postgres 9.4 database locally with two separate schemas, one each for the scalatra and akka-http versions of bootzooka, so the database configuration is the same for both systems. The Gatling scenarios run on the same local machine where the bootzooka backends are deployed (the frontend is written in angular and its performance is fine). This configuration is a kind of “lab environment” where network latency and bandwidth don’t play any role: no data is transmitted over the wire, which is often the main issue in webapp performance.

Provisioning

The newbootzooka schema is for the akka-http version of bootzooka and the oldbootzooka schema is for the scalatra version.

export DATABASE_URL=postgres://newbootzooka:newbootzooka@localhost:5432/newbootzooka && \
java -Dserver.port=8081 -Xloggc:newbootzooka-$(date +"%Y-%m-%d_%H-%M-%S").log \
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc -jar new-bootzooka/bootzooka.jar

export DATABASE_URL=postgres://oldbootzooka:oldbootzooka@localhost:5432/oldbootzooka && \
java -Dembedded-jetty.port=8082 -Xloggc:oldbootzooka-$(date +"%Y-%m-%d_%H-%M-%S").log \
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -verbose:gc -jar old-bootzooka/bootzooka.jar

Scenario

It’s very basic, as bootzooka doesn’t do complicated stuff. In the first step the user goes to / and waits for an HTTP 200 response; in the second the user sends a POST request to /api/users/register to register and expects the string success in the response body.

Scenario setup 1

Throwing 200 users at each system at once, just to see what happens (BootzookaRegistrationScn).

Bootzooka akka-http version:

./gradlew -Dserver.port=8081 loadTest

First round results:

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                        400 (OK=387    KO=13    )
    > min response time                                     18 (OK=18     KO=111   )
    > max response time                                   3991 (OK=3991   KO=1227  )
    > mean response time                                  1364 (OK=1397   KO=403   )
    > std deviation                                       1226 (OK=1232   KO=331   )
    > response time 50th percentile                        957 (OK=1103   KO=333   )
    > response time 75th percentile                       2445 (OK=2446   KO=552   )
    > mean requests/sec                                 79.586 (OK=77     KO=2.587 )
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                           170 ( 43%)
    > 800 ms < t < 1200 ms                                  33 (  8%)
    > t > 1200 ms                                          184 ( 46%)
    > failed                                                13 (  3%)
    ---- Errors --------------------------------------------------------------------
    > java.net.ConnectException: Connection reset by peer                13 (100.0%)
    ================================================================================

You can see there were 400 requests, as each user makes 2 requests. 13 of them failed with connection resets, maybe because of JIT compilation, so let’s run the test a second time:

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                        400 (OK=400    KO=0     )
    > min response time                                      7 (OK=7      KO=-     )
    > max response time                                    679 (OK=679    KO=-     )
    > mean response time                                   193 (OK=193    KO=-     )
    > std deviation                                        215 (OK=215    KO=-     )
    > response time 50th percentile                         45 (OK=45     KO=-     )
    > response time 75th percentile                        404 (OK=404    KO=-     )
    > mean requests/sec                                171.969 (OK=171.969 KO=-     )
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                           400 (100%)
    > 800 ms < t < 1200 ms                                   0 (  0%)
    > t > 1200 ms                                            0 (  0%)
    > failed                                                 0 (  0%)
    ================================================================================

Looks good. How about doubling the number to 400 users at once? Third round:

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                        800 (OK=800    KO=0     )
    > min response time                                      7 (OK=7      KO=-     )
    > max response time                                   1443 (OK=1443   KO=-     )
    > mean response time                                   440 (OK=440    KO=-     )
    > std deviation                                        471 (OK=471    KO=-     )
    > response time 50th percentile                        200 (OK=200    KO=-     )
    > response time 75th percentile                        920 (OK=920    KO=-     )
    > mean requests/sec                                212.993 (OK=212.993 KO=-     )
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                           541 ( 68%)
    > 800 ms < t < 1200 ms                                 183 ( 23%)
    > t > 1200 ms                                           76 ( 10%)
    > failed                                                 0 (  0%)
    ================================================================================

No failed requests, mean req/sec ~ 212. Let’s double the users again, to 800, for the fourth round:

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                       1600 (OK=1565   KO=35    )
    > min response time                                      4 (OK=4      KO=159   )
    > max response time                                   4167 (OK=4167   KO=1625  )
    > mean response time                                  1016 (OK=1016   KO=991   )
    > std deviation                                       1032 (OK=1041   KO=428   )
    > response time 50th percentile                        896 (OK=895    KO=1285  )
    > response time 75th percentile                       1515 (OK=1519   KO=1288  )
    > mean requests/sec                                228.571 (OK=223.571 KO=5     )
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                           720 ( 45%)
    > 800 ms < t < 1200 ms                                 157 ( 10%)
    > t > 1200 ms                                          688 ( 43%)
    > failed                                                35 (  2%)
    ---- Errors --------------------------------------------------------------------
    > java.net.ConnectException: Connection reset by peer                35 (100.0%)
    ================================================================================

2% of requests failed. The app logs show nothing for the connection resets themselves (maybe we don’t handle that case yet), but I did find this:

java.sql.SQLTimeoutException: Timeout after 1015ms of waiting for a connection.

and more:

13:53:45.304 [main-akka.actor.default-dispatcher-82] ERROR c.s.b.Main$$anon$1 - Exception during client request processing: Task slick.backend.DatabaseComponent$DatabaseDef$$anon$2@71688eb2 rejected from java.util.concurrent.ThreadPoolExecutor@4131e49e[Running, pool size = 20, active threads = 20, queued tasks = 1000, completed tasks = 4067]
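The rejection in the log line above is the standard behaviour of a JDK ThreadPoolExecutor once all of its threads are busy and its bounded queue is full. Below is a minimal sketch of that mechanism using only java.util.concurrent. It is not Slick's code; the class and method names are made up, and the pool is scaled down from the "pool size = 20 ... queued tasks = 1000" seen in the log so the effect shows up with a handful of tasks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedExecutorSketch {

    /**
     * Submits `tasks` blocking jobs to a fixed-size pool with a bounded queue
     * and returns how many were rejected because threads and queue were full.
     */
    static int countRejected(int threads, int queueSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize)); // default handler throws on overflow
        CountDownLatch release = new CountDownLatch(1);
        // Each job stands in for a slow database call that keeps its thread busy.
        Runnable slowJob = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(slowJob);
            } catch (RejectedExecutionException e) {
                rejected++; // no free thread and no free queue slot
            }
        }
        release.countDown(); // let the accepted jobs finish
        pool.shutdown();
        return rejected;
    }

    public static void main(String[] args) {
        // 2 threads + 3 queue slots accept 5 jobs; the remaining 3 are rejected.
        System.out.println("rejected: " + countRejected(2, 3, 8));
    }
}
```

The point of the sketch is that a burst larger than threads plus queue slots is refused immediately rather than slowed down, which would explain failed requests even though the server itself keeps running.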

Let’s repeat, same setup, fifth round:

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                       1600 (OK=1600   KO=0     )
    > min response time                                      4 (OK=4      KO=-     )
    > max response time                                   1905 (OK=1905   KO=-     )
    > mean response time                                   378 (OK=378    KO=-     )
    > std deviation                                        631 (OK=631    KO=-     )
    > response time 50th percentile                         57 (OK=57     KO=-     )
    > response time 75th percentile                        300 (OK=300    KO=-     )
    > mean requests/sec                                225.384 (OK=225.384 KO=-     )
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                          1310 ( 82%)
    > 800 ms < t < 1200 ms                                  23 (  1%)
    > t > 1200 ms                                          267 ( 17%)
    > failed                                                 0 (  0%)
    ================================================================================

No failed requests this time. A 75th percentile of 300 means that 75% of requests were handled within 300 milliseconds.
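For readers who want to see what a percentile is mechanically, here is a small sketch using the nearest-rank method. Gatling may compute its percentiles differently, so treat this as an illustration of the idea only; the sample values are invented and are not measurements from these runs.

```java
import java.util.Arrays;

public class PercentileSketch {

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] responseTimesMillis, double p) {
        long[] sorted = responseTimesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Illustrative samples only.
        long[] samples = {900, 4, 57, 12, 300, 30, 120, 45};
        System.out.println("p50 = " + percentile(samples, 50) + " ms");
        System.out.println("p75 = " + percentile(samples, 75) + " ms");
    }
}
```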

Bootzooka scalatra version:

./gradlew -Dserver.port=8082 loadTest

First round:

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                        400 (OK=284    KO=116   )
    > min response time                                     61 (OK=61     KO=10486 )
    > max response time                                  11823 (OK=11823  KO=11805 )
    > mean response time                                  5800 (OK=3502   KO=11426 )
    > std deviation                                       5640 (OK=5156   KO=199   )
    > response time 50th percentile                       1747 (OK=224    KO=11447 )
    > response time 75th percentile                      11470 (OK=11334  KO=11571 )
    > mean requests/sec                                 31.464 (OK=22.339 KO=9.125 )
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                           200 ( 50%)
    > 800 ms < t < 1200 ms                                   0 (  0%)
    > t > 1200 ms                                           84 ( 21%)
    > failed                                               116 ( 29%)
    ---- Errors --------------------------------------------------------------------
    > regex(success).find(0).exists, found nothing                      116 (100.0%)
    ================================================================================

116 failed requests. Same as before, let’s give the JIT a chance. Second round:

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                        400 (OK=327    KO=73    )
    > min response time                                      1 (OK=1      KO=4298  )
    > max response time                                   5015 (OK=5015   KO=4773  )
    > mean response time                                  2016 (OK=1449   KO=4556  )
    > std deviation                                       2279 (OK=2143   KO=101   )
    > response time 50th percentile                         67 (OK=53     KO=4576  )
    > response time 75th percentile                       4623 (OK=4583   KO=4627  )
    > mean requests/sec                                 67.261 (OK=54.986 KO=12.275)
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                           222 ( 56%)
    > 800 ms < t < 1200 ms                                   6 (  2%)
    > t > 1200 ms                                           99 ( 25%)
    > failed                                                73 ( 18%)
    ---- Errors --------------------------------------------------------------------
    > regex(success).find(0).exists, found nothing                       73 (100.0%)
    ================================================================================

73 failed. OK, that’s better than 116, but let’s keep warming up. Third round:

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                        400 (OK=357    KO=43    )
    > min response time                                      1 (OK=1      KO=3261  )
    > max response time                                   3929 (OK=3929   KO=3667  )
    > mean response time                                  1505 (OK=1269   KO=3463  )
    > std deviation                                       1626 (OK=1563   KO=109   )
    > response time 50th percentile                        291 (OK=250    KO=3468  )
    > response time 75th percentile                       3507 (OK=3462   KO=3553  )
    > mean requests/sec                                  80.89 (OK=72.194 KO=8.696 )
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                           229 ( 57%)
    > 800 ms < t < 1200 ms                                   9 (  2%)
    > t > 1200 ms                                          119 ( 30%)
    > failed                                                43 ( 11%)
    ---- Errors --------------------------------------------------------------------
    > regex(success).find(0).exists, found nothing                       43 (100.0%)
    ================================================================================

43 failed. The number is decreasing, but we still see a mean of 80 req/sec with 11% of requests failing. Fourth round:

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                        400 (OK=339    KO=61    )
    > min response time                                      1 (OK=1      KO=2005  )
    > max response time                                   5830 (OK=5830   KO=5528  )
    > mean response time                                  2162 (OK=1648   KO=5021  )
    > std deviation                                       2601 (OK=2466   KO=957   )
    > response time 50th percentile                         61 (OK=21     KO=5370  )
    > response time 75th percentile                       5412 (OK=5318   KO=5451  )
    > mean requests/sec                                 58.651 (OK=49.707 KO=8.944 )
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                           230 ( 58%)
    > 800 ms < t < 1200 ms                                   7 (  2%)
    > t > 1200 ms                                          102 ( 26%)
    > failed                                                61 ( 15%)
    ---- Errors --------------------------------------------------------------------
    > regex(success).find(0).exists, found nothing                       61 (100.0%)
    ================================================================================

61 failed. It looks like it won’t get any better. Still, let’s double the users to compare with the akka-http system. Fifth round:

400 users at once:

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                        800 (OK=572    KO=228   )
    > min response time                                      1 (OK=1      KO=1993  )
    > max response time                                  14464 (OK=14464  KO=14065 )
    > mean response time                                  5533 (OK=2370   KO=13468 )
    > std deviation                                       6667 (OK=5087   KO=1723  )
    > response time 50th percentile                        116 (OK=35     KO=13728 )
    > response time 75th percentile                      13715 (OK=144    KO=13827 )
    > mean requests/sec                                 50.614 (OK=36.189 KO=14.425)
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                           458 ( 57%)
    > 800 ms < t < 1200 ms                                   5 (  1%)
    > t > 1200 ms                                          109 ( 14%)
    > failed                                               228 ( 29%)
    ---- Errors --------------------------------------------------------------------
    > regex(success).find(0).exists, found nothing                      228 (100.0%)
    ================================================================================

A mean of 50 req/sec with 29% of requests failed, and in the app logs I see:

java.sql.SQLTimeoutException: Timeout after 1003ms of waiting for a connection.
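This exception is what a connection pool reports when every connection is in use and a caller has waited longer than the configured timeout. The sketch below imitates that behaviour with a plain Semaphore. It is a stand-in to show the mechanism, not how hikari-cp is implemented, and the class and method names are made up.

```java
import java.sql.SQLTimeoutException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolTimeoutSketch {

    private final Semaphore slots;

    PoolTimeoutSketch(int poolSize) {
        this.slots = new Semaphore(poolSize, true); // fair: waiters are served in order
    }

    /** Waits up to timeoutMillis for a free slot and fails like an exhausted pool otherwise. */
    void borrow(long timeoutMillis) throws SQLTimeoutException {
        boolean acquired;
        try {
            acquired = slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            acquired = false;
        }
        if (!acquired) {
            throw new SQLTimeoutException(
                    "Timeout after " + timeoutMillis + "ms of waiting for a connection.");
        }
    }

    /** Returns a slot to the pool. */
    void giveBack() {
        slots.release();
    }

    /** True when one borrower too many times out on a fully used pool. */
    static boolean timesOutWhenExhausted(int poolSize, long timeoutMillis) {
        PoolTimeoutSketch pool = new PoolTimeoutSketch(poolSize);
        try {
            for (int i = 0; i < poolSize; i++) {
                pool.borrow(timeoutMillis); // take every slot and never give it back
            }
            pool.borrow(timeoutMillis);     // nothing left, so this waits and then fails
            return false;
        } catch (SQLTimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOutWhenExhausted(2, 50));
    }
}
```

When requests hold their connection for longer than new requests are willing to wait, the waiters start timing out, which matches the failed registrations seen in the Gatling report.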

Memory usage

In both cases memory usage is comparable: I saw the same 99.97% throughput (the share of time the JVM spends running application code rather than collecting garbage) and almost the same number of GC pauses. I have two log files, but there is nothing spectacular in them. The upper window shows the akka-http version of bootzooka, the lower one the scalatra version.

[Image: GCViewer bootzooka comparison]

  • newjvm-2016-03-29_15-11-42.log
  • oldjvm-2016-03-29_15-33-13.log

Conclusion after scenario setup 1

This first scenario setup is naive and was only meant to show “something”, and it does: the akka-http app behaves better than the scalatra one and continues to operate under heavier traffic. Akka-http starts failing requests at a much bigger load, 800 users versus 200 users for scalatra (mean 225 req/sec vs 50-80 req/sec). It looks like the akka-http version has a better default configuration (thread pool size) and integrates well with slick (a functional relational mapping tool) and hikari-cp (a tool that manages the database connection pool), which together handle resources more efficiently.

Comparison of the second round for both frameworks:

For the comparison I’ve used gatling-reports. The legend is listed below the table.

    simulation  duration  successCount  errorCount  min  p50  p95    p99    max    avg      stddev  requestPerSecond  apdex  rating
    akka-http   1.98      400           0           4    43   581    678    679    191.2    217     201.71            1      Excellent
    scalatra    5.48      327           73          0    65   4901   4984   5015   2014.55  2284    59.73             0.79   Fair
  • duration = seconds
  • successCount, errorCount = number of requests
  • min, p50, p95, p99, max, avg, stddev = milliseconds
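The apdex column is not covered by the legend, so here is a minimal sketch of the standard Apdex formula: satisfied requests count fully, tolerating ones count half, frustrated or failed ones count as zero. The thresholds that gatling-reports applies are not stated in this post, so the sketch takes ready-made bucket counts instead of response times; the class name is made up.

```java
public class ApdexSketch {

    /** Standard Apdex score: (satisfied + tolerating / 2) / total. */
    static double apdex(int satisfied, int tolerating, int frustratedOrFailed) {
        int total = satisfied + tolerating + frustratedOrFailed;
        return total == 0 ? 0.0 : (satisfied + tolerating / 2.0) / total;
    }

    public static void main(String[] args) {
        // akka-http, second round: all 400 requests succeeded and the slowest took
        // 679 ms, so under the assumption of a target threshold at or above that
        // value every request counts as satisfied.
        System.out.println(apdex(400, 0, 0));
        // Hypothetical mix, only to show how the halves and zeros add up.
        System.out.println(apdex(60, 30, 10));
    }
}
```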

[Charts: errors, duration, throughput]

Comparison of best runs for 200 users:

    simulation  duration  successCount  errorCount  min  p50  p95    p99    max    avg      stddev  requestPerSecond  apdex  rating
    akka-http   1.98      400           0           4    43   581    678    679    191.2    217     201.71            1      Excellent
    scalatra    4.49      357           43          0    259  3770   3869   3928   1473.31  1654    79.49             0.8    Fair

Comparison for 400 users:

    simulation  duration  successCount  errorCount  min  p50  p95    p99    max    avg      stddev  requestPerSecond  apdex  rating
    akka-http   3.41      800           0           4    199  1390   1438   1443   436.42   474     234.33            1      Excellent
    scalatra    15.33     572           228         0    111  14156  14398  14464  5529.42  6674    37.3              0.59   Poor

[Charts: errors, duration, throughput]

Comparison for 800 users (where akka-http started to have problems):

    simulation  duration  successCount  errorCount  min  p50  p95    p99    max    avg      stddev  requestPerSecond  apdex  rating
    akka-http   8.69      1564          36          4    418  3095   3296   3345   1011.16  1117    180.06            0.85   Fair

Hints for tuning

Very often performance tests reveal that database access is the bottleneck. In the second of the above cases (scalatra) we can observe it quite soon.

The first natural approach is to increase the database connection pool, which should suffice for a while. Increasing the timeout is highly undesirable because it would only slow everything down. Another practice (an expensive one this time) is to set up a database cluster and/or queue updates and queries to the database. Any of those moves requires expert knowledge in every area touched (such as distributed setups of databases and applications). That is not the case here, though: the tests are being run on defaults.