How Raygun increased throughput by 2,000% with .NET Core (over Node.js)

By John-Daniel

A few months ago, Raygun was featured on the Microsoft website for how we increased throughput by 2,000 percent by switching from Node.js to .NET Core.

The performance improvements were immediate and nothing short of phenomenal. Using the same-size server, we went from 1,000 requests per second per node with Node.js to 20,000 requests per second with .NET Core. (You can read the case study here for a quick refresher.)

This is a 2,000 percent increase in throughput, which allowed Raygun to reduce our server count by 60 percent.

These results are astounding, and understandably there were a few questions about our infrastructure and the exact scenario that enabled us to achieve them.

Adding context to the increased throughput

We’re really excited about the direction Microsoft is taking with .NET Core. Even with our impressive performance gains from moving to .NET Core 1, there’s more coming in .NET Core 2, with another 25 percent lift in performance.

Performance is absolutely a feature. So much so that Raygun includes a Real User Monitoring capability to track software performance for customers. We know that the better the performance we deliver, the happier we make our customers, and the more efficiently we can run our infrastructure. (40% of users will stop using a site that takes longer than three seconds to be ready.)

No offence to Jeff, but I’d far prefer to pour our dollars into improving our product than into our hosting bill.

In my role as CEO, I focus more on the question: “How can I deliver more value to our customers?”

So, making the platform more efficient with better systems means we can deliver more for the same price. Squeezing more out of our infrastructure investment means we can direct that spending into building features our customers need and love. 

For example, our latest large release was a custom dashboard, which includes live data updates.

How we got from Node.js to .NET Core

We originally built our API using Mono, which just bled memory and needed to be constantly recycled. So we looked at the options for something well suited to the highly transactional nature of our API. We settled on Node.js, feeling that the event loop model worked well given the lightweight workload of each message being processed. This served us well for several years.

As our test case is our production system, we are unable to release the code to help others reproduce the results. However, we were also asked which EC2 instances we used, and for any other technical information that would help explain where the gains have come from.

Which EC2 instances we used

In terms of EC2, we used c3.large instances for both the Node.js deployment and the .NET Core deployment. Both sit as backends behind an nginx instance and are managed by EC2 Auto Scaling groups behind a standard AWS Elastic Load Balancer (ELB).

In terms of where the gains came from: when we started looking at .NET Core in early 2016, it became quite obvious that being able to asynchronously hand off to our queuing service greatly improved throughput. Unfortunately, at the time, Node.js didn’t provide an easy mechanism to do this, while .NET Core had great concurrency capabilities from day one. This meant our servers spent less time blocking on the handoff and could start processing the next inbound message sooner. This was the core component of the performance improvement.
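The handoff pattern can be sketched as follows. This is a minimal illustration only, assuming a hypothetical queue client (`StubQueue` below stands in for the real queuing service, whose code isn’t public): the handler awaits the queue’s acknowledgement without blocking the process, so other inbound messages can be serviced in the meantime.

```javascript
// Minimal sketch of a non-blocking queue handoff. StubQueue is a
// hypothetical stand-in for a real queuing service client.
class StubQueue {
  constructor() {
    this.messages = [];
  }
  // enqueue() resolves once the queue "acknowledges" the message.
  enqueue(msg) {
    return new Promise((resolve) => {
      setImmediate(() => {
        this.messages.push(msg);
        resolve({ acknowledged: true });
      });
    });
  }
}

// Request handler: hand the payload off and await the ack without
// blocking the process, so other requests can be serviced meanwhile.
async function handleRequest(queue, payload) {
  const ack = await queue.enqueue(payload);
  return ack.acknowledged ? 202 : 500; // 202 Accepted once the queue has it
}

async function main() {
  const queue = new StubQueue();
  // Many in-flight requests overlap rather than serializing on the handoff.
  const statuses = await Promise.all(
    [1, 2, 3].map((id) => handleRequest(queue, { id }))
  );
  console.log(statuses); // [ 202, 202, 202 ]
  return statuses;
}

main();
```

The point of the sketch is only that the acknowledgement is awaited rather than blocked on; the actual client, message shape, and status codes in our system differ.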

Beyond the async handling, we constantly benchmark Node and several of the common web frameworks. Our most recent performance test between Hapi, Express, and a few other frameworks found the following results (you can read about the environments and how to replicate the test here).

[Chart: benchmark results for Hapi, Express, and other Node.js web frameworks]

We were previously using Express to handle some aspects of the web workload, and from our own testing we could see that it introduces a layer of performance cost. We were comfortable with that at the time; however, the work Microsoft has invested in the web server capabilities has been a huge win.
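As a toy illustration (not Express’s actual code) of where that cost comes from: a framework routes each request through a chain of middleware functions before it reaches your handler, whereas a bare handler is called directly. The composition below is a simplified sketch of that pattern:

```javascript
// Toy illustration of framework-style dispatch: each request walks a
// chain of middleware functions before reaching the handler, versus
// calling the handler directly.
function makeChain(middlewares, handler) {
  // Compose middlewares Express-style: each calls next() to continue.
  return function run(req) {
    let i = 0;
    let result;
    function next() {
      const mw = middlewares[i++];
      if (mw) mw(req, next);
      else result = handler(req);
    }
    next();
    return result;
  };
}

const handler = (req) => ({ status: 200, body: `ok:${req.url}` });

// Direct dispatch: zero framework layers.
const direct = handler;

// Framework-style dispatch: a few no-op middleware layers (logging,
// body parsing, and routing would live here in a real framework).
const layered = makeChain(
  [(req, next) => next(), (req, next) => next(), (req, next) => next()],
  handler
);

console.log(direct({ url: '/a' }).body);  // ok:/a
console.log(layered({ url: '/a' }).body); // ok:/a (same result, more calls)
```

Each extra layer is cheap in isolation, but at billions of small requests those extra function calls and allocations add up, which is consistent with what our benchmarks showed.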

Memory usage

Another popular question about our increased throughput concerned memory usage: specifically, whether we achieved any gain, and whether this was something we needed to manage at high throughput. We did see a gain, although in both cases memory was fairly static: our Node.js deployments would operate with a 1GB footprint, while the .NET Core deployment reduced that footprint to 400MB. In both deployments there is a level of ‘working’ memory associated with each concurrent active request, so along with the improvement to the overall footprint, the operating overhead was reduced in moving to .NET Core.

It’s hard to overstate the effort the .NET Core team has put in. (Seriously. Go and watch some of the GitHub discussions; you’ll see all sorts of incredible, crazy optimizations to deliver amazing performance.)

Raygun isn’t the only one to increase throughput with .NET Core

The MMO Age of Ascent also benefited from a switch to .NET Core, processing 2,300 percent more requests per second – a truly amazing result!

ASP.NET 4.6 and Node.js sit in the bottom left of the graph, which shows the rapid strides in performance the leaner, more agile, componentized stack has made (blue is Windows, orange is Linux) in just a few short months.

Two schools of thought

From the questions we received around the specifics of our performance improvements, there seem to be two schools of thought:

  1. Of course it’s faster, Node is slow
  2. You must be doing Node wrong, it can be fast

In my view, this is an incredibly simplistic view of the world. Node is a productive environment, and has a huge ecosystem around it, but frankly it hasn’t been designed for performance.

Could we have got more out of it? Sure. Do I think we could get it to beat .NET Core for our specific use case?

No, probably not.

Further reading

C# Performance tips and tricks

How to provide more value as a developer 


22 Comments on “How Raygun increased throughput by 2,000% with .NET Core (over Node.js)”


  2. Jess Telford

    > We settled on Node.js, feeling that the event loop model worked well given the lightweight workload of each message being processed.

    > [.NET Core is] able to asynchronously hand off to our queuing service [which] greatly improved throughput […] our servers spent less time blocking on the hand off.

    I’m curious if you looked into solutions like pm2 for managing multiple Node processes (i.e., separate worker processes) at once? I don’t imagine pm2 would garner 2,000% gains (.NET Core having built-in support for threads will beat out pm2’s tacked-on option), but I hazard a guess it may reduce the discrepancy.

  3. John-Daniel

    Hi Jess,

    We were running a multi-process setup to ensure that we were leveraging all the CPU cores that were available. We didn’t go to town on optimizations to see what we could squeeze out of Node beyond things like multi-process. Having said that, we also haven’t invested heavily in optimising the .NET Core version either 🙂

    I’d personally encourage taking a look at a small project and porting it to .NET Core to see how you find it. That was one aspect of why we started with our API, it was a small enough project for us to bite off and see what we found 🙂

    I hope that helps,

    John-Daniel Trask

  4. EntroperZero

    Despite your pre-emptive strike, even the /r/dotnet subreddit thinks you must be doing Node wrong. 🙂

    I think the most curious part is this:

    > it became quite obvious that being able to asynchronously hand off to our queuing service greatly improved throughput. Unfortunately, at the time, Node.js didn’t provide an easy mechanism to do this

    Given that Node is designed to be asynchronous from the ground up, it seems very odd that the only way you could communicate with your queuing service required blocking I/O.

    I think most people who read a headline claiming a 2000% performance improvement are going to reach for a few grains of salt. Some more details would really help the credibility of your claim.

    1. John-Daniel

      Somebody on reddit disagrees? I, for one, am shocked! 😉

      Here’s the thing: I’m sure we could have spent a bunch of time improving things. I’m sure we could do the same with .NET Core to get more out of that too though.

      We were using callbacks, and totally understand that under the covers, Node leverages threads for IO to ensure you don’t block the event loop. Having said that, we’d still want the acknowledgement from the queue before returning a success state to the client. My hunch (and it is just a hunch) is that the way the event loop and task queues are managed leads to a small amount of time lost on each cycle. Given that we are dealing with billions of small requests, those losses stack up in a pretty big way under that concurrency model.

      We did use the multi-process extensions to try and ensure that Node would handle more per server. All of this work did not make huge inroads in the performance story.

      The benefits, I believe, stack up from:

      • The concurrency model in .NET Core is more beneficial to our workloads.
      • The overhead of Express was relatively high compared to the overhead of the .NET Core web request handling.
      • The .NET Core team are obsessed with performance right now. They count every allocation the runtime makes, every allocation the web server makes, and so on. This has compounded and compounded (for example, .NET Core 2 is looking 20-25% faster than the previous version again). I don’t see the same emphasis with Node, and that’s OK; it might not be their focus right now.

      I’m all for folks disagreeing, they’re welcome to. I’m just telling our story of what we experienced and why we think it happened 🙂

      I’d love to read the stories about folks moving from .NET Core to Node and getting huge performance gains too, so please feel free to share them if you have some. So far I’m only seeing ourselves and other organisations tell stories about the benefits of moving to .NET Core.

      I hope that helps 🙂

    1. John-Daniel

      We’re running on Linux servers. Since we scale out the API nodes, we wanted to keep them as inexpensive as possible. We have no special Windows requirements on those servers, so Linux was the easiest path forward. I hope that helps, Salman!

    1. John-Daniel

      We chose that early on because we just scale out by adding more servers. If we used chunkier servers, we’d be wasting capacity whenever we only needed to push forward a little bit (think of it as smaller step changes being able to more tightly follow the actual demand curve). It was cheaper to just add additional servers.

      Ultimately we could have just had one big server, but redundancy is important. We also compared with .NET Core using the same server type to make it an apples-to-apples comparison. When we started, we didn’t deal with the data volumes that we do today, so we’d still require a fleet of servers, even if much larger ones. There is an element of legacy size requirements in the server size choice 🙂

      We’re also keeping an eye on things like AWS Lambda and Azure Functions to follow the demand curve even more tightly and control costs. We think that could one day be an improvement, and we’re tracking those platforms as they mature.

      I hope that helps Michael.

  5. Michael Benin

    “Ultimately we could have just had one big server, but redundancy is important. We also compared with .NET Core using the same server type to make it an apples-to-apples comparison.”

    This is not an equal comparison. Know the language you’re using. Java and C# will thrive in this environment.

    Why even have clustering if you’re running this on 1 core? Clustering is pointless here for node.

    Languages like Java and C# will always win in this kind of comparison. Try running this same benchmark on a t2.2xlarge, which has 8 cores.

  6. Greg Young

    I am just curious how, in what I assume to be an IO-bound process, you managed to get 20x performance gains when both Node.js and ASP.NET Core run over the same C code for handling IO (libuv). This sounds like snake oil.


  8. Chriss

    Seriously, ASP.NET faster than Node.js? This sounds like a Microsoft marketing campaign, especially without any code to share (but lots of links to several MS GitHub posts, etc). How about the fact that they only added async in .NET 4.5, and that they advise using Entity Framework (EF 6.1.3), which does not support concurrent async requests on the same context, whereas all Node.js ORMs support this just fine? Is that a performance killer or not?

    I can come up with more “gems” like this easily.

  9. Josh

    With SignalR slated to be released Q3, what stack does Raygun use for live updates since it isn’t officially out for .NET Core 2?

    1. Freyja Spaven

      Hi Josh, we use SignalR for our live updates; we only use .NET Core for the public API. The API used by the web app is ASP.NET.

    1. Freyja Spaven

      Hi Matt, we almost certainly developed it on Windows, and we probably did some testing on Linux before releasing it to production, but we just develop on Windows and deploy to Linux now.

      Thanks
