How to measure and improve Node.js performance

| 12 min. (2403 words)

Change is the only constant in software, and few languages change like JavaScript. One of the biggest changes we’ve seen is that JavaScript is now extremely fast.

One of the fastest JavaScript engines in existence is Google’s V8, which powers both Chrome and Node.js. In simple benchmarks, well-optimized JS (JavaScript) executed by V8 often performs at 80-90% of the speed of the corresponding C++ code.

This speed is thanks to the immense amount of work that has gone into V8’s JIT (just-in-time) compiler. The JIT compiler analyzes your code as it’s running, and optimizes the critical pieces of your code.

So if JavaScript is so fast, why are so many Node.js applications slow?

Part of the problem is that many Node.js developers lack experience profiling apps. Thankfully, there are great tools for profiling our apps built into the JS engines we use every day.

In this article, I’ll show how we can profile a Node app using both the built-in profiler and Raygun APM for Node.js. I’ll also provide some tips for optimizing the performance of Node.js applications.

Using Node’s built-in profiler

Node has a high-quality profiler built into the Node debugger. We can enable it by passing the --inspect flag when running an application.

Here is a sample application we can use to test out the Node.js profiler. It includes both a fast route that only sends a response and a slow route that loops, allocates, and sleeps.

If you want to follow along yourself, you will need Node.js (v12+) installed, as well as Chrome/Chromium.

Here’s the code we’ll be profiling to start with. Save it as index.js in a new project directory:

const express = require('express');
const app = express();

app.get('/', slowRoute);
app.get('/fast', fastRoute);

function slowRoute(req, res) {
  loop();
  allocate();
  setTimeout(() => res.send('Success'), 100);
}

function fastRoute(req, res) {
  res.send('Success');
}

function loop() {
  for (let i = 0; i <= 1e8; i++) {}
}

function allocate() {
  const items = [];

  for (let i = 0; i < 1e6; i++) {
	items.push({ count: i });
  }
}

app.listen(3030, 'localhost', () => console.log(`Listening on localhost:3030`));

We now need to install express, via npm install express on the command line in our project directory.

We can now run this example with the debugger by running node --inspect index.js.

If it worked, you’ll see output like this screenshot. If you encounter an error, it might be because the Node.js debugger is already running for another app.

The Node debugger should read "Waiting for the debugger to disconnect"

Now, we can connect to the debugger session by opening Chrome and then opening the DevTools. The easiest way to do this is by right-clicking the page and selecting Inspect.

If your Node application is running with the debugger enabled, you’ll see a Node icon on the left of the DevTools control bar.

Click the green Node icon to open a new DevTools window

Clicking the green Node.js icon will open a Node.js DevTools window. By default, the Console window is open and will show output from your application.

Navigate to the Profiler tab

Navigate to the Profiler tab, and click the Start button to begin capturing a CPU profile.

Click the Start button in the Profile tab in DevTools

The profile is now recording. We can make some requests of our web application so we can profile the different routes.

The profile is now recording

Open localhost:3030/fast in your browser, or run curl localhost:3030/fast on the command line, then click the Stop button in the Node.js DevTools window.

Open localhost:3030/fast in your browser

After the profile has finished processing, it will open. Select the “Chart” view from the dropdown at the top of the profile view if it is not already selected.

By default, the profiler shows the entire trace, which contains a lot of idle time. We can zoom in on the key part using the mouse wheel or by clicking and dragging in the timeline at the top of the profile view.

By default the profiler shows the whole trace

The fast route in our sample application only responds with the string “Success”. Most of this profile shows internal frames from express.

It helps when profiling an application to build up a narrative about what’s happening in your code.

To start with, we can see an onconnection frame. Hovering over it shows that it comes from the Node standard library file net.js. This shows the socket creation for the new connection.

The next chunk of frames sits under parserOnHeadersComplete. It includes routing logic, parameter processing, and our handler (fastRoute). The final sections show the response sending over the socket, followed by the socket closing.

This profile is a trivial one since it doesn’t contain any slow user code, but it’s useful as a baseline when comparing against a slower request.

Using the same process as before, we can capture another profile. This time, instead of loading localhost:3030/fast, we’ll load localhost:3030, which hits our slow route.

By default the profiler shows the whole trace

The profile for our slow route contains a lot more information. We can see some interesting detail about how V8 is executing our code. The slowRoute function takes up the most time in this trace. The loop function is the first to run, followed by allocate. The small gray frames below allocate are GC (garbage collection) frames. They represent V8 pausing the main JS thread to clean-up any unused memory.

It can be hard to understand how GC impacts the performance of our Node.js applications. By profiling key sections of code we can see when we’re running into delays caused by GC.

We can also see the result of calling setTimeout before sending the response. After the allocate function ends, you can see a gap before the final response sends where the Node process is idle.

There’s one more interesting thing to note about this trace, and it’s that the allocate function has split into two different frames. This is strange because allocate is only called once, and doesn’t do any complicated control flow. The reason for this is that V8’s CPU profiler is a sampling profiler.

Some profilers work by tracking data when function/method calls start or end. This is often called a method-level profiler.

In contrast, a sampling profiler captures the call stack at a regular interval. This means that not every function call will show up in a profile, since it might run too fast to be sampled.

By default, the V8 profiler samples the call stack every millisecond. This means that if your function executes in less than a millisecond, it might not appear on a profile.

The upside of this design is performance. The overhead of a method level profiler increases as your application gets busier since there are more method calls to track. In contrast, the overhead of a sampling profiler stays steady regardless of how busy your app is. On top of that, sampling profilers will only omit code that executes extremely quickly. The slow functions that we care about profiling are always included.

Node.js performance tip

Creating new objects in JavaScript takes up memory, but did you know it also takes up CPU? The more objects you have, the longer the garbage collector spends finding them to clean up. This means that creating many objects can cause severe Node performance issues. If you need to make lots of objects, consider using a Buffer-based data structure like Flatbuffers or CapNProto instead.

Let’s capture one more profile with the Node DevTools. This time, start a profile, then make two requests of localhost:3030 in quick succession. The easiest way to do this is to run curl localhost:3030 & curl localhost:3030 from a Unix shell. You can also navigate to localhost:3030 in your browser and then immediately refresh once it finishes loading.

A new profile in the Node debugger

In this profile, we can see a slowRoute call for each of our requests. What’s notable here is that the requests process concurrently, due to the use of setTimeout. Once the loop and allocate steps are complete for the first request and setTimeout is called, the second request begins to process. Once the second bout of loop and allocate calls are complete, the response for the first request is sent.

node --inspect and Chrome’s DevTools are a convenient way to profile our local applications. But what if we want to get this sort of information in test/production? Or if we want to see more information, like HTTP queries and database requests?

Using Raygun’s profiler

We can use Raygun APM for Node.js to profile our applications. Raygun APM for Node.js takes the V8 profiler to the next level by providing a polished APM experience out of the box. It includes support for common web frameworks and database adapters, and much more.

To capture profiles using Raygun APM for Node.js, we first need to install the Raygun APM Agent.

After we’ve installed and started the agent, we’ll install the raygun-apm Node package.

npm i raygun-apm --save

Then, we need to require raygun-apm/http from the start of our app:

require('raygun-apm/http'); // Add at the top of your application’s entrypoint
const express = require('express');
const app = express();

We can start our app with node index.js, we no longer need the --inspect flag.

Now, when we make a request of our app, raygun-apm will run a capture a profile and send it to Raygun via the APM Agent.

We can now navigate to https://app.raygun.com/, select the app we created for testing, and view the traces we’ve captured. By default, the APM Agent will only include one trace every minute for each route, so not all requests will result in a trace appearing. You can configure this on the sampling page, which is in the sidebar under APM.

Raygun APM helps to improve Node.js performance

And we can now see our captured trace. The most noticeable difference we can see is that this trace contains far fewer frames from Express. By default, raygun-apm will collapse consecutive frames from a third-party library into a single frame. This helps to improve trace clarity and highlight user code.

This is configurable. If we set the environment variable RAYGUN_APM_DISABLE_STACK_SIMPLIFICATION=true, we’ll see these frames.

Raygun APM provides both detail and performance in traces

We can also configure how often the profiler captures a sample, which allows us to increase detail at the expense of more overhead.

We can use the environment variable RAYGUN_APM_SAMPLING_FREQUENCY, which defaults to 1000. This variable is in microseconds and equates to one sample every millisecond, or a thousand times a second.

This default provides a balance of detail and performance. If we want more detail, we can set RAYGUN_APM_SAMPLING_FREQUENCY to a lower number.

Raygun APM provides both detail and performance in traces

The request takes longer to complete with RAYGUN_APM_SAMPLING_FREQUENCY=1, especially the loop function. This is due to the overhead of capturing a stack sample every microsecond, or a million samples every second. While you wouldn’t want to sample this often in a production environment, it’s interesting to try out.

The dark sections around the 800-900ms mark aren’t special UI, they’re actually tiny frames.

Raygun APM surfacing small frames helping to measure Node.js performance

This zoomed-in segment shows the allocate function running for 307 microseconds, followed by a 102 microsecond GC pause. This level of detail is beyond what we generally need to observe when optimizing our applications, which is why we default to 1ms samples.

raygun-apm also supports capturing information about queries and requests your application performs.

Raygun APM surfacing small frames and queries helping you to measure Node.js performance

Here is a trace from a sample application I’ve created, that acts as a cache for avatars loaded from Gravatar. It will first check if we have a cached Gravatar for the given email, and otherwise load it from the Gravatar API. After we fetch the Gravatar, it’s stored in Redis and returned to the user. The yellow frames show database queries to Redis, and the green frame represents an HTTPS request to the Gravatar API.

The above trace is from a cache miss. You can see the initial get query to Redis to check the cache. Then, the request to load the gravatar, with a final short query to store the avatar in Redis.

Raygun APM surfacing a trace from a cache hit

For comparison, this is a trace from a cache hit. This request only took 24ms and we can see that it’s dominated by the Redis query. It’s much easier to optimize our applications when we can see which parts are IO bound vs CPU bound.

An example trace from Gravatar

raygun-apm also supports profiling applications that use worker_threads from the Node standard library. Here we can see the above Gravatar example, except now we’re also running Sharp in a Worker in order to resize the Gravatar before saving it to Redis. Sharp uses a native library to resize the image, which doesn’t currently show up in the trace. We can see where the image resizing starts and ends in the worker.

raygun-apm will automatically profile Workers on supported versions. The profile starts when they’re sent a message from code that is being profiled, without requiring any code changes.

It’s worth noting that the Node application is still idle for most of this request, and is able to handle other requests while waiting.

raygun-apm captures one profile per request, at most two at a time by default (configurable via an environment variable). We also capture information about the app’s asynchronous contexts. We use this information to generate profiles that only contain details from a single request.

Traces in Raygun APM handling incoming requests

Traces in Raygun APM handling incoming requests

These two traces are from a pair of parallel requests. We can see both of them handling the incoming request, querying Redis, and returning the cached avatar. What’s noteworthy is the position of the (idle) and (garbage collector) frames. If there’s no call stack at a particular point in a profile, it’s because the Node process was running code that doesn’t relate to that request.

This behavior differs from the Chrome DevTools and some APM offerings, which show all the CPU activity over a certain period. Our approach makes it clearer what was happening for a request, without all the other noise.

raygun-apm also integrates with Raygun’s Error Monitoring software.

Traces in Raygun APM handling incoming requests

I’ve changed the Gravatar cacher to report errors to Raygun if we have any issues loading a user’s avatar. Here, I’ve simulated a network issue by redirecting www.gravatar.com to localhost via my system’s hosts file. That’s why the error we’re seeing is about not being able to connect to 127.0.0.1.

We can select the error (shown as a red dot), and then click “View in Crash Reporting” to bring up the full error report.

Raygun Crash Reporting's error summary for a more detailed insight into

This solution makes it easier to understand which part of your application was responsible for an error. You can also view any associated trace for an error by clicking the “Trace” tab from the Crash Reporting view.

Raygun APM for Node.js is available now. If you’re an existing Raygun customer, log in to the Raygun app, select APM from the sidebar, and follow the setup instructions to start your free trial. If you are new to Raygun, take your trial here.