LineRate Performance Tip: Logging in Node.js

LineRate's Node.js engine lets you program in the datapath with JavaScript. Once you start embedding business logic in your proxy, you'll need to debug this logic and report on the actions your proxy is taking. This can take several forms from the simple to the complex. When you're just getting started, nothing beats the venerable console.log() and util.inspect() for simplicity.

Whether your service scales to millions of requests per second, or just a trickle, there are some simple performance considerations that you should keep in mind. Knowing these up front prevents you from learning expensive habits that are harder to fix later.

1. Use log levels

If your logging is impacting performance, one easy way to fix it is to channel your inner Bob Newhart: "Stop it!"

But you don't want to be left without any logs when you're debugging a problem. A very common approach regardless of language is a logging helper that respects log levels like "error", "warning", "info" or "debug". Your code contains all the logging statements you could need, but by tuning the logging level you adjust which ones fire.

function handleRequest(servReq, servRes, cliReq) {
  log.debug('Caught new request for %s at %s', servReq.url, new Date());
  if (url.toLowerCase() != url) {
    log.warning('Request URL %s is not lower case', servReq.url);
    try {
      fixLowercaseUrl(servReq);
    } catch (err) {
      log.error('Error fixing URL: %s, sending 502', err);
      send502(servReq, servRes);
      return
    }

    // ...
  }

It's easy to find a node.js logging library that supports this, or you can make your own. Here's a simple gist that I use for logging (download from my gist😞

var log = (function () {
  "use strict";
  var format = require('util').format;
 
  // Customize levels, but do not name any level: "log" "getLevel" "setLevel"
  var levels = { ERROR: 10, WARN: 20, INFO: 30, DEBUG: 40, TRACE: 50 };
 
  var currentLevel = 'INFO';
  function doGetLevel() { return currentLevel; }
  function doSetLevel(level) {
    if (!levels[level]) { throw new Error('No such level ' + level); }
    currentLevel = level;
  }
 
  function doLog(level, otherArgs) {
    if (!levels[level] || levels[level] > levels[currentLevel]) { return; }
 
    var args = [].slice.call(arguments);
    args.shift();
    console.log(level + ': ' + format.apply(this, args));
  }
 
  var toRet = { log: doLog, getLevel: doGetLevel, setLevel: doSetLevel };
 
  Object.getOwnPropertyNames(levels).forEach(function (level) {
    this[level.toLowerCase()] = function () {
      var args = [].slice.call(arguments);
      args.unshift(level);
      return doLog.apply(this, args);
    }
  }, toRet);
 
  return toRet;
})();

2. Defer string concatenation

What's the difference between this call

log.debug('The request is ' + util.inspect(servReq) +
          ', and the response is ' + util.inspect(servRes));

and this one?

log.debug('The request is ', servReq,
          ', and the response is ', servRes);

These statements both produce the same output. Let's consider what happens if you're not running at the debug log level. Suppose we're running at logLevel INFO (the default in my simpleLog library).

Let's see what happens in the first call. Our code says to call log.debug() with one argument. The argument is a long string formed by concatenating 4 smaller strings:

'The request is'
util.inspect(servReq)
', and the response is '
util.inspect(servRes)

You can check out the implementation of util.inspect() and probably notice that there's a lot of things going on, including recursive calls. This is all building up lots of little strings, which are concatenated into bigger strings and finally returned from util.inspect(). The javascript language is left-associative for string concatenation, but different engines may optimize as long as there aren't observable side effects; conceptually you can think about the costs by thinking about the tree of strings that the engine is building:

The 2 hard-coded strings and the 2 strings returned from util.inspect() are concatenated one last time into the argument to log.debug(). Then we call log.debug(). If you're using my simpleLog, you can see that the first thing it does is bail out without doing anything:

if (!levels[level] || levels[level] > levels[currentLevel]) { return; }

So we went through all the effort of building up a big string detailing all the aspects of the server request and server response, just to unceremoniously discard it. What about the other style of call (spoiler alert: it's better :-)?

Remember the second form passed four arguments; 2 arguments are just plain strings, and the other 2 are objects that we want to dump:

log.debug('The request is ', servReq,
          ', and the response is ', servRes);

This style is taking advantage of a console.log() behavior: "If formatting elements are not found in the first string, then util.inspect is used on each argument." I'm breaking up the arguments and asking console.log() to call util.inspect() on my behalf. Why? What's the advantage here?

This structure avoids calling util.inspect() until it is needed. In this case, if we're at loglevel INFO, it's not called at all. All those recursive calls and concatenating tiny strings into bigger and bigger ones? Skipped entirely. Here's what happens:

The four separate arguments (2 strings, 2 objects) are passed to the log.debug() function.
The log.debug() function checks the loglevel, sees that it is INFO, which means log.debug() calls should not actually result in a log message.
The log.debug() function returns immediately, without even bothering to check any of the arguments.

The call to log.debug() is pretty cheap, too. The 2 strings are compile-time static, so it's easy for a smart optimizing engine like v8 to avoid making brand new strings each time. The 2 objects (servReq and servRes) were already created; v8 is also performant when it comes to passing references to objects.

The log.debug() helper, Node's built-in console.log(), and logging libraries on NPM let you defer by using either the comma-separated arguments approach, or the formatted string (%s) style.

A couple of final notes on this topic: First, the optimization in console.log() is awesomeness straight from plain node.js; LineRate's node.js scripting engine just inherited it. Second, remember that JavaScript allows overriding the Object.prototype.toString() method to customize how objects are stringified. util.inspect() has code to avoid invoking toString() on an object it is inspecting, but the simple '+' operator does not. A simple statement like this:

console.log('x is ' + x);

could be invoking x.toString() without you knowing it. If you didn't write x.toString(), you may have no idea how expensive it is.

3. Stdout/stderr is synchronous

In node.js, process.stdout and process.stderr are two special streams for process output, in the UNIX style. console.log() writes to process.stdout, and console.error() writes to process.stderr. It's important to know that writes to these streams may be synchronous: depending on whether the stream is connected to a TTY, file or pipe, node.js may ensure that the write succeeds before moving on. Successful writes can include storing in an intermediate buffer, especially if you're using a TTY, so in normal operation this synchronous behavior is no problem.

The performance pain comes if you're producing writes to stdout or stderr faster than the consumer can consume them: eventually the buffer will fill up. This causes node.js to block waiting on the consumer.

Fundamentally, no amount of buffer can save you from a producer going faster than a consumer forever. There are only two choices: limit the producer to the rate of the consumer ("blocking"), or discard messages between the producer and consumer ("dropping"). But if your program's log production is bursty, then a buffer can help by smoothing out the producer. The risk is if the consumer fails while data is left in the buffer (this may be lost forever).

LineRate's node.js environment provides a little bit of both: LineRate hooks your node.js script up to our system logging infrastructure; you can use our CLI or REST API to configure where the logs end up. We route logs to those configured destinations as fast as we can with buffering along the way. If the buffers are exceeded, we do drop messages (adding a message to the logs indicating that we dropped). We ensure the logging subsystem (our consumer) stays up even if your script fails with an error. All this ensures you get the best logs you can, and protects you from catastrophically blocking in node.js.

4. Get counters on demand

One last approach we've used to reduce logging is to provide a mechanism to query stats from our node.js scripts. LineRate includes an excellent key-value store called redis. LineRate scripts can use redis to store and retrieve arbitrary data, like script-specific counters. If all you're tring to do is count page requests for a specific URL, you could use a simple script like:

function serveCounter(servRes, counter) {
  redisClient.get(counter, function (error, buffer) {
    if (error) {
      servRes.writeHead(502);
      servRes.end();
    } else {
      servRes.writeHead(200, { 'Content-Type' : 'application/json',
                               'Content-Length' : buffer.length });
      servRes.end(buffer);
    }
  });
}

vsm.on('exist', 'vsStats', function (vs) {
  vs.on('request', function (servReq, servRes, cliReq) {
    if (servReq.url === '/path/to/counterA') {
      serveCounter(servRes, 'counterA');
    } else {
      cliReq();  // Or maybe an error?
    }
  });
});

or as fancy as a mini-webserver that serves back an HTML webpage comprising a dashboard and client-side AJAX to periodically pull down new stats and plot them. If you're in plain node.js, you'd use http.createServer() instead of vs.on('request', ...) and you'd have to generate an error page yourself instead of calling cliReq(), but the idea is the same.

Conclusion

I've presented a summary of good and bad logging behaviors in node.js. The combination of JavaScript's syntactic simplicity and node.js' awesome non-blocking I/O capabilities mean that you can spend more time on your business logic and less time shuffling bytes around. But, there are a few sharp corners. Logging libraries can smooth out those sharp corners, as long as you're aware of how to use them and the problems like output buffering that they can't solve.

All of my advice applies to node.js and LineRate's node.js scripting engine. I hope it serves you well for both. If you're looking for a proxy you can program in node.js, or if you'd just like to try out the examples, remember you can deploy LineRate today for free.

Updated Jun 06, 2023

Version 2.0

linerate

node