Java performance optimization tips: how to avoid common pitfalls

Taylor Lodge | Provider Updates, Tech Stuff, Web Development | 3 Comments


In this post, I’m going to take you through some Java performance optimization tips, looking specifically at certain operations in your Java programs. These tips only really apply in specific high-performance scenarios, so there’s no need to write all your code this way; in most code the difference in speed will be minor. In hot code paths, however, they could make a considerable difference.

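I’ll be covering the following topics: using a profiler, stepping back to think about the approach, the Streams API versus the for loop, date transport and manipulation, and string operations.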

Use a profiler!

Before performing any optimizations, the first task any developer must do is check that their assumptions about the performance are correct. Maybe the portion of code they believe is the slow part is, in fact, masking the true slow part, resulting in any improvements having a negligible effect. They must also have a comparison point to be able to know if their improvements have improved anything, and if so, by how much.

The easiest way to achieve both these goals is to use a profiler. The profiler will give you the tools to find which portion of the code is actually slow and how long it is taking. Some profilers that I can recommend are VisualVM (free) and JProfiler (paid – and totally worth it).

Armed with that knowledge you can be assured that you are optimizing the correct portion of the code and your changes have a measurable effect.
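When a profiler has pointed at a suspect method, a crude timing loop can serve as a quick comparison point before and after a change. The sketch below is only an illustration: the class name RoughTimer and its parameters are my own invention, and it is no substitute for a profiler or a proper benchmarking harness, since JIT compilation and garbage collection can easily distort numbers like these.

```java
import java.util.function.Supplier;

class RoughTimer {
    // Storing each result in a volatile field makes it less likely that the JIT discards the work.
    static volatile Object sink;

    /** Returns the average nanoseconds per call over the timed iterations, after a warm-up phase. */
    static <T> double averageNanos(Supplier<T> task, int warmups, int iterations) {
        for (int i = 0; i < warmups; i++) {
            sink = task.get(); // warm-up calls give the JIT a chance to compile the task
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = task.get();
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        double nanos = averageNanos(() -> Integer.toString(12345), 10_000, 100_000);
        System.out.printf("Integer.toString: about %.1f ns per call%n", nanos);
    }
}
```

Run it once against the old code and once against the new code on the same machine, and treat small differences as noise.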

Taking a step back to think about the approach to the problem

Before attempting to micro-optimize a specific code path, it’s worth thinking about the approach it currently takes. Sometimes the fundamental approach is flawed: even if you expend great effort and make it run 25% faster by applying every optimization possible, changing the approach (using a better algorithm) could yield a performance increase of an order of magnitude or more. This often happens when the scale of the data being operated on changes. It’s straightforward to write a solution that works well enough now, but when you get real data, it starts falling over.

Sometimes this can be as simple as changing the data structure you store your data in. To use a contrived example, if your data access patterns are mostly random access and you’re using a LinkedList, just switching to an ArrayList could give a significant speed boost. For large data sets and performance-sensitive work, it’s critical that you select the right data structure for the shape of the data and the operations being performed on it.
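To make the LinkedList example concrete, here is a minimal sketch (the class and method names are hypothetical, and the list size is an arbitrary choice). The same index-based loop runs against both list types; only the cost of each get(i) call differs.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class RandomAccessSketch {
    /** Sums every element by index. get(i) is O(1) on ArrayList but O(n) on LinkedList. */
    static long sumByIndex(List<Integer> list) {
        long total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += list.get(i); // LinkedList walks its nodes to reach index i
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < 20_000; i++) {
            linked.add(i);
        }
        List<Integer> array = new ArrayList<>(linked);

        long start = System.nanoTime();
        long linkedSum = sumByIndex(linked);
        long linkedNanos = System.nanoTime() - start;

        start = System.nanoTime();
        long arraySum = sumByIndex(array);
        long arrayNanos = System.nanoTime() - start;

        System.out.println("LinkedList: sum=" + linkedSum + ", nanos=" + linkedNanos);
        System.out.println("ArrayList:  sum=" + arraySum + ", nanos=" + arrayNanos);
    }
}
```

If the access pattern really is sequential, iterating a LinkedList with a for-each loop avoids the repeated traversal, so the right choice depends on how the data is actually used.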

It’s always worth taking a step back and asking whether the code you are optimizing takes a sound approach and is slow only because of how it’s written, or whether it’s slow because the approach itself is sub-optimal.

Streams API vs the trusty for loop

Streams are a great addition to the Java language, letting you easily lift error-prone patterns from for loops into generic, more reusable blocks of code with consistent guarantees. But this convenience doesn’t come for free; there is a performance cost associated with using streams. Thankfully, it appears this cost isn’t too high: anywhere from a few percent faster to 10-30% slower for common operations. Still, it is something to be aware of.

99% of the time, the loss of performance from using Streams is more than made up for by the increased clarity of the code. But for the 1% of cases where you’re using a stream inside a hot loop, it’s worth being aware of the performance trade-off. This is especially true for very high-throughput applications, where the increased memory allocations from the Streams API (according to a StackOverflow post, each filter adds 88 bytes of used memory) can create enough memory pressure to require more frequent GC runs, causing a heavy hit on performance.
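For reference, this is the kind of rewrite being weighed. The sketch below is not taken from my benchmarks; the class name, method names, and the even-length filter are made up for illustration. Both methods compute the same result, and the loop version avoids the pipeline objects that the stream version allocates.

```java
import java.util.Arrays;
import java.util.List;

class StreamVersusLoop {
    /** Stream version: concise, but builds a pipeline of stages on each call. */
    static int sumOfEvenLengthsStream(List<String> values) {
        return values.stream()
                .filter(s -> s.length() % 2 == 0)
                .mapToInt(String::length)
                .sum();
    }

    /** Loop version: same result, with no stream pipeline to allocate. */
    static int sumOfEvenLengthsLoop(List<String> values) {
        int total = 0;
        for (String s : values) {
            if (s.length() % 2 == 0) {
                total += s.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("1", "22", "333", "4444");
        System.out.println("stream: " + sumOfEvenLengthsStream(values));
        System.out.println("loop:   " + sumOfEvenLengthsLoop(values));
    }
}
```

Outside of a measured hot path, I would still reach for the stream version first.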

Parallel streams are another matter. Despite their ease of use, they should only be used in rare scenarios, and only after you’ve profiled both the parallel and serial operations to confirm that the parallel one is in fact faster. On smaller data sets (the cost of the stream computation determines what constitutes a smaller data set), the cost of splitting up the work, scheduling it on other threads, and stitching it back together once the stream has been processed will dwarf the speedup from running the computations in parallel.

You must also consider the type of execution environment your code is running in. If it’s running in an already heavily parallelized environment (a website, for example), it’s unlikely you will get any speedup from running the stream in parallel. In fact, under load, it might be worse than non-parallel execution. This is because the parallel nature of the workload is most likely already making as much use of the available CPU cores as it can, meaning you’re paying the cost of splitting the data up without the benefit of any additional computation power.
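A minimal way to check whether parallel() actually helps is to run the same pipeline both ways and compare. In the sketch below, the class name, the sum-of-squares workload, and the input size are all assumptions chosen for illustration, and the single-shot timing is only a rough indication; profile your real workload before deciding.

```java
import java.util.stream.LongStream;

class ParallelCheck {
    static long sumOfSquaresSequential(long n) {
        return LongStream.rangeClosed(1, n).map(x -> x * x).sum();
    }

    /** Same pipeline, but the work is split across the common fork-join pool. */
    static long sumOfSquaresParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(x -> x * x).sum();
    }

    public static void main(String[] args) {
        long n = 1_000; // a deliberately small input, where splitting overhead tends to dominate

        long start = System.nanoTime();
        long sequential = sumOfSquaresSequential(n);
        long sequentialNanos = System.nanoTime() - start;

        start = System.nanoTime();
        long parallel = sumOfSquaresParallel(n);
        long parallelNanos = System.nanoTime() - start;

        System.out.println("sequential: " + sequential + " in " + sequentialNanos + " ns");
        System.out.println("parallel:   " + parallel + " in " + parallelNanos + " ns");
    }
}
```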

A sample of the benchmarks I performed. The testList is a 100,000 element array of the numbers 1 to 100,000 converted to a String, shuffled.

In summary, streams are a great win for code maintenance and readability with a negligible performance impact for the vast majority of cases but it pays to be aware of the overhead for the rare case where you really need to wring the extra performance out of a tight loop.

Date transport and manipulation

Don’t underestimate the cost of parsing a date string into a date object and formatting a date object to a date string. Imagine a scenario where you had a list of a million objects (either strings directly or objects representing some item with a date field on them backed by a string), and you had to perform an adjustment to the date on them. In the context where this date is represented as a string you first have to parse it from that string into a Date object, update the Date object and then format it back into a string. If the date was already represented as a Unix timestamp, (or a Date object, because it’s effectively just a wrapper around a Unix timestamp) all you have to do is perform a simple addition or subtraction operation.

Per my test results, it is up to 500x faster to just manipulate the date object compared to having to parse and format it from/to a string. Even just cutting out the parsing step results in a speedup of ~100x. This may seem like a contrived example, but I’m sure you’ve seen cases where dates were being stored as strings in the database or returned as strings in API responses.
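The two paths look roughly like this. This is a sketch rather than the benchmark code itself: the class name, the date pattern, and the one-day adjustment are assumptions, and UTC is used so that daylight saving changes don’t affect the arithmetic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

class DateShiftSketch {
    static final long ONE_DAY_MILLIS = TimeUnit.DAYS.toMillis(1);

    /** String path: parse, adjust, and format again on every call. */
    static String addDayToString(String text) {
        // SimpleDateFormat is not thread-safe, so this sketch creates one per call.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date parsed = format.parse(text);
            parsed.setTime(parsed.getTime() + ONE_DAY_MILLIS);
            return format.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    /** Timestamp path: a single addition on the epoch milliseconds. */
    static long addDayToTimestamp(long epochMillis) {
        return epochMillis + ONE_DAY_MILLIS;
    }

    public static void main(String[] args) {
        System.out.println(addDayToString("2017-09-30 12:00:00"));
        System.out.println(addDayToTimestamp(System.currentTimeMillis()));
    }
}
```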

In summary, always be conscious of the cost of parsing and formatting date objects and unless you need the string right then, it’s much better to represent it as a Unix timestamp.

String operations

String manipulation is probably one of the most common operations in any program. However, it can be expensive if done incorrectly, which is why I’ve focused on string manipulation in these Java performance optimization tips. I’ll list some of the common pitfalls below. I would like to point out, though, that these problems only present themselves in very fast code paths or with a considerable number of strings. None of the following will matter in 99% of cases, but when it does, it can be a performance killer.

Using String.format when a simple concatenation would have worked

A very simple String.format call is on the order of 100x slower than manually concatenating the values into a string. This is fine most of the time because we’re still talking about 1 million operations per second on my machine, but inside a tight loop operating on millions of elements the loss of performance could be substantial.
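As an illustration (the class name, method names, and message text are invented for this sketch), the two methods below produce the same string. The format version has to parse the pattern on every call, while the concatenation version does not.

```java
import java.util.Locale;

class FormatVersusConcat {
    /** Parses the format pattern on every call. Locale.ROOT keeps the digits locale-independent. */
    static String viaFormat(String name, int count) {
        return String.format(Locale.ROOT, "%s has %d items", name, count);
    }

    /** Plain concatenation: no pattern parsing involved. */
    static String viaConcat(String name, int count) {
        return name + " has " + count + " items";
    }

    public static void main(String[] args) {
        System.out.println(viaFormat("cart", 3));
        System.out.println(viaConcat("cart", 3));
    }
}
```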

One instance of where you should use string formatting instead of concatenation in high-performance environments, however, is debug logging. Take the following two debug logging calls:

logger.debug("the value is: " + x);

logger.debug("the value is: %d", x);

It may seem counter-intuitive at first, but the second call can be faster in a production environment. Since it’s unlikely you will have debug logging enabled on your production servers, the first call allocates a new string that is then never used (as the log is never output). The second only requires loading a constant string, and the formatting step is skipped.
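The calls above don’t name a logging library, so the sketch below swaps in java.util.logging from the JDK, whose fine(Supplier) overload defers building the message in a similar way. The logger name, class name, and counter are hypothetical and exist only to show that the message is never built when the level is disabled.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyDebugLogging {
    static final Logger LOGGER = Logger.getLogger("lazy.debug.sketch"); // hypothetical logger name
    static final AtomicInteger MESSAGES_BUILT = new AtomicInteger();

    static {
        LOGGER.setLevel(Level.INFO); // FINE (debug-level) messages are disabled, as in production
    }

    /** Builds the message and counts how often that happens. */
    static String describe(int x) {
        MESSAGES_BUILT.incrementAndGet();
        return "the value is: " + x;
    }

    /** The message string is built before the logger can decide to drop it. */
    static void logEagerly(int x) {
        LOGGER.fine(describe(x));
    }

    /** The supplier is only invoked if FINE is enabled, so no string is built here. */
    static void logLazily(int x) {
        LOGGER.fine(() -> describe(x));
    }

    public static void main(String[] args) {
        logEagerly(1);
        logLazily(2);
        System.out.println("messages built: " + MESSAGES_BUILT.get());
    }
}
```

Other logging frameworks offer their own deferred forms, so check the documentation for the one you use.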

Not using a string builder inside of a loop

If you’re not using a string builder inside of a loop, you’re throwing away a lot of potential performance. The naive way to append to a string inside of a loop is to use += to append the new portion to the old string. The problem with this approach is that every iteration of the loop allocates a new string and copies the old string into it. This is an expensive operation in and of itself, even before taking into account the extra garbage collection pressure from creating and discarding so many strings. Using a StringBuilder limits the number of memory allocations, resulting in a large speedup. In my testing, using a StringBuilder resulted in a speedup of greater than 500x. If you can make a good guess at the size of the resulting string when constructing the StringBuilder, setting the initial capacity accordingly (so the internal buffer doesn’t need to be resized, which causes an allocation and a copy each time) can result in a further 10% speedup.
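Here is a minimal sketch of the two approaches. The class and method names are hypothetical, and the capacity estimate of six characters per number is just a guess for illustration.

```java
class JoinInLoop {
    /** Naive version: every += allocates a new String and copies the old contents into it. */
    static String withPlusEquals(int count) {
        String result = "";
        for (int i = 0; i < count; i++) {
            result += i;
        }
        return result;
    }

    /** Builder version: appends into one buffer, pre-sized with a rough capacity estimate. */
    static String withBuilder(int count) {
        StringBuilder builder = new StringBuilder(count * 6); // assumed upper bound per number
        for (int i = 0; i < count; i++) {
            builder.append(i);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlusEquals(12));
        System.out.println(withBuilder(12));
    }
}
```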

As a further note, (almost) always use StringBuilder instead of StringBuffer. StringBuffer is designed for use in multi-threaded environments and as such has internal synchronization. The cost of that synchronization must be paid even if it’s only being used from a single thread. If you do need to append to a string from multiple threads (say, in a logging implementation), that’s one of the few situations where StringBuffer should be used instead of a StringBuilder.

Using a StringBuilder outside of a loop

This is something I have seen recommended on the internet, and it seems like it would make sense. But my testing showed that using a StringBuilder was in fact 3x slower than using +=, even when not in a loop. Even though javac translates += in this context into StringBuilder calls, it seems to be much faster than using a StringBuilder directly, which surprised me.

If anyone has any idea why this is, I’d love to hear about it in the comments.

 

In summary, string creation has a definite overhead and should be avoided in loops where possible. This is easily achieved by using a StringBuilder inside of the loop instead.

I hope this post has provided you with some useful Java performance optimization tips. Once again, I would like to stress that the information in this post does not matter for most code being executed. It doesn’t make any difference whether you can format a string 1 million times a second or 80 million times a second if you’re only doing it a few times. But in those critical hot paths where you may, in fact, be doing it millions of times, having that 80x speedup could save hours on a long-running piece of work.

This article is just a taste of the deep world of optimizing Java applications for high performance.

I’ve attached a zip file containing all the benchmarks and data that I used to write this post; see below for the output of the full benchmark run. All of these results were run on a desktop with an i5-6500. The code was run with JDK 1.8.0_144, VM 25.144-b01 on Windows 10.

All of the benchmark code can be found here on GitHub


3 Comments on “Java performance optimization tips: how to avoid common pitfalls”

  1. Jackson Davis

    The final example is amusing.

    In stringAppend, all the +s compile into individual StringBuilder calls, and the JVM fuses them into a single String construction.

    In stringAppendBuilder, that same optimization fails, though I don’t quite understand why. Running with -XX:+PrintOptimizeStringConcat (from a debug build) prints out a ton of debugging state about why it failed, but nothing obvious sticks out.

    That optimization is well known to be very brittle which is why java 9 uses the new String Concat invokedynamic for + instead

    1. Taylor Lodge

      Hey Jackson,

      Thanks for taking the time to read this and having a poke into that. I figured it had to be some extra optimization the JVM applied to bytecode it detected being from String concatenation that didn’t get applied to StringBuilder.append calls. Just surprised me looking at the generated bytecode as the stringAppendBuilder bytecode is a lot more efficient.

  2. Dave Brosius

    Here’s an amazing counterintuitive thing with Strings… which is faster to convert an int to a String?

    (a) String.valueOf(myInt)
    or
    (b) "" + myInt

    Let’s look at what (b) really does:
    new StringBuilder().append("").append(myInt).toString()
    which if you look at the code really does
    new StringBuilder().append("").append(String.valueOf(myInt)).toString()

    so clearly (b) is a true super set of instructions of (a).

    So which one is faster? Yup, (b).

    It’s all about the JIT authors optimizing for common patterns..
