Improving search in Raygun


We’re constantly working on various parts of the whole Raygun platform. One aspect we’ve been working on recently is improving search. A powerful and flexible search index on your data is crucial given the huge amounts of data that Raygun can capture from your application.

In this post I’m going to detail some of the existing features you might not have known about, some of the recent features that boost the power of Raygun search, and a bit about how it’s implemented (the juicy part).

Prefixes for targeting properties

First off, something for existing Raygun customers that you may be unaware of. Raygun search allows you to begin your query with one of several keywords. The engine will then look for your query only in the field specified by the keyword. These keywords are:

  • version
  • machinename
  • url
  • message
  • impacteduser
  • exceptiontype
  • stacktrace
  • innererror
  • customtags
  • customdata

For instance, the query

‘exceptiontype: WebException’

will only find errors that have a ClassName of WebException.

Note: previously you had to use one of these prefixes to return the match you want. For instance, if you had a ‘foo’ value in an error’s custom data, you’d have to query for ‘customdata: foo’. This has been corrected, so querying without a prefix will return results from all field indexes.
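To make the prefix behaviour concrete, here’s a small Python sketch of how such a query might be parsed into a target field and a search term. This is purely illustrative — it is not Raygun’s actual parser, and the function name is hypothetical:

```python
import re

# The field prefixes listed above.
FIELDS = {"version", "machinename", "url", "message", "impacteduser",
          "exceptiontype", "stacktrace", "innererror", "customtags", "customdata"}

def parse_query(query):
    """Split 'field: term' into (field, term); plain queries get field=None."""
    match = re.match(r"^(\w+):\s*(.*)$", query.strip())
    if match and match.group(1).lower() in FIELDS:
        return match.group(1).lower(), match.group(2)  # search a single field
    return None, query.strip()                         # search all field indexes
```

A query like ‘exceptiontype: WebException’ would resolve to the exceptiontype field, while a bare ‘foo’ would be searched across every index.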

Search query strings use AND

The word tokens you enter in search queries are now ANDed together. This is a recent change, and should improve the experience during searching by providing more relevant results. Previously we’d been defaulting to an OR but this is not how people generally expect a search to work. Fixed!
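The difference between the two behaviours is easy to state in code. This Python sketch (with made-up document tokens) shows why AND gives more relevant results — every query token must be present, rather than any one of them:

```python
def matches_and(query_tokens, doc_tokens):
    # Current behaviour: every query token must appear in the document.
    return all(t in doc_tokens for t in query_tokens)

def matches_or(query_tokens, doc_tokens):
    # Previous behaviour: any single query token was enough to match.
    return any(t in doc_tokens for t in query_tokens)

# Illustrative tokens from a hypothetical error instance.
doc = {"null", "reference", "exception", "checkout"}
```

Under OR, a query like "null login" would still match the document above just because "null" appears; under AND it correctly doesn’t.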

We’ve made a bunch of improvements to our Elasticsearch backend, which results in much nicer searching for email addresses. This is invaluable for hunting down valuable customers and letting them know once you’ve fixed their bugs. Say you’ve got a couple of addresses like ‘person1@home.raygun.io’ and ‘person1@work.raygun.io’ mentioned in some of your error instances.

Search queries containing any of the following tokens will match those two:

‘person’, ‘1’, ‘raygun’, ‘io’

Adding ‘home’ or ‘work’ will exclude the other one, as you’d expect.
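The splitting described above can be sketched in two stages: first break the address on periods, @ signs, colons and whitespace, then break each piece at letter/digit transitions. This Python sketch assumes that behaviour — it’s an approximation for illustration, not Raygun’s actual analyzer:

```python
import re

def tokenize_email(text):
    """Split on [.@:] and whitespace, then at letter<->digit transitions."""
    pieces = [p for p in re.split(r"(?:[.@:]\W*)|\s+", text) if p]
    tokens = []
    for piece in pieces:
        # Separate runs of letters from runs of digits ("person1" -> person, 1).
        tokens.extend(re.findall(r"[A-Za-z]+|\d+", piece))
    return [t.lower() for t in tokens]
```

Running this over one of the addresses yields exactly the token set that queries can match against.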

camelCase string support

The search backend now splits camel-cased strings (at lower-to-upper casing changes), as well as at number-letter transitions (as shown above). By way of example, here’s an Objective-C class and method:

‘NSWindow makeKeyAndOrderFront’

To locate that line in a stack trace, you can search for any of ‘NSWindow’, ‘window’, ‘make’, ‘order’, and so on. Supporting code search is a key feature for Raygun, and this improvement should make it a lot easier to track down that bug with inexact queries.
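A camelCase splitter of this kind can be sketched with a single regex. The pattern below (an assumption that approximates the behaviour described, not the backend’s actual implementation) breaks at lower-to-upper transitions, acronym boundaries like ‘NSWindow’, and letter/digit transitions:

```python
import re

def split_camel(text):
    """Split camelCase/PascalCase identifiers into lowercase tokens."""
    # Alternatives: acronym before a capitalized word (NS in NSWindow),
    # an optionally capitalized word, a trailing acronym, or a digit run.
    parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", text)
    return [p.lower() for p in parts]
```

Applied to the Objective-C example, every constituent word becomes an individually searchable token.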

Implementing with Elasticsearch

Elasticsearch’s API is powerful and logical, giving you the tools to customize the analyzing and tokenizing of your input data, and the same for queries. To implement an Elasticsearch indexing strategy, you begin by specifying what analysis you want to perform. An analyzer comprises one Tokenizer, which splits one large input string into tokens, and zero or more TokenFilters. Examples of built-in tokenizers include the Whitespace tokenizer, which splits on whitespace, and the Letter tokenizer, which splits at any non-letter character.

A TokenFilter modifies the tokens created by the Tokenizer. Examples include lowercasing (which users practically always expect), removing common stop words, or adding synonyms and phonetically related words.
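Conceptually, an analyzer is just a tokenizer function followed by a chain of filter functions. This tiny Python sketch mirrors that pipeline (the names are illustrative, not Elasticsearch API):

```python
def analyze(text, tokenizer, filters):
    """Run one tokenizer, then apply each token filter in order."""
    tokens = tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens

whitespace_tokenizer = lambda text: text.split()
lowercase_filter = lambda tokens: [t.lower() for t in tokens]
stopword_filter = lambda tokens: [t for t in tokens if t not in {"the", "a", "an"}]
```

Swapping in a different tokenizer or reordering the filters changes the index’s behaviour without touching the rest of the pipeline — which is exactly the flexibility Elasticsearch’s analysis configuration gives you.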

The built-in ones can get you most of the way, but chances are you’ll need to customize. The API will vary depending on whether you’re interacting with the system via JSON or through a client for your language. We happen to be using NEST (the .NET client), so the examples below are in C# using NEST’s API, but the principles are transferable, as most clients aim to wrap the JSON API detailed in Elasticsearch’s docs.

To implement the above behaviour, create an index specifying the name, and add your mappings (you’d configure replicas/shards depending on your environment):

_client.CreateIndex(yourIndexName, c => c
  .AddMapping&lt;YourErrorType&gt;(e => e.MapFromAttributes())

Note that you set the analyzer to use on the mapping object. Then, on the ‘c’ parameter, call Analysis() with another lambda expression to set up your actual analyzer and the Tokenizer and TokenFilter it will use:

.Analysis(analysis => analysis
  .Analyzers(a => a
    .Add("indexAnalyzer", new CustomAnalyzer()
    {
      Tokenizer = "myPatternTokenizer",
      Filter = new List&lt;string&gt;() { "myLowercaseFilter" }
    }))
  .Tokenizers(t => t
    .Add("myPatternTokenizer", new PatternTokenizer()
    {
      Pattern = "([.@:]\\W*)|\\s+"
    }))
  .TokenFilters(f => f
    .Add("myLowercaseFilter", new LowercaseTokenFilter())))

Again, note that you define your Tokenizer and TokenFilters by adding them to a string-keyed dictionary, then use them by setting the Tokenizer/Filter properties on the Analyzer object to the names you assigned.

The key magic lives in the Pattern string, a regex used to split your long input strings into tokens. The example above is a fairly trivial regex that splits on one occurrence of a period, @ symbol, or colon followed by zero or more non-word characters, or on one or more whitespace characters. You’ll need to tweak it depending on your unique input data.
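You can try that exact pattern out before committing it to your index config. The Python sketch below uses the same regex (with the capture group made non-capturing, since Python’s re.split would otherwise return the separators too); for this pattern the behaviour carries over from Java-style regexes:

```python
import re

# The Pattern value from the NEST listing above: split at . @ : followed by
# optional non-word characters, or at runs of whitespace.
def pattern_tokenize(text):
    return [p for p in re.split(r"(?:[.@:]\W*)|\s+", text) if p]
```

The input string here is made up, but it shows how URLs, email addresses, and exception messages break apart into searchable tokens.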

There are also examples of useful regexes on the Pattern Analyzer documentation page, including a standard whitespace tokenizer, a non-word-character tokenizer, and a camelCase tokenizer.

Check out Raygun search now

Got a Raygun account? You can try out the features mentioned in this post now by logging in, but if you don’t have an account never fear – there’s a 30-day free trial available here, no credit card needed. Happy error blasting!