The Auckland Software Craftsmanship team asked us to give a presentation at their most recent meetup, so I went along and gave a talk on how we use Raygun internally when working on Raygun (I’m not going to make that yo dawg joke again, don’t worry). The talk’s up on YouTube if you’d like to watch it – if you just want the cliff notes, then this blog post is for you!
Before I worked at Raygun, I was part of a large dev team with a lot of process involved in releasing: QA, many meetings, signoff from people, more meetings, and a six-week release cycle. By the end of it, you were pretty sure nothing was going to go wrong. When I joined Raygun, JD wouldn’t give me the toilet door code until I released something to production. Part of me was super excited about this new-found freedom to just get stuff done, but my inner pessimist wanted a bit more security about the work I was deploying than “don’t worry, we can roll it back if it breaks”.
What I didn’t realise at the time is that being able to just roll it back if it’s broken only works if you have tooling in place to detect failures. You can’t rely on customers to complain: by the time tickets come in, things could have been broken for ages! So part of being able to deploy quickly and often is having error information available and visible as soon as something goes wrong.
We use Raygun on all of our deployables – all of our websites, the API, all the tools, workers, and scheduled tasks. Everything feeds into separate Raygun apps. From there devs are subscribed to notifications about deployables that they are responsible for. We also feed everything into Slack channels – there’s a channel for the ops team, a general errors channel, one for the dev team that gets errors in our staging environments, amongst others. This means that errors are highly visible and if the right person isn’t paying attention, there is a high likelihood that someone else is.
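To make the routing idea concrete, here is a minimal sketch of how errors from each deployable might be fanned out to Slack channels and to the developers responsible for them. It is only an illustration of the setup described above, not how Raygun or its Slack integration actually works: the channel names, the `OWNERS` map, the `OPS_DEPLOYABLES` set, and every function name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    deployable: str   # e.g. "website", "api", "worker" (hypothetical names)
    environment: str  # "production" or "staging"
    message: str


# Hypothetical ownership map: which devs are subscribed to which deployable.
OWNERS: dict[str, list[str]] = {
    "website": ["alice"],
    "api": ["bob", "carol"],
    "worker": ["carol"],
}

# Hypothetical set of deployables the ops team also wants to hear about.
OPS_DEPLOYABLES = {"worker", "scheduled-task"}


def slack_channels(event: ErrorEvent) -> list[str]:
    """Pick the Slack channels an error should be posted to."""
    if event.environment == "staging":
        # Staging errors only go to the dev team's channel.
        return ["#dev-staging-errors"]
    channels = ["#errors"]  # general errors channel
    if event.deployable in OPS_DEPLOYABLES:
        channels.append("#ops")
    return channels


def notify_targets(event: ErrorEvent) -> dict[str, list[str]]:
    """Return both the channels and the subscribed developers for an error."""
    return {
        "channels": slack_channels(event),
        "developers": OWNERS.get(event.deployable, []),
    }


if __name__ == "__main__":
    print(notify_targets(ErrorEvent("worker", "production", "Queue timeout")))
```

The point of posting to a shared channel as well as notifying the owner is the redundancy mentioned above: if the owner misses the notification, someone else in the channel is likely to spot it.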
If you’re not on top of your errors, then this can be overwhelming to begin with. Part of my job is checking up on new errors, triaging them and making sure someone is dealing with them. This way when I check the support queue and see a customer ticket about a bug, I’ve either got a plan to fix it already or I’ve assigned it to someone. Being able to tell someone “yes I saw you were having trouble with that, I’ve got a developer on it and a fix should be out today” is a massive win for support – we’ve gotten customers for life by being responsive to customer issues.
There are a few features that make this work. Obviously the error details page gives me the information I need to triage and assign an issue. The User Tracking feature helps to prioritise issues (the squeaky wheel gets the grease). The Slack integration, which I’ve talked about already, keeps errors visible to the whole team. And the Daily Digest helps us keep an eye on the number of errors that are happening each day.
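As a rough illustration of what prioritising with user data can look like, the sketch below orders error groups by how many distinct users they affect, using the raw occurrence count as a tie-breaker. This is one possible triage rule rather than Raygun's own logic, and the `ErrorGroup` type, its fields, and the sample data are all made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorGroup:
    title: str
    occurrences: int     # how many times the error was seen
    affected_users: int  # how many distinct users hit it


def triage_order(groups: list[ErrorGroup]) -> list[ErrorGroup]:
    """Sort error groups so the ones hurting the most users come first.

    Ties on affected users are broken by occurrence count, highest first.
    """
    return sorted(
        groups,
        key=lambda g: (g.affected_users, g.occurrences),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        ErrorGroup("NullReference in report export", occurrences=500, affected_users=2),
        ErrorGroup("Timeout loading dashboard", occurrences=40, affected_users=30),
        ErrorGroup("Login redirect loop", occurrences=90, affected_users=30),
    ]
    for group in triage_order(sample):
        print(group.affected_users, group.occurrences, group.title)
```

An error that fires 500 times for two users ends up below one that fires 40 times for thirty users, which is usually the order you want when deciding what to fix first.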
I also did a bit of a tech demo and talked about the architecture of the app, but this post is already getting out of control and we’ve already covered that in earlier blog posts.
If you also want to get insight into what happens with your code once you’ve pushed it into the wild, you can try Raygun free for 14 days – we support all languages and platforms.