No new errors in app since before 2021-06-14T08:00 UTC, even though we get notifications
greg.mcgee
Posted on
Jun 14 2021
Our applications haven't shown any new error events in the past 11 hours, even though we're getting email alerts that these errors are occurring.
John-Daniel Trask
Raygun
Posted on
Jun 14 2021
Hi Greg,
The team are currently investigating, as several customers have noted this issue. We can see normal volumes hitting our API and moving through the system, but there is a noticeable drop-off in what's being ingested into the data store itself. We're not sure yet what's going on here, but it is under active investigation.
The status page here is being kept up to date: https://status.raygun.com/
It appears to relate only to Crash Reporting, not RUM or APM. As you note, the data is making it most of the way through the pipeline, so it is still triggering notifications. I'll update this thread when it's resolved.
Sorry for the inconvenience, Greg!
John-Daniel Trask
John-Daniel Trask
Raygun
Posted on
Jun 14 2021
Hi Greg,
Early days, but data is now being indexed properly into the data store again. The team are still investigating, but I wanted to let you know that new data is now beginning to flow.
John-Daniel Trask
greg.mcgee
Posted on
Jun 15 2021
Thanks for the update, new events seem to be ingesting properly now.
There's a time period yesterday where we'd expect to see events, but we don't see any. Are things still being processed on Raygun's end, or is that data lost?
John-Daniel Trask
Raygun
Posted on
Jun 15 2021
Hi Greg,
The data isn't lost and will be back soon. In effect, the data loss was confined to one data store: the one used for dashboard queries. The data is being restored from another data store, which is used for a different type of querying, by effectively 'replaying' it through the pipeline. There are a couple of corner cases being ironed out, but the data should be backfilled soon.
I appreciate your patience, Greg,
John-Daniel Trask
John-Daniel Trask
Raygun
Posted on
Jun 17 2021
Hi Greg,
Just confirming that all data should be back now (the backfill actually completed in the wee hours earlier today; I'm just closing off the updates to folks one on one).
Sorry again about this. We will be doing a full post-mortem and identifying what needs to be improved to avoid this in future.
Kind regards,
John-Daniel Trask