The developer's guide to monitoring for e-commerce

Get the visibility to deliver fast and error-free code

Raygun: the fastest way to improve your customers’ experience

Software quality in the e-commerce industry can be a matter of survival. A critical error or slow-loading asset on a key page could drastically undercut revenue in a matter of days or even hours. Without visibility, a persistent performance issue can lurk undetected for months or years, eating away at session lengths, conversion rates, and revenue.

In partnership with our e-commerce customers, Raygun has determined how our powerful monitoring tools can best address and prevent your industry’s key concerns. Monitoring for e-commerce focuses on key points of the customer journey to detect errors and crashes before they can wreak havoc, prevent performance regressions, and surface opportunities for improvement. Raygun keeps a watchful eye over your website or application, so you can take care of issues as they occur.

Part 1: Crashes and errors

10 ways you can solve this with Crash Reporting

Part 2: Performance and Customer Experience

9 ways you can solve this with Real User Monitoring

What's Next

Get up and running

Part 1: Crashes and errors

The issue: Compromised user experiences

Errors undercut software quality, resulting in janky, laggy, or dysfunctional page interactions, and ultimately translating into lost revenue for e-commerce sites. Development work on new features and functionality is rendered ineffective when errors cause disruptions. Leaving errors in your digital storefront vastly increases your cost to acquire customers. One in five online shoppers in Europe say they’ll abandon a purchase if checkout takes more than one minute, and with cart abandonment rates at 68.8% globally, e-commerce teams can’t afford any barriers in this crucial area.

Why it occurs

Customers are only growing more intolerant and mistrustful of sites that fall short of their rising expectations, and 47% of customers now say they will stop buying from a company if they have a subpar experience. Needless to say, loss of revenue due to errors impacts the entire business.

The issue: Lack of visibility and underestimating impact

Errors are often underestimated by e-commerce businesses, both in terms of scale and impact. Retail tech teams typically don’t track errors, or if they do, lack a clear method to prioritize the errors that affect the highest number of users and cost the business money. It can be difficult to focus on and quantify the user experience, making it challenging to benchmark as a metric or to demonstrate improvements over time.

Why it occurs

It’s easy for errors to fly below the radar. Just because you’re not hearing about an error doesn’t mean it doesn’t exist; fewer than 1% of customers experiencing crashes will submit a user report. Most will just leave your site, and you’ll never know why you lost out on that sale. 88% of users will abandon an app because of bugs.

The issue: Ineffective error replication and resolution

Even when errors do get noticed and fixed, it tends to be a costly and inefficient process. Customer dialogue, logging, and a trial-and-error approach to finding and diagnosing errors are often frustrating experiences for everyone involved.

Why it occurs

Default error messages tend to be unhelpful to the customer, and they rarely give the developer enough detail to identify or replicate the error. Recreating an issue based on the end-user’s description is particularly expensive, as well as a poor user experience. While expensive session replay tools offer a visual record from the customer’s point of view, they leave out the technical details that are actually useful for developers trying to replicate and resolve the issue at the code level. Support costs add up too: the industry standard for cost per support call generally hovers around $7, but we’ve had e-commerce clients with per-call costs of up to $50.

The issue: Dealing reactively with customer-facing issues

Reactive teams fix the issues their customers face only once they’ve become impossible to ignore. Proactive teams stay ahead of these issues, resolving them before they drive away customers (due to poor experiences) or bring down developer morale (due to tedious bug-fixing processes and context switching).

Why it occurs

With fewer than 1% of customers reporting an issue, it’s easy to assume that everything’s working as intended, while in reality, you’re losing customers and revenue to silently recurring issues. When the problem does surface, the pressure to fix it fast means developers must drop everything. This kind of disruptive context switching has been shown to lead to poor development outcomes, with work taking twice as long and code being twice as buggy. As more errors are introduced, the engineering team is perpetually battling quality and technical debt issues.

10 ways you can solve this with Crash Reporting

1. Smart error grouping reveals how many unique issues you have and provides full diagnostics for every user issue.

Use intelligent error grouping to understand the extent of the issues that impact your customers and identify the specific customers affected. With this detailed data, you can ship fixes before reports get lodged or support tickets get created, and prevent future customers from running into the same issues. Get accurate and granular data for every error instance to understand the impact of an issue across your entire customer base and full tech stack.
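As a rough illustration of what grouping buys you, the sketch below buckets raw error occurrences by a simple fingerprint (error type plus top stack frame) and counts occurrences and distinct affected users per group. The record fields, the sample data, and the fingerprint rule are assumptions made for this example; they are not Raygun's actual grouping logic.

```python
from collections import defaultdict

# Hypothetical error occurrences; field names are illustrative only.
occurrences = [
    {"type": "TypeError", "top_frame": "cart.js:42", "user": "u1"},
    {"type": "TypeError", "top_frame": "cart.js:42", "user": "u2"},
    {"type": "TypeError", "top_frame": "cart.js:42", "user": "u1"},
    {"type": "NetworkError", "top_frame": "checkout.js:7", "user": "u3"},
]


def group_errors(records):
    """Group occurrences by (type, top_frame); track counts and affected users."""
    groups = defaultdict(lambda: {"count": 0, "users": set()})
    for rec in records:
        key = (rec["type"], rec["top_frame"])
        groups[key]["count"] += 1
        groups[key]["users"].add(rec["user"])
    return dict(groups)


def rank_by_users_affected(groups):
    """Return group keys ordered by distinct users affected, highest first."""
    return sorted(groups, key=lambda k: len(groups[k]["users"]), reverse=True)


groups = group_errors(occurrences)
for key in rank_by_users_affected(groups):
    info = groups[key]
    print(key, info["count"], "occurrences,", len(info["users"]), "users")
```

Four raw occurrences collapse into two unique issues here, and ranking by distinct users rather than raw count keeps one noisy session from dominating the list.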

2. Monitor every deployment and get alerted to issues in real-time.

Integrate monitoring with your CI/CD workflow and track each deployment to rapidly identify newly introduced errors, regressions, and performance issues. Raygun Alerting will notify you of any new or regressed errors in your latest deployment or a specific version of your application. Correlate errors with deployments at a glance via the Crash Reporting dashboard and compare the performance impact of your deployments with the Compare feature in Real User Monitoring.

3. Use the Crash-Free Users score as a customer experience metric when communicating with the wider business.

This is a metric that’s easily understood by all teams, but especially management. The objective is to get your apps to 100% Crash-Free Users, meaning your customers aren't experiencing crashes. Raygun customers have successfully used our recommended workflow to improve this score, combining Crash Reporting to identify the errors that are affecting the highest number of users and Real User Monitoring to monitor the total user experience. We’ve seen customers increase their Crash-Free Users percentage from 60% to 95-99%, and some have even adopted an established Crash-Free Users benchmark as a standard KPI for all new applications. This can dramatically decrease churn and prove the value of development work. Teams can visualize and share top-down statistics like these across their organization using our highly customizable dashboards, free with every Raygun plan.
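For readers who want to see the arithmetic, here is a minimal sketch of how a crash-free users percentage is commonly computed: the share of active users in a period who did not hit a crash. The function name and sample figures are hypothetical, and the exact calculation used in the product may differ.

```python
def crash_free_users_pct(total_users, users_with_crash):
    """Percentage of active users who did not experience a crash in the period."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    if not 0 <= users_with_crash <= total_users:
        raise ValueError("users_with_crash must be between 0 and total_users")
    return 100.0 * (total_users - users_with_crash) / total_users


# Hypothetical figures: 10,000 active users, 400 of whom hit at least one crash.
print(f"{crash_free_users_pct(10_000, 400):.1f}%")  # 96.0%
```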

4. Reduce noise by setting up thresholds and creating smart alerts for selected issues.

Raygun Alerting gives you full control to set threshold-based alerts for escalating exceptions, so you can proactively address issues as they occur. Customizable filters let you get even more specific about the errors that you want to hear about, matching specific error messages, tags, or versions. Error notifications can be sent to selected Slack or Microsoft Teams channels, Webhooks, or email lists to alert your on-call team, or whoever’s best positioned to take action immediately.
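The idea behind a threshold-based alert with a tag filter can be sketched in a few lines. This is a generic sliding-window counter under assumed names (`ThresholdAlert`, `record`, the `checkout` tag); it is not how Raygun Alerting is implemented, only an illustration of the concept.

```python
from collections import deque


class ThresholdAlert:
    """Fire when at least `threshold` matching errors occur within `window_seconds`."""

    def __init__(self, threshold, window_seconds, required_tag=None):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.required_tag = required_tag
        self._timestamps = deque()

    def record(self, timestamp, tags=()):
        """Record one error; return True if the alert condition is now met."""
        if self.required_tag is not None and self.required_tag not in tags:
            return False  # filtered out: not an error we want to hear about
        self._timestamps.append(timestamp)
        # Drop occurrences that have fallen out of the time window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Hypothetical stream of (seconds, tags): alert on 3 checkout errors within 60s.
alert = ThresholdAlert(threshold=3, window_seconds=60, required_tag="checkout")
events = [(0, ["checkout"]), (10, ["search"]), (20, ["checkout"]), (30, ["checkout"])]
for ts, tags in events:
    if alert.record(ts, tags):
        print(f"alert at t={ts}s")  # alert at t=30s
```

Counting only tagged errors inside a time window is what keeps the noise down: the unrelated `search` error never contributes to the checkout alert.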

5. Replicate the entire customer journey in full technical detail without the end-user.

While session replay offers a top-level view of a customer’s interaction with your UI, this has limited value for error resolution. Detailed breadcrumbs and session view in error reports give developers the actionable technical data that session replay won’t provide, so they can replicate and resolve an issue rather than simply watching a video of the customer having a bad experience without any indication of the root cause. This also helps Customer Service teams to find and understand the relevant errors for a specific customer and their session, significantly reducing the amount of time spent on support calls.

“Customers are busy and don’t always want to describe the problem, or they get frustrated when you don’t understand where the error is.” - Bob Bond, CEO, WriteUpp

6. Plug seamlessly into your existing development workflow.

No matter what development tools you use for alerts, issue tracking, deployment, or customer support, Raygun can connect seamlessly with over 30 third-party integrations. No need to write custom software; issues you create and resolve in JIRA and Trello will be reflected in Raygun Crash Reporting. Assign a developer in Raygun and they’ll be notified in Slack. Set up a custom alert with your chosen threshold and conditions and trigger an alert in PagerDuty to the on-call team whenever the condition is met.

7. Monitor your checkout flow, prioritize issues, and get notified immediately about critical issues.

You can create an unlimited number of applications to monitor the different parts of your website and environments (e.g. Checkout -> JavaScript -> Production). Use this to highlight errors originating from a particular area, like your checkout process. Adding tags allows you to indicate both priority level (P1-P4) and at which point the error occurred (e.g. Step1ShoppingCart). Combined with Alerting, you’ll be notified immediately of critical issues coming from key areas of your application (for example, a PagerDuty or Slack alert when a P1 issue occurs within the checkout process for production).

You can also filter your errors by tags on the Crash Reporting dashboard.

All this helps promote and protect an exacting standard of experience in the most revenue-critical parts of your digital storefront.

“Anything checkout-related is going to be a fatal flaw. The customer not being able to add to cart, view the cart, edit the cart, or any other kind of interaction in the checkout process… In e-commerce, it's going to be a make-or-break situation. I mean, you don't trust a checkout process with your credit card details if they can't keep errors out of it.” - Erling Gudjohnsen, Attikk

8. Proactively triage and address customer issues

Errors and crashes are more than just numbers on a graph – they're real users running into issues with your software. Minimizing the time it takes to identify and resolve high-priority issues improves your customer experience, conversion rate, and revenue growth. Raygun enables teams to:

  • Prioritize fixing the highest-impact issues based on the tags and the number of users affected
  • Get deeper insights into the root cause of errors and crashes with the full stack trace that takes you to the exact line of code
  • Connect errors to your source-code repository using our link to source integration with GitHub, GitLab, and Bitbucket

9. Reduce your Mean Time to Resolution (MTTR)

Bring down MTTR with detailed, actionable diagnostics and increase your deployment frequency, helping your tech teams to meet or exceed the standards required by management or SLAs.
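If you want to track MTTR yourself, the calculation is simply the average time between detection and resolution across resolved incidents. The sketch below assumes incidents are available as (detected, resolved) timestamp pairs; the data and function name are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average of (resolved - detected) over resolved incidents, as a timedelta."""
    durations = [
        resolved - detected
        for detected, resolved in incidents
        if resolved is not None
    ]
    if not durations:
        raise ValueError("no resolved incidents")
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents as (detected_at, resolved_at); None means still open.
incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),   # 2 hours
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 18, 0)),  # 4 hours
    (datetime(2024, 1, 3, 8, 0), None),                          # open, ignored
]
print(mean_time_to_resolution(incidents))  # 3:00:00
```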

10. Run more efficient support with the visibility of detailed error reports.

Support staff and developers can replicate a major customer-facing issue without involving the end-user, and even reach out to let them know that the error is fixed, without relying on the customer to lodge an error report. Ensure the affected customer is happy before any negative feedback occurs, the relationship is damaged, or management gets involved. Save on support infrastructure, development time, and support resource costs. You can also lean on Raygun to automate actions that don’t need human attention, like error triage. Hyperfish saved USD$150,000 in support wages by automating error reporting for enterprise customers.

(Note that using user tracking is completely optional. If you prefer not to send PII, you can use a UUID instead; you’re in total control of what you send into your Raygun app.)
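One way to produce such an identifier is to assign each account a random UUID and keep the mapping on your own side. The sketch below is an assumption-laden illustration: the in-memory dictionary stands in for your own datastore, and `opaque_user_id` is a hypothetical helper, not part of any Raygun SDK.

```python
import uuid

# Hypothetical in-memory mapping; in practice this would live in your own database.
_opaque_ids = {}


def opaque_user_id(internal_user_key):
    """Return a stable random UUID for a user, creating one on first use.

    The UUID is random (version 4), so it reveals nothing about the user;
    only your own mapping can link it back to an account.
    """
    if internal_user_key not in _opaque_ids:
        _opaque_ids[internal_user_key] = str(uuid.uuid4())
    return _opaque_ids[internal_user_key]


identifier = opaque_user_id("customer-1842")
print(identifier)  # a random UUID string, the same on every later call for this user
```

Because the same user always maps to the same identifier, you can still count affected users and follow a session without sending names or email addresses.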

Part 2: Performance and Customer Experience

The issue: Slow pages cost you customers and sales

While often harder to detect than errors, slow-loading pages also detract from user experience for e-commerce sites, and often create cumulative damage as they may go undiscovered longer.

Why it occurs

When it comes to web performance, people are very, very impatient. 47% of users expect a maximum of 2 seconds loading time when they visit a site, and when these expectations aren’t met, the consequences are real.

• 57% of shoppers will leave a slow site to buy from a competitor

• As page load times go from one to five seconds, the probability that the customer will bounce increases by 90%

• A customer is three times more likely to make a purchase on the one-second page than the five-second page, and five times more likely than on a 10-second page

• 40% of shoppers will abandon a website that takes more than three seconds to load

• 21% abandon a purchase if checking out takes longer than one minute

• A 2-second delay in load time resulted in abandonment rates of up to 87%

COOK increased conversions by 7% after reducing page load time by 0.85 seconds, and Revelry boosted conversions by 30% after increasing site speed by 43%.

If your e-commerce site is making $100K per day, even a 1-second page delay could potentially cost $2.5M in lost sales every year. And even a slight increase in speed could mean big gains in revenue.
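The guide does not say how the $2.5M figure was derived. One plausible reconstruction, shown below, assumes that a 1-second delay reduces conversions by about 7%, a commonly quoted rule of thumb; that rate is an assumption added here, not a number from this guide, and you should substitute your own measurements.

```python
def annual_revenue_at_risk(daily_revenue, conversion_loss_rate, days_per_year=365):
    """Estimated yearly revenue lost if conversions fall by conversion_loss_rate."""
    return daily_revenue * days_per_year * conversion_loss_rate


# Assumption: a 1-second delay reduces conversions by about 7%.
loss = annual_revenue_at_risk(100_000, 0.07)
print(f"${loss:,.0f}")  # $2,555,000
```

Under that assumption, $100K per day is $36.5M per year, and 7% of that is roughly $2.5M.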

Performance is a feature, and one of the best ways to enrich the experience of interacting with your product. While it may not be as exciting as new functionality, better performance has a powerful cumulative effect on perception and success. If you let quality slide, even the most impressive range of features won’t be enough to save your user experience.

The issue: E-commerce struggles with Core Web Vitals

60-70% of all sites fail Google's Core Web Vitals, a set of search ranking criteria strongly associated with improved session lengths and conversions, and e-commerce sites are typically among the lowest-scoring. Conversely, when shopping sites do meet Core Web Vitals standards, users are 24% less likely to abandon the page mid-load.

Why it occurs

Core Web Vitals is a different way of scoring performance that focuses on the perceptions of the user, but teams who are focused on load speed or simply shipping features will likely fall short of Google’s standards. E-commerce is particularly prone to poor scores because its sites are media-rich, with pictures, video, and heavy third-party resources. Marketing and design teams want heavy websites with images, fonts, user tracking, and complex animations, and usually don’t take performance into account. Even when sites start out with good Core Web Vitals, regressions occur as more features and design elements are added.

The issue: Development work goes unseen by the broader business

Executives often don’t have time to scrutinize or decipher technical data, while technical teams lack the language to easily convey progress or connect their work to the broader concerns of the team. Marketers and customer-facing teams don’t have an accessible way to understand what’s happening with software quality or how this might affect their ongoing efforts to attract and retain business.

Why it occurs

Even while the overall goals are broadly the same across the business, everybody is pursuing and measuring them differently, and each team uses their own set of KPIs. Meanwhile, Engineering is constantly receiving new requests for features and fixes from different teams, with no easy way to justify putting high-impact work first.

9 ways you can solve this with Real User Monitoring

1. Get vastly more accurate and trustworthy data than synthetic monitoring.

Real User Monitoring doesn't rely on sampling or simulated user sets. There’s really no replacement for capturing real user interactions, and RUM tools track performance as experienced by your actual users on their devices. The accuracy and scope of real-user data helps development teams to quantify and improve the user experience of their software.

2. Replicate full user journeys through your code.

See details about a specific user and retrace their journey through your application. Understanding how each page loaded (along with a full waterfall chart) helps you diagnose the root cause of customer-facing performance issues and identify the elements responsible.

3. Set benchmarks and review performance of A/B tests.

Use the Compare feature to assess competing test sets and Google Tag Manager to understand the impact of optimizations or new deployments, proving the value of development work.

4. Create clear priorities on which performance issues to fix.

Find and prioritize strategic targets like your slowest and most requested pages. We’ve established that page load speed has a serious bearing on several essential business objectives — RUM gives teams precise speed metrics, pinpointing which areas are slow for your users, and which ones to target for the biggest and easiest improvements that will affect the highest portion of your customer base.
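One simple way to turn "slowest and most requested" into a ranked list is to weight each page's excess load time by its traffic. The sketch below is an assumption for illustration: the page data is made up, the 2-second target is a placeholder, and the scoring rule is just one reasonable heuristic, not a Raygun feature.

```python
# Hypothetical per-page RUM aggregates: (url, average load time in seconds, page views).
pages = [
    ("/", 1.8, 50_000),
    ("/checkout", 4.2, 8_000),
    ("/search", 3.1, 30_000),
    ("/about", 5.0, 200),
]


def prioritize(pages, target_seconds=2.0):
    """Rank pages by total seconds above target across all views, highest first."""
    scored = [
        (url, max(avg - target_seconds, 0.0) * views)
        for url, avg, views in pages
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


for url, excess in prioritize(pages):
    print(f"{url}: {excess:,.0f} excess seconds")
```

With this data, `/search` outranks `/checkout` and `/about` even though it is the fastest of the three slow pages, because far more users wait on it.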

5. Center the metrics that matter with first-class support for Core Web Vitals.

All three metrics are included in Raygun’s data reporting at the individual URL level, capturing user cohorts falling into the Good, Needs Improvement, and Poor categories. Core Web Vitals notifications are also available via Alerting, proactively telling you if your scores fall below the “Good” thresholds.

The level of detail captured by RUM is instrumental to identifying commonalities, setting priorities, and improving your scores, which in turn boosts Google search ranking. By improving Core Web Vitals on their home page, one company achieved a 10% increase in conversion rates.
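To make the three categories concrete, the sketch below classifies individual measurements using the boundaries Google has published for Largest Contentful Paint (LCP), Interaction to Next Paint (INP), and Cumulative Layout Shift (CLS). Older material lists First Input Delay in place of INP. The thresholds are quoted as understood at the time of writing and should be checked against Google's current documentation; the function and sample values are hypothetical.

```python
from collections import Counter

# (good_max, needs_improvement_max) per metric; values above the second are "Poor".
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless layout-shift score
}


def classify(metric, value):
    """Map one measurement to Good, Needs Improvement, or Poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "Good"
    if value <= needs_improvement_max:
        return "Needs Improvement"
    return "Poor"


# Hypothetical LCP samples (seconds) for one URL, summarized into cohorts.
lcp_samples = [1.9, 2.4, 3.2, 4.8, 2.1]
cohorts = Counter(classify("LCP", v) for v in lcp_samples)
print(dict(cohorts))  # {'Good': 3, 'Needs Improvement': 1, 'Poor': 1}
```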

6. Prevent high-profile headaches in retail apps.

Even world-leading retailers can fall victim to preventable performance issues, and we’ve all heard horror stories about outages caused by traffic surges on Black Friday – a $3 billion e-commerce day. In 2021, Office Depot went down for hours on Cyber Monday, while other massive brands like Walmart and GameStop also reported outages during this crucial shopping period. RUM surfaces the overall trends in your software performance over time to help prevent these worst-case scenarios, equipping your team to catch and resolve issues before they become a crisis. Tracking trends can act as an “early warning” system, establishing what your normal range is in order to catch any irregularities early. The Performance page is a great place to start identifying and prioritizing issues to fix.

7. Get alerts for changes in performance.

Using your RUM trend data, define a baseline for your “normal” standards of performance, and set strategic alerts to notify the right members of your team if anything goes outside these parameters.
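A baseline can be as simple as the mean of recent measurements with a tolerance band around it. The sketch below flags values more than three standard deviations from the historical mean; the band width, the sample history, and the function names are assumptions for illustration, and other definitions of "normal" (percentiles, for example) work just as well.

```python
from statistics import mean, stdev


def baseline_range(samples, k=3.0):
    """Return (low, high): the mean plus or minus k sample standard deviations."""
    m, s = mean(samples), stdev(samples)
    return m - k * s, m + k * s


def is_anomalous(value, samples, k=3.0):
    """True if value falls outside the baseline range derived from samples."""
    low, high = baseline_range(samples, k)
    return value < low or value > high


# Hypothetical daily median load times in seconds.
history = [2.0, 2.2, 1.9, 2.1, 2.0, 2.3, 1.8]
print(is_anomalous(2.2, history))  # False
print(is_anomalous(3.5, history))  # True
```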

8. See performance through the eyes of your customer.

User-centric metrics like Largest Contentful Paint (LCP) help developers understand how a visitor perceives the load speed of the page. This can answer questions like, “how long before the user sees the page is loading?” and “when did the user first see something on the page?”

9. Connect your support team, development team, and everybody else.

Support is a key component of customer experience. Raygun gives your team the tools they need to rapidly tackle the issues that inevitably slip through the net. You may not be able to prevent every error, but you can respond and resolve faster than ever before.

Set alerts to notify your team if loading speeds ever exceed 4 seconds (according to Google, 4 seconds and above is the threshold for poor performance). Use Real User Monitoring to zoom in on any spikes in load time, and to drill down to the pages where the worst performance outliers occur. From there, you can go even deeper, down to the slowest individual page assets or XHR calls.

What's Next

Monitoring with Raygun shows you exactly where to direct your attention, take action and see the results of your error and performance optimization. Our tools are built by developers, for developers, with a first-hand understanding of the answers technical teams need to move faster and get results. For a closer look at our product, you can view a full demo video, or explore our documentation for yourself on Crash Reporting and Real User Monitoring.

Get up and running

To get started improving the quality of your code, jump into a free 14-day trial and get Raygun into your production environment. It takes less than 15 minutes to get set up and start seeing real-time data, and you might be surprised what you'll find. Then, book a call with our expert team to get specialized guidance on monitoring for your unique tech stack, goals, and requirements.