Responsible engineering prevents costly failures in a scaling world
Posted Sep 3, 2024 | 9 min. (1907 words)Here at Raygun, we are committed to providing awesome digital experiences. Technology can transform the digital world, just like the physical one, for the people’s benefit. We believe responsible engineering drives this transformation, which we summarize as follows:
- Quality: Following good engineering principles.
- Autonomy: Giving engineers autonomy to design the best possible solutions.
- Support: Expertly supporting engineers and their teams, especially when incidents occur.
In this article, we demonstrate how these core principles—which we all share at Raygun—have informed the design of our tools. Software tools enable engineers to assure the quality of their evolving products, and ultimately support seamless digital experiences for end users.
Going beyond Raygun, any organization that operates in the engineering domain should also be committed to these core values!
Assume nothing, measure everything.
As your company scales, so do the complexities and challenges of maintaining software systems. What once worked smoothly in a smaller setting begins to creak under the weight of growth. As your delivery velocity increases, critical errors that were once rare become frequent, and the risk of catastrophic production failures increases exponentially with each new deployment. These failures lead to lost revenue, eroded end-user trust, and wasted resources. With the current economic shifts, these issues aren’t just inconvenient—they’re existential threats for a growing company!
Additionally, this expanding complexity introduces technical challenges and a psychological toll on your engineering team. With each new deployment, the anxiety of potential failures increases. Will the system stay up during peak traffic? Will a critical error slip through the cracks? Can I sift through tons of data or have the mental composure during an incident to navigate a UI that does not highlight actionable insights?
The pressure mounts as the stakes increase, leaving little room for error. However, what compounds this stress is the realization that software-intensive systems have become so complex that no single person or organization can effectively manage every aspect alone. When issues arise, timely, personal support is crucial to resolve a crisis. Imagine waking up in the middle of the night to a critical system failure, only to find your monitoring tools don’t provide the insights you need!
Adding to this burden is the reality that many error monitoring tools limit engineers with cost-based restrictions. These limitations can lead to engineers making difficult decisions under pressure, like disabling error monitoring to avoid unexpected charges. This dangerous compromise forces engineers to cut corners precisely when they need comprehensive visibility and diagnostic capabilities the most. When cost concerns dictate engineering decisions, the risk of catastrophic failures increases, and the stress on the team escalates to unsustainable levels. In other words, to avoid bad outcomes, engineers need autonomy to make informed decisions with proper support and follow established processes.
The value of responsible engineering
Engineers have overcome some of the most significant challenges by leveraging systematic processes—foundational principles for creating functional, resilient, and scalable systems. Software standards like the Software Engineering Body of Knowledge (SWEBoK) and frameworks like the Washington Accord embed these principles, emphasizing closed-loop software quality assurance and human factors in user interfaces for engineering tools.
Monitoring feedback loops are a crucial component of the software quality assurance process, especially during software maintenance and operation. These loops provide continuous, real-time insights into system performance, allowing for the early detection of anomalies, inefficiencies, and potential failures. As a responsible engineer, you must design for the systematic collection and analysis of data from various system components: feedback loops help ensure that the software consistently meets established quality standards. Further, this allows for effective defect management, performance evaluation, and compliance monitoring, enabling your team to proactively address issues before they escalate.
Beyond just software, consider the case of NASA’s Apollo 13 mission. When an oxygen tank exploded mid-mission, the spacecraft’s survival hinged on the team’s ability to monitor the situation, diagnose the problem, and quickly determine the root cause. The Earth-based mission control constantly supported the astronauts, where experts collaborated intensively to troubleshoot the issues in real-time. Their success was a triumph of ingenuity and disciplined, responsible engineering practices backed by unwavering support. Imagine an alternative scenario where the crew’s error detection system—let’s call it Sentry—asked for their credit card information and then put them on hold for a week before assisting them.
This image was created with the assistance of DALL-E 3
Similarly, the Three Mile Island nuclear incident demonstrated the critical importance of real-time monitoring and diagnosing. When the reactor malfunctioned, the initial failure to correctly diagnose the problem almost led to a catastrophic meltdown. Diligent monitoring, effective root cause analysis, and a coordinated team response averted the disaster. This incident reinforced the need for robust error detection and resolution systems to address minor issues before they escalate into a full-blown crisis. Now, imagine the engineers were also dealing with cost constraints on their diagnostic systems—let’s say Sentry sent an alarm. Still, budget limitations forced the engineers to leave out the temperature readings, making them afraid of getting told off.
This image was created with the assistance of DALL-E 3
The Raygun philosophy: we champion responsible software engineering!
At Raygun, we believe that responsible engineering is the foundation of success for any growing business. We are passionate about empowering development teams to uphold the highest quality standards, no matter how quickly they scale. Our philosophy is rooted in three core principles, which together represent our view of responsible software engineering:
-
Adherence to software quality processes: We emphasize the importance of systematic engineering processes to ensure your software is resilient, scalable, and built to withstand the growth pressures. As one of Raygun’s leaders, I give back to the engineering community by participating in Engineering degree accreditation panels for New Zealand universities and maintain my Chartered Professional Engineer status with Engineering New Zealand.
-
Autonomy for engineers: We believe that engineers should operate in an environment where they can make the best possible decisions without undue stress or cost pressures. Engineering teams should have the autonomy to act decisively, with the confidence that their tools will support them fully, without limitations. Even the inspiration for Raygun Crash Reporting came from an engineer who, back in the day, took the initiative to have all customer issues sent directly to his email, ensuring that the pending, unread messages compelled immediate action.
-
Unwavering support for engineers: We understand that only some people or organizations can manage the complexities of software-intensive systems alone. That’s why we offer you excellent, personal customer support provided by one of our own engineers, a fellow engineer. In times of crisis, this support ensures your team can always handle challenges. But even before an issue occurs, we are upfront about service costs, and we dislike the dishonesty of competing platforms that start from a seemingly low price and constantly try to upsell you.
With the advent of AI, our view of responsible engineering includes NOT training AI (e.g., large language models) on end-user and customer data. You can read our AI Principles page on the stance we are taking with regards to AI.
We were shocked when Sentry updated its terms of service to use data to train AI with no opt-out (though they did later retract due to complaints).
The Raygun solution: Empowering teams to detect and proactively resolve critical errors
To bring this philosophy to life, we’ve developed a tool suite that empowers your team to stay ahead of potential issues, ensuring they can work with the confidence that comes from having the right tools and support. Our tools don’t just report or log errors; they offer a comprehensive view tying together information from the back end to an end user’s front end—especially when Real User Monitoring (RUM) is enabled.
Underpinned by frequent user studies, customer feedback, and self-usage (yes, Raygun uses Raygun!), we offer a user-friendly UI that empowers developers to efficiently manage and resolve errors, especially in the complex landscape of contemporary software projects with multiple interconnected services. Our solutions prioritize an intuitive interface that reduces distractions and streamlines the workflow, allowing developers to focus on what truly matters. By clearly organizing error presentations and segmenting them by different applications, we make it easy for developers to navigate to the most relevant issues quickly, find actionable insights, and address errors without friction. This design enhances productivity and ensures developers maintain clarity and control, leading to faster resolutions and more reliable software.
We recognize that as your company grows, so does your development team. That’s why we offer unlimited developer seats—because licensing constraints should never limit quality engineering. Every team member can contribute to maintaining your software’s integrity, ensuring that nothing slips through the cracks. This feature supports large, distributed teams, ensuring there are always enough hands on deck to tackle issues quickly and effectively, no matter when or where they arise.
Additionally, by offering the add-on of capped costs, we eliminate the fear of unexpected charges, allowing engineers to make the best possible decisions without hesitation. This financial certainty, combined with our platform’s robust feature set, ensures that we, as professional and responsible engineers, have the autonomy to focus on quality without distractions or compromises.
Crucially, excellent support should complement the power of our solution. That’s why we offer personalized customer support from our engineers—professionals who understand your team’s challenges and can provide the timely, expert assistance you need. Our engineers’ expertise ensures that you and your team are never alone, even in the most critical moments.
Our Crash Reporting’s three unique features
Best-in-class error-grouping: We leverage—and continuously adapt—advanced algorithms to automatically group errors by their root cause, allowing your team to focus on what matters most. Instead of being overwhelmed by a flood of error reports, your team can quickly identify the true source of the problem and address it before it spirals out of control. This capability prevents costly failures and reduces the stress of managing complex error logs, ensuring your team can diagnose issues efficiently, even under pressure.
Unlimited symbols and attachments: A stack trace alone doesn’t always cut it. Raygun lets you add unlimited attachments to errors, giving your team all the necessary diagnostic power—without extra costs. We also provide unlimited symbols to ensure your stack traces stay readable and actionable, even when your code is optimized or obfuscated. Plus, for JavaScript developers, we offer a source map validator to help you check your source maps before uploading or linking them to Raygun.
AI-powered error resolution with collaborative chat: Complex problems require complex solutions. Our AI-powered error resolution system identifies potential fixes and facilitates collaboration between team members through an integrated chat platform. AI insights and team communication enable your team to resolve every error quickly and effectively, combining the strengths of both human and machine intelligence.
Build the future on a foundation of responsible engineering: quality, autonomy & support.
If you’re ready to prevent costly production failures and build a future-proof system that can confidently scale, it’s time to embrace responsible engineering. Join us in championing a proactive approach to software quality.
Empower your team to detect and resolve critical errors before they become business crises, and give yourself the tools and support you need to reduce the stress and uncertainty of managing complex systems.
Experience the power of good engineering firsthand—try Raygun free for 14 days and take the first step toward uncompromised quality. When you build with responsibility, you build something that lasts—and when your team is equipped to handle anything, they can work with the confidence and peace of mind they deserve.