Fight engineer overwhelm with effective exception monitoring


Customers want applications that solve real-world problems, executives are calling for engineers to do more with less, and competition is brutal at every level. Even as pressure mounts to ship new features faster, we can’t lose sight of the people and processes that are focused on quality, especially given that the elephant in the room, technical debt, currently stands at $1.52 trillion in the US alone, according to the most recent CISQ report.

What do you do when you’re being pressured to innovate quickly, maintain a rapid release cadence, uphold quality, and conduct maintenance?

One of the most fundamental ways to balance innovation, quality, and speed is to optimize how you prevent, catch, and address errors. Exception monitoring allows companies to spot issues prior to release and to quickly address the issues that do make it to production. The result is software teams that are more proactive and less reactive, and software systems that are more stable and resilient. When you boil it down to the sheer financial impact of poor practices, it’s pretty stark: well-known findings from IBM say that “defects found in testing are 15 times more costly than if they were found during the design phase and 2 times more than if found during implementation”. An error discovered during maintenance can cost up to 100 times more.

Push vs. Pull

There are two major trends that impact IT operations and developers. They are:

  1. Quality Assurance (QA) responsibilities being passed to front-end and back-end developers; and

  2. Autonomous delivery pipelines and analytics.

Previously, development teams staffed large QA groups to run functional tests prior to releases. In today’s development shops, quality is everyone’s responsibility. QA teams are called Quality Engineering (QE), and they build test case strategies and automation for end-to-end testing.

While developers are the first line of defence against bugs, they are also expected to practice test-driven development, run unit and initial functional tests, and, if given proper lab environments, conduct end-to-end testing on every commit across the entire multi-tier application. In many businesses, this organizational structure is already in place, but it should be the standard for the entire industry.

From orchestration to deployment to test runs, automation backed by a strong reporting and analytics engine is the only way to make this new organizational structure possible at all, and ideally extensible and sustainable. As an organization makes its transition from the push to the pull method, IT and development need advanced tools that provide information on problems, identify areas for improvement, and report on trends without draining critical development resources.

Culture, process, tools

In the simplest terms, DevOps comes down to shifting QA responsibility and autonomous delivery pipelines. However, there’s one more, equally critical element: culture.

Culture is not defined by what is popular in mainstream development, or by the handful of startups in Silicon Valley who use their own methods. DevOps culture is the deliberate prioritization of results, metrics and quality. If culture is not created consciously, it will develop organically, usually in ways that run contrary to the overall objectives of the company.

Considering the advancements we’ve made in processes and tools, the speed and quality of development output should increase. But in many cases, an increase does not occur because the development team is focused so narrowly on either upcoming releases or what’s currently in production. Developers cultivate an intentional tunnel vision so they are able to complete their tickets and address bugs efficiently. IT operations use their own tunnel vision to keep production running and identify potential issues. So while they intend to approach the delivery pipeline holistically, the “right now” demands often get in the way.

Tools are the easy part

Exception monitoring is something developers do every day. Try/catch blocks are in every application. (If not, there is a more serious problem!) In most systems, an exception block belongs to a specific developer on a specific branch of the application, and its results are used only in the Integrated Development Environment (IDE) and in production to prevent catastrophic failure. But exception blocks can be used far more proactively, both in production and across releases.

Adding one more line of code to each error block, one that captures the error data and pushes it into a modern analytics and reporting engine, allows developers to vastly improve exception monitoring, both for the individual and across the whole team. The analytics engine aggregates errors, including complete stack traces, across the entire application and pushes critical events and trends to the relevant team members. This lets developers address issues during continuous integration (CI), before they hit production, and respond before they cause greater problems.
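To make the “one more line” idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any particular vendor’s API: `report_exception`, `REPORTED_EVENTS`, `parse_quantity` and the release string are all hypothetical names, and the in-memory list stands in for whatever analytics engine you would actually send events to.

```python
import traceback
from datetime import datetime, timezone

# Hypothetical stand-in for a monitoring service's ingestion endpoint.
# A real integration would send the event over the network instead.
REPORTED_EVENTS = []


def report_exception(exc, release, sink=REPORTED_EVENTS):
    """Package an exception, with its full stack trace, and hand it to the sink."""
    event = {
        "type": type(exc).__name__,
        "message": str(exc),
        "stack_trace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        "release": release,  # lets the engine compare errors across releases
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    sink.append(event)
    return event


def parse_quantity(raw, release="1.4.0"):
    """An ordinary handler; the only change is the call to report_exception."""
    try:
        return int(raw)
    except ValueError as exc:
        report_exception(exc, release)  # the one extra line
        return 0  # existing fallback behaviour is unchanged
```

Calling `parse_quantity("twelve")` still returns the fallback value, but the error, its stack trace and the release it occurred in are now recorded centrally rather than staying on one developer’s machine.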

With just one extra line of code, tasks that developers used to perform manually are completed automatically. Since the tool tracks current and previous releases, critical errors are more easily surfaced. For example, if an error has repeated across multiple releases without a major impact on application quality, the system will list this error as a low priority.
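The release-aware prioritization described above could be sketched roughly as follows. This is a simplified illustration, not how any specific tool works: the event fields (`fingerprint`, `release`, `affected_users`), the `HIGH_IMPACT_USERS` threshold and the three priority labels are assumptions made for the example.

```python
from collections import defaultdict

# Assumed threshold: an error affecting at least this many users is "major impact".
HIGH_IMPACT_USERS = 100


def prioritize(events):
    """Assign a priority to each distinct error.

    `events` is a list of dicts with hypothetical keys:
    "fingerprint" (identifies the error), "release", and "affected_users".
    """
    releases_seen = defaultdict(set)
    worst_impact = defaultdict(int)
    for event in events:
        fp = event["fingerprint"]
        releases_seen[fp].add(event["release"])
        worst_impact[fp] = max(worst_impact[fp], event["affected_users"])

    priorities = {}
    for fp, releases in releases_seen.items():
        if worst_impact[fp] >= HIGH_IMPACT_USERS:
            priorities[fp] = "high"    # major impact, regardless of history
        elif len(releases) > 1:
            priorities[fp] = "low"     # recurring across releases, little impact
        else:
            priorities[fp] = "medium"  # seen in one release only; impact still unclear
    return priorities
```

An error seen in releases 1.3.0 and 1.4.0 that affected only a handful of users comes out as "low", while a new error affecting hundreds of users comes out as "high", which mirrors the example in the paragraph above.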

Moving faster, easing cognitive load, increasing innovation

The net result of automating exception monitoring is that:

  • Developers have less to think about;

  • Unit tests become simpler; and

  • Quality Assurance requires less effort and focus.

Implementing exception monitoring directly results in more developer time focused on adding new features, improving user interfaces and addressing customer concerns. Removing the pressure of the release-to-release mentality lets the team exist in a more relaxed environment that promotes creativity and drives innovation — the exact type of environment that will result in better engineering, with the ability to prioritize quality and encourage curiosity.

The upshot

No tool can produce innovation, but automation tools can give developers the time to be creative and produce new and original technology. As you explore this opportunity, identifying automation that easily integrates into your current processes without additional effort is critical.

Exception monitoring is the perfect solution. It provides a historical, holistic view of application quality by pushing critical events directly to the team and by prioritizing issues based on how previous errors have affected application quality.

Since developers are trained to create exception handling in their code, it’s an appropriate place to implement this simple line of new code that takes exception blocks and expands their results beyond individual developers and individual releases. Exception monitoring is the easiest way to support innovation without additional effort, direct more of your developer time to proactively coding and spend less time reacting to software issues.