software solutions / project leadership / agile coaching and training

Don’t break stuff

Posted on September 23, 2016

When you change code, how do you know that you’re not breaking something? If the code change is small, it can be pretty easy (especially if you have tests). I’m talking about more complicated code changes.

Sure, you have unit tests, but how do you know that everything in the system works together correctly?

Sure, you have integration tests in your code, but how do you know that all of the systems involved are working together?

Sure, you have end to end acceptance tests for all of the scenarios that you know about, but what if a scenario exists that you didn’t think of?

We can come up with any number of excuses for why bugs happen in production. But what happens when a bug could cause a big problem?

I’m in this situation right now because the relatively large code change I’m implementing could impact how much money we charge our customers. The code I’m writing takes in some input files from a third party and processes them. We could have a big problem if someone gets overcharged or undercharged, or even worse, not at all. There are lots of things that can go wrong. I have a lot of good acceptance tests for my code, so I have good test coverage for the scenarios that I know. What I worry about is the scenarios that I don’t know, or what could cause my code to not get called at all. Because of this, I have to think past the traditional ways that I usually test things.

Searching for test cases

I did a lot of analysis on the data that my code is going to consume. I even wrote some small apps and SQL queries that will parse historical data and look for different combinations of data, and used that to come up with the test cases. What’s tricky is that some of the scenarios do not happen very often, so that means I need to do more digging to find them. At some point, this can become tedious, but in my case it’s worth it.

Watching for unexpected scenarios

Thanks to my data analysis, I have a list of expected scenarios that I’ve seen from past data. In addition to writing tests for all of these scenarios, I’m also writing some queries that will check for evidence that some unexpected scenario happened in production, and I’m running these queries every day. By doing this, I’ll be aware of any potential changes in how the source data comes in, and I also avoid having to write acceptance tests for scenarios that probably aren’t going to happen.

Throw exceptions when you find unexpected scenarios

Not only do I have queries to check for unexpected scenarios, I also have inline checks that will cause the process to stop and throw an exception if I encounter specific unexpected scenarios that could cause an issue, especially when the issue would not otherwise be obvious (in other words, it wouldn’t cause the system to fail, but the system might give me incorrect results). In many cases, it’s a waste of time to discuss, implement, and test some edge cases that is very unlikely to ever happen, but I at least want someone to know if by chance that happens. Now if you’re writing a UI for a website, you might not have the luxury of being able to do this, but when you’re just processing backend files, you can get away with things like this.

Parallel testing against production data

We have a test environment set up with 2 databases – one has 2 day old production data and one has 1 day old production data. We restore these databases from backups every morning and then run in all of the files from 2 days ago into the 2 day old environment, then we compare the results with what’s in the 1 day old database, and theoretically everything should match. There are always exceptions, especially when you have data that was modified by a user after it was created in the database, but this allows me to easily check almost every column on a database table and see that the results from my process match what is currently in production. This has been a huge lifesaver, not only in finding bugs in my process, but for finding unknown scenarios, and for finding bugs in other related processes as well. People have a lot more confidence in your work when you’re able to show tangible proof that your system is working the same as what’s in production.

Writing audit queries that validate the process end to end

In addition to the queries that check for unexpected data, I also have queries that validate that any data received at the beginning of the process will have some resulting output. I need to make sure that the process isn’t silently failing or is failing in a way that I don’t expect.

Checking logs

Hopefully your app logs exceptions somewhere, so as a last resort you should always check for any fallout in the logs.

A different thought process

I find this way of thinking to be a different thought process when it comes to testing. I’m moving past the mechanics of testing (what are my test cases, what tests am I going to write, how can I mock this class, etc.), and trying to find whatever means necessary to not break stuff. Sometimes this is done with automated tests, manual tests, inline sanity checks, writing exploratory code to help me discover scenarios, or some form of production monitoring. The latter three are where I’ve found I’m doing things in a way that I haven’t always done before.

I think the shift in mindset partly comes from owning the responsibility for a feature. At one point a long time ago, I wanted to get my code to work and let QA test it and find bugs. Then I moved to wanting my code to work and wanting to come up with a way to test it. The next step in the progression for me is finding ways to be able to make large changes to the code, not break stuff, and ensure that everything will work with no impact to the business. In each step in the progression, I’m starting to look at the problem at a larger scale and looking at business value and business impact instead of just technical concerns.

4 Comments »

  1. [...] Don’t Break Stuff [...]

    Professional Development – 10/03/16 – 10/09/16 – The Software Mentor — October 10, 2016 @ 8:55 am

  2. So…did it break?

    Surgeon — October 12, 2016 @ 2:07 pm

  3. Haha, so far so good!

    Jon Kruger — October 12, 2016 @ 8:58 pm

  4. [...] Don’t Break Stuff (via The Software Mentor) [...]

    Professional Development – 2016 – Week 41 – Geoff Mazeroff — October 17, 2016 @ 12:04 am

Leave a comment





SERVICES
SOFTWARE SOLUTIONS
I have over 15 years of software development experience on several different platforms (.NET, Ruby, JavaScript, SQL Server, and more). I recognize that software is expensive, so I'm always trying to find ways to speed up the software development process, but at the same time remembering that high quality is essential to building software that stands the test of time.
PROJECT LEADERSHIP
I have experience leading and architecting large Agile software projects and coordinating all aspects of a project's lifecycle. Whether you're looking for technical expertise or someone to lead all aspects of an Agile project, I have proven experience from multiple projects in different environments that can help make your project a success.
PROCESS COACHING
Every team and every situation is different, and I believe that processes and tools should be applied with common sense. I've spent the last 10+ years working on projects using Agile and Lean concepts in many different environments, both in leadership roles and as a practitioner doing the work. I can help you develop a process that works best in your organization, not just apply a prescriptive process.
Have any questions? Contact me for more information.
PRESENTATIONS
From Stir Trek 2017
Iteration Management - Your Key to Predictable Delivery
From Stir Trek 2016 and QA or the Highway 2015
From CodeMash 2016, QA or the Highway 2014, Stir Trek 2012
The Business of You: 10 Steps For Running Your Career Like a Business
From CodeMash 2015, Stir Trek 2014, CONDG 2012
From Stir Trek 2013, DogFoodCon 2013
(presented with Brandon Childers, Chris Hoover, Laurel Odronic, and Lan Bloch from IGS Energy) from Path to Agility 2012
From CodeMash 2012 and 2013
(presented with Paul Bahler and Kevin Chivington from IGS Energy)
From CodeMash 2011
An idea of how to make JavaScript testable, presented at Stir Trek 2011. The world of JavaScript frameworks has changed greatly since then, but I still agree with the concepts.
A description of how test-driven development works along with some hands-on examples.
From CodeMash 2010
From CodeMash 2010