I recently faced a tough technical problem at work. We had an issue with a custom MySQL plugin that caused a crash during reboots. The problem was introduced with a major change in our build system and development environment. It was kind of a big deal and a major blocker for a couple of weeks, during which we didn't know the solution, or even whether the solution was in our hands. We would have had to roll back a lot of work if we didn't find the cause of the problem, so a lot of people got nervous. But we finally found it: we had introduced a bug in our build scripts (more details).
Once it was solved, the QA people asked for a reboot test in our regression suite to prevent similar problems in the future, and immediately set about creating the test and including it in our regression suite.
I thought that was a bad idea: it would make our regression suite take longer to run. This is the test suite we use to validate every change to the codebase, and it is already quite bloated. We are struggling to keep its coverage while reducing the time it takes to run. Shortening the feedback loop is a priority for us. And now we would have a single test case rebooting our system, adding around 5-10 minutes to every run.
Then I wondered why it had taken us so long to find the bug in the build scripts for MySQL plugins. We would have found the source of the problem much faster if we had properly compared the dynamic linkage dependencies of two versions of the plugin binary: one compiled with the former build scripts and one compiled with the new ones. We did in fact perform that check, but only by visually inspecting the output of the ldd command, and we had around 50 items in both lists that looked the same at first sight.
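The comparison we did by eye could have been left to a script. What follows is only a minimal sketch in Python, not what we actually ran: it assumes the usual `name => path (address)` layout of ldd output, and the function names (`parse_ldd`, `diff_deps`, `run_ldd`) are made up for illustration. It ignores the load addresses, which change from run to run, and reports only the libraries that were removed, added, or resolved to a different path.

```python
import re
import subprocess
import sys


def parse_ldd(output):
    """Map each library name in ldd output text to its resolved path (or None)."""
    deps = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the trailing load address, e.g. "(0x00007f0a1c000000)".
        line = re.sub(r"\s*\(0x[0-9a-fA-F]+\)$", "", line)
        if "=>" in line:
            name, _, path = line.partition("=>")
            deps[name.strip()] = path.strip() or None
        else:
            # Entries without "=>", such as the vDSO or the dynamic loader.
            deps[line] = None
    return deps


def diff_deps(old_output, new_output):
    """Return (removed, added, moved) library names between two ldd outputs."""
    old, new = parse_ldd(old_output), parse_ldd(new_output)
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    # Same library name, but resolved to a different file on disk.
    moved = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return removed, added, moved


def run_ldd(binary_path):
    """Run ldd on a binary and return its text output (requires ldd on PATH)."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python lddiff.py OLD_PLUGIN.so NEW_PLUGIN.so
    removed, added, moved = diff_deps(run_ldd(sys.argv[1]), run_ldd(sys.argv[2]))
    print("removed:", removed)
    print("added:  ", added)
    print("moved:  ", moved)
```

With 50 entries on each side, three short lists like these are much harder to overlook than two long listings that look the same.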
The thing is, our plugin actually needed just 10 of those 50 dynamic linkage dependencies, and it would have been far easier to locate the offending libraries in a list of 10 items. So there was a lot of noise in the ldd output, which prevented us from spotting potential errors. It's similar to what happens when you don't care about compiler warnings and your compilation output fills up with hundreds or thousands of warning messages that no one ever reads. Eventually you will find a bug after several days or weeks of debugging and testing, and then realize that there was a warning message for the very line of code you changed to fix it.
So quality assurance (QA) is not just bloating your regression test suite with one more test case every time a new, unforeseen scenario pops up. QA is also keeping the house clean and quiet so that there is little room for bugs to hide. And that, my friends, is something beyond the scope of testing.