My second favorite method of learning is to learn from my mistakes. Or as I often put it, learn from your mistakes so that the next time you can make a new one, which isn’t quite as boring as repeating the same mistake over and over again.
But even better than learning from my mistakes is learning from yours. No, not you specifically. Everyone else. Because while learning from mistakes is useful, making mistakes isn’t fun at all. If someone else already made a mistake, the why should I have to repeat that?
And apparently I am not alone in this. For the July 2020 edition of T-SQL Tuesday, Kerry Tyler (b|t) invites us all to share our mistakes so that he can learn from them without having to make them himself. And by extension, so can you.
Here’s my contribution. It’s very short, and very basic. Let’s just say that I hardly ever make any mistakes. (And if you believe that, I’ve got a bridge to sell to you).
The unusual restore
I typically work as query tuning expert and as database developer. I try to avoid pure DBA roles. But for smaller customers, I often am the only one capable of spelling SQL, so I end up taking on the DBA activities as well.
Even for a full time specialized production DBA, restoring a database is a task not often done. For me it is even more rare. But I do know that, when it’s needed, there’s a lot of pressure. So I have a step by step playbook prepared, and scripts for tasks that can be scripted. All to minimize the chance of mistakes.
And then I got an unusual request. A bug fix had to be tested, but due to the nature of the bug that required the test database to first be replaced by a recent production backup; however another project had been preparing test cases in that test database. So I was asked to first restore a recent production backup to the test database, then wait for the bugfix test to be completed, and then restore the test database back to the point in time before I started the first restore.
Complex? Yes, a bit. But not too hard, and nothing I can’t do. But it’s sufficiently different from what my playbook (which is specifically targeted to restoring production after an emergency) covers, so I never even looked there. Instead, I first scripted all the actions needed to restore the test database to a copy of itself at a point in time to a few minutes before I built the script. I then, in a second window, scripted the actions needed to restore the test database to the most recent full backup of production. I checked, double checked, and triple checked both scripts, verified that I could start, and then kicked off the script to restore test from the last full production backup. And, when done, told the developers they could run their tests.
Two hours later the tests were done; the bug fix could be deployed to production, and I got the go signal to revert the test database to its original state. I kicked off the other script, monitored its progress, and when done shut down my computer after a productive day.
Job well done?
Well, no. I should have consulted the playbook. Even though I could not straight on use it, I should have checked.
If I had consulted my playbook, I would have seen the big warning, in red and extra-large font: After a restore, do not forget to enable Service Broker.
So I got a call the next day. BizTalk was throwing errors in the test environment. All requests were rejected wit a message saying that Service Broker was not enabled.
The fix was easy. Good thing it happened in test, where nobody is really hurt when I need to kick everyone out to get an exclusive lock (which, apparently, is needed to enable Service Broker … don’t ask). In production, once I tell the organization that the database is back online, there’s no way to get a exclusive lock without causing another disruption.
So, what have I learned? Multiple things already.
First: No matter how often I forget to enable Service Broker after a restore, I keep forgetting that this is needed. Apparently I am not as good at learning from my mistakes as I like to think.
And second: A playbook is more than a script. In fact, even a script is more than a script, since it contains the fixes for all my previous mistakes. If I have a script, or a playbook, for a task, then always consult it. Even if the specific task is different this time and the script cannot be used, I should still bring it up, read it, and see if there’s anything there that I need to do in the current situation as well.