MySQL replication, how the slave saved the master

I spent a good deal of last night repairing a MySQL master-slave setup for a customer of ours, running a fairly large website. One of their developers made the usual mistake of running a query much like “UPDATE table SET x = ‘This is a test’” and forgot the WHERE clause. This wouldn’t have been a big deal if it had been issued on their development environment, but this was on production.

Since this was a table with close to 60 million records, it took a while to execute on the slave, so we were informed in time and could prevent the query from propagating to the slave by killing it, forcing a rollback and stopping the slave replication. Luckily, this saved us the hassle of restoring from backups. We successfully dumped the table on the slave and updated the master again with the correct entries still on the slave using “SET SESSION SQL_LOG_BIN=0″. The master was saved and in good shape.

However, this left the slave without replication, and our worst nightmare came true. The Query Of Death was executed more than once on the master and thus was all over the binlogs that would be replicated. This meant that it was not as simple as skipping the next query on the slave and start replication again from there. The only thing I could think of was replaying the binlogs on the slave with the –safe-updates argument passed to MySQL, but that would take hours. We consulted MySQL to see if there would be an easier way, unfortunately it turns out there was not. So there I was, painfully executing log by log, anxiously monitoring every query that was executed. Finally at 4 o’clock everything was back up and running. The best thing? The website’s users didn’t notice a thing, apart from seeing ”This is a test” in some places for a while. Since the slave is used for full-text indexing with Sphinx as well, the searches were a bit off too, but nothing major.

I just wish there would have been an easier way… A query blacklist? Something?
If anyone has tips on how this could have been solved in an elegant way, please do tell. :)

Explore posts in the same categories: Software

2 Comments on “MySQL replication, how the slave saved the master”

  1. Ghosty Says:

    Tip: do not let developers run untested queries on production? Seems to me this should not have happened in the first place …

  2. Gerry Says:

    Unfortunately we’re not involved in the development part of the project. Of course, it would never come to my mind to let a developer execute such queries on production, so yes, I agree, it is unforgivable that this even happened.

Comment: