Python is a fantastic tool to know, and despite being a beginner I find myself using it more and more for everyday tasks. Bash is great for knocking together quick scripts, but when you want to do something a little more complex, such as interfacing with APIs or other systems over a network, you really need a more fully-featured programming language.
The topic of this post, however, is the kind of task that bash is perfect for. Thanks to mysqldump, a database backup script can be written in a few lines and dump/restores are easily automated. So why on earth would we do this in Python?
Because we want to learn. :)
The scenario that this grew out of is a small army of software developers who frequently want to copy databases from the live production environment to staging. We don’t want to do this automatically, because the devs need to make schema changes on staging for testing and we don’t want to overwrite those. We don’t want to do it manually, because the requests are frequent. So the mid-point is a simple app with pre-defined jobs which can be executed with the click of a button.
The tool that does the heavy lifting is mysqldump. While it would be nice to do everything the “Pythonic” way and not use any external system calls, the fact is that we’d be reinventing the wheel. mysqldump is a stable and mature tool, and if you think you can do a better job you’re either a genius or mad. Although some people still try.
I won’t detail the entire app, as it’s pretty unpolished at the moment and has some massive holes (if anyone is interested I may look at publishing the source on GitHub later, subject to employer approval). But if you want to call mysqldump from a Python script, there are some things to watch out for.
My first implementation went as follows:
log("Exporting database '%s' from '%s'" % (sdb,shost)) # sdb is source database try: subprocess.check_call( "mysqldump -u %s -p%s -h %s -e --opt --max_allowed_packet=512M -c %s | gzip -c > %s" % (username, password, shost, sdb, tmpfile), shell=True) log("OK") except subprocess.CalledProcessError as e: log("Error: mysqldump ended with status %s, check DB credentials" % e.returncode) exit(1)
More experienced programmers may already have spotted the problem. If the process called by subprocess.check_call returns anything other than zero, a CalledProcessError exception is raised, which I catch. But even if mysqldump exits with a non-zero status, check_call will still see zero, because only the exit status of the last command in the pipe is returned, and that’s the gzip process. Some errors do cause gzip to exit with a non-zero status: if you specify a database that doesn’t exist, for example, gzip exits with a broken pipe and the exception is raised. But if you enter an incorrect username or password, or get access denied for some reason, the pipe is apparently never opened and gzip exits with status 0 (don’t ask me why!).
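You can see the underlying behaviour with a trivial pipeline (a hedged illustration; "false" just stands in for a failing mysqldump):

import subprocess

# The shell reports only the exit status of the last command in the pipe,
# so this prints 0 even though "false" always fails, just as a failing
# mysqldump can be masked by gzip exiting cleanly.
status = subprocess.call("false | gzip -c > /dev/null", shell=True)
print(status)  # 0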
There are a couple of simple solutions.
If you’re using Python 2.7+ you can use subprocess.check_output and examine the output that way, but I’m unfortunate enough to be on Python 2.6 on Debian 6.0.
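For what it’s worth, on 2.7 it might look something like the sketch below (untested, reusing the same variable names; note that it buffers the whole dump in memory, so it only suits smaller databases):

import gzip
import subprocess

# check_output raises CalledProcessError if mysqldump itself exits non-zero,
# so bad credentials or a missing database are caught straight away.
dump = subprocess.check_output(
    ["mysqldump", "-u", username, "-p" + password, "-h", shost,
     "-e", "--opt", "--max_allowed_packet=512M", "-c", sdb])
with gzip.open(tmpfile, "wb") as f:
    f.write(dump)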
In bash you can use the ${PIPESTATUS[n]} array to get the exit status of the nth command in the pipeline (indexed from zero), so a quick fix might be to append “; exit ${PIPESTATUS[0]}” to the end of the command to propagate mysqldump’s own status (disclaimer: I haven’t tested this, and it may not work as Python calls /bin/sh rather than bash). But why pass up the opportunity to learn how to pipe in Python!
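If you did want to go that route, you would also need to point subprocess at bash, since PIPESTATUS is a bash feature. Roughly (again untested, same variables as before):

import subprocess

cmd = ("mysqldump -u %s -p%s -h %s -e --opt --max_allowed_packet=512M -c %s"
       " | gzip -c > %s; exit ${PIPESTATUS[0]}"
       % (username, password, shost, sdb, tmpfile))
# executable= swaps the default /bin/sh for bash, so ${PIPESTATUS[0]}
# (mysqldump's exit status) is available to the final exit.
subprocess.check_call(cmd, shell=True, executable="/bin/bash")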
Here’s what I came up with:
try:
    p1 = subprocess.Popen(
        "mysqldump -u %s -p%s -h %s -e --opt --max_allowed_packet=512M -c %s"
        % (username, password, shost, sdb),
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
    p2 = subprocess.Popen("gzip -c > %s" % (tmpfile), stdin=p1.stdout, shell=True)
    p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
    output = p1.stderr.read()
    p2.wait()  # Make sure gzip has finished writing the file.
    if output == '':
        log("OK")
    else:
        log(output)
        exit(1)
except subprocess.CalledProcessError as e:
    log("Error: process exited with status %s" % e.returncode)
A quick explanation (for complete documentation see subprocess.Popen):
p1 and p2 are the processes that we want to connect with a pipe. p1’s stdout (the data from mysqldump) is directed to a pipe (subprocess.PIPE), and the stdin of p2 (gzip) is connected to it. Then we read the stderr output of p1, which will be empty if the operation succeeds and an error message if the dump fails for any reason, and finally wait for p2 so the gzipped file is fully written before we carry on.
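The same pattern works in the other direction when restoring onto staging. A rough sketch, where dhost and ddb (destination host and database) are hypothetical names, not from the code above:

# Stream the gzipped dump back into mysql on the destination server.
p1 = subprocess.Popen(["gunzip", "-c", tmpfile], stdout=subprocess.PIPE)
p2 = subprocess.Popen(
    "mysql -u %s -p%s -h %s %s" % (username, password, dhost, ddb),
    stdin=p1.stdout, stderr=subprocess.PIPE, shell=True)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
errors = p2.stderr.read()
p2.wait()
if errors:
    log(errors)
    exit(1)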
It would have been nice to use Python’s built-in gzip library, but apparently it’s not that simple because the gzip file object doesn’t implement all the methods required by Popen. All the advice I could find said to call gzip externally when dealing with an IO stream like this.
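If you really wanted to avoid the external gzip, one workaround would be to shovel the dump through the gzip module by hand, reading p1’s stdout in chunks. A sketch only, since every byte passes through Python rather than being piped between two native processes:

import gzip
import subprocess

p1 = subprocess.Popen(
    ["mysqldump", "-u", username, "-p" + password, "-h", shost,
     "-e", "--opt", "--max_allowed_packet=512M", "-c", sdb],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# GzipFile can't be plugged straight into Popen, but we can copy the
# stream into it ourselves in fixed-size chunks.
out = gzip.open(tmpfile, "wb")
while True:
    chunk = p1.stdout.read(64 * 1024)
    if not chunk:
        break
    out.write(chunk)
out.close()

errors = p1.stderr.read()
p1.wait()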
All in all, doing the pipe within Python turned out to be just a few lines longer, and it’s much more flexible than calling the whole pipeline through subprocess.check_call or check_output. Note that these code samples are obviously not complete scripts; remember to import subprocess!