Zero-downtime deploys with Gunicorn and virtualenv
tl;dr
Not interested in the gory details? Skip to the Implementation section below, and the docstring should tell you everything you need to know.
Abstract
Gunicorn supports code reloading via signals, allowing us to update a running web application without interrupting our users. In practice, an app might read from the filesystem during a reload (for example, for HTML templates or module imports), so the code update in the filesystem should be atomic.
Symlinks provide a simple mechanism for this. However, many of the Python standard library functions that Gunicorn employs, following POSIX, resolve symlinks to absolute paths. This creates some challenges with a symlink-based approach:
- Gunicorn stores its working directory at startup time, effectively resolving the symlink once at startup, rather than every time we reload.
- Symlinking our virtualenvs means that even if Gunicorn resolves the symlink on reload, it’s possible to have it run new code in the old virtualenv.
- If the application Gunicorn runs sees its working directory as the symlink, and updating the symlink is not atomic with the reload, the old app version could accidentally read the new version’s HTML templates or modules (for instance).
This document outlines our approach to these challenges, using Gunicorn’s own hooks to implement simple atomic reloads with symlinks.
Background: our deploy setup
At Wingu, we use Gunicorn behind Nginx to serve a continuously deployed Flask application, with dependencies installed in a virtualenv. Our directory structure looks basically as follows:
# Deploy directories containing code:
app/deploys/version-1
app/deploys/version-2
# Virtualenvs with dependencies (Gunicorn, Flask, etc.):
app/environments/environment-a
app/environments/environment-b
# Symlinks for the current running version:
wsgi/current -> app/deploys/version-2
wsgi/current/env -> app/environments/environment-b
# Symlinks for the previous version (for quick rollback):
wsgi/old -> app/deploys/version-1
wsgi/old/env -> app/environments/environment-a
Environments are only built as necessary: if dependencies don’t change between version-x and version-y, both versions have an env symlink that points to environment-a.
Code and virtualenvs are built in subdirectories under app, and symlinks are created to those directories from wsgi. This way, we can start deploying a new version under app without contaminating or breaking anything under wsgi. Additionally, updating the symlink and reloading Gunicorn lets us quickly roll forward (or back) with minimal downtime.
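The symlink update itself can be made atomic by creating the new link under a temporary name and renaming it over the old one. The sketch below is a minimal illustration of that technique rather than our actual deploy script: atomic_symlink is a hypothetical helper name, the directories are throwaway temp directories, and it assumes a POSIX filesystem where a rename replaces the destination in a single step.

```python
import os
import tempfile


def atomic_symlink(target, link_path):
    """Point link_path at target, replacing any existing symlink in one step."""
    # Create the new link next to the old one so the rename below stays on
    # the same filesystem. The temporary name is an arbitrary choice.
    tmp_path = "%s.tmp.%d" % (link_path, os.getpid())
    os.symlink(target, tmp_path)
    try:
        # The rename replaces the old symlink itself (it does not follow it),
        # so a reader sees either the old target or the new one, never neither.
        os.replace(tmp_path, link_path)
    except OSError:
        os.unlink(tmp_path)
        raise


# Usage with throwaway directories standing in for the deploy layout.
root = tempfile.mkdtemp()
version_1 = os.path.join(root, "version-1")
version_2 = os.path.join(root, "version-2")
current = os.path.join(root, "current")
os.mkdir(version_1)
os.mkdir(version_2)

atomic_symlink(version_1, current)
first_target = os.readlink(current)   # points at version-1
atomic_symlink(version_2, current)
second_target = os.readlink(current)  # now points at version-2
```

Removing the old link and then creating a new one would leave a brief moment in which current does not exist at all; the rename avoids that gap.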
POSIX and absolute paths
Swapping symlinks can make deploys simpler and minimize downtime. However, Gunicorn uses Python standard library functions like os.getcwd that (thanks to POSIX) resolve symlinks implicitly. When Gunicorn calls these functions during startup, the symlinks are only ever resolved once, and updating them during a deploy, after Gunicorn has started, has no effect.
We’ve implemented a solution using Gunicorn’s pre_exec hook: by re-resolving symlinks during a reload, between the time Gunicorn forks a new master and the time it actually starts serving from that new master, we’re able to make Gunicorn seamlessly switch directories and virtual environments. This way, our continuous deployments (and rollbacks) are atomic and truly zero-downtime.
Background: Gunicorn reloading
Gunicorn starts a master process and a pool of worker processes. The master binds the listening socket and manages the workers; the workers accept incoming requests on that shared socket and handle them. A worker handles a single request at a time (when configured as a synchronous worker, which is the default), which makes it fairly easy to reason about your app’s execution and performance. Each worker process loads the application when it starts, and continues serving requests one at a time until it’s told to stop.
Under this model, deploying a new version of the app is fairly easy: update the code in the filesystem, and ask Gunicorn to kill existing worker processes and start new ones. As each new worker process starts, it imports modules from the updated files on the filesystem. When it receives a HUP signal, Gunicorn does exactly this, including waiting for existing workers to finish processing whatever request they’re working on (from the old code, which they have loaded in memory), so the transition to the new version is seamless.
Gunicorn also supports a “live rollback” mode, which our deploys use instead. When it receives a USR2 signal, it forks a new master and re-execs it, and leaves the old master and workers running. The new master spawns its own workers and takes over serving the application, but if it dies or is killed, the old master resumes control and effectively rolls back to whatever version the old master was running. This is a useful option for robust deploys: send a USR2 to Gunicorn, verify that the newly deployed version is sane, and then kill the old master if it is or the new master if it’s not.
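The USR2 flow described above can be sketched as a small driver function. This is an illustration under stated assumptions, not part of Gunicorn or of our deploy tooling: live_reload, find_new_master_pid and is_healthy are hypothetical names the deploy script would have to supply, and SIGTERM is assumed to be the shutdown signal you want (which signal means a graceful shutdown depends on the Gunicorn version, so check its signal documentation).

```python
import os
import signal


def live_reload(old_master_pid, find_new_master_pid, is_healthy,
                stop_signal=signal.SIGTERM, kill=os.kill):
    """Run one USR2 deploy and report whether we rolled forward or back.

    find_new_master_pid and is_healthy are hypothetical callables provided
    by the caller; kill is injectable so the flow can be tested without
    touching real processes.
    """
    # Ask the running master to fork and re-exec a new master.
    kill(old_master_pid, signal.SIGUSR2)
    new_master_pid = find_new_master_pid()

    if is_healthy():
        # Keep the new version: stop the old master and its workers.
        kill(old_master_pid, stop_signal)
        return "rolled-forward"

    # Reject the new version: stop the new master, the old one keeps serving.
    kill(new_master_pid, stop_signal)
    return "rolled-back"
```

How the new master’s PID is discovered (for example, from a pidfile) and what counts as healthy are left to the caller because both depend on the deployment.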
Why can’t we just update the code in place?
Just to make it explicit: Python loads source files into memory as modules and caches them, so updating the code in place would require us to reload these modules at runtime. This works in development, but it’s risky in a production environment because reloading a module does not update the references that other modules already hold to its contents.
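The caching behaviour is easy to see in isolation. The snippet below is a small demonstration, with a made-up module name (inplace_demo) written to a temp directory: after the file is overwritten, a second import still returns the module object that was loaded first.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module; the name "inplace_demo" is made up for this demo.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "inplace_demo.py")
with open(module_path, "w") as f:
    f.write("VERSION = 1\n")

sys.path.insert(0, module_dir)
first = importlib.import_module("inplace_demo")

# "Deploy" a new version by overwriting the file in place.
with open(module_path, "w") as f:
    f.write("VERSION = 2\n")

# The second import is served from the sys.modules cache; the file on disk
# is not read again, so the running process keeps the old code.
second = importlib.import_module("inplace_demo")
version_seen_after_update = second.VERSION
```

A freshly started process would see the new file, which is why restarting workers is the simpler route.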
Symlinks and unexpected behavior
So, to recap, Gunicorn’s ability to reload worker processes via signals provides us with good options for doing a zero-downtime deploy. However, the use of symlinks in our deploy configuration (for deploy directories and for virtualenvs within them) causes a few problems for us. We’ll outline them now and see how we fixed them in the next section.
Problem #1: new workers load old code
When a Gunicorn worker starts, it ultimately uses __import__ to load the application. The import machinery searches sys.path to find modules, and sys.path typically lists the current working directory of the process first. In short, if the Gunicorn worker’s current working directory is wrong, it’ll end up running the wrong code.
When a Gunicorn master starts, it uses os.getcwd(), which resolves symlinks and returns an absolute path, to figure out the current working directory. It later uses that directory when starting a new master in response to a USR2. If the Gunicorn master’s saved CWD is wrong, it will end up forking a new master in the wrong working directory and its workers will run the wrong code.
Let’s look at an example:
- We deploy version-1 and symlink current to version-1.
- We run Gunicorn from current; its working directory is current, but on startup it saves its working directory as version-1 because os.getcwd() implicitly resolves the symlink.
- Later, we deploy version-2 and symlink current to version-2.
- We send Gunicorn a USR2, which causes it to run a new master from the saved working directory (version-1). The new master’s workers are therefore running code from version-1 instead of version-2 (!).
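The same sequence can be reproduced without Gunicorn. The snippet below is a sketch that uses throwaway temp directories in place of the real deploy layout; saved_cwd stands in for the value a long-running process records at startup.

```python
import os
import tempfile

# realpath() here only normalizes the temp directory itself, so the
# comparisons below also hold on systems where the temp dir is a symlink.
root = os.path.realpath(tempfile.mkdtemp())
version_1 = os.path.join(root, "version-1")
version_2 = os.path.join(root, "version-2")
current = os.path.join(root, "current")
os.mkdir(version_1)
os.mkdir(version_2)

original_cwd = os.getcwd()
try:
    # Deploy version-1 and start "the server" from the current symlink.
    os.symlink(version_1, current)
    os.chdir(current)
    saved_cwd = os.getcwd()  # already resolved: this is version-1, not current

    # Deploy version-2 by re-pointing the symlink.
    os.remove(current)
    os.symlink(version_2, current)

    # The symlink now leads to version-2, but the saved value has not moved.
    live_target = os.path.realpath(current)
finally:
    os.chdir(original_cwd)
```

Anything that later starts from saved_cwd lands in version-1, even though current now points at version-2.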
Problem #2: new workers have the wrong virtualenv
In addition to starting the new master in the symlink-resolved working directory, Gunicorn also starts up in the wrong virtualenv.
When it gets a USR2, Gunicorn’s master process actually does a few things: it forks a new master, and in the new master (only) sets up some file descriptors and the working directory (using the resolved symlink that we described above), then calls os.exec to essentially restart Gunicorn from scratch in that new master. (Here and below, os.exec is shorthand for the os.exec* family of functions.)
The os.exec call is important here: Gunicorn os.execs the same Python binary as the old master, using sys.executable. We run Gunicorn via the virtualenv’s python binary because it automatically adds modules from the virtualenv to sys.path. Because that binary is in a symlinked directory, sys.executable gets set to its resolved path, and when the new master is exec’d, it runs in the old version’s virtualenv (and by extension, without access to any new dependencies).
Consider the example from problem #1:
- We deploy version-1 and create a virtualenv with its dependencies in environment-a. We symlink current to version-1 and current/env to environment-a.
- We run Gunicorn via current/env/bin/python, but sys.executable gets set to environment-a/bin/python.
- Later, we deploy version-2 and create a virtualenv with its dependencies (including some new ones) in environment-b. We symlink current to version-2 and current/env to environment-b.
- We send Gunicorn a USR2, which causes it to run a new master via environment-a/bin/python. The workers therefore run in the environment-a virtualenv rather than the new environment-b (!).
Caveat: reading from the filesystem during a request
There’s another potential issue that we haven’t discussed: sometimes, during a request, our WSGI app needs to read files like HTML templates or static files directly from disk.
Even if we’ve fixed the problems above and the workers have the current symlink as their working directory, there’s a small window between the time we update the current symlink from version-1 to version-2 and the time we send a USR2 to reload workers. Within this window, workers are running code from version-1, but because of the symlink update, any static files they read from disk are now actually from version-2. Files that are different (or worse, no longer exist) in the new version can cause problems with requests that occur in this window, so we ideally need the symlink switch and worker reload to be atomic from the point of view of an incoming request.
Solution: resetting paths during pre_exec
Our solution relies on Gunicorn’s pre_exec hook, the fact that it gives us access to the internal state of the server, and the fact that it runs after a new master is forked but before it actually starts serving requests.
The pre_exec hook
Gunicorn has several hook functions that can be specified in a configuration file to extend and override behavior. The pre_exec hook allows us to run code in the new master between the calls to os.fork and os.exec, meaning that this hook gives us a chance to change the undesirable paths that were inherited from the old master before they go to os.exec.
The pre_exec hook is given a server instance, which contains a START_CTX dictionary in which parameters like the current working directory and sys.executable are saved. The old master sets values in START_CTX on startup, the new master gets these values when determining how to call os.exec, and in between, our pre_exec hook can manipulate them.
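To see what the hook has to work with, a configuration file can define a pre_exec that only logs the saved values. This is a minimal sketch: the bind address and worker count are placeholders, and the START_CTX keys used here ('cwd', 'args' and 0) are the ones our implementation reads, so verify them against the Gunicorn version you run.

```python
# Gunicorn configuration file (a sketch; bind and workers are placeholders).
bind = "127.0.0.1:8000"
workers = 4


def pre_exec(server):
    # Runs in the forked child just before it re-executes Gunicorn.
    ctx = server.START_CTX
    server.log.info("re-exec with cwd=%s executable=%s args=%r",
                    ctx["cwd"], ctx[0], ctx["args"])
```

Sending a USR2 with this configuration loaded should log the resolved paths, which makes the problems from the previous section visible before anything is changed.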
Effectively, when we start Gunicorn for the very first time, we set an environment variable that contains the directory to “reset” to: in our case, the path of the current symlink, which is what we’ll update on each deploy. The environment variable is inherited by the new master after os.fork, and our pre_exec hook uses this path to “reset” the working directory and arguments in START_CTX so os.exec gets called on the symlink rather than the resolved path.
What about our earlier caveat about atomicity and file system reads? Because the pre_exec hook only changes the paths used between os.fork and os.exec, it doesn’t prevent the new master and its workers from resolving the symlink once they start, so the workers still read from a resolved path. This means that nothing is ever replaced out from under a running worker.
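The reason a running worker is safe is that a process’s working directory refers to the directory itself, not to the path that was used to reach it. The snippet below is a sketch of that behaviour with throwaway temp directories and a made-up templates/index.html file: after the symlink is re-pointed, a relative read still returns the old version’s file, while a read through the symlink returns the new one.

```python
import os
import tempfile

root = os.path.realpath(tempfile.mkdtemp())
current = os.path.join(root, "current")

# Each version ships a template whose content is just the version name.
for name in ("version-1", "version-2"):
    os.makedirs(os.path.join(root, name, "templates"))
    with open(os.path.join(root, name, "templates", "index.html"), "w") as f:
        f.write(name)

original_cwd = os.getcwd()
try:
    os.symlink(os.path.join(root, "version-1"), current)
    os.chdir(current)  # the "worker" starts here and stays in version-1

    # Re-point the symlink while the "worker" is still running.
    os.remove(current)
    os.symlink(os.path.join(root, "version-2"), current)

    with open(os.path.join("templates", "index.html")) as f:
        via_cwd = f.read()      # relative to the working directory
    with open(os.path.join(current, "templates", "index.html")) as f:
        via_symlink = f.read()  # resolved through the symlink on each read
finally:
    os.chdir(original_cwd)
```

An application that builds absolute paths through the symlink would not get this protection, so it matters that reads go through the working directory or another path resolved at startup.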
Running a deploy with our pre_exec hook
Let’s see how the issues are addressed with a final example:
- We deploy version-1 and create a virtualenv with its dependencies in environment-a. We symlink current to version-1 and current/env to environment-a.
- We run Gunicorn via current/env/bin/python and set GUNICORN_APP_ROOT=current.
  - Worker CWD is version-1
  - Master sys.executable is environment-a/bin/python
- Later, we deploy version-2 and create a virtualenv with its dependencies (including some new ones) in environment-b. We symlink current to version-2 and current/env to environment-b.
  - Worker CWD is still version-1, so it still reads the correct templates
- We send Gunicorn a USR2. Without the hook, this would run a new master via the saved sys.executable (environment-a/bin/python), so its workers would use the environment-a virtualenv rather than the new environment-b. With the hook:
  - Old worker CWD is still version-1; this never changes
  - Old master sys.executable is still environment-a/bin/python
  - In the new master, pre_exec sets CWD back to current and sys.executable back to current/env/bin/python
  - New master sys.executable is now environment-b/bin/python via symlink resolution
  - New worker CWD is now version-2 via symlink resolution
Ultimately, we let Gunicorn store the resolved symlinks, modify them during the reload, and let Gunicorn store the newly resolved symlinks in the new master and workers, minimizing the overall impact of our fix.
Implementation
Finally, here’s our implementation. Feel free to adapt our code or technique to your specific deploy process.
import os
import sys


def pre_exec(server):
    """
    Resets the working directory of the server to GUNICORN_APP_ROOT.

    We run Gunicorn from a symlinked directory, which Gunicorn ends up
    dereferencing (via os.getcwd()) on startup and saving in START_CTX['cwd'].
    This means that simply updating the symlink and forking a new master won't
    work: the new master will run from the dereferenced directory, which is the
    same as the old master's working directory.

    This hook, which runs in the new master after fork() but before it starts
    handling connections, lets us correct this. We reset START_CTX['cwd'] back
    to the symlink directory, which the deploy script passes in as an
    environment variable, and os.chdir() into it. This means that the new
    master is exec'd in the correct directory (which it later dereferences,
    but by then we don't care) and we don't have to touch the old master.

    Moreover, the workers are forked with a working directory that has been
    dereferenced from the symlink, so we can actually *remove* or re-point the
    symlink without affecting workers, present or future.
    """
    app_root = os.environ.get('GUNICORN_APP_ROOT')
    print('[pre_exec] Starting hook, app_root = %r' % app_root)
    if app_root:
        # Point the saved working directory back at the symlink.
        orig_cwd = server.START_CTX['cwd']
        server.START_CTX['cwd'] = app_root
        os.chdir(app_root)
        print('[pre_exec] Switching cwd: %s -> %s' % (orig_cwd, app_root))

        # Swap the resolved virtualenv bin directory for the symlinked one in
        # the interpreter path and in every argument. The env location is
        # relative to app_root; adjust this join to match your own layout.
        orig_path = os.path.dirname(sys.executable)
        new_path = os.path.join(app_root, '..', 'env', 'bin')
        server.START_CTX[0] = server.START_CTX[0].replace(orig_path, new_path)
        server.START_CTX['args'] = [arg.replace(orig_path, new_path)
                                    for arg in server.START_CTX['args']]
    print('[pre_exec] Done running hook, START_CTX = %r' % server.START_CTX)