Adding Memcached Support to TurboStan

Filed under: turbostan memcached 

I got a bit frustrated doing real work tonight, and my experiments with clustering TurboGears instances behind Nginx got me thinking about cache coherency (TurboStan has been using an internal dictionary for caching; efficient for a single instance, but not for clustered applications).

I'd never used memcached before, but I quickly located Sean Reifschneider's python-memcached module in the Cheeseshop. It wouldn't easy_install, but that was no problem since it's a single Python file.

Next I set out to rewire the caching in TurboStan to support an additional configuration directive. If you add:

stan.memcached = "127.0.0.1:11211"

to your TurboGears configuration, TurboStan tries to connect to a memcached server at that address.

As documented on the Memcached home page, you start memcached with something like:

memcached -d -m 16 -l 127.0.0.1 -p 11211

This starts memcached with a 16 MB cache (hey, I'm not running LiveJournal) listening on port 11211 on localhost. At first I thought I could get away with 4 MB of cache, but I spent a while trying to figure out why some objects weren't being cached before realizing memcached was out of memory.

To support either the internal dictionary cache or memcached, I created a new class that provides the memcache API on top of a dictionary and lets you select one mode or the other.
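A minimal sketch of what such a wrapper might look like, assuming only the get/set/delete subset of the python-memcached client API is needed. The names DictCache and make_cache are hypothetical and are not taken from TurboStan's source:

```python
class DictCache:
    """In-process cache exposing a small memcache-like API (hypothetical name)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, time=0):
        # 'time' is accepted for API compatibility but ignored in this sketch.
        self._store[key] = value
        return True

    def get(self, key):
        # Like memcache, return None on a miss.
        return self._store.get(key)

    def delete(self, key):
        self._store.pop(key, None)
        return True


def make_cache(server=None):
    """Return a memcache client if a server address is given, else a DictCache."""
    if server:
        # python-memcached; imported lazily so the dictionary mode has no dependency.
        import memcache
        return memcache.Client([server])
    return DictCache()


cache = make_cache()  # no server configured, so the dictionary is used
cache.set("greeting", "hello")
value = cache.get("greeting")
```

Passing the value of stan.memcached (for example "127.0.0.1:11211") to make_cache would select the memcached mode instead, and the calling code stays the same either way.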

Memcached is remarkably simple to use. One caveat: I'm caching code objects, and memcache.py serializes values with cPickle, which can't pickle code objects, so I had to marshal them before passing them to memcache. That means each code object is marshalled and then pickled, which isn't very efficient. I think I'll modify memcache.py to always marshal (its special-case code for deciding whether or not to pickle a value isn't relevant here, since everything I store needs serializing anyway). The fact that the marshal format isn't portable across Python versions doesn't matter either, since this is only a temporary cache.
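The marshal-then-pickle round trip can be sketched roughly as follows. This is an illustration rather than TurboStan's actual code: it uses Python 3's pickle module as a stand-in for cPickle, and the helper names dump_code and load_code are made up for the example:

```python
import marshal
import pickle


def dump_code(code_obj):
    """Marshal a code object into bytes, which pickle can handle."""
    return marshal.dumps(code_obj)


def load_code(data):
    """Rebuild a code object from marshalled bytes."""
    return marshal.loads(data)


# Compile a tiny snippet to get a code object to cache.
code = compile("result = 6 * 7", "<template>", "exec")

# Step 1: marshal the code object ourselves.
payload = dump_code(code)

# Step 2: the client library pickles whatever it is given (simulated here).
wire = pickle.dumps(payload)

# Reading back reverses both steps.
restored = load_code(pickle.loads(wire))

namespace = {}
exec(restored, namespace)
```

After the exec call, namespace["result"] holds 42, which shows the code object survived both layers of serialization.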

Anyway, subjective performance was actually marginally slower, but the point of memcached isn't raw performance but scalability (a different thing altogether, although some confuse the two). And besides, cache coherence is what I was really after (I have methods for destroying cache entries, but in a cluster only one backend would receive the instruction to destroy a cache entry, leaving the other instances with outdated objects in their caches).
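The coherence problem can be illustrated with a toy simulation. Plain dictionaries stand in for both the per-instance caches and the shared memcached server, and all names are made up for the example:

```python
# Two backends, each with its own private dictionary cache.
local_a = {"page": "v1"}
local_b = {"page": "v1"}

# The "destroy this entry" instruction only reaches backend A.
local_a.pop("page", None)
stale = local_b.get("page")  # backend B still serves the old object

# With one shared cache (a dict standing in for a memcached server),
# both backends read and invalidate the same store.
shared = {"page": "v1"}
backend_a = shared
backend_b = shared

backend_a.pop("page", None)
fresh = backend_b.get("page")  # the entry is gone for backend B as well
```

In the first half, stale is still "v1"; in the second, fresh is None because the invalidation is visible to every backend.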

References:

  • Memcached Home
  • Python-Memcached

