Dealing with lru_cache While Testing Django Applications

lru_cache (from functools) is the simplest way to cache expensive function calls. It's hassle-free and doesn't need a backend such as Redis. But there's a gotcha when testing your application: since results are cached, data can leak from one test case into another. Depending on what you're testing and how you're testing it, this might not be a problem at all, but if each test case expects different inputs or outputs, this can turn ugly.

In this article, we will go over several workarounds and solutions to deal with this problem.

The Problem

Imagine this hypothetical scenario:

# models.py
from functools import lru_cache

from django.db import models


class User(models.Model):
    username = models.CharField(max_length=16)
    ...  # Other fields

    @classmethod
    @lru_cache(maxsize=1)
    def get_all_usernames(cls) -> set:
        # distinct() with field names is PostgreSQL-only
        all_usernames = cls.objects.distinct('username') \
            .values_list('username', flat=True)
        return set(all_usernames)


# tests.py
from django.test import TestCase

from your_app.models import User


class UserTestCase(TestCase):
    def test_something(self):
        # Hopefully you're using FactoryBoy or Fixtures instead of this
        user = User.objects.create(username='uname_1')

        # other stuff to make this test make sense!

        self.assertIn(user.username, User.get_all_usernames())

    def test_something_else(self):
        user = User.objects.create(username='uname_2')

        # other stuff to make this test make sense!

        self.assertIn(user.username, User.get_all_usernames())

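When test_something runs, it passes, and the set {'uname_1'} is now cached. Django rolls the database back after each test, but the lru_cache lives in the Python process and survives from one test to the next. So when test_something_else runs, get_all_usernames returns the stale cached set, which doesn't contain 'uname_2', and the test fails.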
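The same leak can be reproduced without Django. In this minimal plain-Python sketch, a module-level list (`FAKE_TABLE`, a hypothetical stand-in for the users table) is emptied between the two "tests", the way Django's rollback would empty the table. Because lru_cache keys on the call's arguments, and the only argument is the class itself, the second call is a cache hit:

```python
from functools import lru_cache

# Hypothetical stand-in for the users table.
FAKE_TABLE = []


class UserLike:
    @classmethod
    @lru_cache(maxsize=1)
    def get_all_usernames(cls) -> set:
        # The only argument is `cls`, so every call maps to the same cache key.
        return set(FAKE_TABLE)


# "test_something"
FAKE_TABLE.append("uname_1")
first = UserLike.get_all_usernames()

# "test_something_else": the table is reset, as Django's rollback would do
FAKE_TABLE.clear()
FAKE_TABLE.append("uname_2")
second = UserLike.get_all_usernames()

print(first)   # {'uname_1'}
print(second)  # {'uname_1'}, stale: 'uname_2' is missing
```

At this point `UserLike.get_all_usernames.cache_info()` reports one miss and one hit.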

The Solution That Should Not Be Considered

Before going over the actual solutions, I want to mention one that I've seen implemented: writing another wrapper around lru_cache that only caches when not in testing mode. Making your code aware of its environment (testing, production, etc.) is generally bad practice and can lead to unexpected errors, for instance by making a problem go away only in tests.

The Solution That May or May Not Work

An obvious solution is good old event-based cache invalidation. You just have to invalidate the cache whenever a user is saved:

class User(models.Model):
    username = models.CharField(max_length=16)
    ...  # Other fields

    @classmethod
    @lru_cache(maxsize=1)
    def get_all_usernames(cls) -> set:
        all_usernames = cls.objects.distinct('username') \
            .values_list('username', flat=True)
        return set(all_usernames)

    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)

        # Invalidate the cached usernames after every successful save
        self.get_all_usernames.cache_clear()

Pros

It's the simplest and the cleanest solution (even cleaner than "The Clean Solution" coming up).
It's not just for testing. Proper cache invalidation is of the utmost importance if you are planning on caching anything.

Cons

It doesn't always work. For instance, save() is never called if you use bulk_create(), bulk_update() or QuerySet.update(), or if, god forbid, you run raw queries (using cursor.execute(), for instance). This also means the pre_save and post_save signals aren't sent either.
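The gap is easy to see in a plain-Python sketch. `Repo`, `save()` and `bulk_insert()` are hypothetical stand-ins for the model, Model.save() and bulk_create(); invalidation lives only in `save()`, so a write path that skips it leaves the cache stale:

```python
from functools import lru_cache


class Repo:
    # Hypothetical stand-in for a database table.
    rows = []

    @classmethod
    @lru_cache(maxsize=1)
    def all_names(cls) -> frozenset:
        return frozenset(cls.rows)

    @classmethod
    def save(cls, name: str) -> None:
        # Like the overridden Model.save(): write, then invalidate.
        cls.rows.append(name)
        cls.all_names.cache_clear()

    @classmethod
    def bulk_insert(cls, names) -> None:
        # Like bulk_create(): writes rows but never goes through save().
        cls.rows.extend(names)


Repo.save("alice")
print(sorted(Repo.all_names()))  # ['alice']

Repo.bulk_insert(["bob", "carol"])
print(sorted(Repo.all_names()))  # ['alice'], stale: bob and carol are missing
```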

The Clean Solution

IMHO this is the cleanest solution and it's explicit, i.e. it's obvious what's being done to resolve the issue. No hidden magic!

from django.test import TestCase

from your_app.models import User


class UserTestCase(TestCase):
    def setUp(self):
        # Solution is here
        User.get_all_usernames.cache_clear()

    def test_something(self):
        # Hopefully you're using FactoryBoy or Fixtures instead of this
        user = User.objects.create(username='uname_1')

        # other stuff to make this test make sense!

        self.assertIn(user.username, User.get_all_usernames())

    def test_something_else(self):
        user = User.objects.create(username='uname_2')

        # other stuff to make this test make sense!

        self.assertIn(user.username, User.get_all_usernames())

As you can see, in the setUp method (which is run before every single test method) we clear the cache using the method provided by lru_cache.
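To check that resetting the cache in setUp really makes the tests order-independent, here is a minimal sketch in plain unittest, with no Django involved. `TABLE` is a hypothetical in-memory table that setUp also empties, mimicking Django's per-test rollback:

```python
import unittest
from functools import lru_cache

# Hypothetical in-memory "table"; setUp empties it, like Django's rollback.
TABLE = []


@lru_cache(maxsize=1)
def get_all_usernames() -> frozenset:
    return frozenset(TABLE)


class UsernameTests(unittest.TestCase):
    def setUp(self):
        TABLE.clear()
        # The fix: every test starts with an empty cache.
        get_all_usernames.cache_clear()

    def test_something(self):
        TABLE.append("uname_1")
        self.assertIn("uname_1", get_all_usernames())

    def test_something_else(self):
        TABLE.append("uname_2")
        self.assertIn("uname_2", get_all_usernames())
```

Both tests should pass under unittest; drop the cache_clear() line and test_something_else fails just like the Django example.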

Pros

Clean and obvious; no muss, no fuss.
Can be done on a per-test-case basis. You can opt out of it in another test case if you want or need to.

Cons

If caching affects multiple or many test cases, you'll have to repeat yourself a lot.
It only fixes the failing tests. If your app relies on cache invalidation, your tests pass while the app itself remains vulnerable to stale-cache bugs.
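One way to soften the repetition is a small mixin that clears a declared list of cached functions before every test. `ClearLruCachesMixin` and its `lru_cached` attribute are hypothetical names, not part of Django. The sketch uses plain unittest.TestCase; placed before django.test.TestCase in the bases list, it should behave the same way, since Django's test classes don't define their own setUp:

```python
import unittest
from functools import lru_cache


class ClearLruCachesMixin:
    """Hypothetical helper: clear the listed lru_cache'd callables before each test."""

    # Subclasses list the cached functions they depend on.
    lru_cached = ()

    def setUp(self):
        for func in self.lru_cached:
            func.cache_clear()
        super().setUp()


# Hypothetical in-memory table and cached query for the demo.
TABLE = []


@lru_cache(maxsize=1)
def get_all_usernames() -> frozenset:
    return frozenset(TABLE)


class UsernameTests(ClearLruCachesMixin, unittest.TestCase):
    lru_cached = (get_all_usernames,)

    def setUp(self):
        super().setUp()  # runs the mixin, which clears the caches
        TABLE.clear()

    def test_one(self):
        TABLE.append("uname_1")
        self.assertEqual(get_all_usernames(), {"uname_1"})

    def test_two(self):
        TABLE.append("uname_2")
        self.assertEqual(get_all_usernames(), {"uname_2"})
```

A test case that overrides setUp still has to call super().setUp() for the mixin to run.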

The Lazy Man's Solution

This is not a clean solution, and to be honest, it's not my favorite. It's hacky, and hard for your teammates to find. But it gets the job done without having to do the same thing over and over again.

As for where to put the code, I would put it in tests/__init__.py so that it runs only once, when the test package is imported. But if you already have a well-known file or module for test initialization (one your whole team knows about), that's a much better place.

from django.test import SimpleTestCase

from your_app.models import User

original_setup = SimpleTestCase.setUp


def new_setup(self):
    User.get_all_usernames.cache_clear()
    return original_setup(self)


SimpleTestCase.setUp = new_setup

Or, if you feel like going crazy, you can clear every lru_cache in the process:

import gc
import functools

from django.test import SimpleTestCase

original_setup = SimpleTestCase.setUp


def new_setup(self):
    gc.collect()
    # functools._lru_cache_wrapper is a private CPython type; every
    # function decorated with lru_cache is an instance of it
    for obj in gc.get_objects():
        if isinstance(obj, functools._lru_cache_wrapper):
            obj.cache_clear()
    return original_setup(self)


SimpleTestCase.setUp = new_setup

I have chosen to monkey patch SimpleTestCase because it's the base class of Django's other test case classes. Here's the inheritance tree:

unittest.TestCase
|-- django.test.SimpleTestCase
|    |-- django.test.TransactionTestCase
|    |    |-- django.test.TestCase

Also note that the call to original_setup is currently redundant; I've included it in case something changes in the future. django.test.SimpleTestCase doesn't implement setUp at all, and here's the implementation in unittest.TestCase:

# unittest.TestCase
def setUp(self):
    "Hook method for setting up the test fixture before exercising it."
    pass

Pros

If you have this issue in many test cases, it resolves them all.

Cons

It's something automagical that happens behind the scenes, hidden away in some file.
If you're not careful, it could clear caches that shouldn't have been cleared, and you'll end up chasing a bug for hours (even though, generally, tests should not rely on any sort of cache).
Like the previous method, it doesn't put proper cache invalidation in place.
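To reduce the risk of clearing caches you didn't mean to, the clear-everything variant could be narrowed to functions defined in your own code. The sketch below filters on the wrapper's `__module__`, which lru_cache copies from the decorated function. `clear_lru_caches` is a hypothetical helper, and like the original it relies on the private, CPython-specific `functools._lru_cache_wrapper` type:

```python
import functools
import gc


def clear_lru_caches(module_prefix: str) -> int:
    """Clear lru_cache wrappers whose function lives under module_prefix.

    Hypothetical helper. Relies on functools._lru_cache_wrapper, a private
    CPython type. Returns how many caches were cleared.
    """
    cleared = 0
    for obj in gc.get_objects():
        if not isinstance(obj, functools._lru_cache_wrapper):
            continue
        # lru_cache copies __module__ from the decorated function
        module = getattr(obj, "__module__", None) or ""
        if module.startswith(module_prefix):
            obj.cache_clear()
            cleared += 1
    return cleared
```

From the patched setUp you would then call something like `clear_lru_caches("your_app")`, where `your_app` is a placeholder for your own package name.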

Finally, I think that save-based invalidation (the first of the actual solutions) is best, and for cache invalidation:

Do it on save, or when a post_save signal is sent.
When you update data in a way that doesn't call save() (bulk_create(), bulk_update(), update(), raw SQL), remember to invalidate or rebuild the cache manually.
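For the second point, a small context manager can make manual invalidation harder to forget around bulk writes. `invalidates` is a hypothetical helper, not a Django API, and the sketch uses a plain list in place of the database:

```python
from contextlib import contextmanager
from functools import lru_cache


@contextmanager
def invalidates(*cached_funcs):
    """Hypothetical helper: clear the given lru_cache'd callables once the
    wrapped block of writes finishes, whether it succeeded or raised."""
    try:
        yield
    finally:
        for func in cached_funcs:
            func.cache_clear()


# Plain list standing in for the database table.
ROWS = ["alice"]


@lru_cache(maxsize=1)
def all_names() -> frozenset:
    return frozenset(ROWS)


print(sorted(all_names()))  # ['alice']

with invalidates(all_names):
    ROWS.extend(["bob", "carol"])  # stands in for bulk_create()

print(sorted(all_names()))  # ['alice', 'bob', 'carol']
```

In a Django project it would wrap calls such as `User.objects.bulk_create(...)` inside `with invalidates(User.get_all_usernames):`. Clearing in `finally` errs on the safe side: if the block fails halfway outside a transaction, some rows may already have been written.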

Well, I hope this article was helpful. Remember that you should always pick the best solution based on your requirements. And please share any mistakes you might have found in the article or your better ideas for solving this.
