<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Backend Realm]]></title><description><![CDATA[Backend Realm]]></description><link>https://blog.nozary.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1711807761005/8WRtKCQqE.png</url><title>Backend Realm</title><link>https://blog.nozary.com</link></image><generator>RSS for Node</generator><lastBuildDate>Sun, 19 Apr 2026 14:10:30 GMT</lastBuildDate><atom:link href="https://blog.nozary.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Setting Up a Machine Learning Pipeline For FREE]]></title><description><![CDATA[Recently I needed to set up a machine learning pipeline for my project, Camera to Keyboard, and since it's an open source project I needed a way to set up a pipeline for free. In this article, you'll read about my approach and its constraints.

A mac...]]></description><link>https://blog.nozary.com/setting-up-a-machine-learning-pipeline-for-free</link><guid isPermaLink="true">https://blog.nozary.com/setting-up-a-machine-learning-pipeline-for-free</guid><category><![CDATA[Python]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Pipeline]]></category><category><![CDATA[automation]]></category><category><![CDATA[CI/CD]]></category><dc:creator><![CDATA[Milad Nozari]]></dc:creator><pubDate>Mon, 01 Apr 2024 10:43:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1711931974302/603de38c-f9fc-4f5d-80e4-302cdff62509.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Recently I needed to set up a machine learning pipeline for my project, <a target="_blank" href="https://blog.nozary.com/turning-your-camera-into-a-keyboard-6a0c1076f4bd">Camera to Keyboard</a>, and since it's an open source project I needed a way to set up a pipeline for free. In this article, you'll read about my approach and its constraints.</p>
<blockquote>
<p>A machine learning pipeline is a series of interconnected data processing and modeling steps designed to automate, standardize and streamline the process of building, training, evaluating and deploying machine learning models.</p>
<p>Source: <a target="_blank" href="https://www.ibm.com/topics/machine-learning-pipeline">ibm.com</a></p>
</blockquote>
<h2 id="heading-requirements">Requirements</h2>
<p>There are several limiting factors to consider before choosing this approach, especially since we'll be using GitHub Actions for the training process. In short:</p>
<ul>
<li><p>Training the model on CPU has to finish in less than 6 hours</p>
</li>
<li><p>There's a limit on how many times you can download the trained model (a few workarounds have been mentioned, though)</p>
</li>
</ul>
<h3 id="heading-training-constrains">Training Constraints</h3>
<p>As mentioned earlier, we're going to use GitHub Actions, and each job in a workflow has a time limit of 6 hours. Moreover, your model is going to be trained on a CPU, which is much slower than CUDA. GitHub, however, has started offering <a target="_blank" href="https://resources.github.com/devops/accelerate-your-cicd-with-arm-and-gpu-runners-in-github-actions/">GPU-enabled runners</a> in private beta for Teams and Enterprise accounts (at the time of this writing). Whether they will ever be free for public repositories remains to be seen (highly unlikely).</p>
<h3 id="heading-download-constraints">Download Constraints</h3>
<p>For storing the trained model, I'm using AWS S3's free tier, which offers:</p>
<ul>
<li><p>5GB of storage</p>
</li>
<li><p>100GB of data transfer per month</p>
</li>
<li><p>20,000 GET requests per month</p>
</li>
</ul>
<p>The 5GB of storage is fine; you most probably don't need to keep every older model version. The other two factors, however, need to be taken into account.</p>
<p>Furthermore, your bucket has to be public. You can allow public reads while requiring authentication for writes using the following bucket policy:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
    <span class="hljs-attr">"Statement"</span>: [
        {
            <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"PublicList"</span>,
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
            <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"s3:ListBucket"</span>,
            <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:s3:::YOUR_BUCKET_NAME"</span>
        },
        {
            <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"PublicRead"</span>,
            <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
            <span class="hljs-attr">"Principal"</span>: <span class="hljs-string">"*"</span>,
            <span class="hljs-attr">"Action"</span>: <span class="hljs-string">"s3:GetObject"</span>,
            <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:aws:s3:::YOUR_BUCKET_NAME/*"</span>
        }
    ]
}
</code></pre>
<h2 id="heading-the-use-case">The Use Case</h2>
<p>The 2 key factors for my use case are that I train an object detection model and I don't have frequent dataset changes (if you do, read on till the end of the article). The dataset is also not large, so I can get away with storing the training data in the repository and I won't even need to use git LFS since each file is pretty small.</p>
<h2 id="heading-the-pipeline">The Pipeline</h2>
<p>Here's an overview of how the pipeline works:</p>
<ol>
<li><p>New training data is committed into the repository</p>
</li>
<li><p>The GitHub Action checks for changes; if there are any, it trains a new model</p>
</li>
<li><p>The trained model is uploaded to S3</p>
</li>
<li><p>Every time my app runs, it will check for a new version of the model and if one exists, it will be downloaded and used</p>
</li>
</ol>
<p>For detecting changes in the dataset, I initially went with checking the current git commit for changes in the dataset directory, which can be done using the following command:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># with the --quiet flag, git exits with code 1 if there are changes</span>
git diff --quiet HEAD~1..HEAD dataset_dir || <span class="hljs-built_in">echo</span> <span class="hljs-string">'changed'</span>
</code></pre>
<p>Ultimately, though, I went with a more robust approach: calculating a checksum of the dataset using md5. Yes, md5 is not cryptographically secure, but the only concern here is a collision, and the chances of that are about as high as winning the lottery (i.e. it's not going to happen). If it ever does, feel free to use sha512 <code>¯\_(ツ)_/¯</code>.</p>
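<p>The checksum idea can be sketched like this. This is my illustration of the approach, not the exact code from the repository; the function name and chunk size are my own choices:</p>

```python
import hashlib
import os


def dataset_checksum(dataset_dir: str) -> str:
    """Hash every file under the dataset directory in a stable order,
    so the same data always produces the same checksum."""
    digest = hashlib.md5()
    for root, _dirs, files in sorted(os.walk(dataset_dir)):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                # Read in chunks so large files don't blow up memory
                for chunk in iter(lambda: f.read(8192), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

<p>The CI job can then compare this checksum against the one stored alongside the last trained model: if they differ, the dataset changed and a new training run is warranted.</p>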
<p>What about rollbacks, I hear you say? That's an excellent question. If your model's performance degrades, for example, all you need to do to roll back is revert the git commit that added the new data and delete the trained model from S3 (if it has already been uploaded). This last step could even be automated: another workflow could check for revert commits and, if one involves your dataset, delete the relevant version from S3.</p>
<p>Let's go over the solution in detail now. I will not paste all the code here though, as it will make the article too long, and they're already available publicly. I will however link to the relevant files so that you can easily refer to them.</p>
<h3 id="heading-the-trainer">The Trainer</h3>
<p>First off, I have my trainer class that takes care of the training:</p>
<p><a target="_blank" href="https://github.com/mnvoh/cameratokeyboard/blob/v0.0.3/cameratokeyboard/model/train.py">https://github.com/mnvoh/cameratokeyboard/blob/v0.0.3/cameratokeyboard/model/train.py</a></p>
<pre><code class="lang-python"><span class="hljs-comment"># When instantiating the trainer, you can specify where the trained</span>
<span class="hljs-comment"># model should be copied to. That will allow the trainer to be used</span>
<span class="hljs-comment"># both in CI, and when running the trainer locally, for instance using</span>
<span class="hljs-comment"># `python app.py train`</span>

target_dir = os.path.join(tempfile.tempdir, <span class="hljs-string">'myproject'</span>)
trainer = Trainer(config, target_dir)
trainer.run() <span class="hljs-comment"># Runs the actual training process</span>

<span class="hljs-comment"># You can also get the current model version, or the next version</span>
<span class="hljs-comment"># to be exact, if it hasn't been trained yet </span>
print(trainer.calc_next_version())
</code></pre>
<h3 id="heading-the-ci">The CI</h3>
<p>Now, for the CI action, I opted to have an accompanying python script. That just makes life easier and will keep the workflow simple. You can check out the files here:</p>
<ul>
<li><p>The workflow: <a target="_blank" href="https://github.com/mnvoh/cameratokeyboard/blob/v0.0.3/.github/workflows/ml_pipeline.yml">ml_pipeline.yml</a></p>
</li>
<li><p>The python script: <a target="_blank" href="https://github.com/mnvoh/cameratokeyboard/blob/v0.0.3/ci_train_and_upload.py">ci_train_and_upload.py</a></p>
</li>
</ul>
<p>The workflow has 5 steps:</p>
<ol>
<li><p>source checkout</p>
</li>
<li><p>configure-aws-credentials: Get credentials to make requests to S3 (required for the next step)</p>
</li>
<li><p>Train: Calls the <code>train</code> function in <code>ci_train_and_upload.py</code>. Before training, though, it checks whether the current version has already been trained and uploaded to S3.</p>
</li>
<li><p>configure-aws-credentials: Yes, again. In my case, training takes more than an hour, which is the default expiration time of the AWS token. An alternative to fetching the credentials again is to set the <a target="_blank" href="https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#credential-lifetime">Credential Lifetime</a> parameter.</p>
</li>
<li><p>And finally, upload the model to S3 by calling the <code>upload_model</code> function in <code>ci_train_and_upload.py</code></p>
</li>
</ol>
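<p>The "has this version already been trained?" check in step 3 boils down to asking S3 whether an object for the candidate version already exists. A minimal sketch of that check, with the client injected so it can be tested without AWS; the function name and the <code>model-&lt;version&gt;</code> key scheme are my assumptions, not the repository's exact code:</p>

```python
def model_already_uploaded(s3_client, bucket: str, version: str) -> bool:
    """Return True if a model object for `version` already exists in the bucket."""
    # list_objects_v2 with a Prefix avoids downloading anything;
    # KeyCount tells us how many objects matched.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=f"model-{version}")
    return response.get("KeyCount", 0) > 0
```

<p>In the actual workflow you would pass in a real client, e.g. <code>boto3.client("s3")</code>, and skip the training step entirely when this returns <code>True</code>.</p>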
<h2 id="heading-integrating-the-pipeline-into-the-app">Integrating the Pipeline Into the App</h2>
<p>Now's the time to reap the rewards of the pipeline. We can simply list the objects in our S3 bucket, find the latest one based on <code>LastModified</code>, and check whether it has already been downloaded. If not, download it! Here's the implementation of that class:</p>
<p><a target="_blank" href="https://github.com/mnvoh/cameratokeyboard/blob/v0.0.3/cameratokeyboard/model/model_downloader.py">https://github.com/mnvoh/cameratokeyboard/blob/v0.0.3/cameratokeyboard/model/model_downloader.py</a></p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>There are a lot of improvements that can be made here. To name a few:</p>
<ul>
<li><p>Training the model from scratch every time is redundant, especially if training takes a long time. Again, I won't have frequent dataset changes, but if you do, consider saving your checkpoints and resuming training with the new data, while keeping an eye out for over-fitting.</p>
</li>
<li><p>If you have a large dataset that just can't be trained within 6 hours on a CPU, you can alternatively spin up a remote node (say an EC2 instance with GPU) and train and upload your model on that instance.</p>
</li>
<li><p>If the free tier of S3 isn't enough for you, consider alternative storage options. For instance:</p>
<ul>
<li><p>Cloudflare R2 has a more generous free tier and its <a target="_blank" href="https://developers.cloudflare.com/r2/api/s3/api/">API is S3 compatible</a>.</p>
</li>
<li><p>You might even get away with using <a target="_blank" href="https://developers.google.com/drive/api/guides/about-sdk">Google Drive</a> or <a target="_blank" href="https://www.dropbox.com/developers/documentation/http/documentation">Dropbox</a>. I have not explored these options though and don't know for sure if they're feasible or not.</p>
</li>
</ul>
</li>
<li><p>And finally, regarding the versioning system, I'm still not sure it's the best idea. It has its merits, but maybe just following semantic versioning and tagging the models with the commit IDs that introduce changes to the model (for rollbacks) is a more solid approach. It all depends on your use case, though.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Turning Your Camera into a Keyboard]]></title><description><![CDATA[A while ago, looking at my big mouse mat I decided to print my own (never did though, so it makes this story a bit ironic). Then I realized my keyboard was in the way. So I thought how fantastic it would be if you could just print your keyboard, seam...]]></description><link>https://blog.nozary.com/turning-your-camera-into-a-keyboard-6a0c1076f4bd</link><guid isPermaLink="true">https://blog.nozary.com/turning-your-camera-into-a-keyboard-6a0c1076f4bd</guid><category><![CDATA[virtual hardware]]></category><category><![CDATA[Python]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Computer Vision]]></category><dc:creator><![CDATA[Milad Nozari]]></dc:creator><pubDate>Tue, 19 Mar 2024 10:56:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1711739733887/2064f181-3d17-4fe3-a704-f1d26c942f3e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A while ago, looking at my big mouse mat I decided to print my own (never did though, so it makes this story a bit ironic). Then I realized my keyboard was in the way. So I thought how fantastic it would be if you could just print your keyboard, seamlessly integrated into the design. I searched for a while but didn’t find any programs that would do that. And that’s the core idea behind this project and how/why I started this journey. Although, it could also have other substantial applications, such as in cell phones. Just put your phone in front of you and you have a keyboard. Or maybe in VR. That’s a mighty long way ahead though, at least for me and this project, since at the time of writing, it’s a PoC. I have only been working on this for almost 14 days now, and in my opinion it’s not a trivial problem to solve. What’s certain though is that in time it will get much, much better.</p>
<p>Figure 0 — The apple that hit me in the head. Now I’m no designer for sure and even this lame design took me an hour (just designing that keyboard and putting it over the mat) but I’m sure talented people would make amazing designs.</p>
<p>So let’s dive into how it works and how it was implemented. But before that here’s a disclaimer: I neither specialize in math nor in CV or ML, in fact I’ve been a backend dev for 7 out of 8 years of my professional career. But I just saw a problem and had to solve it (couldn’t help myself, sorry). So there are probably many mistakes here. Feel free to speak up and point them all out!</p>
<h3 id="heading-how-does-it-work">How does it work?</h3>
<p>The app requires a camera and 4 markers (aka control points, aka Position Detection Patterns, aka Finder Patterns) in front of the camera to detect the boundaries of the imaginary keyboard. Ideally, though, the user would need to know where the keys are, so the printed markers could include an actual keyboard, as depicted in figure-1.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711739007782/27bd7e3d-5764-4d9c-8a6b-5d6f96583068.jpeg" alt /></p>
<p><em>Figure 1 — The keyboard</em></p>
<p>The actual virtual hardware hasn’t been implemented yet; it’s on the roadmap.</p>
<h3 id="heading-the-challenges-and-the-solutions">The Challenges and the Solutions</h3>
<p>In this section I will go over <strong>some</strong> of the main challenges, their solutions and future plans for them.</p>
<h3 id="heading-challenge-1-the-model">Challenge 1: The Model</h3>
<p>At first, I thought that this was my biggest challenge by a large margin (spoiler alert; I was wrong). So I dove in, and decided to use YOLOv8. Thanks to the people at Ultralytics, that was one of the easiest tasks of my life. Except for the annotations. Unlike training and inference (and how easy doing those was), labeling hundreds of images was one of the most cumbersome tasks I had ever done. And here’s the worst part, I had to do it multiple times. First time, my images just weren’t good enough. Second time, everything worked out fine, and it was working flawlessly as seen in figure-2.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711739008924/45675030-d654-47a6-b7b8-6b16dfb86677.jpeg" alt /></p>
<p><em>Figure 2 — The first model</em></p>
<p>Here’s what was wrong with that though, the pinky finger boxes, for instance, were too wide. Basically, there just isn’t a reliable way to get the coordinates for the fingertips. Or is there? Yes, and it’s called computer vision and machine learning. So I went back to the drawing board, literally, to draw the bounding boxes from scratch. But this time, only including fingertips. I wasn’t optimistic since we’re supposed to be working with webcams with low picture quality as well, and I was afraid that there might not be enough detail for the model that way. Thankfully, it worked.</p>
<p>Another thing that’s worth mentioning is that I added only 2 classes. Fingers and thumbs. Maybe not the best decision, but I figured that when typing, thumbs are only used for the space bar. And I’m 99% positive that if I had added a class per finger, I would’ve gotten a considerable amount of false positives/negatives.</p>
<p>There’s also a whole module dedicated to dataset preparation. I have all my files (images and labels) in a single directory, then I partition and augment them. But this whole module will be completely removed in favor of a better data pipeline, maybe Roboflow.</p>
<h3 id="heading-challenge-2-mapping-coordinates">Challenge 2: Mapping Coordinates</h3>
<p>Now that I had the detection results, it was time to determine which is which. Given the time I had, I went for the most naive approach. Regarding the markers: due to perspective distortion, the upper markers (relative to the camera’s view) are closer together (i.e. they form a trapezoid), so enumerating them from left to right yields bottom left, top left, top right and bottom right. This needs to be improved so that the points are validated in case of false positives.</p>
<p>The same goes for the fingers: I assigned them based on their order. Improving this, though, will probably be harder than improving the markers. I tried combining the current model with depth-prediction transformers (both intel/dpt-large and intel/dpt-hybrid, the smaller and faster of the two), but they are way too slow for a use case like this, which requires tens of predictions per second.</p>
<p>Ultimately, I feel like I should’ve gone with pose estimation instead of object detection from the beginning. Having the pose of the fingers and their angles would help with identifying the fingers, especially the pinkies. Just having those two coordinates would make the acceptable area for finger coordinates much smaller, thus reducing the chance of errors.</p>
<h3 id="heading-challenge-3-detecting-keystrokes">Challenge 3: Detecting Keystrokes</h3>
<p>Remember how I told you about thinking the model would be my biggest challenge? Well, this right here is the greatest challenge, the final boss, the bane of the project, you get the idea. Even with my limited knowledge in math/this field, I justifiably thought that calculating coordinates in 3D space with just ONE image from a single angle and 2d coordinates and no extra hardware was virtually impossible. Sure, if you had 2 cameras, things would’ve been different, but that’s not a reasonable requirement to have. But I also didn’t think that it would be this hard, when I was imagining the solution, it seemed much simpler having those 4 markers (pfft, imagination, right?).</p>
<p>So now that determining exact 3D coordinates is out of the question, let me explain clearly why that is a problem. Imagine (or just checkout figure-3) that your finger is hovering above one of the keys on the second row (from the top). To the view of the camera (which is in front of you) it could seem like your finger is down on a key on the fourth row (because we can’t get the exact x, y, z coordinates). So it’s not directly possible to say for certain which key is being pressed, or not being pressed for that matter.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711739010328/a9d69467-9868-48c2-9a4e-c84f826ef24e.jpeg" alt /></p>
<p><em>Figure 3</em></p>
<p>Now let’s explore the solutions I came up with and which one worked. First, I prepared a layout file, which consists of all the relative fractional coordinates of the keys (for example, box <code>[x: 0.1, y: 0.2, w: 0.07, h: 0.2]</code> is the Q key). This would also help with making keyboard physical layouts configurable.</p>
<p>The first solution was comparing the current distance of the fingers from a reference point (in this case the average Y position of all fingers) against a calibration value. It did not work out well! In my next attempt, I replaced the reference point with an adjacent finger. It was basically the same, failed.</p>
<p>That brings us to my final attempt, which doesn’t work well enough for a functional keyboard, but it’s way better than the previous disappointments. I introduced velocity into the equation over a sliding window. I used this in combination with the previous solution. If a finger is lower than it should be, and it has a negative velocity, relative to a downward +Y axis, then that finger should be on its way back home after a long day’s work (pressing a key).</p>
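<p>The velocity idea, in essence: keep a short sliding window of each fingertip’s Y coordinate and flag a press when the finger is below its calibrated rest position but already moving back up. A simplified sketch of that heuristic; the class name, window size and threshold are my own choices, not the project’s exact code:</p>

```python
from collections import deque


class PressDetector:
    def __init__(self, rest_y: float, window: int = 5, threshold: float = 10.0):
        self.rest_y = rest_y        # calibrated resting Y of this finger
        self.threshold = threshold  # how far below rest counts as "down"
        self.history = deque(maxlen=window)

    def update(self, y: float) -> bool:
        """Feed one frame's Y coordinate; return True on a detected press."""
        self.history.append(y)
        if len(self.history) < self.history.maxlen:
            return False
        # Velocity over the window: positive means moving down (+Y axis points down)
        velocity = self.history[-1] - self.history[0]
        is_down = y > self.rest_y + self.threshold
        return is_down and velocity < 0  # still down, but on its way back up
```

<p>One detector instance per finger, fed every frame, gives you per-finger press events to combine with the coordinate mapping above it.</p>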
<p>After being able to tell which finger was down where, the rest was easy. I just got the perspective transform matrix of my markers in a unit box (i.e. (0, 0) to (1, 1) box) and got the dot product of that matrix and my finger’s coordinates. Which I then used in my keyboard layout to map it to a key. Here’s that part of the code:</p>
<pre><code class="lang-python">perspective_boundary = np.float32([
    markers.bottom_left_marker.xy,
    markers.top_left_marker.xy,
    markers.bottom_right_marker.xy,
    markers.top_right_marker.xy,
])
target_boundary = np.float32([
    [<span class="hljs-number">0</span>, <span class="hljs-number">0</span>], [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>], [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>], [<span class="hljs-number">1</span>, <span class="hljs-number">1</span>],
])

matrix = cv2.getPerspectiveTransform(perspective_boundary, target_boundary)
transformed = np.dot(matrix, [*finger_coordinates.xy, <span class="hljs-number">1</span>])
transformed /= transformed[<span class="hljs-number">2</span>]

<span class="hljs-comment"># x = transformed[0], y = transformed[1]</span>
</code></pre>
<p>The next solution I will try will be using signal peak detection. Finger positions are essentially a signal, a time series. And when you lower your finger to press a key and take it back up, you’re kind of forming a down-facing parabola and the peak of that parabola will be the coordinates of the pressed key.</p>
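<p>That peak-detection idea can be prototyped without any signal-processing library: a press candidate is a frame where the Y series forms a local maximum above some height (remember, +Y points down, so a keypress is a maximum). A rough sketch under those assumptions:</p>

```python
def find_press_peaks(ys: list[float], min_height: float) -> list[int]:
    """Return indices where the Y series forms a local maximum at or above
    min_height, i.e. candidate keypress frames (+Y points down)."""
    peaks = []
    for i in range(1, len(ys) - 1):
        if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1] and ys[i] >= min_height:
            peaks.append(i)
    return peaks
```

<p>A real implementation would also want a minimum prominence and spacing between peaks to ignore jitter, which is exactly what off-the-shelf peak finders parameterize.</p>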
<h3 id="heading-challenge-4-ui">Challenge 4: UI</h3>
<p>I really don’t think this is even worth mentioning as a challenge (it really wasn’t), but here it is anyway. For the UI, I chose pygame + pygame-gui, and it’s probably one of the worst decisions I made in the project. Not because those are bad libraries, on the contrary, but because they target another use case. Nevertheless, I remembered past struggles with updating an image view many times a second in desktop GUI libraries; I just didn’t think it was possible, or at least remotely efficient, without hardware acceleration. Not a big deal though: it’s just one class, and it can be replaced at any time effortlessly. To the outside world, this is what it looks like:</p>
<pre><code class="lang-python">detect_task = asyncio.create_task(self._detect())
<span class="hljs-keyword">await</span> self._ui.run()
<span class="hljs-comment"># when the detection task detects markers, fingers, etc:</span>
self._ui.update_data(detected_frame_data=self._detected_frame)

<span class="hljs-comment"># when a keystroke is detected</span>
self._ui.update_text(key)
</code></pre>
<p>So it’s really easy to just swap it out for something else, for instance another UI developed with Tkinter, since the finished app wouldn’t even need to show the live feed, maybe except in a diagnostics view, or in a calibration view if it’s still needed by then at all (hopefully calibration won’t be required in the future).</p>
<p>Thanks for reading. Again, feel free to share your thoughts, improvement ideas, questions or criticisms. And if you’re interested, feel free to contribute to the project on GitHub, here: <a target="_blank" href="https://github.com/mnvoh/cameratokeyboard">https://github.com/mnvoh/cameratokeyboard</a></p>
]]></content:encoded></item><item><title><![CDATA[Dealing with lru_cache While Testing Django Applications]]></title><description><![CDATA[lru_cache is the simplest method of caching expensive function calls. It's hassle-free and it doesn't need a backend (such as Redis). But there's a gotcha when testing your application. Data could leak from a test case to another since they are cache...]]></description><link>https://blog.nozary.com/dealing-with-lrucache-while-testing-django-applications</link><guid isPermaLink="true">https://blog.nozary.com/dealing-with-lrucache-while-testing-django-applications</guid><category><![CDATA[Django]]></category><category><![CDATA[Testing]]></category><category><![CDATA[caching]]></category><dc:creator><![CDATA[Milad Nozari]]></dc:creator><pubDate>Sat, 27 Feb 2021 06:34:10 GMT</pubDate><content:encoded><![CDATA[<p><code>lru_cache</code> is the simplest method of caching expensive function calls. It's hassle-free and it doesn't need a backend (such as Redis). But there's a gotcha when testing your application. Data could leak from a test case to another since they are cached. Depending on what you're testing and how you're testing it, this might not be a problem at all, but if each test case expects different inputs or outputs, this could turn ugly.</p>
<p>In this article, we will go over several workarounds and solutions to deal with this problem.</p>
<h2 id="the-problem">The Problem</h2>
<p>Imagine this hypothetical scenario:</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">User</span>(<span class="hljs-params">Model</span>):</span>
    username = models.CharField(max_length=<span class="hljs-number">16</span>)
    ...  <span class="hljs-comment"># Other fields</span>

<span class="hljs-meta">    @classmethod</span>
<span class="hljs-meta">    @lru_cache(maxsize=1)</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_all_usernames</span>(<span class="hljs-params">cls</span>) -&gt; set:</span>
        all_usernames = cls.objects.distinct(<span class="hljs-string">'username'</span>) \
            .values_list(<span class="hljs-string">'username'</span>, flat=<span class="hljs-literal">True</span>)
        <span class="hljs-keyword">return</span> set(all_usernames)


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">UserTestCase</span>(<span class="hljs-params">TestCase</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_something</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-comment"># Hopefully you're using FactoryBoy or Fixtures instead of this</span>
        user = User.objects.create(username=<span class="hljs-string">'uname_1'</span>)

        <span class="hljs-comment"># other stuff to make this test make sense!</span>

        self.assertTrue(user.username <span class="hljs-keyword">in</span> User.get_all_usernames())

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_something_else</span>(<span class="hljs-params">self</span>):</span>
        user = User.objects.create(username=<span class="hljs-string">'uname_2'</span>)

        <span class="hljs-comment"># other stuff to make this test make sense!</span>

        self.assertTrue(user.username <span class="hljs-keyword">in</span> User.get_all_usernames())
</code></pre>
<p>When <code>test_something</code> is run, it passes. But the return value of <code>get_all_usernames</code> is cached, and when <code>test_something_else</code> is run, the data from the previous test leaks into this one, and you'll end up with a failed test.</p>
<h2 id="the-solution-that-should-not-be-considered">The Solution That Should Not Be Considered</h2>
<p>Before going over the actual solutions, I will talk about a solution that I've seen being implemented. And it's implementing another wrapper around <code>lru_cache</code>, and using it only if not in testing mode. Generally making your code aware of the environment (testing, production, etc) is a bad practice, and could lead to unexpected errors. For instance by making a problem go away only in testing.</p>
<h2 id="the-solution-that-may-or-may-not-work">The Solution That May or May Not Work</h2>
<p>An obvious solution is the old-school event-based cache invalidation. You just have to invalidate the cache whenever you create a new user:</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">User</span>(<span class="hljs-params">Model</span>):</span>
    username = models.CharField(max_length=<span class="hljs-number">16</span>)
    ...  <span class="hljs-comment"># Other fields</span>

<span class="hljs-meta">    @classmethod</span>
<span class="hljs-meta">    @lru_cache(maxsize=1)</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_all_usernames</span>(<span class="hljs-params">cls</span>) -&gt; set:</span>
        all_usernames = cls.objects.distinct(<span class="hljs-string">'username'</span>) \
            .values_list(<span class="hljs-string">'username'</span>, flat=<span class="hljs-literal">True</span>)
        <span class="hljs-keyword">return</span> set(all_usernames)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">save</span>(<span class="hljs-params">self</span>):</span>  <span class="hljs-comment"># I was too lazy to include the super class's arguments :-)</span>
        super().save()

        self.get_all_usernames.cache_clear()
</code></pre>
<p><strong>Pros</strong></p>
<ul>
<li>It's the simplest and the cleanest solution (even cleaner than "The Clean Solution" coming up).</li>
<li>It's not just for testing. Proper cache invalidation is of the utmost importance if you are planning on caching anything.</li>
</ul>
<p><strong>Cons</strong></p>
<ul>
<li>It doesn't always work. For instance, the <code>save</code> method is never called if you use <code>bulk_create</code> or <code>bulk_update</code>, or if, god forbid, you run raw queries (using <code>cursor.execute()</code>, for instance). This also means that the pre- and post-save signals are not sent either.</li>
</ul>
<h2 id="the-clean-solution">The Clean Solution</h2>
<p>IMHO this is the cleanest solution and it's explicit, i.e. it's obvious what's being done to resolve the issue. No hidden magic!</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">UserTestCase</span>(<span class="hljs-params">TestCase</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">setUp</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-comment"># Solution is here</span>
        User.get_all_usernames.clear_cache()

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_something</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-comment"># Hopefully you're using FactoryBoy or Fixtures instead of this</span>
        user = User.objects.create(username=<span class="hljs-string">'uname_1'</span>)

        <span class="hljs-comment"># other stuff to make this test make sense!</span>

        self.assertTrue(user.username <span class="hljs-keyword">in</span> User.get_all_usernames())

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_something_else</span>(<span class="hljs-params">self</span>):</span>
         user = User.objects.create(username=<span class="hljs-string">'uname_2'</span>)

        <span class="hljs-comment"># other stuff to make this test make sense!</span>

        self.assertTrue(user.username <span class="hljs-keyword">in</span> User.get_all_usernames())
</code></pre>
<p>As you can see, in the <code>setUp</code> method (which is run before every single test method) we clear the cache using the method provided by <code>lru_cache</code>.</p>
<p><strong>Pros</strong></p>
<ul>
<li>Clean and obvious; no muss to fuss.</li>
<li>Can be done on a per-test-case basis. You can opt-out of doing it in another test case if you want/need to.</li>
</ul>
<p><strong>Cons</strong></p>
<ul>
<li>if caching affects multiple or many test cases, you'd have to repeat yourself a lot.</li>
<li>It's just good for fixing failed tests. If your app relies on or requires cache invalidation, you just pass your test and remain vulnerable to bugs. </li>
</ul>
<h2 id="the-lazy-mans-solution">The Lazy Man's Solution</h2>
<p>This is not a clean solution, and to be honest, it's not my favorite. It's hacky, and hard to find for your teammates. But it gets the job done without having to do the same thing over and over again.</p>
<p>As to where to put the code, I would put it in <code>tests/__init__.py</code> so that it would be run only once when the tests start. But if you have a known file or module in which you do test initializations (and your whole team knows about it), it would be much better. </p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> your_app.models <span class="hljs-keyword">import</span> User

original_setup = SimpleTestCase.setUp


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">new_setup</span>(<span class="hljs-params">self</span>):</span>
    User.get_all_usernames.clear_cache()
    <span class="hljs-keyword">return</span> original_setup(self)


SimpleTestCase.setUp = new_setup
</code></pre>
<p>Or if you feel like going crazy on <code>lru_cache</code>, you can clear all caches:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> gc
<span class="hljs-keyword">import</span> functools

<span class="hljs-keyword">from</span> django.test <span class="hljs-keyword">import</span> SimpleTestCase

original_setup = SimpleTestCase.setUp


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">new_setup</span>(<span class="hljs-params">self</span>):</span>
    gc.collect()
    <span class="hljs-keyword">for</span> obj <span class="hljs-keyword">in</span> gc.get_objects():
        <span class="hljs-keyword">if</span> isinstance(obj, functools._lru_cache_wrapper):
            obj.cache_clear()
    <span class="hljs-keyword">return</span> original_setup(self)


SimpleTestCase.setUp = new_setup
</code></pre>
<p>I have chosen to monkey patch <code>SimpleTestCase</code> because it's the super class of the other test cases in Django. Here's the inheritance tree of different kinds of TestCases:</p>
<pre><code>unittest.TestCase
|<span class="hljs-comment">-- django.test.SimpleTestCase</span>
|    |<span class="hljs-comment">-- django.test.TransactionalTestCase</span>
|    |    |<span class="hljs-comment">-- django.test.TestCase</span>
</code></pre><p>Also, note that the call to <code>original_setup</code> is redundant at this time and I've included it just in case something changes in the future. But <code>django.test.SimpleTestCase</code> doesn't implement <code>setUp</code> at all, and here's the implementation in <code>unittest.TestCase</code>:</p>
<pre><code class="lang-python"><span class="hljs-comment"># unittest.TestCase</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">setUp</span>(<span class="hljs-params">self</span>):</span>
    <span class="hljs-string">"Hook method for setting up the test fixture before exercising it."</span>
    <span class="hljs-keyword">pass</span>
</code></pre>
<p><strong>Pros</strong></p>
<ul>
<li>If you have this issue in many test cases, it resolves them all.</li>
</ul>
<p><strong>Cons</strong></p>
<ul>
<li>It's something automagical that happens behind the scenes, hidden away in some file.</li>
<li>If you're not careful, it could disable caches that shouldn't have been and you'll end up chasing a bug for hours (regardless of the fact that generally, tests <strong>should not</strong> rely on any sort of cache)</li>
<li>Like the previous method, not proper cache invalidation is in place.</li>
</ul>
<hr />
<p>Finally, I think that the first solution is best, and for cache invalidation:</p>
<ul>
<li>Do it on <code>save</code> or when a post save signal is sent</li>
<li>When you update data in a way that doesn't call the <code>save</code> method, remember to invalidate or reconstruct the cache manually.</li>
</ul>
<p>Well, I hope this article was helpful. Remember that you should always pick the best solution based on your requirements. And please share any mistakes you might have found in the article or your better ideas for solving this.</p>
]]></content:encoded></item></channel></rss>