Deploy scalable and maintainable projects in Python

Overwhelmed by releasing to production? Worry no longer!

Introduction

You build an amazing new project, go to deploy it, and realize it's overly burdened by unwanted dependencies, a bloated environment, and slow wake times. Who hasn't been there?

While certain packages (Django) or platforms (Heroku) may give a production checklist, I rarely see a generic checklist for releasing production code, so I thought I'd make one! This article will focus on tools you can use in Python for these concepts, but the concepts themselves should relate to any programming language.

Checklist

Stable & Lightweight environment

One of the most important things for a scalable system is a stable & lightweight production environment. For most modern applications, this means a Docker container. A lean container can help lower costs and lessen the wake times of new instances as traffic increases.

Smaller instances will help with cost as you'll need a smaller server (if it's a single instance) or a smaller cluster of servers (if it can scale to multiple instances). If it can scale to multiple instances (as is the case with most serverless applications), a smaller container typically lessens wake times, which we'll dive more into below.

(For a full write-up on serverless applications, check out my previous article.)

Clean dependency list

The easiest way to have a lightweight environment is to load only your production dependencies into production. This typically excludes things like testing, linting, and formatting packages, as well as anything needed only for one-off scripts or development.

To help with this, I use Poetry for my dependency management as it makes it easier to differentiate between production and development dependencies. Let's see how!

First I install a production dependency,

$ poetry add FastAPI

Now my pyproject.toml has the following entries,

[tool.poetry.dependencies]
python = "^3.8"
fastapi = "^0.75.0"

Then let's add a dev dependency,

$ poetry add pytest --dev

And now I have it under my dev dependencies!

[tool.poetry.dev-dependencies]
pytest = "^7.0.1"

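Later, when I run poetry install --no-dev, Poetry knows to install only my production dependencies. (Poetry 1.2 and later deprecate this flag in favor of dependency groups and poetry install --without dev.) You can also use poetry export to export your production dependencies to a requirements.txt.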
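As a sanity check, here is a minimal sketch of a script you could run inside the production environment to confirm that dev-only packages did not sneak in. The DEV_ONLY_MODULES list and the function name are my own illustrative choices, not part of Poetry, and the check uses import names, which can differ from the package names in pyproject.toml.

```python
from importlib.util import find_spec

# Hypothetical list of top-level import names that should only exist in development.
DEV_ONLY_MODULES = ["pytest", "black", "flake8"]


def find_leaked_dev_modules(module_names):
    """Return the names from module_names that are importable in this environment."""
    return [name for name in module_names if find_spec(name) is not None]


leaked = find_leaked_dev_modules(DEV_ONLY_MODULES)
print("Dev-only modules present:", leaked or "none")
```

Running this in a container built with only production dependencies should report none. In a CI step you might fail the build when the list is not empty.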

Pin your dependencies

Something I see a lot is a requirements.txt like so,

pandas
numpy
scikit-learn

While it's good to have a dependency list in general, leaving dependencies unpinned can be extremely dangerous in production. Think about this: you make a quick, small bug fix on January 30th, 2020, thinking everything is normal, when your beautiful data-science project starts crashing. You're getting weird pandas missing-argument errors that you've never seen before, in a project that's been running for well over a year. You go into your local environment to reproduce the issue and realize pandas 1.0.0 has been released with some backwards-incompatible changes.

Now this is only an example, but situations like this happen all the time. They usually aren't too bad, but they can sometimes cause wide outages. A simple fix is pinning dependencies. Typically this looks something like pandas~=1.2 in a requirements.txt (pandas = "^1.2.0" in Poetry). This is saying "get any version of pandas that is at or above 1.2.0 but below 2.0.0". (Note that pandas~=1.2.0, with the extra .0, is stricter and stays below 1.3.0.) For some more "experimental" packages like FastAPI, I'd even recommend pinning to a specific version like fastapi = "0.75.0".
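To catch unpinned entries before they reach production, a small check like the one below could run in CI. This is a rough sketch under my own assumptions: the function name is made up, it only looks for the presence of a version operator, and it does not handle every requirements.txt feature (URL requirements, for example).

```python
import re

# Any of these operators counts as a version constraint for this sketch.
VERSION_OPERATOR = re.compile(r"(===|==|~=|>=|<=|!=|>|<)")


def find_unpinned(requirements_text):
    """Return requirement lines that carry no version constraint at all."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop inline comments, then environment markers (after ";"), then whitespace.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not VERSION_OPERATOR.search(line):
            unpinned.append(line)
    return unpinned


example = """\
pandas
numpy>=1.19
scikit-learn
fastapi==0.75.0
"""
print(find_unpinned(example))  # ['pandas', 'scikit-learn']
```

Here numpy and fastapi pass because they carry a constraint, while pandas and scikit-learn are flagged.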

Efficient base image

If you're using Docker for your deployment, then having a good base image is a must. For those new to Docker, that is the FROM ... at the start of the Dockerfile. When I started using Docker, I always just copied and pasted from various sources without knowing what I was really doing, and a lot of the time my base images were way too complex (and heavy) for what I needed.

Your base image is the core of your container so by having a temperamental (Alpine) or bloated image, you run the risk of wasting money and (wake) time.

For a more in-depth look at base images check out pythonspeed's article. As for me, I typically use python:<version>-slim or python:<version>-slim-bullseye image (with the <version> depending on what the specifications of the project are) and then change it if any added resources are needed.

Ignore unnecessary files

Lastly, having a good environment comes with ignoring unnecessary files in production. This typically comes in the form of a .dockerignore file but can also be something like a .gcloudignore if you're working with Google Cloud Platform. These files keep unwanted files from entering your production container and bloating it. They typically cover everything from small testing, linting, and formatting files (.coverage, pytest.ini, etc.) all the way to an entire virtual environment (venv/).

I typically start with this one.

Those are the main things I focus on when building an efficient production environment / container! Now let's move on to security within a production environment.

Security

For this, I won't go into authentication or anything of the sort because there are a million resources for that on the web already. Instead, I'll go over the main issue that I've seen happen quite often in production.

Put secrets in a vault

When you're in development, it's very easy to leave secrets (passwords, admin keys, etc.) all over your code instead of sourcing them from somewhere, which can be a pain. BUT before this code is saved forever to a repository (or worse, deployed to production), all the secrets should be removed and stored in a vault somewhere. This adds another level of protection against anyone gaining access to your production environment (and potentially sensitive information).

I'd recommend using AWS' git-secrets to programmatically block secrets from being committed to your git repository.

For a few references, anything running on GitHub Actions can use GitHub Secrets, and I personally use Google's Secret Manager when working within the GCP environment. Other suites all have some sort of secrets vault, and I would highly recommend using them.
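On the code side, this usually means the application only names the secret and lets the platform supply the value. Here is a minimal sketch, assuming your vault or CI system injects the secret as an environment variable. The helper get_secret and the name DATABASE_PASSWORD are hypothetical, chosen only for illustration.

```python
import os


def get_secret(name):
    """Read a secret from the environment, failing loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret: {name}")
    return value


# Instead of hard-coding a value in the repository, for example
#     DATABASE_PASSWORD = "hunter2"
# the code asks for it by name at startup:
#     database_password = get_secret("DATABASE_PASSWORD")
```

Failing at startup with a clear message tends to be easier to debug than a confusing authentication error deep inside a request.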

Have readable code

Nothing is worse than releasing a project, having it run for a few months, getting a bug, and then not being able to read your previous code. Been there, done that, and it's no fun being the bad programmer causing your own headaches. To help, there are a few rules to follow that will mitigate this.

Before we start, think about code as an instruction manual. Would it help if all the steps were random characters? Or if one "step" had 10 instructions? Things like this might not pop out when you're first creating a project but being mindful of how your code reads will pay dividends when it comes to maintainability.

Easy to understand variable names

I think we can all agree there's nothing worse than reading through a codebase where none of the variables make sense. The typical culprits I've seen are,

  • Single letter names (n, i, x)
  • Acronyms (pw, abc, xyz)
  • Meaningless names (foo, bar, foobar)
  • Generic names (toggle, increment, process)
  • Reassigning variables throughout the code
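To make the difference concrete, here is a small before-and-after sketch. The function, the names, and the tax rate are all made up for illustration.

```python
# Hard to follow: single letters, a generic function name, and a reassigned variable.
def process(x, n):
    t = x * n
    t = t + t * 0.08
    return t


# Easier to follow: each name says what the value is, and nothing is reassigned.
SALES_TAX_RATE = 0.08  # hypothetical rate, for illustration only


def calculate_order_total(unit_price, quantity):
    subtotal = unit_price * quantity
    sales_tax = subtotal * SALES_TAX_RATE
    return subtotal + sales_tax
```

Both functions compute the same number, but only the second one tells a future reader what that number means.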

Docstrings & comments

This seems to be a contrarian opinion among most senior developers, but I haven't come across a codebase that is too commented or explained. Yes, maybe one day I'll come across something where every line has a comment, but I've yet to see the day. For this, my rule of thumb is to err on the side of too much information (given it doesn't significantly slow down releases), as you'll thank yourself later for the guide.

While large functions should be broken up, sometimes it isn't possible. If you catch yourself with a long or complex chunk of code, I would leave a brief docstring about what it's doing and maybe some inline comments breaking it up into logical chunks.
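As a rough sketch of what that can look like, the hypothetical function below has a short docstring stating what it does and how it treats bad input, plus inline comments that mark each logical chunk.

```python
def summarize_response_times(samples_ms):
    """Return the count, mean, and slowest value of valid response times.

    Negative or missing (None) samples are treated as bad measurements and
    dropped. Returns None if no valid samples remain.
    """
    # Step 1: drop measurements that cannot be real response times.
    valid = [s for s in samples_ms if s is not None and s >= 0]
    if not valid:
        return None

    # Step 2: compute the summary values from what is left.
    return {
        "count": len(valid),
        "mean_ms": sum(valid) / len(valid),
        "max_ms": max(valid),
    }
```

The docstring covers the "what" and the edge cases, while the comments cover the "where", so a reader can jump straight to the step they care about.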

Avoid gigantic "one-liners"

While list (and dict) comprehensions are a joy of Python, they can quickly get out of control. If your "one-liner" is hundreds of characters and needs multiple lines to be seen easily, you may want to rethink what is happening. Maybe a piece can be moved to a function (or maybe the whole thing should be one). These Pythonic features are meant to make code beautiful but they have to be used correctly to do so.
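Here is a small sketch of that kind of refactor. The data and helper names are invented for the example. The first version crams filtering and cleaning into one expression, and the second moves each concern into a named function.

```python
users = [
    {"name": "Ada", "email": " ADA@Example.com ", "active": True},
    {"name": "Bob", "email": None, "active": True},
    {"name": "Cy", "email": "cy@example.com", "active": False},
]

# Hard to scan: filtering and cleaning are packed into a single long line.
emails = {u["name"]: u["email"].strip().lower() for u in users if u["active"] and u["email"] and "@" in u["email"]}


def has_usable_email(user):
    """True for active users whose email is present and contains an "@"."""
    email = user["email"]
    return user["active"] and email is not None and "@" in email


def normalize_email(email):
    """Trim surrounding whitespace and lowercase the address."""
    return email.strip().lower()


# Same result, but each piece now has a name and can be tested on its own.
emails_by_name = {
    user["name"]: normalize_email(user["email"])
    for user in users
    if has_usable_email(user)
}
print(emails_by_name)  # {'Ada': 'ada@example.com'}
```

The comprehension is still there, it just reads like a sentence now.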

Summary

Thanks for reading and I hope you enjoyed! For more information about production code check out my other posts below!