Why you should not use python in your production system

Published on Thursday, July 25th 2019 at 21:51
Last updated on Thursday, March 21st 2024 at 15:58

Python is undoubtedly a language boasting one of the largest communities to date. Its toolbox is huge and contains a plethora of modules, from web servers to specialized libraries for scientific computing. The flexibility of public libraries and the simplicity of its paradigm and syntax allowed many people to become increasingly familiar with programming, and to quickly bootstrap small and big projects alike. Unfortunately, the same naivety that brings so many people to its yard also causes a panoply of headaches within production systems.

In this article I will explain why, in my humble opinion, you should not use Python in your production environment. Beware I am not implying you should avoid the language alltogether. I am merely suggesting that this particular technology is not fit to support production-grade systems in the long run; at least not without paying a heavy toll in maintenance costs.

Type system (or the absence thereof)

Python lacks strong types

This is probably the flashiest selling point for rookies and amateurs. Python is a duck-typed language: developers do not need to explicitly define what a variable may contain or what a function is supposed to return. Objects simply have to satisfy an “implicit” contract, composed by the set of all fields and methods that are going to be accessed during its lifespan. If it quacks like a duck and walks like a duck… it’s probably a duck. What’s the problem with this approach? Simply the fact the contract between an interface and its client is “silent” and there is no way to enforce it at the language level. Therefore the burden of making sure parameters contain valid objects is unloaded onto the developers, resulting in a codebase polluted with assertions, instanceof and branches. Not only does it make your code less readable but it is less future proof: as the codebase expands, it becomes harder and harder to remember all the places where contracts were enforced. You’ll have to write more tests, resulting in more time wasted in developing checks other languages get for free. Last but not least, the absence of strong types makes automatic refactoring very hard or impossible, further increasing the time developers have to dedicate to maintaining the codebase.

Pros:

Easier for beginners to pick up
Allows flexible meta-programming

Cons:

Assertions/checks scattered around
Automatic refactoring is nearly impossible
IDEs cannot provide suggestions/hints
Contracts are very easily broken
Additional tests required to ensure valid objects are used
Favours poor documentation

Type hints are a placebo

Introduced in python 3.5, type hints are syntactic constructs that allow developers to explicitly specify the type of data any parameter, variable or return object is supposed to contain. Thanks to handy structures like Union, you don’t have to sacrifice flexibility over maintainability. Nevertheless, type hints have absolutely zero effect on your code. The specification simply states they are valid productions in the python grammar, yet the interpreter is not obliged to take any action stemming from them. In fact, they were originally intended as a means to make code more understandable for developers. The official specification actually reads:

“While these annotations are available at runtime through the usual __annotations__ attribute, no type checking happens at runtime. Instead, the proposal assumes the existence of a separate off-line type checker which users can run over their source code voluntarily.”

PEP-484www.python.org/dev/peps/pep-0484

That’s why, if you want to enforce type checking, you have to run another program alongside the python interpreter to validate the source code, for example mypy. This not only adds to your CI/CD scripts, but also means that you have to actively work to make builds repeatable. Since the type checker is a separate entity, any developer can run the program with different arguments. In order to make results the codebase provably consistent, you have to make sure everyone runs the same script. In addition to that, any programmer can decide to ignore, disable or override type hints in any place they deem necessary. Not to say that the same is unfeasible in other strongly-typed language; but in general it is more difficult to alter type semantics unless you really put your mind to it.

Even though type hints are part of PEP, other type checking conventions exist, e.g. type comments. If you work in a large code base, it is possible for different versions of the language to coexist, and making sure type checking is consistent across those versions only adds to the total workload.

Pros:

Allow type-checking
Enable automatic refactoring and other IDE tools

Cons:

Need to configure, learn and use additional tools
Require developers to actively seek a common type-checking profile
Other type conventions exist: need to actively maintain more tools/scripts
Can be easily ignored by developers

Imports and dependencies

By experience, I know that developers spend a lot of time fixing dependencies, making sure builds are repeatable and avoiding new bugs to be introduced by en erroneous version upgrade of one hidden library. While building large, complex software, the glue between the blocks become as important as the block itself. That’s why, in my opinion, a good language should make dependency management a breeze, rather than repeatedly beating developers to death.

Imports are complicated

There is a whole slew of documentation that illustrates how imports work in python. As a developer, I expect an import statement to do exactly what I think it does, without any frills or added magic. However, python seems to have a talent for making things uselessly complicated in the face of a very elegant and simple syntax. The flexibility it allows also incurs in a plethora of potential problems:

Relative imports: these pretty much work like you would expect but only if you remember to place an __init__.py file in the right place. They are however clunky to work with and cause lots of cascading changes during refactoring (which is pretty much expected in any language with relative imports);
Absolute imports: which aren’t really absolute. They depend on the location the interpreter is being executed in. If you run python in the folder of a subpackage, absolute imports will not work as expected;
Local imports: useful for not polluting the import “header”, but they can cause unwanted shadowing of global names, with consequent side effects;
Local imports in a loop: in theory importing a package twice should load the cached version but you can explicitly reload a module, causing all kinds of different problems;
Name aliasing: this is useful and I haven’t found it to produce problems so far.

If you want to read more about the spicy world of python imports, here is an interesting article. Prepare to be amazed by what you thought was simple and linear but it is actually a tortous road towards depression.

Python imports have side effects

Python is a very dynamic language. Nothing exists until code is executed. This also holds for external modules and libraries. Therefore when you import a module, the interpreter actually loads its content in memory and proceeds to execution. Since it is possible to write statements at the root level of any file, importing may cause undesired side effects.

For example, I had a similar experience when I was trying to run tests in our codebase. I could either run them in parallel (same as our CI) or sequentially. Since some of the tests were inexplicably ignored by pytest, I tried to execute them sequentially using Unittest to try and debug potential issues one by one. However, not only did I not get the same outcome, but nothing would execute at all, since my command failed to find any test! Later I discovered this to be caused by a silly mistake: not running the command in the proper directory. Then how did parallel tests work in the first place?? Thirty minutes later we discovered that importing xdist - the module used to run tests in parallel - added the current path to the PYTHONPATH environment variable, which incidentally made tests visible.

This shows you how unpredictable python imports can be. They may cause unwanted side-effects and there is nothing to warn you of such things until it’s too late. Additionally, it is all too easy for developers to fall in the same traps and implement hidden side effects of their own. You may end up downloading hundreds of megabytes of dynamic content or uploading your code to the cloud without even noticing.

Pros:

nothing really

Cons:

Code loading becomes unpredictable
Potential security risks and slow-downs
No easy way to prevent developers from implementing side effects of their own

The package manager that never works alone

Ah, pip… What should I say? It’s painful. Installing packages and maintaining them properly synchronized takes a lot of effort. I should start by pointing out that when you execute pip install, all packages are downloaded and installed to the same location (usually in site-packages) and they are uniquely identified by their package name.

All python applications running on the same (development) machine must share the same packages;
It is not possible to have multiple versions of the same package alongside each other. Thus if you want to develop two projects that depend on different package versions you are pretty much screwed.

Of course, the last statement is too drastic, since we have tools like anaconda or virtualenv, but bear with me for a moment. By default, the official package manager does not allow two of the most fundamental features for dependency management: isolation and specificity. To add insult to injury, if you ever wanted to properly persist a long detailed list of all your current project dependencies, and ran pip freeze, you’d end up with a handful of surprises down the line. In fact, the latter command yields a snapshot of all packages currently installed. Your final requirements.txt will be bloatest with useless modules and/or wrong versions. All these factors combined are simply a recipe for disaster.

You may argue that in the age of contenerization, this should not be a problem. However, developers work on a single machine and they need to cope with multiple different projects at once. It is thus for the sake of people that alternate more sane solutions must exist.

Thankfully, the developer community saw how unfit pip is for serious use cases, and came up with environment managers. These programs allow you to create multiple isolated environments, each one with a specific python interpreter, standard library and module repository. All good and well, but still the development toolchain is more clunky and requires more interaction than running npm install or mvn package.

A language of myth

Python lacks access levels… and is proud of it

I know, this is a feature, not a bug. The python manifesto encourages developers to be responsible in what and how they code, because everything is accessible from anywhere and anyone. It states that visibility modifiers are just meaningful excuses for security. However, it also encourages the usage of underscore-based naming conventions for methods or classes that are not meant to be accessed from external clients. Using two underscores as a prefix instead of just one even triggers name mangling on behalf of the interpreter.
To me, this seems very contradictory.

The idea that everything is public and should be taken with care is just an illusion at best and a mere faerie tale at worst. Developers want things done quickly. They scout stackoverflow posts casting their eagle eye around for working source code. If a workaround yields a fast easy solution, it will be used, even if it entails a temporary suspension of disbelief.
It is just silly that the more underscore you use, the more reserved entities become:

_please_do_not_use_this
__absolutely_do_not_use_this
___for_gods_sake_do_not_invoke_this_or_earth_wll_explode___

In my experience, this does not stop developers from peeking into class internals. The python culture is noble but ultimately succumbs to the harsh reality of human nature: if people are allowed to make mistakes, they will. Computer science is all about preventing users from making wrong decisions. Validation, verification, tests, UX, design… they all serve a similar purpose. Developers are no different. A language should make it hard for programmers to err, not give them all the tools they need to hurt themselves.

In conclusion

Use python, but make it quick. Use it for prototyping. If you absolutely must use it for scientific computation, wrap it in small self-contained units and build on more resilient languages to create an overarching infrastructure.