Whether a team works with pair programming or requires code review before merging to the main branch, code formatting is a constant source of distraction, when not an outright source of discussion.

Writing even a few lines of code requires a lot of code style decisions which, if not agreed upon in advance, yield a lot of bickering and hard-to-read code review diffs.

The obvious step is to establish a team-wide style guide, to avoid arguments and limit messy diffs.

But we are developers, we are lazy by nature, and there is little immediate reward in constantly reformatting the code you are writing by hand. You know it matters, but it’s a burden (and an unwise way to spend time) to do it manually.

I have been using a style guide for a very long time, and very early on I started pushing checks into the CI.

But I still felt I was lacking a ton of automation, so I started studying the tools that would bring me closer to the goal of complete automation.

I want to share the findings of this quest, which might be useful to others.

Adapting expectations

Automation comes at a cost: you depend on the style rules available in existing linting/formatting tools (unless you want to write your own), and you have to balance what you consider “good” code with what the tools can achieve.

So the first, less pleasing, part of the process is going through the existing tools, choosing the ones closest to what you want, and then tweaking their output to suit your style; in the end you will have to accept a few compromises. It took me a while to come to terms with this (and it’s one of the reasons why it’s been such a long process for me).

After a few months of testing on real projects, I think I have found a pretty good balance, and I have learnt to live with the less pleasing outcomes.

Running the checks

As we depend on tools to handle all the checks, we must decide when to run them.

As everyone on the team is free to use their own editor, we decided to keep the checking/formatting configuration in the repository itself, as an easy way to use exactly the same tools without relying on mixed support from each editor.


This also made it trivial to run the same checks in the continuous integration pipeline, as we want to run the checks on every push to ensure the code is sane when the pull request is opened, and to avoid the annoying code style comments during review. Over the years I have used gitlab-ci, travis, github actions and others. As I wrapped all the checks in tox environments, the CI-specific details didn’t matter and I ported configurations across CI environments with zero effort.
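A tox environment wrapping the checks might look like the following sketch (the tool list, versions and options are illustrative, not the exact setup described here):

```ini
# Illustrative tox.ini fragment: a dedicated "lint" environment that any CI
# (or a developer) can run with `tox -e lint`, keeping the pipeline config trivial.
[testenv:lint]
skip_install = true
deps =
    flake8
    black
    isort
commands =
    flake8 .
    black --check .
    isort --check-only .
```

Because the CI job reduces to a single `tox -e lint` invocation, porting it between gitlab-ci, travis and github actions is a one-line change.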



But before pushing, you want to make sure everything passes, to avoid wasting CI resources and your time. For a long time I relied on simply running tox locally, which was effective but tedious. Then I decided to read the pre-commit documentation, and it made my life so much easier.

Pre-commit creates a pre-commit git hook and launches a set of checks on every file staged for commit, before it is committed: this ensures that committed files respect the configured checks. This is a real game changer, because you can keep your coding workflow without any extra step for formatting and linting: edit, save, add and commit, and all the changes will be validated.

Apart from wrapping the standalone tools, pre-commit ships with a lot of integrated checks (which partially overlap with the others, but the added value is totally worth the extra steps); see the config file for the configured checks and the pre-commit hooks documentation for the details. For projects where python and JavaScript code live in the same repository, the JavaScript lint-staged command is also integrated as a pre-commit tool, to run the frontend linting together with the python one.
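As a sketch of how this fits together, a minimal `.pre-commit-config.yaml` could look like this (repository revisions and the hook selection are placeholders, not the configuration described in this post):

```yaml
# Hypothetical .pre-commit-config.yaml sketch; `rev` values are placeholders
# to be pinned with `pre-commit autoupdate`.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace   # strip stray whitespace at line ends
      - id: end-of-file-fixer     # ensure files end with a single newline
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black                 # reformat staged python files
```

After adding the file, a single `pre-commit install` sets up the git hook; from then on the checks run automatically on every commit.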

The checks

Code style and format

This is of course the core of a linting suite, and where the team consensus can be somewhat harder to reach.


Editorconfig is a specification for basic text formatting (line length, indentation rules, etc.) which is used by text editors and IDEs as a way to configure editor automation. It’s not python specific and it only provides basic rules, but it’s very handy as it’s basically transparent once you enable editorconfig support in your editor.
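An `.editorconfig` for a python project might look like this (the values are examples, not the settings used here):

```ini
# Illustrative .editorconfig; top-level `root = true` stops editors from
# searching parent directories for further configuration.
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

[*.py]
indent_style = space
indent_size = 4
max_line_length = 88
```

Once editorconfig support is enabled, the editor applies these rules silently as you type and save, with no extra workflow step.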



Flake8 is the first tool I introduced, quite a while ago, and it does quite a good job.

It has a very comprehensive set of rules to check the code for style or logic errors. It covers a lot of non-trivial cases and makes the code much more consistent.

In this iteration I added a lot of plugins for quite a strict configuration; even if most of the work is done by black, this still finds quite a few quirks worth fixing.

| Plugin | Description |
| --- | --- |
| flake8-broken-line | Reports usage of `\` to break lines; use `()` instead |
| flake8-bugbear | Very opinionated (but very sensible) syntactic and logic checks |
| flake8-builtins | Reports local variables shadowing python builtins |
| flake8-coding | Used to reject encoding magic comments in files: as the codebase is now py3 only, we don’t need those |
| flake8-commas | Checks trailing commas; it has some compatibility issues with black |
| flake8-comprehensions | Advises on the usage of comprehensions and generators, to help write more idiomatic code |
| flake8-eradicate | Reports commented-out code: we don’t need it, and we can retrieve older versions of the code from git |
| flake8-quotes | Checks quote consistency; somewhat obsolete as black formats quotes according to its own rules |
| flake8-tidy-imports | Enforces some rules for module imports |
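To tie the plugins together, the flake8 configuration might look like this sketch (the exact codes and values are examples; the `E203`/`W503` exclusions are the two stock rules known to conflict with black’s output):

```ini
# Illustrative [flake8] section (e.g. in setup.cfg); installed plugins
# contribute their own options to this same section.
[flake8]
max-line-length = 88
# E203 (whitespace before ':') and W503 (line break before binary operator)
# disagree with black's formatting, so they are commonly disabled.
extend-ignore = E203, W503
# flake8-quotes: match black's preference for double quotes
inline-quotes = double
# flake8-tidy-imports: forbid relative imports
ban-relative-imports = true
```

Each plugin activates automatically once installed in the same environment as flake8, so the tox/pre-commit dependency list is the only other place to maintain.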


Stop worrying about organising imports: this has been one of the easiest victories in my quest for the perfect linting setup.

isort is basically the tool you want to use to organise python imports: it does a great and consistent job, with many different styles to suit any preference, and once you set it up it just runs and does its job. Only occasionally have I found weird inconsistencies, but they can be easily fixed. Beware when using it with black, as the import formatting style must match the one black expects.
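Making isort agree with black can be as small as this fragment (shown for setup.cfg; isort 5+ ships a built-in `black` profile precisely for this compatibility problem):

```ini
# Illustrative isort configuration: the "black" profile sets multi-line output,
# trailing commas and line length so the two tools never fight over imports.
[isort]
profile = black
```

On older isort versions without profiles, the equivalent individual options (multi-line output mode, trailing commas, matching line length) have to be set by hand.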


Black is a very much touted tool nowadays, and for good reasons.

It’s fast, it’s easy to use, and it’s really consistent.

It does a great job at formatting the code according to the code style, and I have yet to find an error (as in code that doesn’t run, or a bug introduced by black).

On the other hand, it’s totally opinionated, to the point that you can barely configure it. It took me a while to digest this, all the more so as some of its style decisions look questionable to me. Yet the output is good enough, and the amount of work it lifts is so large, that I think it would be stupid not to use it.
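The configuration surface really is tiny; a pyproject.toml fragment covers essentially everything black lets you decide (the values below are examples, not the settings used in this setup):

```toml
# Illustrative [tool.black] section: line length and target python versions
# are about the only meaningful knobs black exposes, by design.
[tool.black]
line-length = 88
target-version = ["py38"]
```

That near-total lack of options is the point: with nothing to argue about, the team style discussion ends at "we use black".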


My latest discovery is this tool (which is currently only integrated via pre-commit), which detects and improves the syntax according to the features available in the target python version. Going through its documentation is also very instructive, as a way to learn more idiomatic python.
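The description matches pyupgrade (the tool is not named in the text, so treat this as an assumption); if so, its pre-commit integration would be a hook entry along these lines:

```yaml
# Hypothetical hook entry, assuming the tool described is pyupgrade;
# the rev is a placeholder and the flag targets a minimum python version.
- repo: https://github.com/asottile/pyupgrade
  rev: v2.31.0
  hooks:
    - id: pyupgrade
      args: [--py38-plus]   # rewrite syntax to what python 3.8+ allows
```

Typical rewrites include turning `"%s" % x` style formatting into f-strings and dropping `(object)` from class definitions, which is why reading its changelog doubles as a tour of modern python idioms.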


One of the most overlooked parts of developing software is the inline documentation (with narrative documentation being the most overlooked of all), so a gentle (or not so gentle) nudge toward writing documentation is really important.


The most basic check for your documentation is that it builds successfully: by adding Sphinx building to the pipeline checks, we ensure the syntax is correct. On GitHub you can simply configure documentation builds on each pull request via readthedocs, which is definitely the best way to test the documentation, as it provides a browsable, fully rendered version for proofreading and other non-automated checks.

On GitLab you can add the relevant job to the pipeline, ensuring the -W flag is used to treat warnings as errors.
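Staying with the tox-wrapping approach, a docs-building environment could look like this sketch (paths and dependencies are examples):

```ini
# Illustrative tox environment: build the Sphinx docs with -W so that any
# warning (broken cross-reference, bad rst syntax) fails the build.
[testenv:docs]
deps =
    sphinx
commands =
    sphinx-build -W -b html docs docs/_build/html
```

The same `tox -e docs` command then works identically on a developer machine and in any CI job.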


Pydocstyle checks the presence of docstrings at different levels (module, classes, methods, etc.), their formatting and their wording style.

Its default configuration is quite aggressive, and it took me quite a few tries to balance the stubbornness of the automatic checks with actually useful documentation.

It’s still somewhat the weakest link in this setup. Maybe because I don’t write enough documentation?🤔
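Taming the default aggressiveness usually means picking a convention and ignoring a few codes; a sketch (the chosen convention and ignored codes are examples, not this setup’s actual values):

```ini
# Illustrative [pydocstyle] section (setup.cfg): start from a named convention,
# then carve out the rules that produce noise rather than documentation.
[pydocstyle]
convention = pep257
# D105: missing docstring in magic method; D107: missing docstring in __init__ -
# both tend to force stub docstrings with no real content.
add-ignore = D105, D107
```

Iterating on the ignore list against a real codebase is the "quite a few tries" part: each removed code is a deliberate decision that the docstring it demands would not earn its keep.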


It’s a very recent addition and I am still evaluating it (by setting its failure threshold to 0).

On one hand its logic is simpler than pydocstyle’s: it merely counts the number of docstring-covered “objects” (modules, classes, methods, etc.).

On the other hand, this provides a more nuanced approach to covering code with documentation, as it allows you to avoid adding stub / empty docstrings where they are not needed, or when working on an existing codebase.

My only current concern is that it unnecessarily fails when an excluded directory does not exist, which leads to extra maintenance. I plan to propose a PR to fix this.
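The coverage-counting behaviour described here matches interrogate (the tool is not named in the text, so this is an assumption); if so, the "evaluation mode" with a zero failure threshold would be configured roughly like this:

```toml
# Hypothetical fragment, assuming the tool is interrogate: fail-under = 0
# reports docstring coverage without ever failing the build, which is the
# evaluation mode described above.
[tool.interrogate]
fail-under = 0
exclude = ["docs", "build"]
```

Once the team is comfortable with the reported percentage, ratcheting `fail-under` up over time turns the report into an enforced floor.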

Package information

If only I could count the botched pypi uploads due to errors in the manifest file or other package metadata (namely, correct rst formatting in the package long description). While not fundamental, these checks are good safeguards that make the application maintainer’s life easier.


Does exactly what its name says: it checks the manifest file against the repository files and reports any file not included in the manifest declarations. Even in mature applications it has saved me on a couple of occasions; it requires a bit of setup/maintenance work at the beginning, as you will have to carefully check which files legitimately do not belong in the package file.
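The initial setup work mostly consists of building an ignore list; a sketch (the patterns are examples of files that live in the repository but should not ship in the sdist):

```ini
# Illustrative [check-manifest] section (setup.cfg): repository files listed
# here are deliberately excluded from the package, so their absence from
# MANIFEST.in is not reported as an error.
[check-manifest]
ignore =
    .editorconfig
    .pre-commit-config.yaml
    tox.ini
```

Every entry is a recorded decision that the file is tooling, not payload, which is exactly the review the tool forces you to do once instead of at release time.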


If you are going to release your application as a package, you’d better check that it actually builds. Among other things, pep517 does exactly that.


I use twine to upload built packages to pypi or private registries, but it also provides a command to check the long description syntax, which is really important as pypi now refuses packages with a broken long description, and you don’t want to do multiple releases because of a syntax typo somewhere.
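The packaging checks can be bundled into one tox environment; the sketch below combines the three tools discussed above (paths, dependency names and the build invocation are illustrative assumptions, not the exact setup):

```ini
# Illustrative tox environment chaining the packaging checks: manifest
# completeness, a PEP 517 build, and twine's long-description validation.
[testenv:pkg]
skip_install = true
deps =
    check-manifest
    build
    twine
commands =
    check-manifest
    python -m build --outdir {envtmpdir}/dist
    twine check {envtmpdir}/dist/*
```

Run before tagging a release, this catches the manifest and rst problems locally instead of after a rejected (or botched) pypi upload.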


Moving to an auto-formatted and strictly checked codebase helped me a lot in focusing more on the substance of writing code (design and architecture) and less on the formal details.

When working with a team, it also significantly reduced the time needed to review code and get a pull request merged, as one can focus on the implementation thanks to a cleaner diff and consistent code.

The journey is far from over, as there is room for improvement in the current setup (starting from checking the overlap between flake8 and black), but overall it is proving a solid base with interesting returns.