Python Packaging: Are We Nearly There Yet?

A brain dump on deploying Python applications.

Aims:

  1. repeatable. In other words no depending on the internet and the C compiler.
  2. revertable. We should not have to restore from backup if things go wrong.
  3. works for third party or locally produced software.
  4. packaging of dependencies. If something is not in the OS we should be able to package it.
  5. upgrades for security are easy.
  6. developer’s work is easy.

OS Packages

Advantages:

  • very solid, works everywhere
  • os tools like debsums can come in handy
  • dependencies are easier to some degree, e.g. dpkg-shlibdeps means subtle ABI changes should not break your packages unexpectedly. This is nice because eventually you end up depending on something from the OS. First-party software will probably have at least some kind of CI system going, but third-party software probably will not.
  • you can bundle whatever files you want and just change the import paths to make it work
  • some packages are very easy to debianize and will integrate very well with the system
  • much automation for things like init scripts and man pages

Disadvantages:

  • slow to install large packages
  • packages can be completely arcane. It’s no good if the developers are too scared to change the packaging.
  • pinning exact versions is pretty awkward. We tried specifying versions in puppet but it’s way too much ceremony and doesn’t seem to work that well anyway.
  • simple-to-use Debian repositories either don’t allow multiple versions or force you to use unsigned packages
  • some software has strange requirements and it takes forever to package them, especially things which expect you to edit their source tree for configuration
  • if you’re missing dependencies or you conflict with the base OS then you’re in for a lot of work
  • you miss out on a lot of automation if you want to do something custom like installing your entire tree to /usr/lib/.
  • isn’t well factored for monolithic packages – you just aren’t supposed to do it
  • dpkg -i when you have many packages is a lot of work; rolling back an entire deployment is not one-click either
  • deploying bundled gems is hopeless

Virtualenv

Manually manage a virtualenv on the target machine.

Advantages:

  • simple and it works for what it is supposed to do

Disadvantages:

  • upgrades are manual
  • reverts are manual and would involve reading through some log to work out what you have to change; problematic if you’re deploying from an internal git repository or similar
  • depends on the internet
  • awkward to deploy your application’s initscripts and so on
  • build and so on has to happen on the deployment (unless using other tools)
  • typically these repositories sit there until they are broken and it’s a crisis to fix it
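
The "depends on the internet" point can be softened if the built packages are shipped alongside the application and pip is told not to contact any index. A minimal sketch, assuming a local directory of pre-built packages; the paths in the usage note are made up for illustration, and only the pip options --no-index, --find-links and --requirement are relied on:

```python
import subprocess


def offline_install_command(python, wheel_dir, requirements):
    """Build a pip command that installs only from a local directory.

    python       -- path to the interpreter inside the target virtualenv
    wheel_dir    -- directory holding the pre-built packages (assumed to exist)
    requirements -- path to a requirements file listing what to install
    """
    return [
        str(python), "-m", "pip", "install",
        "--no-index",                  # never contact PyPI or any other index
        "--find-links", str(wheel_dir),
        "--requirement", str(requirements),
    ]


def install_offline(python, wheel_dir, requirements):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(
        offline_install_command(python, wheel_dir, requirements),
        check=True,
    )
```

Calling install_offline("/srv/app/venv/bin/python", "/srv/app/wheels", "requirements.txt") would then fail loudly if anything is missing from the local directory, rather than silently fetching it from the network.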

Buildout

Advantages:

  • some automation
  • you can relatively easily copy all your dependencies into a deb and install that

Disadvantages:

  • not a very good build system compared to rake, cmake etc., so you end up making scripts anyway
  • no lock file like gems
  • relatively difficult to integrate with 3rd party software
  • problems with system python so you have to deploy a virtualenv anyway (actually I managed without it but I wouldn’t recommend it)
  • for the parts I use it’s not much better than requirements.txt
  • depends on the internet without a lot of work
  • depends on the os-specific compiler
  • obscure to work out what needs upgrading
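
Whatever tool builds the environment, the "what needs upgrading" question can also be asked of the environment itself. The sketch below is not a buildout feature; it assumes you keep your own mapping of package name to wanted version (the pins argument is hypothetical) and compares it with what is installed, using the standard library importlib.metadata:

```python
from importlib import metadata


def find_drift(pins, lookup=metadata.version):
    """Return {name: (wanted, installed)} for every package that differs.

    pins   -- dict of package name -> exact version you expect
    lookup -- callable returning the installed version for a name;
              defaults to importlib.metadata.version
    A package that is not installed is reported with installed=None.
    """
    drift = {}
    for name, wanted in sorted(pins.items()):
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            drift[name] = (wanted, installed)
    return drift
```

Run inside the deployed environment, an empty result means the installation matches the pins; anything else is the list of things to upgrade or revert.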

Wheel

Python binary distributables like eggs but better.

Advantages:

  • seem to be the new standard which solve a lot of obscure problems with eggs
  • make a lot of existing patterns faster (no rebuilds); functions a lot like vendor/cache with ruby bundler
  • build is repeatable so long as you distribute the same .whl files
  • some (most?) can be imported on the pythonpath though this does not seem to be the default
  • possibly a deployment of a virtualenv plus whls would be somewhat less massive than a deployment of buildout eggs

Disadvantages:

  • you still have to distribute them to the deployment machine or package them in a repository
  • the files themselves are not managed as they would be with an install from deb
  • your deployment will potentially need a lot of complicated scripting to manage a virtualenv (are you safe if you run out of disk space for example?) and some aspects of the deployment will always be specific to the venv
  • reverting a package will not necessarily revert your dependencies as would happen with a bundled deb
  • basically has all of the same problems as building a single-package .deb.
  • entry points are hard to get working unless you fully install the whl; it seems in general that having .whl files on the import path is not the done thing.
  • probably some whls will not work unless they are extracted, at least this was the case with eggs
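
The point about importing a whl straight from the path can be tried out without installing anything, because a wheel is a zip archive and Python can import pure-Python code from zip files on sys.path. A minimal sketch: the demo_pkg archive below is a made-up stand-in (it has no .dist-info metadata, so it is not a valid wheel), and nothing here applies to packages with compiled extensions or to entry points, which is where the trouble described above starts:

```python
import importlib
import sys
import zipfile
from pathlib import Path


def make_demo_wheel(directory):
    """Write a tiny zip named like a wheel, containing one hypothetical package."""
    whl = Path(directory) / "demo_pkg-1.0-py3-none-any.whl"
    with zipfile.ZipFile(whl, "w") as zf:
        zf.writestr("demo_pkg/__init__.py", "ANSWER = 42\n")
    return whl


def import_from_wheel(whl_path, module_name):
    """Import module_name with the archive temporarily placed on sys.path."""
    entry = str(whl_path)
    sys.path.insert(0, entry)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        # Leave sys.path as we found it; the module stays in sys.modules.
        sys.path.remove(entry)
```

Using make_demo_wheel on a temporary directory and then import_from_wheel(path, "demo_pkg") gives a module object whose code was read from inside the archive.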

Docker

Using Linux containers to make mini virtual machines.

Advantages:

  • the most complete concept for deploying a whole system
  • would work well for CI

Disadvantages:

  • the work is basically just another kind of packaging except now you need to deploy docker as well
  • still seems to depend on the internet
  • no good for legacy systems where we can’t be arbitrary about what gets installed
  • it doesn’t seem to do anything we can’t already do with puppet (except skip the audit trail …)

(Jury is still out because I haven’t used this one.)

FPM

Wraps many package systems.

Advantages:

  • quick and easy for what it implements
  • usually enough features to get what you want working, provided the default package doesn’t need much work. For example it’s hard to do symlink tricks to move packages’ configuration into /etc

Disadvantages:

  • some automation is lost from the source package system
  • if it doesn’t have a feature you’re stuck
  • doesn’t in general bundle all dependencies, you often get many packages
  • no locking of dependencies so more difficult to maintain a monolithic deployment
  • redundant for 1st party packages because you end up making a custom build around fpm instead of whatever else you’re using

Pbundler

(Which is a simple script I just wrote)

Advantages:

  • manages your dependencies with locks
  • could manage a deployment virtualenv using the lock files and could use whl to make that repeatable
  • explicit locks mean that it is possible to work in terms of deployment versions rather than just package versions
  • explicit locks simplify 3rd party monolithic debs

Disadvantages:

  • still needs a complicated script which runs on the deployment environment; ultimately venvs will never be relocatable.
  • chicken and egg deployment if you use it on the production machine and don’t package it globally; bit of a problem given that wheel will need very up to date pip and setuptools
  • just a hackup at the moment
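
To illustrate what explicit locks buy you, here is a small sketch of comparing two lock files. It is not Pbundler's actual code or file format; it assumes a hypothetical lock file with one exact pin per line in the form name==version, with # starting a comment:

```python
def parse_lock(text):
    """Parse 'name==version' lines into a dict; reject anything not pinned exactly."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep or not name.strip() or not version.strip():
            raise ValueError(f"not an exact pin: {raw!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_locks(old, new):
    """Return {name: (old_version, new_version)} for every package that changed.

    None stands for "not present" on that side, so additions and removals
    show up alongside upgrades and downgrades.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```

With the lock for each deployment kept in version control, diff_locks(parse_lock(previous), parse_lock(current)) lists exactly what a revert would have to undo, dependencies included.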