Data migrations à la Lino

As the maintainer of a database application that is being used on one or several production sites you will care about how these production sites will migrate their data.

Data migration is a complex topic. Django waited until version 1.7 before adopting a default method for automating these tasks (see Migrations). The page Django migrations on a Lino site describes how to use Django migrations on a Lino site.

But Lino also offers a very different approach for doing database migrations.

Advantages of migrations à la Lino:

  • They make the process of deploying applications and upgrading production sites simpler and more transparent. As a site maintainer you will simply write a Python dump before upgrading (using the old version), and then load that dump after upgrading (with the new version). See Upgrading a production site for details.

  • They can help in situations where you would need a magician. For example your users accidentally deleted a bunch of data from their database and they don't have a recent backup. See Repairing data for an example.

Despite these advantages you might still prefer the Django approach, because Lino migrations have one disadvantage: they are slower than Django migrations, and users cannot use the site while a migration is running. On some systems half an hour of downtime for an upgrade is not acceptable.

Rule of thumb: if your application uses the inject_field or BabelField features (or a plugin that uses them), then Django migrations won't work. Conversely, if your site must use Django migrations, then you cannot use inject_field or BabelField.

General strategy for managing data migrations

There are two ways of managing data migrations: either by locally modifying the restore.py script or by writing a migrator.

Locally modifying the restore.py script

Locally modifying a restore.py script is the natural way when there is only one production site to migrate and the application developer is also the site maintainer. This is a common situation when a new customer project has gone into production but is used only on that customer's site.

Certain schema changes will migrate automatically: new models, new fields (when they have a default value), unique constraints, ...

If there were unhandled schema changes, you will get error messages during the restore. You can then edit the restore.py script and try again, as often as needed, until there are no more errors.

The code of the restore.py script is optimized for easily applying most database schema changes. For example if a model or field has been removed, you can just comment out one line in that script.
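For illustration, here is what a fragment of a generated restore.py might look like. The model and field names below are invented, and the real script produced by pm dump2py contains more machinery; this sketch only shows the one-function-per-table, one-call-per-row shape that makes commenting out lines so easy:

```python
# Hypothetical excerpt of a generated restore.py (illustrative names only).
# Each table becomes one create_* function and each row one call to it,
# so dropping a removed model or field amounts to commenting out lines.

def create_contacts_person(id, first_name, last_name):
    # The real script would instantiate the Django model here.
    return dict(id=id, first_name=first_name, last_name=last_name)

def objects():
    yield create_contacts_person(1, "Jean", "Dupont")
    # This model was removed in the new version, so its rows are
    # simply commented out:
    # yield create_oldplugin_thing(1, "obsolete")
    yield create_contacts_person(2, "Anna", "Arens")

rows = list(objects())  # the deserializer iterates over objects()
```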

TODO: write detailed docs

Designing data migrations for your application

Designing data migrations for your application is easy but not yet well documented.

The main trick is that any restore.py file generated by pm dump2py contains the following line:

settings.SITE.install_migrations(globals())

This means that the script itself calls the install_migrations method of your application before it starts to load any database object. And it passes its globals() dict, which means that you can potentially change everything.
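As a sketch of what this makes possible, a migrator can replace one of the script's create_* functions before any row is loaded. The function and field names below are invented, and the install_migrations shown here is a stand-in, not Lino's actual implementation:

```python
# Hedged sketch, not Lino's real code: install_migrations receives the
# globals() dict of restore.py and may replace any of the create_*
# functions defined there.

def install_migrations(globals_dict):
    # Example migration: the old schema had a single `name` field which
    # the new schema splits into first_name and last_name.
    def create_app_partner(id, name):
        first, _, last = name.partition(" ")
        return dict(id=id, first_name=first, last_name=last)
    globals_dict["create_app_partner"] = create_app_partner

# What the generated restore.py itself would contain:
def create_app_partner(id, name):
    return dict(id=id, name=name)  # old schema, would no longer load

install_migrations(globals())      # replaces the function above
row = create_app_partner(3, "Jean Dupont")
```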

To see a real-life example, look at the source code of lino_welfare.migrate and lino_welfare.old_migrate.

A magical before_dumpy_save attribute may contain custom code to be applied inside the try...except block. If that code fails, the deserializer simply defers the save operation and tries it again later.
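The deferring mechanism itself can be sketched as follows. This is a simplified stand-in for the real deserializer, not Lino's code: objects whose save fails are collected and retried in later passes until either all succeed or a full pass makes no progress.

```python
# Simplified sketch of a deferring deserializer (not Lino's actual code).
def load_all(objects, save):
    pending = list(objects)
    while pending:
        deferred = []
        for obj in pending:
            try:
                save(obj)           # may raise, e.g. when a foreign-key
            except Exception:       # target has not been saved yet
                deferred.append(obj)
        if len(deferred) == len(pending):
            raise Exception("Abandoned after unresolvable errors: %r" % deferred)
        pending = deferred

# Usage: "b" can be saved only after its dependency "a" exists.
saved = set()
deps = {"a": None, "b": "a"}
def save(name):
    if deps[name] and deps[name] not in saved:
        raise ValueError("missing dependency")
    saved.add(name)

load_all(["b", "a"], save)  # first pass saves "a", second pass saves "b"
```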

Models that get special handling

  • ContentType objects aren't stored in a dump because they can always be recreated.

  • Site and Permission objects must be stored and must not be re-created.

  • Session objects may get lost in a dump without harm, so they are not stored.

Writing a migrator

When your application runs on more than one production site, you will probably prefer to write a migrator.
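The pattern used by lino_welfare.migrate is roughly this: a Migrator class with one migrate_from_<version> method per released version, each adapting the restore script's globals and returning the version it migrates to. The base class and dispatch loop below are a self-contained stand-in for illustration, not Lino's real lino.utils.dpy.Migrator:

```python
# Hedged sketch of the migrator pattern. The dispatch loop is a stand-in
# for what happens when restore.py announces the version it was dumped with.

class Migrator:
    def migrate_from(self, version, globals_dict):
        while True:
            meth = getattr(
                self, "migrate_from_" + version.replace(".", "_"), None)
            if meth is None:
                return version  # no more migration steps to apply
            version = meth(globals_dict)

class MyMigrator(Migrator):
    def migrate_from_1_0(self, g):
        # In 1.1 the app_place model was renamed to app_city:
        g["create_app_city"] = g.pop("create_app_place")
        return "1.1"

    def migrate_from_1_1(self, g):
        return "1.2"  # nothing to do for this step

g = {"create_app_place": lambda id, name: dict(id=id, name=name)}
final = MyMigrator().migrate_from("1.0", g)  # runs both steps in order
```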

TODO: write detailed docs