Django Automatic Migration and Seeding

Published in

Kami PeoPLe

5 min read · Mar 17, 2020

Get automatic with your data.

This article is written as a part of Individual Review competency for Software Projects course 2020 at Faculty of Computer Science, University of Indonesia.

Courtesy of https://dribbble.com/shots/3541232-Database

Introduction

Data is one of the most important aspects of every Django project. Django supports the most popular database management systems (DBMS) and has a lot of features in store. For everyone learning the Django framework, features like model classes and schema migrations are essential. Django provides commands like makemigrations and migrate to apply changes to our database schema without hassle, and every Django developer has probably used those two commands numerous times. Beyond schema migration, Django also supports data migration and data seeding, which are the focus of this article.

Data Migration

A plain makemigrations command generates migration files that define changes to the database schema, which the migrate command then applies to the real database. I’d like to point out that this is schema migration, where we tell Django that we plan to “commit” structural changes to the database. Data migration is quite different: instead of changing the schema, it changes the actual data (rows) in the database.

Unlike schema migrations, which Django can generate automatically, data migrations require us to define the migration operations ourselves.

Creating and Running Data Migration

For the following example, suppose that I have a model called UserAccount which subclasses AbstractUser, as a custom user model. Since it subclasses AbstractUser, it inherits many of the fields belonging to AbstractUser like username, password, email, etc. The model is defined in an app named api_auth.

UserAccount model

To start, we need to create an empty migration file. To do so, we still use the makemigrations command, but with an extra argument: --empty. It tells Django to generate an empty migration file with the dependencies already filled in.

python manage.py makemigrations --empty api_auth

If all goes well, Django will generate a migration file inside api_auth’s migrations directory. Note that the file name is auto-generated since no additional arguments were given to the command.

Generated migration file

As we can see from the snippet above, the value of the dependencies attribute is filled in automatically. At this point, to define our data migration, we need to fill in the operations list.

We’ll go with the basics here. To create a data migration, we need to:

Define a function/callable that will perform the data migration.
Have RunPython use the function.

RunPython is a migration operation class that runs custom Python code in the context of a migration. We can use it to perform custom data updates, alterations, and anything else that needs access to the ORM or Python code.

For example, let’s say we want to modify every user’s first_name: if last_name has a value, first_name becomes the combination of first name and last name; otherwise, it is set to the username. Let’s call the function set_first_name. Note that the function must accept two arguments: apps, an app registry holding the historical versions of the models as of this migration, and schema_editor, which can be used to manually effect database schema changes (although doing so can confuse the migration autodetector).

Defined callback and operation

As we can see from the snippet above, we defined a function set_first_name and have RunPython call it inside the operations list.

Applying this migration is no different from applying a standard schema migration. Just run the migrate command and we’re good to go.

python manage.py migrate

Data migration is useful when we need to modify existing data after applying changes to models. It is convenient and flexible because we define the changes we want inside a function. But this power comes with a caveat: we need to be careful not to break the integrity of our existing database. Taking a database dump before applying the migration is a good idea in case anything goes south.

Data Seeding

When first setting up an app, it is a good idea to provide some sample data for testing it. In this situation, we can pre-populate our database with hard-coded data. Our first approach would be to use the admin interface or Python code inside the Django shell. Those steps work, but they take time and are somewhat hard to automate.

This section covers three approaches to automatic data seeding:

Population script
Fixture
Django-seed library

For each example, we will use the following model from a project named Test and an app named selection.

Population Script

A population script is just a regular Python script that populates the database in one run.

We will create a script named seed_selection_period.py and place it in the project’s root, where manage.py is located.

Pay attention to where we import the model class. We need to set up our Django settings (lines 6–9) before importing it. Otherwise, the script won’t run correctly.

To run the script, we can just run python seed_selection_period.py. If all goes well, we will get 50 selection period instances in our database.
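The ordering constraint can be sketched as follows. This is only an illustration under several assumptions: the settings module is taken to be Test.settings, and the model name SelectionPeriod with its name, start_date and end_date fields is hypothetical, so adjust these to the real model.

```python
import os
from datetime import date, timedelta


def build_periods(count=50, first_day=date(2020, 1, 1), length_days=30):
    """Return field values for `count` back-to-back periods (no Django needed)."""
    periods = []
    for i in range(count):
        start = first_day + timedelta(days=i * length_days)
        periods.append({
            "name": f"Period {i + 1}",
            "start_date": start,
            "end_date": start + timedelta(days=length_days - 1),
        })
    return periods


def main():
    # Configure Django *before* importing any model; importing the model
    # first typically fails because the app registry is not ready yet.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "Test.settings")
    import django
    django.setup()

    from selection.models import SelectionPeriod  # hypothetical model name

    SelectionPeriod.objects.bulk_create(
        [SelectionPeriod(**fields) for fields in build_periods()]
    )
```

Ending the file with the usual `if __name__ == "__main__": main()` guard lets it be run as python seed_selection_period.py.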

Fixture

Another way to load data into the database is by using fixtures. A fixture is a collection of data that Django knows how to import into a database. Fixtures can be written as JSON, XML, or YAML documents. We should store fixtures in a fixtures directory inside the app.

JSON fixture

JSON fixture

YAML

Note that the file extension must be yaml, not yml. Otherwise, Django will fail to recognize the fixture. In addition, we need to have the PyYAML library installed first.

To load the data from the fixture, we can just run python manage.py loaddata selection_periods.<json/yaml/xml> .

Each fixture format corresponds to one of Django's serialization formats. Please refer to the related documentation for detailed information.
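To make the JSON layout concrete, here is a small sketch that builds fixture records with only the standard library. Each record carries a model label, a primary key, and a fields mapping; the model label selection.selectionperiod and the name field are assumptions for illustration.

```python
import json


def build_fixture(count=3, model_label="selection.selectionperiod"):
    """Return a list of records in Django's JSON fixture layout."""
    return [
        {
            "model": model_label,  # "<app_label>.<model_name>"
            "pk": i,
            "fields": {"name": f"Period {i}"},  # hypothetical field
        }
        for i in range(1, count + 1)
    ]


fixture_json = json.dumps(build_fixture(), indent=2)
print(fixture_json)
```

Saving the output as selection/fixtures/selection_periods.json would make it loadable with loaddata, provided the real model has matching fields.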

Django-seed Library

Another way of seeding our database is by using the awesome django-seed library. It lets us write code to generate data for our models and seed the database with one simple manage.py command.

To install the library, run pip install django-seed .

Then, add the app to the installed apps list in settings.py :

INSTALLED_APPS = (
    ...
    'django_seed',
)

The library offers a simple command to do automatic seeding. For instance, if we want to seed 50 of each model for the app selection , we can run python manage.py seed selection --number=50 .

Beyond the seed command, this library also offers automatic seeding from Python code. Please refer to the repository for more information.

Conclusion

Data is an important aspect of our Django app. Knowing how to migrate and seed it automatically makes the development, testing, and production phases easier. As data migration involves changing existing data, we need to be careful not to break our database. While data migration can sometimes prove useful in a production environment, data seeding is more appropriate for development or testing environments.

This wraps up the article. If you have any suggestions, don’t hesitate to comment :)

Thanks for reading!

References

Providing initial data for models | Django documentation | Django (docs.djangoproject.com)

Brobin/django-seed (github.com)