Migrating a multi-tenant Django application to Citus
Here we investigate specifically how to migrate multi-tenant Django applications to a Citus storage backend with the help of the django-multitenant library.
This process will be in 5 steps:
Introducing the tenant column to models missing it that we want to distribute
Changing the primary keys of distributed tables to include the tenant column
Updating the models to use the
TenantModel
Distributing the data
Updating the Django Application to scope queries
Preparing to scale-out a multi-tenant application
Initially you’ll start with all tenants placed on a single database node. To be able to scale out django, some simple changes will have to be made to your models.
Let’s consider this simplified model:
from django.utils import timezone
from django.db import models
class Country(models.Model):
name = models.CharField(max_length=255)
class Account(models.Model):
name = models.CharField(max_length=255)
domain = models.CharField(max_length=255)
subdomain = models.CharField(max_length=255)
country = models.ForeignKey(Country, on_delete=models.SET_NULL)
class Manager(models.Model):
name = models.CharField(max_length=255)
account = models.ForeignKey(Account, on_delete=models.CASCADE,
related_name='managers')
class Project(models.Model):
name = models.CharField(max_length=255)
account = models.ForeignKey(Account, related_name='projects',
on_delete=models.CASCADE)
managers = models.ManyToManyField(Manager)
class Task(models.Model):
name = models.CharField(max_length=255)
project = models.ForeignKey(Project, on_delete=models.CASCADE,
related_name='tasks')
The tricky thing with this pattern is that in order to find all tasks for an account, you’ll have to query for all of an account’s project first. This becomes a problem once you start sharding data, and in particular when you run UPDATE or DELETE queries on nested models like task in this example.
1. Introducing the tenant column to models belonging to an account
1.1 Introducing the column to models belonging to an account
In order to scale out a multi-tenant model, it’s essential for queries to quickly locate all records that belong to an account. Consider an ORM call such as:
Project.objects.filter(account_id=1).prefetch_related('tasks')
It generates these underlying SQL queries:
SELECT *
FROM myapp_project
WHERE account_id = 1;
SELECT *
FROM myapp_task
WHERE project_id IN (1, 2, 3);
However, the second query would go faster with an extra filter:
-- the AND clause identifies the tenant
SELECT *
FROM myapp_task
WHERE project_id IN (1, 2, 3)
AND account_id = 1;
This way you can easily query the tasks belonging to one account.
The easiest way to achieve this is to simply add a account_id
column on every object that belongs to an account.
In our case:
class Task(models.Model):
name = models.CharField(max_length=255)
project = models.ForeignKey(Project, on_delete=models.CASCADE,
related_name='tasks')
account = models.ForeignKey(Account, related_name='tasks',
on_delete=models.CASCADE)
Create a migration to reflect the change: python manage.py makemigrations
.
1.2. Introduce a column for the account_id on every ManyToMany model that belongs to an account
The goal is the same as previously. We want to be able to have ORM calls and queries routed to one account. We also want to be able to distribute the ManyToMany relationship related to an account on the account_id.
So the calls generated by:
Project.objects.filter(account_id=1).prefetch_related('managers')
Can include in their WHERE
clause the account_id
like this:
SELECT *
FROM "myapp_project" WHERE "myapp_project"."account_id" = 1;
SELECT *
FROM myapp_manager manager
INNER JOIN myapp_projectmanager projectmanager
ON (manager.id = projectmanager.manager_id
AND projectmanager.account_id = manager.account_id)
WHERE projectmanager.project_id IN (1, 2, 3)
AND manager.account_id = 1;
For that we need to introduce through
models. In our case:
class Project(models.Model):
name = models.CharField(max_length=255)
account = models.ForeignKey(Account, related_name='projects',
on_delete=models.CASCADE)
managers = models.ManyToManyField(Manager, through='ProjectManager')
class ProjectManager(models.Model):
project = models.ForeignKey(Project, on_delete=models.CASCADE)
manager = models.ForeignKey(Manager, on_delete=models.CASCADE)
account = models.ForeignKey(Account, on_delete=models.CASCADE)
Create a migration to reflect the change: python manage.py makemigrations
.
2. Include the account_id in all primary keys and unique constraints
Primary-key and unique constraints on values other than the tenant_id will present a problem in any distributed system, since it’s difficult to make sure that no two nodes accept the same unique value. Enforcing the constraint would require expensive scans of the data across all nodes.
To solve this problem, for the models which are logically related to an account (the tenant for our app), you should add account_id to the primary keys and unique constraints, effectively scoping objects unique inside a given account. This helps add the concept of tenancy to your models, thereby making the multi-tenant system more robust.
2.1 Including the account_id to primary keys
Django automatically creates a simple “id” primary key on models, so we will need to circumvent that behavior with a custom migration of our own. Run python manage.py makemigrations appname --empty --name remove_simple_pk
, and edit the result to look like this:
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
# leave this as it was generated
]
operations = [
# Django considers "id" the primary key of these tables, but
# we want the primary key to be (account_id, id)
migrations.RunSQL("""
ALTER TABLE myapp_manager
DROP CONSTRAINT myapp_manager_pkey CASCADE;
ALTER TABLE myapp_manager
ADD CONSTRAINT myapp_manager_pkey
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
ALTER TABLE myapp_project
DROP CONSTRAINT myapp_project_pkey CASCADE;
ALTER TABLE myapp_project
ADD CONSTRAINT myapp_product_pkey
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
ALTER TABLE myapp_task
DROP CONSTRAINT myapp_task_pkey CASCADE;
ALTER TABLE myapp_task
ADD CONSTRAINT myapp_task_pkey
PRIMARY KEY (account_id, id);
"""),
migrations.RunSQL("""
ALTER TABLE myapp_projectmanager
DROP CONSTRAINT myapp_projectmanager_pkey CASCADE;
ALTER TABLE myapp_projectmanager
ADD CONSTRAINT myapp_projectmanager_pkey PRIMARY KEY (account_id, id);
"""),
]
2.2 Including the account_id to unique constraints
The same thing needs to be done for UNIQUE
constraints. You can have explicit constraints that you might have set in your model with unique=True
or unique_together
like:
class Project(models.Model):
name = models.CharField(max_length=255, unique=True)
account = models.ForeignKey(Account, related_name='projects',
on_delete=models.CASCADE)
managers = models.ManyToManyField(Manager, through='ProjectManager')
class Task(models.Model):
name = models.CharField(max_length=255)
project = models.ForeignKey(Project, on_delete=models.CASCADE,
related_name='tasks')
account = models.ForeignKey(Account, related_name='tasks',
on_delete=models.CASCADE)
class Meta:
unique_together = [('name', 'project')]
For these constraints, you can simply change in the models the constraints:
class Project(models.Model):
name = models.CharField(max_length=255)
account = models.ForeignKey(Account, related_name='projects',
on_delete=models.CASCADE)
managers = models.ManyToManyField(Manager, through='ProjectManager')
class Meta:
unique_together = [('account', 'name')]
class Task(models.Model):
name = models.CharField(max_length=255)
project = models.ForeignKey(Project, on_delete=models.CASCADE,
related_name='tasks')
account = models.ForeignKey(Account, related_name='tasks',
on_delete=models.CASCADE)
class Meta:
unique_together = [('account', 'name', 'project')]
Then generate the migration with:
python manage.py makemigrations
Some UNIQUE
constraints are created by the ORM and you will need to explicitly drop them.
This is the case for OneToOneField
and ManyToMany
fields.
For these cases you will need to: 1. Find the constraints 2. Do a migration to drop them 3. Re-create constraints including the account_id field
To find the constraints, connect to your database using psql
and run \d+ myapp_projectmanager
You will see the ManyToMany
(or OneToOneField
) constraint:
"myapp_projectmanager" UNIQUE CONSTRAINT myapp_projectman_project_id_manager_id_bc477b48_uniq,
btree (project_id, manager_id)
Drop this constraint in a migration:
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
# leave this as it was generated
]
operations = [
migrations.RunSQL("""
DROP CONSTRAINT myapp_projectman_project_id_manager_id_bc477b48_uniq;
"""),
Then change your models to have a unique_together
including the account\_id
class ProjectManager(models.Model):
project = models.ForeignKey(Project, on_delete=models.CASCADE)
manager = models.ForeignKey(Manager, on_delete=models.CASCADE)
account = models.ForeignKey(Account, on_delete=models.CASCADE)
class Meta:
unique_together=(('account', 'project', 'manager'))
And finally apply the changes by creating a new migration to generate these constraints:
python manage.py makemigrations
3. Updating the models to use TenantModel and TenantForeignKey
Next, we’ll use the django-multitenant library to add account_id to foreign keys, and make application queries easier later on.
In requirements.txt for your Django application, add
django_multitenant>=2.0.0, <3
Run pip install -r requirements.txt
.
In settings.py, change the database engine to the customized engine provided by django-multitenant:
'ENGINE': 'django_multitenant.backends.postgresql'
3.1 Introducing the TenantModel
The models will now inherit from TenantModel
which is the base model for tenant-based models .
To do that in your models.py
file you will need to do the following imports
from django_multitenant.models import TenantModel
Previously our example models inherited from just models.Model, but now we need to change them to inherit from TenantModel.
You will also, at this point, introduce the tenant_id to define which column is the distribution column.
class TenantManager(TenantModel):
pass
class Account(TenantModel):
...
class TenantMeta:
tenant_field_name = 'id'
class Manager(TenantModel):
...
class TenantMeta:
tenant_field_name = 'account_id'
class Project(TenantModel):
...
class TenantMeta:
tenant_field_name = 'account_id'
class Task(TenantModel):
...
class TenantMeta:
tenant_field_name = 'account_id'
class ProjectManager(TenantModel):
...
class TenantMeta:
tenant_field_name = 'account_id'
3.2 Handling ForeignKey constraints
For ForeignKey
and OneToOneField
constraint, we have a few different cases:
Foreign keys (or One to One) between distributed tables, for which you should use the
TenantForeignKey
(orTenantOneToOneField
).Foreign keys between a distributed table and a reference table don’t require a change.
Foreign keys between a distributed table and a local table, which require to drop the constraint by using
models.ForeignKey(MyModel, on_delete=models.CASCADE, db_constraint=False)
.
Finally your models should look like this:
from django.db import models
from django_multitenant.fields import TenantForeignKey
from django_multitenant.models import TenantModel
class Country(models.Model): # This table is a reference table
name = models.CharField(max_length=255)
class Account(TenantModel):
name = models.CharField(max_length=255)
domain = models.CharField(max_length=255)
subdomain = models.CharField(max_length=255)
country = models.ForeignKey(Country, on_delete=models.SET_NULL) # No changes needed
class TenantMeta:
tenant_field_name = "id"
class Manager(TenantModel):
name = models.CharField(max_length=255)
account = models.ForeignKey(Account, related_name='managers',
on_delete=models.CASCADE)
class TenantMeta:
tenant_field_name = 'account_id'
class Project(TenantModel):
account = models.ForeignKey(Account, related_name='projects',
on_delete=models.CASCADE)
managers = models.ManyToManyField(Manager, through='ProjectManager')
class TenantMeta:
tenant_field_name = 'account_id'
class Task(TenantModel):
name = models.CharField(max_length=255)
project = TenantForeignKey(Project, on_delete=models.CASCADE,
related_name='tasks')
account = models.ForeignKey(Account, on_delete=models.CASCADE)
class TenantMeta:
tenant_field_name = 'account_id'
class ProjectManager(TenantModel):
project = TenantForeignKey(Project, on_delete=models.CASCADE)
manager = TenantForeignKey(Manager, on_delete=models.CASCADE)
account = models.ForeignKey(Account, on_delete=models.CASCADE)
class TenantMeta:
tenant_field_name = 'account_id'
3.3 Handling ManyToMany constraints
In the second section of this article, we introduced the fact that with citus, ManyToMany
relationships require a through
model with the tenant column. Which is why we have the model:
class ProjectManager(TenantModel):
project = TenantForeignKey(Project, on_delete=models.CASCADE)
manager = TenantForeignKey(Manager, on_delete=models.CASCADE)
account = models.ForeignKey(Account, on_delete=models.CASCADE)
class TenantMeta:
tenant_field_name = 'account_id'
After installing the library, changing the engine, and updating the models, run
python manage.py makemigrations
. This will produce a migration to make the foreign keys composite when necessary.
Warning
If you use Citus 10, you will need to distribute the model inherited from “models.Model” as reference if you have a “ManyToMany” relationship between a class derived from “TenantModel” and a class derived from “models.Model” as Citus 10 does not support such relationships between distributed and local tables.
Warning
Beginning with Citus 11.3, it is required that identity columns for distributed tables be of type bigint. For more information, please consult the Citus 11.3 changelog <https://github.com/citusdata/citus/blob/main/CHANGELOG.md#citus-v1130-may-2-2023>. From now on, any classes inherited from TenantModel will automatically have bigint as the identity column. If you create new models derived from TenantModel, no additional steps are necessary. However, if you already have existing models derived from TenantModel, you must update the identity column to bigint. After updating models, creating a migration and executing it is necessary afterwards. Furthermore, if you are using models that need to be distributed but do not use TenantModel as the base class, you will need to manually update your models to utilize bigint as the identity column.
4. Distribute data in Citus
We need one final migration to tell Citus to mark tables for distribution. Create a new migration python manage.py makemigrations appname --empty --name distribute_tables
. Edit the result to look like this:
from django.db import migrations
from django_multitenant.db import migrations as tenant_migrations
class Migration(migrations.Migration):
dependencies = [
# leave this as it was generated
]
operations = [
tenant_migrations.Distribute('Country', reference=True),
tenant_migrations.Distribute('Account'),
tenant_migrations.Distribute('Manager'),
tenant_migrations.Distribute('Project'),
tenant_migrations.Distribute('ProjectManager'),
tenant_migrations.Distribute('Task'),
]
With all the migrations created from the steps so far, apply them to the database with python manage.py migrate
.
At this point the Django application models are ready to work with a Citus backend. You can continue by importing data to the new system and modifying views as necessary to deal with the model changes.
Warning
After Citus 11, you may get below error when you try to run the migrations:
ERROR: cannot run type command because there was a parallel operation on a distributed table in the transaction
.
This is because of the new transaction model in Citus 11. To fix this, you can run the migrations in a single transaction by setting below Citus setting at the top of your distribute operations.
operations = [ migrations.RunSQL("SET LOCAL citus.multi_shard_modify_mode TO 'sequential';"),]
Updating the Django Application to scope queries
The django-multitenant library discussed in the previous section is not only useful for migrations, but also for simplifying application queries. The library allows application code to easily scope queries to a single tenant. It automatically adds the correct SQL filters to all statements, including fetching objects through relations.
For instance, in a view simply set_current_tenant
and all the queries or joins afterward will include a filter to scope results to a single tenant.
# set the current tenant to the first account
s = Account.objects.first()
set_current_tenant(s)
# now this count query applies only to Project for that account
Project.objects.count()
# Find tasks for very important projects in the current account
Task.objects.filter(project__name='Very important project')
In the context of an application view, the current tenant object can be stored as a SESSION variable when a user logs in, and view actions can set_current_tenant
to this value. See the README in django-multitenant for more examples.
The set_current_tenant
function can also take an array of objects, like
set_current_tenant([s1, s2, s3])
which updates the internal SQL query with a filter like tenant_id IN (a,b,c)
.
Automating with middleware
Rather than calling set_current_tenant()
in each view, you can create and install a new middleware class in your Django application to do it automatically.
You can either use the base class provided by django-multitenant, or create your own middleware class that calls set_current_tenant
with the appropriate value.
Using the base class provided by django-multitenant
Base class usage is recommended for most applications. It provides a simple way to set the current tenant based on the current user. However, you need to provide a function that returns the tenant object for the current user.
Add
'django_multitenant.middleware.MultitenantMiddleware'
to theMIDDLEWARE
list in yoursettings.py
file:MIDDLEWARE = [ # other middleware 'django_multitenant.middleware.MultitenantMiddleware', ]
Monkey patch
django_multitenant.views.get_tenant
function with your own function which returns tenant object:# views.py def tenant_func(request): return Store.objects.filter(user=request.user).first() # Monkey patching get_tenant function from django_multitenant import views views.get_tenant = tenant_func
Creating your own middleware class
You can add your own Middleware class to your application. This allows you to customize the logic for setting the current tenant. For example, you can set the tenant based on a different value in the session, or based on a different user attribute. Here, below is an example of a middleware class that sets the current tenant based on the current user.
# src/appname/middleware.py
from django_multitenant.utils import set_current_tenant
class MultitenantMiddleware:
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
if request.user and not request.user.is_anonymous:
# Your custom logic to set the current tenant
current_tenant=your_method(request)
set_current_tenant(current_tenant)
your
response = self.get_response(request)
return response
Enable the middleware by updating the MIDDLEWARE array in src/appname/settings/base.py:
MIDDLEWARE = [
# ...
# existing items
# ...
'appname.middleware.MultitenantMiddleware'
]