Using bulk update in Django 2.2
My work on adding bulk_update()
to Django has
been merged and will be released in Django 2.2! Like my filtered aggregates feature
it relies heavily on the wonderfully versatile CASE
statement to achieve some pretty good speedups for certain use-cases.
The development documentation gives a pretty clear example as to why it’s useful, but I’ll expand on it below a bit.
#The problem
A not-so uncommon use case in Django is iterating through some models, changing a field and saving the results back to the database. For example:
# Translate our posts into German
for post in Post.objects.all():
post.title = translate_to_german(post.title)
post.save()
Each call to post.save()
will execute a single, individual UPDATE
statement, so if you’ve got a million rows to update
this can be a significant slowdown. The most performant way of achieving this is to try and push the computation down into
the database. For example here we tell the database to add 10,000 views to each of our posts:
# Make our posts look really popular!
Post.objects.update(view_count=F('view_count') + 10000)
For simple numerical operations or basic string manipulations this is a great way of quickly updating a lot of rows. But in our first example we want to translate our post titles to German - no databases can do that natively (not yet at least!) so you are stuck with 1 query per model.
#Using bulk_update
to speed this up
With .bulk_update()
you can now update these rows with a greatly reduced number of queries (typically 1):
posts = list(Post.objects.all())
for post in posts:
post.title = translate_to_german(post.title)
# Save all objects in 1 query
Post.objects.bulk_update(posts, ['title'])
Yay! Much better. With a large number of rows this is much more efficient as the database has to do a lot less work.
#The implementation
The implementation turned out to be pretty straightforward. We utilize the CASE
statement inside an UPDATE
to customize
the values we update. For example:
UPDATE posts
SET title=(CASE
WHEN id=1 THEN 'Title 1'
WHEN id=2 THEN 'Title 2'
END)
WHERE id IN (1, 2)
The only tricky part of this process is with Postgres, which has some strict typing rules about expressions. If you have
a CASE
statement where all values are NULL
it is unable to determine the type of the expression and fails. So we
need to add an explicit CAST
:
UPDATE posts
SET title=(CASE WHEN id=1 THEN NULL
WHEN id=2 THEN NULL
END)::TEXT
WHERE id IN (1, 2)
Django’s expressions framework actually supported all of this already so the method is just a nice wrapper around
creating the CASE
expressions and handing a few corner cases.
#Future work
Postgres actually supports a nicer syntax for this
which is even more performant than CASE
:
update posts as t set
t.title = c.title
from (values
('Title 1', 1),
('Title 2', 2)
) as c(title, id)
where c.id = t.id;
However this is going to take a bit more work as we currency do not support VALUES
in this way. You can
follow the ticket for adding this support here if you’re interested.