My work on adding
bulk_update() to Django has
been merged and will be released in Django 2.2! Like my filtered aggregates feature
it relies heavily on the wonderfully versatile
CASE statement to achieve some pretty good speedups for certain use-cases.
The development documentation gives a pretty clear example as to why it’s useful, but I’ll expand on it below a bit.
A not-so uncommon use case in Django is iterating through some models, changing a field and saving the results back to the database. For example:
# Translate our posts into German for post in Post.objects.all(): post.title = translate_to_german(post.title) post.save()
Each call to
post.save() will execute a single, individual
UPDATE statement, so if you’ve got a million rows to update
this can be a significant slowdown. The most performant way of achieving this is to try and push the computation down into
the database. For example here we tell the database to add 10,000 views to each of our posts:
# Make our posts look really popular! Post.objects.update(view_count=F('view_count') + 10000)
For simple numerical operations or basic string manipulations this is a great way of quickly updating a lot of rows. But in our first example we want to translate our post titles to German - no databases can do that natively (not yet at least!) so you are stuck with 1 query per model.
.bulk_update() you can now update these rows with a greatly reduced number of queries (typically 1):
posts = list(Post.objects.all()) for post in posts: post.title = translate_to_german(post.title) # Save all objects in 1 query Post.objects.bulk_update(posts, ['title'])
Yay! Much better. With a large number of rows this is much more efficient as the database has to do a lot less work.
The implementation turned out to be pretty straightforward. We utilize the
CASE statement inside an
UPDATE to customize
the values we update. For example:
UPDATE posts SET title=(CASE WHEN id=1 THEN 'Title 1' WHEN id=2 THEN 'Title 2' END) WHERE id IN (1, 2)
The only tricky part of this process is with Postgres, which has some strict typing rules about expressions. If you have
CASE statement where all values are
NULL it is unable to determine the type of the expression and fails. So we
need to add an explicit
UPDATE posts SET title=(CASE WHEN id=1 THEN NULL WHEN id=2 THEN NULL END)::TEXT WHERE id IN (1, 2)
Django’s expressions framework actually supported all of this already so the method is just a nice wrapper around
CASE expressions and handing a few corner cases.
Postgres actually supports a nicer syntax for this
which is even more performant than
update posts as t set t.title = c.title from (values ('Title 1', 1), ('Title 2', 2) ) as c(title, id) where c.id = t.id;
However this is going to take a bit more work as we currency do not support
VALUES in this way. You can
follow the ticket for adding this support here if you’re interested.