TIL: Django constraints
Hi there,
I have done some work with constraint recently and I wanted to write something on it just to remind myself. I hope it’s useful to you too.
Database constraints are really useful to make sure each row is valid before to save it. You have two type of constraints you can use just before to record a new line in a table in Django: CheckConstraint and UniqueConstraint.
I use PostgreSQL which has powerful tools, you can set database constraint according what is implemented by Django.
Explanations are always better with an example so here is a constraint to avoid date overlapping between the start date and the end date:
class Event(models.Model): start_at = models.DateField() end_at = models.DateField() class Meta: constraints = [ models.CheckConstraint( check=models.Q(start_at__lt=end_at), name='start_at_before_end_at' ), models.UniqueConstraint( fields=['start_date', 'end_date'], name='unique_event_dates' ), ]
The CheckConstraint ensures that the start_at is before the end_at. The UniqueConstraint ensures that there are no two events with the same start_date and end_date.
I discovered that if you want to make a UniqueConstraint in Django, it considers NULL values as distinct, which I didn’t want in my case.
Going back to my previous example, here we have 2 records:
id | start_date | end_date |
---|---|---|
1 | NULL | 2019-12-31 |
2 | NULL | 2019-12-31 |
Both have the same end_date and start_date set to NULL.
These two records will be considered unique because a NULL is not equal to another NULL in SQL, so the two records are considered distinct.
You can use WITH DISTINCT NULL VALUES with PostgreSQL 15 but in my case I was using an older version, of course, so I needed to use another option.
If you want to make sure that NULL values are not considered distinct in the UniqueConstraint, you can use the hash function with the unique=True attribute instead.
Here is an example of the hash function with the unique value based on the previous code with the Event class. A sha256_hash field which combines all the fields I want in an hash:
import hashlib import json class Event(models.Model): start_at = models.DateField() end_at = models.DateField() fancy_field = models.CharField() sha256_hash = models.BinaryField( max_length=32, unique=True, null=True ) def __save__(self, *args: Any, **kwargs: Any) -> None: super().__save__(*args, **kwargs) self.sha256_hash = self.compute_sha256_hash() self.save() def compute_sha256_hash(self) -> bytes: json_object = { field: getattr(self, field) for field in ( "start_at", "end_at", "fancy_field" ) } # Forces datetime to use timestamp due to inconsistency # in timezone format. json_object["start_at"] = self.start_at.timestamp() json_object["end_at"] = self.end_at.timestamp() json_repr = json.dumps( json_object, sort_keys=True, indent=4, cls=DjangoJSONEncoder, ) return hashlib.sha256(json_repr.encode("utf-8")).digest()
The sha256_hash is unique and shouldn’t allow null values, but with the blank=True and null=True attributes, it can have null values. It’s needed then to populate the existing values if there are some.
What if there is existing data in the table?
The sha256_hash is unique and shouldn’t allow null values, but with the blank=True and null=True attributes, it can have null values. It’s needed then to populate the existing values if there are some.
In order to do that, we change the attribute to allow null values :
class Event(models.Model): start_at = models.DateField() end_at = models.DateField() fancy_field = models.CharField() sha256_hash = models.BinaryField( max_length=32, unique=True, null=True, blank=True ) …
After that, we can populate the value with a django command in by filtering the existing Event table where the field sha256_hash is null, and call the method in order the create the sha256_hash value with the method compute_sha256_hash of the class.
That’s it for me!