- Published on
Full-Text Search with Django & PostgreSQL in Six minutes
An introduction to full-text search with Django & PostgreSQL
Table of Contents
- Introduction
- Prerequisites
- The Django project we’ll be using
- Basic/traditional search
- Searching with basic search
- Full-text search
- What is full-text search?
- Introduction to full-text search with Django & PostgreSQL
- Searching a Single field with the search lookup
- Searching across multiple fields with a SearchVector
- And more…
- Conclusion
Introduction
Prerequisites
Be familiar with Django
The Django project we’ll be using
The project uses a PostgreSQL database:
# Database
# https://docs.djangoproject.com/en/3.2/ref/settings/#databases
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql_psycopg2",
"NAME": "myblog",
"USER": "myblog",
"PASSWORD": "myblog",
"HOST": "localhost",
"PORT": 5432,
}
}
And has one app called posts
with one model named Post
:
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'posts', # <-- posts app
]
from django.db import models
class Post(models.Model):
title = models.CharField(max_length=50, unique=True)
description = models.CharField(max_length=255)
content = models.TextField()
def __str__(self):
return f"Title: `{self.title}` | Description: `{self.description}`"
The database has two posts:
[
{
"title": "Use Your Domain with Gmail",
"description": "Send and receive emails from a custom domain in Gmail for free (without having to pay for G Suite / Google Workspace)"
},
{
"title": "Set Up Your Own Heroku-like Cloud",
"description": "When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use"
}
]
Find that project on my GitHub: https://github.com/paulonteri/django-rich-text-search-blog-post-code
Basic/traditional search
When it comes to search, with Django, you'll typically start by performing search queries with contains
or icontains
for exact matches.
The above search pattern is what most developers are likely familiar with, which has been part of the SQL standard from the beginning, and is available to all SQL-based databases, as follows:
SELECT column_name FROM table_name WHERE column_name LIKE 'pattern';
That will return the rows where column_name
matches the pattern
, the closest way to translate that SQL statement to the Django ORM is as follows:
>>> from posts.models import Post
>>> Post.objects.filter(title__contains="Gmail")
<QuerySet [<Post: Title: `Use Your Domain with Gmail` | Description: `Send and receive emails from a custom domain in Gmail for free (without having to pay for G Suite / Google Workspace).`>]>
For advanced use cases, the Q object can be used as well to add AND (&
) or OR (|
) logical operators.
Below, we used the filter method to filter against the title
or description
fields. Furthermore, we also used the icontains extension to check if the query is present in the title
or description
fields (case insensitive). A positive result will be returned if a match is found.
from django.db.models import Q
from posts.models import Post
Post.objects.filter(
Q(title__icontains=query) | Q(description__icontains=query)
)
Searching with basic search
Start the Python interactive interpreter with the command python manage.py shell
Create a basic_search()
function:
Let’s search for the term ‘heroku’:
It returns one result.
Let’s search for the term ‘paas’:
It returns one result.
Let’s search for the term ‘heroku paas’:
We quickly run into problems as we can’t find results for that search term even though we have a post with both ‘heroku’ and ‘paas’ as shown below:
Let’s search for the term ‘heroku cloud’:
We also can’t find results for that search term even though we have a post with both ‘heroku’ and ‘cloud’ in the same model field title
as shown below:
The above problems are some of the reasons why we use rich text search.
Full-text search
What is full-text search?
A full-text search is a comprehensive search method that compares every word of the search request against every word within the document or database. Web search engines and document editing software make extensive use of the full-text search technique in functions for searching a text database stored on the Web or on the local drive of a computer; it lets the user find a word or phrase anywhere within the database or document.
It searches some text inside extensive text data and returns results that contain some or all of the words from the query. In contrast, the traditional search would return exact matches.
Introduction to full-text search with Django & PostgreSQL
The module django.contrib.postgres
contains the support for Full-Text Search in Django. Add it to your [INSTALLED_APPS]
:
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'django.contrib.postgres', # <-- Add this line
'posts',
]
search
lookup
Searching a Single field with the The simplest way to start using Full-Text Search in Django is by using search lookup. To search the title
column on the Post
model, append __search
to the column name when filtering the model. Let’s search for the term ‘heroku cloud’ that failed to work using the basic search:
>>> Post.objects.filter(title__icontains="heroku cloud")
<QuerySet []>
>>> Post.objects.filter(title__search="heroku cloud")
<QuerySet [<Post: Title: `Set Up Your Own Heroku-like Cloud` | Description: `When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use`>]>
Under the hood, Django converts the title
field to a tsvector
and converts the search term to a tsquery
. (Note that you don’t have to know what these are. If you want to learn about them, check out the PostgreSQL docs) You can check the underlying query to verify this:
>>> from django.db import connection
>>> connection.queries[-1]['sql']
'SELECT "posts_post"."id", "posts_post"."title", "posts_post"."description", "posts_post"."content" FROM "posts_post" WHERE to_tsvector(COALESCE("posts_post"."title", \'\')) @@ plainto_tsquery(\'heroku cloud\') LIMIT 21'
SearchVector
Searching across multiple fields with a Searching against a single field is great but rather limiting. We can use a SearchVector to query against multiple fields.
Let’s search for the term ‘cloud projects’ where ‘cloud’ is in the title
field and ‘projects’ is in the description
field:
>>> from django.contrib.postgres.search import SearchVector
>>> from django.db.models import Q
>>> from posts.models import Post
>>> query = "cloud projects"
>>> Post.objects.filter(Q(title__icontains=query) | Q(description__icontains=query))
<QuerySet []>
>>> Post.objects.annotate(search=SearchVector("title", "description")).filter(search=query)
<QuerySet [<Post: Title: `Set Up Your Own Heroku-like Cloud` | Description: `When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use`>]>
We can see how rich-text search can come in handy.
The underlying query Django’s ORM generated uses to_tsvector
to query both the title
and description
fields in the database (again, you don’t have to know what these are):
>>> from django.db import connection
>>> connection.queries[-1]['sql']
'SELECT "posts_post"."id", "posts_post"."title", "posts_post"."description", "posts_post"."content", to_tsvector(COALESCE("posts_post"."title", \'\') || \' \' || COALESCE("posts_post"."description", \'\')) AS "search" FROM "posts_post" WHERE to_tsvector(COALESCE("posts_post"."title", \'\') || \' \' || COALESCE("posts_post"."description", \'\')) @@ plainto_tsquery(\'cloud projects\') LIMIT 21'
And more…
We can decide to order results by relevance using SearchRank
, highlighted search results with SearchHeadline
, changing the search configuration, weighting queries, etc all without wiring SQL. Learn more in the docs: https://docs.djangoproject.com/en/4.1/ref/contrib/postgres/search/
Conclusion
In this quick guide, we saw the basics of adding full-text search to a Django application.
See the complete code from the GitHub repo.
Sources and more reading:
- https://docs.djangoproject.com/en/4.1/topics/db/search/
- https://docs.djangoproject.com/en/4.1/ref/contrib/postgres/search/ ← was very useful.
- https://www.postgresql.org/docs/14/textsearch.html
- https://www.joshgraham.com/full-text-search-explained/
- https://pganalyze.com/blog/full-text-search-django-postgres
- https://stackoverflow.com/questions/50893437/django-full-text-search-with-multiple-words
- https://www.freecodecamp.org/news/elasticsearch-with-django-the-easy-way-909375bc16cb/
- https://hevodata.com/blog/postgresql-full-text-search-setup/