Picture of the author
Published on

Full-Text Search with Django & PostgreSQL in Six minutes

An introduction to full-text search with Django & PostgreSQL

Table of Contents

Introduction

Prerequisites

Be familiar with Django

The Django project we’ll be using

The project uses a PostgreSQL database:

settings.py
# Database
# https://docs.djangoproject.com/en/3.2/ref/settings/#databases

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "myblog",
        "USER": "myblog",
        "PASSWORD": "myblog",
        "HOST": "localhost",
        "PORT":  5432,
    }
}

And has one app called posts with one model named Post:

settings.py
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'posts', # <-- posts app
]
models.py
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=50, unique=True)
    description = models.CharField(max_length=255)
    content = models.TextField()

    def __str__(self):
        return f"Title: `{self.title}` | Description: `{self.description}`"

The database has two posts:

[
   {
      "title": "Use Your Domain with Gmail",
      "description": "Send and receive emails from a custom domain in Gmail for free (without having to pay for G Suite / Google Workspace)"
   },
   {
      "title": "Set Up Your Own Heroku-like Cloud",
      "description": "When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use"
   }
]

Find that project on my GitHub: https://github.com/paulonteri/django-rich-text-search-blog-post-code

When it comes to search, with Django, you'll typically start by performing search queries with containsor icontainsfor exact matches.

The above search pattern is what most developers are likely familiar with, which has been part of the SQL standard from the beginning, and is available to all SQL-based databases, as follows:

SELECT column_name FROM table_name WHERE column_name LIKE 'pattern';

That will return the rows where column_name matches the pattern, the closest way to translate that SQL statement to the Django ORM is as follows:

>>> from posts.models import Post
>>> Post.objects.filter(title__contains="Gmail")

<QuerySet [<Post: Title: `Use Your Domain with Gmail` | Description: `Send and receive emails from a custom domain in Gmail for free (without having to pay for G Suite / Google Workspace).`>]>

For advanced use cases, the Q object can be used as well to add AND (&) or OR (|) logical operators.

Below, we used the filter method to filter against the title or description fields. Furthermore, we also used the icontains extension to check if the query is present in the title or description fields (case insensitive). A positive result will be returned if a match is found.

from django.db.models import Q
from posts.models import Post

Post.objects.filter(
	Q(title__icontains=query) | Q(description__icontains=query)
)

Start the Python interactive interpreter with the command python manage.py shell

Start the Python interactive interpreter

Create a basic_search() function:

Create a basic_search() function

Let’s search for the term ‘heroku’:

search for the term heroku

It returns one result.


Let’s search for the term ‘paas’:

search_paas.png

It returns one result.


Let’s search for the term ‘heroku paas’:

Screenshot 2022-09-03 at 10.39.29.png

We quickly run into problems as we can’t find results for that search term even though we have a post with both ‘heroku’ and ‘paas’ as shown below:

all_posts.png

Let’s search for the term ‘heroku cloud’:

Screenshot 2022-09-03 at 10.56.30.png

We also can’t find results for that search term even though we have a post with both ‘heroku’ and ‘cloud’ in the same model field title as shown below:

all_posts_2.png

The above problems are some of the reasons why we use rich text search.

A full-text search is a comprehensive search method that compares every word of the search request against every word within the document or database. Web search engines and document editing software make extensive use of the full-text search technique in functions for searching a text database stored on the Web or on the local drive of a computer; it lets the user find a word or phrase anywhere within the database or document.

It searches some text inside extensive text data and returns results that contain some or all of the words from the query. In contrast, the traditional search would return exact matches.

Introduction to full-text search with Django & PostgreSQL

The module django.contrib.postgres contains the support for Full-Text Search in Django. Add it to your [INSTALLED_APPS]:

settings.py
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'django.contrib.postgres', # <-- Add this line
    'posts',
]

Searching a Single field with the search lookup

The simplest way to start using Full-Text Search in Django is by using search lookup. To search the title column on the Post model, append __search to the column name when filtering the model. Let’s search for the term ‘heroku cloud’ that failed to work using the basic search:

>>> Post.objects.filter(title__icontains="heroku cloud")
<QuerySet []>

>>> Post.objects.filter(title__search="heroku cloud")
<QuerySet [<Post: Title: `Set Up Your Own Heroku-like Cloud` | Description: `When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use`>]>
search for the term ‘heroku cloud’

Under the hood, Django converts the title field to a tsvector and converts the search term to a tsquery. (Note that you don’t have to know what these are. If you want to learn about them, check out the PostgreSQL docs) You can check the underlying query to verify this:

>>> from django.db import connection
>>> connection.queries[-1]['sql']
'SELECT "posts_post"."id", "posts_post"."title", "posts_post"."description", "posts_post"."content" FROM "posts_post" WHERE to_tsvector(COALESCE("posts_post"."title", \'\')) @@ plainto_tsquery(\'heroku cloud\') LIMIT 21'

Searching across multiple fields with a SearchVector

Searching against a single field is great but rather limiting. We can use a SearchVector to query against multiple fields.

Let’s search for the term ‘cloud projects’ where ‘cloud’ is in the title field and ‘projects’ is in the description field:

>>> from django.contrib.postgres.search import SearchVector
>>> from django.db.models import Q
>>> from posts.models import Post

>>> query = "cloud projects"

>>> Post.objects.filter(Q(title__icontains=query) | Q(description__icontains=query))
<QuerySet []>

>>> Post.objects.annotate(search=SearchVector("title", "description")).filter(search=query)
<QuerySet [<Post: Title: `Set Up Your Own Heroku-like Cloud` | Description: `When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use`>]>

rich text search cloud projects.png

We can see how rich-text search can come in handy.

The underlying query Django’s ORM generated uses to_tsvectorto query both the title  and descriptionfields in the database (again, you don’t have to know what these are):

>>> from django.db import connection
>>> connection.queries[-1]['sql']
'SELECT "posts_post"."id", "posts_post"."title", "posts_post"."description", "posts_post"."content", to_tsvector(COALESCE("posts_post"."title", \'\') || \' \' || COALESCE("posts_post"."description", \'\')) AS "search" FROM "posts_post" WHERE to_tsvector(COALESCE("posts_post"."title", \'\') || \' \' || COALESCE("posts_post"."description", \'\')) @@ plainto_tsquery(\'cloud projects\') LIMIT 21'

And more…

We can decide to order results by relevance using SearchRank, highlighted search results with SearchHeadline, changing the search configuration, weighting queries, etc all without wiring SQL. Learn more in the docs: https://docs.djangoproject.com/en/4.1/ref/contrib/postgres/search/

Conclusion

In this quick guide, we saw the basics of adding full-text search to a Django application.

See the complete code from the GitHub repo.

Sources and more reading: