Files
supabase/apps/reference/docs/guides/database/full-text-search.mdx
2022-08-13 22:24:54 +08:00

619 lines
14 KiB
Plaintext

---
id: full-text-search
title: 'Full Text Search'
description: How to use full text search in PostgreSQL.
---
import Tabs from '@theme/Tabs'
import TabItem from '@theme/TabItem'
Postgres has built-in functions to handle `Full Text Search` queries. This is like a "search engine" within Postgres.
## Preparation
For this guide we'll use the following example data:
<Tabs
defaultValue="Data"
values={[
{label: 'Data', value: 'Data'},
{label: 'SQL', value: 'SQL'},
]}>
<TabItem value="SQL">
```sql
create table books (
id serial primary key,
title text,
author text,
description text
);
insert into books (title, author, description)
values
('The Poky Little Puppy','Janette Sebring Lowrey','Puppy is slower than other, bigger animals.'),
('The Tale of Peter Rabbit','Beatrix Potter','Rabbit eats some vegetables.'),
('Tootle','Gertrude Crampton','Little toy train has big dreams.'),
('Green Eggs and Ham','Dr. Seuss','Sam has changing food preferences and eats unusually colored food.'),
('Harry Potter and the Goblet of Fire','J.K. Rowling','Fourth year of school starts, big drama ensues.');
```
</TabItem>
<TabItem value="Data">
| id | title | author | description |
| --- | ----------------------------------- | ---------------------- | ------------------------------------------------------------------ |
| 1 | The Poky Little Puppy | Janette Sebring Lowrey | Puppy is slower than other, bigger animals. |
| 2 | The Tale of Peter Rabbit | Beatrix Potter | Rabbit eats some vegetables. |
| 3 | Tootle | Gertrude Crampton | Little toy train has big dreams. |
| 4 | Green Eggs and Ham | Dr. Seuss | Sam has changing food preferences and eats unusually colored food. |
| 5 | Harry Potter and the Goblet of Fire | J.K. Rowling | Fourth year of school starts, big drama ensues. |
</TabItem>
</Tabs>
## Usage
The functions we'll cover in this guide are:
### `to_tsvector()`
Converts your data into searchable "tokens". `to_tsvector()` stands for "to text search vector". For example:
```sql
select to_tsvector('green eggs and ham')
-- Returns 'egg':2 'green':1 'ham':4
```
Collectively these tokens are called a "document" which Postgres can use for comparisons.
### `to_tsquery()`
Converts a query string into "tokens" to match. `to_tsquery()` stands for "to text search query".
This conversion step is important because we will want to "fuzzy match" on keywords.
For example if a user searches for "eggs", and a column has the value "egg", we probably still want to return a match.
### Match: `@@`
The `@@` symbol is the "match" symbol for Full Text Search. It returns any matches between a `to_tsvector` result and a `to_tsquery` result.
Take the following example:
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'JavaScript', value: 'JS'},
{label: 'Dart', value: 'DART'},
]}>
<TabItem value="SQL">
```sql
select *
from books
where title = 'Harry';
```
</TabItem>
<TabItem value="JS">
```js
const { data, error } = await supabase
.from('books')
.select()
.eq('title', 'Harry')
```
</TabItem>
<TabItem value="DART">
```dart
final result = await client
.from('books')
.select()
.eq('title', 'Harry')
.execute();
```
</TabItem>
</Tabs>
The equality symbol above (`=`) is very "strict" on what it matches. In a full text search context, we might want to find all "Harry Potter" books and so we can rewrite the
example above:
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'JavaScript', value: 'JS'},
{label: 'Dart', value: 'DART'},
]}>
<TabItem value="SQL">
```sql
select *
from books
where to_tsvector(title) @@ to_tsquery('Harry');
```
</TabItem>
<TabItem value="JS">
```js
const { data, error } = await supabase
.from('books')
.select()
.textSearch('title', `'Harry'`)
```
</TabItem>
<TabItem value="DART">
```dart
final result = await client
.from('books')
.select()
.textSearch('title', "'Harry'")
.execute();
```
</TabItem>
</Tabs>
## Basic Full Text Queries
### Search a single column
To find all `books` where the `description` contain the word `big`:
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'JavaScript', value: 'JS'},
{label: 'Dart', value: 'DART'},
{label: 'Data', value: 'Data'},
]}>
<TabItem value="SQL">
```sql
select
*
from
books
where
to_tsvector(description)
@@ to_tsquery('big');
```
</TabItem>
<TabItem value="JS">
```js
const { data, error } = await supabase
.from('books')
.select()
.textSearch('description', `'big'`)
```
</TabItem>
<TabItem value="DART">
```dart
final result = await client
.from('books')
.select()
.textSearch('description', "'big'")
.execute();
```
</TabItem>
<TabItem value="Data">
| id | title | author | description |
| --- | ----------------------------------- | ----------------- | ----------------------------------------------- |
| 3 | Tootle | Gertrude Crampton | Little toy train has big dreams. |
| 5 | Harry Potter and the Goblet of Fire | J.K. Rowling | Fourth year of school starts, big drama ensues. |
</TabItem>
</Tabs>
### Search multiple columns
To find all `books` where `description` or `title` contain the word `little`:
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'Data', value: 'Data'},
]}>
<TabItem value="SQL">
```sql
select
*
from
books
where
to_tsvector(description || ' ' || title) -- concat columns, but be sure to include a space to separate them!
@@ to_tsquery('little');
```
</TabItem>
<TabItem value="Data">
| id | title | author | description |
| --- | --------------------- | ---------------------- | ------------------------------------------- |
| 1 | The Poky Little Puppy | Janette Sebring Lowrey | Puppy is slower than other, bigger animals. |
| 3 | Tootle | Gertrude Crampton | Little toy train has big dreams. |
</TabItem>
</Tabs>
### Match all search words
To find all `books` where `description` contains BOTH of the words `little` and `big`, we can use the `&` symbol:
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'JavaScript', value: 'JS'},
{label: 'Dart', value: 'DART'},
{label: 'Data', value: 'Data'},
]}>
<TabItem value="SQL">
```sql
select
*
from
books
where
to_tsvector(description)
@@ to_tsquery('little & big'); -- use & for AND in the search query
```
</TabItem>
<TabItem value="JS">
```js
const { data, error } = await supabase
.from('books')
.select()
.textSearch('description', `'little' & 'big'`)
```
</TabItem>
<TabItem value="DART">
```dart
final result = await client
.from('books')
.select()
.textSearch('description', "'little' & 'big'")
.execute();
```
</TabItem>
<TabItem value="Data">
| id | title | author | description |
| --- | ------ | ----------------- | -------------------------------- |
| 3 | Tootle | Gertrude Crampton | Little toy train has big dreams. |
</TabItem>
</Tabs>
### Match any search words
To find all `books` where `description` contain ANY of the words `little` or `big`, use the `|` symbol:
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'JavaScript', value: 'JS'},
{label: 'Dart', value: 'DART'},
{label: 'Data', value: 'Data'},
]}>
<TabItem value="SQL">
```sql
select
*
from
books
where
to_tsvector(description)
@@ to_tsquery('little | big'); -- use | for OR in the search query
```
</TabItem>
<TabItem value="JS">
```js
const { data, error } = await supabase
.from('books')
.select()
.textSearch('description', `'little' | 'big'`)
```
</TabItem>
<TabItem value="DART">
```dart
final result = await client
.from('books')
.select()
.textSearch('description', "'little' | 'big'")
.execute();
```
</TabItem>
<TabItem value="Data">
| id | title | author | description |
| --- | --------------------- | ---------------------- | ------------------------------------------- |
| 1 | The Poky Little Puppy | Janette Sebring Lowrey | Puppy is slower than other, bigger animals. |
| 3 | Tootle | Gertrude Crampton | Little toy train has big dreams. |
</TabItem>
</Tabs>
Notice how searching for `big` includes results with the word `bigger` (or `biggest`, etc).
## Creating Indexes
Now that we have Full Text Search working, let's create an `index`. This will allow Postgres to "build" the documents pre-emptively so that they
don't need to be created at the time we execute the query. This will make our queries much faster.
### Searchable columns
Let's create a new column `fts` inside the `books` table to store the searchable index of the `title` and `description` columns.
We can use a special feature of Postgres called
[Generated Columns](https://www.postgresql.org/docs/current/ddl-generated-columns.html)
to ensure that the index is updated any time the values in the `title` and `description` columns change.
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'Data', value: 'Data'},
]}>
<TabItem value="SQL">
```sql
alter table
books
add column
fts tsvector generated always as (to_tsvector('english', description || ' ' || title)) stored;
create index books_fts on books using gin (fts); -- generate the index
select id, fts
from books;
```
</TabItem>
<TabItem value="Data">
| id | fts |
| --- | --------------------------------------------------------------------------------------------------------------- |
| 1 | 'anim':7 'bigger':6 'littl':10 'poki':9 'puppi':1,11 'slower':3 |
| 2 | 'eat':2 'peter':8 'rabbit':1,9 'tale':6 'veget':4 |
| 3 | 'big':5 'dream':6 'littl':1 'tootl':7 'toy':2 'train':3 |
| 4 | 'chang':3 'color':9 'eat':7 'egg':12 'food':4,10 'green':11 'ham':14 'prefer':5 'sam':1 'unus':8 |
| 5 | 'big':6 'drama':7 'ensu':8 'fire':15 'fourth':1 'goblet':13 'harri':9 'potter':10 'school':4 'start':5 'year':2 |
</TabItem>
</Tabs>
### Search using the new column
Now that we've created and populated our index, we can search it using the same techniques as before:
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'JavaScript', value: 'JS'},
{label: 'Dart', value: 'DART'},
{label: 'Data', value: 'Data'},
]}>
<TabItem value="SQL">
```sql
select
*
from
books
where
fts @@ to_tsquery('little & big');
```
</TabItem>
<TabItem value="JS">
```js
const { data, error } = await supabase
.from('books')
.select()
.textSearch('fts', `'little' & 'big'`)
```
</TabItem>
<TabItem value="DART">
```dart
final result = await client
.from('books')
.select()
.textSearch('fts', "'little' & 'big'")
.execute();
```
</TabItem>
<TabItem value="Data">
| id | title | author | description | fts |
| --- | ------ | ----------------- | -------------------------------- | ------------------------------------------------------- |
| 3 | Tootle | Gertrude Crampton | Little toy train has big dreams. | 'big':5 'dream':6 'littl':1 'tootl':7 'toy':2 'train':3 |
</TabItem>
</Tabs>
## Query Operators
Visit [PostgreSQL: Text Search Functions and Operators](https://www.postgresql.org/docs/current/functions-textsearch.html)
to learn about additional query operators you can use to do more advanced `full text queries`, such as:
### Proximity: `<->`
The proximity symbol is useful for searching for terms that are a certain "distance" apart.
For example, to find the phrase `big dreams`, where the a match for "big" is followed immediately by a match for "dreams":
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'JavaScript', value: 'JS'},
{label: 'Dart', value: 'DART'},
]}>
<TabItem value="SQL">
```sql
select
*
from
books
where
to_tsvector(description) @@ to_tsquery('big <-> dreams');
```
</TabItem>
<TabItem value="JS">
```js
const { data, error } = await supabase
.from('books')
.select()
.textSearch('description', `'big' <-> 'dreams'`)
```
</TabItem>
<TabItem value="DART">
```dart
final result = await client
.from('books')
.select()
.textSearch('description', "'big' <-> 'dreams'")
.execute();
```
</TabItem>
</Tabs>
We can also use the `<->` to find words within a certain distance of eachother. For example to find `year` and `school` within 2 words of each other:
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'JavaScript', value: 'JS'},
{label: 'Dart', value: 'DART'},
]}>
<TabItem value="SQL">
```sql
select
*
from
books
where
to_tsvector(description) @@ to_tsquery('year <2> school');
```
</TabItem>
<TabItem value="JS">
```js
const { data, error } = await supabase
.from('books')
.select()
.textSearch('description', `'year' <2> 'school'`)
```
</TabItem>
<TabItem value="DART">
```dart
final result = await client
.from('books')
.select()
.textSearch('description', "'year' <2> 'school'")
.execute();
```
</TabItem>
</Tabs>
### Negation: `!`
The negation symbol can be used to find phrases which _don't_ contain a search term.
For example, to find records that have the word `big` but not `little`:
<Tabs
defaultValue="SQL"
values={[
{label: 'SQL', value: 'SQL'},
{label: 'JavaScript', value: 'JS'},
{label: 'Dart', value: 'DART'},
]}>
<TabItem value="SQL">
```sql
select
*
from
books
where
to_tsvector(description) @@ to_tsquery('big & !little');
```
</TabItem>
<TabItem value="JS">
```js
const { data, error } = await supabase
.from('books')
.select()
.textSearch('description', `'big' & !'little'`)
```
</TabItem>
<TabItem value="DART">
```dart
final result = await client
.from('books')
.select()
.textSearch('description', "'big' & !'little'")
.execute();
```
</TabItem>
</Tabs>
## Resources
- [PostgreSQL: Text Search Functions and Operators](https://www.postgresql.org/docs/12/functions-textsearch.html)