Files
supabase/apps/docs/pages/guides/ai/structured-unstructured.mdx

115 lines
4.1 KiB
Plaintext

import Layout from '~/layouts/DefaultGuideLayout'
export const meta = {
id: 'structured-unstructured-embeddings',
title: 'Structured and Unstructured',
description:
'Supabase is flexible enough to associate structured and unstructured metadata with embeddings.',
subtitle:
'Supabase is flexible enough to associate structured and unstructured metadata with embeddings.',
sidebar_label: 'Structured and unstructured embeddings',
}
Most vector stores treat metadata associated with embeddings like NoSQL, unstructured data. Supabase is flexible enough to store unstructured and structured metadata.
## Structured
```sql
create table docs (
id uuid primary key,
embedding vector(3),
content text,
url string
);
insert into docs
(id, embedding, content, url)
values
('79409372-7556-4ccc-ab8f-5786a6cfa4f7', array[0.1, 0.2, 0.3], 'Hello world', '/hello-world');
```
Notice that we've associated two pieces of metadata, `content` and `url`, with the embedding. Those fields can be filtered, constrained, indexed, and generally operated on using the full power of SQL. Structured metadata fits naturally with a traditional Supabase application, and can be managed via database [migrations](/docs/guides/getting-started/local-development#database-migrations).
## Unstructured
```sql
create table docs (
id uuid primary key,
embedding vector(3),
meta jsonb
);
insert into docs
(id, embedding, meta)
values
(
'79409372-7556-4ccc-ab8f-5786a6cfa4f7',
array[0.1, 0.2, 0.3],
'{"content": "Hello world", "url": "/hello-world"}'
);
```
An unstructured approach does not specify the metadata fields that are expected. It stores all metadata in a flexible `json`/`jsonb` column. The tradeoff is that the querying/filtering capabilities of a schemaless data type are less flexible than when each field has a dedicated column. It also pushes the burden of metadata data integrity onto application code, which is more error prone than enforcing constraints in the database.
The unstructured approach is recommended:
- for ephemeral/interactive workloads e.g. data science or scientific research
- when metadata fields are user-defined or unknown
- during rapid prototyping
Client libraries like python's [vecs](https://github.com/supabase/vecs) use this structure. For example, running:
```py
#!/usr/bin/env python3
import vecs
docs = vx.get_or_create_collection(name="docs", dimension=1536)
docs.upsert(vectors=[
('79409372-7556-4ccc-ab8f-5786a6cfa4f7', [100, 200, 300], { url: '/hello-world' })
])
```
automatically creates the unstructured SQL table during the call to `get_or_create_collection`.
Note that when working with client libraries that emit SQL DDL, like `create table ...`, you should add that SQL to your migrations when moving to production to maintain a single source of truth for your database's schema.
## Hybrid
The structured metadata style is recommended when the fields being tracked are known in advance. If you have a combination of known and unknown metadata fields, you can accommodate the unknown fields by adding a `json`/`jsonb` column to the table. In that situation, known fields should continue to use dedicated columns for best query performance and throughput.
```sql
create table docs (
id uuid primary key,
embedding vector(3),
content text,
url string,
meta jsonb
);
insert into docs
(id, embedding, content, url, meta)
values
(
'79409372-7556-4ccc-ab8f-5786a6cfa4f7',
array[0.1, 0.2, 0.3],
'Hello world',
'/hello-world',
'{"key": "value"}'
);
```
## Choosing the right model
Both approaches create a table where you can store your embeddings and some metadata. You should choose the best approach for your use-case. In summary:
- Structured metadata is best when fields are known in advance or query patterns are predictable e.g. a production Supabase application
- Unstructured metadata is best when fields are unknown/user-defined or when working with data interactively e.g. exploratory research
Both approaches are valid, and the one you should choose depends on your use-case.
export const Page = ({ children }) => <Layout meta={meta} children={children} />
export default Page