import Layout from '~/layouts/DefaultGuideLayout'

export const meta = {
  id: 'structured-unstructured-embeddings',
  title: 'Structured and Unstructured',
  description:
    'Supabase is flexible enough to associate structured and unstructured metadata with embeddings.',
  subtitle:
    'Supabase is flexible enough to associate structured and unstructured metadata with embeddings.',
  sidebar_label: 'Structured and unstructured embeddings',
}

Most vector stores treat the metadata associated with embeddings as NoSQL-style, unstructured data. Supabase is flexible enough to store both structured and unstructured metadata.

## Structured

```sql
create table docs (
  id uuid primary key,
  embedding vector(3),
  content text,
  url text
);

insert into docs
  (id, embedding, content, url)
values
  ('79409372-7556-4ccc-ab8f-5786a6cfa4f7', array[0.1, 0.2, 0.3], 'Hello world', '/hello-world');
```

Notice that we've associated two pieces of metadata, `content` and `url`, with the embedding. Those fields can be filtered, constrained, indexed, and generally operated on using the full power of SQL. Structured metadata fits naturally with a traditional Supabase application, and can be managed via database [migrations](/docs/guides/getting-started/local-development#database-migrations).
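
To illustrate what filtering on a dedicated column combined with a similarity search computes, here is a minimal, brute-force Python sketch that runs entirely in memory. It only approximates the semantics of a SQL query that filters `docs` on `url` and orders the remaining rows by Euclidean distance to a query embedding; it is not how Postgres plans or executes such a query, and `Doc`, `filtered_search`, and the extra sample rows are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Callable


@dataclass(frozen=True)
class Doc:
    """One row of the structured `docs` table: each metadata field is its own typed attribute."""
    id: str
    embedding: tuple[float, ...]
    content: str
    url: str


def filtered_search(
    rows: list[Doc],
    query: tuple[float, ...],
    where: Callable[[Doc], bool],
    limit: int = 5,
) -> list[Doc]:
    """Keep rows matching `where`, then return the `limit` nearest rows by Euclidean distance."""
    candidates = [row for row in rows if where(row)]
    candidates.sort(key=lambda row: dist(row.embedding, query))
    return candidates[:limit]


# Hypothetical sample rows; only the first one comes from the SQL example above.
rows = [
    Doc("79409372-7556-4ccc-ab8f-5786a6cfa4f7", (0.1, 0.2, 0.3), "Hello world", "/hello-world"),
    Doc("hypothetical-id-2", (0.9, 0.8, 0.7), "Goodbye world", "/goodbye-world"),
    Doc("hypothetical-id-3", (0.1, 0.2, 0.4), "Hello again", "/blog/hello-again"),
]

# Roughly: exclude urls under '/blog/', order by distance to the query vector, keep 2 rows.
results = filtered_search(
    rows, (0.1, 0.2, 0.3), where=lambda d: not d.url.startswith("/blog/"), limit=2
)
print([d.url for d in results])
```

Filtering before ranking is what keeps `/blog/hello-again` out of the results even though its embedding is closer to the query than the embedding of `/goodbye-world`.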

## Unstructured

```sql
create table docs (
  id uuid primary key,
  embedding vector(3),
  meta jsonb
);

insert into docs
  (id, embedding, meta)
values
  (
    '79409372-7556-4ccc-ab8f-5786a6cfa4f7',
    array[0.1, 0.2, 0.3],
    '{"content": "Hello world", "url": "/hello-world"}'
  );
```

An unstructured approach does not specify the metadata fields that are expected. Instead, it stores all metadata in a flexible `json`/`jsonb` column. The tradeoff is that querying and filtering on fields inside a schemaless column is less flexible than when each field has a dedicated column. It also pushes the burden of metadata integrity onto application code, which is more error-prone than enforcing constraints in the database.
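
By default, a `jsonb` column accepts any valid JSON document, so rules that a `text not null` column or a `check` constraint would enforce in the structured layout end up in application code. The Python sketch below shows what such a check might look like before a row is inserted; `validate_meta` and its rules (a required string `content` and an optional `url` that must start with `/`) are hypothetical, not part of Supabase.

```python
import json
from typing import Any


def validate_meta(raw: str) -> dict[str, Any]:
    """Parse a JSON metadata string and enforce the fields the application expects.

    Raises ValueError (json.JSONDecodeError is a subclass) when a rule is broken,
    standing in for what column types and constraints would reject in the
    structured layout.
    """
    meta = json.loads(raw)
    if not isinstance(meta, dict):
        raise ValueError("metadata must be a JSON object")
    if not isinstance(meta.get("content"), str):
        raise ValueError("'content' is required and must be a string")
    url = meta.get("url")
    if url is not None and not (isinstance(url, str) and url.startswith("/")):
        raise ValueError("'url' must be a string starting with '/'")
    return meta


print(validate_meta('{"content": "Hello world", "url": "/hello-world"}'))

try:
    validate_meta('{"url": 42}')
except ValueError as err:
    print("rejected:", err)
```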

The unstructured approach is recommended:

- for ephemeral or interactive workloads, e.g., data science or scientific research
- when metadata fields are user-defined or unknown
- during rapid prototyping

Client libraries like Python's [vecs](https://github.com/supabase/vecs) use this structure. For example, running:
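
```py
#!/usr/bin/env python3
import vecs

# Connection string for your Postgres database (placeholder values)
DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# Create a vecs client connected to the database
vx = vecs.create_client(DB_CONNECTION)

# Create (or load) a collection named "docs" holding 3-dimensional vectors
docs = vx.get_or_create_collection(name="docs", dimension=3)

# Upsert a list of (id, vector, metadata) records; the metadata dict is stored as JSON
docs.upsert([
    ("79409372-7556-4ccc-ab8f-5786a6cfa4f7", [0.1, 0.2, 0.3], {"url": "/hello-world"}),
])
```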

automatically creates the unstructured SQL table during the call to `get_or_create_collection`.

Note that when working with client libraries that emit SQL DDL, like `create table ...`, you should add that SQL to your migrations when moving to production to maintain a single source of truth for your database's schema.

## Hybrid

The structured metadata style is recommended when the fields being tracked are known in advance. If you have a combination of known and unknown metadata fields, you can accommodate the unknown fields by adding a `json`/`jsonb` column to the table. In that situation, known fields should continue to use dedicated columns for best query performance and throughput.

```sql
create table docs (
  id uuid primary key,
  embedding vector(3),
  content text,
  url text,
  meta jsonb
);

insert into docs
  (id, embedding, content, url, meta)
values
  (
    '79409372-7556-4ccc-ab8f-5786a6cfa4f7',
    array[0.1, 0.2, 0.3],
    'Hello world',
    '/hello-world',
    '{"key": "value"}'
  );
```
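
In the hybrid layout, application code has to decide which metadata keys go to dedicated columns and which end up in `meta`. A minimal Python sketch of that routing, assuming the `content` and `url` columns of the `docs` table above, could look like this; `KNOWN_COLUMNS` and `split_metadata` are hypothetical helpers, and the returned tuple is ordered to match the insert's `(content, url, meta)` columns.

```python
import json
from typing import Any

# Hypothetical: keys that have a dedicated column in the hybrid `docs` table.
KNOWN_COLUMNS = ("content", "url")


def split_metadata(metadata: dict[str, Any]) -> tuple[Any, ...]:
    """Return (content, url, meta_json): known keys as column values, the rest as a JSON string."""
    known = tuple(metadata.get(column) for column in KNOWN_COLUMNS)
    extra = {key: value for key, value in metadata.items() if key not in KNOWN_COLUMNS}
    return (*known, json.dumps(extra))


row = split_metadata({"content": "Hello world", "url": "/hello-world", "key": "value"})
print(row)
```

Known fields that are missing come back as `None`, and the remaining keys are serialized with `json.dumps`, which here produces the same `'{"key": "value"}'` literal used in the insert above.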

## Choosing the right model

Both approaches create a table where you can store your embeddings and some metadata. In summary:

- Structured metadata is best when fields are known in advance or query patterns are predictable, e.g., a production Supabase application
- Unstructured metadata is best when fields are unknown or user-defined, or when working with data interactively, e.g., exploratory research

Both approaches are valid, and the one you should choose depends on your use case.

export const Page = ({ children }) => <Layout meta={meta} children={children} />

export default Page