supabase/apps/docs/content/guides/database/replication/bigquery.mdx

---
id: 'bigquery-destination'
title: 'BigQuery destination'
description: 'Configure BigQuery as an external replication (ETL) destination.'
subtitle: 'Replicate Supabase Postgres tables to BigQuery.'
sidebar_label: 'BigQuery'
---

<Admonition type="caution" title="Private Alpha">

External replication (ETL) is currently in private alpha. Managed pipelines run through Supabase ETL. Access is limited and features may change.

</Admonition>

[BigQuery](https://cloud.google.com/bigquery) is Google's fully managed data warehouse. You can replicate your database tables to BigQuery for analytics and reporting.

## Prepare GCP resources

Before configuring BigQuery as a destination, set up the following in Google Cloud Platform:

1. **Google Cloud Platform (GCP) account**: [Sign up for GCP](https://cloud.google.com/gcp) if you don't have one

2. **BigQuery dataset**: Create a [BigQuery dataset](https://cloud.google.com/bigquery/docs/datasets-intro) in your GCP project
   - Open the BigQuery console in GCP
   - Select your project
   - Click **Create Dataset**
   - Provide a dataset ID, for example `supabase_replication`

3. **GCP service account key**: Create a [service account](https://cloud.google.com/iam/docs/keys-create-delete) with appropriate permissions
   - Go to **IAM & Admin > Service Accounts**
   - Click **Create Service Account**
   - Grant the "BigQuery Data Editor" and "BigQuery Job User" roles
   - Create and download the JSON key file

Required permissions:

- `bigquery.datasets.get`
- `bigquery.jobs.create`
- `bigquery.tables.create`
- `bigquery.tables.delete`
- `bigquery.tables.get`
- `bigquery.tables.getData`
- `bigquery.tables.list`
- `bigquery.tables.update`
- `bigquery.tables.updateData`
- `bigquery.routines.get`
- `bigquery.routines.list`

## Configure BigQuery as a destination

1. Navigate to the [**Database > Replication**](/dashboard/project/_/database/replication) section of the Dashboard
2. Click **Add destination**

3. Configure the general settings:
   - **Destination name**: A name to identify this destination, for example "BigQuery Warehouse"
   - **Publication**: The publication to replicate data from
   - **Destination type**: Select **BigQuery**

4. Configure BigQuery-specific settings:
   - **Project ID**: Your BigQuery project identifier, found in the GCP Console
   - **Dataset ID**: The name of your BigQuery dataset, without the project ID

     <Admonition type="note">

     In the GCP Console, the dataset is shown as `project-id.dataset-id`. Enter only the part after the dot. For example, if you see `my-project.my_dataset`, enter `my_dataset`.

     </Admonition>

   - **Service Account Key**: Your GCP service account key in JSON format

5. Optionally expand **Advanced settings** for BigQuery-specific performance tuning:

   | Setting                  | Default            | Description                                                                                                                                                                |
   | ------------------------ | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
   | **Connection pool size** | `4` connections    | Size of the BigQuery Storage Write API connection pool. More connections allow more parallel writes, but consume more resources.                                           |
   | **Maximum staleness**    | No staleness limit | Maximum allowed age, in minutes, for BigQuery cached metadata before reading base tables. Lower values improve freshness. Higher values can reduce query cost and latency. |

6. Click **Create and start** to begin replication

Your replication pipeline now starts copying data from your database to BigQuery.

## How it works

Once configured, replication to BigQuery:

1. Captures changes from your Postgres database (INSERT, UPDATE, DELETE, TRUNCATE operations)
2. Optimizes delivery automatically
3. Creates BigQuery tables automatically to match your Postgres schema
4. Streams data to BigQuery

## Source table requirements

BigQuery replication requires each source table to have a primary key, and the publication must include the primary-key columns. Supabase ETL declares those columns as the BigQuery destination primary key so BigQuery change data capture (CDC) can apply `UPSERT` and `DELETE` rows.

BigQuery primary keys are `NOT ENFORCED`, and BigQuery change data capture (CDC) supports composite primary keys with up to 16 columns. Your source primary key must stay unique and non-null because BigQuery uses it to match CDC rows.

Source tables must also use a BigQuery-compatible Postgres `REPLICA IDENTITY` setting. Most tables can keep the Postgres default, as long as they have a primary key and all primary-key columns are included in the publication.

| Source table setting                             | BigQuery support | Guidance                                                                                                                                                        |
| ------------------------------------------------ | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `REPLICA IDENTITY DEFAULT` with a primary key    | Supported        | Recommended for most tables. BigQuery uses the replicated source primary key to apply upserts and deletes.                                                      |
| `REPLICA IDENTITY FULL`                          | Supported        | Recommended for tables with large `text`, `jsonb`, `bytea`, or other values that Postgres may store out-of-line using TOAST, especially when those rows update. |
| `REPLICA IDENTITY USING INDEX`                   | Not supported    | BigQuery change data capture (CDC) rows are keyed by the source primary key, not by an alternative unique index.                                                |
| `REPLICA IDENTITY NOTHING`                       | Not supported    | Updates and deletes do not include enough row identity for BigQuery to apply them safely.                                                                       |
| `REPLICA IDENTITY DEFAULT` without a primary key | Not supported    | BigQuery requires a source primary key.                                                                                                                         |

For a general explanation of how replica identity affects update and delete events, see [How does replica identity affect updates and deletes?](/docs/guides/database/replication/external-replication-faq#how-does-replica-identity-affect-updates-and-deletes).

For updates, Postgres does not always send a complete old row through logical replication. It can also mark unchanged toasted values as `unchanged toast` instead of resending the value. BigQuery change data capture (CDC) upserts require a complete new row because omitted columns are not preserved in the destination. The replication pipeline can reconstruct a complete update when the old row image contains the missing value, which is reliable with `REPLICA IDENTITY FULL`.

If a BigQuery pipeline fails with an error about a partial update row, set `REPLICA IDENTITY FULL` on the affected source table and restart the pipeline. Changing replica identity only affects new WAL records, so a retained update that was written before the change may still need to be skipped by recreating the pipeline or re-copying the affected table.

Check a table's current replica identity:

```sql
select
  n.nspname as schema_name,
  c.relname as table_name,
  c.relreplident as replica_identity
from
  pg_class as c
  join pg_namespace as n on n.oid = c.relnamespace
where n.nspname = 'public' and c.relname = 'your_table';
```

The `replica_identity` value is `d` for default, `f` for full, `i` for index, and `n` for nothing.

Set full replica identity when a table has toasted columns and update replication must be reliable:

```sql
alter table public.your_table replica identity full;
```

`REPLICA IDENTITY FULL` increases WAL volume because Postgres logs the full old row for updates and deletes. Use it on tables where update correctness is more important than the extra replication overhead.

## How tables are structured

Due to BigQuery limitations, replicated tables use a versioned structure:

- The table you query is a **view**, for example `users`
- The actual data is stored in versioned tables with a `_version` suffix, for example `users_version`
- When a table is truncated in your database, a new version is created and the view automatically points to the latest version

This structure handles table truncations while maintaining query compatibility.

## Schema change support

Schema change support for BigQuery is currently in beta. Supabase ETL supports a limited set of schema changes for BigQuery while the feature is developed further.

Supported schema changes:

- Adding a nullable column
- Removing a column
- Renaming a column
- Dropping a `NOT NULL` constraint
- Setting or dropping supported column default metadata

Unsupported or limited schema changes:

- Changing a column's data type
- Adding `NOT NULL` with `SET NOT NULL`
- Filling existing rows for `ADD COLUMN ... DEFAULT`
- Unsupported default expressions

BigQuery requires added columns to be nullable. When a replicated `ADD COLUMN` includes a default, external replication (ETL) can apply supported default metadata for future rows, but BigQuery does not backfill existing rows through that DDL. Existing destination rows remain `NULL` unless you run a separate backfill.

Supported defaults are best-effort translations to BigQuery SQL. Unsupported defaults are skipped with a warning instead of failing replication.

## Limitations

- **Row size**: Limited to 10 MB per row due to BigQuery Storage Write API constraints
- **Primary keys**: Source tables must have a primary key, the replicated primary key can contain at most 16 columns, and BigQuery does not enforce key uniqueness
- **Replica identity**: Source tables must use primary-key replica identity or `REPLICA IDENTITY FULL`
- **Table names**: Source table names cannot start or end with `_` when replicating to BigQuery
- **Schema changes**: Limited to the supported schema changes listed above

## Additional resources

- [BigQuery documentation](https://cloud.google.com/bigquery/docs) - Official Google BigQuery documentation
- [BigQuery change data capture](https://cloud.google.com/bigquery/docs/change-data-capture) - BigQuery change data capture (CDC) requirements and limitations