supabase

lywsvip/supabase

mirror of https://github.com/supabase/supabase.git synced 2026-06-14 14:08:31 +08:00

Commit Graph

Author

SHA1

Message

Date

Pamela Chia

47c084e51d

refactor(studio): migrate telemetry to useTrack (#46140)

## Summary

I migrated every `useSendEventMutation` call site in `apps/studio` to
`useTrack`, deleted the legacy hook, and added a lint guardrail so it
can't return. `useTrack` is the type-safe replacement: it auto-injects
`groups: { project, organization }` from the selected project/org and
types `action` + `properties` against `TelemetryEvent`. Existing call
sites built groups manually and were not type-checked at the action
level. The migration covers 81 files (60 trivial swaps, 9 org-only, 3
pre-auth, 5 bespoke, 4 test mocks).
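
To illustrate the two properties described above (auto-injected groups and per-action property typing), here is a minimal sketch. It is not the actual `useTrack` implementation: `EventProperties`, `createTrack`, `SentEvent`, and the injected `send` transport are hypothetical names, and the property map is a stand-in for the real `TelemetryEvent` type, whose shape is not shown in this description.

```typescript
// Stand-in for the real `TelemetryEvent` typing, which is not shown here.
// Each key is an action; each value is the properties that action accepts.
// `table_editor` is an assumed second value for `method`.
type EventProperties = {
  table_created: { method: 'sql_editor' | 'table_editor' }
  help_button_clicked: undefined
}

type Action = keyof EventProperties
type Groups = { project?: string; organization?: string }
type SentEvent = { action: Action; properties?: unknown; groups: Groups }

// Returns a `track` function bound to the currently selected project/org.
// `send` is a hypothetical transport, injected so the sketch is testable.
function createTrack(selected: Groups, send: (event: SentEvent) => void) {
  return function track<A extends Action>(
    action: A,
    properties?: EventProperties[A],
    groupOverrides: Groups = {}
  ): void {
    // Auto-injected groups first, caller overrides last, so overrides win.
    send({ action, properties, groups: { ...selected, ...groupOverrides } })
  }
}

const sentEvents: SentEvent[] = []
const track = createTrack({ project: 'proj_selected', organization: 'org_a' }, (e) => {
  sentEvents.push(e)
})

track('help_button_clicked')
track('table_created', { method: 'sql_editor' }, { project: 'proj_target' })
// track('table_created', { method: 'cli' }) would be rejected by the type checker.
```

In this sketch a call site no longer builds `groups` by hand, and a mistyped action or property fails at compile time rather than reaching the telemetry backend.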

## Changes

- Migrated trivial call sites across `pages/project/[ref]`,
`components/interfaces/*` (Reports, Storage, Realtime/Inspector,
SQLEditor, Functions, EdgeFunctions, Integrations, ProjectAPIDocs,
Branching/BranchManagement, TableGridEditor, Connect, Docs, Auth,
Support, Home, ProjectHome, App), `components/layouts/*`, and
`components/ui/*`.
- Migrated org-only sites (`Organization/Documents/*`,
`Organization/BillingSettings/Subscription/*`,
`Organization/SecuritySettings.tsx`,
`Account/Preferences/DashboardSettingsToggles.tsx`) by dropping the
manual `groups: { organization: ... }` and letting `useTrack`
auto-inject. Verified `useSelectedProjectQuery` is disabled on org
routes (gates on URL `[ref]`).
- Migrated pre-auth sites (`SignInForm.tsx`, `sign-in-mfa.tsx`,
`profile.tsx`) where neither project nor org is resolved.
- Bespoke handling:
  - `execute-sql-mutation.ts` and `table-row-create-mutation.ts`: pass
    `{ project: projectRef }` via `groupOverrides` since the mutation can
    target a non-selected project ref.
  - `useStudioCommandMenuTelemetry.ts`: kept a direct `sendTelemetryEvent`
    call because studio groups must override pre-built event groups
    (opposite of `useTrack`'s override direction).
  - `AIAssistantOption.tsx`: passes sentinel-aware `groupOverrides` so
    `NO_PROJECT_MARKER`/`NO_ORG_MARKER` continue to suppress group emission.
  - `SidePanelEditor.utils.tsx`: utility functions `createTable` and
    `updateTable` now take a `track: Track` parameter (threaded from
    `SidePanelEditor.tsx`); dropped the `organizationSlug` arg since groups
    are no longer assembled manually.
- Branch-event attribution: preserved `parentProjectRef` overrides on
`branch_updated`, `branch_merge_completed`, `branch_merge_failed`,
`branch_merge_submitted`, `branch_delete_button_clicked`,
`branch_review_with_assistant_clicked`, and
`branch_*_merge_request_button_clicked`. Original code grouped these
under the parent (production) project, not the branch ref;
auto-injection would have shifted them onto the branch.
- Switched 4 test mocks from `@/data/telemetry/send-event-mutation` to
`@/lib/telemetry/track`. Removed obsolete tests around manual groups and
`try/catch` on telemetry rejection.
- Deleted `apps/studio/data/telemetry/send-event-mutation.ts`. The
deleted module is its own guardrail: any reintroduction of the import
fails at TypeScript module resolution before lint runs.
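
The bespoke cases above hinge on which side wins when groups are merged. The sketch below shows both directions plus sentinel suppression under stated assumptions: the function names are hypothetical, the sentinel string values are placeholders (only the constant names appear in this PR), and dropping a group whose value equals a sentinel is one plausible reading of "suppress group emission", not a confirmed implementation.

```typescript
type Groups = { project?: string; organization?: string }

// Constant names come from the PR; the string values are placeholders.
const NO_PROJECT_MARKER = '__no_project__'
const NO_ORG_MARKER = '__no_org__'

// `useTrack`-style direction: caller overrides win over auto-injected groups.
// A sentinel in the merged result removes that group entirely (assumed behavior).
function mergeForTrack(autoInjected: Groups, overrides: Groups = {}): Groups {
  const merged: Groups = { ...autoInjected, ...overrides }
  if (merged.project === NO_PROJECT_MARKER) delete merged.project
  if (merged.organization === NO_ORG_MARKER) delete merged.organization
  return merged
}

// Command-menu direction: studio's groups win over groups already on the event.
function mergeForCommandMenu(eventGroups: Groups, studioGroups: Groups): Groups {
  return { ...eventGroups, ...studioGroups }
}

// Branch events: keep attribution on the parent project, not the branch ref.
const branchEventGroups = mergeForTrack(
  { project: 'branch_ref', organization: 'org_a' },
  { project: 'parent_project_ref' }
)
```

The same override mechanism covers the mutation case (a non-selected `projectRef`) and the branch case (`parentProjectRef`); only the command menu needs the opposite precedence, which is why it stays on a direct call.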

## Testing

Tested on preview deploy:

- [x] SQL editor `CREATE TABLE` fires `table_created` with method
`sql_editor` and `groups.project` set to the mutation's `projectRef`.
- [x] Table editor creates a table from the side panel; `table_created`
fires from `SidePanelEditor.utils` via threaded `track`.
- [x] Help button (`/project/[ref]/...`) fires `help_button_clicked`
with auto-injected project + org groups.
- [x] Sign-in form fires `sign_in` with empty groups (pre-auth,
expected).
- [x] Org documents page (`/org/[slug]/documents`) fires
`document_view_button_clicked` with org group only, no stale project
ref.
- [x] Command menu (`Cmd+K`) inside a project still fires
`command_menu_opened` with studio's project/org overriding any
event-supplied groups.
- [x] Support form "Ask the Assistant" without selected org fires
`ai_assistant_in_support_form_clicked` with no project/org groups
(sentinels suppress).
- [x] On a branch, "Update branch" / "Merge branch" / "Close merge
request" events fire with `groups.project` set to the parent project
ref, not the branch ref.

Local checks:
- [x] 22/22 tests pass across the 4 updated test files
(`SidePanelEditor.utils.createTable`, `EdgeFunctionRenderer`,
`LayoutSidebar`, `PlanUpdateSidePanel`).
- [x] `rg useSendEventMutation apps/studio` returns 0 hits.

## Linear
- fixes GROWTH-860


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Standardized telemetry across the Studio to a unified tracking system;
    events now send simplified payloads with less contextual/grouping data.
  * No user-facing flows changed; UI behavior, permissions, and
    interactions remain the same.
* **Tests**
  * Updated telemetry mocks and tests to align with the new tracking
    approach.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

2026-05-27 15:19:54 +08:00

Matt Rossman

d143571586

feat(assistant): trace-level scorers + server-side tool execution with needsApproval (#45654)

## Motivation

When Assistant runs a potentially destructive tool like `execute_sql`,
it stops the LLM request and prompts for client-side approval and
execution of the tool. After approval, a second request kicks off under
a separate trace. This has made scoring and
[Topics](https://www.braintrust.dev/blog/topics) classification
challenging, as the generated `output` is split across stateless
requests. The [span-level
scoring](https://www.braintrust.dev/docs/evaluate/custom-code#score-spans)
approach we've used thus far (after the LLM call, we massage the result
into an `output` payload that's stuck onto the root span) has been
cumbersome and led to invalid scores / topics where only part of the
assistant response is considered. It's also inefficient, as we're
duplicating potentially large info (like the `search_docs` output) that
already exists within the trace.

An alternative to scoring spans is to [score
traces](https://www.braintrust.dev/docs/evaluate/custom-code#score-traces).
Braintrust [best
practices](https://www.braintrust.dev/docs/evaluate/score-online#best-practices)
advise:

> Use span scope for evaluating individual operations or outputs. Use
trace scope for evaluating multi-turn conversations, overall workflow
completion, or when your scorer needs access to the full execution
context.

We've also received [direct
guidance](https://supabase.slack.com/archives/C05QYJBLX89/p1777925770927149?thread_ts=1777905716.911979&cid=C05QYJBLX89)
from their team to use this approach.

## Changes

Migrates eval scorers from custom `AssistantEvalOutput` shape to
trace-level scoring via `trace.getThread()` / `trace.getSpans()`, with
thread parsing that scores the full latest Assistant turn and passes
prior conversation separately where relevant.
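
As a rough illustration of the thread parsing described above, the sketch below splits a thread into prior conversation and the latest Assistant turn. The `ThreadMessage` shape and `splitLatestAssistantTurn` are hypothetical; the actual structure returned by `trace.getThread()` is not shown here, and treating "everything after the last user message" as the latest turn is an assumption.

```typescript
// Hypothetical message shape; the real thread items may differ.
type ThreadMessage = { role: 'user' | 'assistant' | 'tool'; content: string }

type SplitThread = {
  priorConversation: ThreadMessage[]
  latestUserMessage: ThreadMessage | undefined
  latestTurn: ThreadMessage[]
}

// Treats everything after the last user message as the latest Assistant turn,
// so text generated before and after a tool call is scored together.
function splitLatestAssistantTurn(thread: ThreadMessage[]): SplitThread {
  let lastUserIndex = -1
  for (let i = thread.length - 1; i >= 0; i--) {
    if (thread[i].role === 'user') {
      lastUserIndex = i
      break
    }
  }
  return {
    priorConversation: thread.slice(0, Math.max(lastUserIndex, 0)),
    latestUserMessage: lastUserIndex >= 0 ? thread[lastUserIndex] : undefined,
    latestTurn: thread.slice(lastUserIndex + 1),
  }
}
```

A scorer could then judge `latestTurn` as the output and pass `priorConversation` only where earlier context matters.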

Moves `execute_sql` and `deploy_edge_function` from client-side
execution after approval to AI SDK `needsApproval` + server-side
`execute()`. SQL results returned to the model are gated by AI opt-in
level, so row data is only included with `schema_and_log_and_data`;
otherwise the tool returns the no-data-permissions sentinel.
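
The opt-in gating can be sketched as a small pure function. This models only the gating step, not the AI SDK tool definition itself. Only `schema_and_log_and_data` is named in this PR; the other level names, the `gateSqlResult` helper, and the sentinel wording are assumptions made for illustration.

```typescript
// Only `schema_and_log_and_data` appears in the PR description; the other
// level names are assumptions made for this sketch.
type AiOptInLevel = 'disabled' | 'schema' | 'schema_and_log' | 'schema_and_log_and_data'

// Placeholder text; the real sentinel's wording is not shown in the source.
const NO_DATA_PERMISSIONS = 'Query ran, but row data is hidden at the current AI opt-in level.'

type Row = Record<string, unknown>

// Decides what the model gets back after the server has executed the SQL.
function gateSqlResult(level: AiOptInLevel, rows: Row[]): Row[] | string {
  return level === 'schema_and_log_and_data' ? rows : NO_DATA_PERMISSIONS
}
```

Keeping the gate on the server means the row data never reaches the model for lower opt-in levels, regardless of what the client renders after approval.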

Adds `metadata.isFinalStep` to disambiguate multiple LLM requests within
an "assistant" turn due to tool call requests/responses. For online
evals, this means we should configure automations to only score traces
with `metadata.isFinalStep = true` to ensure we're judging the complete
generated response.
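
For clarity, the intended online-eval filter amounts to the following. The `TraceSummary` shape and `selectScorableTraces` are hypothetical; in practice this condition would be configured in the scoring automation rather than written as application code.

```typescript
// Hypothetical trace summary; only `metadata.isFinalStep` comes from this PR.
type TraceSummary = { id: string; metadata: { isFinalStep?: boolean } }

// Keeps only traces that contain the complete generated response.
function selectScorableTraces(traces: TraceSummary[]): TraceSummary[] {
  return traces.filter((trace) => trace.metadata.isFinalStep === true)
}
```

Traces that predate the flag (where `isFinalStep` is missing) are excluded by this strict comparison, which is a choice of this sketch.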

Other minor kaizen changes:
- Renamed `promptProviderOptions` to `systemProviderOptions` to clarify
that this is associated with the "system" message and disambiguate from
the root `providerOptions`
- Adds `evals/trace-utils.ts` to handle Zod validation of the `unknown`
span shapes from Braintrust, to more easily access typed inputs/output
on tool spans.
- Bumps AI SDK floor version `^6.0.116` → `^6.0.174`
- Tweaked the "Conciseness" scorer to not unfairly dock points for the
new `[called tool_name]` labels in the serialized assistant response.
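
The idea behind validating `unknown` span shapes can be shown without the library. The sketch below uses a hand-written type guard as a stand-in for the Zod schemas in `evals/trace-utils.ts`; the `ToolSpan` fields and `parseToolSpan` are hypothetical and do not reflect the actual Braintrust span layout.

```typescript
// Hypothetical typed view of a tool span; the real span fields may differ.
type ToolSpan = { name: string; input: Record<string, unknown>; output: unknown }

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Narrows an `unknown` span to `ToolSpan`, or returns null when it does not fit.
// A plain type guard stands in here for the Zod validation used in the PR.
function parseToolSpan(span: unknown): ToolSpan | null {
  if (!isRecord(span)) return null
  const { name, input, output } = span
  if (typeof name !== 'string' || !isRecord(input)) return null
  return { name, input, output }
}
```

Scorers can then read `input` and `output` from a parsed span with types, and skip spans that fail validation instead of casting.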

## Verification

In the studio staging build, I asked Assistant to create a todos table
with 3 sample todos. I manually approved the `execute_sql` call and saw
Assistant generate text before & after the call.

In Braintrust I verified two traces were produced (see [filtered
logs](https://www.braintrust.dev/app/supabase.io/p/Assistant/logs?v=Staging&tvt=trace&search={%22filter%22:[{%22text%22:%22metadata.environment%2520%253D%2520%27staging%27%22,%22label%22:%22metadata.environment%2520%253D%2520%27staging%27%22,%22originType%22:%22btql%22},{%22text%22:%22%2560Chat%2520ID%2560%2520%253D%2520%25221cb2ac45-e5e7-458c-9da4-3bf6863b8842%2522%22,%22label%22:%22Chat%2520ID%2520equals%25201cb2ac45-e5e7-458c-9da4-3bf6863b8842%22,%22originType%22:%22form%22}]})),
the first with `metadata.isFinalStep = false` and the second with
`metadata.isFinalStep = true`.

In the Braintrust staging scorers, I ran the preview Completeness scorer
on the second trace and verified it sees the complete Assistant response
including markers for tool calls ([link to
trace](https://www.braintrust.dev/app/supabase.io/p/Assistant%20(Staging%20Scorers)/trace?object_type=project_logs&object_id=b5214b62-ad1e-4929-9d5b-40b1daebe948&r=0ed0a4f8-8aff-4a34-bb1d-1df1d88a5070&s=ff9015f8-6bf7-4ab3-83a9-ca4e69e27e82))

<img width="1193" height="960" alt="CleanShot 2026-05-07 at 11 27 10@2x"
src="https://github.com/user-attachments/assets/509d4858-c3a1-4068-986d-3aa4d5617d1a"
/>

I also tested the `deploy_edge_function` workflow and verified it still
prompts for permission and warns on deployment of existing functions.

**References**
- https://www.braintrust.dev/docs/evaluate/custom-code#score-traces
- https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling#tool-execution-approval

Supersedes https://github.com/supabase/supabase/pull/45556 and
https://github.com/supabase/supabase/pull/45339

Closes AI-473

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Tool actions (SQL execution, edge-function deploy) now require
    explicit user Approve/Deny before proceeding.

* **Improvements**
  * Assistant pauses for approval responses before sending follow-ups,
    giving clearer control over risky actions.
  * Deploy/replace flows show confirmation and clearer replace warnings.
  * Evaluation/scoring updated to use richer trace data for more accurate
    assistant performance signals.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

2026-05-12 15:24:21 -04:00

2 Commits