mirror of
https://github.com/supabase/supabase.git
synced 2026-05-13 05:38:21 +08:00
## Motivation

When Assistant runs a potentially destructive tool like `execute_sql`, it stops the LLM request and prompts for client-side approval and execution of the tool. After approval, a second request kicks off under a separate trace. This has made scoring and [Topics](https://www.braintrust.dev/blog/topics) classification challenging, as the generated `output` is split across stateless requests.

The [span-level scoring](https://www.braintrust.dev/docs/evaluate/custom-code#score-spans) approach we've used thus far (after the LLM call, we massage the result into an `output` payload that's stuck onto the root span) has been cumbersome and has led to invalid scores / topics where only part of the assistant response is considered. It's also inefficient, as we're duplicating potentially large info (like the `search_docs` output) that already exists within the trace.

An alternative to scoring spans is to [score traces](https://www.braintrust.dev/docs/evaluate/custom-code#score-traces). Braintrust [best practices](https://www.braintrust.dev/docs/evaluate/score-online#best-practices) advise:

> Use span scope for evaluating individual operations or outputs. Use trace scope for evaluating multi-turn conversations, overall workflow completion, or when your scorer needs access to the full execution context.

We've also received [direct guidance](https://supabase.slack.com/archives/C05QYJBLX89/p1777925770927149?thread_ts=1777905716.911979&cid=C05QYJBLX89) from their team to use this approach.

## Changes

Migrates eval scorers from the custom `AssistantEvalOutput` shape to trace-level scoring via `trace.getThread()` / `trace.getSpans()`, with thread parsing that scores the full latest Assistant turn and passes prior conversation separately where relevant.

Moves `execute_sql` and `deploy_edge_function` from client-side execution after approval to AI SDK `needsApproval` + server-side `execute()`.

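A minimal sketch of the approve-then-execute flow described above, with the opt-in gating of SQL results folded in. Everything here is a simplified, hypothetical model: the `Approval`, `ToolResult`, `gateRows`, and `handleExecuteSql` names, the sentinel text, and every opt-in level other than `schema_and_log_and_data` are assumptions for illustration, and the code is synchronous and does not use the AI SDK's actual types.

```typescript
// Simplified, hypothetical model of an approval-gated tool that runs server-side.
// This is NOT the AI SDK's API; it only illustrates the control flow.

// Only 'schema_and_log_and_data' is named in the description; the rest are assumed.
type OptInLevel = 'disabled' | 'schema' | 'schema_and_log' | 'schema_and_log_and_data'

type Approval = 'pending' | 'approved' | 'denied'

type ToolResult =
  | { status: 'awaiting_approval' }
  | { status: 'denied' }
  | { status: 'executed'; output: unknown }

// Placeholder text standing in for the real no-data-permissions sentinel.
const NO_DATA_PERMISSIONS = 'Query ran, but row data is not shared at this opt-in level.'

// Row data reaches the model only at the highest opt-in level.
function gateRows(rows: unknown[], optIn: OptInLevel): unknown {
  return optIn === 'schema_and_log_and_data' ? rows : NO_DATA_PERMISSIONS
}

// The query runs only after an explicit approval; pending and denied never execute.
function handleExecuteSql(
  approval: Approval,
  optIn: OptInLevel,
  runQuery: () => unknown[]
): ToolResult {
  if (approval === 'pending') return { status: 'awaiting_approval' }
  if (approval === 'denied') return { status: 'denied' }
  return { status: 'executed', output: gateRows(runQuery(), optIn) }
}
```

In this model the approval decision and the data gate are independent checks, so an approved call at a lower opt-in level still executes but returns the sentinel instead of rows.
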
SQL results returned to the model are gated by AI opt-in level, so row data is only included with `schema_and_log_and_data`; otherwise the tool returns the no-data-permissions sentinel.

Adds `metadata.isFinalStep` to disambiguate multiple LLM requests within an "assistant" turn due to tool call requests/responses. For online evals, this means we should configure automations to only score traces with `metadata.isFinalStep = true` to ensure we're judging the complete generated response.

Other minor kaizen changes:

- Renames `promptProviderOptions` to `systemProviderOptions` to clarify that this is associated with the "system" message and to disambiguate it from the root `providerOptions`
- Adds `evals/trace-utils.ts` to handle Zod validation of the `unknown` span shapes from Braintrust, to more easily access typed inputs/output on tool spans
- Bumps AI SDK floor version `^6.0.116` → `^6.0.174`
- Tweaks the "Conciseness" scorer to not unfairly dock points for the new `[called tool_name]` labels in the serialized assistant response

## Verification

In the studio staging build, I asked Assistant to create a todos table with 3 sample todos. I manually approved the `execute_sql` call and saw Assistant generate text before & after the call. In Braintrust I verified two traces were produced (see [filtered logs](https://www.braintrust.dev/app/supabase.io/p/Assistant/logs?v=Staging&tvt=trace&search={%22filter%22:[{%22text%22:%22metadata.environment%2520%253D%2520%27staging%27%22,%22label%22:%22metadata.environment%2520%253D%2520%27staging%27%22,%22originType%22:%22btql%22},{%22text%22:%22%2560Chat%2520ID%2560%2520%253D%2520%25221cb2ac45-e5e7-458c-9da4-3bf6863b8842%2522%22,%22label%22:%22Chat%2520ID%2520equals%25201cb2ac45-e5e7-458c-9da4-3bf6863b8842%22,%22originType%22:%22form%22}]})), the first with `metadata.isFinalStep = false` and the second with `metadata.isFinalStep = true`.

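The two-trace outcome above is what an online-eval filter would key on. As a rough sketch, assuming a hypothetical `TraceSummary` shape (only `metadata.isFinalStep` comes from this change; the `id` field and the `selectScorableTraces` helper are illustrative and not part of Braintrust's API), selecting the traces to score could look like:

```typescript
// Hypothetical trace shape for illustration; only metadata.isFinalStep is from this change.
interface TraceSummary {
  id: string
  metadata: { isFinalStep?: boolean }
}

// Keep only traces that hold the complete assistant response for a turn.
// Traces without the flag are skipped rather than guessed at.
function selectScorableTraces(traces: TraceSummary[]): TraceSummary[] {
  return traces.filter((trace) => trace.metadata.isFinalStep === true)
}

// One assistant turn split across an approval boundary produces two traces.
const turn: TraceSummary[] = [
  { id: 'before-approval', metadata: { isFinalStep: false } },
  { id: 'after-approval', metadata: { isFinalStep: true } },
]

console.log(selectScorableTraces(turn).map((trace) => trace.id)) // [ 'after-approval' ]
```
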
In the Braintrust staging scorers, I ran the preview Completeness scorer on the second trace and verified it sees the complete Assistant response including markers for tool calls ([link to trace](https://www.braintrust.dev/app/supabase.io/p/Assistant%20(Staging%20Scorers)/trace?object_type=project_logs&object_id=b5214b62-ad1e-4929-9d5b-40b1daebe948&r=0ed0a4f8-8aff-4a34-bb1d-1df1d88a5070&s=ff9015f8-6bf7-4ab3-83a9-ca4e69e27e82))

<img width="1193" height="960" alt="CleanShot 2026-05-07 at 11 27 10@2x" src="https://github.com/user-attachments/assets/509d4858-c3a1-4068-986d-3aa4d5617d1a" />

I also tested the `deploy_edge_function` workflow and verified it still prompts for permission and warns on deployment of existing functions.

**References**

- https://www.braintrust.dev/docs/evaluate/custom-code#score-traces
- https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling#tool-execution-approval

Supersedes https://github.com/supabase/supabase/pull/45556 and https://github.com/supabase/supabase/pull/45339

Closes AI-473

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Tool actions (SQL execution, edge-function deploy) now require explicit user Approve/Deny before proceeding.
* **Improvements**
  * Assistant pauses for approval responses before sending follow-ups, giving clearer control over risky actions.
  * Deploy/replace flows show confirmation and clearer replace warnings.
  * Evaluation/scoring updated to use richer trace data for more accurate assistant performance signals.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

209 lines
7.7 KiB
JSON
{
  "name": "studio",
  "version": "0.0.9",
  "private": true,
  "type": "module",
  "scripts": {
    "preinstall": "npx only-allow pnpm",
    "predev": "node scripts/clean-turbopack-cache.mjs",
    "dev": "next dev -p ${STUDIO_PORT:-8082}",
    "build": "next build && if [ \"$SKIP_ASSET_UPLOAD\" != \"1\" ]; then ./../../scripts/upload-static-assets.sh; fi",
    "start": "next start -p 8082",
    "lint": "eslint .",
    "lint:ratchet": "tsx scripts/ratchet-eslint-rules.ts --rule react-hooks/exhaustive-deps --rule import/no-anonymous-default-export --rule @tanstack/query/exhaustive-deps --rule @typescript-eslint/no-explicit-any --rule no-restricted-imports --rule no-restricted-exports --rule react/no-unstable-nested-components",
    "lint:ratchet:type-checks": "tsx scripts/ratchet-eslint-rules.ts --rule studio/require-safe-sql-fragment --eslint-args \"--config eslint.type-checks.config.cjs .\"",
    "clean": "rimraf node_modules tsconfig.tsbuildinfo .next .turbo",
    "test": "vitest --run --coverage",
    "test:watch": "vitest watch",
    "test:ui": "vitest --ui",
    "test:update": "vitest --run --update",
    "test:ci": "vitest --run --coverage",
    "test:report": "open coverage/lcov-report/index.html",
    "deploy:staging": "VERCEL_ORG_ID=team_E6KJ1W561hMTjon1QSwOh0WO VERCEL_PROJECT_ID=QmcmhbiAtCMFTAHCuGgQscNbke4TzgWULECctNcKmxWCoT vercel --prod -A .vercel/staging.json",
    "pretypecheck": "next typegen",
    "typecheck": "tsc --noEmit",
    "build:deno-types": "tsx scripts/deno-types.ts",
    "build:graphql-types": "tsx scripts/download-graphql-schema.mts && pnpm graphql-codegen --config scripts/codegen.ts",
    "build:graphql-types:watch": "pnpm graphql-codegen --config scripts/codegen.ts --watch",
    "evals:setup": "cp node_modules/libpg-query/wasm/libpg-query.wasm evals/libpg-query.wasm",
    "evals:run": "braintrust eval --no-send-logs evals/assistant.eval.ts",
    "evals:upload": "braintrust eval evals/assistant.eval.ts",
    "scorers:deploy": "IS_BRAINTRUST_PUSH=true braintrust push evals/scorer-online.ts"
  },
  "dependencies": {
    "@ai-sdk/amazon-bedrock": "^4.0.81",
    "@ai-sdk/mcp": "^1.0.25",
    "@ai-sdk/openai": "^3.0.41",
    "@ai-sdk/provider": "^3.0.8",
    "@ai-sdk/provider-utils": "^4.0.19",
    "@ai-sdk/react": "^3.0.118",
    "@aws-sdk/credential-providers": "^3.1041.0",
    "@dagrejs/dagre": "^1.0.4",
    "@dnd-kit/core": "^6.1.0",
    "@dnd-kit/modifiers": "^9.0.0",
    "@dnd-kit/sortable": "^8.0.0",
    "@dnd-kit/utilities": "^3.2.2",
    "@graphiql/react": "^0.37.3",
    "@graphiql/toolkit": "^0.11.3",
    "@hcaptcha/react-hcaptcha": "^1.12.0",
    "@heroicons/react": "^2.1.3",
    "@hookform/resolvers": "^3.1.1",
    "@mjackson/multipart-parser": "^0.10.1",
    "@modelcontextprotocol/sdk": "^1.29.0",
    "@monaco-editor/react": "^4.6.0",
    "@next/bundle-analyzer": "16.2.3",
    "@number-flow/react": "^0.3.2",
    "@sentry/nextjs": "catalog:",
    "@std/path": "npm:@jsr/std__path@^1.0.8",
    "@stripe/react-stripe-js": "6.1.0",
    "@stripe/stripe-js": "9.1.0",
    "@supabase/auth-js": "catalog:",
    "@supabase/mcp-server-supabase": "^0.7.0",
    "@supabase/pg-meta": "workspace:*",
    "@supabase/realtime-js": "catalog:",
    "@supabase/shared-types": "0.1.88",
    "@supabase/sql-to-rest": "^0.1.6",
    "@supabase/supabase-js": "catalog:",
    "@tanstack/react-hotkeys": "^0.9.1",
    "@tanstack/react-query": "^5.0.0",
    "@tanstack/react-query-devtools": "^5.0.0",
    "@tanstack/react-table": "^8.21.3",
    "@tanstack/react-virtual": "^3.13.12",
    "@types/d3-geo": "^3.1.0",
    "@uidotdev/usehooks": "^2.4.1",
    "@vercel/functions": "^2.1.0",
    "@xyflow/react": "^12.10.1",
    "@zip.js/zip.js": "^2.7.29",
    "ai": "^6.0.174",
    "ai-commands": "workspace:*",
    "awesome-debounce-promise": "^2.1.0",
    "common": "workspace:*",
    "common-tags": "^1.8.2",
    "config": "workspace:*",
    "cron-parser": "^4.9.0",
    "cronstrue": "^2.50.0",
    "crypto-js": "^4.2.0",
    "d3-geo": "^3.1.1",
    "dayjs": "^1.11.10",
    "dev-tools": "workspace:*",
    "file-saver": "^2.0.5",
    "framer-motion": "^11.18.2",
    "generate-password-browser": "^1.1.0",
    "graphiql": "^5.2.2",
    "html-to-image": "^1.11.13",
    "http-status": "^2.1.0",
    "icons": "workspace:*",
    "idb": "^8.0.2",
    "immutability-helper": "^3.1.1",
    "ip-num": "^1.5.1",
    "json-logic-js": "^2.0.2",
    "lodash": "catalog:",
    "lucide-react": "^0.436.0",
    "markdown-table": "^3.0.3",
    "memoize-one": "^5.0.1",
    "mime-db": "^1.53.0",
    "monaco-editor": "0.52.2",
    "next": "catalog:",
    "next-themes": "catalog:",
    "nuqs": "2.7.1",
    "openai": "^4.104.0",
    "openapi-fetch": "0.12.4",
    "papaparse": "^5.3.1",
    "path-to-regexp": "^8.0.0",
    "pg-minify": "^1.6.3",
    "radix-ui": "catalog:",
    "randombytes": "^2.1.0",
    "react": "catalog:",
    "react-data-grid": "7.0.0-beta.47",
    "react-day-picker": "^9.11.1",
    "react-dom": "catalog:",
    "react-error-boundary": "^4.0.13",
    "react-grid-layout": "^1.4.2",
    "react-hook-form": "^7.71.2",
    "react-inlinesvg": "^4.0.4",
    "react-intersection-observer": "^9.5.3",
    "react-markdown": "^10.1.0",
    "react-resizable": "3.0.5",
    "react-simple-maps": "4.0.0-beta.6",
    "react-use": "^17.5.0",
    "recharts": "catalog:",
    "remark-gfm": "^4.0.0",
    "shared-data": "workspace:*",
    "sonner": "^1.5.0",
    "sql-formatter": "^15.0.0",
    "sqlstring": "^2.3.2",
    "streamdown": "^1.3.0",
    "stripe-experiment-sync": "1.0.31",
    "tus-js-client": "^4.1.0",
    "ui": "workspace:*",
    "ui-patterns": "workspace:*",
    "use-debounce": "^7.0.1",
    "use-stick-to-bottom": "^1.1.1",
    "uuid": "^14.0.0",
    "valtio": "catalog:",
    "zod": "catalog:",
    "zxcvbn": "^4.4.2"
  },
  "devDependencies": {
    "@faker-js/faker": "^9.9.0",
    "@graphql-codegen/cli": "5.0.5",
    "@graphql-typed-document-node/core": "^3.2.0",
    "@radix-ui/react-use-escape-keydown": "^1.1.1",
    "@smithy/property-provider": "^4.0.4",
    "@supabase/postgres-meta": "^0.64.4",
    "@testing-library/dom": "^10.0.0",
    "@testing-library/jest-dom": "^6.6.0",
    "@testing-library/react": "^16.0.0",
    "@testing-library/user-event": "^14.0.0",
    "@types/common-tags": "^1.8.1",
    "@types/crypto-js": "^4.2.2",
    "@types/file-saver": "^2.0.2",
    "@types/json-logic-js": "^1.2.1",
    "@types/lodash": "^4.14.172",
    "@types/markdown-table": "^3.0.0",
    "@types/mime-db": "^1.43.5",
    "@types/node": "catalog:",
    "@types/papaparse": "^5.3.1",
    "@types/randombytes": "^2.0.3",
    "@types/react": "catalog:",
    "@types/react-beautiful-dnd": "^13.1.2",
    "@types/react-datepicker": "^4.3.4",
    "@types/react-dom": "catalog:",
    "@types/react-grid-layout": "^1.3.0",
    "@types/react-simple-maps": "^3.0.1",
    "@types/recharts": "^1.8.23",
    "@types/sqlstring": "^2.3.0",
    "@types/zxcvbn": "^4.4.1",
    "@typescript-eslint/utils": "8.48.0",
    "@vitejs/plugin-react": "catalog:",
    "@vitest/coverage-v8": "catalog:",
    "@vitest/ui": "catalog:",
    "api-types": "workspace:*",
    "autoevals": "^0.0.132",
    "braintrust": "^3.9.0",
    "common": "workspace:*",
    "config": "workspace:*",
    "date-fns": "^2.30.0",
    "eslint-config-supabase": "workspace:*",
    "eslint-plugin-barrel-files": "^2.0.7",
    "eslint-plugin-jsx-a11y": "^6.10.2",
    "graphql-ws": "5.14.1",
    "import-in-the-middle": "^2.0.0",
    "jsdom-testing-mocks": "^1.13.1",
    "libpg-query": "17.6.0",
    "msw": "^2.3.0",
    "next-router-mock": "^0.9.13",
    "node-mocks-http": "^1.17.2",
    "postcss": "catalog:",
    "raw-loader": "^4.0.2",
    "require-in-the-middle": "^8.0.0",
    "tailwindcss": "catalog:",
    "tsconfig": "workspace:*",
    "tsx": "catalog:",
    "typescript": "catalog:",
    "vite": "catalog:",
    "vite-tsconfig-paths": "catalog:",
    "vitest": "catalog:"
  }
}