mirror of
https://github.com/supabase/supabase.git
synced 2026-06-22 03:02:48 +08:00
* chore: bump `supabase` CLI
* chore: stricter message types in `generate-v4.ts`
* feat: tutorial eval
  https://www.braintrust.dev/docs/evaluation
* feat: project ID for eval
* refactor: `generateAssistantResponse` out of `handlePost`
* refactor: generateAssistantResponse to lib/ai
* feat: factuality eval with assistant response
* chore: upgrade braintrust to v1.0.1
* chore: silence tsconfig warning
* feat: assertion scorer
* fix: aggregate tools across all steps
* refactor: strict tool names, remove need for `as const`
* refactor: generic tool name type in assertions
* feat: transfer mocks from `feature/braintrust`
* feat: LLM criteria assertion
* feat: braintrust evals workflow
* fix: BRAINTRUST_PROJECT_ID
* feat: `sql_similar` assertion
* fix: `OPENAI_API_KEY` in workflow env
* feat: split AssertionScorer into separate scorers
* feat: remove tutorial eval
* feat: 20 minute CI timeout
* feat: category in test case metadata
* feat: score with gpt-5
* refactor: dataset to own file, colocate scorers
* feat: "gpt-5.2-2025-12-11" for llm as a judge
* feat: SQL syntax scorer with `libpg-query`
* feat: `evals:setup` and `evals:run` scripts
* feat: `evals:setup` in CI
* feat: human readable scorer names
* chore: rename to "SQL Validity"
* feat: add 2 "sql_generation" test cases
* feat: update requiredTools in test cases
* chore: ignore Cursor MCP config
* feat: "Conciseness" score
* feat: "Completeness" scorer
* fix: generate-v4 test mocks
* feat: serialize "steps" for scorer inputs
* updated node mem options for typecheck
* updated runner
* remove ram update as actions handle this
* feat: read `BRAINTRUST_PROJECT_ID` from secrets
* feat: score helpfulness, remove old scorers
* feat: separate `evals:run` and `evals:upload` scripts
* feat: passthrough entire classifier result
* feat: use live `search_docs` impl, store docs result in metadata
* feat: reduce classifier options
* feat: filter workflow by `run-evals` PR label or `master` branch
* chore: cleanup stubbed mock tools
* fix: checkout actual branch with `ref:`
* fix: capture search_docs results from all content parts
* feat: simplify sql syntax score calculation
* feat: use AI SDK's UI message validator
* docs: justification for relative `extends`
* fix: cleanup leftover validatedMessages
* doc: note mock token isn't secret for snyk
* fix: mock ui message to pass validation
* feat: revert ignoring Cursor MCP config
  Using `.git/info/exclude` instead until we have an opinion on this
* feat: add "tsconfig" as shared-data devDependency, revert relative path in tsconfig
* refactor: tool call parsing into function
* Update apps/studio/evals/assistant.eval.ts
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* refactor: organize mock schemas and tool factories

---------

Co-authored-by: Ali Waseem <waseema393@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>