chore: expand eval coverage for backend SDKs and SvelteKit #78
Merged
All alerts resolved (Socket for GitHub). This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored.
Switch all SKILL.md WebFetch URLs from github.com/blob/ (HTML) to raw.githubusercontent.com (plain text) for cleaner parsing. Add tests/fixtures/README.md documenting fixture state conventions and per-language guidance for upcoming backend SDK eval expansion.
Add partial-install and conflicting-auth fixtures for all 8 backend SDKs (Node, Python, Ruby, Go, PHP, PHP-Laravel, Kotlin, Elixir) and expand SvelteKit from 1 to 5 test states. Each backend SDK now has 4 eval states (up from 2), matching frontend skill coverage. Includes:
- 20 new fixture directories with validated, buildable projects
- SKILL.md updates with partial install recovery and conflicting auth detection sections for 9 skills
- Grader bonus checks for preserved routes and conflicting auth
- 24 new eval scenarios registered in runner.ts
Align React, React Router, TanStack Start, and Vanilla JS skills to match the Next.js verification checklist pattern: numbered bash commands with comments, "(ALL MUST PASS)" header, and recovery guidance for critical checks.
v1.x-v2.x require illuminate/support ^5-9 via workos-php, which conflicts with Laravel 11. v5.x requires workos-php ^4.29 with no illuminate/support constraint, resolving the conflict.
Steps 3, 4, and error recovery still referenced the old @workos-inc namespace. The npm package is @workos/authkit-sveltekit.
gin v1.9.1 pulled in golang.org/x/net v0.10.0, which has a high-severity advisory (GHSA-4374-p667-p6c8, HTTP/2 rapid reset). gin v1.10.0 brings in golang.org/x/net v0.25.0, which is patched.
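The WebFetch URL switch in the first commit is a mechanical rewrite: a github.com/<owner>/<repo>/blob/<ref>/<path> page maps to raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>. The PR does not say whether the URLs were edited by hand or by script, so this is only a minimal sketch; toRawGitHubUrl is a hypothetical helper, and it assumes any query string or fragment (such as ?plain=1) should be dropped.

```typescript
// Hypothetical helper: rewrite a github.com "blob" URL (rendered HTML page)
// into the equivalent raw.githubusercontent.com URL (plain text).
//   https://github.com/<owner>/<repo>/blob/<ref>/<path>
//   -> https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>
// Assumption: a trailing query string or fragment is discarded.
const BLOB_URL = /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/blob\/([^?#]+)(?:[?#].*)?$/;

function toRawGitHubUrl(url: string): string {
  const match = BLOB_URL.exec(url);
  if (match === null) {
    // Not a github.com blob URL (e.g. already raw, or another host): leave as-is.
    return url;
  }
  const [, owner, repo, refAndPath] = match;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${refAndPath}`;
}

console.log(toRawGitHubUrl("https://github.com/example-org/example-repo/blob/main/README.md"));
// prints https://raw.githubusercontent.com/example-org/example-repo/main/README.md
```

Refs that contain slashes (e.g. feature/x) pass through unchanged, because the ref and path are copied as one segment.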
Summary
Problem
Backend SDK skills (Node, Python, Ruby, Go, PHP, PHP-Laravel, Kotlin, Elixir) had only 2 eval test states each (example and example-auth0), while frontend skills had 4-6 states covering partial installs, conflicting middleware, auth migrations, and strict TypeScript. This left backend integration edge cases untested: partial installs, conflicting auth systems, and framework-specific gotchas.

SvelteKit had just 1 test state (example), far behind all other frontend frameworks.

Additionally, skill content quality was inconsistent: README fetch URLs mixed HTML and plaintext formats, verification checklist structures varied across skills, and backend skills lacked the edge case coverage that makes frontend skills robust.
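The coverage numbers can be cross-checked by modeling the eval matrix as (skill, state) pairs. This is a sketch only: the lowercase skill IDs and the "<skill>/<state>" naming are assumptions for illustration, and the real layout under tests/fixtures may differ.

```typescript
// Hypothetical model of the eval state matrix described in this PR.
// Skill and state names come from the PR text; the IDs are illustrative.
const BACKEND_SDKS: readonly string[] = [
  "node", "python", "ruby", "go", "php", "php-laravel", "kotlin", "elixir",
];

const BACKEND_STATES_BEFORE: readonly string[] = ["example", "example-auth0"];
const BACKEND_STATES_AFTER: readonly string[] = [
  ...BACKEND_STATES_BEFORE, "partial-install", "conflicting-auth",
];

const SVELTEKIT_STATES_BEFORE: readonly string[] = ["example"];
const SVELTEKIT_STATES_AFTER: readonly string[] = [
  ...SVELTEKIT_STATES_BEFORE, "example-auth0", "partial-install", "conflicting-auth", "typescript-strict",
];

// One fixture per (skill, state) pair.
function fixtureIds(skill: string, states: readonly string[]): string[] {
  return states.map((state) => `${skill}/${state}`);
}

function allFixtures(backendStates: readonly string[], svelteStates: readonly string[]): string[] {
  const ids: string[] = [];
  for (const sdk of BACKEND_SDKS) {
    ids.push(...fixtureIds(sdk, backendStates));
  }
  ids.push(...fixtureIds("sveltekit", svelteStates));
  return ids;
}

const fixturesBefore = allFixtures(BACKEND_STATES_BEFORE, SVELTEKIT_STATES_BEFORE); // 8 * 2 + 1 = 17
const fixturesAfter = allFixtures(BACKEND_STATES_AFTER, SVELTEKIT_STATES_AFTER); // 8 * 4 + 5 = 37
const addedFixtures = fixturesAfter.filter((id) => fixturesBefore.indexOf(id) === -1);

console.log(addedFixtures.length); // prints 20 (16 backend + 4 SvelteKit)
```

The 20 added pairs match the "20 new fixture directories" in the commit message above.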
Changes
Eval Coverage Expansion (24 new scenarios)
Backend SDKs (16 new fixtures, 2 per SDK):
- partial-install
- conflicting-auth

SvelteKit (4 new fixtures):
- example-auth0: Auth0 SPA auth with hooks and a callback route
- partial-install: SDK installed, but hooks.server.ts exports a passthrough and there is no callback route
- conflicting-auth: full Lucia v3 auth with login, logout, dashboard, and session cookies
- typescript-strict: strict TS with noImplicitAny, strictNullChecks, and exactOptionalPropertyTypes

Skill Content Improvements (14 skills updated)
- 9 skills (8 backend SDKs + SvelteKit): added two new sections to each:
  - Partial install recovery
  - Conflicting auth detection
- 5 frontend skills: standardized the verification checklist format to match the Next.js pattern (numbered bash commands, "(ALL MUST PASS)" header, recovery guidance)
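The SKILL.md text for conflicting auth detection is not reproduced in this PR description. As a rough illustration of the idea, here is a sketch that scans package.json for existing auth libraries before AuthKit is added. detectConflictingAuth and the package list are hypothetical; Lucia comes from the SvelteKit conflicting-auth fixture, and the other entries are assumed examples, not the list the skills actually check.

```typescript
// Hypothetical heuristic for conflicting-auth detection: look for existing
// auth libraries in package.json before adding AuthKit. The package list is
// an illustrative assumption, not the list the skills actually check.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// package name -> human-readable label
const KNOWN_AUTH_PACKAGES: Record<string, string> = {
  lucia: "Lucia",
  "@auth0/auth0-spa-js": "Auth0 SPA SDK",
  "@auth/sveltekit": "Auth.js for SvelteKit",
};

function detectConflictingAuth(pkg: PackageJson): string[] {
  // Merge runtime and dev dependencies; a match in either counts as installed.
  const installed: Record<string, string> = { ...pkg.dependencies, ...pkg.devDependencies };
  const found: string[] = [];
  for (const name of Object.keys(KNOWN_AUTH_PACKAGES)) {
    const label = KNOWN_AUTH_PACKAGES[name];
    if (label !== undefined && Object.prototype.hasOwnProperty.call(installed, name)) {
      found.push(label);
    }
  }
  return found;
}

// Example: a project shaped like the conflicting-auth fixture (Lucia v3).
const conflicts = detectConflictingAuth({
  dependencies: { "@sveltejs/kit": "^2.0.0", lucia: "^3.0.0" },
});
console.log(conflicts.length); // prints 1 (conflicts is ["Lucia"])
```

A non-empty result would be the cue to follow the skill's conflicting-auth guidance rather than layering AuthKit on top of the existing session handling.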
Grader Updates (9 graders)
Added bonus checks (non-blocking) for:
- Preserved routes
- Conflicting auth
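The grader internals are not part of this description. Below is a sketch of what a non-blocking bonus check for preserved routes could look like, assuming graders return a list of check results with a blocking flag. CheckResult, preservedRoutesCheck, evalPasses, and the "build" check are hypothetical names.

```typescript
// Hypothetical result shape for a grader check. Bonus checks are reported
// but never fail the eval (blocking: false).
interface CheckResult {
  name: string;
  passed: boolean;
  blocking: boolean;
  detail: string;
}

// Bonus check: every route that existed in the fixture before the agent ran
// should still exist afterwards (AuthKit added without deleting pages).
function preservedRoutesCheck(routesBefore: readonly string[], routesAfter: readonly string[]): CheckResult {
  const missing = routesBefore.filter((route) => routesAfter.indexOf(route) === -1);
  return {
    name: "preserved-routes",
    passed: missing.length === 0,
    blocking: false,
    detail: missing.length === 0 ? "all original routes preserved" : `missing routes: ${missing.join(", ")}`,
  };
}

// Only blocking checks decide pass/fail; bonus results are informational.
function evalPasses(results: readonly CheckResult[]): boolean {
  return results.filter((r) => r.blocking).every((r) => r.passed);
}

const bonus = preservedRoutesCheck(["/", "/login", "/dashboard"], ["/", "/dashboard", "/callback"]);
const required: CheckResult = { name: "build", passed: true, blocking: true, detail: "build succeeded" };

console.log(bonus.detail); // prints "missing routes: /login"
console.log(evalPasses([required, bonus])); // prints true: the failed bonus check does not block
```

Keeping bonus checks out of the pass/fail decision lets them surface quality signals without making the suite flaky.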
Infrastructure
- Switched all SKILL.md WebFetch URLs from github.com/blob/ (HTML) to raw.githubusercontent.com (plaintext)
- Added tests/fixtures/README.md documenting fixture state conventions and per-language guidance
- Registered 24 new eval scenarios in runner.ts

Results
Full eval suite: 62/62 passing (98.4% on the first attempt, 100% with correction)