chore: expand eval coverage for backend SDKs and SvelteKit #78

Merged: nicknisi merged 8 commits into main from nicknisi/evals on Mar 5, 2026

@nicknisi nicknisi commented Mar 4, 2026

Summary

  • Expand backend SDK eval coverage from 2 to 4 states each (8 SDKs)
  • Expand SvelteKit eval coverage from 1 to 5 states
  • Standardize skill content and patterns across all 17 skills

Problem

Backend SDK skills (Node, Python, Ruby, Go, PHP, PHP-Laravel, Kotlin, Elixir) had only 2 eval test states each (example and example-auth0), while frontend skills had 4-6 states covering partial installs, conflicting middleware, auth migrations, and strict TypeScript. As a result, backend edge cases such as partial installs, conflicting auth systems, and framework-specific gotchas went untested.

SvelteKit had just 1 test state (example), far behind all other frontend frameworks.

Additionally, skill content quality was inconsistent: README fetch URLs mixed HTML and plaintext formats, verification checklist structures varied across skills, and backend skills lacked the edge case coverage that makes frontend skills robust.

Changes

Eval Coverage Expansion (24 new scenarios)

Backend SDKs — 16 new fixtures (2 per SDK):

| SDK | partial-install | conflicting-auth |
| --- | --- | --- |
| Node | Express + WorkOS SDK, incomplete login route | Passport.js local auth |
| Python | Flask + WorkOS SDK, commented-out login | Flask-Login auth |
| Ruby | Sinatra + WorkOS gem, TODO login route | Warden auth |
| Go | Gin + WorkOS SDK, 501 login handler | Custom JWT middleware (stdlib) |
| PHP | WorkOS SDK in composer, empty login.php | Native PHP session auth |
| PHP-Laravel | SDK + published config, no middleware | Laravel Breeze scaffolding |
| Kotlin | SDK in Gradle, imported but no controller | Spring Security + SecurityFilterChain |
| Elixir | WorkOS in mix.exs, stub AuthController | Ueberauth + ueberauth_identity |

SvelteKit — 4 new fixtures:

  • example-auth0 — Auth0 SPA auth with hooks and callback route
  • partial-install — SDK installed but hooks.server.ts exports passthrough, no callback
  • conflicting-auth — Full Lucia v3 auth with login, logout, dashboard, session cookies
  • typescript-strict — Strict TS with noImplicitAny, strictNullChecks, exactOptionalPropertyTypes

Skill Content Improvements (14 skills updated)

9 backend + SvelteKit skills — added two new sections to each:

  • Partial Install Recovery — detect half-completed AuthKit, complete gaps instead of starting over
  • Existing Auth System Detection — SDK-specific patterns (Passport, Flask-Login, Devise, Spring Security, etc.) with instructions to add WorkOS alongside existing auth
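To illustrate the kind of detection the "Existing Auth System Detection" section describes, here is a minimal sketch for npm-based projects. The function name, the manifest shape, and the package-to-label mapping are assumptions made for illustration; the actual skills may detect existing auth differently (for example by inspecting source files or config).

```typescript
// Hypothetical mapping from npm package names to the auth system they indicate.
// The entries are illustrative, not the skills' actual detection list.
const KNOWN_AUTH_PACKAGES: Record<string, string> = {
  passport: "Passport.js",
  lucia: "Lucia",
  "@auth0/auth0-spa-js": "Auth0 SPA SDK",
};

// Minimal subset of package.json that this sketch needs.
interface PackageManifest {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Returns labels for every known auth package found in the manifest.
function detectExistingAuth(manifest: PackageManifest): string[] {
  const installed: Record<string, string> = {
    ...(manifest.dependencies ?? {}),
    ...(manifest.devDependencies ?? {}),
  };
  return Object.entries(KNOWN_AUTH_PACKAGES)
    .filter(([name]) => Object.prototype.hasOwnProperty.call(installed, name))
    .map(([, label]) => label);
}
```

A non-empty result would signal that WorkOS should be added alongside the existing system rather than replacing it.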

5 frontend skills — standardized verification checklist format to match Next.js pattern (numbered bash commands, "(ALL MUST PASS)" header, recovery guidance)

Grader Updates (9 graders)

Added bonus checks (non-blocking) for:

  • Existing routes preserved after integration
  • Conflicting auth config preserved (SDK-specific patterns)
  • SvelteKit: sequence() composition, WORKOS_COOKIE_PASSWORD presence
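The distinction between blocking and bonus checks can be sketched as follows. The `Check` and `GradeResult` types, the function names, and the regex heuristics are hypothetical and only illustrate the idea that a failed bonus check is reported without failing the scenario; the real graders may be structured differently.

```typescript
// Hypothetical shape for a single grader check.
interface Check {
  name: string;
  blocking: boolean; // false = bonus check, reported but never fails the run
  passed: boolean;
}

interface GradeResult {
  passed: boolean;
  failedRequired: string[];
  failedBonus: string[];
}

// A scenario passes when every blocking check passes; bonus failures are only recorded.
function grade(checks: Check[]): GradeResult {
  const failedRequired = checks.filter((c) => c.blocking && !c.passed).map((c) => c.name);
  const failedBonus = checks.filter((c) => !c.blocking && !c.passed).map((c) => c.name);
  return { passed: failedRequired.length === 0, failedRequired, failedBonus };
}

// Illustrative SvelteKit bonus checks based on simple text matching (an assumption).
function sveltekitBonusChecks(hooksSource: string, envSource: string): Check[] {
  return [
    {
      name: "hooks compose with sequence()",
      blocking: false,
      passed: /\bsequence\s*\(/.test(hooksSource),
    },
    {
      name: "WORKOS_COOKIE_PASSWORD present",
      blocking: false,
      passed: /^WORKOS_COOKIE_PASSWORD=\S+/m.test(envSource),
    },
  ];
}
```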

Infrastructure

  • Standardized all 9 SKILL.md README fetch URLs from github.com/blob/ (HTML) to raw.githubusercontent.com (plaintext)
  • Created tests/fixtures/README.md documenting fixture state conventions and per-language guidance
  • Registered 24 new scenarios in runner.ts
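The URL change in the first bullet follows GitHub's standard mapping from a `blob` page to its raw file. A minimal sketch of that rewrite is below; the helper name and the example repository are hypothetical, and the PR itself edited the URLs directly in each SKILL.md rather than converting them at runtime.

```typescript
// Rewrites https://github.com/<owner>/<repo>/blob/<ref>/<path>
// to      https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>.
// URLs that do not match the blob pattern are returned unchanged.
function toRawGitHubUrl(url: string): string {
  const match = /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/blob\/(.+)$/.exec(url);
  if (!match) return url;
  const [, owner, repo, refAndPath] = match;
  return `https://raw.githubusercontent.com/${owner}/${repo}/${refAndPath}`;
}

// Example with a placeholder repository:
// toRawGitHubUrl("https://github.com/example-org/example-repo/blob/main/README.md")
// returns "https://raw.githubusercontent.com/example-org/example-repo/main/README.md"
```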

Results

Full eval suite: 62/62 passing (98.4% first-attempt, 100% with-correction)

First-attempt:    80.6% (required: 80%)
With-correction:  98.4% (required: 90%)
With-retry:       98.4% (required: 95%)
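
The threshold comparison reported above can be expressed as a small gate. This is only a sketch: the metric keys and the `failingMetrics` helper are hypothetical names, and the numbers are copied from the three lines above rather than taken from the eval runner.

```typescript
type Metric = "firstAttempt" | "withCorrection" | "withRetry";

// Required and observed pass rates in percent, as listed in the results above.
const requiredRates: Record<Metric, number> = {
  firstAttempt: 80,
  withCorrection: 90,
  withRetry: 95,
};

const observedRates: Record<Metric, number> = {
  firstAttempt: 80.6,
  withCorrection: 98.4,
  withRetry: 98.4,
};

// Returns the metrics whose observed rate falls below the required rate.
function failingMetrics(
  observed: Record<Metric, number>,
  required: Record<Metric, number>,
): Metric[] {
  const metrics: Metric[] = ["firstAttempt", "withCorrection", "withRetry"];
  return metrics.filter((m) => observed[m] < required[m]);
}
```

With the figures above, `failingMetrics(observedRates, requiredRates)` returns an empty array, although the first-attempt rate clears its threshold by only 0.6 points.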
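
| Framework | Base | Auth0 | Partial | Strict | Conflict MW | Existing MW | Conflict Auth |
| --- | --- | --- | --- | --- | --- | --- | --- |
| nextjs | + | + | + | + | + | + | - |
| react | + | + | + | + | - | - | + |
| react-router | + | + | + | + | + | - | - |
| tanstack-start | + | + | + | + | + | - | - |
| vanilla-js | + | + | + | - | - | - | + |
| sveltekit | + | + | + | + | - | - | + |
| node | + | + | + | - | - | - | + |
| python | + | + | + | - | - | - | + |
| ruby | + | + | + | - | - | - | + |
| go | + | + | + | - | - | - | + |
| php | + | + | + | - | - | - | + |
| php-laravel | + | + | + | - | - | - | + |
| kotlin | + | + | + | - | - | - | + |
| elixir | + | + | + | - | - | - | + |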
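
As a cross-check, the coverage matrix can be tallied against the 62-scenario suite size. The sketch below encodes each row as a string of `+`/`-` cells in the column order Base, Auth0, Partial, Strict, Conflict MW, Existing MW, Conflict Auth; the encoding and helper name are illustrative and not part of the eval runner.

```typescript
// One string per framework; each character is one column of the matrix.
const coverage: Record<string, string> = {
  nextjs: "++++++-",
  react: "++++--+",
  "react-router": "+++++--",
  "tanstack-start": "+++++--",
  "vanilla-js": "+++---+",
  sveltekit: "++++--+",
  node: "+++---+",
  python: "+++---+",
  ruby: "+++---+",
  go: "+++---+",
  php: "+++---+",
  "php-laravel": "+++---+",
  kotlin: "+++---+",
  elixir: "+++---+",
};

// Counts every covered ("+") cell across all frameworks.
function countScenarios(matrix: Record<string, string>): number {
  return Object.values(matrix).reduce(
    (total, row) => total + row.split("").filter((cell) => cell === "+").length,
    0,
  );
}

console.log(countScenarios(coverage)); // 62
```

The total matches the 62 scenarios in the full suite, with each of the 8 backend SDKs contributing 4 states.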

socket-security bot commented Mar 4, 2026

All alerts resolved.

This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored.

nicknisi changed the title from "feat: expand eval coverage for backend SDKs and SvelteKit" to "chore: expand eval coverage for backend SDKs and SvelteKit" on Mar 5, 2026
nicknisi added 7 commits March 5, 2026 15:07

Switch all SKILL.md WebFetch URLs from github.com/blob/ (HTML) to
raw.githubusercontent.com (plain text) for cleaner parsing. Add
tests/fixtures/README.md documenting fixture state conventions and
per-language guidance for upcoming backend SDK eval expansion.

Add partial-install and conflicting-auth fixtures for all 8 backend
SDKs (Node, Python, Ruby, Go, PHP, PHP-Laravel, Kotlin, Elixir) and
expand SvelteKit from 1 to 5 test states. Each backend SDK now has
4 eval states (up from 2), matching frontend skill coverage.

Includes:
- 20 new fixture directories with validated, buildable projects
- SKILL.md updates with partial install recovery and conflicting
  auth detection sections for 9 skills
- Grader bonus checks for preserved routes and conflicting auth
- 24 new eval scenarios registered in runner.ts

Align React, React Router, TanStack Start, and Vanilla JS skills
to match the Next.js verification checklist pattern: numbered bash
commands with comments, "(ALL MUST PASS)" header, and recovery
guidance for critical checks.

v1.x-v2.x require illuminate/support ^5-9 via workos-php, which
conflicts with Laravel 11. v5.x requires workos-php ^4.29 with no
illuminate/support constraint, resolving the conflict.

Steps 3, 4, and error recovery still referenced the old @workos-inc
namespace. The npm package is @workos/authkit-sveltekit.

gin v1.9.1 pulled in golang.org/x/net v0.10.0 which has a High CVE
(GHSA-4374-p667-p6c8, HTTP/2 rapid reset). gin v1.10.0 brings in
golang.org/x/net v0.25.0 which is patched.

@nicknisi nicknisi merged commit a1c1518 into main Mar 5, 2026
5 checks passed
@nicknisi nicknisi deleted the nicknisi/evals branch March 5, 2026 21:54