checkly · stefanjudis · Jun 14, 2026 · Jun 23, 2026 · Jun 23, 2026 · Jun 23, 2026
diff --git a/skills/playwright-best-practices-for-agents/SKILL.md b/skills/playwright-best-practices-for-agents/SKILL.md
@@ -0,0 +1,72 @@
+---
+name: playwright-best-practices-for-agents
+description: "Agent-first best practices for writing, structuring, debugging, and stabilizing Playwright tests in TypeScript/JavaScript, built around Playwright's agent CLI (`playwright-cli`) and no-GUI agentic debugging flows. Use when authoring or reviewing Playwright tests: choosing locators, writing web-first assertions, fixing flaky tests, handling authentication (SSO/2FA), mocking network/API requests, structuring projects and fixtures, generating test data, building forms and validation, uploading or downloading files, testing iframes, multiple tabs/popups or multi-user flows, mobile and device emulation, mocking time and dates, visual regression and screenshots, tagging and annotating tests, catching console errors, testing error/offline/loading states, configuring global setup, or running Playwright in CI."
+metadata:
+  author: checkly
+---
+
+# Playwright best practices
+
+Condensed, opinionated guidance for writing Playwright tests that are **readable, isolated, and resilient** — built for coding **agents**, around Playwright's **agent CLI** (`playwright-cli`) and its no-GUI debugging flows. Maintained by [Checkly](https://www.checklyhq.com/?utm_source=ai-skill) — the same practices apply whether you run these tests in CI or as production monitors.
+
+Load a reference file from `references/` only when the task needs it (see routing table). Each reference ends with links to the full `/learn` articles for depth.
+
+> **Scope:** all guidance assumes the **`@playwright/test`** test runner with **TypeScript** — its `test`, fixtures, projects, config, and web-first `expect`. Examples are TypeScript (`.spec.ts`); the same APIs work in JavaScript. It does not target the standalone `playwright` automation library (which has no test runner, fixtures, or auto-retrying assertions). Imports are `import { test, expect } from '@playwright/test'`.
+
+> **The agent CLI is what makes this skill shine.** Playwright's **agent CLI** — `playwright-cli`, package `@playwright/cli` — is a separate, token-efficient, **no-GUI** browser you drive command by command to discover locators and step through failing tests. It's distinct from the standard `npx playwright` CLI, and the **Agentic workflow** below leans on it throughout. → [references/debugging.md](references/debugging.md)
+
+## Core rules (always apply)
+
+1. **Locator priority:** prefer user-facing locators — `getByRole` > `getByLabel` / `getByPlaceholder` / `getByText` > `getByTestId` > CSS/XPath. CSS/XPath tie tests to implementation and break easily. → [references/locators.md](references/locators.md)
+2. **Web-first assertions:** use auto-retrying `expect(locator).toBeVisible()` / `toHaveText()` etc. Never assert on a one-shot value you pulled out manually (`innerText()` then `toBe`). → [references/assertions.md](references/assertions.md)
+3. **No hard waits:** never `waitForTimeout()`. Trust auto-waiting actions and web-first assertions; for explicit waits use `waitForURL` / `waitForLoadState` / `waitForResponse`. Avoid `networkidle`. → [references/waiting.md](references/waiting.md)
+4. **Isolated & independent:** each test sets up its own state and can run in any order, in parallel. No test depends on another. Provision state via API in setup, not through the UI. → [references/test-structure.md](references/test-structure.md), [references/flakiness.md](references/flakiness.md)
+5. **One feature per test:** if a test's assertions span more than one feature, split it. Keep tests short and focused.
+6. **Reuse auth, don't re-login:** sign in once, persist `storageState`, reuse it across tests via a setup project. → [references/auth.md](references/auth.md)
+
+## Routing table
+
+| When the task is about… | Read |
+|---|---|
+| Picking selectors, strict mode, `data-testid` | [references/locators.md](references/locators.md) |
+| Assertions, soft assertions, `expect.poll`/`toPass` | [references/assertions.md](references/assertions.md) |
+| Waiting, auto-waiting, timeouts, navigation | [references/waiting.md](references/waiting.md) |
+| Test design, fixtures, Page Object Model, steps | [references/test-structure.md](references/test-structure.md) |
+| `playwright.config.ts`, projects, baseURL, devices, setup dependencies | [references/config.md](references/config.md) |
+| Login, 2FA/TOTP, SSO, sessions, `storageState` | [references/auth.md](references/auth.md) |
+| Mocking, intercepting, `route`, HAR, API testing | [references/network.md](references/network.md) |
+| Debugging failures, `playwright-cli`, `--debug=cli`, traces, common errors | [references/debugging.md](references/debugging.md) |
+| Flaky tests, retries, parallelism, anti-patterns | [references/flakiness.md](references/flakiness.md) |
+| Running in CI, sharding, reporters, GitHub Actions | [references/ci.md](references/ci.md) |
+| Test data, factories, unique data, seeding/cleanup | [references/test-data.md](references/test-data.md) |
+| Forms, inputs, validation, error messages | [references/forms.md](references/forms.md) |
+| File upload & download | [references/files.md](references/files.md) |
+| iframes, frames, `frameLocator` | [references/iframes.md](references/iframes.md) |
+| Multiple tabs, popups, multiple users/contexts | [references/multi-context.md](references/multi-context.md) |
+| Mobile, device emulation, touch, viewport/breakpoints | [references/mobile.md](references/mobile.md) |
+| Time/date, clock mocking, countdowns, timeouts | [references/clock.md](references/clock.md) |
+| Visual regression, screenshots, `toHaveScreenshot`, aria snapshots | [references/visual.md](references/visual.md) |
+| Tags (`@smoke`), `--grep`, `skip`/`fixme`/`slow` annotations | [references/tags-annotations.md](references/tags-annotations.md) |
+| Failing tests on `console`/`pageerror` | [references/console-errors.md](references/console-errors.md) |
+| `globalSetup`/`globalTeardown`, setup projects | [references/global-setup.md](references/global-setup.md) |
+| Error, offline, network-failure, loading states | [references/error-states.md](references/error-states.md) |
+
+## Agentic workflow (no GUI)
+
+The interactive tools — `--ui`, `--debug` (Inspector), `show-trace` — are GUIs you can't drive. Author and debug through the non-interactive signals instead.
+
+> **Having `playwright-cli` available is highly encouraged** — both phases below lean on it. Confirm with `playwright-cli --version` and install it if missing (`npm install -D @playwright/cli`). Everything still works without it, but you lose the inspect/verify loop and fall back to guessing.
+
+**Author — discover, don't guess.** Read locators off the live page rather than from source: `playwright-cli open <url>` → `playwright-cli snapshot` (accessibility tree + element refs) → `playwright-cli generate-locator <ref>` hands back a user-facing locator to paste into the spec. → [references/locators.md](references/locators.md)
+
+**Run & debug:**
+
+1. **Run and read stdout:** `npx playwright test path/to/file.spec.ts`. The reporter prints the failing assertion and the **call log** — which locator/assertion timed out and what Playwright actually saw. Read it; don't guess.
+2. **Read `error-context.md`:** on an `expect` failure Playwright writes an aria-snapshot of the page *at the moment it failed* to the test's `test-results/.../error-context.md`. This is machine-readable page state — open it to see what was actually rendered. *(Playwright ≥ 1.60)*
+3. **Capture artifacts, not GUIs:** add `--trace on --screenshot only-on-failure` to drop `trace.zip` + screenshots into `test-results/` for inspection.
+4. **Step through it live with `playwright-cli`** (no GUI): run `npx playwright test path/to/file.spec.ts --debug=cli` in the background — it pauses and prints a session name. Then `playwright-cli attach <session-name>` and drive it: `playwright-cli snapshot` (page state + element refs), `playwright-cli step-over`, `playwright-cli console error`, `playwright-cli network`, `playwright-cli eval "…"`. Inspect why the locator didn't resolve or what actually rendered, then fix and re-run. *(needs the agent CLI; full detail in [references/debugging.md](references/debugging.md))*
+5. **Fix the root cause** (usually a locator, a missing web-first assertion, or a hard wait), then re-run until green. Don't paper over flakiness with retries — see [references/flakiness.md](references/flakiness.md).
+
+Full agentic-debugging detail (the `playwright-cli` discovery and `--debug=cli` stepping workflow) is in [references/debugging.md](references/debugging.md).
+
+> **Stay current.** These primitives are recent and version-gated — check `npx playwright --version` and `playwright-cli --version`, and update both packages if they're behind. Detail in [references/debugging.md](references/debugging.md).
diff --git a/skills/playwright-best-practices-for-agents/references/assertions.md b/skills/playwright-best-practices-for-agents/references/assertions.md
@@ -0,0 +1,115 @@
+# Assertions
+
+Default to auto-retrying, web-first assertions. They wait for a condition to become true (up to the timeout) instead of checking once, which removes most flakiness.
+
+## Web-first (auto-retrying) — use these
+
+`expect(locator).<matcher>()` polls until it passes or times out:
+
+```ts
+await expect(page.getByRole('alert')).toBeVisible()
+await expect(page.getByTestId('total')).toHaveText('€42.00')
+await expect(page.getByRole('button', { name: 'Pay' })).toBeEnabled()
+```
+
+Common matchers: `toBeVisible`, `toBeHidden`, `toBeAttached`, `toBeEnabled`, `toBeDisabled`, `toBeEditable`, `toBeChecked`, `toBeFocused`, `toBeInViewport`, `toHaveText`, `toContainText`, `toHaveValue`, `toHaveValues`, `toHaveCount`, `toHaveAttribute`, `toHaveClass`, `toHaveURL`, `toHaveTitle`. Accessibility-focused matchers exist too — `toHaveRole`, `toHaveAccessibleName`, `toHaveAccessibleDescription` — and `toBeOK` checks a response. All support `.not`, and negation auto-retries too.
+
+Visual/structure assertions — `toHaveScreenshot` (pixel) and `toMatchAriaSnapshot` (accessibility-tree YAML) — are also auto-retrying.
+
+This is a curated set, not the full list. For every matcher (including `toHaveCSS`, `toHaveJSProperty`, `toContainClass`, `toHaveId`, and more) see the [Playwright assertions reference](https://playwright.dev/docs/test-assertions).
+
+`toHaveText`, `toContainText`, and `toHaveCount` work against a locator that matches **many** elements — assert on the set directly instead of looping:
+
+```ts
+await expect(page.getByRole('listitem')).toHaveCount(3)
+await expect(page.getByRole('listitem')).toContainText(['Coffee', 'Tea', 'Milk'])
+```
+
+**Always `await` a web-first assertion.** It returns a promise; without `await` the test moves on before the check has settled, so a failing condition may never be attributed to the test and it passes for the wrong reason.
+
+## Non-retrying — only for plain values
+
+`expect(value).toBe()/toEqual()/toBeGreaterThan()` evaluate once. Use them for deterministic, already-resolved values (numbers, parsed JSON), not for UI state.
+
+## The #1 mistake: awaiting inside expect
+
+```ts
+// BAD — reads once, no waiting
+expect(await locator.innerText()).toBeTruthy()
+
+// GOOD — web-first, auto-waits
+await expect(locator).not.toBeEmpty()
+```
+
+`await` goes *outside* `expect(locator)`, and the matcher does the waiting. Never pull a value out with `innerText()`/`textContent()` and assert on it when a web-first matcher exists.
+
+## Soft assertions
+
+`expect.soft(...)` records a failure but lets the test continue, then marks it failed at the end. Good for collecting multiple independent checks (form fields, link sweeps) in one run.
+
+```ts
+await expect.soft(page.getByTestId('cookieBanner')).toBeVisible()
+```
+
+To stop a test early once any soft assertion has failed, add a hard check at that point: `expect(test.info().errors).toHaveLength(0)`.
+
+## Timeouts
+
+Web-first assertions retry against the **expect timeout** (default **5s**) — separate from the test timeout (default **30s**) and any action timeout. If something genuinely takes longer than 5s (a slow report, a long upload), don't add a `waitForTimeout` before it — give that one assertion a longer `timeout` instead:
+
+```ts
+await expect(page.getByText('Report ready')).toBeVisible({ timeout: 30_000 }) // per call
+```
+
+Be deliberate about which knob you turn. A per-assertion `timeout` for one genuinely slow step keeps the rest of the suite fast and signals intent at the call site. Raising the project-wide default (`expect: { timeout: 10_000 }` in config) is the honest fix when the whole app is slower (a heavy staging environment), rather than peppering overrides everywhere. For a reusable variant, preconfigure `expect` once and import it:
+
+```ts
+const slowExpect = expect.configure({ timeout: 10_000 })
+const softExpect = expect.configure({ soft: true })
+```
+
+## Custom failure messages
+
+Pass a message as the second arg to `expect` (or `expect.soft`) to make failures self-explanatory in reports and logs:
+
+```ts
+await expect(page, 'dashboard should load after login').toHaveTitle(/Dashboard/)
+```
+
+## Dynamic / flaky conditions
+
+When no web-first matcher fits, retry the *value* or the *block* instead of hard-waiting.
+
+`expect.poll(fn)` re-runs `fn` until the matcher passes or the timeout hits — ideal for polling an API or any non-locator value:
+
+```ts
+await expect
+  .poll(async () => (await request.get('/api/orders/42')).status(), { timeout: 10_000 })
+  .toBe(200)
+```
+
+`expect(async () => { ... }).toPass()` retries a whole block until every assertion inside passes — use it when several conditions must converge together:
+
+```ts
+await expect(async () => {
+  const order = await getOrder(42)
+  expect(order.status).toBe('shipped')
+  expect(order.trackingId).toBeTruthy()
+}).toPass({ timeout: 10_000 })
+```
+
+Note that `toPass` defaults to **no timeout** of its own and ignores the global expect timeout. Always pass an explicit `timeout` so a never-passing block fails at a known point instead of retrying until the test timeout kills it.
+
+`expect.extend({...})` adds custom matchers for repeated domain checks (e.g. `toBeWithinRange`); merge several matcher modules with `mergeExpects()`.
+
+## Anti-patterns
+
+- `await page.waitForTimeout(3000)` before an assertion — see [waiting.md](./waiting.md).
+- Asserting five features in one test — split it; keep assertions focused.
+- `toBe()` on text where `toContainText()`/`toHaveText()` would auto-wait.
+
+## Deeper in the docs
+
+- [Assertions — types & best practices](https://www.checklyhq.com/learn/playwright/assertions/)
+- [Waits and timeouts](https://www.checklyhq.com/learn/playwright/waits-and-timeouts/)
+- [Playwright assertions reference](https://playwright.dev/docs/test-assertions)
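Note on `references/assertions.md`: the difference between a one-shot check and an auto-retrying one may be easier to see in plain TypeScript. The sketch below is a conceptual model only, not Playwright's implementation; `pollUntil`, `timeoutMs`, and `intervalMs` are hypothetical names chosen for illustration.

```typescript
// Conceptual sketch only: NOT how Playwright implements retrying.
// `pollUntil`, `timeoutMs`, and `intervalMs` are hypothetical names.
async function pollUntil<T>(
  read: () => Promise<T> | T,
  passes: (value: T) => boolean,
  { timeoutMs = 5_000, intervalMs = 100 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs
  while (true) {
    const value = await read()                // re-read the live value on every attempt
    if (passes(value)) return value           // condition met: stop retrying
    if (Date.now() + intervalMs > deadline) {
      // Out of time: fail and report the last value that was seen
      throw new Error(`condition not met within ${timeoutMs}ms (last value: ${String(value)})`)
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
}
```

In this model, `expect(await locator.innerText()).toBe(...)` corresponds to a single pass through the loop body, while `expect(locator)` matchers and `expect.poll` keep re-reading until the deadline. That is why the one-shot form is fragile against UI that is still rendering.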
diff --git a/skills/playwright-best-practices-for-agents/references/auth.md b/skills/playwright-best-practices-for-agents/references/auth.md
@@ -0,0 +1,98 @@
+# Authentication
+
+If possible, sign in **once**, persist the session, and reuse it across tests. Logging in through the UI in every test is slow, hammers your auth provider (rate limits, lockouts), and couples unrelated tests to the login flow.
+
+## Reuse auth via a setup project (the default)
+
+Run the login flow in a `setup` project, save the authenticated cookies and local storage to disk with `storageState`, and have every other project depend on it. Dependent tests then start already signed in.
+
+```ts playwright.config.ts
+import { defineConfig, devices } from '@playwright/test'
+
+export default defineConfig({
+  projects: [
+    { name: 'setup', testMatch: /.*\.setup\.ts/ },
+    {
+      name: 'chromium',
+      use: { ...devices['Desktop Chrome'], storageState: 'playwright/.auth/user.json' },
+      dependencies: ['setup'],   // login runs first; this project reuses its state
+    },
+  ],
+})
+```
+
+```ts auth.setup.ts
+import { test as setup, expect } from '@playwright/test'
+
+const authFile = 'playwright/.auth/user.json'
+
+setup('authenticate', async ({ page }) => {
+  await page.goto('/login')   // relative URL: assumes `baseURL` is set in playwright.config.ts
+  await page.getByPlaceholder('Email').fill(process.env.USER_EMAIL!)
+  await page.getByPlaceholder('Password').fill(process.env.USER_PASSWORD!)
+  await page.getByRole('button', { name: 'Sign in' }).click()
+  await expect(page.getByText('Welcome back')).toBeVisible()   // confirm login worked
+  await page.context().storageState({ path: authFile })        // persist the session
+})
+```
+
+Git-ignore the state file — it holds live session cookies: add `playwright/.auth/` to `.gitignore`. See [config.md](./config.md) for the projects/`dependencies` mechanics.
+
+## Credentials and test users
+
+- **Never hardcode credentials**, not even while debugging — read them from env vars (`process.env.USER_PASSWORD`). It's too easy to commit a literal.
+- Use a **dedicated test account**, never a real user's or a customer's — you control its data and avoid bot-detection lockouts.
+
+## Logging in
+
+- **Username/password and SSO/social** (Google, GitHub, Microsoft, Okta, SAML) look the same from the test's side. Third-party providers add redirects across domains; Playwright follows them automatically. Drive the provider's screens with user-facing locators like any other form.
+- **Discovering the steps (as an agent):** drive the page yourself with `playwright-cli` — navigate to the login page, take an accessibility snapshot to read the real `getByRole`/`getByLabel` names, run the login, then transcribe the working steps into `auth.setup.ts`. `npx playwright codegen <your-site>` records the same steps but opens the **interactive Inspector GUI** you can't drive in an agent session, so it's a human-only shortcut. See [debugging.md](./debugging.md) for the `playwright-cli` setup.
+
+## Two-factor auth (TOTP)
+
+You can't read an SMS or push, but **authenticator-app (TOTP) codes are just a secret + the current time** — generate them in-process with [`otpauth`](https://www.npmjs.com/package/otpauth). Store the TOTP secret as an env var.
+
+```ts
+import * as OTPAuth from 'otpauth'
+
+const totp = new OTPAuth.TOTP({ issuer: 'GitHub', digits: 6, period: 30, secret: process.env.TOTP_SECRET! })
+
+await page.getByPlaceholder('XXXXXX').fill(totp.generate())   // current 6-digit code
+```
+
+## API login (skip the UI entirely)
+
+When you only need an authenticated *session* — not coverage of the login screen — log in over HTTP and snapshot the state. It's faster and less flaky than driving the form.
+
+```ts auth.setup.ts
+import { test as setup } from '@playwright/test'
+
+setup('authenticate via API', async ({ request }) => {
+  await request.post('/api/login', { form: { email: process.env.USER_EMAIL!, password: process.env.USER_PASSWORD! } })
+  await request.storageState({ path: 'playwright/.auth/user.json' })   // captures the auth cookies
+})
+```
+
+Test the login *page itself* through the UI; use API login as setup for everything else. See [network.md](./network.md) for the `request` context.
+
+## Multiple roles
+
+Give each role its own setup step and state file, then opt a test into one with `test.use`:
+
+```ts
+setup('auth as admin', async ({ page }) => { /* … */ await page.context().storageState({ path: 'playwright/.auth/admin.json' }) })
+
+test.describe('admin area', () => {
+  test.use({ storageState: 'playwright/.auth/admin.json' })
+  test('sees settings', async ({ page }) => { /* signed in as admin */ })
+})
+```
+
+A persisted `storageState` is the same idea Checkly uses to keep authenticated monitors logged in across scheduled runs, so a session that survives reuse here survives in production monitoring too.
+
+## Deeper in the docs
+
+- [Managing authentication in Playwright](https://www.checklyhq.com/learn/playwright/authentication/)
+- [Bypassing TOTP / 2FA login flows](https://www.checklyhq.com/learn/playwright/bypass-totp/)
+- [Automating Google login](https://www.checklyhq.com/learn/playwright/google-login-automation/)
+- [Playwright: Authentication](https://playwright.dev/docs/auth)
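Note on the TOTP section of `references/auth.md`: the statement that an authenticator code is "just a secret + the current time" can be made concrete with Node's built-in `crypto` module. This is a minimal sketch of RFC 6238 over HMAC-SHA1, assuming the secret is already available as raw bytes. `totpCode` is a hypothetical helper written for illustration; in real tests the `otpauth` package used in the diff, which also handles base32-encoded secrets, is likely the better choice.

```typescript
import { createHmac } from 'node:crypto'

// Hypothetical helper: RFC 6238 TOTP using HMAC-SHA1.
// Assumption: `secret` is the raw key bytes (base32 decoding is omitted here).
function totpCode(secret: Buffer, unixSeconds: number, period = 30, digits = 6): string {
  // The "current time" part: number of whole periods since the Unix epoch
  const counter = Math.floor(unixSeconds / period)
  const message = Buffer.alloc(8)
  message.writeBigUInt64BE(BigInt(counter))

  // The "secret" part: HMAC the counter with the shared key
  const mac = createHmac('sha1', secret).update(message).digest()

  // Dynamic truncation (RFC 4226): take 31 bits starting at an offset given by the last nibble
  const offset = mac[mac.length - 1] & 0x0f
  const binary = mac.readUInt32BE(offset) & 0x7fffffff

  // Keep the last `digits` decimal digits, left-padded with zeros
  return String(binary % 10 ** digits).padStart(digits, '0')
}
```

With the RFC 6238 test key, `totpCode(Buffer.from('12345678901234567890', 'ascii'), 59)` returns `'287082'`. Because the code depends only on the key and the clock, a test can produce it in-process at the moment the 2FA prompt appears.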