Open
27 commits
127fb92
docs: add Playwright best-practices skill (initial structure)
stefanjudis Jun 14, 2026
22b9f84
docs: add auth reference to Playwright best-practices skill
stefanjudis Jun 23, 2026
12b8176
docs: add network reference to Playwright best-practices skill
stefanjudis Jun 23, 2026
c19053b
docs: add debugging reference and refocus skill on playwright-cli
stefanjudis Jun 23, 2026
6720486
docs: add flakiness reference to Playwright best-practices skill
stefanjudis Jun 23, 2026
9923394
docs: add CI reference to Playwright best-practices skill
stefanjudis Jun 23, 2026
d9acd03
docs: drop performance.md from Playwright skill scope
stefanjudis Jun 23, 2026
59211cf
docs: drop scenarios.md and run final pass on Playwright skill
stefanjudis Jun 23, 2026
df56155
docs: make Playwright skill agentic-first
stefanjudis Jun 23, 2026
257ad43
docs: tighten over-written prose in Playwright skill
stefanjudis Jun 23, 2026
08604fe
docs: add test-data reference + lock breadth-expansion plan
stefanjudis Jun 23, 2026
05548b6
docs: add tags-annotations reference
stefanjudis Jun 23, 2026
6ccfd96
docs: add console-errors reference
stefanjudis Jun 23, 2026
f492421
docs: add global-setup reference
stefanjudis Jun 23, 2026
c283f52
docs: add visual regression reference
stefanjudis Jun 23, 2026
43da662
docs: add file upload & download reference
stefanjudis Jun 23, 2026
1dd7345
docs: add frames & iframes reference
stefanjudis Jun 23, 2026
2b7aa14
docs: add mobile & device emulation reference
stefanjudis Jun 23, 2026
6aa2ed2
docs: add clock & time reference
stefanjudis Jun 23, 2026
6ec5776
docs: add multiple tabs/popups/users reference; drop drag-drop
stefanjudis Jun 23, 2026
11be4b5
docs: add toMatchAriaSnapshot to visual reference
stefanjudis Jun 23, 2026
a8032e7
docs: add forms & validation reference
stefanjudis Jun 23, 2026
c2ebe70
docs: add error & edge states reference
stefanjudis Jun 23, 2026
ba68cee
docs: wire new references into SKILL.md routing + description
stefanjudis Jun 23, 2026
5bc9b37
docs: rename skill to playwright-best-practices-for-agents
stefanjudis Jun 23, 2026
8377dd7
docs: remove planning artifacts from skill PR
stefanjudis Jun 24, 2026
7ecf609
Merge branch 'main' into docs/playwright-best-practices-skill
stefanjudis Jun 24, 2026
72 changes: 72 additions & 0 deletions skills/playwright-best-practices-for-agents/SKILL.md
@@ -0,0 +1,72 @@
---
name: playwright-best-practices-for-agents
description: "Agent-first best practices for writing, structuring, debugging, and stabilizing Playwright tests in TypeScript/JavaScript, built around Playwright's agent CLI (`playwright-cli`) and no-GUI agentic debugging flows. Use when authoring or reviewing Playwright tests: choosing locators, writing web-first assertions, fixing flaky tests, handling authentication (SSO/2FA), mocking network/API requests, structuring projects and fixtures, generating test data, building forms and validation, uploading or downloading files, testing iframes, multiple tabs/popups or multi-user flows, mobile and device emulation, mocking time and dates, visual regression and screenshots, tagging and annotating tests, catching console errors, testing error/offline/loading states, configuring global setup, or running Playwright in CI."
metadata:
  author: checkly
---

# Playwright best practices

Condensed, opinionated guidance for writing Playwright tests that are **readable, isolated, and resilient** — built for coding **agents**, around Playwright's **agent CLI** (`playwright-cli`) and its no-GUI debugging flows. Maintained by [Checkly](https://www.checklyhq.com/?utm_source=ai-skill) — the same practices apply whether you run these tests in CI or as production monitors.

Load a reference file from `references/` only when the task needs it (see routing table). Each reference ends with links to the full `/learn` articles for depth.

> **Scope:** all guidance assumes the **`@playwright/test`** test runner with **TypeScript** — its `test`, fixtures, projects, config, and web-first `expect`. Examples are TypeScript (`.spec.ts`); the same APIs work in JavaScript. It does not target the standalone `playwright` automation library (which has no test runner, fixtures, or auto-retrying assertions). Imports are `import { test, expect } from '@playwright/test'`.

> **The agent CLI is what makes this skill shine.** Playwright's **agent CLI** — `playwright-cli`, package `@playwright/cli` — is a separate, token-efficient, **no-GUI** browser you drive command by command to discover locators and step through failing tests. It's distinct from the standard `npx playwright` CLI, and the **Agentic workflow** below leans on it throughout. → [references/debugging.md](references/debugging.md)

## Core rules (always apply)

1. **Locator priority:** prefer user-facing locators — `getByRole` > `getByLabel` / `getByPlaceholder` / `getByText` > `getByTestId` > CSS/XPath. CSS/XPath tie tests to implementation and break easily. → [references/locators.md](references/locators.md)
2. **Web-first assertions:** use auto-retrying `expect(locator).toBeVisible()` / `toHaveText()` etc. Never assert on a one-shot value you pulled out manually (`innerText()` then `toBe`). → [references/assertions.md](references/assertions.md)
3. **No hard waits:** never `waitForTimeout()`. Trust auto-waiting actions and web-first assertions; for explicit waits use `waitForURL` / `waitForLoadState` / `waitForResponse`. Avoid `networkidle`. → [references/waiting.md](references/waiting.md)
4. **Isolated & independent:** each test sets up its own state and can run in any order, in parallel. No test depends on another. Provision state via API in setup, not through the UI. → [references/test-structure.md](references/test-structure.md), [references/flakiness.md](references/flakiness.md)
5. **One feature per test:** if a test's assertions span more than one feature, split it. Keep tests short and focused.
6. **Reuse auth, don't re-login:** sign in once, persist `storageState`, reuse it across tests via a setup project. → [references/auth.md](references/auth.md)

## Routing table

| When the task is about… | Read |
|---|---|
| Picking selectors, strict mode, `data-testid` | [references/locators.md](references/locators.md) |
| Assertions, soft assertions, `expect.poll`/`toPass` | [references/assertions.md](references/assertions.md) |
| Waiting, auto-waiting, timeouts, navigation | [references/waiting.md](references/waiting.md) |
| Test design, fixtures, Page Object Model, steps | [references/test-structure.md](references/test-structure.md) |
| `playwright.config.ts`, projects, baseURL, devices, setup dependencies | [references/config.md](references/config.md) |
| Login, 2FA/TOTP, SSO, sessions, `storageState` | [references/auth.md](references/auth.md) |
| Mocking, intercepting, `route`, HAR, API testing | [references/network.md](references/network.md) |
| Debugging failures, `playwright-cli`, `--debug=cli`, traces, common errors | [references/debugging.md](references/debugging.md) |
| Flaky tests, retries, parallelism, anti-patterns | [references/flakiness.md](references/flakiness.md) |
| Running in CI, sharding, reporters, GitHub Actions | [references/ci.md](references/ci.md) |
| Test data, factories, unique data, seeding/cleanup | [references/test-data.md](references/test-data.md) |
| Forms, inputs, validation, error messages | [references/forms.md](references/forms.md) |
| File upload & download | [references/files.md](references/files.md) |
| iframes, frames, `frameLocator` | [references/iframes.md](references/iframes.md) |
| Multiple tabs, popups, multiple users/contexts | [references/multi-context.md](references/multi-context.md) |
| Mobile, device emulation, touch, viewport/breakpoints | [references/mobile.md](references/mobile.md) |
| Time/date, clock mocking, countdowns, timeouts | [references/clock.md](references/clock.md) |
| Visual regression, screenshots, `toHaveScreenshot`, aria snapshots | [references/visual.md](references/visual.md) |
| Tags (`@smoke`), `--grep`, `skip`/`fixme`/`slow` annotations | [references/tags-annotations.md](references/tags-annotations.md) |
| Failing tests on `console`/`pageerror` | [references/console-errors.md](references/console-errors.md) |
| `globalSetup`/`globalTeardown`, setup projects | [references/global-setup.md](references/global-setup.md) |
| Error, offline, network-failure, loading states | [references/error-states.md](references/error-states.md) |

## Agentic workflow (no GUI)

The interactive tools — `--ui`, `--debug` (Inspector), `show-trace` — are GUIs you can't drive. Author and debug through the non-interactive signals instead.

> **Having `playwright-cli` available is highly encouraged** — both phases below lean on it. Confirm with `playwright-cli --version` and install it if missing (`npm install -D @playwright/cli`). Everything still works without it, but you lose the inspect/verify loop and fall back to guessing.

**Author — discover, don't guess.** Read locators off the live page rather than from source: `playwright-cli open <url>` → `playwright-cli snapshot` (accessibility tree + element refs) → `playwright-cli generate-locator <ref>` hands back a user-facing locator to paste into the spec. → [references/locators.md](references/locators.md)

**Run & debug:**

1. **Run and read stdout:** `npx playwright test path/to/file.spec.ts`. The reporter prints the failing assertion and the **call log** — which locator/assertion timed out and what Playwright actually saw. Read it; don't guess.
2. **Read `error-context.md`:** on an `expect` failure Playwright writes an aria-snapshot of the page *at the moment it failed* to the test's `test-results/.../error-context.md`. This is machine-readable page state — open it to see what was actually rendered. *(Playwright ≥ 1.60)*
3. **Capture artifacts, not GUIs:** add `--trace on --screenshot only-on-failure` to drop `trace.zip` + screenshots into `test-results/` for inspection.
4. **Step through it live with `playwright-cli`** (no GUI): run `npx playwright test path/to/file.spec.ts --debug=cli` in the background — it pauses and prints a session name. Then `playwright-cli attach <session-name>` and drive it: `playwright-cli snapshot` (page state + element refs), `playwright-cli step-over`, `playwright-cli console error`, `playwright-cli network`, `playwright-cli eval "…"`. Inspect why the locator didn't resolve or what actually rendered, then fix and re-run. *(needs the agent CLI; full detail in [references/debugging.md](references/debugging.md))*
5. **Fix the root cause** (usually a locator, a missing web-first assertion, or a hard wait), then re-run until green. Don't paper over flakiness with retries — see [references/flakiness.md](references/flakiness.md).

Full agentic-debugging detail (the `playwright-cli` discovery and `--debug=cli` stepping workflow) is in [references/debugging.md](references/debugging.md).

> **Stay current.** These primitives are recent and version-gated — check `npx playwright --version` and `playwright-cli --version`, and update both packages if they're behind. Detail in [references/debugging.md](references/debugging.md).
115 changes: 115 additions & 0 deletions skills/playwright-best-practices-for-agents/references/assertions.md
@@ -0,0 +1,115 @@
# Assertions

Default to auto-retrying, web-first assertions. They wait for a condition to become true (up to the timeout) instead of checking once, which removes most flakiness.

## Web-first (auto-retrying) — use these

`expect(locator).<matcher>()` polls until it passes or times out:

```ts
await expect(page.getByRole('alert')).toBeVisible()
await expect(page.getByTestId('total')).toHaveText('€42.00')
await expect(page.getByRole('button', { name: 'Pay' })).toBeEnabled()
```

Common matchers: `toBeVisible`, `toBeHidden`, `toBeAttached`, `toBeEnabled`, `toBeDisabled`, `toBeEditable`, `toBeChecked`, `toBeFocused`, `toBeInViewport`, `toHaveText`, `toContainText`, `toHaveValue`, `toHaveValues`, `toHaveCount`, `toHaveAttribute`, `toHaveClass`, `toHaveURL`, `toHaveTitle`. Accessibility-focused matchers exist too — `toHaveRole`, `toHaveAccessibleName`, `toHaveAccessibleDescription` — and `toBeOK` checks a response. All support `.not`, and negation auto-retries too.

Visual/structure assertions — `toHaveScreenshot` (pixel) and `toMatchAriaSnapshot` (accessibility-tree YAML) — are also auto-retrying.

This is a curated set, not the full list. For every matcher (including `toHaveCSS`, `toHaveJSProperty`, `toContainClass`, `toHaveId`, and more) see the [Playwright assertions reference](https://playwright.dev/docs/test-assertions).

`toHaveText`, `toContainText`, and `toHaveCount` work against a locator that matches **many** elements — assert on the set directly instead of looping:

```ts
await expect(page.getByRole('listitem')).toHaveCount(3)
await expect(page.getByRole('listitem')).toContainText(['Coffee', 'Tea', 'Milk'])
```

**Always `await` a web-first assertion.** It's async, and a missing `await` doesn't fail loudly: the test can finish before the check settles, so it passes for the wrong reason.

## Non-retrying — only for plain values

`expect(value).toBe()/toEqual()/toBeGreaterThan()` evaluate once. Use them for deterministic, already-resolved values (numbers, parsed JSON), not for UI state.

## The #1 mistake: awaiting inside expect

```ts
// BAD — reads once, no waiting
expect(await locator.innerText()).toBeTruthy()

// GOOD — web-first, auto-waits
await expect(locator).not.toBeEmpty()
```

`await` goes *outside* `expect(locator)`, and the matcher does the waiting. Never pull a value out with `innerText()`/`textContent()` and assert on it when a web-first matcher exists.
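To make the difference concrete, here is a minimal conceptual sketch of "read once" versus "retry until the timeout". It is not Playwright's implementation; `retryUntil`, `renderedText`, and the 50 ms render delay are hypothetical stand-ins for a matcher, a locator's text, and a slow UI.

```ts
// Hypothetical helper, not Playwright's implementation: poll a check until it
// passes or the timeout elapses.
async function retryUntil(check: () => boolean, timeoutMs: number, intervalMs = 10): Promise<boolean> {
  const deadline = Date.now() + timeoutMs
  while (true) {
    if (check()) return true
    if (Date.now() >= deadline) return false
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
}

// Simulated page: the text "renders" 50 ms after the script starts.
let renderedText = ''
setTimeout(() => {
  renderedText = 'Report ready'
}, 50)

// One-shot read, like `expect(await locator.innerText())`: runs before the render.
const oneShotPassed = renderedText.length > 0

// Retrying check, like `await expect(locator).not.toBeEmpty()`: waits for the render.
const retriedPassed = retryUntil(() => renderedText.length > 0, 1_000)
```

Here `oneShotPassed` is `false` because the read happens before the text exists, while `retriedPassed` resolves to `true` once the timer fires. A web-first matcher gives you the second behavior.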

## Soft assertions

`expect.soft(...)` records a failure but lets the test continue, then marks it failed at the end. Good for collecting multiple independent checks (form fields, link sweeps) in one run.

```ts
await expect.soft(page.getByTestId('cookieBanner')).toBeVisible()
```

To stop a test early once any soft assertion has failed, add a hard check after them: `expect(test.info().errors).toHaveLength(0)`.

## Timeouts

Web-first assertions retry against the **expect timeout** (default **5s**) — separate from the test timeout (default **30s**) and any action timeout. If something genuinely takes longer than 5s (a slow report, a long upload), don't add a `waitForTimeout` before it — give that one assertion a longer `timeout` instead:

```ts
await expect(page.getByText('Report ready')).toBeVisible({ timeout: 30_000 }) // per call
```

Be deliberate about which knob you turn. A per-assertion `timeout` for one genuinely slow step keeps the rest of the suite fast and signals intent at the call site. Raising the project-wide default (`expect: { timeout: 10_000 }` in config) is the honest fix when the whole app is slower (a heavy staging environment), and beats peppering overrides everywhere. For a reusable variant, preconfigure `expect` once and import it:

```ts
const slowExpect = expect.configure({ timeout: 10_000 })
const softExpect = expect.configure({ soft: true })
```
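For the project-wide option, a config fragment might look like the sketch below. The values are illustrative, not recommendations; `timeout` is shown only to contrast the per-test budget with the assertion budget.

```ts
import { defineConfig } from '@playwright/test'

export default defineConfig({
  timeout: 30_000, // per-test budget (Playwright's default, shown for contrast)
  expect: {
    timeout: 10_000, // every web-first assertion now retries for up to 10s
  },
})
```

A per-call `{ timeout }` still overrides this default for an individual assertion.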

## Custom failure messages

Pass a message as the second arg to `expect` (or `expect.soft`) to make failures self-explanatory in reports and logs:

```ts
await expect(page, 'dashboard should load after login').toHaveTitle(/Dashboard/)
```

## Dynamic / flaky conditions

When no web-first matcher fits, retry the *value* or the *block* instead of hard-waiting.

`expect.poll(fn)` re-runs `fn` until the matcher passes or the timeout hits — ideal for polling an API or any non-locator value:

```ts
await expect
  .poll(async () => (await request.get('/api/orders/42')).status(), { timeout: 10_000 })
  .toBe(200)
```

`expect(async () => { ... }).toPass()` retries a whole block until every assertion inside passes — use it when several conditions must converge together:

```ts
await expect(async () => {
  const order = await getOrder(42)
  expect(order.status).toBe('shipped')
  expect(order.trackingId).toBeTruthy()
}).toPass({ timeout: 10_000 })
```

Note `toPass` defaults to **no timeout** and ignores the global expect timeout — always pass an explicit `timeout` so a never-passing block can't hang the test.
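The block-retry idea, and why an explicit deadline matters, can be sketched without Playwright. This is a conceptual stand-in, not how `toPass` is implemented; `retryBlock` and the simulated `getOrder` are hypothetical.

```ts
// Hypothetical stand-in for toPass: re-run an async block until it stops throwing.
async function retryBlock(block: () => Promise<void>, timeoutMs: number, intervalMs = 10): Promise<void> {
  const deadline = Date.now() + timeoutMs
  while (true) {
    try {
      await block()
      return // every check inside the block passed
    } catch (error) {
      // Without a deadline this loop would spin forever on a never-passing block.
      if (Date.now() >= deadline) throw error
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
    }
  }
}

// Simulated backend: the order only reports "shipped" on the third read.
let reads = 0
async function getOrder(): Promise<{ status: string; trackingId: string | null }> {
  reads += 1
  return reads < 3
    ? { status: 'processing', trackingId: null }
    : { status: 'shipped', trackingId: 'TRK-1' }
}

// Both conditions must hold in the same attempt for the block to pass.
const converged = retryBlock(async () => {
  const order = await getOrder()
  if (order.status !== 'shipped') throw new Error(`status is ${order.status}`)
  if (!order.trackingId) throw new Error('missing trackingId')
}, 1_000)
```

The block fails twice, passes on the third attempt, and `converged` resolves. If the order never shipped, the promise would reject with the last failure once the deadline passed.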

`expect.extend({...})` adds custom matchers for repeated domain checks (e.g. `toBeWithinRange`); merge several matcher modules with `mergeExpects()`.
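As a sketch, a `toBeWithinRange` matcher could look like the following. The matcher body is plain TypeScript that returns the `{ pass, message }` shape `expect.extend` accepts; the `rangeMatchers` name and the message wording are assumptions for illustration.

```ts
// Result shape a custom matcher returns to expect.extend.
type MatcherResult = { pass: boolean; message: () => string }

// Hypothetical domain matcher: passes when `received` lies in [floor, ceiling].
const rangeMatchers = {
  toBeWithinRange(received: number, floor: number, ceiling: number): MatcherResult {
    const pass = received >= floor && received <= ceiling
    return {
      pass,
      // The message for the passing case is what a failing `.not` assertion prints.
      message: () =>
        pass
          ? `expected ${received} not to be within ${floor}..${ceiling}`
          : `expected ${received} to be within ${floor}..${ceiling}`,
    }
  },
}

// Wiring it up in a shared fixtures file (assumes @playwright/test is installed):
//   import { expect as baseExpect } from '@playwright/test'
//   export const expect = baseExpect.extend(rangeMatchers)
//   expect(total).toBeWithinRange(1, 100)
```

Keeping the matcher logic in a plain object makes it easy to unit-test separately from the browser.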

## Anti-patterns

- `await page.waitForTimeout(3000)` before an assertion — see [waiting.md](./waiting.md).
- Asserting five features in one test — split it; keep assertions focused.
- `toBe()` on text where `toContainText()`/`toHaveText()` would auto-wait.

## Deeper in the docs

- [Assertions — types & best practices](https://www.checklyhq.com/learn/playwright/assertions/)
- [Waits and timeouts](https://www.checklyhq.com/learn/playwright/waits-and-timeouts/)
- [Playwright assertions reference](https://playwright.dev/docs/test-assertions)
98 changes: 98 additions & 0 deletions skills/playwright-best-practices-for-agents/references/auth.md
@@ -0,0 +1,98 @@
# Authentication

If possible, sign in **once**, persist the session, and reuse it across tests. Logging in through the UI in every test is slow, hammers your auth provider (rate limits, lockouts), and couples unrelated tests to the login flow.

## Reuse auth via a setup project (the default)

Run the login flow in a `setup` project, save the authenticated cookies and local storage to disk with `storageState`, and have every other project depend on it. Dependent tests then start already signed in.

```ts playwright.config.ts
import { defineConfig, devices } from '@playwright/test'

export default defineConfig({
  projects: [
    { name: 'setup', testMatch: /.*\.setup\.ts/ },
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'], storageState: 'playwright/.auth/user.json' },
      dependencies: ['setup'], // login runs first; this project reuses its state
    },
  ],
})
```

```ts auth.setup.ts
import { test as setup, expect } from '@playwright/test'

const authFile = 'playwright/.auth/user.json'

setup('authenticate', async ({ page }) => {
  await page.goto('/login')
  await page.getByPlaceholder('Email').fill(process.env.USER_EMAIL!)
  await page.getByPlaceholder('Password').fill(process.env.USER_PASSWORD!)
  await page.getByRole('button', { name: 'Sign in' }).click()
  await expect(page.getByText('Welcome back')).toBeVisible() // confirm login worked
  await page.context().storageState({ path: authFile }) // persist the session
})
```

Git-ignore the state file — it holds live session cookies: add `playwright/.auth/` to `.gitignore`. See [config.md](./config.md) for the projects/`dependencies` mechanics.

## Credentials and test users

- **Never hardcode credentials**, not even while debugging — read them from env vars (`process.env.USER_PASSWORD`). It's too easy to commit a literal.
- Use a **dedicated test account**, never a real user's or a customer's — you control its data and avoid bot-detection lockouts.
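One way to make missing credentials fail loudly is a small guard around `process.env`, instead of the non-null `!` assertion. `requireEnv` below is a hypothetical helper, not part of Playwright.

```ts
// Hypothetical helper: read a required environment variable or fail with a clear message.
function requireEnv(name: string): string {
  const value = process.env[name]
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// Usage inside a setup file:
//   await page.getByPlaceholder('Email').fill(requireEnv('USER_EMAIL'))
```

An unset variable then stops the setup step with a named error, rather than typing `undefined` into the login form.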

## Logging in

- **Username/password and SSO/social** (Google, GitHub, Microsoft, Okta, SAML) look the same from the test's side. Third-party providers add redirects across domains; Playwright follows them automatically. Drive the provider's screens with user-facing locators like any other form.
- **Discovering the steps (as an agent):** drive the page yourself with `playwright-cli` — navigate to the login page, take an accessibility snapshot to read the real `getByRole`/`getByLabel` names, run the login, then transcribe the working steps into `auth.setup.ts`. `npx playwright codegen <your-site>` records the same steps but opens the **interactive Inspector GUI** you can't drive in an agent session, so it's a human-only shortcut. See [debugging.md](./debugging.md) for the `playwright-cli` setup.

## Two-factor auth (TOTP)

You can't read an SMS or push, but **authenticator-app (TOTP) codes are just a secret + the current time** — generate them in-process with [`otpauth`](https://www.npmjs.com/package/otpauth). Store the TOTP secret as an env var.

```ts
import * as OTPAuth from 'otpauth'

const totp = new OTPAuth.TOTP({ issuer: 'GitHub', digits: 6, period: 30, secret: process.env.TOTP_SECRET! })

await page.getByPlaceholder('XXXXXX').fill(totp.generate()) // current 6-digit code
```
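To show what "a secret + the current time" means, here is a minimal sketch of the TOTP algorithm (RFC 6238) using only Node's `crypto` module. It assumes SHA-1, a 30-second period, and 6 digits, matching the options above. `base32Decode` and `generateTotp` are illustrative names; in real tests, prefer the `otpauth` library.

```ts
import { createHmac } from 'node:crypto'

// Decode an RFC 4648 base32 string, the usual encoding for authenticator secrets.
function base32Decode(input: string): Buffer {
  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'
  const clean = input.toUpperCase().replace(/[\s=]/g, '')
  const bytes: number[] = []
  let bits = 0
  let value = 0
  for (let i = 0; i < clean.length; i++) {
    const index = alphabet.indexOf(clean.charAt(i))
    if (index === -1) throw new Error(`Invalid base32 character: ${clean.charAt(i)}`)
    value = (value << 5) | index
    bits += 5
    if (bits >= 8) {
      bits -= 8
      bytes.push((value >>> bits) & 0xff)
      value &= (1 << bits) - 1 // keep only the bits not yet emitted
    }
  }
  return Buffer.from(bytes)
}

// Compute the code for a given moment: HMAC the time-step counter, then truncate.
function generateTotp(secretBase32: string, nowMs: number = Date.now(), period = 30, digits = 6): string {
  const counter = Math.floor(nowMs / 1000 / period)
  const message = Buffer.alloc(8) // counter as a big-endian 64-bit integer
  message.writeUInt32BE(Math.floor(counter / 0x100000000), 0)
  message.writeUInt32BE(counter >>> 0, 4)
  const hmac = createHmac('sha1', base32Decode(secretBase32)).update(message).digest()
  const offset = hmac.readUInt8(hmac.length - 1) & 0x0f // dynamic truncation offset
  const binary = hmac.readUInt32BE(offset) & 0x7fffffff
  let code = String(binary % Math.pow(10, digits))
  while (code.length < digits) code = '0' + code // left-pad with zeros
  return code
}
```

Because the code depends only on the secret and the clock, a test can compute it at the moment the 2FA prompt appears, with no phone or inbox involved.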

## API login (skip the UI entirely)

When you only need an authenticated *session* — not coverage of the login screen — log in over HTTP and snapshot the state. It's faster and less flaky than driving the form.

```ts auth.setup.ts
import { test as setup, expect } from '@playwright/test'

setup('authenticate via API', async ({ request }) => {
  const response = await request.post('/api/login', {
    form: { email: process.env.USER_EMAIL!, password: process.env.USER_PASSWORD! },
  })
  await expect(response).toBeOK() // fail fast instead of saving an unauthenticated state
  await request.storageState({ path: 'playwright/.auth/user.json' }) // captures the auth cookies
})
```

Test the login *page itself* through the UI; use API login as setup for everything else. See [network.md](./network.md) for the `request` context.

## Multiple roles

Give each role its own setup step and state file, then opt a test into one with `test.use`:

```ts
setup('auth as admin', async ({ page }) => { /* … */ await page.context().storageState({ path: 'playwright/.auth/admin.json' }) })

test.describe('admin area', () => {
  test.use({ storageState: 'playwright/.auth/admin.json' })
  test('sees settings', async ({ page }) => { /* signed in as admin */ })
})
```
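With several roles, the state-file path appears in both the setup step and `test.use`, so it may help to derive it in one place. The `Role` union and `authFileFor` below are hypothetical names, sketched under the assumption that files live in `playwright/.auth/`.

```ts
// Hypothetical role list; extend it to match the accounts your app needs.
type Role = 'admin' | 'user'

// Single source of truth for where each role's storageState is saved.
const authFileFor = (role: Role): string => `playwright/.auth/${role}.json`

// Usage (in setup and spec files respectively):
//   await page.context().storageState({ path: authFileFor('admin') })
//   test.use({ storageState: authFileFor('admin') })
```

A typo in a role name then becomes a compile-time error rather than a test that silently starts signed out.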

A persisted `storageState` is the same idea Checkly uses to keep authenticated monitors logged in across scheduled runs, so a session that survives reuse here survives in production monitoring too.

## Deeper in the docs

- [Managing authentication in Playwright](https://www.checklyhq.com/learn/playwright/authentication/)
- [Bypassing TOTP / 2FA login flows](https://www.checklyhq.com/learn/playwright/bypass-totp/)
- [Automating Google login](https://www.checklyhq.com/learn/playwright/google-login-automation/)
- [Playwright: Authentication](https://playwright.dev/docs/auth)