diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 4888c74e..20baa7e2 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -6,6 +6,7 @@
 - [Writing Commit Messages](#memo-writing-commit-messages)
 - [Code Review](#white_check_mark-code-review)
 - [Coding Style](#nail_care-coding-style)
+- [Local Setup](#wrench-local-setup)
 - [Credits](#pray-credits)

 ## :repeat: Submitting Pull Requests
@@ -134,6 +135,37 @@ We use [error-stack](https://docs.rs/error-stack/latest/error_stack/) for error
 3. **Attachments**: Use `.attach_printable("additional info")` to add debugging context without changing the error variant.
 4. **Consistency**: Avoid returning bare `TrustedServerError` unless absolutely necessary (e.g. implementing traits). Wrap them in `Report::new()`.

+## :wrench: Local Setup
+
+### Pre-commit URL-host linter (`ts dev lint domains`)
+
+`ts dev lint domains` checks that source, config, and documentation
+files only reference `example.com` (and other RFC 2606 reserved
+names), loopback addresses, vetted integration-proxy endpoints, or a
+small set of well-known documentation hosts. It is intended to run
+as a pre-commit hook so accidental third-party hosts never land in a
+commit.
+
+One-time setup after cloning:
+
+```bash
+cargo install_cli # builds and installs the `ts` binary
+ts dev install-hooks # installs the pre-commit hook into .githooks/
+```
+
+After that, every `git commit` runs the linter against staged
+changes. `ts dev install-hooks` writes `.githooks/pre-commit` and
+sets `core.hooksPath`; if you already have a `core.hooksPath`
+(husky, lefthook, etc.) it refuses to overwrite it without
+`--force`. To bypass the hook for a single commit, use
+`git commit --no-verify`.
+
+To audit the whole repository at once: `ts dev lint domains` (no
+arguments). To add a newly-vetted integration proxy to the
+allowlist, edit `EXACT_HOSTS` in
+`crates/trusted-server-cli/src/dev/lint/domains.rs`. The full design
+is in `docs/superpowers/specs/2026-05-18-check-domains-design.md`.
+
 ## :pray: Credits

 - https://github.com/jessesquires/.github/blob/main/CONTRIBUTING.md
diff --git a/Cargo.lock b/Cargo.lock
index 07a050d9..d5a73d89 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -119,12 +119,36 @@ version = "1.0.102"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c"

+[[package]]
+name = "arc-swap"
+version = "1.9.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6a3a1fd6f75306b68087b831f025c712524bcb19aad54e557b1129cfa0a2b207"
+dependencies = [
+ "rustversion",
+]
+
 [[package]]
 name = "arraydeque"
 version = "0.5.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "7d902e3d592a523def97af8f317b08ce16b7ab854c1985a0c671e6f15cebc236"

+[[package]]
+name = "assert_cmd"
+version = "2.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2aa3a22042e45de04255c7bf3626e239f450200fd0493c1e382263544b20aea6"
+dependencies = [
+ "anstyle",
+ "bstr",
+ "libc",
+ "predicates",
+ "predicates-core",
+ "predicates-tree",
+ "wait-timeout",
+]
+
 [[package]]
 name = "async-compression"
 version = "0.4.42"
@@ -280,6 +304,17 @@ dependencies = [
  "alloc-stdlib",
 ]

+[[package]]
+name = "bstr"
+version = "1.12.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "63044e1ae8e69f3b5a92c736ca6269b8d12fa7efe39bf34ddb06d102cf0e2cab"
+dependencies = [
+ "memchr",
+ "regex-automata",
+ "serde",
+]
+
 [[package]]
 name = "build-print"
 version = "1.0.1"
@@ -515,6 +550,15 @@ version = "1.1.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "c8d4a3bb8b1e0c1050499d1815f5ab16d04f0959b233085fb31653fbfc9d98f9"

+[[package]]
+name = "clru"
+version = "0.6.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum =
"197fd99cb113a8d5d9b6376f3aa817f32c1078f2343b714fff7d2ca44fdf67d5" +dependencies = [ + "hashbrown 0.16.1", +] + [[package]] name = "colorchoice" version = "1.0.5" @@ -703,6 +747,12 @@ dependencies = [ "itertools 0.10.5", ] +[[package]] +name = "crossbeam-utils" +version = "0.8.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28" + [[package]] name = "crunchy" version = "0.2.4" @@ -830,6 +880,20 @@ dependencies = [ "syn 2.0.117", ] +[[package]] +name = "dashmap" +version = "6.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6361d5c062261c78a176addb82d4c821ae42bed6089de0e12603cd25de2059c" +dependencies = [ + "cfg-if", + "crossbeam-utils", + "hashbrown 0.14.5", + "lock_api", + "once_cell", + "parking_lot_core", +] + [[package]] name = "data-encoding" version = "2.11.0" @@ -912,6 +976,12 @@ dependencies = [ "zeroize", ] +[[package]] +name = "difflib" +version = "0.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6184e33543162437515c2e2b48714794e37845ec9851711914eec9d308f6ebe8" + [[package]] name = "digest" version = "0.9.0" @@ -1160,6 +1230,16 @@ dependencies = [ "rustc_version", ] +[[package]] +name = "faster-hex" +version = "0.10.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7223ae2d2f179b803433d9c830478527e92b8117eab39460edae7f1614d9fb73" +dependencies = [ + "heapless", + "serde", +] + [[package]] name = "fastly" version = "0.11.13" @@ -1252,6 +1332,16 @@ version = "0.2.9" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "28dea519a9695b9977216879a3ebfddf92f1c08c05d984f8996aecd6ecdc811d" +[[package]] +name = "filetime" +version = "0.2.29" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5c287a33c7f0a620c38e641e7f60827713987b3c0f26e8ddc9462cc69cf75759" +dependencies = [ + "cfg-if", + "libc", +] + [[package]] name 
= "find-msvc-tools" version = "0.1.9" @@ -1268,6 +1358,15 @@ dependencies = [ "miniz_oxide", ] +[[package]] +name = "float-cmp" +version = "0.10.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b09cf3155332e944990140d967ff5eceb70df778b34f77d8075db46e4704e6d8" +dependencies = [ + "num-traits", +] + [[package]] name = "fnv" version = "1.0.7" @@ -1348,124 +1447,987 @@ dependencies = [ ] [[package]] -name = "futures-io" -version = "0.3.32" +name = "futures-io" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cecba35d7ad927e23624b22ad55235f2239cfa44fd10428eecbeba6d6a717718" + +[[package]] +name = "futures-macro" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e835b70203e41293343137df5c0664546da5745f82ec9b84d40be8336958447b" +dependencies = [ + "proc-macro2", + "quote", + "syn 2.0.117", +] + +[[package]] +name = "futures-sink" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c39754e157331b013978ec91992bde1ac089843443c49cbc7f46150b0fad0893" + +[[package]] +name = "futures-task" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393" + +[[package]] +name = "futures-timer" +version = "3.0.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f288b0a4f20f9a56b5d1da57e2227c661b7b16168e2f72365f57b63326e29b24" + +[[package]] +name = "futures-util" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6" +dependencies = [ + "futures-channel", + "futures-core", + "futures-io", + "futures-macro", + "futures-sink", + "futures-task", + "memchr", + "pin-project-lite", + "slab", +] + +[[package]] +name = "fxhash" +version = "0.2.1" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "c31b6d751ae2c7f11320402d34e41349dd1016f8d5d45e48c4312bc8625af50c" +dependencies = [ + "byteorder", +] + +[[package]] +name = "generic-array" +version = "0.14.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "85649ca51fd72272d7821adaf274ad91c288277713d9c18820d8499a7ff69e9a" +dependencies = [ + "typenum", + "version_check", + "zeroize", +] + +[[package]] +name = "getopts" +version = "0.2.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cfe4fbac503b8d1f88e6676011885f34b7174f46e59956bba534ba83abded4df" +dependencies = [ + "unicode-width", +] + +[[package]] +name = "getrandom" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ff2abc00be7fca6ebc474524697ae276ad847ad0a6b3faa4bcb027e9a4614ad0" +dependencies = [ + "cfg-if", + "js-sys", + "libc", + "wasi", + "wasm-bindgen", +] + +[[package]] +name = "getrandom" +version = "0.3.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "899def5c37c4fd7b2664648c28120ecec138e4d395b459e5ca34f9cce2dd77fd" +dependencies = [ + "cfg-if", + "js-sys", + "libc", + "r-efi 5.3.0", + "wasip2", + "wasm-bindgen", +] + +[[package]] +name = "getrandom" +version = "0.4.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0de51e6874e94e7bf76d726fc5d13ba782deca734ff60d5bb2fb2607c7406555" +dependencies = [ + "cfg-if", + "libc", + "r-efi 6.0.0", + "wasip2", + "wasip3", +] + +[[package]] +name = "gix" +version = "0.83.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6ce52001b946a6249d5d0d3011df0a042ac3f8a4d013460db6476577b0b9c567" +dependencies = [ + "gix-actor", + "gix-archive", + "gix-attributes", + "gix-blame", + "gix-command", + "gix-commitgraph", + "gix-config", + "gix-date", + "gix-diff", + "gix-dir", + "gix-discover", + "gix-error", + "gix-features", + "gix-filter", + "gix-fs", 
+ "gix-glob", + "gix-hash", + "gix-hashtable", + "gix-ignore", + "gix-index", + "gix-lock", + "gix-merge", + "gix-negotiate", + "gix-object", + "gix-odb", + "gix-pack", + "gix-path", + "gix-pathspec", + "gix-protocol", + "gix-ref", + "gix-refspec", + "gix-revision", + "gix-revwalk", + "gix-sec", + "gix-shallow", + "gix-status", + "gix-submodule", + "gix-tempfile", + "gix-trace", + "gix-traverse", + "gix-url", + "gix-utils", + "gix-validate", + "gix-worktree", + "gix-worktree-state", + "gix-worktree-stream", + "nonempty", + "smallvec", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-actor" +version = "0.41.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "272916673b83714734b15d4ef3c8b5f1ccddb15fea8ff548430b97c1ab7b7ed8" +dependencies = [ + "bstr", + "gix-date", + "gix-error", +] + +[[package]] +name = "gix-archive" +version = "0.32.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9a20ec244b733338d4cb60e5e05eac700dab7fcc689647b1d1daa9396b119342" +dependencies = [ + "bstr", + "gix-date", + "gix-error", + "gix-object", + "gix-worktree-stream", +] + +[[package]] +name = "gix-attributes" +version = "0.33.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "fe17c5a1c0b6f2ef1476aa1d3222ea50cdff67608016613a58bfc3e078046000" +dependencies = [ + "bstr", + "gix-glob", + "gix-path", + "gix-quote", + "gix-trace", + "kstring", + "smallvec", + "thiserror 2.0.18", + "unicode-bom", +] + +[[package]] +name = "gix-bitmap" +version = "0.3.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1ecbfc77ec6852294e341ecc305a490b59f2813e6ca42d79efda5099dcab1894" +dependencies = [ + "gix-error", +] + +[[package]] +name = "gix-blame" +version = "0.13.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "14dab9a942ab54a9661ded7397c3bf927274e7afa94494db0d75cfcbde02ca0a" +dependencies = [ + "gix-commitgraph", + "gix-date", + "gix-diff", + 
"gix-error", + "gix-hash", + "gix-object", + "gix-revwalk", + "gix-trace", + "gix-traverse", + "gix-worktree", + "smallvec", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-chunk" +version = "0.7.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "edf288be9b60fe7231de03771faa292be1493d84786f68727e33ad1f91764320" +dependencies = [ + "gix-error", +] + +[[package]] +name = "gix-command" +version = "0.9.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "86335306511abe43d75c866d4b1f3d90932fe202edcd43e1314036333e7384d8" +dependencies = [ + "bstr", + "gix-path", + "gix-quote", + "gix-trace", + "shell-words", +] + +[[package]] +name = "gix-commitgraph" +version = "0.37.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "fe3b5aa0f24e19028c261d229aeeedafcaaa52ebd71021cc15184620fc9d32eb" +dependencies = [ + "bstr", + "gix-chunk", + "gix-error", + "gix-hash", + "memmap2", + "nonempty", +] + +[[package]] +name = "gix-config" +version = "0.56.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8c01848aebd21c67f6ba41f1de8efd46ae96df21f001954a3c9e1517e514d410" +dependencies = [ + "bstr", + "gix-config-value", + "gix-features", + "gix-glob", + "gix-path", + "gix-ref", + "gix-sec", + "smallvec", + "thiserror 2.0.18", + "unicode-bom", +] + +[[package]] +name = "gix-config-value" +version = "0.18.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "13b39ed39ee4c10a3b157f9fb94bac8098d9f8e56201f0cf7dee6c187416c4b2" +dependencies = [ + "bitflags 2.11.1", + "bstr", + "gix-path", + "libc", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-date" +version = "0.15.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b94cdae4eb4b0f4136e3d9b3aa2d2cd03cfb5bb9b636b31263aea2df86d41543" +dependencies = [ + "bstr", + "gix-error", + "itoa", + "jiff", + "smallvec", +] + +[[package]] +name = "gix-diff" +version = 
"0.63.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "dc08e0fa1a91ff5f24affeab052f198056645e1de004910bde7b82b50ea5982a" +dependencies = [ + "bstr", + "gix-command", + "gix-filter", + "gix-fs", + "gix-hash", + "gix-imara-diff", + "gix-object", + "gix-path", + "gix-tempfile", + "gix-trace", + "gix-traverse", + "gix-worktree", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-dir" +version = "0.25.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "32a0fc06e9e1e430cbf0a313666976d90f822f461a6525320427aa9b8af5236c" +dependencies = [ + "bstr", + "gix-discover", + "gix-fs", + "gix-ignore", + "gix-index", + "gix-object", + "gix-path", + "gix-pathspec", + "gix-trace", + "gix-utils", + "gix-worktree", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-discover" +version = "0.51.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "17852e6a501e688a1702b24ebe5b3761d4719455bc869fd29f38b0b859bcad34" +dependencies = [ + "bstr", + "dunce", + "gix-fs", + "gix-path", + "gix-ref", + "gix-sec", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-error" +version = "0.2.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e207b971746ab724fccdfced2e4e19e854744611904a0195d3aa8fda8a110613" +dependencies = [ + "bstr", +] + +[[package]] +name = "gix-features" +version = "0.48.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "af375693ad5333d0a2c66b4c5b2cbe9ccc38e34f8e8bf24e4ae42c12307fdc4f" +dependencies = [ + "bytes", + "crc32fast", + "gix-path", + "gix-trace", + "gix-utils", + "libc", + "once_cell", + "prodash", + "thiserror 2.0.18", + "walkdir", + "zlib-rs", +] + +[[package]] +name = "gix-filter" +version = "0.30.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "dac917dbe9653c9b615d248db91907a365bd779750c9e1b457a9d9fdeece3a08" +dependencies = [ + "bstr", + "encoding_rs", + "gix-attributes", + 
"gix-command", + "gix-hash", + "gix-object", + "gix-packetline", + "gix-path", + "gix-quote", + "gix-trace", + "gix-utils", + "smallvec", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-fs" +version = "0.21.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1e1967daac9848757c47c2aef0c57bcadc1a897347f559778249bf286a536c86" +dependencies = [ + "bstr", + "fastrand", + "gix-features", + "gix-path", + "gix-utils", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-glob" +version = "0.26.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "08bf29249a069bf2507f5964f80997f37b134d320ea348d66527726b9be2c38c" +dependencies = [ + "bitflags 2.11.1", + "bstr", + "gix-features", + "gix-path", +] + +[[package]] +name = "gix-hash" +version = "0.25.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bcf70d1e252337eed16360f8b8ebb71865ece58eab7954b39ce38b420de703d2" +dependencies = [ + "faster-hex", + "gix-features", + "sha1-checked", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-hashtable" +version = "0.15.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d33b455e07b3c16d3b2eeebc7b38d2dafcbf8a653de1138ef55d4c2a1fd0b08b" +dependencies = [ + "gix-hash", + "hashbrown 0.16.1", + "parking_lot", +] + +[[package]] +name = "gix-ignore" +version = "0.21.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6bb13fbbeeafee943e52b61fcc88dfddf6a452fcaf0c4d0cdc8f218fa25bbec5" +dependencies = [ + "bstr", + "gix-glob", + "gix-path", + "gix-trace", + "unicode-bom", +] + +[[package]] +name = "gix-imara-diff" +version = "0.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "39eb0623e15e4cb83c02ce6a959e48fadd1ae3b715b36b5acc01816e01388c82" +dependencies = [ + "bstr", + "hashbrown 0.16.1", +] + +[[package]] +name = "gix-index" +version = "0.51.0" +source = "registry+https://github.com/rust-lang/crates.io-index" 
+checksum = "54c3ef97ad08121e4327a6226bd63fed6b9e3c6b976d48bddd4356d9d41191db" +dependencies = [ + "bitflags 2.11.1", + "bstr", + "filetime", + "fnv", + "gix-bitmap", + "gix-features", + "gix-fs", + "gix-hash", + "gix-lock", + "gix-object", + "gix-traverse", + "gix-utils", + "gix-validate", + "hashbrown 0.16.1", + "itoa", + "libc", + "memmap2", + "rustix", + "smallvec", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-lock" +version = "23.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09b3bc074e5723027b482dcd9ab99d95804a53742f6de812d0172fbba4a186c1" +dependencies = [ + "gix-tempfile", + "gix-utils", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-merge" +version = "0.16.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "74bbcdcc52b70a32f0a151b024dff9d0fcf56ee48f00d9503e735af9d99ea881" +dependencies = [ + "bstr", + "gix-command", + "gix-diff", + "gix-filter", + "gix-fs", + "gix-hash", + "gix-imara-diff", + "gix-index", + "gix-object", + "gix-path", + "gix-quote", + "gix-revision", + "gix-revwalk", + "gix-tempfile", + "gix-trace", + "gix-worktree", + "nonempty", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-negotiate" +version = "0.31.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "103d42bfade1b8a96ca5005933127bdad461ce588d92422b2c2daa3ff20d780c" +dependencies = [ + "bitflags 2.11.1", + "gix-commitgraph", + "gix-date", + "gix-hash", + "gix-object", + "gix-revwalk", +] + +[[package]] +name = "gix-object" +version = "0.60.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a38075a95d7cc5df8afd38e72c617026c1456952207a4120a7f55a3fbf93b4d7" +dependencies = [ + "bstr", + "gix-actor", + "gix-date", + "gix-features", + "gix-hash", + "gix-hashtable", + "gix-utils", + "gix-validate", + "itoa", + "smallvec", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-odb" +version = "0.80.0" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "aeeda12a9663120418735ecdc1250d06eeab0be75700e47b3402a981331716ba" +dependencies = [ + "arc-swap", + "gix-features", + "gix-fs", + "gix-hash", + "gix-hashtable", + "gix-object", + "gix-pack", + "gix-path", + "gix-quote", + "memmap2", + "parking_lot", + "tempfile", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-pack" +version = "0.70.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "daf02e6f5c8f07a069c9ea5245f40d9b14856ada4086091dc99941b49002b4fa" +dependencies = [ + "clru", + "gix-chunk", + "gix-error", + "gix-features", + "gix-hash", + "gix-hashtable", + "gix-object", + "gix-path", + "memmap2", + "smallvec", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-packetline" +version = "0.21.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "362246df440ee691699f0664cbf7006a6ece477db6734222be95e4198e5656e6" +dependencies = [ + "bstr", + "faster-hex", + "gix-trace", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-path" +version = "0.12.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "671a6059e8a4c1b7f406e24716499cefa3926e060876fb1959ef225efeee346e" +dependencies = [ + "bstr", + "gix-trace", + "gix-validate", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-pathspec" +version = "0.18.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2a84a4f083dd70fb49f4377e13afa6d90df2daaa1c705c49d6ff1331fc7e8855" +dependencies = [ + "bitflags 2.11.1", + "bstr", + "gix-attributes", + "gix-config-value", + "gix-glob", + "gix-path", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-protocol" +version = "0.61.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "aa4bee82db63ec635996b96efae71cf467c155fa3f34a556184373224a26c4fd" +dependencies = [ + "bstr", + "gix-date", + "gix-features", + "gix-hash", + "gix-ref", + "gix-shallow", + "gix-transport", + 
"gix-utils", + "maybe-async", + "nonempty", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-quote" +version = "0.7.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6e97b73791a64bc0fa7dd2c5b3e551136115f97750b876ed1c952c7a7dbaf8be" +dependencies = [ + "bstr", + "gix-error", + "gix-utils", +] + +[[package]] +name = "gix-ref" +version = "0.63.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d8ba9cc15f558b274c99349b83130f5ec83459660828fde9718bbbb43a726167" +dependencies = [ + "gix-actor", + "gix-features", + "gix-fs", + "gix-hash", + "gix-lock", + "gix-object", + "gix-path", + "gix-tempfile", + "gix-utils", + "gix-validate", + "memmap2", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-refspec" +version = "0.41.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "61755b27d57edc8940a1b1593c8c61548ca8e4c02da1ed8d5bfeda9eb2a6b761" +dependencies = [ + "bstr", + "gix-error", + "gix-glob", + "gix-hash", + "gix-revision", + "gix-validate", + "smallvec", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-revision" +version = "0.45.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1fb5288fac706d3ea3e4e2ba9ec38b78743b8c02f422e18cb342299cfd6ab7e8" +dependencies = [ + "bitflags 2.11.1", + "bstr", + "gix-commitgraph", + "gix-date", + "gix-error", + "gix-hash", + "gix-hashtable", + "gix-object", + "gix-revwalk", + "gix-trace", + "nonempty", +] + +[[package]] +name = "gix-revwalk" +version = "0.31.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "313813706b073a12ff7f9b2896bf3e6504cdac7cfbc97b1920114724705069f0" +dependencies = [ + "gix-commitgraph", + "gix-date", + "gix-error", + "gix-hash", + "gix-hashtable", + "gix-object", + "smallvec", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-sec" +version = "0.14.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"f5a3a2d3e504a238136751e646a6c028252286a0ea64ea9974bf0498633407c6" +dependencies = [ + "bitflags 2.11.1", + "gix-path", + "libc", + "windows-sys 0.61.2", +] + +[[package]] +name = "gix-shallow" +version = "0.12.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "29187305521bfacf4aefd284ab28dbfa9fb74abd39a5e63dd313b1baa5808c27" +dependencies = [ + "bstr", + "gix-hash", + "gix-lock", + "nonempty", + "thiserror 2.0.18", +] + +[[package]] +name = "gix-status" +version = "0.30.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "cecba35d7ad927e23624b22ad55235f2239cfa44fd10428eecbeba6d6a717718" +checksum = "68c6d2a8c521ffa205fe7e268c82e6d1378ba37cd826ca10ab6129fdc29a4b65" +dependencies = [ + "bstr", + "filetime", + "gix-diff", + "gix-dir", + "gix-features", + "gix-filter", + "gix-fs", + "gix-hash", + "gix-index", + "gix-object", + "gix-path", + "gix-pathspec", + "gix-worktree", + "portable-atomic", + "thiserror 2.0.18", +] [[package]] -name = "futures-macro" -version = "0.3.32" +name = "gix-submodule" +version = "0.30.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "e835b70203e41293343137df5c0664546da5745f82ec9b84d40be8336958447b" +checksum = "9fd5fc8692890bd71a596e540fd4c364f8460eaa82c4eaaedebde6e1e3eb4d91" dependencies = [ - "proc-macro2", - "quote", - "syn 2.0.117", + "bstr", + "gix-config", + "gix-path", + "gix-pathspec", + "gix-refspec", + "gix-url", + "thiserror 2.0.18", ] [[package]] -name = "futures-sink" -version = "0.3.32" +name = "gix-tempfile" +version = "23.0.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "c39754e157331b013978ec91992bde1ac089843443c49cbc7f46150b0fad0893" +checksum = "691ea1e31435c7e7d4d04705ec9d1c0d9482c46b2acf512bc723939d8f0af7fb" +dependencies = [ + "dashmap", + "gix-fs", + "libc", + "parking_lot", + "tempfile", +] [[package]] -name = "futures-task" -version = "0.3.32" +name = "gix-trace" +version = "0.1.19" source = 
"registry+https://github.com/rust-lang/crates.io-index" -checksum = "037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393" +checksum = "6f23569e55f2ffaf958617353b9734a7d52a7c19c439eeaa5e3efc217fd2270e" [[package]] -name = "futures-timer" -version = "3.0.3" +name = "gix-transport" +version = "0.57.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "f288b0a4f20f9a56b5d1da57e2227c661b7b16168e2f72365f57b63326e29b24" +checksum = "ffd6a5c676b92d4ead5f5a2b2935024415dec69edc997b6090ca9cac010a3018" +dependencies = [ + "bstr", + "gix-command", + "gix-features", + "gix-packetline", + "gix-quote", + "gix-sec", + "gix-url", + "thiserror 2.0.18", +] [[package]] -name = "futures-util" -version = "0.3.32" +name = "gix-traverse" +version = "0.57.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6" +checksum = "a14b7052c0786676c03e71fcfde7d7f0f8e8316e642b5cec6bb3998719b2ce5c" dependencies = [ - "futures-channel", - "futures-core", - "futures-io", - "futures-macro", - "futures-sink", - "futures-task", - "memchr", - "pin-project-lite", - "slab", + "bitflags 2.11.1", + "gix-commitgraph", + "gix-date", + "gix-hash", + "gix-hashtable", + "gix-object", + "gix-revwalk", + "smallvec", + "thiserror 2.0.18", ] [[package]] -name = "fxhash" -version = "0.2.1" +name = "gix-url" +version = "0.36.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "c31b6d751ae2c7f11320402d34e41349dd1016f8d5d45e48c4312bc8625af50c" +checksum = "35842d099e813f6f6bba529e88d4670572149c3df79b7a412952259887721ece" dependencies = [ - "byteorder", + "bstr", + "gix-path", + "percent-encoding", + "thiserror 2.0.18", ] [[package]] -name = "generic-array" -version = "0.14.7" +name = "gix-utils" +version = "0.3.2" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "85649ca51fd72272d7821adaf274ad91c288277713d9c18820d8499a7ff69e9a" +checksum = 
"4e477b4f07a6e8da4ba791c53c858102959703c60d70f199932010d5b94adb2c" dependencies = [ - "typenum", - "version_check", - "zeroize", + "bstr", + "fastrand", + "unicode-normalization", ] [[package]] -name = "getopts" -version = "0.2.24" +name = "gix-validate" +version = "0.11.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "cfe4fbac503b8d1f88e6676011885f34b7174f46e59956bba534ba83abded4df" +checksum = "e26ac2602b43eadfdca0560b81d3341944162a3c9f64ccdeef8fc501ad80dad5" dependencies = [ - "unicode-width", + "bstr", ] [[package]] -name = "getrandom" -version = "0.2.17" +name = "gix-worktree" +version = "0.52.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "ff2abc00be7fca6ebc474524697ae276ad847ad0a6b3faa4bcb027e9a4614ad0" +checksum = "d69955eb5e2910832f88d041964b809eee01dadd579237e0b55efec58fd406fd" dependencies = [ - "cfg-if", - "js-sys", - "libc", - "wasi", - "wasm-bindgen", + "bstr", + "gix-attributes", + "gix-fs", + "gix-glob", + "gix-hash", + "gix-ignore", + "gix-index", + "gix-object", + "gix-path", + "gix-validate", ] [[package]] -name = "getrandom" -version = "0.3.4" +name = "gix-worktree-state" +version = "0.30.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "899def5c37c4fd7b2664648c28120ecec138e4d395b459e5ca34f9cce2dd77fd" +checksum = "8a96dccbcf9e8fe0291c55f06e08da93ebb2e691c1311276f541eefcc6d70800" dependencies = [ - "cfg-if", - "js-sys", - "libc", - "r-efi 5.3.0", - "wasip2", - "wasm-bindgen", + "bstr", + "gix-features", + "gix-filter", + "gix-fs", + "gix-index", + "gix-object", + "gix-path", + "gix-worktree", + "io-close", + "thiserror 2.0.18", ] [[package]] -name = "getrandom" -version = "0.4.2" +name = "gix-worktree-stream" +version = "0.32.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "0de51e6874e94e7bf76d726fc5d13ba782deca734ff60d5bb2fb2607c7406555" +checksum = "9a8444b8ed4662e1a0c97f3eceda29630001a1bbb2632201e50312623e594213" 
dependencies = [ - "cfg-if", - "libc", - "r-efi 6.0.0", - "wasip2", - "wasip3", + "gix-attributes", + "gix-error", + "gix-features", + "gix-filter", + "gix-fs", + "gix-hash", + "gix-object", + "gix-path", + "gix-traverse", + "parking_lot", ] [[package]] @@ -1490,6 +2452,15 @@ dependencies = [ "zerocopy", ] +[[package]] +name = "hash32" +version = "0.3.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "47d60b12902ba28e2730cd37e95b8c9223af2808df9e902d4df49588d1470606" +dependencies = [ + "byteorder", +] + [[package]] name = "hashbrown" version = "0.14.5" @@ -1505,6 +2476,17 @@ dependencies = [ "foldhash 0.1.5", ] +[[package]] +name = "hashbrown" +version = "0.16.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "841d1cc9bed7f9236f321df977030373f4a4163ae1a7dbfe1a51a2c1a51d9100" +dependencies = [ + "allocator-api2", + "equivalent", + "foldhash 0.2.0", +] + [[package]] name = "hashbrown" version = "0.17.1" @@ -1525,6 +2507,16 @@ dependencies = [ "hashbrown 0.15.5", ] +[[package]] +name = "heapless" +version = "0.8.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0bfb9eb618601c89945a70e254898da93b13be0388091d42117462b265bb3fad" +dependencies = [ + "hash32", + "stable_deref_trait", +] + [[package]] name = "heck" version = "0.5.0" @@ -1853,6 +2845,16 @@ dependencies = [ "generic-array", ] +[[package]] +name = "io-close" +version = "0.3.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9cadcf447f06744f8ce713d2d6239bb5bde2c357a452397a9ed90c625da390bc" +dependencies = [ + "libc", + "winapi", +] + [[package]] name = "ipnet" version = "2.12.0" @@ -1900,6 +2902,47 @@ version = "1.0.18" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "8f42a60cbdf9a97f5d2305f08a87dc4e09308d1276d28c869c684d7777685682" +[[package]] +name = "jiff" +version = "0.2.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"f00b5dbd620d61dfdcb6007c9c1f6054ebd75319f163d886a9055cec1155073d" +dependencies = [ + "jiff-static", + "jiff-tzdb-platform", + "log", + "portable-atomic", + "portable-atomic-util", + "serde_core", + "windows-sys 0.61.2", +] + +[[package]] +name = "jiff-static" +version = "0.2.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e000de030ff8022ea1da3f466fbb0f3a809f5e51ed31f6dd931c35181ad8e6d7" +dependencies = [ + "proc-macro2", + "quote", + "syn 2.0.117", +] + +[[package]] +name = "jiff-tzdb" +version = "0.1.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c900ef84826f1338a557697dc8fc601df9ca9af4ac137c7fb61d4c6f2dfd3076" + +[[package]] +name = "jiff-tzdb-platform" +version = "0.1.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "875a5a69ac2bab1a891711cf5eccbec1ce0341ea805560dcd90b7a2e925132e8" +dependencies = [ + "jiff-tzdb", +] + [[package]] name = "jose-b64" version = "0.1.2" @@ -1974,6 +3017,15 @@ dependencies = [ "zeroize", ] +[[package]] +name = "kstring" +version = "2.0.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "558bf9508a558512042d3095138b1f7b8fe90c5467d94f9f1da28b3731c5dbd1" +dependencies = [ + "static_assertions", +] + [[package]] name = "lazy_static" version = "1.5.0" @@ -2107,12 +3159,32 @@ version = "0.9.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "8863b587001c1b9a8a4e36008cebc6b3612cb1226fe2de94858e06092687b608" +[[package]] +name = "maybe-async" +version = "0.2.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "746873a384ad60adc5db74471dfaba74bd278afbdcfd81db93fafcdfc8b5ca0c" +dependencies = [ + "proc-macro2", + "quote", + "syn 2.0.117", +] + [[package]] name = "memchr" version = "2.8.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79" +[[package]] +name = 
"memmap2" +version = "0.9.10" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "714098028fe011992e1c3962653c96b2d578c4b4bce9036e15ff220319b1e0e3" +dependencies = [ + "libc", +] + [[package]] name = "mime" version = "0.3.17" @@ -2155,6 +3227,18 @@ dependencies = [ "memchr", ] +[[package]] +name = "nonempty" +version = "0.12.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9737e026353e5cd0736f98eddae28665118eb6f6600902a7f50db585621fecb6" + +[[package]] +name = "normalize-line-endings" +version = "0.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "61807f77802ff30975e01f4f071c8ba10c022052f98b3294119f3e615d13e5be" + [[package]] name = "num-bigint-dig" version = "0.8.6" @@ -2499,6 +3583,21 @@ dependencies = [ "universal-hash", ] +[[package]] +name = "portable-atomic" +version = "1.13.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c33a9471896f1c69cecef8d20cbe2f7accd12527ce60845ff44c153bb2a21b49" + +[[package]] +name = "portable-atomic-util" +version = "0.2.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c2a106d1259c23fac8e543272398ae0e3c0b8d33c88ed73d0cc71b0f1d902618" +dependencies = [ + "portable-atomic", +] + [[package]] name = "potential_utf" version = "0.1.5" @@ -2529,6 +3628,36 @@ version = "0.1.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "925383efa346730478fb4838dbe9137d2a47675ad789c546d150a6e1dd4ab31c" +[[package]] +name = "predicates" +version = "3.1.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ada8f2932f28a27ee7b70dd6c1c39ea0675c55a36879ab92f3a715eaa1e63cfe" +dependencies = [ + "anstyle", + "difflib", + "float-cmp", + "normalize-line-endings", + "predicates-core", + "regex", +] + +[[package]] +name = "predicates-core" +version = "1.0.10" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"cad38746f3166b4031b1a0d39ad9f954dd291e7854fcc0eed52ee41a0b50d144" + +[[package]] +name = "predicates-tree" +version = "1.0.13" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d0de1b847b39c8131db0467e9df1ff60e6d0562ab8e9a16e568ad0fdb372e2f2" +dependencies = [ + "predicates-core", + "termtree", +] + [[package]] name = "prettyplease" version = "0.2.37" @@ -2579,6 +3708,15 @@ dependencies = [ "unicode-ident", ] +[[package]] +name = "prodash" +version = "31.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "962200e2d7d551451297d9fdce85138374019ada198e30ea9ede38034e27604c" +dependencies = [ + "parking_lot", +] + [[package]] name = "quinn" version = "0.11.9" @@ -3184,6 +4322,16 @@ dependencies = [ "digest 0.10.7", ] +[[package]] +name = "sha1-checked" +version = "0.10.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "89f599ac0c323ebb1c6082821a54962b839832b03984598375bff3975b804423" +dependencies = [ + "digest 0.10.7", + "sha1", +] + [[package]] name = "sha2" version = "0.9.9" @@ -3296,6 +4444,12 @@ version = "1.2.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "6ce2be8dc25455e1f91df71bfa12ad37d7af1092ae736f3a6cd0e37bc7810596" +[[package]] +name = "static_assertions" +version = "1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a2eb9349b6444b326872e140eb1cf5e7c522154d69e7a0ffb0fb81c06b37543f" + [[package]] name = "string_cache" version = "0.8.9" @@ -3420,6 +4574,12 @@ dependencies = [ "utf-8", ] +[[package]] +name = "termtree" +version = "0.5.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8f50febec83f5ee1df3015341d8bd429f2d1cc62bcba7ea2076759d315084683" + [[package]] name = "thiserror" version = "1.0.69" @@ -3712,6 +4872,7 @@ dependencies = [ name = "trusted-server-cli" version = "0.1.0" dependencies = [ + "assert_cmd", "base64", "chromiumoxide", "clap", @@ -3719,8 +4880,11 @@ 
dependencies = [ "dialoguer", "error-stack", "futures", + "gix", + "gix-config", "keyring", "log", + "predicates", "regex", "reqwest 0.12.28", "scraper", @@ -3842,12 +5006,27 @@ version = "0.1.7" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2896d95c02a80c6d6a5d6e953d479f5ddf2dfdb6a244441010e373ac0fb88971" +[[package]] +name = "unicode-bom" +version = "2.0.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7eec5d1121208364f6793f7d2e222bf75a915c19557537745b195b253dd64217" + [[package]] name = "unicode-ident" version = "1.0.24" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75" +[[package]] +name = "unicode-normalization" +version = "0.1.25" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5fd4f6878c9cb28d874b009da9e8d183b5abc80117c40bbd187a1fde336be6e8" +dependencies = [ + "tinyvec", +] + [[package]] name = "unicode-segmentation" version = "1.13.2" @@ -3965,6 +5144,15 @@ version = "0.9.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a" +[[package]] +name = "wait-timeout" +version = "0.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09ac3b126d3914f9849036f826e054cbabdc8519970b8998ddaf3b5bd3c65f11" +dependencies = [ + "libc", +] + [[package]] name = "walkdir" version = "2.5.0" @@ -4147,6 +5335,22 @@ dependencies = [ "libc", ] +[[package]] +name = "winapi" +version = "0.3.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5c839a674fcd7a98952e593242ea400abe93992746761e38641405d28b00f419" +dependencies = [ + "winapi-i686-pc-windows-gnu", + "winapi-x86_64-pc-windows-gnu", +] + +[[package]] +name = "winapi-i686-pc-windows-gnu" +version = "0.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6" + [[package]] name = "winapi-util" version = "0.1.11" @@ -4156,6 +5360,12 @@ dependencies = [ "windows-sys 0.61.2", ] +[[package]] +name = "winapi-x86_64-pc-windows-gnu" +version = "0.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f" + [[package]] name = "windows-core" version = "0.62.2" @@ -4632,6 +5842,12 @@ dependencies = [ "syn 2.0.117", ] +[[package]] +name = "zlib-rs" +version = "0.6.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3be3d40e40a133f9c916ee3f9f4fa2d9d63435b5fbe1bfc6d9dae0aa0ada1513" + [[package]] name = "zmij" version = "1.0.21" diff --git a/README.md b/README.md index 125e8a62..cc28fe95 100644 --- a/README.md +++ b/README.md @@ -60,6 +60,8 @@ cargo test_cli `ts audit` is host-only and currently expects a local Chrome/Chromium installation. It checks common PATH names and standard macOS app bundle locations. +`ts dev lint domains` checks that source, config, and docs only reference vetted URL hosts; run `ts dev install-hooks` once after cloning to wire it in as a pre-commit hook. See [CONTRIBUTING.md](CONTRIBUTING.md#wrench-local-setup) for setup. + See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines. ## License diff --git a/crates/trusted-server-cli/Cargo.toml b/crates/trusted-server-cli/Cargo.toml index e36dd0c5..4b3be6b6 100644 --- a/crates/trusted-server-cli/Cargo.toml +++ b/crates/trusted-server-cli/Cargo.toml @@ -32,7 +32,25 @@ trusted-server-core = { workspace = true } url = { workspace = true } keyring = { workspace = true } uuid = { workspace = true } +# `ts dev lint domains` and `ts dev install-hooks` (spec +# docs/superpowers/specs/2026-05-18-check-domains-design.md). 
+# Versions chosen during the Phase 2 feasibility spike; verified +# via `cargo tree -p gix -p gix-config` that no duplicate +# versions land in the lock file. +gix = { version = "0.83", default-features = false, features = [ + "blob-diff", + "index", + "revision", + "sha1", + # Production runtime does not need tree-editor, but the + # feasibility spike + Phase 4 unit tests construct fixture + # repos via gix-only APIs and need it. Keep enabled. + "tree-editor", +] } +gix-config = "0.56" [dev-dependencies] +assert_cmd = "2" +predicates = "3" temp-env = { workspace = true } tempfile = { workspace = true } diff --git a/crates/trusted-server-cli/src/dev/install_hooks.rs b/crates/trusted-server-cli/src/dev/install_hooks.rs new file mode 100644 index 00000000..8708e9ca --- /dev/null +++ b/crates/trusted-server-cli/src/dev/install_hooks.rs @@ -0,0 +1,517 @@ +//! `ts dev install-hooks` — installs the pre-commit hook that runs +//! `ts dev lint domains --staged`. +//! +//! Design: docs/superpowers/specs/2026-05-18-check-domains-design.md +//! +//! All git operations go through `gix` / `gix-config` — no +//! subprocess. The hook file itself is a tiny shell wrapper (git's +//! hook contract requires an executable artifact); it carries the +//! absolute path of the `ts` binary so it works from GUI git tools +//! that do not inherit the shell `PATH`. + +use core::error::Error; +use std::env; +use std::fs; +use std::path::{Path, PathBuf}; +use std::time::{SystemTime, UNIX_EPOCH}; + +use derive_more::Display; +use error_stack::{Report, ResultExt as _}; +use gix::bstr::BStr; +use gix_config::File as GixConfigFile; + +use crate::dev::InstallHooksArgs; +use crate::error::CliError; +use crate::output::write_stderr_line; +use crate::output::write_stdout_line; + +/// Marker line written into managed hook files. `is_managed` looks +/// for this to decide whether overwriting is safe. 
+const MANAGED_MARKER: &str = "# ts-install-hooks: managed"; + +/// Errors raised by `ts dev install-hooks`. +#[derive(Debug, Display)] +pub enum InstallHooksError { + /// Opening the git repository failed. + #[display("failed to open git repository")] + OpenRepo, + /// The repository has no working directory (bare repo). + #[display("repository has no working directory")] + NoWorkdir, + /// The path of the running executable could not be determined. + #[display("failed to determine the path of the ts executable")] + CurrentExe, + /// Writing the hook file failed. + #[display("failed to write the pre-commit hook")] + WriteHook, + /// Writing the git config failed. + #[display("failed to write git config")] + ConfigWrite, + /// An existing, unmanaged pre-commit hook would be overwritten. + #[display("refusing to overwrite existing hook at `{}`", path.display())] + WouldClobber { + /// The existing hook file. + path: PathBuf, + }, + /// `core.hooksPath` is already set to a foreign value. + #[display("refusing to override existing core.hooksPath `{current}` (would set `{proposed}`)")] + ForeignHooksPath { + /// The current `core.hooksPath` value. + current: String, + /// The value `install-hooks` would set. + proposed: String, + }, +} + +impl Error for InstallHooksError {} + +/// POSIX single-quote escaping: wrap in `'...'`, and replace every +/// embedded single quote with `'\''` (close, escaped quote, reopen). +fn shell_quote(s: &str) -> String { + let mut out = String::with_capacity(s.len() + 2); + out.push('\''); + for c in s.chars() { + if c == '\'' { + out.push_str(r"'\''"); + } else { + out.push(c); + } + } + out.push('\''); + out +} + +/// Render the pre-commit hook script that runs the linter against +/// staged changes. The `ts` path is shell-quoted and absolute. +fn render_hook(ts_path: &Path) -> String { + format!( + "#!/usr/bin/env bash\n\ + # Installed by `ts dev install-hooks`. 
DO NOT EDIT.\n\ + {MANAGED_MARKER}\n\ + exec {} dev lint domains --staged\n", + shell_quote(&ts_path.to_string_lossy()), + ) +} + +/// Whether `hook_path` is a hook this tool previously installed — +/// detected by the [`MANAGED_MARKER`] line near the top of the file. +fn is_managed(hook_path: &Path) -> Result<bool, Report<InstallHooksError>> { + let content = match fs::read_to_string(hook_path) { + Ok(c) => c, + Err(e) if e.kind() == std::io::ErrorKind::NotFound => return Ok(false), + Err(e) => { + return Err(Report::new(InstallHooksError::WriteHook).attach(e.to_string())); + } + }; + Ok(content + .lines() + .take(10) + .any(|line| line.trim() == MANAGED_MARKER)) +} + +/// Write `content` to `path` atomically: write a sibling temp file, +/// then rename it over the target (atomic on the same filesystem). +fn write_atomic(path: &Path, content: &[u8]) -> Result<(), Report<InstallHooksError>> { + let dir = path + .parent() + .ok_or_else(|| Report::new(InstallHooksError::WriteHook))?; + let nanos = SystemTime::now() + .duration_since(UNIX_EPOCH) + .map(|d| d.as_nanos()) + .unwrap_or(0); + let tmp = dir.join(format!( + ".ts-install-hooks.tmp.{}.{nanos}", + std::process::id() + )); + fs::write(&tmp, content).change_context(InstallHooksError::WriteHook)?; + fs::rename(&tmp, path).change_context(InstallHooksError::WriteHook)?; + Ok(()) +} + +/// Read a single dotted-key value from the local repo config. +/// Returns `Ok(None)` if the config file is missing or the key is +/// unset; propagates any other open/parse error (permission denied, +/// malformed config) as [`InstallHooksError::ConfigWrite`] so the +/// foreign-`core.hooksPath` preflight can't silently misread a +/// real config and overwrite it. 
+fn read_local_config_value( + repo: &gix::Repository, + dotted_key: &str, +) -> Result<Option<String>, Report<InstallHooksError>> { + let config_path = repo.git_dir().join("config"); + if !config_path.exists() { + return Ok(None); + } + let file = GixConfigFile::from_path_no_includes(config_path, gix_config::Source::Local) + .change_context(InstallHooksError::ConfigWrite)?; + Ok(file + .raw_value(dotted_key) + .ok() + .map(|bytes| String::from_utf8_lossy(&bytes).into_owned())) +} + +/// Set a dotted-key value in the local repo config, writing the file +/// back atomically. +fn set_local_config_value( + repo: &gix::Repository, + dotted_key: &str, + value: &str, +) -> Result<(), Report<InstallHooksError>> { + let config_path = repo.git_dir().join("config"); + let mut file = match GixConfigFile::from_path_no_includes( + config_path.clone(), + gix_config::Source::Local, + ) { + Ok(f) => f, + Err(_) => GixConfigFile::new(gix_config::file::Metadata::from(gix_config::Source::Local)), + }; + let value_bstr: &BStr = value.into(); + file.set_raw_value(dotted_key, value_bstr) + .change_context(InstallHooksError::ConfigWrite)?; + let serialized = file.to_bstring(); + write_atomic(&config_path, serialized.as_slice()).change_context(InstallHooksError::ConfigWrite) +} + +/// Install the pre-commit hook into the repository at `repo_path`. +/// +/// Writes `.githooks/pre-commit` and sets `core.hooksPath` to +/// `.githooks`. Refuses to clobber an unmanaged hook or a foreign +/// `core.hooksPath` unless `force` is set. +/// +/// # Errors +/// +/// Returns [`InstallHooksError`] on any failure; see the variants. +pub fn install_hooks(repo_path: &Path, force: bool) -> Result<(), Report<InstallHooksError>> { + let repo = gix::open(repo_path).change_context(InstallHooksError::OpenRepo)?; + let work_dir = repo + .workdir() + .ok_or_else(|| Report::new(InstallHooksError::NoWorkdir))? + .to_path_buf(); + let ts_path = env::current_exe().change_context(InstallHooksError::CurrentExe)?; + + // Preflight: refuse to override a foreign core.hooksPath. 
+ let existing_hooks_path = read_local_config_value(&repo, "core.hooksPath")?; + let displaced_hooks_path = match existing_hooks_path.as_deref() { + None | Some("") | Some(".githooks") => None, + Some(other) if !force => { + return Err(Report::new(InstallHooksError::ForeignHooksPath { + current: other.to_string(), + proposed: ".githooks".to_string(), + })); + } + Some(other) => Some(other.to_string()), + }; + + let hooks_dir = work_dir.join(".githooks"); + let hook_path = hooks_dir.join("pre-commit"); + fs::create_dir_all(&hooks_dir).change_context(InstallHooksError::WriteHook)?; + + // Refuse to clobber an unmanaged hook. + if hook_path.exists() && !is_managed(&hook_path)? && !force { + return Err(Report::new(InstallHooksError::WouldClobber { + path: hook_path.clone(), + })); + } + // Under --force, back up any existing hook before replacing it. + // Nanosecond precision prevents two forced installs within the + // same second from clobbering each other's backup. + if hook_path.exists() && force { + let nanos = SystemTime::now() + .duration_since(UNIX_EPOCH) + .map(|d| d.as_nanos()) + .unwrap_or(0); + let backup = hook_path.with_extension(format!("bak.{nanos}")); + fs::rename(&hook_path, &backup).change_context(InstallHooksError::WriteHook)?; + } + + write_atomic(&hook_path, render_hook(&ts_path).as_bytes())?; + set_executable(&hook_path)?; + set_local_config_value(&repo, "core.hooksPath", ".githooks")?; + + write_stdout_line(format!( + "Installed: pre-commit hook -> {} (runs {})", + hook_path.display(), + ts_path.display(), + )) + .change_context(InstallHooksError::WriteHook)?; + if let Some(prev) = displaced_hooks_path { + write_stderr_line(format!( + "note: previous core.hooksPath was `{prev}`. \ + To restore: git config --local core.hooksPath {prev}" + )) + .change_context(InstallHooksError::WriteHook)?; + } + Ok(()) +} + +/// Set the executable bit on `path` (Unix only; a no-op elsewhere). 
+#[cfg(unix)] +fn set_executable(path: &Path) -> Result<(), Report<InstallHooksError>> { + use std::os::unix::fs::PermissionsExt as _; + let mut perms = fs::metadata(path) + .change_context(InstallHooksError::WriteHook)? + .permissions(); + perms.set_mode(0o755); + fs::set_permissions(path, perms).change_context(InstallHooksError::WriteHook) +} + +#[cfg(not(unix))] +fn set_executable(_path: &Path) -> Result<(), Report<InstallHooksError>> { + Ok(()) +} + +/// `ts dev install-hooks` entry point. +/// +/// # Errors +/// +/// Returns [`CliError::EnvironmentError`] on any install failure — +/// every install-hooks failure is an environment / configuration +/// issue. +pub fn run(args: &InstallHooksArgs) -> Result<(), Report<CliError>> { + install_hooks(Path::new("."), args.force).change_context(CliError::EnvironmentError) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn shell_quote_plain_path() { + assert_eq!(shell_quote("/usr/bin/ts"), "'/usr/bin/ts'"); + } + + #[test] + fn shell_quote_path_with_spaces() { + assert_eq!( + shell_quote("/Users/Alice Q/.cargo/bin/ts"), + "'/Users/Alice Q/.cargo/bin/ts'" + ); + } + + #[test] + fn shell_quote_path_with_single_quote() { + // close, escaped quote, reopen + assert_eq!(shell_quote("/path/o'brien/ts"), r"'/path/o'\''brien/ts'"); + } + + #[test] + fn shell_quote_path_with_dollar_backtick_backslash() { + // $, backtick, backslash are all literal inside single quotes. 
+ assert_eq!(shell_quote("/opt/$HOME/ts"), "'/opt/$HOME/ts'"); + assert_eq!(shell_quote("/opt/`x`/ts"), "'/opt/`x`/ts'"); + assert_eq!(shell_quote(r"/opt/a\b/ts"), r"'/opt/a\b/ts'"); + } + + #[test] + fn render_hook_quotes_path_and_carries_marker() { + let hook = render_hook(Path::new("/Users/Alice Q/.cargo/bin/ts")); + assert!( + hook.contains("exec '/Users/Alice Q/.cargo/bin/ts' dev lint domains --staged"), + "hook should exec the quoted ts path: {hook}" + ); + assert!( + hook.lines().any(|l| l == MANAGED_MARKER), + "hook should carry the managed marker: {hook}" + ); + assert!(hook.starts_with("#!/usr/bin/env bash\n")); + } + + #[test] + fn is_managed_detects_marker() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let managed = temp.path().join("managed"); + fs::write(&managed, render_hook(Path::new("/usr/bin/ts"))) + .expect("should write managed hook"); + assert!(is_managed(&managed).expect("should read managed hook")); + + let foreign = temp.path().join("foreign"); + fs::write(&foreign, "#!/bin/sh\necho hi\n").expect("should write foreign hook"); + assert!(!is_managed(&foreign).expect("should read foreign hook")); + + let absent = temp.path().join("absent"); + assert!(!is_managed(&absent).expect("absent hook reads as not managed")); + } + + #[test] + fn write_atomic_writes_and_leaves_no_temp() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let target = temp.path().join("file"); + write_atomic(&target, b"hello").expect("should write atomically"); + assert_eq!( + fs::read(&target).expect("should read written file"), + b"hello" + ); + let leftovers: Vec<_> = fs::read_dir(temp.path()) + .expect("should read tempdir") + .filter_map(Result::ok) + .filter(|e| { + e.file_name() + .to_string_lossy() + .contains(".ts-install-hooks.tmp.") + }) + .collect(); + assert!(leftovers.is_empty(), "no temp file should remain"); + } +} + +#[cfg(test)] +mod config_tests { + use super::*; + use crate::dev::lint::test_support; + + #[test] 
+ fn read_returns_none_when_unset() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + let value = read_local_config_value(&repo, "core.hooksPath").expect("should read config"); + assert!(value.is_none(), "unset key reads as None: {value:?}"); + } + + #[test] + fn write_then_read_round_trips_and_persists() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + + set_local_config_value(&repo, "core.hooksPath", ".githooks").expect("should write config"); + let value = + read_local_config_value(&repo, "core.hooksPath").expect("should read config back"); + assert_eq!(value.as_deref(), Some(".githooks")); + + let on_disk = + fs::read_to_string(repo.git_dir().join("config")).expect("should read .git/config"); + assert!( + on_disk.contains("[core]") && on_disk.contains("hooksPath"), + "on-disk config should carry core/hooksPath: {on_disk}" + ); + } +} + +#[cfg(test)] +mod install_hooks_tests { + use super::*; + use crate::dev::lint::test_support; + + fn hooks_path_value(repo_path: &Path) -> Option<String> { + let repo = gix::open(repo_path).expect("should reopen repo"); + read_local_config_value(&repo, "core.hooksPath").expect("should read hooksPath") + } + + #[test] + fn fresh_repo_installs_hook_and_sets_config() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let _repo = test_support::init_repo(temp.path()); + + install_hooks(temp.path(), false).expect("should install into a fresh repo"); + + let hook = temp.path().join(".githooks/pre-commit"); + assert!(hook.is_file(), "hook file should exist"); + let content = fs::read_to_string(&hook).expect("should read hook"); + assert!( + content.contains(MANAGED_MARKER), + "hook should carry the marker" + ); + assert!( + content.contains("dev lint domains --staged"), + "hook should exec the linter" + ); + assert_eq!(hooks_path_value(temp.path()).as_deref(), Some(".githooks")); + } + 
+ #[test] + fn re_running_is_idempotent() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let _repo = test_support::init_repo(temp.path()); + + install_hooks(temp.path(), false).expect("first install should succeed"); + install_hooks(temp.path(), false).expect("re-install should be idempotent"); + assert_eq!(hooks_path_value(temp.path()).as_deref(), Some(".githooks")); + } + + #[test] + fn overwrites_managed_hook_silently() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let _repo = test_support::init_repo(temp.path()); + + install_hooks(temp.path(), false).expect("first install should succeed"); + // A managed hook is present; a second non-forced install must + // still succeed (silent overwrite). + install_hooks(temp.path(), false).expect("managed hook should be overwritten silently"); + } + + #[test] + fn refuses_to_clobber_unmanaged_hook() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let _repo = test_support::init_repo(temp.path()); + + let hooks_dir = temp.path().join(".githooks"); + fs::create_dir_all(&hooks_dir).expect("should create .githooks"); + fs::write(hooks_dir.join("pre-commit"), "#!/bin/sh\necho custom\n") + .expect("should write unmanaged hook"); + + let err = install_hooks(temp.path(), false) + .expect_err("should refuse to clobber an unmanaged hook"); + assert!( + matches!( + err.current_context(), + InstallHooksError::WouldClobber { .. 
} + ), + "should be WouldClobber: {err:?}" + ); + } + + #[test] + fn force_backs_up_unmanaged_hook() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let _repo = test_support::init_repo(temp.path()); + + let hooks_dir = temp.path().join(".githooks"); + fs::create_dir_all(&hooks_dir).expect("should create .githooks"); + fs::write(hooks_dir.join("pre-commit"), "#!/bin/sh\necho custom\n") + .expect("should write unmanaged hook"); + + install_hooks(temp.path(), true).expect("force should overwrite"); + + // The new hook is managed; a backup of the old one exists. + let content = + fs::read_to_string(hooks_dir.join("pre-commit")).expect("should read new hook"); + assert!(content.contains(MANAGED_MARKER)); + let has_backup = fs::read_dir(&hooks_dir) + .expect("should read hooks dir") + .filter_map(Result::ok) + .any(|e| { + e.file_name() + .to_string_lossy() + .starts_with("pre-commit.bak.") + }); + assert!(has_backup, "the displaced hook should be backed up"); + } + + #[test] + fn refuses_foreign_hooks_path() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + set_local_config_value(&repo, "core.hooksPath", "hooks") + .expect("should seed foreign hooksPath"); + + let err = + install_hooks(temp.path(), false).expect_err("should refuse a foreign core.hooksPath"); + assert!( + matches!( + err.current_context(), + InstallHooksError::ForeignHooksPath { .. 
} + ), + "should be ForeignHooksPath: {err:?}" + ); + } + + #[test] + fn force_overrides_foreign_hooks_path() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + set_local_config_value(&repo, "core.hooksPath", "hooks") + .expect("should seed foreign hooksPath"); + + install_hooks(temp.path(), true).expect("force should override foreign hooksPath"); + assert_eq!(hooks_path_value(temp.path()).as_deref(), Some(".githooks")); + } +} diff --git a/crates/trusted-server-cli/src/dev/lint/domains.rs b/crates/trusted-server-cli/src/dev/lint/domains.rs new file mode 100644 index 00000000..e72e2123 --- /dev/null +++ b/crates/trusted-server-cli/src/dev/lint/domains.rs @@ -0,0 +1,2073 @@ +//! `ts dev lint domains` — URL-host linter. +//! +//! Design: docs/superpowers/specs/2026-05-18-check-domains-design.md + +use core::error::Error; +use core::ops::ControlFlow; +use std::cell::RefCell; +use std::collections::{BTreeSet, HashSet}; +use std::env; +use std::fs; +use std::io::{self, ErrorKind}; +use std::path::{Path, PathBuf}; +use std::str::from_utf8; +use std::sync::OnceLock; + +use derive_more::Display; +use error_stack::{Report, ResultExt as _}; +use gix::ObjectId; +use gix::diff::Rewrites; +use gix::diff::blob::{Algorithm, Diff, InternedInput}; +use gix::index::entry::Mode as IndexEntryMode; +use gix::object::tree::EntryKind; +use gix::object::tree::diff::Change; +use regex::Regex; +use serde::Serialize; +use serde_json::json; + +use crate::dev::lint::{DomainsArgs, OutputFormat}; +use crate::error::CliError; +use crate::output::{write_json, write_stderr_line, write_stdout_line}; + +/// Integration proxies and loopback hosts that must match exactly. +/// Subdomains are NOT allowed (e.g., `anything.api.privacy-center.org` +/// is disallowed). See spec §"Exact-match hosts" for the policy. 
+pub const EXACT_HOSTS: &[&str] = &[ + // Loopback + "127.0.0.1", + "::1", + "localhost", + // didomi + "api.privacy-center.org", + "sdk.privacy-center.org", + // sourcepoint + "cdn.privacy-mgmt.com", + // lockr + "aim.loc.kr", + "identity.loc.kr", + // datadome + "js.datadome.co", + "api-js.datadome.co", + // aps / Amazon + "aax.amazon-adsystem.com", + "aax-events.amazon-adsystem.com", + // permutive + "api.permutive.com", + "secure-signals.permutive.app", + "cdn.permutive.com", + // Google Tag Manager / Analytics + "www.googletagmanager.com", + "www.google-analytics.com", + "analytics.google.com", + // adserver mock + "securepubads.g.doubleclick.net", + "origin-mocktioneer.cdintel.com", + // Prebid CDN + "cdn.prebid.org", + // Fastly platform + "api.fastly.com", +]; + +/// Hosts where exact match AND any subdomain (`*.host`) is allowed. +/// See spec §"Subdomain-permitting hosts" and §"Allowlist +/// Maintenance Policy" for the bar to add an entry here. +pub const SUBDOMAIN_HOSTS: &[&str] = &[ + // IANA RFC 2606 reserved + "example.com", + "example.net", + "example.org", + // Permutive: runtime host is {organization_id}.edge.permutive.app + "edge.permutive.app", +]; + +/// Well-known documentation and specification sources. Exact-match, +/// allowed in every scanned file. See spec §"Reference / doc hosts" +/// for the curated list (seeded from a sampling; expected to grow +/// during Stage 1 doc cleanup). 
+pub const REFERENCE_HOSTS: &[&str] = &[ + // Git / GitHub + "github.com", + "docs.github.com", + "help.github.com", + "token.actions.githubusercontent.com", + // Git commit conventions + "chris.beams.io", + // Rust + "docs.rs", + "doc.rust-lang.org", + "crates.io", + // Web / W3C standards + "www.w3.org", + "schema.org", + // Versioning / changelogs + "semver.org", + "keepachangelog.com", + // IAB Tech Lab + "iab.com", + "iabtechlab.com", + "iabtechlab.github.io", + "iabeurope.github.io", + // Specs (supply chain) + "in-toto.io", + "rslstandard.org", + // Specs (other) + "webassembly.org", + // Fastly docs + "www.fastly.com", + "developer.fastly.com", + "manage.fastly.com", + // Cloudflare docs + "developers.cloudflare.com", + // Vendor docs + "docs.datadome.co", + "docs.prebid.org", + // Tooling docs + "vitepress.dev", + "playwright.dev", + "testcontainers.com", + "grafana.com", + "docsearch.algolia.com", +]; + +/// IANA RFC 2606 reserved TLDs. Any host ending in one of these is allowed. +pub const RESERVED_TLDS: &[&str] = &[".example", ".test", ".invalid", ".localhost"]; + +/// Errors raised by the domains linter. +#[derive(Debug, Display)] +pub enum DomainsLintError { + /// Opening the git repository failed. + #[display("failed to open git repository")] + OpenRepo, + /// Reading the git index failed. + #[display("failed to read git index")] + Index, + /// Computing a blob or tree diff failed. + #[display("failed to compute diff")] + Diff, + /// A git reference could not be resolved. + #[display("failed to resolve reference `{_0}`")] + Reference(String), + /// No merge-base exists between the base ref and HEAD. + #[display("failed to compute merge-base of `{base}` and HEAD")] + MergeBase { + /// The base reference that was requested. + base: String, + }, + /// A file could not be read. + #[display("failed to read file `{}`", _0.display())] + ReadFile(PathBuf), + /// An explicitly-named path does not exist. 
+ #[display("path not found: `{}`", _0.display())] + PathNotFound(PathBuf), + /// An explicitly-named path could not be read for permission reasons. + #[display("permission denied reading `{}`", _0.display())] + PermissionDenied(PathBuf), + /// Failure writing a warning to stderr (broken pipe, etc.). + /// + /// Used by the in-module [`warn`] helper so collectors can call + /// [`crate::output::write_stderr_line`] and still return + /// `Report<DomainsLintError>` consistently. + #[display("I/O error writing warning to stderr")] + WriteWarning, +} + +impl Error for DomainsLintError {} + +/// In-module warning helper. +/// +/// Wraps the CLI's [`crate::output::write_stderr_line`] (which +/// returns `Report<CliError>`) so callers inside `domains` can stay +/// on `Report<DomainsLintError>` without inventing custom `?` +/// conversions at every call site. +/// +/// # Errors +/// +/// Returns [`DomainsLintError::WriteWarning`] if writing to stderr +/// fails (e.g., a broken pipe). +fn warn(msg: impl Into<String>) -> Result<(), Report<DomainsLintError>> { + write_stderr_line(msg.into()).change_context(DomainsLintError::WriteWarning) +} + +/// Normalise an extracted URL host: strip bracketed-IPv6 `[ ]`, +/// drop any trailing FQDN dot, and lowercase. Pure function; no I/O. 
+fn normalise_host(raw: &str) -> String { + let trimmed = raw + .trim_start_matches('[') + .trim_end_matches(']') + .trim_end_matches('.'); + trimmed.to_lowercase() +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn normalise_lowercases() { + assert_eq!(normalise_host("EXAMPLE.COM"), "example.com"); + assert_eq!(normalise_host("Foo.Example.Com"), "foo.example.com"); + } + + #[test] + fn normalise_strips_ipv6_brackets() { + assert_eq!(normalise_host("[::1]"), "::1"); + assert_eq!(normalise_host("[2001:DB8::1]"), "2001:db8::1"); + } + + #[test] + fn normalise_passthrough_for_plain_hosts() { + assert_eq!(normalise_host("test.com"), "test.com"); + assert_eq!(normalise_host("127.0.0.1"), "127.0.0.1"); + } + + #[test] + fn normalise_trims_trailing_fqdn_dot() { + assert_eq!(normalise_host("example.com."), "example.com"); + assert_eq!(normalise_host("Example.Com."), "example.com"); + } +} + +/// Decide whether a normalised host is allowed. +/// +/// Order: per-line suppression set, reserved-TLD suffix, exact match +/// against [`EXACT_HOSTS`] and [`REFERENCE_HOSTS`], then the subdomain +/// rule against [`SUBDOMAIN_HOSTS`]. 
+fn is_allowed(host: &str, suppressed_on_line: &HashSet<String>) -> bool { + if suppressed_on_line.contains(host) { + return true; + } + if RESERVED_TLDS.iter().any(|t| host.ends_with(t)) { + return true; + } + if EXACT_HOSTS.contains(&host) { + return true; + } + if REFERENCE_HOSTS.contains(&host) { + return true; + } + if SUBDOMAIN_HOSTS + .iter() + .any(|e| host == *e || host.ends_with(&format!(".{e}"))) + { + return true; + } + false +} + +#[cfg(test)] +mod allow_check_tests { + use super::*; + + fn nothing_suppressed() -> HashSet<String> { + HashSet::new() + } + + #[test] + fn exact_match_allows() { + assert!(is_allowed("api.fastly.com", &nothing_suppressed())); + assert!(is_allowed("127.0.0.1", &nothing_suppressed())); + } + + #[test] + fn exact_only_rejects_subdomain() { + // EXACT_HOSTS entries are exact-only: a subdomain of an + // exact host is NOT allowed. + assert!(!is_allowed("v2.api.fastly.com", &nothing_suppressed())); + assert!(!is_allowed( + "anything.api.privacy-center.org", + &nothing_suppressed() + )); + } + + #[test] + fn subdomain_list_allows_apex_and_subdomains() { + assert!(is_allowed("example.com", &nothing_suppressed())); + assert!(is_allowed("foo.example.com", &nothing_suppressed())); + assert!(is_allowed("a.b.example.com", &nothing_suppressed())); + assert!(is_allowed("example.net", &nothing_suppressed())); + assert!(is_allowed("assets.example.net", &nothing_suppressed())); + } + + #[test] + fn lookalike_attack_rejected() { + // example.com.evil.com is not a subdomain of example.com.
+ assert!(!is_allowed("example.com.evil.com", &nothing_suppressed())); + assert!(!is_allowed("notexample.com", &nothing_suppressed())); + } + + #[test] + fn reserved_tld_allows() { + assert!(is_allowed("testlight.example", &nothing_suppressed())); + assert!(is_allowed("something.test", &nothing_suppressed())); + assert!(is_allowed("thing.invalid", &nothing_suppressed())); + assert!(is_allowed("my.localhost", &nothing_suppressed())); + } + + #[test] + fn reference_hosts_allowed_everywhere() { + assert!(is_allowed("github.com", &nothing_suppressed())); + assert!(is_allowed("docs.rs", &nothing_suppressed())); + // But NOT subdomains of REFERENCE_HOSTS (exact-match). + assert!(!is_allowed("other.github.com", &nothing_suppressed())); + } + + #[test] + fn suppression_set_allows() { + let mut suppressed = HashSet::new(); + suppressed.insert("evil.com".to_string()); + assert!(is_allowed("evil.com", &suppressed)); + } + + #[test] + fn rejects_unrelated_host() { + assert!(!is_allowed("test.com", &nothing_suppressed())); + assert!(!is_allowed("1.2.3.4", &nothing_suppressed())); + assert!(!is_allowed("192.168.1.1", &nothing_suppressed())); + } +} + +/// Regex for absolute `http(s)://` URLs. Case-insensitive; the host +/// must start with an alphanumeric character so placeholders like +/// `https://...` are rejected. +/// +/// `(?:[^/?\s#]+@)?` skips any RFC 3986 `userinfo@` prefix so the +/// captured host is the real authority, not a deceiving `user@` +/// part. Without this, `https://github.com@test.com/path` would +/// extract the allowlisted `github.com` and miss the actual host +/// `test.com`. +fn absolute_url_regex() -> &'static Regex { + static R: OnceLock<Regex> = OnceLock::new(); + R.get_or_init(|| { + Regex::new(r"(?i)https?://(?:[^/?\s#]+@)?(\[[0-9a-fA-F:]+\]|[A-Za-z0-9][A-Za-z0-9.\-]*)") + .expect("should compile absolute URL regex") + }) +} + +/// Extract and normalise every host from absolute URLs on `line`.
+fn extract_absolute_hosts(line: &str) -> Vec<String> { + absolute_url_regex() + .captures_iter(line) + .filter_map(|c| c.get(1).map(|m| normalise_host(m.as_str()))) + .collect() +} + +#[cfg(test)] +mod absolute_url_tests { + use super::*; + + #[test] + fn extracts_plain() { + assert_eq!( + extract_absolute_hosts("see https://example.com/path here"), + vec!["example.com"] + ); + } + + #[test] + fn extracts_bracketed_ipv6() { + assert_eq!( + extract_absolute_hosts("dial http://[::1]:8080/"), + vec!["::1"] + ); + } + + #[test] + fn extracts_uppercase_normalised() { + assert_eq!( + extract_absolute_hosts("HTTPS://Example.COM/x"), + vec!["example.com"] + ); + } + + #[test] + fn rejects_dots_only_placeholder() { + assert!(extract_absolute_hosts("see https://... for an example").is_empty()); + } + + #[test] + fn handles_punctuation_wrapping() { + for s in [ + "\"https://example.com\",", + "(https://example.com)", + "<https://example.com>", + ] { + assert_eq!(extract_absolute_hosts(s), vec!["example.com"], "input: {s}"); + } + } + + #[test] + fn extracts_multiple_per_line() { + assert_eq!( + extract_absolute_hosts("see [a](https://github.com/x) and [b](https://example.com/y)"), + vec!["github.com", "example.com"] + ); + } + + /// Regression for the userinfo-bypass: an allowlisted host placed + /// in the userinfo position must not hide the real authority. + #[test] + fn userinfo_bypass_extracts_real_host() { + assert_eq!( + extract_absolute_hosts("fetch(\"https://github.com@test.com/path\")"), + vec!["test.com"] + ); + // user:password@host form + assert_eq!( + extract_absolute_hosts("fetch(\"https://user:pw@evil.example/path\")"), + vec!["evil.example"] + ); + // Multiple @ in userinfo: the last @ is the authority boundary.
+ assert_eq!( + extract_absolute_hosts("fetch(\"https://a@b@c.evil/path\")"), + vec!["c.evil"] + ); + } + + #[test] + fn no_userinfo_still_works() { + assert_eq!( + extract_absolute_hosts("https://example.com:8080/path"), + vec!["example.com"] + ); + } +} + +/// Regex for protocol-relative `//host/...` URLs. The `//` must be +/// preceded by a boundary character (start-of-line, whitespace, +/// quote, paren, `=`, `<`, `>`, `{`, `,`, `[`, `]`, backtick) but +/// NOT `:`, which would double-match the `//` in an absolute URL. +/// `(?:[^/?\s#]+@)?` skips any RFC 3986 userinfo so a deceiving +/// `//user@evil.com` pattern reports `evil.com`, not `user`. The +/// host requires a dotted TLD-like suffix to filter out code +/// comment dividers. +fn protocol_relative_regex() -> &'static Regex { + static R: OnceLock<Regex> = OnceLock::new(); + R.get_or_init(|| { + Regex::new( + r#"(?i)(?:^|[\s"'(=<>{,\[\]`])//(?:[^/?\s#]+@)?([A-Za-z0-9][A-Za-z0-9.\-]*\.[A-Za-z]{2,})"#, + ) + .expect("should compile protocol-relative URL regex") + }) +} + +/// Extract and normalise every host from protocol-relative URLs.
+fn extract_protocol_relative_hosts(line: &str) -> Vec<String> { + protocol_relative_regex() + .captures_iter(line) + .filter_map(|c| c.get(1).map(|m| normalise_host(m.as_str()))) + .collect() +} + +#[cfg(test)] +mod protocol_relative_tests { + use super::*; + + #[test] + fn extracts_after_quote() { + assert_eq!( + extract_protocol_relative_hosts("src=\"//www.googletagmanager.com/gtm.js\""), + vec!["www.googletagmanager.com"] + ); + } + + #[test] + fn extracts_after_start_of_line() { + assert_eq!( + extract_protocol_relative_hosts("//cdn.example.evil/foo"), + vec!["cdn.example.evil"] + ); + } + + #[test] + fn extracts_template_literal_backtick() { + assert_eq!( + extract_protocol_relative_hosts("`//cdn.example.evil/${path}`"), + vec!["cdn.example.evil"] + ); + } + + #[test] + fn extracts_json_object_value() { + assert_eq!( + extract_protocol_relative_hosts("{\"src\": \"//cdn.example.evil/x\"}"), + vec!["cdn.example.evil"] + ); + } + + #[test] + fn does_not_match_colon_prefix() { + // http://foo.com — // is preceded by ':', NOT in the boundary class. + assert!(extract_protocol_relative_hosts("http://foo.com/x").is_empty()); + } + + #[test] + fn does_not_match_code_comment_divider() { + // The trailing TLD-like constraint (.{2,}) filters this out; + // "comment text" has no dotted-suffix. + assert!(extract_protocol_relative_hosts("// comment text").is_empty()); + } + + /// Regression for the userinfo-bypass on protocol-relative URLs. + #[test] + fn userinfo_bypass_extracts_real_host() { + assert_eq!( + extract_protocol_relative_hosts("src=\"//github.com@evil.example/x\""), + vec!["evil.example"] + ); + } + + /// Documents an accepted limitation: a `//email@domain` token in + /// a code comment is indistinguishable from a protocol-relative + /// URL with userinfo, and so is reported.
Preserving the + /// userinfo-bypass protection (above) is the higher-priority + /// constraint; users can suppress per-line with + /// `// allow-domain: domain.com` when the email is intentional. + #[test] + fn comment_style_email_is_flagged_by_design() { + assert_eq!( + extract_protocol_relative_hosts("//support@test.com"), + vec!["test.com"] + ); + } +} + +/// Regex for the per-line suppression marker. The comment introducer +/// (`//`, `#`, `|$)", + ) + .expect("should compile suppression marker regex") + }) +} + +/// Result of parsing a line for a suppression marker. +#[derive(Debug, Default, PartialEq, Eq)] +pub struct LineSuppression { + /// Hosts listed in the marker (post-trim, lowercased). + pub suppressed: HashSet, +} + +/// Parse the `allow-domain:` marker on `line`, if present. Splits the +/// captured host list on `,`, trims each entry, lowercases, and +/// drops empties. +fn parse_suppression_marker(line: &str) -> LineSuppression { + let mut out = LineSuppression::default(); + let Some(caps) = suppression_marker_regex().captures(line) else { + return out; + }; + let Some(m) = caps.get(1) else { + return out; + }; + for host in m.as_str().split(',') { + let host = host.trim(); + if !host.is_empty() { + out.suppressed.insert(normalise_host(host)); + } + } + out +} + +#[cfg(test)] +mod suppression_tests { + use super::*; + + fn parse(line: &str) -> HashSet { + parse_suppression_marker(line).suppressed + } + + #[test] + fn single_host_after_slash_comment() { + let got = parse("let x = \"https://evil.com\"; // allow-domain: evil.com"); + let expected: HashSet = ["evil.com".to_string()].into_iter().collect(); + assert_eq!(got, expected); + } + + #[test] + fn html_comment_form_with_trailing_space() { + // Captured group includes trailing space before --> ; trim handles it. 
+ let got = parse("<!-- allow-domain: test.com -->"); + let expected: HashSet<String> = ["test.com".to_string()].into_iter().collect(); + assert_eq!(got, expected); + } + + #[test] + fn hash_comment_form() { + let got = parse("upstream = \"https://evil.com\" # allow-domain: evil.com"); + let expected: HashSet<String> = ["evil.com".to_string()].into_iter().collect(); + assert_eq!(got, expected); + } + + #[test] + fn multi_host_with_whitespace() { + let got = parse("// allow-domain: a.com , b.com , c.com"); + let expected: HashSet<String> = ["a.com", "b.com", "c.com"] + .iter() + .map(ToString::to_string) + .collect(); + assert_eq!(got, expected); + } + + #[test] + fn bypass_attempt_url_path_lookalike_not_suppressed() { + // 'allow-domain' inside a URL path is NOT a comment. + let got = parse("fetch(\"https://evil.com/allow-domain\")"); + assert!( + got.is_empty(), + "URL-path content must not suppress: {got:?}" + ); + } + + #[test] + fn bypass_attempt_pathological_host_named_allow_domain() { + // https://allow-domain:8080/path: the // is preceded by ':', + // not whitespace/SOL, so the marker anchor fails. + let got = parse("let x = \"https://allow-domain:8080/path\";"); + assert!( + got.is_empty(), + "pathological host must not suppress: {got:?}" + ); + } + + #[test] + fn bracketed_ipv6_marker_matches_extracted_host() { + // Extracted IPv6 hosts have their brackets stripped by + // `normalise_host`; the marker must apply the same + // normalisation so the entries match. + let got = parse("fetch(\"https://[2001:db8::1]/x\") // allow-domain: [2001:db8::1]"); + let expected: HashSet<String> = ["2001:db8::1".to_string()].into_iter().collect(); + assert_eq!(got, expected); + } +} + +/// One reported violation on a scanned line. +#[derive(Debug, PartialEq, Eq)] +pub struct LineViolation { + /// The disallowed host. + pub host: String, +} + +/// Result of scanning one source line. +#[derive(Debug, Default, PartialEq, Eq)] +pub struct LineScanOutcome { + /// Disallowed hosts found on the line (after suppression).
+ pub violations: Vec<LineViolation>, + /// Hosts the line's `allow-domain:` marker listed but that would + /// not have been a violation anyway. The caller emits these as a + /// stderr warning. + pub unused_suppressions: Vec<String>, +} + +/// Scan one source line; return violations and any unused +/// suppression-marker entries. +/// +/// Composes [`parse_suppression_marker`], [`extract_absolute_hosts`], +/// [`extract_protocol_relative_hosts`], and [`is_allowed`]. +pub fn scan_line(line: &str) -> LineScanOutcome { + let suppression = parse_suppression_marker(line); + let mut hosts = extract_absolute_hosts(line); + hosts.extend(extract_protocol_relative_hosts(line)); + + // Deduplicate hosts while preserving first-occurrence order. An + // `href` and a visible URL on the same line for the same host + // should not be reported twice. + { + let mut seen: HashSet<String> = HashSet::new(); + hosts.retain(|h| seen.insert(h.clone())); + } + + // Hosts that WOULD be flagged WITHOUT any suppression. A marker + // entry that does not match one of these is "unused": it + // suppresses nothing and warrants a warning.
+ let empty_suppression: HashSet<String> = HashSet::new(); + let disallowed_without_suppression: HashSet<&String> = hosts + .iter() + .filter(|h| !is_allowed(h, &empty_suppression)) + .collect(); + + let mut unused: Vec<String> = suppression + .suppressed + .iter() + .filter(|listed| { + !disallowed_without_suppression + .iter() + .any(|h| h.as_str() == listed.as_str()) + }) + .cloned() + .collect(); + unused.sort(); + + let violations = hosts + .into_iter() + .filter(|h| !is_allowed(h, &suppression.suppressed)) + .map(|host| LineViolation { host }) + .collect(); + + LineScanOutcome { + violations, + unused_suppressions: unused, + } +} + +#[cfg(test)] +mod scan_line_tests { + use super::*; + + fn hosts(line: &str) -> Vec<String> { + scan_line(line) + .violations + .into_iter() + .map(|v| v.host) + .collect() + } + + #[test] + fn allowed_passes_clean() { + for line in [ + "see https://example.com", + "see https://foo.example.com", + "see https://api.privacy-center.org", + "dial http://127.0.0.1:8080/", + "see https://github.com/x/y", + "see https://testlight.example", + "//www.googletagmanager.com/gtm.js", + ] { + assert!(hosts(line).is_empty(), "should be clean: {line}"); + } + } + + #[test] + fn disallowed_reports() { + assert_eq!(hosts("see https://test.com"), vec!["test.com"]); + assert_eq!(hosts("see https://partner.com"), vec!["partner.com"]); + } + + #[test] + fn suppression_with_correct_host_passes() { + let out = scan_line("https://evil.com // allow-domain: evil.com"); + assert!(out.violations.is_empty()); + assert!(out.unused_suppressions.is_empty()); + } + + #[test] + fn suppression_with_wrong_host_still_reports_and_warns() { + let out = scan_line("https://evil.com // allow-domain: other.com"); + assert_eq!( + out.violations + .into_iter() + .map(|v| v.host) + .collect::<Vec<_>>(), + vec!["evil.com"] + ); + assert_eq!( + out.unused_suppressions, + vec!["other.com"], + "other.com was listed but never appeared on the line" + ); + } + + #[test] + fn
multi_host_suppression_applied_to_violations() { + let out = scan_line( + "x = \"https://evil.com\"; y = \"https://bad.org\"; \ + // allow-domain: evil.com, bad.org", + ); + assert!( + out.violations.is_empty(), + "both hosts should be suppressed: {out:?}" + ); + assert!(out.unused_suppressions.is_empty()); + } + + #[test] + fn multi_host_suppression_partial_match_warns_for_unused() { + let out = scan_line("\"https://evil.com\" // allow-domain: evil.com, ghost.com"); + assert!(out.violations.is_empty(), "evil.com should be suppressed"); + assert_eq!(out.unused_suppressions, vec!["ghost.com"]); + } + + #[test] + fn jsdoc_star_suppression_form() { + let out = scan_line(" * fetch(\"https://evil.com\") * allow-domain: evil.com"); + assert!( + out.violations.is_empty(), + "jsdoc-style suppression should apply: {out:?}" + ); + } + + #[test] + fn multiple_disallowed_on_one_line() { + let got = hosts("<a href=\"https://test.com\">x</a><a href=\"https://partner.com\">y</a>"); + assert_eq!(got, vec!["test.com", "partner.com"]); + } + + #[test] + fn duplicate_host_on_one_line_reported_once() { + // An `href` plus the visible URL on the same line: the host + // appears twice but is one logical violation. + let got = hosts("<a href=\"https://test.com\">https://test.com</a>"); + assert_eq!(got, vec!["test.com"]); + } + + #[test] + fn bypass_attempt_reports() { + // fetch("https://evil.com/allow-domain"): the marker text is a + // substring inside the URL, not a comment, so suppression does NOT apply. + assert_eq!( + hosts("fetch(\"https://evil.com/allow-domain\")"), + vec!["evil.com"] + ); + } + + #[test] + fn unused_warning_only_when_marker_present() { + let out = scan_line("see https://example.com"); + assert!(out.unused_suppressions.is_empty()); + } + + #[test] + fn unused_warning_fires_for_already_allowed_listed_host() { + // example.com is extracted but already allowed → would never + // have been a violation → the marker entry was unnecessary.
+ let out = scan_line("see https://example.com // allow-domain: example.com"); + assert!(out.violations.is_empty(), "example.com is already allowed"); + assert_eq!( + out.unused_suppressions, + vec!["example.com"], + "marker listed an already-allowed host; it suppresses nothing" + ); + } +} + +// === Diff and path collectors (Phase 4) === + +/// One added line collected from a diff or file scan. +#[derive(Debug)] +pub(crate) struct DiffLine { + /// Path for display and reporting. Built via + /// `String::from_utf8_lossy` for non-UTF-8 sources. + pub path: PathBuf, + /// 1-based line number within the new-side file. + pub line_no: usize, + /// The line's text content. + pub content: String, +} + +/// File extensions whose contents are scanned. See spec +/// §"File extensions scanned". +const SCANNED_EXTENSIONS: &[&str] = &[ + "rs", "ts", "tsx", "js", "mjs", "cjs", "toml", "yml", "yaml", "json", "md", "css", "html", +]; + +/// Lockfile basenames excluded by exact match. See spec +/// §"Always excluded (paths)". +const EXCLUDED_LOCKFILES: &[&str] = &[ + "Cargo.lock", + "package-lock.json", + "pnpm-lock.yaml", + "pnpm-lock.json", + "yarn.lock", + "npm-shrinkwrap.json", +]; + +/// Path components that exclude any path containing them. +const EXCLUDED_DIR_COMPONENTS: &[&str] = &["node_modules", "target", "dist", ".git", ".worktrees"]; + +/// The linter's own source file — excluded so its allowlist +/// constants and doc comments cannot self-flag. +const SELF_PATH: &str = "crates/trusted-server-cli/src/dev/lint/domains.rs"; + +/// Whether a path should be scanned. Accepts either a repo-relative +/// path (with `/` separators) or an absolute path; the +/// `Path::ends_with` self-exclusion is component-aware so an +/// explicit-mode invocation like `ts dev lint domains +/// /abs/.../crates/trusted-server-cli/src/dev/lint/domains.rs` still +/// skips the linter's own source file. See spec §"File extensions +/// scanned" and §"Always excluded (paths)". 
+fn path_is_scanned(rel_path: &str) -> bool { + // Self-exclude. `Path::ends_with` matches whole path components, + // so the suffix can be an absolute path or a repo-relative path + // without false positives (e.g., `barcrates/.../domains.rs`). + if Path::new(rel_path).ends_with(SELF_PATH) { + return false; + } + // Excluded directory components (whole-segment match). + let components: Vec<&str> = rel_path.split('/').collect(); + if components + .iter() + .any(|c| EXCLUDED_DIR_COMPONENTS.contains(c)) + { + return false; + } + // `.claude/worktrees/`: two-segment exclusion. + if components.windows(2).any(|w| w == [".claude", "worktrees"]) { + return false; + } + // Publisher-capture HTML fixtures: the narrow + // trusted-server-core/src/integrations/**/fixtures/** path. + if rel_path.contains("crates/trusted-server-core/src/integrations/") + && rel_path.contains("/fixtures/") + { + return false; + } + + let basename = components.last().copied().unwrap_or(""); + // Excluded lockfiles (exact basename). + if EXCLUDED_LOCKFILES.contains(&basename) { + return false; + } + // Dockerfile and Dockerfile.* are scanned (no extension). + if basename == "Dockerfile" || basename.starts_with("Dockerfile.") { + return true; + } + // `.env*` files are scanned. + if basename.starts_with(".env") { + return true; + } + // Otherwise scan by extension. + match basename.rsplit_once('.') { + Some((stem, ext)) if !stem.is_empty() => SCANNED_EXTENSIONS.contains(&ext), + _ => false, + } +} + +/// Read a blob's bytes from the object database. +fn read_blob(repo: &gix::Repository, id: ObjectId) -> Result<Vec<u8>, Report<DomainsLintError>> { + let obj = repo + .find_object(id) + .change_context(DomainsLintError::Diff)?; + Ok(obj.data.clone()) +} + +/// Compute the new-side added lines between two blob contents. +/// +/// Returns `(1-based line number, content)` for every inserted line.
+fn added_lines(old: Option<&[u8]>, new: &[u8]) -> Vec<(usize, String)> { + let old_text = old + .map(|b| String::from_utf8_lossy(b).into_owned()) + .unwrap_or_default(); + let new_text = String::from_utf8_lossy(new).into_owned(); + + let input = InternedInput::new(old_text.as_str(), new_text.as_str()); + let diff = Diff::compute(Algorithm::Myers, &input); + + let new_lines: Vec<&str> = new_text.lines().collect(); + let mut out = Vec::new(); + for hunk in diff.hunks() { + for token_idx in hunk.after.clone() { + let content = new_lines + .get(token_idx as usize) + .copied() + .unwrap_or("") + .to_string(); + out.push((token_idx as usize + 1, content)); + } + } + out +} + +/// Convert a raw byte path to a display `PathBuf`, lossy-decoding +/// non-UTF-8 bytes. Returns `(path, was_lossy)`. +fn bytes_to_pathbuf(raw: &[u8]) -> (PathBuf, bool) { + match from_utf8(raw) { + Ok(s) => (PathBuf::from(s), false), + Err(_) => { + let lossy = String::from_utf8_lossy(raw).into_owned(); + (PathBuf::from(&lossy), true) + } + } +} + +/// Collect added lines staged in the index relative to the HEAD tree. +/// +/// # Errors +/// +/// Returns [`DomainsLintError`] if the repository, its index, or a +/// blob cannot be read. +pub(crate) fn staged_added_lines( + repo_path: &Path, +) -> Result<Vec<DiffLine>, Report<DomainsLintError>> { + let repo = gix::open(repo_path).change_context(DomainsLintError::OpenRepo)?; + + // HEAD tree, or the empty tree on an unborn HEAD (fresh repo + // with no commits), in which case every staged file is genuinely + // added. + let head_tree = match repo.head_commit() { + Ok(commit) => { + let tree_id = commit + .tree_id() + .change_context(DomainsLintError::OpenRepo)? + .detach(); + repo.find_tree(tree_id) + .change_context(DomainsLintError::OpenRepo)? + } + Err(_) => repo.empty_tree(), + }; + + // Materialise the index as a tree so we can use the same tree-vs- + // tree diff machinery (with rename detection) as changed-vs mode.
+ let index_tree_id = write_index_to_tree(&repo)?; + let index_tree = repo + .find_tree(index_tree_id) + .change_context(DomainsLintError::Index)?; + + collect_added_from_trees(&repo, &head_tree, &index_tree) +} + +/// Build an in-memory tree object from the current index and write it +/// to the object database. The returned `ObjectId` can be loaded as a +/// `gix::Tree` for tree-vs-tree diffing. +fn write_index_to_tree(repo: &gix::Repository) -> Result> { + let index = repo.index().change_context(DomainsLintError::Index)?; + let empty_tree_id = repo.empty_tree().id; + let mut editor = repo + .edit_tree(empty_tree_id) + .change_context(DomainsLintError::Index)?; + for entry in index.entries() { + if !entry.mode.contains(IndexEntryMode::FILE) { + continue; + } + let path = entry.path(&index); + editor + .upsert(path, EntryKind::Blob, entry.id) + .change_context(DomainsLintError::Index)?; + } + Ok(editor + .write() + .change_context(DomainsLintError::Index)? + .detach()) +} + +/// Diff `old_tree` against `new_tree` with rename tracking and return +/// the added new-side lines for every Addition / Modification / +/// Rename (true renames diff old-blob vs new-blob; pure renames thus +/// add nothing). Copies and Deletions are skipped. +/// +/// Shared by [`staged_added_lines`] (HEAD-tree vs index-tree) and +/// [`changed_vs_added_lines`] (merge-base tree vs HEAD tree). Both +/// modes report non-UTF-8 paths lossily with a stderr warning +/// (full-repo mode skips them — see [`full_repo_lines`]). +fn collect_added_from_trees( + repo: &gix::Repository, + old_tree: &gix::Tree<'_>, + new_tree: &gix::Tree<'_>, +) -> Result, Report> { + // The gix tree-diff callback returns `Result` where + // `E: Into>`. `Report` + // does not satisfy that bound directly, so we capture the first + // failure in a `RefCell` and break out of the traversal. 
+ let out: RefCell> = RefCell::new(Vec::new()); + let deferred: RefCell>> = RefCell::new(None); + + let mut platform = old_tree.changes().change_context(DomainsLintError::Diff)?; + platform.options(|opts| { + opts.track_rewrites(Some(Rewrites { + copies: None, + percentage: Some(0.5), + limit: 1000, + track_empty: false, + })); + }); + + let traverse = platform.for_each_to_obtain_tree::( + new_tree, + |change: Change<'_, '_, '_>| { + let (raw_path, old_id, new_id) = match change { + Change::Addition { location, id, .. } => (location, None, id.detach()), + Change::Modification { + location, + previous_id, + id, + .. + } => (location, Some(previous_id.detach()), id.detach()), + Change::Rewrite { + location, + source_id, + id, + copy: false, + .. + } => (location, Some(source_id.detach()), id.detach()), + Change::Rewrite { copy: true, .. } | Change::Deletion { .. } => { + return Ok(ControlFlow::Continue(())); + } + }; + + let raw_bytes: &[u8] = raw_path.as_ref(); + let (path, was_lossy) = bytes_to_pathbuf(raw_bytes); + let path_str = path.to_string_lossy(); + if !path_is_scanned(&path_str) { + return Ok(ControlFlow::Continue(())); + } + if was_lossy + && let Err(e) = warn(format!( + "warning: path is not valid UTF-8; displaying lossy: {}", + path.display() + )) + { + *deferred.borrow_mut() = Some(e); + return Ok(ControlFlow::Break(())); + } + + let old_bytes = match old_id.map(|id| read_blob(repo, id)).transpose() { + Ok(b) => b, + Err(e) => { + *deferred.borrow_mut() = Some(e); + return Ok(ControlFlow::Break(())); + } + }; + let new_bytes = match read_blob(repo, new_id) { + Ok(b) => b, + Err(e) => { + *deferred.borrow_mut() = Some(e); + return Ok(ControlFlow::Break(())); + } + }; + + let mut out_mut = out.borrow_mut(); + for (line_no, content) in added_lines(old_bytes.as_deref(), &new_bytes) { + out_mut.push(DiffLine { + path: path.clone(), + line_no, + content, + }); + } + Ok(ControlFlow::Continue(())) + }, + ); + traverse.change_context(DomainsLintError::Diff)?; + 
 if let Some(e) = deferred.into_inner() { + return Err(e); + } + Ok(out.into_inner()) +} + +/// Resolve a base reference to an object id, trying four candidate +/// forms in order: the name as given, then `refs/heads/`, +/// `refs/remotes/origin/`, and `refs/tags/`. +/// +/// # Errors +/// +/// Returns [`DomainsLintError::Reference`] if no candidate resolves. +fn resolve_base_ref( + repo: &gix::Repository, + reference: &str, +) -> Result<ObjectId, Report<DomainsLintError>> { + let candidates = [ + reference.to_string(), + format!("refs/heads/{reference}"), + format!("refs/remotes/origin/{reference}"), + format!("refs/tags/{reference}"), + ]; + for candidate in &candidates { + if let Ok(mut r) = repo.find_reference(candidate.as_str()) + && let Ok(id) = r.peel_to_id() + { + return Ok(id.detach()); + } + } + Err(Report::new(DomainsLintError::Reference( + reference.to_string(), + ))) +} + +/// Collect added lines on `HEAD` relative to the merge-base of +/// `reference` and `HEAD` (the CI/PR scan mode). +/// +/// # Errors +/// +/// Returns [`DomainsLintError`] if the repository cannot be opened, +/// the base ref does not resolve, no merge-base exists, or a tree or +/// blob cannot be read. +pub(crate) fn changed_vs_added_lines( + repo_path: &Path, + reference: &str, +) -> Result<Vec<DiffLine>, Report<DomainsLintError>> { + let repo = gix::open(repo_path).change_context(DomainsLintError::OpenRepo)?; + let head_id = repo + .head_id() + .change_context(DomainsLintError::OpenRepo)? + .detach(); + let base_id = resolve_base_ref(&repo, reference)?; + let merge_base = repo + .merge_base(base_id, head_id) + .change_context_lazy(|| DomainsLintError::MergeBase { + base: reference.to_string(), + })? + .detach(); + + let base_tree = commit_tree(&repo, merge_base)?; + let head_tree = commit_tree(&repo, head_id)?; + collect_added_from_trees(&repo, &base_tree, &head_tree) +} + +/// Resolve a commit id to its tree object.
+fn commit_tree( + repo: &gix::Repository, + commit_id: ObjectId, +) -> Result<gix::Tree<'_>, Report<DomainsLintError>> { + let tree_id = repo + .find_commit(commit_id) + .change_context(DomainsLintError::Diff)? + .tree_id() + .change_context(DomainsLintError::Diff)? + .detach(); + repo.find_tree(tree_id) + .change_context(DomainsLintError::Diff) +} + +#[cfg(test)] +mod staged_added_lines_tests { + use super::*; + use crate::dev::lint::test_support; + + #[test] + fn reports_added_line_with_new_side_line_number() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + fs::write(temp.path().join("a.rs"), "alpha\nbeta\ngamma\n") + .expect("should write initial file"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "initial"); + + fs::write(temp.path().join("a.rs"), "alpha\nNEW LINE\nbeta\ngamma\n") + .expect("should write modification"); + test_support::stage_all(&repo); + + let lines = staged_added_lines(temp.path()).expect("should collect staged lines"); + let added: Vec<_> = lines + .iter() + .map(|l| { + ( + l.path.to_string_lossy().into_owned(), + l.line_no, + l.content.clone(), + ) + }) + .collect(); + + assert_eq!(added, vec![("a.rs".to_string(), 2, "NEW LINE".to_string())]); + } + + /// Regression for the rename bug: a pure rename (same blob OID, + /// new path) must NOT report every line of the file as added. The + /// previous map-walk implementation hit (None, Some(new)) for the + /// renamed path and diffed against an empty blob.
+ #[test] + fn pure_rename_yields_no_added_lines() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + fs::write( + temp.path().join("old.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write old file"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "initial"); + + fs::remove_file(temp.path().join("old.rs")).expect("should remove old"); + fs::write( + temp.path().join("new.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write new file"); + test_support::stage_all(&repo); + + let lines = staged_added_lines(temp.path()).expect("should collect staged lines"); + assert!( + lines.is_empty(), + "pure rename should add no lines, got: {lines:?}" + ); + } + + /// A rename + edit reports ONLY the truly added lines, not every + /// line of the renamed file. Relies on gix similarity-based rename + /// detection pairing `old.rs` ↔ `new.rs`. + #[test] + fn rename_with_edit_reports_only_added_lines() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + fs::write( + temp.path().join("old.rs"), + "fn shared() {}\nfn also_shared() {}\nfn third() {}\n", + ) + .expect("should write old file"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "initial"); + + fs::remove_file(temp.path().join("old.rs")).expect("should remove old"); + fs::write( + temp.path().join("new.rs"), + "fn shared() {}\nfn also_shared() {}\nfn third() {}\nlet added = 1;\n", + ) + .expect("should write new file"); + test_support::stage_all(&repo); + + let lines = staged_added_lines(temp.path()).expect("should collect staged lines"); + let texts: Vec<_> = lines.iter().map(|l| l.content.clone()).collect(); + assert_eq!( + texts, + vec!["let added = 1;"], + "rename+edit should report only the new line, got: {lines:?}" + ); + } + + /// A deletion adds no lines. 
+ #[test] + fn deletion_yields_no_added_lines() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + fs::write( + temp.path().join("doomed.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write file"); + fs::write(temp.path().join("keep.rs"), "let ok = 1;\n").expect("should write keep"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "initial"); + + fs::remove_file(temp.path().join("doomed.rs")).expect("should remove doomed"); + test_support::stage_all(&repo); + + let lines = staged_added_lines(temp.path()).expect("should collect staged lines"); + assert!( + lines.is_empty(), + "deletion should add no lines, got: {lines:?}" + ); + } + + /// Existing committed violations must NOT be reported as staged — + /// only the lines added in this staging round count. + #[test] + fn existing_committed_violation_with_unrelated_change_is_not_reported() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + fs::write( + temp.path().join("legacy.rs"), + "let pre_existing = \"https://test.com\";\n", + ) + .expect("should write legacy file"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "commit pre-existing violation"); + + // Stage an unrelated, clean change in a different file. The + // pre-existing violation in legacy.rs must not appear in the + // staged diff. + fs::write(temp.path().join("new.rs"), "let ok = 1;\n").expect("should write new"); + test_support::stage_all(&repo); + + let lines = staged_added_lines(temp.path()).expect("should collect staged lines"); + let texts: Vec<_> = lines.iter().map(|l| l.content.clone()).collect(); + assert_eq!( + texts, + vec!["let ok = 1;"], + "only the newly staged line should appear, got: {lines:?}" + ); + } + + /// Multiple non-contiguous added regions in the same file must all + /// be reported with correct new-side line numbers. 
+ #[test] + fn multi_hunk_same_file_reports_each_added_line() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + fs::write( + temp.path().join("a.rs"), + "alpha\nbeta\ngamma\ndelta\nepsilon\n", + ) + .expect("should write initial"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "initial"); + + // Insertion between alpha+beta (line 2) AND between delta+epsilon + // (line 6 after the first insertion). Two non-adjacent hunks. + fs::write( + temp.path().join("a.rs"), + "alpha\nNEW_EARLY\nbeta\ngamma\ndelta\nNEW_LATE\nepsilon\n", + ) + .expect("should write multi-hunk modification"); + test_support::stage_all(&repo); + + let lines = staged_added_lines(temp.path()).expect("should collect staged lines"); + let pairs: Vec<_> = lines + .iter() + .map(|l| (l.line_no, l.content.clone())) + .collect(); + assert_eq!( + pairs, + vec![(2, "NEW_EARLY".to_string()), (6, "NEW_LATE".to_string())], + "should report both hunks with correct new-side line numbers, got: {lines:?}" + ); + } + + /// Spec test case 25: staged scan must NOT skip non-UTF-8 paths. + /// + /// Gated to Linux: macOS (APFS/HFS+) rejects non-UTF-8 byte + /// sequences in filenames with `EILSEQ`, so the scenario cannot + /// be constructed there. Linux ext4/CI runners permit it. 
+ #[cfg(target_os = "linux")] + #[test] + fn reports_non_utf8_staged_path_lossy() { + use std::os::unix::ffi::OsStrExt; + + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + + fs::write(temp.path().join("readme.txt"), "hi\n").expect("should write readme"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "initial"); + + let non_utf8_name = + std::ffi::OsStr::from_bytes(&[0x66, 0x6f, 0xff, 0x6f, 0x2e, 0x72, 0x73]); + let bad_file = temp.path().join(non_utf8_name); + fs::write(&bad_file, "let x = \"https://test.com\";\n") + .expect("should write non-utf8-named file"); + test_support::stage_all(&repo); + + let lines = staged_added_lines(temp.path()) + .expect("should collect staged lines even with non-UTF-8 path"); + assert!( + !lines.is_empty(), + "non-UTF-8 staged paths must be reported, not skipped" + ); + assert!( + lines.iter().any(|l| l.content.contains("https://test.com")), + "must surface the URL for scanning: {lines:?}" + ); + } +} + +#[cfg(test)] +mod changed_vs_tests { + use super::*; + use crate::dev::lint::test_support; + + /// Build a two-branch fixture: `main` with a base commit, then a + /// `feature` branch that adds a line containing a disallowed URL. + /// Returns the tempdir (kept alive by the caller). 
+ fn two_branch_fixture() -> tempfile::TempDir { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + + fs::write(temp.path().join("a.rs"), "let ok = 1;\n").expect("should write base file"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "base"); + + test_support::create_and_checkout_branch(&repo, "feature"); + fs::write( + temp.path().join("a.rs"), + "let ok = 1;\nlet bad = \"https://test.com\";\n", + ) + .expect("should write feature change"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "feature change"); + + temp + } + + #[test] + fn reports_lines_added_by_feature_branch() { + let temp = two_branch_fixture(); + let lines = changed_vs_added_lines(temp.path(), "main") + .expect("should compute changed-vs added lines"); + let added: Vec<_> = lines + .iter() + .map(|l| (l.line_no, l.content.clone())) + .collect(); + assert_eq!( + added, + vec![(2, "let bad = \"https://test.com\";".to_string())], + "should report only the line the feature branch added" + ); + } + + #[test] + fn resolves_via_remote_tracking_ref_fallback() { + let temp = two_branch_fixture(); + let repo = gix::open(temp.path()).expect("should open repo"); + + // Move refs/heads/main → refs/remotes/origin/main so the + // bare name "main" only resolves via the fallback chain. 
+ let main_id = repo + .find_reference("refs/heads/main") + .expect("refs/heads/main should exist") + .peel_to_id() + .expect("should peel main") + .detach(); + repo.reference( + "refs/remotes/origin/main", + main_id, + gix::refs::transaction::PreviousValue::Any, + "seed remote-tracking ref", + ) + .expect("should create remote-tracking ref"); + + use gix::refs::transaction::{Change, RefEdit, RefLog}; + let delete = RefEdit { + change: Change::Delete { + expected: gix::refs::transaction::PreviousValue::Any, + log: RefLog::AndReference, + }, + name: "refs/heads/main".try_into().expect("valid ref name"), + deref: false, + }; + repo.edit_reference(delete) + .expect("should delete refs/heads/main"); + + // resolve_base_ref must now fall through to + // refs/remotes/origin/main. + let lines = changed_vs_added_lines(temp.path(), "main") + .expect("should resolve via remote-tracking fallback"); + assert_eq!( + lines.len(), + 1, + "fallback resolution should still find the feature change" + ); + assert!(lines[0].content.contains("https://test.com")); + } +} + +/// Emit a "skipping" warning for a path that is being excluded from +/// a full-repo scan. +fn warn_skip(path: &Path, reason: &str) -> Result<(), Report> { + warn(format!("note: skipping {}: {reason}", path.display())) +} + +/// Like [`warn_skip`] but for a raw byte path that is not valid UTF-8. +fn warn_skip_bytes(bytes: &[u8], reason: &str) -> Result<(), Report> { + warn(format!( + "note: skipping {}: {reason}", + String::from_utf8_lossy(bytes) + )) +} + +/// Scan every line of every tracked file in the working tree — +/// the full-repo audit mode. +/// +/// Reads working-tree content (not committed blobs), so it reports +/// the current local state including unstaged edits. Tracked files +/// that are missing, symlinks, non-regular, non-UTF-8-named, or +/// binary are skipped with a stderr warning. 
+/// +/// # Errors +/// +/// Returns [`DomainsLintError`] if the repository or its index +/// cannot be opened, the repository has no work directory, or a +/// scanned file fails to read for a reason other than binary +/// content. +pub(crate) fn full_repo_lines(repo_path: &Path) -> Result, Report> { + let repo = gix::open(repo_path).change_context(DomainsLintError::OpenRepo)?; + let work_dir = repo + .workdir() + .ok_or_else(|| Report::new(DomainsLintError::OpenRepo))? + .to_path_buf(); + let index = repo.index().change_context(DomainsLintError::Index)?; + + let mut out = Vec::new(); + for entry in index.entries() { + let raw = entry.path(&index); + + // Case 4: non-UTF-8 path — skip (full-repo mode does not + // lossy-report; that is staged/changed-vs behavior). + let Ok(rel_str) = from_utf8(raw) else { + warn_skip_bytes(raw, "non-UTF-8 path")?; + continue; + }; + if !path_is_scanned(rel_str) { + continue; + } + + let path = work_dir.join(rel_str); + // Case 1: tracked but missing from the working tree. + let meta = match fs::symlink_metadata(&path) { + Ok(m) => m, + Err(e) if e.kind() == ErrorKind::NotFound => { + warn_skip(&path, "tracked but missing from working tree")?; + continue; + } + Err(e) => { + warn_skip(&path, &format!("metadata error: {e}"))?; + continue; + } + }; + // Case 2: symlink — not followed. + if meta.file_type().is_symlink() { + warn_skip(&path, "symlink not followed")?; + continue; + } + // Case 3: non-regular file (FIFO, socket, device). + if !meta.file_type().is_file() { + warn_skip(&path, "non-regular file")?; + continue; + } + // Case 5: binary content. 
+ let content = match fs::read_to_string(&path) { + Ok(c) => c, + Err(e) if e.kind() == ErrorKind::InvalidData => { + warn_skip(&path, "binary content")?; + continue; + } + Err(e) => { + return Err( + Report::new(DomainsLintError::ReadFile(path.clone())).attach(e.to_string()) + ); + } + }; + + for (i, line) in content.lines().enumerate() { + out.push(DiffLine { + path: PathBuf::from(rel_str), + line_no: i + 1, + content: line.to_string(), + }); + } + } + Ok(out) +} + +#[cfg(test)] +mod full_repo_tests { + use super::*; + use crate::dev::lint::test_support; + + /// A clean tracked file is scanned line-by-line. + #[test] + fn scans_tracked_file_lines() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + fs::write(temp.path().join("a.rs"), "one\ntwo\nthree\n").expect("should write file"); + test_support::stage_all(&repo); + + let lines = full_repo_lines(temp.path()).expect("should scan repo"); + let texts: Vec<_> = lines.iter().map(|l| l.content.clone()).collect(); + assert_eq!(texts, vec!["one", "two", "three"]); + } + + /// Case 1: a tracked file removed from the working tree is + /// skipped, not a hard error. + #[test] + fn skips_tracked_but_missing_file() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + fs::write(temp.path().join("a.rs"), "kept\n").expect("should write a"); + fs::write(temp.path().join("gone.rs"), "removed\n").expect("should write gone"); + test_support::stage_all(&repo); + fs::remove_file(temp.path().join("gone.rs")).expect("should remove gone"); + + let lines = full_repo_lines(temp.path()).expect("should scan repo despite missing file"); + let texts: Vec<_> = lines.iter().map(|l| l.content.clone()).collect(); + assert_eq!( + texts, + vec!["kept"], + "missing file is skipped, kept file scanned" + ); + } + + /// Case 2: a tracked path that became a symlink is skipped. 
+ #[cfg(unix)] + #[test] + fn skips_symlink() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + fs::write(temp.path().join("real.rs"), "real\n").expect("should write real"); + fs::write(temp.path().join("link.rs"), "placeholder\n").expect("should write placeholder"); + test_support::stage_all(&repo); + + // Replace link.rs on disk with a symlink; the index entry + // stays a regular file. + fs::remove_file(temp.path().join("link.rs")).expect("should remove placeholder"); + std::os::unix::fs::symlink("real.rs", temp.path().join("link.rs")) + .expect("should create symlink"); + + let lines = full_repo_lines(temp.path()).expect("should scan repo"); + let texts: Vec<_> = lines.iter().map(|l| l.content.clone()).collect(); + assert_eq!(texts, vec!["real"], "symlink is skipped, real file scanned"); + } + + /// Case 5: a binary file is skipped, not a hard error. + #[test] + fn skips_binary_file() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + fs::write(temp.path().join("text.rs"), "hello\n").expect("should write text"); + // 0xff 0xfe is not a valid UTF-8 sequence — read_to_string + // rejects it with ErrorKind::InvalidData. (A NUL byte would + // NOT work: NUL is valid UTF-8.) 
+ fs::write(temp.path().join("data.json"), b"{\"x\":\xff\xfe}").expect("should write binary"); + test_support::stage_all(&repo); + + let lines = full_repo_lines(temp.path()).expect("should scan repo despite binary file"); + let texts: Vec<_> = lines.iter().map(|l| l.content.clone()).collect(); + assert_eq!( + texts, + vec!["hello"], + "binary file is skipped, text file scanned" + ); + } +} + +#[cfg(test)] +mod path_is_scanned_tests { + use super::*; + + #[test] + fn scanned_paths() { + for p in [ + "foo.rs", + "foo.html", + "foo.css", + "Dockerfile", + "Dockerfile.prod", + "crates/trusted-server-core/src/html_processor.test.html", + "crates/js/lib/src/core/templates/iframe.html", + ".env.dev", + "crates/integration-tests/fixtures/frameworks/nextjs/app/page.tsx", + "crates/integration-tests/fixtures/frameworks/nextjs/Dockerfile", + "crates/integration-tests/fixtures/frameworks/wordpress/Dockerfile", + "README.md", + "CHANGELOG.md", + "CONTRIBUTING.md", + "docs/guide/onboarding.md", + "docs/superpowers/specs/2026-05-18-check-domains-design.md", + ] { + assert!(path_is_scanned(p), "should be scanned: {p}"); + } + } + + #[test] + fn not_scanned_paths() { + for p in [ + "crates/trusted-server-core/src/integrations/nextjs/fixtures/inlined-data-escaped.html", + "crates/trusted-server-core/src/integrations/google_tag_manager/fixtures/captured.html", + "node_modules/foo.js", + ".worktrees/x/y.rs", + ".claude/worktrees/x/y.rs", + "package-lock.json", + "pnpm-lock.yaml", + "Cargo.lock", + "crates/trusted-server-cli/src/dev/lint/domains.rs", + "foo.markdown", + "foo.MD", + "target/debug/build.rs", + "image.png", + ] { + assert!(!path_is_scanned(p), "should NOT be scanned: {p}"); + } + } + + /// An explicit absolute path pointing at the linter's own source + /// must still self-exclude — the bare-string check in the old + /// implementation only matched the repo-relative spelling. 
+ #[test] + fn self_excludes_via_absolute_path_suffix() { + assert!(!path_is_scanned( + "/Users/anyone/checkout/crates/trusted-server-cli/src/dev/lint/domains.rs" + )); + // False-positive guard: a path that merely contains the + // suffix as a substring (no component boundary) must still + // be scanned. + assert!(path_is_scanned( + "crates/notrusted-server-cli/src/dev/lint/domains.rs" + )); + } +} + +/// Scan explicitly-named paths in full. +/// +/// Policy filters (extension/path exclusion, symlink, non-regular, +/// binary content) warn and skip. Access failures on a user-named +/// path are hard errors: a missing path or a permission failure +/// almost always means a typo or a real environment problem the +/// user should know about. +/// +/// # Errors +/// +/// Returns [`DomainsLintError::PathNotFound`] / +/// [`DomainsLintError::PermissionDenied`] / +/// [`DomainsLintError::ReadFile`] if a named path cannot be accessed. +pub(crate) fn explicit_path_lines( + paths: &[PathBuf], +) -> Result, Report> { + let mut out = Vec::new(); + for path in paths { + let path_str = path.to_string_lossy(); + if !path_is_scanned(&path_str) { + warn(format!( + "note: {} is not in scanned extensions or is excluded; skipping", + path.display() + ))?; + continue; + } + + let meta = match fs::symlink_metadata(path) { + Ok(m) => m, + Err(e) => return Err(io_error_to_report(&e, path)), + }; + if meta.file_type().is_symlink() { + warn_skip(path, "symlink not followed")?; + continue; + } + if !meta.file_type().is_file() { + warn_skip(path, "non-regular file")?; + continue; + } + + let content = match fs::read_to_string(path) { + Ok(c) => c, + Err(e) if e.kind() == ErrorKind::InvalidData => { + warn_skip(path, "binary content")?; + continue; + } + Err(e) => return Err(io_error_to_report(&e, path)), + }; + + for (i, line) in content.lines().enumerate() { + out.push(DiffLine { + path: path.clone(), + line_no: i + 1, + content: line.to_string(), + }); + } + } + Ok(out) +} + +/// Map an 
[`io::Error`] on a user-named path to the matching +/// [`DomainsLintError`] variant. +fn io_error_to_report(err: &io::Error, path: &Path) -> Report { + match err.kind() { + ErrorKind::NotFound => Report::new(DomainsLintError::PathNotFound(path.to_path_buf())), + ErrorKind::PermissionDenied => { + Report::new(DomainsLintError::PermissionDenied(path.to_path_buf())) + } + _ => Report::new(DomainsLintError::ReadFile(path.to_path_buf())).attach(err.to_string()), + } +} + +#[cfg(test)] +mod explicit_path_tests { + use super::*; + + #[test] + fn scans_a_valid_file() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let file = temp.path().join("a.rs"); + fs::write(&file, "one\ntwo\n").expect("should write file"); + + let lines = explicit_path_lines(&[file]).expect("should scan named file"); + let texts: Vec<_> = lines.iter().map(|l| l.content.clone()).collect(); + assert_eq!(texts, vec!["one", "two"]); + } + + #[test] + fn skips_excluded_extension() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let file = temp.path().join("image.png"); + fs::write(&file, "not really a png").expect("should write file"); + + let lines = explicit_path_lines(&[file]).expect("should skip excluded extension"); + assert!(lines.is_empty(), "excluded extension yields no lines"); + } + + #[test] + fn skips_excluded_path() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let dir = temp.path().join("node_modules"); + fs::create_dir(&dir).expect("should create node_modules"); + let file = dir.join("pkg.js"); + fs::write(&file, "let x = 1;\n").expect("should write file"); + + let lines = explicit_path_lines(&[file]).expect("should skip node_modules path"); + assert!(lines.is_empty(), "node_modules path yields no lines"); + } + + #[cfg(unix)] + #[test] + fn skips_symlink() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let real = temp.path().join("real.rs"); + fs::write(&real, "real\n").expect("should write real"); + 
let link = temp.path().join("link.rs"); + std::os::unix::fs::symlink(&real, &link).expect("should create symlink"); + + let lines = explicit_path_lines(&[link]).expect("should skip symlink"); + assert!(lines.is_empty(), "symlink yields no lines"); + } + + #[test] + fn missing_path_is_hard_error() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let missing = temp.path().join("nope.rs"); + + let err = explicit_path_lines(&[missing]).expect_err("missing path should error"); + assert!( + matches!(err.current_context(), DomainsLintError::PathNotFound(_)), + "should be PathNotFound: {err:?}" + ); + } + + #[cfg(unix)] + #[test] + fn permission_denied_is_hard_error() { + use std::os::unix::fs::PermissionsExt; + + let temp = tempfile::tempdir().expect("should create tempdir"); + let file = temp.path().join("secret.rs"); + fs::write(&file, "secret\n").expect("should write file"); + fs::set_permissions(&file, fs::Permissions::from_mode(0o000)).expect("should chmod 000"); + + let result = explicit_path_lines(std::slice::from_ref(&file)); + // Restore perms so the tempdir can be cleaned up. + let _ = fs::set_permissions(&file, fs::Permissions::from_mode(0o644)); + + let err = result.expect_err("permission-denied path should error"); + assert!( + matches!(err.current_context(), DomainsLintError::PermissionDenied(_)), + "should be PermissionDenied: {err:?}" + ); + } +} + +// === CLI entry point (Phase 5) === + +/// One reported violation, with full file context for the report. +#[derive(Debug, Serialize)] +pub struct FileViolation { + /// Repo-relative path of the file. + pub path: PathBuf, + /// 1-based line number. + #[serde(rename = "line_no")] + pub line: usize, + /// The disallowed host. + pub host: String, + /// The full text of the line the host appeared on (not just the + /// URL — there may be surrounding code or punctuation). + #[serde(rename = "line")] + pub line_excerpt: String, +} + +/// Run `ts dev lint domains`. 
+/// +/// Dispatches on the scan mode (`--staged`, `--changed-vs`, explicit +/// paths, or full-repo), scans each collected line, and emits a +/// human or JSON report. +/// +/// # Errors +/// +/// Returns [`CliError::EnvironmentError`] if a collector fails (e.g., +/// the repository cannot be opened), or [`CliError::ViolationsFound`] +/// if any disallowed host is found. +pub fn run(args: &DomainsArgs) -> Result<(), Report> { + let cwd = env::current_dir().change_context(CliError::EnvironmentError)?; + let lines: Vec = if args.staged { + staged_added_lines(&cwd).change_context(CliError::EnvironmentError)? + } else if let Some(reference) = &args.changed_vs { + changed_vs_added_lines(&cwd, reference).change_context(CliError::EnvironmentError)? + } else if args.paths.is_empty() { + full_repo_lines(&cwd).change_context(CliError::EnvironmentError)? + } else { + explicit_path_lines(&args.paths).change_context(CliError::EnvironmentError)? + }; + + let mut violations: Vec = Vec::new(); + let mut verbose_path: Option = None; + let mut verbose_count: usize = 0; + for line in lines { + if args.verbose { + match &verbose_path { + Some(prev) if prev == &line.path => verbose_count += 1, + _ => { + if let Some(prev) = verbose_path.take() { + write_stderr_line(format!( + "scanned {verbose_count} lines in {}", + prev.display() + ))?; + } + verbose_path = Some(line.path.clone()); + verbose_count = 1; + } + } + } + + let outcome = scan_line(&line.content); + for unused in outcome.unused_suppressions { + write_stderr_line(format!( + "warning: {}:{}: allow-domain marker listed `{unused}` but it does not appear on the line", + line.path.display(), + line.line_no, + ))?; + } + for v in outcome.violations { + violations.push(FileViolation { + path: line.path.clone(), + line: line.line_no, + host: v.host, + line_excerpt: line.content.clone(), + }); + } + } + if let Some(prev) = verbose_path { + write_stderr_line(format!( + "scanned {verbose_count} lines in {}", + prev.display() + ))?; + } 
+ + match args.format { + OutputFormat::Human => emit_human(&violations)?, + OutputFormat::Json => emit_json(&violations)?, + } + + if violations.is_empty() { + Ok(()) + } else { + Err(Report::new(CliError::ViolationsFound { + count: violations.len(), + })) + } +} + +/// Emit the human-readable violation report on stdout. +fn emit_human(violations: &[FileViolation]) -> Result<(), Report> { + for v in violations { + write_stdout_line(format!( + "{}:{}: disallowed host {}", + v.path.display(), + v.line, + v.host + ))?; + } + if !violations.is_empty() { + let files: BTreeSet<&PathBuf> = violations.iter().map(|v| &v.path).collect(); + write_stdout_line("")?; + write_stdout_line(format!( + "{} disallowed host(s) found in {} file(s).", + violations.len(), + files.len() + ))?; + write_stdout_line( + "To allow a new integration proxy, add it to EXACT_HOSTS in \ + crates/trusted-server-cli/src/dev/lint/domains.rs.", + )?; + write_stdout_line( + "To suppress one line (e.g., security tests), append \ + `// allow-domain: ` in a comment.", + )?; + write_stdout_line("Run `ts dev lint domains` (no args) for a full-repo audit.")?; + } + Ok(()) +} + +/// Emit the JSON violation report on stdout. +fn emit_json(violations: &[FileViolation]) -> Result<(), Report> { + let files_affected: BTreeSet<&PathBuf> = violations.iter().map(|v| &v.path).collect(); + let report = json!({ + "violations": violations, + "count": violations.len(), + "files_affected": files_affected.len(), + }); + write_json(&report) +} diff --git a/crates/trusted-server-cli/src/dev/lint/mod.rs b/crates/trusted-server-cli/src/dev/lint/mod.rs new file mode 100644 index 00000000..7da70a59 --- /dev/null +++ b/crates/trusted-server-cli/src/dev/lint/mod.rs @@ -0,0 +1,75 @@ +//! `ts dev lint` subcommand group: linters for source/config/docs. +//! +//! Subcommands: +//! - `domains`: URL-host linter (this design). 
+ +use std::path::PathBuf; + +use clap::{Args, Subcommand, ValueEnum}; +use error_stack::Report; + +use crate::error::CliError; + +pub mod domains; + +#[cfg(test)] +pub(crate) mod test_support; + +/// Subcommands under `ts dev lint`. +#[derive(Debug, Subcommand)] +pub enum LintCommand { + /// Lint URL hosts in source/config/docs. + /// + /// With no flags or paths, scans every tracked file's *working-tree* + /// content (includes unstaged edits, so a local audit may diverge + /// from CI on the same commit). Use `--changed-vs ` for the + /// committed-state PR mode. + Domains(DomainsArgs), +} + +/// Arguments for `ts dev lint domains`. +#[derive(Debug, Args)] +pub struct DomainsArgs { + /// Pre-commit mode: scan only staged-added lines. + #[arg(long, conflicts_with_all = ["changed_vs", "paths"])] + pub staged: bool, + + /// CI/PR mode: scan only lines added relative to merge-base(, HEAD). + #[arg(long, value_name = "REF", conflicts_with_all = ["staged", "paths"])] + pub changed_vs: Option, + + /// Explicit paths to scan in full. Mutually exclusive with + /// `--staged` / `--changed-vs`. Unstaged content is read directly + /// from each named file. + #[arg(value_name = "PATH", conflicts_with_all = ["staged", "changed_vs"])] + pub paths: Vec, + + /// Output format. + #[arg(long, value_enum, default_value = "human")] + pub format: OutputFormat, + + /// Print per-file scan progress on stderr. Has no effect on the + /// exit code or violation count. + #[arg(long)] + pub verbose: bool, +} + +/// Output format for `ts dev lint domains`. +#[derive(Debug, Clone, Copy, ValueEnum)] +pub enum OutputFormat { + /// Human-readable `path:line: disallowed host ` lines. + Human, + /// Structured JSON report. + Json, +} + +/// Dispatch a `ts dev lint` subcommand. +/// +/// # Errors +/// +/// Propagates the error from the chosen linter. 
+pub fn run(command: LintCommand) -> Result<(), Report> { + match command { + LintCommand::Domains(args) => domains::run(&args), + } +} diff --git a/crates/trusted-server-cli/src/dev/lint/test_support.rs b/crates/trusted-server-cli/src/dev/lint/test_support.rs new file mode 100644 index 00000000..d45ebc0b --- /dev/null +++ b/crates/trusted-server-cli/src/dev/lint/test_support.rs @@ -0,0 +1,205 @@ +//! Shared git-repo fixture helpers for the `dev/lint` inline tests. +//! +//! All operations go through `gix` — no subprocess, no `git` binary. +//! Commits use a fixed signature so they do not depend on ambient +//! `user.name` / `user.email` config and are deterministic across +//! runs (clean CI machines included). +//! +//! NOTE: integration tests under `tests/` cannot reach `pub(crate)` +//! items here, so `tests/common/mod.rs` carries the same helpers. +//! Keep the two files in sync when editing either. + +// Fixture helpers — not every inline test module uses every helper. +// (The module is already `#[cfg(test)]`-gated at its declaration in +// `mod.rs`, so no inner `#![cfg(test)]` is needed here.) +#![allow(dead_code)] + +use std::fs; +use std::path::Path; + +use gix::ObjectId; +use gix::bstr::BString; + +/// Fixed signature for all fixture commits. +fn test_signature() -> gix::actor::Signature { + gix::actor::Signature { + name: BString::from("ts dev lint tests"), + email: BString::from("tests@example.com"), + time: gix::date::Time::new(1_700_000_000, 0), + } +} + +/// Initialise a fresh repository at `path`. +pub(crate) fn init_repo(path: &Path) -> gix::Repository { + gix::init(path).expect("should init gix repo") +} + +/// Stage every file currently in the working tree: write a blob per +/// file and rebuild the index from scratch. The `.git` directory is +/// skipped. Paths are stored with `/` separators relative to the +/// work directory. 
+pub(crate) fn stage_all(repo: &gix::Repository) { + let work_dir = repo + .workdir() + .expect("fixture repo should have a work directory") + .to_path_buf(); + + let mut files: Vec<(BString, ObjectId)> = Vec::new(); + collect_files(repo, &work_dir, &work_dir, &mut files); + files.sort_by(|a, b| a.0.cmp(&b.0)); + + let mut state = gix::index::State::new(repo.object_hash()); + for (path, oid) in files { + state.dangerously_push_entry( + gix::index::entry::Stat::default(), + oid, + gix::index::entry::Flags::empty(), + gix::index::entry::Mode::FILE, + path.as_ref(), + ); + } + state.sort_entries(); + + let mut file = gix::index::File::from_state(state, repo.index_path()); + file.write(gix::index::write::Options::default()) + .expect("should write index file"); +} + +/// Recursively collect `(relative_path, blob_id)` for every file +/// under `dir`, skipping the `.git` directory. +fn collect_files( + repo: &gix::Repository, + work_dir: &Path, + dir: &Path, + out: &mut Vec<(BString, ObjectId)>, +) { + for entry in fs::read_dir(dir).expect("should read fixture directory") { + let entry = entry.expect("should read directory entry"); + let path = entry.path(); + let file_type = entry.file_type().expect("should read file type"); + if file_type.is_dir() { + if path.file_name().is_some_and(|n| n == ".git") { + continue; + } + collect_files(repo, work_dir, &path, out); + } else if file_type.is_file() { + let content = fs::read(&path).expect("should read fixture file"); + let oid = repo + .write_blob(&content) + .expect("should write blob") + .detach(); + let rel = path + .strip_prefix(work_dir) + .expect("file should be under work dir"); + out.push((rel_path_to_bstring(rel), oid)); + } + } +} + +/// Convert a working-tree-relative `Path` to a `BString` for an index +/// entry. On Unix, preserves raw bytes verbatim so non-UTF-8 filenames +/// reach gix unchanged (spec case 25). On Windows, falls back to a +/// lossy UTF-8 conversion with backslash-to-slash normalisation. 
+#[cfg(unix)] +fn rel_path_to_bstring(rel: &Path) -> BString { + use std::os::unix::ffi::OsStrExt; + BString::from(rel.as_os_str().as_bytes()) +} + +#[cfg(not(unix))] +fn rel_path_to_bstring(rel: &Path) -> BString { + let s = rel.to_string_lossy().replace('\\', "/"); + BString::from(s.as_bytes()) +} + +/// Build a tree from the current index and commit it to `HEAD`, +/// parented on the current `HEAD` commit (if any). +pub(crate) fn commit_all(repo: &gix::Repository, message: &str) -> ObjectId { + commit_index_to_ref(repo, "HEAD", message) +} + +/// Like [`commit_all`] but commits to an explicit branch ref +/// (e.g. `refs/heads/feature`). +pub(crate) fn commit_all_as_branch( + repo: &gix::Repository, + branch_ref: &str, + message: &str, +) -> ObjectId { + commit_index_to_ref(repo, branch_ref, message) +} + +fn commit_index_to_ref(repo: &gix::Repository, target_ref: &str, message: &str) -> ObjectId { + // Build a tree from the index entries via the tree editor. + let index = repo.index().expect("should read index"); + let empty_tree_id = repo.empty_tree().id; + let mut editor = repo + .edit_tree(empty_tree_id) + .expect("should create tree editor"); + for entry in index.entries() { + let path = entry.path(&index); + editor + .upsert( + path.to_string(), + gix::object::tree::EntryKind::Blob, + entry.id, + ) + .expect("should upsert index entry into tree"); + } + let tree_id = editor.write().expect("should write tree").detach(); + + let parents: Vec = repo + .head_id() + .ok() + .map(|id| vec![id.detach()]) + .unwrap_or_default(); + + let sig = test_signature(); + let mut author_time_buf = gix::date::parse::TimeBuf::default(); + let mut committer_time_buf = gix::date::parse::TimeBuf::default(); + repo.commit_as( + sig.to_ref(&mut committer_time_buf), + sig.to_ref(&mut author_time_buf), + target_ref, + message, + tree_id, + parents, + ) + .expect("should write commit") + .detach() +} + +/// Create `refs/heads/` pointing at the current `HEAD` +/// commit and move 
`HEAD` to it (symbolic). +pub(crate) fn create_and_checkout_branch(repo: &gix::Repository, branch: &str) { + let head = repo.head_id().expect("HEAD should exist").detach(); + let full_ref = format!("refs/heads/{branch}"); + repo.reference( + full_ref.as_str(), + head, + gix::refs::transaction::PreviousValue::Any, + format!("create branch {branch}"), + ) + .expect("should create branch ref"); + + use gix::refs::transaction::{Change, LogChange, PreviousValue, RefEdit, RefLog}; + use gix::refs::{FullName, Target}; + let full: FullName = full_ref + .as_str() + .try_into() + .expect("should parse branch FullName"); + let edit = RefEdit { + change: Change::Update { + log: LogChange { + mode: RefLog::AndReference, + force_create_reflog: false, + message: BString::from(format!("checkout {branch}")), + }, + expected: PreviousValue::Any, + new: Target::Symbolic(full), + }, + name: "HEAD".try_into().expect("HEAD is a valid ref name"), + deref: false, + }; + repo.edit_reference(edit) + .expect("should move HEAD to the new branch"); +} diff --git a/crates/trusted-server-cli/src/dev/mod.rs b/crates/trusted-server-cli/src/dev/mod.rs new file mode 100644 index 00000000..83bfeb18 --- /dev/null +++ b/crates/trusted-server-cli/src/dev/mod.rs @@ -0,0 +1,59 @@ +//! `ts dev` subcommand group: developer-workflow commands. +//! +//! Subcommands: +//! - `serve`: launches the local dev server (formerly `ts dev`). +//! - `lint domains`: URL-host linter (Phase 2+). +//! - `install-hooks`: pre-commit hook installer (Phase 6). + +use std::path::PathBuf; + +use clap::{Args, Subcommand}; + +pub mod install_hooks; +pub mod lint; +pub mod serve; + +// Re-export what `lib.rs` consumes via `crate::dev::*`. Other public +// items in `serve` (FASTLY_LOCAL_MANIFEST, render_local_fastly_manifest, +// write_local_fastly_manifest, run_fastly_dev) remain accessible via +// `crate::dev::serve::*` for tests and any future internal consumers. 
+pub use serve::{Adapter, run_dev_command}; + +/// Subcommands under `ts dev`. +#[derive(Debug, Subcommand)] +pub enum DevCommand { + /// Launch the local dev server (formerly `ts dev`). + Serve(ServeArgs), + /// Linters for source, config, and documentation. + Lint { + /// The lint to run. + #[command(subcommand)] + command: lint::LintCommand, + }, + /// Install the pre-commit hook into this repo (one-time setup). + InstallHooks(InstallHooksArgs), +} + +/// Arguments for `ts dev install-hooks`. +#[derive(Debug, Args)] +pub struct InstallHooksArgs { + /// Overwrite an existing unmanaged hook or a non-default + /// `core.hooksPath` (the displaced value is backed up / printed). + #[arg(long)] + pub force: bool, +} + +/// Arguments for `ts dev serve`. Preserves byte-for-byte the flags +/// of today's `ts dev` leaf — see spec §"This PR must make the +/// CLI-surface change". +#[derive(Debug, Args)] +pub struct ServeArgs { + #[arg(long, short = 'a', default_value = "fastly")] + pub adapter: Adapter, + #[arg(long)] + pub config: Option<PathBuf>, + #[arg(long, default_value = "local")] + pub env: String, + #[arg(trailing_var_arg = true, allow_hyphen_values = true)] + pub passthrough: Vec<String>, +} diff --git a/crates/trusted-server-cli/src/dev.rs b/crates/trusted-server-cli/src/dev/serve.rs similarity index 98% rename from crates/trusted-server-cli/src/dev.rs rename to crates/trusted-server-cli/src/dev/serve.rs index 79ef9af1..38db8d01 100644 --- a/crates/trusted-server-cli/src/dev.rs +++ b/crates/trusted-server-cli/src/dev/serve.rs @@ -8,7 +8,7 @@ use crate::config::ValidatedConfig; use crate::error::CliError; pub const FASTLY_LOCAL_MANIFEST: &str = "fastly.local.toml"; -const EMBEDDED_FASTLY_TEMPLATE: &str = include_str!("../../../fastly.toml"); +const EMBEDDED_FASTLY_TEMPLATE: &str = include_str!("../../../../fastly.toml"); #[derive(Debug, Clone, Copy, Default, PartialEq, Eq, clap::ValueEnum)] pub enum Adapter { diff --git a/crates/trusted-server-cli/src/error.rs
b/crates/trusted-server-cli/src/error.rs index 3168b9dc..d7e2919b 100644 --- a/crates/trusted-server-cli/src/error.rs +++ b/crates/trusted-server-cli/src/error.rs @@ -22,6 +22,13 @@ pub enum CliError { Json, #[display("operation cancelled")] Cancelled, + #[display("environment error")] + EnvironmentError, + #[display("found {count} disallowed host(s)")] + ViolationsFound { + /// Number of disallowed hosts found across all scanned files. + count: usize, + }, } impl Error for CliError {} diff --git a/crates/trusted-server-cli/src/lib.rs b/crates/trusted-server-cli/src/lib.rs index ae4411d1..d962c4a7 100644 --- a/crates/trusted-server-cli/src/lib.rs +++ b/crates/trusted-server-cli/src/lib.rs @@ -37,7 +37,10 @@ enum Command { command: ConfigCommand, }, Audit(AuditArgs), - Dev(DevArgs), + Dev { + #[command(subcommand)] + command: dev::DevCommand, + }, Auth { #[command(subcommand)] command: AuthCommand, @@ -85,18 +88,6 @@ struct AuditArgs { force: bool, } -#[derive(Debug, Args)] -struct DevArgs { - #[arg(long, short = 'a', default_value = "fastly")] - adapter: dev::Adapter, - #[arg(long)] - config: Option, - #[arg(long, default_value = "local")] - env: String, - #[arg(trailing_var_arg = true, allow_hyphen_values = true)] - passthrough: Vec, -} - #[derive(Debug, Subcommand)] enum AuthCommand { Fastly { @@ -165,14 +156,22 @@ struct FastlyProvisionApplyArgs { pub fn run() -> ExitCode { match execute() { Ok(()) => ExitCode::SUCCESS, - Err(error) => { - let _ = write_stderr_line(format_report(&error)); - if matches!(error.current_context(), CliError::Cancelled) { - ExitCode::from(130) - } else { + // `ViolationsFound` and `Cancelled` exit without an + // error-stack dump: the violation report is already on + // stdout, and cancellation is a benign user signal. Real + // failures still print `format_report`. + Err(error) => match error.current_context() { + CliError::Cancelled => ExitCode::from(130), + CliError::ViolationsFound { .. 
} => ExitCode::from(1), + CliError::EnvironmentError => { + let _ = write_stderr_line(format_report(&error)); + ExitCode::from(2) + } + _ => { + let _ = write_stderr_line(format_report(&error)); ExitCode::from(1) } - } + }, } } @@ -181,7 +180,7 @@ fn execute() -> Result<(), Report<CliError>> { match cli.command { Command::Config { command } => run_config(command), Command::Audit(args) => run_audit(&args), - Command::Dev(args) => run_dev(&args), + Command::Dev { command } => run_dev(command), Command::Auth { command } => run_auth(command), Command::Provision { command } => run_provision(command), } @@ -278,7 +277,15 @@ fn run_audit(args: &AuditArgs) -> Result<(), Report<CliError>> { )) } -fn run_dev(args: &DevArgs) -> Result<(), Report<CliError>> { +fn run_dev(command: dev::DevCommand) -> Result<(), Report<CliError>> { + match command { + dev::DevCommand::Serve(args) => run_dev_serve(&args), + dev::DevCommand::Lint { command } => dev::lint::run(command), + dev::DevCommand::InstallHooks(args) => dev::install_hooks::run(&args), + } +} + +fn run_dev_serve(args: &dev::ServeArgs) -> Result<(), Report<CliError>> { let validated = config::load_validated_config(args.config.as_deref())?; let status = dev::run_dev_command(args.adapter, &validated, &args.env, &args.passthrough)?; if status.success() { diff --git a/crates/trusted-server-cli/tests/common/mod.rs b/crates/trusted-server-cli/tests/common/mod.rs new file mode 100644 index 00000000..d97549ea --- /dev/null +++ b/crates/trusted-server-cli/tests/common/mod.rs @@ -0,0 +1,204 @@ +//! Shared git-repo fixture helpers for the integration tests. +//! +//! All operations go through `gix` — no subprocess, no `git` binary. +//! Commits use a fixed signature so they do not depend on ambient +//! `user.name` / `user.email` config and are deterministic across +//! runs (clean CI machines included). +//! +//! Keep in sync with `src/dev/lint/test_support.rs`. The split exists +//! because integration tests under `tests/` cannot reach `pub(crate)` +//! items in the crate.
+ +// Each integration-test file `mod common;`s this and uses a subset +// of the helpers. +#![allow(dead_code)] + +use std::fs; +use std::path::Path; + +use gix::ObjectId; +use gix::bstr::BString; + +/// Fixed signature for all fixture commits. +fn test_signature() -> gix::actor::Signature { + gix::actor::Signature { + name: BString::from("ts dev lint tests"), + email: BString::from("tests@example.com"), + time: gix::date::Time::new(1_700_000_000, 0), + } +} + +/// Initialise a fresh repository at `path`. +pub(crate) fn init_repo(path: &Path) -> gix::Repository { + gix::init(path).expect("should init gix repo") +} + +/// Stage every file currently in the working tree: write a blob per +/// file and rebuild the index from scratch. The `.git` directory is +/// skipped. Paths are stored with `/` separators relative to the +/// work directory. +pub(crate) fn stage_all(repo: &gix::Repository) { + let work_dir = repo + .workdir() + .expect("fixture repo should have a work directory") + .to_path_buf(); + + let mut files: Vec<(BString, ObjectId)> = Vec::new(); + collect_files(repo, &work_dir, &work_dir, &mut files); + files.sort_by(|a, b| a.0.cmp(&b.0)); + + let mut state = gix::index::State::new(repo.object_hash()); + for (path, oid) in files { + state.dangerously_push_entry( + gix::index::entry::Stat::default(), + oid, + gix::index::entry::Flags::empty(), + gix::index::entry::Mode::FILE, + path.as_ref(), + ); + } + state.sort_entries(); + + let mut file = gix::index::File::from_state(state, repo.index_path()); + file.write(gix::index::write::Options::default()) + .expect("should write index file"); +} + +/// Recursively collect `(relative_path, blob_id)` for every file +/// under `dir`, skipping the `.git` directory. 
+fn collect_files( + repo: &gix::Repository, + work_dir: &Path, + dir: &Path, + out: &mut Vec<(BString, ObjectId)>, +) { + for entry in fs::read_dir(dir).expect("should read fixture directory") { + let entry = entry.expect("should read directory entry"); + let path = entry.path(); + let file_type = entry.file_type().expect("should read file type"); + if file_type.is_dir() { + if path.file_name().is_some_and(|n| n == ".git") { + continue; + } + collect_files(repo, work_dir, &path, out); + } else if file_type.is_file() { + let content = fs::read(&path).expect("should read fixture file"); + let oid = repo + .write_blob(&content) + .expect("should write blob") + .detach(); + let rel = path + .strip_prefix(work_dir) + .expect("file should be under work dir"); + out.push((rel_path_to_bstring(rel), oid)); + } + } +} + +/// Convert a working-tree-relative `Path` to a `BString` for an index +/// entry. On Unix, preserves raw bytes verbatim so non-UTF-8 filenames +/// reach gix unchanged (spec case 25). On Windows, falls back to a +/// lossy UTF-8 conversion with backslash-to-slash normalisation. +#[cfg(unix)] +fn rel_path_to_bstring(rel: &Path) -> BString { + use std::os::unix::ffi::OsStrExt; + BString::from(rel.as_os_str().as_bytes()) +} + +#[cfg(not(unix))] +fn rel_path_to_bstring(rel: &Path) -> BString { + let s = rel.to_string_lossy().replace('\\', "/"); + BString::from(s.as_bytes()) +} + +/// Build a tree from the current index and commit it to `HEAD`, +/// parented on the current `HEAD` commit (if any). +pub(crate) fn commit_all(repo: &gix::Repository, message: &str) -> ObjectId { + commit_index_to_ref(repo, "HEAD", message) +} + +/// Like [`commit_all`] but commits to an explicit branch ref +/// (e.g. `refs/heads/feature`). 
+pub(crate) fn commit_all_as_branch( + repo: &gix::Repository, + branch_ref: &str, + message: &str, +) -> ObjectId { + commit_index_to_ref(repo, branch_ref, message) +} + +fn commit_index_to_ref(repo: &gix::Repository, target_ref: &str, message: &str) -> ObjectId { + // Build a tree from the index entries via the tree editor. + let index = repo.index().expect("should read index"); + let empty_tree_id = repo.empty_tree().id; + let mut editor = repo + .edit_tree(empty_tree_id) + .expect("should create tree editor"); + for entry in index.entries() { + let path = entry.path(&index); + editor + .upsert( + path.to_string(), + gix::object::tree::EntryKind::Blob, + entry.id, + ) + .expect("should upsert index entry into tree"); + } + let tree_id = editor.write().expect("should write tree").detach(); + + let parents: Vec<ObjectId> = repo + .head_id() + .ok() + .map(|id| vec![id.detach()]) + .unwrap_or_default(); + + let sig = test_signature(); + let mut author_time_buf = gix::date::parse::TimeBuf::default(); + let mut committer_time_buf = gix::date::parse::TimeBuf::default(); + repo.commit_as( + sig.to_ref(&mut committer_time_buf), + sig.to_ref(&mut author_time_buf), + target_ref, + message, + tree_id, + parents, + ) + .expect("should write commit") + .detach() +} + +/// Create `refs/heads/<branch>` pointing at the current `HEAD` +/// commit and move `HEAD` to it (symbolic).
+pub(crate) fn create_and_checkout_branch(repo: &gix::Repository, branch: &str) { + let head = repo.head_id().expect("HEAD should exist").detach(); + let full_ref = format!("refs/heads/{branch}"); + repo.reference( + full_ref.as_str(), + head, + gix::refs::transaction::PreviousValue::Any, + format!("create branch {branch}"), + ) + .expect("should create branch ref"); + + use gix::refs::transaction::{Change, LogChange, PreviousValue, RefEdit, RefLog}; + use gix::refs::{FullName, Target}; + let full: FullName = full_ref + .as_str() + .try_into() + .expect("should parse branch FullName"); + let edit = RefEdit { + change: Change::Update { + log: LogChange { + mode: RefLog::AndReference, + force_create_reflog: false, + message: BString::from(format!("checkout {branch}")), + }, + expected: PreviousValue::Any, + new: Target::Symbolic(full), + }, + name: "HEAD".try_into().expect("HEAD is a valid ref name"), + deref: false, + }; + repo.edit_reference(edit) + .expect("should move HEAD to the new branch"); +} diff --git a/crates/trusted-server-cli/tests/lint_domains_cli.rs b/crates/trusted-server-cli/tests/lint_domains_cli.rs new file mode 100644 index 00000000..69bc3362 --- /dev/null +++ b/crates/trusted-server-cli/tests/lint_domains_cli.rs @@ -0,0 +1,765 @@ +//! End-to-end tests for `ts dev lint domains`, exercising the `ts` +//! binary as a whole: exit codes, stdout, and stderr. +//! +//! The pure-function and collector logic is covered by inline unit +//! tests in `src/dev/lint/domains.rs`; this file locks the +//! binary-observable contract (exit 0 / 1 / 2, report shape). + +mod common; + +use assert_cmd::Command; +use predicates::prelude::*; +use tempfile::TempDir; + +/// Build the `ts` command rooted at `dir`. +fn ts_in(dir: &TempDir) -> Command { + let mut cmd = Command::cargo_bin("ts").expect("should locate the ts binary"); + cmd.current_dir(dir.path()); + cmd +} + +/// A repo with one committed clean file and HEAD established. 
+fn repo_with_initial_commit() -> TempDir { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + std::fs::write(temp.path().join("ok.rs"), "fn ok() {}\n").expect("should write ok.rs"); + common::stage_all(&repo); + common::commit_all(&repo, "initial"); + temp +} + +// === --staged mode === + +#[test] +fn staged_clean_exits_zero() { + let temp = repo_with_initial_commit(); + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(0); +} + +#[test] +fn staged_violation_exits_one_human() { + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("should reopen repo"); + std::fs::write( + temp.path().join("bad.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write bad.rs"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(1) + .stdout(predicate::str::contains( + "bad.rs:1: disallowed host test.com", + )) + .stdout(predicate::str::contains("1 disallowed host(s) found")); +} + +#[test] +fn staged_violation_json_format() { + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("should reopen repo"); + std::fs::write( + temp.path().join("bad.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write bad.rs"); + common::stage_all(&repo); + + let assert = ts_in(&temp) + .args(["dev", "lint", "domains", "--staged", "--format", "json"]) + .assert() + .code(1); + let stdout = + String::from_utf8(assert.get_output().stdout.clone()).expect("stdout should be UTF-8"); + let parsed: serde_json::Value = + serde_json::from_str(&stdout).expect("stdout should be valid JSON"); + assert_eq!(parsed["count"], 1); + assert_eq!(parsed["violations"][0]["host"], "test.com"); +} + +#[test] +fn staged_suppression_marker_passes() { + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("should reopen repo"); + std::fs::write( + 
temp.path().join("sec.rs"), + "let attacker = \"https://evil.com\"; // allow-domain: evil.com\n", + ) + .expect("should write sec.rs"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(0); +} + +/// Spec test case 25: non-UTF-8 staged paths are reported (not +/// skipped) with a lossy-path stderr warning. Linux-only — macOS +/// rejects non-UTF-8 filenames with `EILSEQ`. +#[cfg(target_os = "linux")] +#[test] +fn staged_non_utf8_path_warns_and_reports() { + use std::os::unix::ffi::OsStrExt; + + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("should reopen repo"); + let name = std::ffi::OsStr::from_bytes(&[0x66, 0x6f, 0xff, 0x6f, 0x2e, 0x72, 0x73]); + std::fs::write(temp.path().join(name), "let bad = \"https://test.com\";\n") + .expect("should write non-utf8-named file"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(1) + .stdout(predicate::str::contains("disallowed host test.com")) + .stderr(predicate::str::contains("not valid UTF-8")); +} + +/// Regression for the rename bug: a pure rename of a file containing +/// a disallowed URL must exit clean. The previous implementation +/// reported every line of the renamed file as added. 
+#[test] +fn staged_pure_rename_exits_zero() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + std::fs::write( + temp.path().join("old.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write old"); + common::stage_all(&repo); + common::commit_all(&repo, "initial"); + + std::fs::remove_file(temp.path().join("old.rs")).expect("should remove old"); + std::fs::write( + temp.path().join("new.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write new"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(0); +} + +#[test] +fn staged_deletion_exits_zero() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + std::fs::write( + temp.path().join("doomed.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write doomed"); + common::stage_all(&repo); + common::commit_all(&repo, "initial"); + + std::fs::remove_file(temp.path().join("doomed.rs")).expect("should remove doomed"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(0); +} + +/// Existing committed violations must not be re-reported when an +/// unrelated, clean change is staged. 
+#[test] +fn staged_existing_violation_with_unrelated_change_exits_zero() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + std::fs::write( + temp.path().join("legacy.rs"), + "let pre_existing = \"https://test.com\";\n", + ) + .expect("should write legacy"); + common::stage_all(&repo); + common::commit_all(&repo, "commit pre-existing violation"); + + std::fs::write(temp.path().join("clean.rs"), "let ok = 1;\n").expect("should write clean"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(0); +} + +/// Multi-hunk same-file edit: both added regions are scanned and +/// both violations reported with their correct new-side line numbers. +#[test] +fn staged_multi_hunk_reports_both_added_violations() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + std::fs::write( + temp.path().join("a.rs"), + "alpha\nbeta\ngamma\ndelta\nepsilon\n", + ) + .expect("should write initial"); + common::stage_all(&repo); + common::commit_all(&repo, "initial"); + + std::fs::write( + temp.path().join("a.rs"), + "alpha\nlet bad1 = \"https://test.com\";\nbeta\ngamma\ndelta\nlet bad2 = \"https://partner.com\";\nepsilon\n", + ) + .expect("should write multi-hunk"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(1) + .stdout(predicate::str::contains("a.rs:2: disallowed host test.com")) + .stdout(predicate::str::contains( + "a.rs:6: disallowed host partner.com", + )); +} + +/// JSON output shape: `count`, `files_affected`, and each +/// violation's `path`, `line_no`, `host`, `line` fields. 
+#[test] +fn staged_violation_json_full_shape() { + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("should reopen repo"); + std::fs::write( + temp.path().join("bad.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write bad.rs"); + common::stage_all(&repo); + + let assert = ts_in(&temp) + .args(["dev", "lint", "domains", "--staged", "--format", "json"]) + .assert() + .code(1); + let stdout = + String::from_utf8(assert.get_output().stdout.clone()).expect("stdout should be UTF-8"); + let parsed: serde_json::Value = + serde_json::from_str(&stdout).expect("stdout should be valid JSON"); + + assert_eq!(parsed["count"], 1); + assert_eq!(parsed["files_affected"], 1); + let v = &parsed["violations"][0]; + assert_eq!(v["path"], "bad.rs"); + assert_eq!(v["line_no"], 1); + assert_eq!(v["host"], "test.com"); + assert_eq!(v["line"], "let bad = \"https://test.com\";"); +} + +/// `--verbose` writes a per-file scan-progress note to stderr; exit +/// code and violation count are unchanged. 
+#[test] +fn staged_verbose_writes_per_file_progress_to_stderr() { + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("should reopen repo"); + std::fs::write( + temp.path().join("bad.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write bad.rs"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged", "--verbose"]) + .assert() + .code(1) + .stdout(predicate::str::contains("disallowed host test.com")) + .stderr(predicate::str::contains("scanned")) + .stderr(predicate::str::contains("bad.rs")); +} + +// === --changed-vs mode === + +#[test] +fn changed_vs_reports_feature_branch_lines() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + std::fs::write(temp.path().join("a.rs"), "let ok = 1;\n").expect("should write base"); + common::stage_all(&repo); + common::commit_all(&repo, "base"); + + common::create_and_checkout_branch(&repo, "feature"); + std::fs::write( + temp.path().join("a.rs"), + "let ok = 1;\nlet bad = \"https://test.com\";\n", + ) + .expect("should write feature change"); + common::stage_all(&repo); + common::commit_all(&repo, "feature change"); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--changed-vs", "main"]) + .assert() + .code(1) + .stdout(predicate::str::contains("disallowed host test.com")); +} + +/// Spec case 28: when HEAD is behind the base ref, the merge-base +/// is HEAD itself and the diff is empty — so no violations are +/// reported even if the base ref has introduced one. This exercises +/// the merge-base path with an "anti-symmetric" topology. +#[test] +fn changed_vs_branch_behind_base_reports_nothing() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + + // Base: a single clean commit on `main`. 
+ std::fs::write(temp.path().join("a.rs"), "let ok = 1;\n").expect("should write base"); + common::stage_all(&repo); + common::commit_all(&repo, "base"); + + // Branch from `main` at the base commit (no further commits on + // the feature branch — HEAD is at the merge-base). + common::create_and_checkout_branch(&repo, "feature"); + + // Advance `main` past the merge-base with a commit that, if + // wrongly attributed to the feature branch, would be a + // violation. Then move HEAD back to `feature`. + use gix::refs::transaction::{Change, LogChange, PreviousValue, RefEdit, RefLog}; + use gix::refs::{FullName, Target}; + let main_ref: FullName = "refs/heads/main".try_into().expect("valid ref name"); + let head_edit = RefEdit { + change: Change::Update { + log: LogChange { + mode: RefLog::AndReference, + force_create_reflog: false, + message: gix::bstr::BString::from("switch to main"), + }, + expected: PreviousValue::Any, + new: Target::Symbolic(main_ref), + }, + name: "HEAD".try_into().expect("HEAD"), + deref: false, + }; + repo.edit_reference(head_edit) + .expect("should switch HEAD to main"); + std::fs::write( + temp.path().join("a.rs"), + "let ok = 1;\nlet ahead = \"https://test.com\";\n", + ) + .expect("should write main-ahead change"); + common::stage_all(&repo); + common::commit_all(&repo, "main: ahead of feature"); + + // Move HEAD back to feature. + let feature_ref: FullName = "refs/heads/feature".try_into().expect("valid ref name"); + let head_edit = RefEdit { + change: Change::Update { + log: LogChange { + mode: RefLog::AndReference, + force_create_reflog: false, + message: gix::bstr::BString::from("switch to feature"), + }, + expected: PreviousValue::Any, + new: Target::Symbolic(feature_ref), + }, + name: "HEAD".try_into().expect("HEAD"), + deref: false, + }; + repo.edit_reference(head_edit) + .expect("should switch HEAD back to feature"); + + // `--changed-vs main`: merge-base(main, feature) == feature, so + // diff is empty. 
The `main`-introduced violation must NOT appear. + ts_in(&temp) + .args(["dev", "lint", "domains", "--changed-vs", "main"]) + .assert() + .code(0); +} + +/// A `--changed-vs` ref that doesn't resolve in any of the four +/// fallback locations is an environment error (exit 2), not a +/// violation (exit 1). +#[test] +fn changed_vs_unknown_ref_exits_two() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + std::fs::write(temp.path().join("a.rs"), "let ok = 1;\n").expect("should write base"); + common::stage_all(&repo); + common::commit_all(&repo, "base"); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--changed-vs", "no-such-ref"]) + .assert() + .code(2); +} + +// === full-repo mode === + +#[test] +fn full_repo_reports_committed_violation() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + std::fs::write( + temp.path().join("bad.rs"), + "let bad = \"https://partner.com\";\n", + ) + .expect("should write bad.rs"); + common::stage_all(&repo); + common::commit_all(&repo, "commit with a violation"); + + ts_in(&temp) + .args(["dev", "lint", "domains"]) + .assert() + .code(1) + .stdout(predicate::str::contains("disallowed host partner.com")); +} + +/// Binary-level coverage for spec cases 30, 31, 32, 34: paths under +/// `node_modules/`, `.worktrees/`, integrations fixtures, and known +/// lockfiles must be skipped even when they contain a disallowed +/// URL; one violation in a non-excluded file is still reported. +#[test] +fn full_repo_path_exclusions_are_skipped() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + + let bad = "let bad = \"https://test.com\";\n"; + + // Excluded. 
+ let nm = temp.path().join("node_modules"); + std::fs::create_dir_all(&nm).expect("node_modules"); + std::fs::write(nm.join("pkg.js"), bad).expect("write node_modules pkg.js"); + + let wt = temp.path().join(".worktrees/branch"); + std::fs::create_dir_all(&wt).expect(".worktrees/branch"); + std::fs::write(wt.join("a.rs"), bad).expect("write .worktrees a.rs"); + + let fixtures = temp + .path() + .join("crates/trusted-server-core/src/integrations/x/fixtures"); + std::fs::create_dir_all(&fixtures).expect("fixtures dir"); + std::fs::write(fixtures.join("captured.html"), bad).expect("write fixtures captured.html"); + + std::fs::write(temp.path().join("package-lock.json"), bad).expect("write lockfile"); + + // Reported (sole non-excluded file). + std::fs::write(temp.path().join("ok.rs"), bad).expect("write ok.rs"); + + common::stage_all(&repo); + common::commit_all(&repo, "seed mixed paths"); + + let assert = ts_in(&temp) + .args(["dev", "lint", "domains"]) + .assert() + .code(1); + let stdout = String::from_utf8(assert.get_output().stdout.clone()).expect("utf8 stdout"); + assert!( + stdout.contains("ok.rs:1: disallowed host test.com"), + "ok.rs should be reported: {stdout}" + ); + assert!( + !stdout.contains("pkg.js") + && !stdout.contains(".worktrees") + && !stdout.contains("fixtures") + && !stdout.contains("package-lock.json"), + "excluded paths must not appear in the report: {stdout}" + ); + assert!( + stdout.contains("1 disallowed host(s) found"), + "summary should reflect exactly one violation: {stdout}" + ); +} + +/// Explicit absolute path pointing at the linter's own source file +/// must still self-exclude — regression for the absolute-path +/// bypass of `SELF_PATH`. 
+#[test] +fn explicit_absolute_path_to_self_skips() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let nested = temp.path().join("crates/trusted-server-cli/src/dev/lint"); + std::fs::create_dir_all(&nested).expect("nested dir"); + let self_clone = nested.join("domains.rs"); + std::fs::write(&self_clone, "let bad = \"https://test.com\";\n") + .expect("write fake linter source"); + + let abs = self_clone + .canonicalize() + .expect("should canonicalize self-clone"); + ts_in(&temp) + .args(["dev", "lint", "domains", abs.to_str().expect("utf-8 path")]) + .assert() + .code(0); +} + +// === Markdown coverage (spec cases 36, 37, 39, 40, 42, 43) === + +/// Spec case 37 (autolink), 42 (reference-link target), 43 (image +/// link), 39 (multiple links on one line), 40 (fenced code block). +/// One Markdown file exercises all five forms in one binary +/// invocation. +#[test] +fn markdown_link_variants_all_reported() { + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("reopen repo"); + let body = "\ +# Doc + +Autolink: +Inline: [bad](https://partner.com) +Image: ![alt](https://test.com/img.png) +Multi: see [a](https://github.com/x) and [b](https://test.com) + +``` +curl https://test.com/foo +``` + +[1]: https://test.com +"; + std::fs::write(temp.path().join("doc.md"), body).expect("write doc.md"); + common::stage_all(&repo); + + let assert = ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(1); + let stdout = String::from_utf8(assert.get_output().stdout.clone()).expect("utf8 stdout"); + + // Every line that carries a disallowed host is reported. We + // assert the *line numbers* match the file body exactly. 
+ for needle in [ + "doc.md:3: disallowed host test.com", // autolink + "doc.md:4: disallowed host partner.com", // inline link + "doc.md:5: disallowed host test.com", // image + "doc.md:6: disallowed host test.com", // multi (github.com allowed, test.com flagged) + "doc.md:9: disallowed host test.com", // fenced code block + "doc.md:12: disallowed host test.com", // reference list + ] { + assert!( + stdout.contains(needle), + "expected line `{needle}` in:\n{stdout}" + ); + } +} + +/// Spec case 32 (positive half): an `.html` file outside the +/// integrations-fixtures exclusion is scanned normally. +#[test] +fn html_file_outside_fixtures_is_scanned() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + + let nested = temp.path().join("crates/trusted-server-core/src"); + std::fs::create_dir_all(&nested).expect("nested dir"); + std::fs::write( + nested.join("html_processor.test.html"), + "x\n", + ) + .expect("write html"); + + common::stage_all(&repo); + common::commit_all(&repo, "seed html"); + + ts_in(&temp) + .args(["dev", "lint", "domains"]) + .assert() + .code(1) + .stdout(predicate::str::contains( + "html_processor.test.html:1: disallowed host test.com", + )); +} + +/// Spec case 33: proves the `**/fixtures/**` blanket exclusion was +/// removed. A `.tsx` file under +/// `crates/integration-tests/fixtures/frameworks/nextjs/app/` IS +/// scanned, even though it lives under a `fixtures` directory. 
+#[test] +fn integration_test_fixture_tsx_is_scanned() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = common::init_repo(temp.path()); + + let nested = temp + .path() + .join("crates/integration-tests/fixtures/frameworks/nextjs/app"); + std::fs::create_dir_all(&nested).expect("nested dir"); + std::fs::write(nested.join("page.tsx"), "fetch(\"https://test.com\");\n") + .expect("write page.tsx"); + + common::stage_all(&repo); + common::commit_all(&repo, "seed nextjs fixture"); + + ts_in(&temp) + .args(["dev", "lint", "domains"]) + .assert() + .code(1) + .stdout(predicate::str::contains( + "page.tsx:1: disallowed host test.com", + )); +} + +/// Spec case 41: a fenced code block in Markdown that references an +/// allowlisted `REFERENCE_HOSTS` URL (`https://docs.rs/clap`) is +/// not flagged. +#[test] +fn markdown_fenced_block_with_allowed_reference_passes() { + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("reopen repo"); + let body = "\ +# Doc + +``` +cargo add clap # see https://docs.rs/clap +``` +"; + std::fs::write(temp.path().join("doc.md"), body).expect("write doc.md"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(0); +} + +/// Spec case 45: a bare repo (no working tree) yields exit 2 in +/// full-repo mode — there is no working tree to scan. +#[test] +fn full_repo_in_bare_repo_exits_two() { + let temp = tempfile::tempdir().expect("should create tempdir"); + gix::init_bare(temp.path()).expect("should init bare repo"); + + ts_in(&temp) + .args(["dev", "lint", "domains"]) + .assert() + .code(2); +} + +/// Spec case 38: an HTML-comment suppression marker on a Markdown +/// line suppresses the violation; a wrong-host marker still flags +/// the real host and emits a stderr "unused marker" warning. 
+#[test] +fn markdown_html_comment_suppression() { + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("reopen repo"); + let body = "\ +ok: see [docs](https://test.com) +bad: see [docs](https://test.com) +"; + std::fs::write(temp.path().join("doc.md"), body).expect("write doc.md"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(1) + .stdout(predicate::str::contains("doc.md:1: disallowed host test.com").not()) + .stdout(predicate::str::contains( + "doc.md:2: disallowed host test.com", + )) + .stderr(predicate::str::contains( + "marker listed `other.com` but it does not appear", + )); +} + +// === explicit-path mode === + +#[test] +fn explicit_path_scans_named_file() { + let temp = tempfile::tempdir().expect("should create tempdir"); + std::fs::write( + temp.path().join("named.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write named.rs"); + + ts_in(&temp) + .args(["dev", "lint", "domains", "named.rs"]) + .assert() + .code(1) + .stdout(predicate::str::contains("disallowed host test.com")); +} + +#[test] +fn explicit_missing_path_exits_two() { + let temp = tempfile::tempdir().expect("should create tempdir"); + ts_in(&temp) + .args(["dev", "lint", "domains", "does-not-exist.rs"]) + .assert() + .code(2); +} + +// === Markdown === + +#[test] +fn markdown_disallowed_link_reported() { + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("should reopen repo"); + std::fs::write( + temp.path().join("doc.md"), + "See [the tracker](https://test.com) for details.\n", + ) + .expect("should write doc.md"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(1) + .stdout(predicate::str::contains( + "doc.md:1: disallowed host test.com", + )); +} + +#[test] +fn markdown_allowed_inline_link_passes() { + let temp = repo_with_initial_commit(); + let repo = 
gix::open(temp.path()).expect("should reopen repo"); + std::fs::write( + temp.path().join("doc.md"), + "See [the Fastly docs](https://developer.fastly.com/learning).\n", + ) + .expect("should write doc.md"); + common::stage_all(&repo); + + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(0); +} + +// === Environment cases === + +#[test] +fn outside_git_repo_exits_two() { + let temp = tempfile::tempdir().expect("should create tempdir"); + // No repo initialised — gix::open fails → EnvironmentError → exit 2. + ts_in(&temp) + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(2); +} + +/// The linter must not require a `git` binary on `PATH` — all git +/// work goes through gitoxide. Run with an emptied `PATH` and confirm +/// it still functions. Unix-only (Windows PATH semantics differ). +#[cfg(unix)] +#[test] +fn works_without_git_on_path() { + let temp = repo_with_initial_commit(); + let repo = gix::open(temp.path()).expect("should reopen repo"); + std::fs::write( + temp.path().join("bad.rs"), + "let bad = \"https://test.com\";\n", + ) + .expect("should write bad.rs"); + common::stage_all(&repo); + + ts_in(&temp) + .env_clear() + .env("PATH", "") + .args(["dev", "lint", "domains", "--staged"]) + .assert() + .code(1) + .stdout(predicate::str::contains("disallowed host test.com")); +} diff --git a/docs/superpowers/plans/2026-05-18-ts-dev-lint-domains.md b/docs/superpowers/plans/2026-05-18-ts-dev-lint-domains.md new file mode 100644 index 00000000..83587b83 --- /dev/null +++ b/docs/superpowers/plans/2026-05-18-ts-dev-lint-domains.md @@ -0,0 +1,2883 @@ +# `ts dev lint domains` Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. 
+ +**Goal:** Ship `ts dev lint domains` and `ts dev install-hooks` as new subcommands of the Trusted Server CLI, with a pre-commit hook integration that prevents commits from introducing non-allowlisted URL hosts in source, config, and documentation files. + +**Architecture:** Add a `dev/` module directory to `trusted-server-cli` that hosts: (a) the existing dev-server behavior, renamed to `ts dev serve`; (b) `ts dev install-hooks` for the one-time hook installer; (c) `ts dev lint domains` for the actual linter. All git operations use the `gix` / `gix-config` crates — no subprocess. URL extraction uses the standard `regex` crate (no lookahead) with three allowlists (`EXACT_HOSTS`, `SUBDOMAIN_HOSTS`, `REFERENCE_HOSTS`). Pre-commit-only enforcement in v1; CI gate is a documented Stage 2 follow-up. + +**Tech Stack:** Rust 2024 edition, `clap` (existing), `regex` (existing), `gix` + `gix-config` (new — versions pinned during the Phase 2 spike), `tempfile` + `assert_cmd` for tests. `error-stack` for error plumbing, `derive_more::Display` per project convention. + +**Spec:** `docs/superpowers/specs/2026-05-18-check-domains-design.md` — every implementation decision below is grounded in a numbered section there. When a task says "per spec §X" it means "open the spec and read section X before implementing this step." + +**Branch base:** `feature/check-domains-spec` (stacked on `origin/feature/ts-cli` / PR #669). All commits land on this branch. + +--- + +## Pre-flight (Phase 0) + +### Task 0.1: Verify prerequisite state + +- [ ] **Step 1: Confirm the branch base** + +Run: `git rev-list --count HEAD ^origin/feature/ts-cli` +Expected: a small positive integer (the existing spec commits on this branch). If `git` complains the ref is unknown, run `git fetch origin feature/ts-cli` first. 
+ +- [ ] **Step 2: Confirm the CLI surface is present** + +Run: `ls crates/trusted-server-cli/src/` +Expected output includes: `audit.rs audit config.rs dev.rs error.rs fastly lib.rs main.rs output.rs`. If `dev.rs` is missing, the rebase onto `feature/ts-cli` did not land — stop and re-establish the branch base. + +- [ ] **Step 3: Confirm the workspace builds clean before any edits** + +Run: `cargo check --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')"` +Expected: PASS with no errors. + +If this fails, the issue is upstream (PR #669 conflict or the workspace is broken); do not start the refactor on a broken base. + +### Task 0.2: Capture the `ts dev` baseline before refactoring + +- [ ] **Step 1: Capture `ts dev --help` output** + +Run: `cargo run --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev --help 2>&1 | tee /tmp/ts-dev-help-before.txt` +Expected: clap help text listing `--adapter`, `--config`, `--env`, and a trailing-args mention. The file is the byte-for-byte baseline for the Phase 1 verification. + +- [ ] **Step 2: Capture today's `dev.rs` public API surface** + +Run: `grep -n '^pub ' crates/trusted-server-cli/src/dev.rs > /tmp/ts-dev-pub-api-before.txt && cat /tmp/ts-dev-pub-api-before.txt` +Expected output: + +``` +14:pub enum Adapter { +19:pub fn render_local_fastly_manifest(template: &str, canonical_toml: &str) -> String { +30:pub fn write_local_fastly_manifest( +46:pub fn run_fastly_dev( +102:pub fn run_dev_command( +``` + +These five public items must remain importable from `crate::dev::*` after the refactor (`pub use` re-exports if needed). + +--- + +## Phase 1: Refactor `ts dev` → `ts dev serve` + +Spec §"Why `ts dev` as the parent?" and §"Crate Layout" — `ts dev serve` must preserve every flag and behavior of today's `ts dev` leaf. 
+ +### Task 1.1: Create `dev/` module skeleton, move `dev.rs` body to `dev/serve.rs` + +**Files:** + +- Create: `crates/trusted-server-cli/src/dev/mod.rs` +- Create: `crates/trusted-server-cli/src/dev/serve.rs` +- Delete: `crates/trusted-server-cli/src/dev.rs` + +- [ ] **Step 1: Create `dev/serve.rs` with the existing `dev.rs` body** + +Move the contents of `crates/trusted-server-cli/src/dev.rs` verbatim into `crates/trusted-server-cli/src/dev/serve.rs`. The five `pub` items (`Adapter`, `render_local_fastly_manifest`, `write_local_fastly_manifest`, `run_fastly_dev`, `run_dev_command`) stay public. + +- [ ] **Step 2: Create `dev/mod.rs` as the subcommand-group dispatcher** + +Write: + +```rust +//! `ts dev` subcommand group: developer-workflow commands. +//! +//! Subcommands: +//! - `serve`: launches the local dev server (formerly `ts dev`). +//! - `lint domains`: URL-host linter (Phase 2+). +//! - `install-hooks`: pre-commit hook installer (Phase 6). + +pub mod serve; + +// Re-export the public surface so existing imports +// `crate::dev::{Adapter, run_dev_command, ...}` continue to work. +pub use serve::{ + Adapter, render_local_fastly_manifest, run_dev_command, run_fastly_dev, + write_local_fastly_manifest, FASTLY_LOCAL_MANIFEST, +}; +``` + +- [ ] **Step 3: Delete the old `dev.rs` file** + +Run: `git rm crates/trusted-server-cli/src/dev.rs` + +- [ ] **Step 4: Verify the workspace still builds** + +Run: `cargo check --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')"` +Expected: PASS. If the build fails, an import in `lib.rs` or elsewhere needs adjusting; do not proceed until clean. + +- [ ] **Step 5: Run the existing `dev` tests** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::` +Expected: the three tests in `dev/serve.rs` (`rendered_manifest_embeds_runtime_config_store`, `cargo_target_dir_defaults_to_project_target`, `cargo_target_dir_honors_environment_override`) all PASS. 
+ +- [ ] **Step 6: Commit** + +```bash +# dev.rs needs no `git add`: `git rm` in Step 3 already staged its deletion. +git add crates/trusted-server-cli/src/dev/ +git commit -m "Refactor ts dev into dev/ module with serve.rs + +Move the existing dev-server function body verbatim into dev/serve.rs; +add dev/mod.rs that re-exports the public surface so existing +crate::dev::{...} imports keep working. This is the first half of +splitting ts dev from a leaf command into a subcommand group; the +clap-side change lands in the next commit." +``` + +### Task 1.2: Introduce `DevCommand` enum with `Serve` variant; rewire `lib.rs` + +**Files:** + +- Modify: `crates/trusted-server-cli/src/lib.rs` (lines around 40, 89, 184, 281) +- Modify: `crates/trusted-server-cli/src/dev/mod.rs` + +- [ ] **Step 1: Add the `DevCommand` enum in `dev/mod.rs`** + +Append to `crates/trusted-server-cli/src/dev/mod.rs`: + +```rust +use std::path::PathBuf; + +use clap::{Args, Subcommand}; + +/// Subcommands under `ts dev`. +#[derive(Debug, Subcommand)] +pub enum DevCommand { + /// Launch the local dev server (formerly `ts dev`). + Serve(ServeArgs), +} + +/// Arguments for `ts dev serve`. **Must preserve byte-for-byte the +/// flags of today's `ts dev` leaf**; see spec §"This PR must make +/// the CLI-surface change". +#[derive(Debug, Args)] +pub struct ServeArgs { + #[arg(long, short = 'a', default_value = "fastly")] + pub adapter: Adapter, + #[arg(long)] + pub config: Option<PathBuf>, + #[arg(long, default_value = "local")] + pub env: String, + #[arg(trailing_var_arg = true, allow_hyphen_values = true)] + pub passthrough: Vec<String>, +} +``` + +- [ ] **Step 2: Update `lib.rs` to use `DevCommand`** + +In `crates/trusted-server-cli/src/lib.rs`: + +Find: + +```rust + Dev(DevArgs), +``` + +Change to: + +```rust + Dev { + #[command(subcommand)] + command: dev::DevCommand, + }, +``` + +Find and delete the entire `struct DevArgs { ... }` block (lines ~89-99). 
+ +Find: + +```rust + Command::Dev(args) => run_dev(&args), +``` + +Change to: + +```rust + Command::Dev { command } => run_dev(command), +``` + +Find: + +```rust +fn run_dev(args: &DevArgs) -> Result<(), Report<CliError>> { +``` + +Replace the entire function with: + +```rust +fn run_dev(command: dev::DevCommand) -> Result<(), Report<CliError>> { + match command { + dev::DevCommand::Serve(args) => run_dev_serve(&args), + } +} + +fn run_dev_serve(args: &dev::ServeArgs) -> Result<(), Report<CliError>> { + let validated = config::load_validated_config(args.config.as_deref())?; + let status = dev::run_dev_command(args.adapter, &validated, &args.env, &args.passthrough)?; + if status.success() { + Ok(()) + } else { + Err(Report::new(CliError::Development).attach(format!( + "`fastly compute serve` exited with status {status}" + ))) + } +} +``` + +(The body of `run_dev_serve` is literally the body of the old `run_dev` with `args.*` references unchanged. Verify by diffing against the old `run_dev` block.) + +- [ ] **Step 3: Verify the workspace builds** + +Run: `cargo check --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')"` +Expected: PASS. + +- [ ] **Step 4: Verify the `dev serve --help` output preserves the flag contract** + +A byte-for-byte diff against the captured baseline is too brittle: +clap may legitimately reformat headings or the `Usage:` line when +the command moves from a leaf to a child of a subcommand group. +The contract we care about is **flag preservation**, not +help-text identity. Capture the new help text and assert on each +required surface: + +```sh +cargo run --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" \ + -- dev serve --help > /tmp/ts-dev-serve-help-after.txt 2>&1 + +# Each flag from the baseline must still be advertised, with the +# same default value where applicable. 
+grep -q -- '--adapter' /tmp/ts-dev-serve-help-after.txt +grep -q -- '-a' /tmp/ts-dev-serve-help-after.txt +grep -q -E 'default[^]]*fastly' /tmp/ts-dev-serve-help-after.txt +grep -q -- '--config' /tmp/ts-dev-serve-help-after.txt +grep -q -- '--env' /tmp/ts-dev-serve-help-after.txt +grep -q -E 'default[^]]*local' /tmp/ts-dev-serve-help-after.txt +# Trailing passthrough is usually rendered as '[PASSTHROUGH]...' or +# similar; the presence of an ellipsis after the positional name is +# the contract: +grep -q -E '\[.*\]\.\.\.' /tmp/ts-dev-serve-help-after.txt +``` + +All seven greps must exit 0. If any fail, the refactor lost a flag +— fix `ServeArgs` before continuing. Keep the captured baseline +(`/tmp/ts-dev-help-before.txt`) around so you can eyeball-diff if a +grep fails. + +Functional verification (more important than help-text shape): + +```sh +# Trailing args still reach the runner. Use --skip-build so the +# runner doesn't actually try to launch fastly; the failure mode +# should be the documented "no Wasm binary" message, not a +# clap-parse error. +cargo run --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" \ + -- dev serve --adapter=fastly --env=local -- --skip-build 2>&1 \ + | grep -q -- '--skip-build was passed' +``` + +Expected: the grep finds the runner's diagnostic, proving the +passthrough arg reached `run_fastly_dev`. If clap rejects the args +or the passthrough is lost, the refactor is broken. + +- [ ] **Step 5: Verify `ts dev --help` now shows a subcommand list** + +Run: `cargo run --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev --help` +Expected: clap help text listing `serve` as a subcommand (other subcommands `lint`, `install-hooks` arrive in later phases). No flags listed at the `ts dev` level itself. 
+ +- [ ] **Step 6: Run existing tests** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')"` +Expected: all existing tests PASS (no behavior change yet, only structural rename). + +- [ ] **Step 7: Commit** + +```bash +git add crates/trusted-server-cli/src/lib.rs crates/trusted-server-cli/src/dev/mod.rs +git commit -m "Promote ts dev to subcommand group with serve as the first child + +ts dev is no longer a leaf; today's behavior is now ts dev serve, +preserving --adapter, --config, --env, and the trailing passthrough +args. Verified by grepping the new --help output for each baseline +flag and default, plus a passthrough smoke test. Required by spec +§'This PR must make the CLI-surface change' so that ts dev lint +domains and ts dev install-hooks can be added in subsequent commits." +``` + +--- + +## Phase 2: gix feasibility spike + +Spec §"Implementation Readiness" step 1 and §"Cargo dependencies". The spike's deliverables are: (a) pinned matched `gix` + `gix-config` versions; (b) three working integration tests proving the conceptual operations; (c) updates to the spec replacing the version placeholders. + +### Task 2.1: Add the gix dependencies with provisional versions + +**Files:** + +- Modify: `crates/trusted-server-cli/Cargo.toml` + +- [ ] **Step 1: Find a matched release-family pair** + +Run: `cargo search gix --limit 5` and `cargo search gix-config --limit 5` +Note the latest `gix` version (e.g., `0.66.x`) and look at its release notes (on crates.io / docs.rs) for the corresponding `gix-config` version. **They must come from the same release family**; see spec note "the `gix 0.66` release line shipped with `gix-config 0.39.x`, not `0.40`". Write the chosen pair to `/tmp/gix-pins.txt` in the form `gix=0.x.y\ngix-config=0.a.b`. 
+ +- [ ] **Step 2: Add to `Cargo.toml`** + +In `crates/trusted-server-cli/Cargo.toml` under `[dependencies]`, add: + +```toml +gix = { version = "", default-features = false, features = [ + "blob-diff", + "index", + "revision", +] } +gix-config = "" +``` + +Replace the two empty version strings with the values from step 1. + +- [ ] **Step 3: Resolve and verify no duplicate versions** + +Run: `cargo update --package gix --package gix-config && cargo tree -p gix -p gix-config 2>&1 | head -40` + +Expected: each crate appears exactly once at the top level, at a single version. In `cargo tree` output a `(*)` marker only means the subtree was already printed, so it is not evidence of a duplicate; `cargo tree --duplicates` lists every crate that resolves to more than one version. If duplicates appear, adjust the version pins until they don't. + +- [ ] **Step 4: Build to confirm the deps compile in this workspace** + +Run: `cargo check --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')"` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/Cargo.toml Cargo.lock +git commit -m "Add gix + gix-config deps for ts dev lint domains spike + +Pinned to a matched release-family pair (verified with +cargo tree -p gix -p gix-config that no duplicate versions land +in the lock). Features limited to blob-diff, index, revision per +spec §'Cargo dependencies'. Feasibility spike tests follow." +``` + +### Task 2.2: Spike test 1 — staged blob diff with new-side line numbers + +**All spike-test commit helpers must use a fixed author/committer +signature**, not rely on the host's `user.name` / `user.email` git +config. A clean CI runner or fresh dev machine without global git +identity would otherwise fail the spike with "please tell me who +you are." 
The Phase 4 `test_support` module (Task 4.0) documents +the same requirement and pins a `test_signature()` helper; the +spike helpers in Tasks 2.2 / 2.3 should pin an equivalent fixed +signature locally. When the spike succeeds, the same constant can +be reused from `test_support` once that module exists in Phase 4. 
+ +**Files:** + +- Create: `crates/trusted-server-cli/tests/spike_gix_staged_diff.rs` + +- [ ] **Step 1: Write the failing test** + +Create the file with: + +```rust +//! Spike: prove that gix can give us per-blob hunk information for +//! files staged in the index against the HEAD tree, with new-side +//! line numbers. Once this test passes the chosen entry points are +//! pinned for the staged_added_lines() implementation in Phase 4. + +use std::fs; + +use gix::ObjectId; +use tempfile::tempdir; + +#[test] +fn staged_blob_diff_yields_new_side_line_numbers() { + let temp = tempdir().expect("should create tempdir"); + let repo_path = temp.path(); + let repo = gix::init(repo_path).expect("should init gix repo"); + + // Commit 1: a file with three lines. + let file = repo_path.join("a.txt"); + fs::write(&file, "alpha\nbeta\ngamma\n").expect("should write initial file"); + let commit1 = gix_test_util::commit_all(&repo, "initial"); + + // Stage a modification adding a new line at position 2. + fs::write(&file, "alpha\nNEW LINE\nbeta\ngamma\n").expect("should write modification"); + gix_test_util::stage_all(&repo); + + // Call the conceptual operation: enumerate index-vs-HEAD changes, + // and for each modified blob produce hunks with new-side line numbers. + let hunks = gix_test_util::staged_blob_hunks(&repo).expect("should collect staged hunks"); + + // We expect exactly one added line at new-side line 2 with content "NEW LINE". 
+ let added: Vec<(String, usize, String)> = hunks + .into_iter() + .flat_map(|(path, hunks)| { + hunks.into_iter().flat_map(move |h| { + h.added_lines + .into_iter() + .map(|(ln, c)| (path.clone(), ln, c)) + .collect::<Vec<_>>() + }) + }) + .collect(); + + assert_eq!(added.len(), 1, "should have one added line: {added:?}"); + assert_eq!(added[0].0, "a.txt", "path"); + assert_eq!(added[0].1, 2, "new-side line number"); + assert_eq!(added[0].2, "NEW LINE", "content"); + + let _ = commit1; // keep variable name visible in failure context +} + +mod gix_test_util { + //! Helpers that pin the specific gix entry points used by the + //! production code in Phase 4. The signatures here are stable; + //! the bodies use whatever gix APIs work in the pinned version. + + use super::*; + + pub fn commit_all(_repo: &gix::Repository, _msg: &str) -> ObjectId { + unimplemented!("call into gix to stage everything and commit; \ + return the new commit id") + } + + pub fn stage_all(_repo: &gix::Repository) { + unimplemented!("call into gix to update the index from working tree") + } + + pub struct Hunk { + pub added_lines: Vec<(usize, String)>, + } + + pub fn staged_blob_hunks( + _repo: &gix::Repository, + ) -> Result<Vec<(String, Vec<Hunk>)>, Box<dyn std::error::Error>> { + unimplemented!("compare HEAD tree vs index; for each modified entry, \ + load old + new blobs and run a line diff; return hunks") + } +} +``` + +- [ ] **Step 2: Run the test to verify it fails** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" --test spike_gix_staged_diff` +Expected: FAIL with `unimplemented!()` panic. + +- [ ] **Step 3: Implement the three `gix_test_util` helpers using the pinned gix version** + +Replace the `unimplemented!()` bodies with real calls. Start with `commit_all` (gix exposes commit-creation via `repo.commit("HEAD", msg, tree, parents)` or equivalent in the pinned version). Then `stage_all` (write the working tree to the index). 
Finally `staged_blob_hunks`, the most involved: + +1. 
Open the HEAD tree via `repo.head_commit()?.tree()?`. +2. Read the index via `repo.index()?`. +3. Walk index-vs-tree changes. In the pinned gix version, this lives under one of: `gix::diff::tree_with_rewrites`, `gix::object::tree::diff::Platform`, or `gix::index::diff_against_tree` — pick the one that exists and produces `(path, old_blob_id, new_blob_id)` triples for modified/added entries. +4. For each changed entry, load the old blob (or empty for additions) and the new blob. +5. Run a blob line diff. In gix this is `gix_diff::blob::diff` driven by `imara_diff`. Collect `(post_image_line_no, content)` for each insertion. + +When the test passes, **document the exact entry-point names you used** in `/tmp/gix-api-pins.txt` — these get copy-pasted into the spec in Task 2.5. + +- [ ] **Step 4: Run the test to verify it passes** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" --test spike_gix_staged_diff` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/tests/spike_gix_staged_diff.rs +git commit -m "Spike: staged-diff gix entry points pinned + +Proves we can enumerate index-vs-HEAD changes, load the old and new +blobs per changed entry, and produce blob-diff hunks with new-side +line numbers and content — the contract Phase 4's +staged_added_lines() relies on. The exact gix entry points used will +be reflected in the spec's prototype-required callout once the spike +batch is complete." +``` + +### Task 2.3: Spike test 2 — merge-base + tree-vs-tree blob diff + +**Files:** + +- Create: `crates/trusted-server-cli/tests/spike_gix_changed_vs.rs` + +- [ ] **Step 1: Write the failing test** + +```rust +//! Spike: prove that gix can compute a merge-base between two refs +//! and then run a tree-vs-tree diff with the same blob-diff hunks +//! used by the staged path. Locks in the API for +//! changed_vs_added_lines() in Phase 4. 
+ +use std::fs; + +use tempfile::tempdir; + +#[test] +fn merge_base_then_tree_diff_yields_added_lines() { + let temp = tempdir().expect("should create tempdir"); + let repo_path = temp.path(); + let repo = gix::init(repo_path).expect("should init gix repo"); + + // main: commit a single line on a branch named "main". + let file = repo_path.join("a.txt"); + fs::write(&file, "one\n").expect("should write base file"); + let _base = spike_helpers::commit_all_as_branch(&repo, "main", "first"); + + // feature: branch off main, add another line. + spike_helpers::create_and_checkout_branch(&repo, "feature"); + fs::write(&file, "one\ntwo\n").expect("should write feature-branch change"); + let _head = spike_helpers::commit_all(&repo, "second"); + + // Conceptual operation: merge-base("main", HEAD) then diff the + // merge-base tree against HEAD tree. + let added = spike_helpers::changed_vs_ref(&repo, "main") + .expect("should compute changed-vs added lines"); + + assert_eq!( + added, + vec![("a.txt".to_string(), 2usize, "two".to_string())], + "should report only the line added by the feature branch" + ); +} + +mod spike_helpers { + use super::*; + use gix::ObjectId; + + pub fn commit_all_as_branch(_r: &gix::Repository, _b: &str, _m: &str) -> ObjectId { + unimplemented!("stage + commit on the given branch ref") + } + + pub fn create_and_checkout_branch(_r: &gix::Repository, _b: &str) { + unimplemented!("create branch ref pointing at HEAD; move HEAD to it") + } + + pub fn commit_all(_r: &gix::Repository, _m: &str) -> ObjectId { + unimplemented!("stage + commit on current ref") + } + + pub fn changed_vs_ref( + _r: &gix::Repository, + _ref_name: &str, + ) -> Result<Vec<(String, usize, String)>, Box<dyn std::error::Error>> { + unimplemented!( + "resolve ref via the four-fallback order (see spec \ + §'Base-ref resolution order'), compute merge-base with \ + HEAD, diff base-tree vs HEAD-tree, return (path, \ + new-side line, content) for each added line" + ) + } +} +``` + +- [ ] **Step 2: Run to 
verify it fails** + +Run: `cargo test 
--package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" --test spike_gix_changed_vs` +Expected: FAIL with `unimplemented!()`. + +- [ ] **Step 3: Implement the helpers** + +`changed_vs_ref` is the load-bearing one: + +1. Resolve `_ref_name` per the spec's four-fallback order: ``, `refs/heads/`, `refs/remotes/origin/`, `refs/tags/`. Return the first that resolves to an object id. +2. Compute merge-base via `repo.merge_base(base_id, head_id)`. +3. Get the trees: `repo.find_commit(merge_base)?.tree()?` and `repo.find_commit(head_id)?.tree()?`. +4. Run tree-vs-tree diff via the same primitives used in Task 2.2. +5. For each changed blob, run the blob diff and collect `(path, new_line_no, content)` for insertions. + +- [ ] **Step 4: Run to verify pass** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" --test spike_gix_changed_vs` +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/tests/spike_gix_changed_vs.rs +git commit -m "Spike: merge-base and tree-vs-tree gix entry points pinned + +Drives the conceptual operation for --changed-vs mode: resolve +the ref via the spec's four-fallback order, compute merge-base with +HEAD, diff the merge-base tree against HEAD tree, and yield added-line +hunks with new-side line numbers. Same blob-diff primitive as the +staged spike." +``` + +### Task 2.4: Spike test 3 — durable `core.hooksPath` write via `gix-config::File` + +**Files:** + +- Create: `crates/trusted-server-cli/tests/spike_gix_config_write.rs` + +- [ ] **Step 1: Write the failing test** + +```rust +//! Spike: prove that gix-config::File can read and write +//! /.git/config so that ts dev install-hooks can persist +//! core.hooksPath without subprocess. Locks the read/write APIs +//! for Phase 6. 
+ +use std::fs; +use tempfile::tempdir; + +#[test] +fn write_core_hooks_path_via_gix_config_persists_to_disk() { + let temp = tempdir().expect("should create tempdir"); + let repo_path = temp.path(); + let _repo = gix::init(repo_path).expect("should init gix repo"); + + spike_helpers::set_local_config_value( + repo_path, + "core", + None, + "hooksPath", + ".githooks", + ) + .expect("should write core.hooksPath via gix-config"); + + // Read via gix-config and confirm. + let value = spike_helpers::read_local_config_value( + repo_path, + "core", + None, + "hooksPath", + ) + .expect("should read core.hooksPath back"); + assert_eq!(value.as_deref(), Some(".githooks")); + + // Sanity: reading directly off disk should show the section + // and key in canonical format. + let on_disk = fs::read_to_string(repo_path.join(".git/config")) + .expect("should read .git/config from disk"); + assert!( + on_disk.contains("[core]") && on_disk.contains("hooksPath"), + "should contain core/hooksPath: {on_disk:?}" + ); +} + +#[test] +fn read_local_config_value_returns_none_when_unset() { + let temp = tempdir().expect("should create tempdir"); + let repo_path = temp.path(); + let _repo = gix::init(repo_path).expect("should init gix repo"); + + let value = spike_helpers::read_local_config_value( + repo_path, + "core", + None, + "hooksPath", + ) + .expect("should read core.hooksPath (returning None)"); + assert!(value.is_none(), "unset value reads as None: {value:?}"); +} + +mod spike_helpers { + use std::path::Path; + + pub fn set_local_config_value( + _repo_path: &Path, + _section: &str, + _subsection: Option<&str>, + _key: &str, + _value: &str, + ) -> Result<(), Box<dyn std::error::Error>> { + unimplemented!( + "use gix_config::File::from_path_no_includes on \ + /.git/config (or default()), set_raw_value_by, \ + serialize, write atomically (temp + rename)" + ) + } + + pub fn read_local_config_value( + _repo_path: &Path, + _section: &str, + _subsection: Option<&str>, + _key: &str, + ) -> 
Result<Option<String>, Box<dyn std::error::Error>> { + 
unimplemented!( + "gix_config::File::from_path_no_includes; raw_value_by; \ + return None if file or key absent" + ) + } +} +``` + +- [ ] **Step 2: Run to verify it fails** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" --test spike_gix_config_write` +Expected: both tests FAIL with `unimplemented!()`. + +- [ ] **Step 3: Implement the two helpers** + +The set helper: read existing `.git/config` via `gix_config::File::from_path_no_includes(path, gix_config::Source::Local)`, fall back to `File::default()` if missing; call `set_raw_value_by(section, subsection, key, value.as_bytes())`; serialize via `to_bstring()`; write atomically (write to `config.tmp.`, then `rename` to `config`). + +The read helper: same `from_path_no_includes`, then `raw_value_by(section, subsection, key)`. Return `Ok(None)` if the file is absent or the key is missing. + +- [ ] **Step 4: Run to verify pass** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" --test spike_gix_config_write` +Expected: both tests PASS. + +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/tests/spike_gix_config_write.rs +git commit -m "Spike: gix-config File read/write entry points pinned + +Drives the conceptual operations for ts dev install-hooks: +set_local_config_value (atomic write to /.git/config via +gix_config::File) and read_local_config_value (returns None for +unset, used by the core.hooksPath preflight). Atomic write uses +temp file + rename so a partial write never lands." +``` + +### Task 2.5: Update the spec with the pinned versions and entry points + +**Files:** + +- Modify: `docs/superpowers/specs/2026-05-18-check-domains-design.md` + +- [ ] **Step 1: Replace the version placeholders** + +In the Cargo dependencies block, change `` and `` to the concrete versions from `/tmp/gix-pins.txt`. Add a trailing comment noting the release family (e.g., `# gix 0.66 release family`). 
+ +- [ ] **Step 2: Update Open Question 5 with the chosen gix API entry points** + +In the Open Questions section, change Q5 from "prototype-required" to a RESOLVED list naming the concrete functions you used in the three spike tests (e.g., `gix::index::Platform::diff_against_tree`, `gix_diff::blob::diff`, or whatever you actually used). + +- [ ] **Step 3: Update Open Question 6 with the pinned versions** + +Resolve Q6 with the chosen pair and a one-line note explaining why this pair was chosen. + +- [ ] **Step 4: Update the prototype-required callout in the staged-mode section** + +In the "Line collection: --staged mode (gitoxide)" section, change the "prototype-required" callout to a resolved one naming the entry points and pointing at `tests/spike_gix_staged_diff.rs` as the reference implementation. + +- [ ] **Step 5: Commit** + +```bash +git add docs/superpowers/specs/2026-05-18-check-domains-design.md +git commit -m "Reflect gix feasibility spike outcomes in the spec + +Replace the version +placeholders with the matched pair pinned in the spike commits. +Resolve Open Questions 5 and 6 with the concrete API entry points +used by tests/spike_gix_*.rs. Update the prototype-required +callout in the staged-mode section to name those entry points." +``` + +--- + +## Phase 3: URL extraction + allowlist + suppression (pure functions) + +Spec §"Allowlist (Rust constants)", §"URL extraction (without lookahead)", §"Suppression marker regex", §"Allow check". This phase produces no CLI surface, only pure functions exercised by unit tests. + +### Task 3.1: Create `dev/lint/` module skeleton + constants + +**Files:** + +- Create: `crates/trusted-server-cli/src/dev/lint/mod.rs` +- Create: `crates/trusted-server-cli/src/dev/lint/domains.rs` +- Modify: `crates/trusted-server-cli/src/dev/mod.rs` + +- [ ] **Step 1: Create `dev/lint/mod.rs`** + +```rust +//! `ts dev lint` subcommand group: linters for source/config/docs. +//! +//! Subcommands: +//! - `domains`: URL-host linter (this design). 
+ +pub mod domains; +``` + +- [ ] **Step 2: Create `dev/lint/domains.rs` with the three allowlist arrays and reserved TLDs** + +Copy the verbatim lists from the spec (§"Exact-match hosts", §"Subdomain-permitting hosts", §"Reference / doc hosts"). Each entry gets a trailing `//`-comment naming the integration / category per the spec's maintenance policy. + +Skeleton: + +```rust +//! `ts dev lint domains` — URL-host linter. +//! +//! Design: docs/superpowers/specs/2026-05-18-check-domains-design.md + +use core::error::Error; + +use derive_more::Display; + +/// Integration proxies and loopback hosts that must match exactly. +/// Subdomains are NOT allowed (e.g., `anything.api.privacy-center.org` +/// is disallowed). See spec §"Exact-match hosts" for the policy. +pub const EXACT_HOSTS: &[&str] = &[ + // Loopback + "127.0.0.1", + "::1", + "localhost", + // didomi + "api.privacy-center.org", + "sdk.privacy-center.org", + // sourcepoint + "cdn.privacy-mgmt.com", + // lockr + "aim.loc.kr", + "identity.loc.kr", + // datadome + "js.datadome.co", + "api-js.datadome.co", + // aps / Amazon + "aax.amazon-adsystem.com", + "aax-events.amazon-adsystem.com", + // permutive + "api.permutive.com", + "secure-signals.permutive.app", + "cdn.permutive.com", + // Google Tag Manager / Analytics + "www.googletagmanager.com", + "www.google-analytics.com", + "analytics.google.com", + // adserver mock + "securepubads.g.doubleclick.net", + "origin-mocktioneer.cdintel.com", + // Prebid CDN + "cdn.prebid.org", + // Fastly platform + "api.fastly.com", +]; + +/// Hosts where exact match AND any subdomain (`*.host`) is allowed. +/// See spec §"Subdomain-permitting hosts" and §"Allowlist +/// Maintenance Policy" for the bar to add an entry here. 
+pub const SUBDOMAIN_HOSTS: &[&str] = &[ + // IANA RFC 2606 reserved + "example.com", + "example.net", + "example.org", + // Permutive: runtime host is {organization_id}.edge.permutive.app + "edge.permutive.app", +]; + +/// Well-known documentation and specification sources. Exact-match, +/// allowed in every scanned file. See spec §"Reference / doc hosts" +/// for the curated list (seeded from a sampling; expected to grow +/// during Stage 1 doc cleanup). +pub const REFERENCE_HOSTS: &[&str] = &[ + // Git / GitHub + "github.com", + "docs.github.com", + "help.github.com", + "token.actions.githubusercontent.com", + // Git commit conventions + "chris.beams.io", + // Rust + "docs.rs", + "doc.rust-lang.org", + "crates.io", + // Web / W3C standards + "www.w3.org", + "schema.org", + // Versioning / changelogs + "semver.org", + "keepachangelog.com", + // IAB Tech Lab + "iab.com", + "iabtechlab.com", + "iabtechlab.github.io", + "iabeurope.github.io", + // Specs (supply chain) + "in-toto.io", + "rslstandard.org", + // Specs (other) + "webassembly.org", + // Fastly docs + "www.fastly.com", + "developer.fastly.com", + "manage.fastly.com", + // Cloudflare docs + "developers.cloudflare.com", + // Vendor docs + "docs.datadome.co", + "docs.prebid.org", + // Tooling docs + "vitepress.dev", + "playwright.dev", + "testcontainers.com", + "grafana.com", + "docsearch.algolia.com", +]; + +/// IANA RFC 2606 reserved TLDs. Any host ending in one of these is allowed. 
+pub const RESERVED_TLDS: &[&str] = &[".example", ".test", ".invalid", ".localhost"]; + +#[derive(Debug, Display)] +pub enum DomainsLintError { + #[display("failed to open git repository")] + OpenRepo, + #[display("failed to read git index")] + Index, + #[display("failed to compute diff")] + Diff, + #[display("failed to resolve reference `{_0}`")] + Reference(String), + #[display("failed to compute merge-base of `{base}` and HEAD")] + MergeBase { base: String }, + #[display("failed to read file `{_0}`")] + ReadFile(std::path::PathBuf), + #[display("path not found: `{_0}`")] + PathNotFound(std::path::PathBuf), + #[display("permission denied reading `{_0}`")] + PermissionDenied(std::path::PathBuf), + #[display("invalid mode combination")] + InvalidMode, + /// Failure writing a warning to stderr (broken pipe, etc.). + /// Used by the in-module `warn` helper so collectors can call + /// `crate::output::write_stderr_line` and still return + /// `Report<DomainsLintError>` consistently. + #[display("I/O error writing warning to stderr")] + WriteWarning, +} +impl Error for DomainsLintError {} + +/// In-module warning helper. Wraps the CLI's `write_stderr_line` +/// (which returns a `Report` with a different context type) so that +/// callers inside `domains` can stay on `Report<DomainsLintError>` +/// without inventing custom `?` conversions at every call site. 
+fn warn(msg: impl Into<String>) + -> Result<(), error_stack::Report<DomainsLintError>> +{ + use error_stack::ResultExt; + crate::output::write_stderr_line(msg.into()) + .change_context(DomainsLintError::WriteWarning) +} +``` + +- [ ] **Step 3: Add `lint` to `dev/mod.rs`** + +In `crates/trusted-server-cli/src/dev/mod.rs`, append: + +```rust +pub mod lint; +``` + +- [ ] **Step 4: Verify the workspace builds** + +Run: `cargo check --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')"` +Expected: PASS (with a couple of "unused" warnings for the new constants; this is fine, they're consumed in subsequent tasks). 
+ +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/src/dev/lint/ crates/trusted-server-cli/src/dev/mod.rs +git commit -m "Scaffold dev/lint/domains.rs with allowlist constants + +EXACT_HOSTS, SUBDOMAIN_HOSTS, REFERENCE_HOSTS, RESERVED_TLDS, and +the DomainsLintError enum per spec §'Allowlist' sections. Pure +constants only; the allow check, URL extraction, and suppression +parsing arrive in subsequent commits." +``` + +### Task 3.2: Implement `normalise_host` (TDD) + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +- [ ] **Step 1: Write failing tests** + +Append to `domains.rs`: + +```rust +fn normalise_host(raw: &str) -> String { + todo!("strip surrounding [ ] for bracketed IPv6; lowercase") +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn normalise_lowercases() { + assert_eq!(normalise_host("EXAMPLE.COM"), "example.com"); + assert_eq!(normalise_host("Foo.Example.Com"), "foo.example.com"); + } + + #[test] + fn normalise_strips_ipv6_brackets() { + assert_eq!(normalise_host("[::1]"), "::1"); + assert_eq!(normalise_host("[2001:DB8::1]"), "2001:db8::1"); + } + + #[test] + fn normalise_passthrough_for_plain_hosts() { + assert_eq!(normalise_host("test.com"), "test.com"); + assert_eq!(normalise_host("127.0.0.1"), "127.0.0.1"); + } +} +``` + +- [ ] **Step 2: Run to verify tests fail** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::tests::normalise` +Expected: 3 FAIL with `not yet implemented`. + +- [ ] **Step 3: Implement** + +Replace the `todo!()` body with: + +```rust +fn normalise_host(raw: &str) -> String { + let trimmed = raw.trim_start_matches('[').trim_end_matches(']'); + trimmed.to_lowercase() +} +``` + +- [ ] **Step 4: Run to verify pass** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::tests::normalise` +Expected: 3 PASS. 
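The `trim_start_matches('[')` / `trim_end_matches(']')` pair in Step 3 removes any number of leading or trailing brackets, including unbalanced ones, which is enough for the three tests above. If stricter behavior is ever wanted, one possible sketch strips exactly one balanced pair and leaves anything else untouched. The name `normalise_host_strict` is hypothetical and not part of the plan or the spec.

```rust
/// Hypothetical stricter variant of `normalise_host` (not in the plan):
/// strips exactly one balanced `[...]` pair, then lowercases.
/// Input with unbalanced brackets is lowercased but otherwise kept.
fn normalise_host_strict(raw: &str) -> String {
    let inner = raw
        .strip_prefix('[')
        .and_then(|rest| rest.strip_suffix(']'))
        // No balanced pair: fall back to the raw input.
        .unwrap_or(raw);
    // Host names are ASCII here, so ASCII lowercasing is sufficient.
    inner.to_ascii_lowercase()
}
```

For the inputs in the Step 1 tests this variant returns the same values as the planned implementation; it differs only on malformed input such as `[[::1]`.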
+ +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/src/dev/lint/domains.rs +git commit -m "Add normalise_host: bracket-strip + lowercase + +Tested against IPv6 bracket forms (case-insensitive), regular +lowercase, and pass-through cases. Pure function; no I/O." +``` + +### Task 3.3: Implement `is_allowed` (TDD) + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +- [ ] **Step 1: Write failing tests** + +Append: + +```rust +use std::collections::HashSet; + +fn is_allowed(host: &str, suppressed_on_line: &HashSet<String>) -> bool { + todo!("see spec §'Allow check'") +} + +#[cfg(test)] +mod allow_check_tests { + use super::*; + + fn nothing_suppressed() -> HashSet<String> { HashSet::new() } + + #[test] + fn exact_match_allows() { + assert!(is_allowed("api.fastly.com", &nothing_suppressed())); + assert!(is_allowed("127.0.0.1", &nothing_suppressed())); + } + + #[test] + fn exact_only_rejects_subdomain() { + // EXACT_HOSTS entries match exactly; only SUBDOMAIN_HOSTS + // adds the subdomain rule. So api.fastly.com is allowed, + // but v2.api.fastly.com is NOT. 
+ assert!(!is_allowed("v2.api.fastly.com", &nothing_suppressed())); + assert!(!is_allowed("anything.api.privacy-center.org", &nothing_suppressed())); + } + + #[test] + fn subdomain_list_allows_apex_and_subdomains() { + assert!(is_allowed("example.com", &nothing_suppressed())); + assert!(is_allowed("foo.example.com", &nothing_suppressed())); + assert!(is_allowed("a.b.example.com", &nothing_suppressed())); + assert!(is_allowed("example.net", &nothing_suppressed())); + assert!(is_allowed("assets.example.net", &nothing_suppressed())); + } + + #[test] + fn lookalike_attack_rejected() { + // example.com.evil.com is not a subdomain of example.com. + assert!(!is_allowed("example.com.evil.com", &nothing_suppressed())); + assert!(!is_allowed("notexample.com", &nothing_suppressed())); + } + + #[test] + fn reserved_tld_allows() { + assert!(is_allowed("testlight.example", &nothing_suppressed())); + assert!(is_allowed("something.test", &nothing_suppressed())); + assert!(is_allowed("thing.invalid", &nothing_suppressed())); + assert!(is_allowed("my.localhost", &nothing_suppressed())); + } + + #[test] + fn reference_hosts_allowed_everywhere() { + assert!(is_allowed("github.com", &nothing_suppressed())); + assert!(is_allowed("docs.rs", &nothing_suppressed())); + // But NOT subdomains of REFERENCE_HOSTS (exact-match). + assert!(!is_allowed("other.github.com", &nothing_suppressed())); + } + + #[test] + fn suppression_set_allows() { + let mut suppressed = HashSet::new(); + suppressed.insert("evil.com".to_string()); + assert!(is_allowed("evil.com", &suppressed)); + } + + #[test] + fn rejects_unrelated_host() { + assert!(!is_allowed("test.com", &nothing_suppressed())); + assert!(!is_allowed("1.2.3.4", &nothing_suppressed())); + assert!(!is_allowed("192.168.1.1", &nothing_suppressed())); + } +} +``` + +- [ ] **Step 2: Run to verify tests fail** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::allow_check_tests` +Expected: 8 FAIL with `not yet implemented`. 
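The lookalike cases in the tests above (`example.com.evil.com`, `notexample.com`) depend on the subdomain rule matching only at a label boundary. A minimal, std-only sketch of that rule follows. The names `SUBDOMAIN_HOSTS_SKETCH`, `is_same_or_subdomain`, and `matches_subdomain_list` are hypothetical stand-ins for illustration; the real list and check live in `dev/lint/domains.rs`.

```rust
/// Illustrative stand-in for the plan's `SUBDOMAIN_HOSTS` (assumption:
/// a two-entry subset is enough to show the rule).
const SUBDOMAIN_HOSTS_SKETCH: &[&str] = &["example.com", "example.net"];

/// True when `host` equals `base` or ends with `.base`.
fn is_same_or_subdomain(host: &str, base: &str) -> bool {
    match host.strip_suffix(base) {
        // Nothing left after removing the suffix: exact match.
        Some("") => true,
        // Something left: it must end at a label boundary ('.'),
        // otherwise "notexample.com" would match "example.com".
        Some(prefix) => prefix.ends_with('.'),
        // `base` is not a suffix at all (e.g. example.com.evil.com).
        None => false,
    }
}

fn matches_subdomain_list(host: &str) -> bool {
    SUBDOMAIN_HOSTS_SKETCH
        .iter()
        .any(|base| is_same_or_subdomain(host, base))
}
```

This form avoids building a temporary `".{base}"` string per comparison, though for a list this small either approach should behave the same.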
+ +- [ ] **Step 3: Implement** + +Replace the `todo!()` body with: + +```rust +fn is_allowed(host: &str, suppressed_on_line: &HashSet<String>) -> bool { + if suppressed_on_line.contains(host) { return true; } + if RESERVED_TLDS.iter().any(|t| host.ends_with(t)) { return true; } + if EXACT_HOSTS.iter().any(|e| host == *e) { return true; } + if REFERENCE_HOSTS.iter().any(|e| host == *e) { return true; } + if SUBDOMAIN_HOSTS.iter().any(|e| { + host == *e || host.ends_with(&format!(".{}", e)) + }) { return true; } + false +} +``` + +- [ ] **Step 4: Run to verify pass** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::allow_check_tests` +Expected: 8 PASS. + +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/src/dev/lint/domains.rs +git commit -m "Add is_allowed implementing the three-array check + +Pure function: suppressed-set short-circuit, reserved-TLD suffix, +exact-match against EXACT_HOSTS and REFERENCE_HOSTS, subdomain +rule against SUBDOMAIN_HOSTS. Eight tests cover the worked +examples from spec §'Matching summary'." +``` + +### Task 3.4: Implement absolute-URL extraction (TDD) + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +- [ ] **Step 1: Write failing tests** + +Append: + +```rust +use regex::Regex; +use std::sync::OnceLock; + +fn absolute_url_regex() -> &'static Regex { + static R: OnceLock<Regex> = OnceLock::new(); + R.get_or_init(|| { + // (?i) case-insensitive; host must start with alphanumeric to + // reject placeholders like https://... + // (?:[^/?\s#]+@)? skips RFC 3986 userinfo so a deceiving + // https://github.com@test.com/path reports test.com. 
+ Regex::new( + r"(?i)https?://(?:[^/?\s#]+@)?(\[[0-9a-fA-F:]+\]|[A-Za-z0-9][A-Za-z0-9.\-]*)", + ) + .expect("should compile absolute URL regex") + }) +} + +fn extract_absolute_hosts(line: &str) -> Vec<String> { + todo!("apply absolute_url_regex, capture group 1, normalise each match") +} + +#[cfg(test)] +mod absolute_url_tests { + use super::*; + + #[test] + fn extracts_plain() { + assert_eq!( + extract_absolute_hosts("see https://example.com/path here"), + vec!["example.com"] + ); + } + + #[test] + fn extracts_bracketed_ipv6() { + assert_eq!( + extract_absolute_hosts("dial http://[::1]:8080/"), + vec!["::1"] + ); + } + + #[test] + fn extracts_uppercase_normalised() { + assert_eq!( + extract_absolute_hosts("HTTPS://Example.COM/x"), + vec!["example.com"] + ); + } + + #[test] + fn rejects_dots_only_placeholder() { + assert!(extract_absolute_hosts("see https://... for an example").is_empty()); + } + + #[test] + fn handles_punctuation_wrapping() { + // The regex stops at any character not in [A-Za-z0-9.-]; + // wrapping punctuation falls outside the capture. + for s in [ + "\"https://example.com\",", + "(https://example.com)", + "<https://example.com>", + ] { + assert_eq!(extract_absolute_hosts(s), vec!["example.com"], "input: {s}"); + } + } + + #[test] + fn extracts_multiple_per_line() { + assert_eq!( + extract_absolute_hosts( + "see [a](https://github.com/x) and [b](https://example.com/y)" + ), + vec!["github.com", "example.com"] + ); + } +} +``` + +- [ ] **Step 2: Run to verify tests fail** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::absolute_url_tests` +Expected: 6 FAIL. 
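The `(?:[^/?\s#]+@)?` group in the regex above exists so that a URL such as `https://github.com@test.com/path` reports `test.com` rather than the decoy before the `@`. The same idea can be shown without the `regex` crate: within a URL authority, everything up to the last `@` is userinfo. This is only an illustrative sketch; `host_part` is a hypothetical helper, not something the plan defines, and it does not strip a trailing port.

```rust
/// Hypothetical helper: given a URL authority (the text between `//`
/// and the next `/`, `?`, or `#`), return the part after any userinfo.
fn host_part(authority: &str) -> &str {
    match authority.rsplit_once('@') {
        // Everything before the LAST '@' is userinfo and is ignored.
        Some((_userinfo, host)) => host,
        // No '@' present: the whole authority is the host (and port).
        None => authority,
    }
}
```

Splitting at the last `@` mirrors the greedy userinfo group in the regex, so a value like `a@b@evil.com` still resolves to `evil.com`.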
+ +- [ ] **Step 3: Implement** + +Replace the `todo!()` body with: + +```rust +fn extract_absolute_hosts(line: &str) -> Vec<String> { + absolute_url_regex() + .captures_iter(line) + .filter_map(|c| c.get(1).map(|m| normalise_host(m.as_str()))) + .collect() +} +``` + +- [ ] **Step 4: Run to verify pass** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::absolute_url_tests` +Expected: 6 PASS. + +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/src/dev/lint/domains.rs +git commit -m "Add extract_absolute_hosts using the no-lookahead regex + +Standard regex crate; host must start with an alphanumeric to reject +https://... placeholder noise. Six tests cover plain, bracketed +IPv6, case-insensitive, punctuation wrapping, multi-per-line, and +the malformed-host rejection from spec test 20a." +``` + +### Task 3.5: Implement protocol-relative URL extraction (TDD) + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +- [ ] **Step 1: Write failing tests** + +Append: + +```rust +fn protocol_relative_regex() -> &'static Regex { + static R: OnceLock<Regex> = OnceLock::new(); + R.get_or_init(|| { + // Boundary class: start-of-line, whitespace, quotes, paren, + // =, <, >, {, [, ], comma, backtick. NOT colon (would + // double-match absolute URLs). + // (?:[^/?\s#]+@)? skips userinfo for bypass prevention, + // same reason as the absolute URL regex. 
+ Regex::new( + r#"(?i)(?:^|[\s"'(=<>{,\[\]`])//(?:[^/?\s#]+@)?([A-Za-z0-9][A-Za-z0-9.\-]*\.[A-Za-z]{2,})"#, + ) + .expect("should compile protocol-relative URL regex") + }) +} + +fn extract_protocol_relative_hosts(line: &str) -> Vec<String> { + todo!("apply protocol_relative_regex, capture group 1, normalise") +} + +#[cfg(test)] +mod protocol_relative_tests { + use super::*; + + #[test] + fn extracts_after_quote() { + assert_eq!( + extract_protocol_relative_hosts("src=\"//www.googletagmanager.com/gtm.js\""), + vec!["www.googletagmanager.com"] + ); + } + + #[test] + fn extracts_after_start_of_line() { + assert_eq!( + extract_protocol_relative_hosts("//cdn.example.evil/foo"), + vec!["cdn.example.evil"] + ); + } + + #[test] + fn extracts_template_literal_backtick() { + assert_eq!( + extract_protocol_relative_hosts("`//cdn.example.evil/${path}`"), + vec!["cdn.example.evil"] + ); + } + + #[test] + fn extracts_json_object_value() { + assert_eq!( + extract_protocol_relative_hosts("{\"src\": \"//cdn.example.evil/x\"}"), + vec!["cdn.example.evil"] + ); + } + + #[test] + fn does_not_match_colon_prefix() { + // In http://foo.com the // is preceded by ':', which is NOT in the boundary class. + assert!(extract_protocol_relative_hosts("http://foo.com/x").is_empty()); + } + + #[test] + fn does_not_match_code_comment_divider() { + // The trailing TLD-like constraint (.{2,}) filters this out; + // "comment text" has no dotted-suffix. + assert!(extract_protocol_relative_hosts("// comment text").is_empty()); + } +} +``` + +- [ ] **Step 2: Run to verify failure** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::protocol_relative_tests` +Expected: 6 FAIL. 
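The boundary class in the regex above decides which `//` occurrences are even considered; excluding `:` is what keeps `http://foo.com` from being counted a second time. A std-only sketch of just that boundary decision is shown here. `BOUNDARY` and `protocol_relative_starts` are hypothetical names used for illustration, and `char::is_whitespace` is used as an approximation of the regex's `\s`. Host validation (the dotted-suffix requirement) is deliberately left out.

```rust
/// Characters that may precede `//` for it to count as the start of a
/// protocol-relative URL. ':' is deliberately absent.
const BOUNDARY: &[char] = &['"', '\'', '(', '=', '<', '>', '{', ',', '[', ']', '`'];

/// Byte offsets of every `//` that sits at start-of-line or directly
/// after whitespace or a boundary character.
fn protocol_relative_starts(line: &str) -> Vec<usize> {
    line.match_indices("//")
        .filter(|(idx, _)| match line[..*idx].chars().next_back() {
            // Nothing before the slashes: start of line.
            None => true,
            Some(c) => c.is_whitespace() || BOUNDARY.contains(&c),
        })
        .map(|(idx, _)| idx)
        .collect()
}
```

Note that a plain comment such as `// comment text` still passes this boundary check; in the plan it is the host pattern after the slashes that rejects it.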
+ +- [ ] **Step 3: Implement** + +```rust +fn extract_protocol_relative_hosts(line: &str) -> Vec<String> { + protocol_relative_regex() + .captures_iter(line) + .filter_map(|c| c.get(1).map(|m| normalise_host(m.as_str()))) + .collect() +} +``` + +- [ ] **Step 4: Run to verify pass** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::protocol_relative_tests` +Expected: 6 PASS. + +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/src/dev/lint/domains.rs +git commit -m "Add extract_protocol_relative_hosts with boundary class + +Boundary class includes start-of-line, whitespace, quotes, paren, +=, <, >, {, [, ], comma, backtick; this covers HTML attribute values, +JS template literals, JSON object values. Deliberately excludes +':' to avoid double-matching absolute URLs (where '//' is preceded +by the scheme separator). Six tests cover the cases from spec +§'Protocol-relative URL regex'." +``` + +### Task 3.6: Implement suppression-marker parsing (TDD) + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +- [ ] **Step 1: Write failing tests** + +Append: + +```rust +fn suppression_marker_regex() -> &'static Regex { + static R: OnceLock<Regex> = OnceLock::new(); + R.get_or_init(|| { + Regex::new( + r"(?im)(?:^|\s)(?://|\#|<!--|\*\s)\s*allow-domain:\s*(.*?)(?:-->|$)", + ) + .expect("should compile suppression marker regex") + }) +} + +/// Result of parsing a line for a suppression marker. +#[derive(Debug, Default, PartialEq, Eq)] +pub struct LineSuppression { + /// Hosts listed in the marker (post-trim, lowercased). + pub suppressed: HashSet<String>, + /// Hosts listed but found nowhere on this line; emitted as a + /// stderr warning later. 
+ pub _unused: Vec<String>, +} + +fn parse_suppression_marker(line: &str) -> LineSuppression { + todo!("apply regex, capture group 1, split on ',', trim, lowercase, drop empties") +} + +#[cfg(test)] +mod suppression_tests { + use super::*; + + fn parse(line: &str) -> HashSet<String> { + parse_suppression_marker(line).suppressed + } + + #[test] + fn single_host_after_slash_comment() { + let got = parse("let x = \"https://evil.com\"; // allow-domain: evil.com"); + let expected: HashSet<String> = ["evil.com".to_string()].into_iter().collect(); + assert_eq!(got, expected); + } + + #[test] + fn html_comment_form_with_trailing_space() { + // Captured group includes trailing space before --> ; trim handles it. + let got = parse("<!-- allow-domain: test.com -->"); + let expected: HashSet<String> = ["test.com".to_string()].into_iter().collect(); + assert_eq!(got, expected); + } + + #[test] + fn hash_comment_form() { + let got = parse("upstream = \"https://evil.com\" # allow-domain: evil.com"); + let expected: HashSet<String> = ["evil.com".to_string()].into_iter().collect(); + assert_eq!(got, expected); + } + + #[test] + fn multi_host_with_whitespace() { + let got = parse("// allow-domain: a.com , b.com , c.com"); + let expected: HashSet<String> = ["a.com", "b.com", "c.com"] + .iter().map(|s| s.to_string()).collect(); + assert_eq!(got, expected); + } + + #[test] + fn bypass_attempt_url_path_lookalike_not_suppressed() { + // 'allow-domain' inside a URL path is NOT a comment. + let got = parse("fetch(\"https://evil.com/allow-domain\")"); + assert!(got.is_empty(), "URL-path content must not suppress: {got:?}"); + } + + #[test] + fn bypass_attempt_pathological_host_named_allow_domain() { + // In https://allow-domain:8080/path the // is preceded by ':', + // not whitespace/SOL, so the marker anchor fails. 
+ let got = parse("let x = \"https://allow-domain:8080/path\";"); + assert!(got.is_empty(), "pathological host must not suppress: {got:?}"); + } +} +``` + +- [ ] **Step 2: Run to verify failure** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::suppression_tests` +Expected: 6 FAIL. + +- [ ] **Step 3: Implement** + +```rust +fn parse_suppression_marker(line: &str) -> LineSuppression { + let mut out = LineSuppression::default(); + let Some(caps) = suppression_marker_regex().captures(line) else { return out }; + let Some(m) = caps.get(1) else { return out }; + for host in m.as_str().split(',') { + let host = host.trim(); + if !host.is_empty() { + out.suppressed.insert(host.to_lowercase()); + } + } + out +} +``` + +(`_unused` is populated later by `scan_line` once it knows which hosts actually appeared.) + +- [ ] **Step 4: Run to verify pass** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::suppression_tests` +Expected: 6 PASS. + +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/src/dev/lint/domains.rs +git commit -m "Add parse_suppression_marker with bypass-resistant anchor + +Marker regex requires start-of-line or whitespace before the comment +introducer (//, #, <!--, or '* '; --> closes the HTML form). Six tests +include the two documented bypass attempts (URL-path 'allow-domain' +substring; pathological host literally named 'allow-domain')." +``` + +### Task 3.7: Implement `scan_line` (TDD) + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +`scan_line` returns **two** things: the violations and an +"unused suppression" report. 
Per spec §"Per-Line Suppression": +"Each host listed must actually match a violation on that line; if a +listed host does not appear among the line's violations, a warning +is emitted (stderr) but the suppression for matched hosts still +applies." 
The unused list is what the caller emits as the stderr +warning. + +- [ ] **Step 1: Write failing tests** + +Append: + +```rust +/// One reported violation on a scanned line. +#[derive(Debug, PartialEq, Eq)] +pub struct LineViolation { + pub host: String, +} + +/// Result of scanning one source line. +#[derive(Debug, Default, PartialEq, Eq)] +pub struct LineScanOutcome { + pub violations: Vec<LineViolation>, + /// Hosts that the line's `allow-domain:` marker listed but that + /// did not match a would-be violation (absent from the line, or + /// already allowed). Caller emits these as a stderr warning + /// ("listed in allow-domain marker but no matching host on the + /// line"). + pub unused_suppressions: Vec<String>, +} + +/// Scan one source line; return violations and any unused +/// suppression-marker entries. +pub fn scan_line(line: &str) -> LineScanOutcome { + todo!("collect absolute + protocol-relative hosts, apply suppression, \ + filter via is_allowed, compute unused = listed - would-be violations") +} + +#[cfg(test)] +mod scan_line_tests { + use super::*; + + fn hosts(line: &str) -> Vec<String> { + scan_line(line).violations.into_iter().map(|v| v.host).collect() + } + + fn unused(line: &str) -> Vec<String> { + let mut u = scan_line(line).unused_suppressions; + u.sort(); + u + } + + #[test] + fn allowed_passes_clean() { + for line in [ + "see https://example.com", + "see https://foo.example.com", + "see https://api.privacy-center.org", + "dial http://127.0.0.1:8080/", + "see https://github.com/x/y", + "see https://testlight.example", + "//www.googletagmanager.com/gtm.js", + ] { + assert!(hosts(line).is_empty(), "should be clean: {line}"); + } + } + + #[test] + fn disallowed_reports() { + assert_eq!(hosts("see https://test.com"), vec!["test.com"]); + assert_eq!(hosts("see https://partner.com"), vec!["partner.com"]); + } + + #[test] + fn suppression_with_correct_host_passes() { + let out = scan_line("https://evil.com // allow-domain: evil.com"); + 
assert!(out.violations.is_empty()); + assert!(out.unused_suppressions.is_empty()); + } + + #[test] + 
fn suppression_with_wrong_host_still_reports_and_warns() { + let out = scan_line("https://evil.com // allow-domain: other.com"); + assert_eq!( + out.violations.into_iter().map(|v| v.host).collect::<Vec<_>>(), + vec!["evil.com"] + ); + assert_eq!( + out.unused_suppressions, vec!["other.com"], + "other.com was listed but never appeared on the line" + ); + } + + #[test] + fn multi_host_suppression_applied_to_violations() { + // Spec §"Per-line suppression": multiple comma-separated + // hosts; all are suppressed when they match extracted hosts. + let out = scan_line( + "x = \"https://evil.com\"; y = \"https://bad.org\"; \ + // allow-domain: evil.com, bad.org" + ); + assert!(out.violations.is_empty(), "both hosts should be suppressed: {out:?}"); + assert!(out.unused_suppressions.is_empty()); + } + + #[test] + fn multi_host_suppression_partial_match_warns_for_unused() { + // evil.com matches; ghost.com does not appear on the line. + let out = scan_line("\"https://evil.com\" // allow-domain: evil.com, ghost.com"); + assert!(out.violations.is_empty(), "evil.com should be suppressed"); + assert_eq!(out.unused_suppressions, vec!["ghost.com"]); + } + + #[test] + fn jsdoc_star_suppression_form() { + // Spec §"Marker grammar": '*' followed by whitespace is one + // of the four supported comment-introducer branches. + // Format: a jsdoc/block-comment continuation line where the + // marker is adjacent to '* '. 
+ let out = scan_line( + " * fetch(\"https://evil.com\") * allow-domain: evil.com" + ); + assert!(out.violations.is_empty(), "jsdoc-style suppression should apply: {out:?}"); + } + + #[test] + fn multiple_disallowed_on_one_line() { + let got = hosts( + "<a href=\"https://test.com\">x</a><a href=\"https://partner.com\">y</a>", + ); + assert_eq!(got, vec!["test.com", "partner.com"]); + } + + #[test] + fn bypass_attempt_reports() { + // fetch("https://evil.com/allow-domain"): substring inside URL, + // not a comment, so suppression does NOT apply. 
+ assert_eq!( + hosts("fetch(\"https://evil.com/allow-domain\")"), + vec!["evil.com"] + ); + } + + #[test] + fn unused_warning_only_when_marker_present() { + // No marker → no unused warning, even though "other.com" does + // not appear in any line we scanned. + let out = scan_line("see https://example.com"); + assert!(out.unused_suppressions.is_empty()); + } + + #[test] + fn unused_warning_fires_for_already_allowed_listed_host() { + // Spec §"Per-Line Suppression": listed host must match a + // VIOLATION, not just an extracted host. example.com is + // extracted but is already allowed → would never have been + // a violation → the marker entry was unnecessary → warn. + let out = scan_line("see https://example.com // allow-domain: example.com"); + assert!(out.violations.is_empty(), "example.com is already allowed"); + assert_eq!( + out.unused_suppressions, vec!["example.com"], + "marker listed an already-allowed host; it suppresses nothing" + ); + } +} +``` + +- [ ] **Step 2: Run to verify failure** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::scan_line_tests` +Expected: 11 FAIL (one per `#[test]`). + +- [ ] **Step 3: Implement** + +```rust +pub fn scan_line(line: &str) -> LineScanOutcome { + let suppression = parse_suppression_marker(line); + let mut hosts = extract_absolute_hosts(line); + hosts.extend(extract_protocol_relative_hosts(line)); + + // Compute the set of hosts that WOULD be flagged WITHOUT any + // suppression — i.e., extracted hosts that fail the allowlist + // check when the suppression set is empty. Per spec + // §"Per-Line Suppression": the allow-domain marker's job is to + // suppress violations. A listed host that wasn't going to be a + // violation anyway (already allowed, or not extracted at all) + // is "unused" and warrants the stderr warning. 
+ let empty_suppression: std::collections::HashSet<String> = + std::collections::HashSet::new(); + let disallowed_without_suppression: std::collections::HashSet<&String> = hosts + .iter() + .filter(|h| !is_allowed(h, &empty_suppression)) + .collect(); + + let mut unused: Vec<String> = suppression + .suppressed + .iter() + .filter(|listed| { + !disallowed_without_suppression + .iter() + .any(|h| h.as_str() == listed.as_str()) + }) + .cloned() + .collect(); + unused.sort(); + + let violations = hosts + .into_iter() + .filter(|h| !is_allowed(h, &suppression.suppressed)) + .map(|host| LineViolation { host }) + .collect(); + + LineScanOutcome { + violations, + unused_suppressions: unused, + } +} +``` + +- [ ] **Step 4: Run to verify pass** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev::lint::domains::scan_line_tests` +Expected: 11 PASS. + +- [ ] **Step 5: Commit** + +```bash +git add crates/trusted-server-cli/src/dev/lint/domains.rs +git commit -m "Add scan_line returning violations + unused-suppression report + +Composes parse_suppression_marker + extract_absolute_hosts + +extract_protocol_relative_hosts + is_allowed. The LineScanOutcome +struct carries both the violation list AND the 'unused suppression' +list per spec §'Per-Line Suppression': listed hosts that would +not have been a violation in the first place (already allowed, or +not extracted at all) are surfaced for the caller to emit as +stderr warnings. Eleven tests cover: allowed-pass, +disallowed-report, single-host suppression match, wrong-host +warning, multi-host full-match, multi-host partial-match warning, +jsdoc/* form, multi-violation-per-line, URL-content bypass attempt, +no-marker-no-warning, and the already-allowed-host-listed case." 
+``` + +--- + +## Phase 4: Diff and path collectors + +Spec §"Line collection: --staged mode", §"Line collection: --changed-vs", §"Line collection: full-repo", §"Line collection: explicit paths". 
+ +Each task in this phase pulls the gix entry points from the Phase 2 spike tests and wraps them in production helpers under `dev/lint/domains.rs`. Re-read the spike test bodies before implementing. + +**Tests live as inline `#[cfg(test)] mod tests` blocks inside `dev/lint/domains.rs`, NOT as files under `crates/trusted-server-cli/tests/`.** Reason: `lib.rs` declares `mod dev;` (private), so integration tests under `tests/` cannot reach `trusted_server_cli::dev::lint::domains::staged_added_lines` or any other path inside the crate. Inline tests get full access to the private/`pub(crate)` items. End-to-end binary-level tests (Phase 7) belong in `tests/` because they call `Command::cargo_bin("ts")`. + +A shared helper module for git-repo fixtures lives at `dev/lint/test_support.rs` and is gated `#[cfg(test)]`. Copy the `commit_all` / `stage_all` / branch helpers proven in the Phase 2 spike tests into it (the spike tests stay where they are; this file is the production-quality version of those helpers). + +### Task 4.0: Extract git-fixture helpers into a shared `test_support` module + +**Files:** + +- Create: `crates/trusted-server-cli/src/dev/lint/test_support.rs` +- Modify: `crates/trusted-server-cli/src/dev/lint/mod.rs` + +**Critical: helper commits MUST set explicit author/committer +signatures, not rely on ambient git config.** A clean test +environment (CI runner, container, fresh machine without +`user.name` / `user.email` set globally) will fail with "please tell +me who you are" or produce nondeterministic timestamps. Pin a fixed +signature in the helpers so tests are deterministic and don't depend +on the host's git config. + +- [ ] **Step 1: Create `dev/lint/test_support.rs`** + +Lift the helper functions from `tests/spike_gix_staged_diff.rs` and `tests/spike_gix_changed_vs.rs` (the production-quality versions, not the `unimplemented!()` shells). 
Signatures: + +```rust +#![cfg(test)] + +use std::path::Path; + +use gix::ObjectId; + +/// Fixed test signature used for all helper commits — avoids +/// dependence on ambient `user.name` / `user.email` config and +/// keeps commit hashes stable across runs. +pub(crate) fn test_signature() -> gix::actor::SignatureRef<'static> { + gix::actor::SignatureRef { + name: "ts dev lint tests".into(), + email: "tests@example.com".into(), + time: gix::date::Time::new(1_700_000_000, 0).into(), + } +} + +pub(crate) fn init_repo(path: &Path) -> gix::Repository { /* ... */ } +pub(crate) fn commit_all(repo: &gix::Repository, msg: &str) -> ObjectId { /* ... */ } +pub(crate) fn stage_all(repo: &gix::Repository) { /* ... */ } +pub(crate) fn create_and_checkout_branch(repo: &gix::Repository, branch: &str) { /* ... */ } +pub(crate) fn commit_all_as_branch(repo: &gix::Repository, branch: &str, msg: &str) -> ObjectId { /* ... */ } +``` + +`commit_all` and `commit_all_as_branch` MUST pass `test_signature()` +(or equivalent) as both author and committer when calling gix's +commit-creation API — do not let gix fall back to environment / +git-config lookups. If the pinned gix version's exact SignatureRef +shape differs from the sketch above, adjust the helper to whatever +the pinned API requires, but the fixed-signature principle is +non-negotiable. + +- [ ] **Step 2: Wire the module** + +In `dev/lint/mod.rs`, add: + +```rust +#[cfg(test)] +pub(crate) mod test_support; +``` + +- [ ] **Step 3: Verify it compiles** + +Run: `cargo check --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" --tests` +Expected: PASS. 
+ +- [ ] **Step 4: Commit** + +```bash +git add crates/trusted-server-cli/src/dev/lint/test_support.rs crates/trusted-server-cli/src/dev/lint/mod.rs +git commit -m "Add dev/lint/test_support: shared git fixtures for module tests + +Lifts the working gix helper bodies from tests/spike_gix_*.rs into +a #[cfg(test)] pub(crate) module that the inline #[cfg(test)] mod +tests blocks in domains.rs (Phase 4) can use. The spike tests +themselves stay in tests/ and continue to drive their unimplemented +stubs through the pinned implementations." +``` + +### Task 4.1: `staged_added_lines` (TDD) + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +**Path representation for staged diffs.** `gix` returns diff entry +paths as `BString` (byte strings). `DiffLine::path` is a `PathBuf`, +which on Unix is an `OsString` byte container — so byte sequences +that are not valid UTF-8 are still valid paths there. The +implementation must: + +- For valid UTF-8 paths: convert directly via `std::str::from_utf8` + → `PathBuf`. Normal path. +- For non-UTF-8 paths in `--staged` mode (per spec test 25 and + spec §"Note on non-UTF-8 paths"): **report normally with a stderr + warning that the path is being displayed lossy-UTF-8.** This + intentionally differs from full-repo mode (case 4 in spec + §"Handling tracked-but-missing files and symlinks"), which + skips non-UTF-8 entries. Construct the `PathBuf` via + `String::from_utf8_lossy` (replacement chars in the display name + are acceptable — host extraction runs against blob content, not + the path) and emit a stderr warning via + `crate::output::write_stderr_line` naming the lossy path. + +This applies to `--changed-vs` mode as well (same blob-content +scanning model). Full-repo mode is the only place we skip — see +Task 4.3. 
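
The lossy-display rule above can be tried in isolation with the standard library alone. The following is a minimal sketch, not the plan's implementation: `display_path` is a hypothetical helper name, and the boolean it returns merely stands in for "emit the stderr warning".

```rust
use std::path::PathBuf;

/// Hypothetical helper (not part of the plan's API): turn the raw bytes of a
/// diff entry path into a displayable `PathBuf`. The boolean reports whether
/// the conversion was lossy, so the caller knows to emit the stderr warning.
fn display_path(raw: &[u8]) -> (PathBuf, bool) {
    match std::str::from_utf8(raw) {
        // Valid UTF-8: use the path as-is, no warning needed.
        Ok(s) => (PathBuf::from(s), false),
        // Invalid UTF-8: every invalid sequence becomes U+FFFD in the
        // display name; the blob content itself is untouched by this.
        Err(_) => (
            PathBuf::from(String::from_utf8_lossy(raw).into_owned()),
            true,
        ),
    }
}
```

For the byte sequence `f`, `o`, `0xff`, `o`, `.`, `r`, `s`, this sketch produces the display name `fo\u{FFFD}o.rs` with the lossy flag set, while a plain ASCII path passes through unchanged.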
+ +- [ ] **Step 1: Write a failing inline test inside `dev/lint/domains.rs`** + +In the existing `#[cfg(test)] mod tests` block (the same one with the URL extraction and scan_line tests), append: + +```rust +mod staged_added_lines_tests { + use super::*; + use crate::dev::lint::test_support; + + #[test] + fn reports_added_line_with_new_side_line_number() { + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + std::fs::write(temp.path().join("a.txt"), "alpha\nbeta\ngamma\n") + .expect("should write initial file"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "initial"); + + std::fs::write(temp.path().join("a.txt"), "alpha\nNEW LINE\nbeta\ngamma\n") + .expect("should write modification"); + test_support::stage_all(&repo); + + let lines = staged_added_lines(temp.path()).expect("should collect staged lines"); + let added: Vec<_> = lines + .iter() + .map(|l| (l.path.to_string_lossy().into_owned(), l.line_no, l.content.clone())) + .collect(); + + assert_eq!(added, vec![("a.txt".to_string(), 2, "NEW LINE".to_string())]); + } + + /// Spec test case 25: staged scan must NOT skip non-UTF-8 paths + /// (full-repo mode skips them; staged reports lossy + warning). + #[cfg(unix)] + #[test] + fn reports_non_utf8_staged_path_lossy() { + use std::os::unix::ffi::OsStrExt; + + let temp = tempfile::tempdir().expect("should create tempdir"); + let repo = test_support::init_repo(temp.path()); + + // Initial commit so HEAD exists. + std::fs::write(temp.path().join("readme.txt"), "hi\n") + .expect("should write readme"); + test_support::stage_all(&repo); + test_support::commit_all(&repo, "initial"); + + // Add a file with a non-UTF-8 component, containing a + // disallowed URL. 
+ let non_utf8_name = std::ffi::OsStr::from_bytes(&[0x66, 0x6f, 0xff, 0x6f, 0x2e, 0x72, 0x73]); // f, o, 0xff, o, ., r, s + let bad_file = temp.path().join(non_utf8_name); + std::fs::write(&bad_file, "let x = \"https://test.com\";\n") + .expect("should write non-utf8-named file"); + test_support::stage_all(&repo); + + let lines = staged_added_lines(temp.path()) + .expect("should collect staged lines even with non-UTF-8 path"); + // Expect exactly one DiffLine for the bad file's added line. + // The path displays with a replacement char, but the line is + // reported (NOT skipped). + let added_lines: Vec<_> = lines.iter().collect(); + assert!( + !added_lines.is_empty(), + "non-UTF-8 staged paths must be reported, not skipped" + ); + // The content must be the original added line, byte-faithful. + assert!( + added_lines.iter().any(|l| l.content.contains("https://test.com")), + "must surface the URL for scanning: {added_lines:?}" + ); + } +} +``` + +- [ ] **Step 2: Run to verify failure** (function doesn't exist yet) + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- staged_added_lines_tests` +Expected: FAIL with `cannot find function staged_added_lines in this scope`. + +- [ ] **Step 3: Implement `staged_added_lines` in `dev/lint/domains.rs`** + +Function signature: + +```rust +#[derive(Debug)] +pub(crate) struct DiffLine { + /// Path for display and reporting. Built via `String::from_utf8_lossy` + /// for non-UTF-8 sources (see Task 4.1 notes on path representation). 
+ pub path: std::path::PathBuf, + pub line_no: usize, + pub content: String, +} + +pub(crate) fn staged_added_lines( + repo_path: &std::path::Path, +) -> Result<Vec<DiffLine>, error_stack::Report<DomainsLintError>> +``` + +Body: open repo, get HEAD tree, get index, run index-vs-tree diff using the entry points pinned in Phase 2 step 2.3, filter changed paths through `path_is_scanned()` (Task 4.5 dependency — define a stub returning `true` for now and refine later), run blob diff per changed entry, collect added-line hunks. + +Path conversion: for each gix `BString` entry path, + +```rust +let (path, was_lossy) = match std::str::from_utf8(raw_bytes) { + Ok(s) => (std::path::PathBuf::from(s), false), + Err(_) => { + let lossy = String::from_utf8_lossy(raw_bytes).into_owned(); + (std::path::PathBuf::from(&lossy), true) + } +}; +if was_lossy { + // `warn` is the in-module helper defined alongside + // DomainsLintError; it returns Report<DomainsLintError> so the + // `?` here flows correctly out of staged_added_lines. + warn(format!( + "warning: staged path is not valid UTF-8; displaying lossy: {}", + path.display() + ))?; +} +``` + +`pub(crate)` (not `pub`) is appropriate — the function is exercised through inline tests and the in-crate `domains::run` caller; no external API surface. + +- [ ] **Step 4: Run to verify pass.** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- staged_added_lines_tests` +Expected: PASS (both the normal case and the non-UTF-8 case). + +- [ ] **Step 5: Commit.** + +### Task 4.2: `changed_vs_added_lines` with base-ref resolution (TDD) + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +- [ ] **Step 1: Write failing inline tests** + +In the same module-level test mod, append a new `mod changed_vs_tests { ... }` with two cases: + +1. Two-branch fixture (`main` with base commit, `feature` with an additional commit adding `https://test.com` to a file).
Assert `changed_vs_added_lines(repo_path, "main")` returns exactly one `DiffLine` with the new content. +2. Ref-resolution fallback: rename the local `main` ref to `refs/remotes/origin/main` (use gix to manipulate refs in the fixture) and assert `changed_vs_added_lines(repo_path, "main")` still resolves and returns the same result via the fallback chain. + +Use `tempfile::tempdir().expect("should create tempdir")` and the `test_support` helpers; every `expect()` message follows the `should ...` convention. + +- [ ] **Step 2: Verify failure.** + +- [ ] **Step 3: Implement `changed_vs_added_lines`** in `dev/lint/domains.rs`. Pull merge-base + tree-vs-tree from Phase 2 step 2.3. Include the `resolve_base_ref` helper that tries the four candidates from the spec (`<REF>`, `refs/heads/<REF>`, `refs/remotes/origin/<REF>`, `refs/tags/<REF>`) in order and returns the first match. + +Signature: `pub(crate) fn changed_vs_added_lines(repo_path: &Path, reference: &str) -> Result<Vec<DiffLine>, Report<DomainsLintError>>` + +- [ ] **Step 4: Verify pass.** + +- [ ] **Step 5: Commit.** + +### Task 4.3: `full_repo_lines` with edge-case handling (TDD) + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +- [ ] **Step 1: Write failing inline tests** (`mod full_repo_tests`) for each of the five edge cases in spec §"Handling tracked-but-missing files and symlinks": + 1. Tracked-but-missing file → warns and skips. + 2. Symlink → warns and skips ("symlink not followed"). + 3. Non-regular file (`#[cfg(unix)]` — mkfifo via `nix` or shell-equivalent; if too painful, gate this case behind `#[cfg(feature = "fifo-test")]` and skip in CI). + 4. Non-UTF-8 path component (Unix-only — create via `std::ffi::OsStr::from_bytes(&[0xff, 0xfe])` with `std::os::unix::ffi::OsStrExt` in scope). + 5. Binary file (`.json` with embedded NUL — write `b"{\"x\": \0null}"`). + +Each test asserts the audit proceeds to the next entry; the function returns `Ok(Vec<DiffLine>)` with no entries for the skipped file.
(Test the stderr warning indirectly by ensuring no violation is reported for the problematic path; full stderr-capture tests happen in Phase 7 via `assert_cmd`.) + +Use `expect("should ...")` throughout. + +- [ ] **Step 2: Verify failure.** + +- [ ] **Step 3: Implement `full_repo_lines`** per the spec pseudocode. The `warn_skip(path, reason)` / `warn_skip_bytes(bytes, reason)` helpers wrap the in-module `warn` helper (defined alongside `DomainsLintError`), which itself wraps `crate::output::write_stderr_line` with `change_context(DomainsLintError::WriteWarning)`. Do NOT call `write_stderr_line` directly — the type would not unify with `Report<DomainsLintError>` and the `?` operator would fail to compile. + +Signature: `pub(crate) fn full_repo_lines(repo_path: &Path) -> Result<Vec<DiffLine>, Report<DomainsLintError>>` + +- [ ] **Step 4: Verify pass.** + +- [ ] **Step 5: Commit.** + +### Task 4.4: `explicit_path_lines` with the soft/hard split (TDD) + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +- [ ] **Step 1: Write failing inline tests** (`mod explicit_path_tests`): + 1. Existing valid file → reports violations from it normally. + 2. Path with an excluded extension (`.png` — outside the [scanned-extensions list](2026-05-18-check-domains-design.md#file-extensions-scanned); `.html`/`.css` ARE scanned) → warns and skips, returns empty `Vec`. + 3. Path under `node_modules/` → warns and skips. + 4. Symlink → warns and skips. + 5. Missing path (typo) → returns `Err(...)` whose `current_context()` is `DomainsLintError::PathNotFound`. + 6. Permission-denied path (`#[cfg(unix)]` only — use `chmod 000` on a tempfile) → returns `Err(DomainsLintError::PermissionDenied)`. + +- [ ] **Step 2: Verify failure.** + +- [ ] **Step 3: Implement `explicit_path_lines`** per the spec pseudocode. Policy filters use `warn_skip`; access failures return `Err`. Map `io::ErrorKind::NotFound` → `DomainsLintError::PathNotFound`, `io::ErrorKind::PermissionDenied` → `DomainsLintError::PermissionDenied`.
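
The hard-failure mapping in Step 3 of Task 4.4 might look roughly like the sketch below. It is an illustration under assumptions: the enum is a stand-in for the plan's `DomainsLintError` that models only the two variants named above, and both the `Io` catch-all and the `classify_access_error` name are hypothetical.

```rust
use std::io;

/// Stand-in for the plan's `DomainsLintError`; only the two variants named
/// in Step 3 are taken from the plan.
#[derive(Debug, PartialEq, Eq)]
enum DomainsLintError {
    PathNotFound,
    PermissionDenied,
    /// Hypothetical catch-all for any other I/O failure (an assumption).
    Io,
}

/// Maps the kind of an access failure onto a hard-error variant. Policy
/// skips (excluded extension, `node_modules/`, symlink) never reach this
/// function; they warn and continue instead.
fn classify_access_error(kind: io::ErrorKind) -> DomainsLintError {
    match kind {
        io::ErrorKind::NotFound => DomainsLintError::PathNotFound,
        io::ErrorKind::PermissionDenied => DomainsLintError::PermissionDenied,
        _ => DomainsLintError::Io,
    }
}
```

Keeping the classification in one small function makes the soft/hard split easy to test without touching the filesystem: the caller passes `err.kind()` from whatever `std::fs` call failed.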
+ +Signature: `pub(crate) fn explicit_path_lines(paths: &[PathBuf]) -> Result<Vec<DiffLine>, Report<DomainsLintError>>` + +- [ ] **Step 4: Verify pass.** + +- [ ] **Step 5: Commit.** + +### Task 4.5: `path_is_scanned` policy helper (TDD) + +- [ ] **Step 1: Write failing tests** for the extension and path-exclusion filter: + - `foo.rs` → scanned. + - `foo.html` → **scanned** (extension list now includes `.html`). + - `foo.css` → **scanned** (extension list now includes `.css`). + - `Dockerfile` → **scanned** (matched by exact basename). + - `Dockerfile.prod` → **scanned** (matched by `Dockerfile.*` pattern). + - `crates/trusted-server-core/src/integrations/nextjs/fixtures/inlined-data-escaped.html` → **NOT scanned** (publisher-fixture path exclusion — spec §"Always excluded (paths)"). + - `crates/trusted-server-core/src/integrations/google_tag_manager/fixtures/captured.html` → **NOT scanned** (same publisher-fixture rule, different integration). + - `crates/trusted-server-core/src/html_processor.test.html` → **scanned** (NOT under a `/fixtures/` directory; this is our own test fixture, not a publisher capture). + - `crates/js/lib/src/core/templates/iframe.html` → **scanned** (our own template). + - `node_modules/foo.js` → not scanned (path exclusion). + - `.worktrees/x/y.rs` → not scanned. + - `package-lock.json` → not scanned. + - `pnpm-lock.yaml` → not scanned (exact basename match). + - `Cargo.lock` → not scanned. + - `.env.dev` → scanned (matches `.env*`). + - `crates/integration-tests/fixtures/frameworks/nextjs/app/page.tsx` → scanned (proves the `**/fixtures/**` blanket exclusion was removed; only the narrow `crates/trusted-server-core/src/integrations/**/fixtures/**` path is excluded). + - `crates/integration-tests/fixtures/frameworks/nextjs/Dockerfile` → **scanned** (Dockerfile matched by basename; this fixture path is NOT the excluded publisher-capture path). + - `crates/integration-tests/fixtures/frameworks/wordpress/Dockerfile` → **scanned** (same reasoning).
+ - `crates/trusted-server-cli/src/dev/lint/domains.rs` → NOT scanned (self-exclude). + - **Markdown coverage (spec §"File extensions scanned" mandates `.md` is in scope):** + - `README.md` → scanned. + - `CHANGELOG.md` → scanned. + - `CONTRIBUTING.md` → scanned. + - `docs/guide/onboarding.md` → scanned. + - `docs/superpowers/specs/2026-05-18-check-domains-design.md` → scanned (spec itself is in scope). + - `foo.markdown` → NOT scanned (only `.md` is in the extension list, not `.markdown`). + - `foo.MD` → NOT scanned (case-sensitive extension match per Rust conventions; if a contributor uses uppercase, they get a warning at scan time, not a silent skip — document this as a known limitation if `.MD` files appear in real PRs). + +- [ ] **Step 2: Verify failure.** + +- [ ] **Step 3: Implement `path_is_scanned(rel_path: &[u8]) -> bool`** with the constants from spec §"File extensions scanned" and §"Always excluded (paths)". + +- [ ] **Step 4: Verify pass.** + +- [ ] **Step 5: Commit.** + +--- + +## Phase 5: CLI exit-code wiring + `dev lint domains` subcommand + +Spec §"CLI Surface" and §"Required change to existing CLI exit-code mapping". + +### Task 5.1: Extend `CliError` with `EnvironmentError` and `ViolationsFound` + +**Files:** + +- Modify: `crates/trusted-server-cli/src/error.rs` + +- [ ] **Step 1: Add the two variants** + +Add to the enum in `error.rs`: + +```rust + #[display("environment error")] + EnvironmentError, + #[display("found {count} disallowed host(s)")] + ViolationsFound { count: usize }, +``` + +- [ ] **Step 2: Update `lib.rs::run()` to map them** + +The existing implementation prints `format_report(&error)` for +EVERY error, then maps the exit code. That model collapses two +different user experiences: a real failure (`EnvironmentError`, +`Configuration`, etc.) deserves the error-stack dump, but +`ViolationsFound` and `Cancelled` should not — the violation +report itself is already on stdout (or JSON), and Cancelled is a +benign user signal. 
Printing `format_report` for `ViolationsFound` +would write the linter's normal output AND an error-stack message +on stderr, doubling the noise. + +Replace the existing `match` body in `run()` with: + +```rust +#[must_use] +pub fn run() -> ExitCode { + match execute() { + Ok(()) => ExitCode::SUCCESS, + Err(error) => match error.current_context() { + CliError::Cancelled => ExitCode::from(130), + CliError::ViolationsFound { .. } => ExitCode::from(1), + CliError::EnvironmentError => { + let _ = write_stderr_line(format_report(&error)); + ExitCode::from(2) + } + _ => { + let _ = write_stderr_line(format_report(&error)); + ExitCode::from(1) + } + } + } +} +``` + +Only the "real failure" branches print the error-stack report; +`ViolationsFound` and `Cancelled` exit silently (the violation +list and the cancellation are conveyed elsewhere). Matches the +spec's Output Format section, which shows the violation report +itself as the user-visible output. + +- [ ] **Step 3: Build and verify existing tests still pass** + +Run: `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')"` +Expected: PASS. + +- [ ] **Step 4: Commit** + +```bash +git add crates/trusted-server-cli/src/error.rs crates/trusted-server-cli/src/lib.rs +git commit -m "Add CliError::EnvironmentError and ViolationsFound; map exit codes + +Required by spec §'Required change to existing CLI exit-code mapping'. +run() now maps Cancelled -> 130, ViolationsFound -> 1, EnvironmentError +-> 2, everything else -> 1 (unchanged). Distinguishes 'found a real +violation' from 'could not even run the scan' in CI logs." 
+``` + +### Task 5.2: Add `DevCommand::Lint` and `LintCommand::Domains` clap surface + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/mod.rs` +- Modify: `crates/trusted-server-cli/src/dev/lint/mod.rs` + +- [ ] **Step 1: Add the nested clap types** + +In `dev/lint/mod.rs`: + +```rust +use std::path::PathBuf; + +use clap::{Args, Subcommand}; + +#[derive(Debug, Subcommand)] +pub enum LintCommand { + /// Lint URL hosts in source/config/docs. + Domains(DomainsArgs), +} + +#[derive(Debug, Args)] +pub struct DomainsArgs { + /// Pre-commit mode: scan only staged-added lines. + #[arg(long, conflicts_with_all = ["changed_vs", "paths"])] + pub staged: bool, + + /// CI/PR mode: scan only lines added relative to merge-base(<REF>, HEAD). + #[arg(long, value_name = "REF", conflicts_with_all = ["staged", "paths"])] + pub changed_vs: Option<String>, + + /// Explicit paths to scan (full file). Mutually exclusive with --staged / --changed-vs. + #[arg(value_name = "PATH", conflicts_with_all = ["staged", "changed_vs"])] + pub paths: Vec<PathBuf>, + + /// Output format. Default: human. + #[arg(long, value_enum, default_value = "human")] + pub format: OutputFormat, + + /// Verbose: print per-file scan progress on stderr (number of + /// lines scanned per file). Off by default; useful for + /// debugging "is this file being scanned at all". Has no + /// effect on exit code or violation count. + #[arg(long)] + pub verbose: bool, +} + +#[derive(Debug, Clone, Copy, clap::ValueEnum)] +pub enum OutputFormat { + Human, + Json, +} +``` + +In `dev/mod.rs`, extend `DevCommand`: + +```rust +pub enum DevCommand { + Serve(ServeArgs), + /// Linters for source/config/docs.
+ Lint { + #[command(subcommand)] + command: lint::LintCommand, + }, +} +``` + +- [ ] **Step 2: Wire dispatch in `lib.rs`** + +Update `run_dev`: + +```rust +fn run_dev(command: dev::DevCommand) -> Result<(), Report<CliError>> { + match command { + dev::DevCommand::Serve(args) => run_dev_serve(&args), + dev::DevCommand::Lint { command } => dev::lint::run(command), + } +} +``` + +In `dev/lint/mod.rs`, add: + +```rust +pub fn run(command: LintCommand) -> Result<(), error_stack::Report<crate::error::CliError>> { + match command { + LintCommand::Domains(args) => domains::run(args), + } +} +``` + +In `dev/lint/domains.rs`, add the entry-point function: + +```rust +pub fn run(args: crate::dev::lint::DomainsArgs) + -> Result<(), error_stack::Report<crate::error::CliError>> +{ + todo!("dispatch on mode (staged | changed_vs | paths | full-repo); \ + call the appropriate collector; scan each line; emit report; \ + return Err(ViolationsFound) on violations, Err(EnvironmentError) on env errors") +} +``` + +- [ ] **Step 3: Verify build and `--help` surfaces are correct** + +Run: `cargo run --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev lint --help` +Expected: lists `domains` as a subcommand. + +Run: `cargo run --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev lint domains --help` +Expected: lists `--staged`, `--changed-vs`, `--format`, `--verbose`, plus the trailing `[PATH]...` arg. + +- [ ] **Step 4: Commit** + +```bash +git add crates/trusted-server-cli/src/dev/ crates/trusted-server-cli/src/lib.rs +git commit -m "Wire ts dev lint domains clap surface and dispatch + +Adds DevCommand::Lint, LintCommand::Domains, DomainsArgs (with the +three mutually-exclusive mode selectors). Body of domains::run is a +todo! to be replaced in the next commit; this commit just lands +the CLI scaffolding so --help works end-to-end."
+``` + +### Task 5.3: Implement `domains::run` mode dispatch + reporting + +**Files:** + +- Modify: `crates/trusted-server-cli/src/dev/lint/domains.rs` + +- [ ] **Step 1: Implement `domains::run`** + +Replace the `todo!()` body with: + +```rust +pub fn run(args: crate::dev::lint::DomainsArgs) + -> Result<(), error_stack::Report<crate::error::CliError>> +{ + use error_stack::ResultExt; + use crate::error::CliError; + + let cwd = std::env::current_dir().change_context(CliError::EnvironmentError)?; + let lines: Vec<DiffLine> = if args.staged { + staged_added_lines(&cwd).change_context(CliError::EnvironmentError)? + } else if let Some(ref reference) = args.changed_vs { + changed_vs_added_lines(&cwd, reference).change_context(CliError::EnvironmentError)? + } else if !args.paths.is_empty() { + explicit_path_lines(&args.paths).change_context(CliError::EnvironmentError)? + } else { + full_repo_lines(&cwd).change_context(CliError::EnvironmentError)? + }; + + let mut violations: Vec<FileViolation> = Vec::new(); + let mut last_verbose_path: Option<std::path::PathBuf> = None; + let mut verbose_line_count: usize = 0; + for line in lines { + if args.verbose { + // Tally per-file line counts for the end-of-file summary.
+ match &last_verbose_path { + Some(prev) if prev == &line.path => verbose_line_count += 1, + _ => { + if let Some(prev) = last_verbose_path.take() { + crate::output::write_stderr_line(format!( + "scanned {} lines in {}", + verbose_line_count, prev.display() + ))?; + } + last_verbose_path = Some(line.path.clone()); + verbose_line_count = 1; + } + } + } + let outcome = scan_line(&line.content); + for unused in outcome.unused_suppressions { + crate::output::write_stderr_line(format!( + "warning: {}:{}: allow-domain marker listed `{}` but it suppresses no violation on this line", + line.path.display(), line.line_no, unused + ))?; + } + for v in outcome.violations { + violations.push(FileViolation { + path: line.path.clone(), + line: line.line_no, + host: v.host, + line_excerpt: line.content.clone(), + }); + } + } + if let Some(prev) = last_verbose_path { + // Flush the last file's tally. + crate::output::write_stderr_line(format!( + "scanned {} lines in {}", + verbose_line_count, prev.display() + ))?; + } + + match args.format { + crate::dev::lint::OutputFormat::Human => emit_human(&violations)?, + crate::dev::lint::OutputFormat::Json => emit_json(&violations)?, + } + + if violations.is_empty() { + Ok(()) + } else { + Err(error_stack::Report::new(CliError::ViolationsFound { + count: violations.len(), + })) + } +} + +#[derive(Debug, serde::Serialize)] +pub struct FileViolation { + pub path: std::path::PathBuf, + #[serde(rename = "line_no")] + pub line: usize, + pub host: String, + #[serde(rename = "line")] + pub line_excerpt: String, +} + +fn emit_human(violations: &[FileViolation]) + -> Result<(), error_stack::Report<crate::error::CliError>> +{ + use crate::output::write_stdout_line; + + for v in violations { + write_stdout_line(format!( + "{}:{}: disallowed host {}", + v.path.display(), v.line, v.host + ))?; + } + if !violations.is_empty() { + let files: std::collections::BTreeSet<_> = violations.iter().map(|v| &v.path).collect(); + write_stdout_line("")?; + write_stdout_line(format!( + "{}
disallowed host(s) found in {} file(s).", + violations.len(), + files.len() + ))?; + write_stdout_line( + "To allow a new integration proxy, add it to EXACT_HOSTS in \ + crates/trusted-server-cli/src/dev/lint/domains.rs." + )?; + write_stdout_line( + "To suppress one line (e.g., security tests), append \ + `// allow-domain: <host>` in a comment." + )?; + write_stdout_line("Run `ts dev lint domains` (no args) for a full-repo audit.")?; + } + Ok(()) +} + +fn emit_json(violations: &[FileViolation]) + -> Result<(), error_stack::Report<crate::error::CliError>> +{ + use crate::output::write_json; + + let files_affected: std::collections::BTreeSet<_> = + violations.iter().map(|v| &v.path).collect(); + let report = serde_json::json!({ + "violations": violations, + "count": violations.len(), + "files_affected": files_affected.len(), + }); + write_json(&report) +} +``` + +**No raw `println!` / `eprintln!` in production code.** The workspace +lints under `-D warnings` may not flag `println!` directly, but the +CLI's convention (see `crates/trusted-server-cli/src/config.rs`) is +to route all stdout through `crate::output::write_stdout_line` / +`write_json` and stderr through `write_stderr_line`. In +`domains::run` the return type is `Report<CliError>` so +`write_stderr_line(...)?` works directly. In the Phase 4 +collectors (which return `Report<DomainsLintError>`), use the +in-module `warn(msg)` helper instead — it wraps +`write_stderr_line` with `change_context(DomainsLintError::WriteWarning)` +so the `?` operator type-checks. + +- [ ] **Step 2: Verify the workspace builds** + +Run: `cargo check --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')"` +Expected: PASS.
+ +- [ ] **Step 3: Smoke-test in a throwaway tempdir, NOT the working repo** + +Building and running `ts dev lint domains --staged` directly in the +working checkout would (a) require staging a `https://test.com` +fixture file in this repo — easy to forget to revert — and (b) +report on the existing Stage 1 doc violations, drowning the +smoke-test output in noise. Use a throwaway tempdir instead: + +```sh +TMPREPO="$(mktemp -d)" +( cd "$TMPREPO" && git init -q && \ + git config user.name 'smoke' && git config user.email 'smoke@example.com' && \ + echo 'fn ok() {}' > ok.rs && git add ok.rs && git commit -q -m initial && \ + echo 'let bad = "https://test.com";' > bad.rs && git add bad.rs ) +TS_BIN="$(cargo build --quiet --package trusted-server-cli \ + --target "$(rustc -vV | sed -n 's/^host: //p')" \ + --message-format=json 2>/dev/null \ + | jq -r 'select(.executable != null and (.target.name == "ts")) | .executable' | tail -1)" +( cd "$TMPREPO" && "$TS_BIN" dev lint domains --staged ) ; rc=$? +echo "exit: $rc" +rm -rf "$TMPREPO" +``` + +Expected: prints `bad.rs:1: disallowed host test.com` (and the +summary lines) to stdout, then `exit: 1`. Clean exit code, no +artifacts left in the working repo. + +If `jq` is unavailable, run `ts dev lint domains --staged` from the +already-installed `ts` binary (post `cargo install_cli`) instead of +extracting the path from `cargo build --message-format=json`. + +- [ ] **Step 4: Commit** + +```bash +git add crates/trusted-server-cli/src/dev/lint/domains.rs +git commit -m "Implement domains::run mode dispatch + human/JSON reporting + +Routes --staged, --changed-vs, explicit paths, and full-repo to the +matching collector; scans each returned line via scan_line; emits a +human or JSON report; returns Err(ViolationsFound { count }) on +violations, Err(EnvironmentError) on collector failures. Exit codes +flow through the run() match arm added in the previous CliError +extension." 
+``` + +--- + +## Phase 6: `ts dev install-hooks` + +Spec §"Pre-commit hook", §"Hook installer (Rust subcommand)", and §"Persisting `core.hooksPath`". + +### Task 6.1: `shell_quote` helper (TDD) + +- [ ] **Step 1: Write failing tests** for: simple path, path with spaces, path with a single quote, path with `$`, path with backticks, path with backslashes. Each test asserts the output is wrappable by `bash -c ""` without misbehaving (verify via a temp bash invocation). + +- [ ] **Step 2: Verify failure.** + +- [ ] **Step 3: Implement** per the spec snippet (POSIX single-quote escaping). + +- [ ] **Step 4: Verify pass.** + +- [ ] **Step 5: Commit.** + +### Task 6.2: `render_hook` + `is_managed` (TDD) + +- [ ] **Step 1: Write failing tests:** + - `render_hook(Path::new("/Users/Alice Q/.cargo/bin/ts"))` produces a string containing `exec '/Users/Alice Q/.cargo/bin/ts' dev lint domains --staged` and the `# ts-install-hooks: managed` marker line. + - `is_managed` returns `true` on a file containing the marker line in its first 10 lines, `false` otherwise. + +- [ ] **Step 2: Verify failure.** + +- [ ] **Step 3: Implement** both functions per spec. + +- [ ] **Step 4: Verify pass.** + +- [ ] **Step 5: Commit.** + +### Task 6.3: `write_atomic` helper (TDD) + +- [ ] **Step 1: Write failing test:** in a tempdir, call `write_atomic(path, b"hello")`; assert `fs::read(path).expect("should read written file") == b"hello"`; assert no `path.tmp.*` file remains in the directory. **Do not use `.unwrap()`** — workspace clippy denies `unwrap_used`. + +- [ ] **Step 2: Verify failure.** + +- [ ] **Step 3: Implement:** write to `path.with_extension("tmp.{rand}")`, then `rename` to `path`. Use a small random suffix from `std::time::SystemTime` or `process::id()` to avoid collision on parallel installs. 
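
A minimal sketch of the write-then-rename approach from Step 3 of Task 6.3, using only the standard library. It makes two assumptions that differ from the wording above: the temp name is built by appending a `.tmp.<pid>-<nanos>` suffix to the file name (rather than `with_extension`, which would replace an existing extension), and errors are plain `io::Result` rather than the plan's `InstallHooksError`. `temp_sibling` is a hypothetical helper name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Builds a sibling temp path such as `pre-commit.tmp.<pid>-<nanos>`.
/// The suffix only needs to be unlikely to collide between parallel runs.
fn temp_sibling(path: &Path) -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos())
        .unwrap_or(0);
    let mut name = path
        .file_name()
        .map(|n| n.to_os_string())
        .unwrap_or_default();
    name.push(format!(".tmp.{}-{}", std::process::id(), nanos));
    path.with_file_name(name)
}

/// Writes `contents` to a temp file in the same directory, then renames it
/// over `path`, so readers never observe a half-written file.
fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = temp_sibling(path);
    fs::write(&tmp, contents)?;
    match fs::rename(&tmp, path) {
        Ok(()) => Ok(()),
        Err(err) => {
            // Best-effort cleanup so a failed install leaves no stray temp file.
            let _ = fs::remove_file(&tmp);
            Err(err)
        }
    }
}
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems is not a single-step replacement, whereas a same-directory rename is on POSIX systems.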
+ +- [ ] **Step 4: Verify pass.** + +- [ ] **Step 5: Commit.** + +### Task 6.4: `set_local_config_value` + `read_local_config_value` (production versions) + +- [ ] **Step 1: Lift the spike helpers from `tests/spike_gix_config_write.rs`** into `crates/trusted-server-cli/src/dev/install_hooks.rs` (new file). Adjust signatures to take `&gix::Repository` and return `error_stack::Report` per the spec sketch. + +- [ ] **Step 2: Define the `InstallHooksError` enum** with variants `OpenRepo`, `NoWorkdir`, `CurrentExe`, `WriteHook`, `ConfigWrite`, `WouldClobber { path }`, `ForeignHooksPath { current, proposed }`. + +- [ ] **Step 3: Write unit tests** for both helpers using a tempdir repo. Assert read returns `None` when unset, returns `Some(value)` after a write, and the on-disk `.git/config` contains a `[core]` section with `hooksPath` after the write. + +- [ ] **Step 4: Verify pass.** + +- [ ] **Step 5: Commit.** + +### Task 6.5: `install_hooks` main function with preflight + clobber detection (TDD) + +- [ ] **Step 1: Write failing end-to-end tests:** + - Fresh repo, no `.githooks/`, no `core.hooksPath`: `install_hooks(force=false)` writes the hook, sets `core.hooksPath = .githooks`, succeeds. + - Re-run on the same repo: idempotent, succeeds. + - Pre-existing `.githooks/pre-commit` with the managed marker: silently overwritten, succeeds. + - Pre-existing `.githooks/pre-commit` WITHOUT the marker: `install_hooks(force=false)` returns `Err(WouldClobber)`. + - Same as above with `force=true`: backs up to `.githooks/pre-commit.bak.`, succeeds. + - Pre-existing `core.hooksPath = hooks` (foreign): `install_hooks(force=false)` returns `Err(ForeignHooksPath)`. + - Same as above with `force=true`: succeeds, prints the displaced value with the restore command. + +- [ ] **Step 2: Verify failure.** + +- [ ] **Step 3: Implement `install_hooks`** per the spec pseudocode. 
+ +- [ ] **Step 4: Verify pass.** + +- [ ] **Step 5: Commit.** + +### Task 6.6: Wire `dev install-hooks` into the CLI + +- [ ] **Step 1: Add the clap variant** + +In `dev/mod.rs`: + +```rust +pub enum DevCommand { + Serve(ServeArgs), + Lint { #[command(subcommand)] command: lint::LintCommand }, + /// Install the pre-commit hook into this repo (one-time setup). + InstallHooks(InstallHooksArgs), +} + +#[derive(Debug, Args)] +pub struct InstallHooksArgs { + /// Overwrite an existing unmanaged hook or non-default core.hooksPath. + #[arg(long)] + pub force: bool, +} +``` + +- [ ] **Step 2: Wire dispatch in `lib.rs`** + +Add to `run_dev`: + +```rust +dev::DevCommand::InstallHooks(args) => dev::install_hooks::run(&args), +``` + +- [ ] **Step 3: Add `install_hooks::run` wrapper** that maps `InstallHooksError` → `CliError` (`ForeignHooksPath` and `WouldClobber` map to `CliError::EnvironmentError`; other variants map to `CliError::EnvironmentError` too — every install-hooks failure is by definition an env-config issue). + +- [ ] **Step 4: Verify build and `--help`** + +Run: `cargo run --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" -- dev install-hooks --help` +Expected: shows `--force`. + +- [ ] **Step 5: Smoke-test in a tempdir repo end-to-end** + +Run: + +```sh +mkdir -p /tmp/ts-install-hooks-smoke && cd /tmp/ts-install-hooks-smoke +git init +ts dev install-hooks +test -x .githooks/pre-commit && grep -q 'ts-install-hooks: managed' .githooks/pre-commit +grep -A1 'hooksPath' .git/config +``` + +Expected: hook file exists, is executable, contains the +`# ts-install-hooks: managed` marker; `.git/config` shows +`hooksPath = .githooks` under `[core]`. (`git init` is intentional — +`gix` is a Rust crate dependency, not a shell command the +contributor can rely on having installed.) + +- [ ] **Step 6: Commit.** + +--- + +## Phase 7: End-to-end CLI tests via `assert_cmd` + +Spec §"Testing Strategy" enumerates 47 cases. 
Phases 3, 4, and 6 covered the unit-level cases. This phase covers the remaining `assert_cmd` end-to-end cases — those that exercise the binary as a whole. + +### Task 7.1: Add `assert_cmd` and `predicates` dev-dependencies + +- [ ] **Step 1: Add to `[dev-dependencies]` in `crates/trusted-server-cli/Cargo.toml`:** + +```toml +assert_cmd = "2" +predicates = "3" +``` + +- [ ] **Step 2: Commit.** + +### Task 7.2: End-to-end tests for `--staged` mode (spec cases 21–26) + +- [ ] Implement each case as a `#[test]` in `crates/trusted-server-cli/tests/lint_domains_cli.rs`. Each test builds a tempdir repo, invokes `Command::cargo_bin("ts").args(["dev", "lint", "domains", "--staged"]).current_dir(&tempdir)`, asserts on exit code + stdout + stderr. + +- [ ] Each case gets its own task step: write failing test → verify failure → confirm production code already passes it → commit. + +- [ ] **Spec case 25 (non-UTF-8 staged path) requires an explicit stderr assertion** in addition to the exit-code and stdout checks. The inline Task 4.1 test proves the path is not skipped; the Phase 7 E2E test must additionally assert that stderr contains the lossy-path warning string (`"staged path is not valid UTF-8; displaying lossy:"` or whatever exact phrasing Task 4.1's implementation lands on). Example assertion using `predicates`: + + ```rust + use predicates::prelude::*; + // ... build a tempdir repo, stage a file with a 0xff byte in the + // name containing https://test.com ... + Command::cargo_bin("ts") + .expect("should find ts binary") + .args(["dev", "lint", "domains", "--staged"]) + .current_dir(&tempdir) + .assert() + .code(1) + .stdout(predicate::str::contains("disallowed host test.com")) + .stderr(predicate::str::contains("not valid UTF-8")); + ``` + + This locks the staged non-UTF-8 reporting contract at the E2E layer so a future refactor cannot silently start skipping these paths. 
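For the non-UTF-8 fixture in spec case 25, the file name has to be built from raw bytes, because a Rust `String` cannot hold a `0xff` byte. The sketch below is a minimal, Unix-only illustration of that one step using only the standard library. The helper names `non_utf8_file_name` and `lossy_display` are hypothetical and do not come from the plan or the spec, and some filesystems may reject such names, so the real test may need additional gating.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::OsStringExt; // Unix-only: build an OsString from raw bytes

/// Hypothetical helper: a file name containing an invalid UTF-8 byte (0xff).
fn non_utf8_file_name() -> OsString {
    OsString::from_vec(b"bad\xff.rs".to_vec())
}

/// Hypothetical helper: the lossy form that a warning message could display.
fn lossy_display(name: &OsStr) -> String {
    name.to_string_lossy().into_owned()
}

fn main() {
    let name = non_utf8_file_name();
    // Strict conversion fails because the name is not valid UTF-8...
    assert!(name.to_str().is_none());
    // ...while lossy conversion substitutes U+FFFD for the offending byte.
    println!("{}", lossy_display(&name));
}
```

A fixture built this way can be joined onto the tempdir path with `Path::join` before the file is written and staged.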
+ +### Task 7.3: End-to-end tests for `--changed-vs` mode (spec cases 27–29) + +- [ ] Same pattern as 7.2, with two-commit branch fixtures. + +### Task 7.4: End-to-end tests for path-exclusion (spec cases 30–34) and markdown (35–43) + +- [ ] Same pattern. Markdown cases use `.md` fixtures with the various forms (allowed/disallowed link, autolink, HTML comment suppression, fenced block, reference list, image link). + +### Task 7.5: End-to-end environment cases (spec 44–47) + +- [ ] Test 44: run outside a git repo → exit 2 with `EnvironmentError`. +- [ ] Test 45: bare repo → exit 2. +- [ ] Test 46: run under `env -i PATH=""` → still works (proves no `git` binary needed). Gate this test with `#[cfg(unix)]` so non-Unix CI lanes skip it. +- [ ] Test 47: run the full test suite via `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')"` (already covered by the host-target CI lane introduced in PR #669). + +- [ ] Final commit for Phase 7. + +--- + +## Phase 8: Documentation + +### Task 8.1: Update `CONTRIBUTING.md` with the install steps + +- [ ] **Step 1: Add a "Local setup" subsection** documenting: + +````markdown +### Pre-commit URL-host linter (`ts dev lint domains`) + +One-time setup after cloning: + +```bash +cargo install_cli # builds and installs the `ts` binary +ts dev install-hooks # installs the pre-commit hook into .githooks/ +``` + +After that, every `git commit` runs the linter against staged +changes. If you have an existing `core.hooksPath` (husky, +lefthook, etc.), `ts dev install-hooks` refuses to overwrite it +without `--force`. See `docs/superpowers/specs/2026-05-18-check-domains-design.md` +for the full design. + +To bypass the hook for a single commit: `git commit --no-verify`. + +```` + +- [ ] **Step 2: Commit.** + +### Task 8.2: Update `README.md` with a brief mention + +- [ ] **Step 1: Under any "Development" section in the project README**, add a one-line mention pointing at `CONTRIBUTING.md` for the linter setup.
+ +- [ ] **Step 2: Commit.** + +--- + +## Phase 9: Final verification + +### Task 9.1: Run all CI gates locally + +CLAUDE.md splits clippy and test into separate wasm-runtime and +host-target CLI lanes (per PR #669's CI changes). Use the split +commands; **do NOT use the older single `cargo clippy --workspace` +form** — it doesn't match what CI runs and will give a misleading +green when the host-target CLI has warnings. + +- [ ] `cargo fmt --all -- --check` → PASS +- [ ] `cargo clippy --workspace --exclude trusted-server-cli --all-targets --all-features -- -D warnings` → PASS (wasm-runtime lane) +- [ ] `cargo clippy --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')" --all-targets -- -D warnings` → PASS (host-target CLI lane) +- [ ] `cargo test --workspace --exclude trusted-server-cli` → PASS (wasm-runtime lane) +- [ ] `cargo test --package trusted-server-cli --target "$(rustc -vV | sed -n 's/^host: //p')"` → PASS (host-target lane, including the new lint module + spike + end-to-end tests) +- [ ] `cd crates/js/lib && npx vitest run` → PASS (unchanged) +- [ ] `cd crates/js/lib && npm run format` → PASS (unchanged) +- [ ] `cd docs && npm run format` → PASS (no doc changes that would fail formatting) + +### Task 9.2: Self-dogfood the linter + +**Exit-code expectations.** The linter is designed to find existing +violations in this repo (the Stage 1 cleanup target). Both commands +below are **expected to exit `1`** — this is not a failure of the +linter, it is the linter doing its job. Do not abort the +verification step on a non-zero exit here. The commands below are +written defensively for `set -e` / `pipefail` shells. + +- [ ] **Step 1: Run `ts dev lint domains` against this very branch** + +Run: + +```sh +ts dev lint domains || rc=$? +echo "exit code: ${rc:-0}" +```` + +Expected: a list of existing violations on stdout, and `exit code: 1` printed at the end. 
**`exit 1` is the success condition for this step.** The output should look reasonable (well-formed `path:line:` lines). The violations themselves go into the Stage 1 Doc Cleanup Plan, not into this PR. + +- [ ] **Step 2: Run the frequency report from the spec** + +The JSON pipeline below uses `|| true` on the linter so the pipe +doesn't abort under `set -e` / `pipefail` when the linter exits 1 +(by design — see Step 1). + +```sh +(ts dev lint domains --format json || true) \ + | jq -r '.violations[].host' \ + | sort | uniq -c | sort -rn | head -30 +``` + +Expected: a host-frequency table, top entries first. File the top entries into the Stage 1 Doc Cleanup Plan as a follow-up issue. + +If `jq` is not installed, use the python3 alternative from spec §"Stage 1 Doc Cleanup Plan" — same `(... || true) | …` wrapping applies. + +### Task 9.3: Push and open the PR + +- [ ] **Step 1: Push the branch** + +```bash +git push -u origin feature/check-domains-spec +``` + +- [ ] **Step 2: Open the PR** with a title like "Add `ts dev lint domains` and `ts dev install-hooks`" and a body summarizing: + - What it does (one paragraph) + - Link to the design doc + - Test plan checklist (the items from Task 9.1 + a manual `ts dev install-hooks` smoke test in a tempdir) + - Note that the Stage 1 doc cleanup is a separate follow-up workstream + +--- + +## Notes for the implementer + +- Each phase's spec references are intentional — open the spec for the relevant section before writing code. The spec contains _why_ in places where the plan only has _what_. +- The Phase 2 spike is the riskiest part. If it fails — e.g., the chosen `gix` version doesn't expose a stable tree-vs-tree diff entry point — stop and re-pin against a different release before proceeding. The downstream phases all depend on those API choices. +- `error-stack` usage follows the existing crate convention: `Report` at the boundary, `change_context()` to map module-level errors. 
See PR #669's `config.rs` / `audit.rs` for examples. +- Commit early and often. Each task step that says "commit" is a real commit; don't batch. +- If a step's "expected" output doesn't match what you see, STOP. Don't ratchet through the failure — investigate and either fix the implementation or update the plan with a note about what the spec/spike missed. diff --git a/docs/superpowers/specs/2026-05-18-check-domains-design.md b/docs/superpowers/specs/2026-05-18-check-domains-design.md new file mode 100644 index 00000000..f72b74f0 --- /dev/null +++ b/docs/superpowers/specs/2026-05-18-check-domains-design.md @@ -0,0 +1,1934 @@ +# `ts dev lint domains` — Design + +**Date:** 2026-05-18 +**Status:** Draft (revised after third review — pivoted to Rust / `ts` CLI) + +## Goal + +Fail commits that introduce new **URL hosts** (extracted from `http(s)://` +and protocol-relative `//host/` URLs) that are not on an explicit +allowlist, across source, config, and documentation files. Catches +accidental test-pollution domains (e.g., `test.com`, `partner.com`, +`new.com`) and hardcoded third-party endpoints that have not been vetted +as integration proxies. + +Enforces the rule: **production code, tests, and config may only reference +`example.com` (and its subdomains), loopback addresses, an explicit list of +integration-proxy endpoints, or a small set of reference/doc-link hosts.** + +The term **URL host** (not "domain") is used throughout because the linter +only inspects the host portion of an extracted URL. Bare hostnames written +as plain strings (e.g., `cookie_domain = "test-publisher.com"`, +`exclude_domains = ["foo.com"]`) are **not** detected. + +## Prerequisite + +This design **depends on PR #669** (`Add the Trusted Server CLI`, branch +`feature/ts-cli`). PR #669 introduces the `crates/trusted-server-cli` +crate, the `ts` binary, the `cargo install_cli` alias, the host-target +CI lane, and the clap command-surface conventions this design extends. 
+ +**Required base for any implementation work:** a branch whose ancestry +contains PR #669. Two acceptable bases: + +- `main`, after #669 has merged, **or** +- `origin/feature/ts-cli` directly (stacked on PR #669's branch), with + a rebase onto `main` once #669 merges. + +A plain `main` checkout that _predates_ #669's merge cannot host this +implementation — the CLI surface this design extends does not exist +there. See [Implementation Readiness](#implementation-readiness) for +the full start-condition checklist. + +## Implementation Readiness + +**Status today: ready to start _only on a branch stacked on PR #669_.** +A plain `main` checkout has no `crates/trusted-server-cli`, no `ts` +binary, no `cargo install_cli` alias, and no host-target CI lane — +starting there would force the implementer to reinvent or duplicate +PR #669's surface. Implementation must happen on a branch whose base +includes #669. + +**Two acceptable execution paths:** + +1. **Wait for #669 to merge to `main`.** Then start implementation on + a branch off `main`. Simplest history; lowest coordination cost. +2. **Stack on `origin/feature/ts-cli` (PR #669's branch) now.** + Create the implementation branch off `feature/ts-cli`. The branch + carries PR #669's commits as ancestors; once #669 merges, rebase + onto `main` (the rebase is a no-op for the ancestors). Faster to + start; requires re-syncing if #669 force-pushes. + +**Start conditions** (all must be true on whichever base is chosen): + +1. `crates/trusted-server-cli` exists at the branch base — verify + with `ls crates/trusted-server-cli/src/`. +2. This PR owns the `ts dev` subcommand-group refactor: today's + `ts dev` leaf becomes `ts dev serve`, and the same PR adds + `ts dev lint domains` and `ts dev install-hooks`. Do not defer + this refactor to a later cleanup PR — without it, the command + surface described here does not exist. +3. 
The chosen `gix` + `gix-config` version pair resolves against the + workspace's transitive dep graph without forcing duplicates + (verify with `cargo tree -p gix -p gix-config`). + +**Suggested first-implementation order** (front-loads the riskiest +API assumptions, matches reviewer guidance): + +1. **Spike — gix feasibility — DONE.** Completed in Phase 2. + `gix = 0.83` + `gix-config = 0.56` pinned; three integration + tests (`crates/trusted-server-cli/tests/spike_gix_*.rs`) prove + staged blob diff, merge-base + tree diff, and durable + `core.hooksPath` write — all gix-only, no subprocess. The + resolved entry points are recorded in + [Resolved by the Phase 2 spike](#resolved-by-the-phase-2-spike). +2. **URL extraction + allowlist + suppression.** Pure-function + layer, fully unit-testable without `gix`. Implement against the + regex / allowlist / marker grammar in this spec; cover every + test case enumerated in [Testing Strategy](#testing-strategy) + that does not require git. +3. **CLI wiring.** Add the `Commands::Dev` subcommand-group + skeleton (preserving the existing `serve` subcommand wholesale), + then add `dev lint domains` dispatching to the function from + step 2 plus the diff collectors from step 1. +4. **`dev install-hooks`.** Wires steps 1 and 2 together for the + config write + hook file write + shell-escape path. +5. **End-to-end `assert_cmd` tests** matching `Testing Strategy`. +6. **Stage 1 doc cleanup** (separate PR series — see + [Stage 1 Doc Cleanup Plan](#stage-1-doc-cleanup-plan)). + +If start conditions aren't satisfied when this design is up for +implementation, the answer is "wait for #669," not "build a parallel +CLI surface." + +## Non-Goals + +- No CI gate in v1. The pre-commit hook is the only enforcement mechanism. + See [Migration to CI](#migration-to-ci). +- No baseline file. Existing violations are tolerated; the linter is scoped + to new lines. +- No autofix. 
+- No detection of bare hostnames without an `http(s)://` or `//` prefix. +- **Publisher-capture HTML fixtures are excluded by path** — + specifically the `crates/trusted-server-core/src/integrations/**/fixtures/**` + tree, which contains real-world captured publisher pages used as + test fixtures for the HTML processor. Those files have hundreds + of legitimate third-party URLs (Facebook, typekit, ad networks) + that cannot reasonably be allowlisted; trying would either + drown the linter in noise or force a giant allowlist that + defeats its review purpose. **Other HTML, CSS, and Dockerfile + files are scanned** (see [File extensions scanned](#file-extensions-scanned)). + +## CLI Surface + +A new top-level subcommand on the `ts` CLI: + +``` +ts dev lint domains [--staged | --changed-vs | ...] + [--format human|json] [--verbose] +``` + +Modes (mutually exclusive): + +| Invocation | Behavior | +| ---------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `ts dev lint domains` | Full-repo audit. Walks tracked files matching the extension filter and scans every line. **Diagnostic only in Stage 1.** | +| `ts dev lint domains --staged` | Pre-commit mode. Scans only added lines in `git diff --cached`. Existing violations not reported. | +| `ts dev lint domains --changed-vs ` | CI/PR mode (Stage 2). Scans only added lines in the diff **equivalent to** `git diff $(git merge-base HEAD)..HEAD` — computed via gitoxide, not by shelling out. | +| `ts dev lint domains path/...` | Scans the listed files in full. | + +Output format defaults to `human`. `--format json` emits a structured +report (see [Output Format](#output-format)). + +Exit codes: `0` no violations; `1` violations found; `2` usage or +environment error. 
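The mutually exclusive modes in the table above can be sketched as a small resolver. This is only an illustration under stated assumptions: `resolve_mode` and its parameters are hypothetical names, and the production CLI would more likely express the same constraint through its argument parser. The sketch shows that at most one selector may be present, that no selector means a full-repo audit, and that a conflicting combination is a usage error (the exit code `2` class).

```rust
use std::path::PathBuf;

/// Which set of lines the linter scans; one variant per table row.
#[derive(Debug, PartialEq)]
enum LintMode {
    Staged,
    ChangedVs(String),
    Paths(Vec<PathBuf>),
    FullRepo,
}

/// Hypothetical helper: turn raw flag values into exactly one mode.
/// Returns `Err` when two or more selectors are given (a usage error).
fn resolve_mode(
    staged: bool,
    changed_vs: Option<String>,
    paths: Vec<PathBuf>,
) -> Result<LintMode, String> {
    let selected = usize::from(staged)
        + usize::from(changed_vs.is_some())
        + usize::from(!paths.is_empty());
    if selected > 1 {
        return Err("--staged, --changed-vs, and explicit paths are mutually exclusive".to_string());
    }
    if staged {
        return Ok(LintMode::Staged);
    }
    if let Some(base) = changed_vs {
        return Ok(LintMode::ChangedVs(base));
    }
    if !paths.is_empty() {
        return Ok(LintMode::Paths(paths));
    }
    // No selector at all: audit the whole repository.
    Ok(LintMode::FullRepo)
}

fn main() {
    println!("{:?}", resolve_mode(false, None, vec![]));
    println!("{:?}", resolve_mode(true, Some("main".to_string()), vec![]));
}
```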
+ +**Required change to existing CLI exit-code mapping.** PR #669's +`crates/trusted-server-cli/src/lib.rs::run()` currently maps every +non-`CliError::Cancelled` error to `ExitCode::from(1)`. That collapses +the violation-vs-environment-error distinction this contract requires +— in CI, a failed git open and a real violation would be +indistinguishable. + +**This PR therefore must extend the existing `CliError` and `run()`:** + +1. Add a `CliError::EnvironmentError` variant (name TBD; could be + `EnvIo` or similar to match the crate's existing naming) that + carries the underlying `Report` as context. +2. The lint module wraps env-class errors (gix open fails, no git + repo, missing base ref, no working tree, gix-config write fails, + filesystem permission errors at install-hooks time) as + `CliError::EnvironmentError`. +3. When the scan finds violations, the lint module **returns + `Err(CliError::ViolationsFound { count })`**. This is a + semantically-meaningful "error" — it carries the violation count + for the message and surfaces through the same `run()` dispatch + that maps `CliError::Cancelled` to exit 130. Pick one model: in + this spec, violations propagate as `Err`, not `Ok(())`. The + match arm in step 4 is what distinguishes a "violations found" + exit from an environment-error exit. +4. `lib.rs::run()` pattern-matches: + + ```rust + match execute() { + Ok(()) => ExitCode::SUCCESS, + Err(error) => match error.current_context() { + CliError::Cancelled => ExitCode::from(130), + CliError::ViolationsFound { .. } => ExitCode::from(1), + CliError::EnvironmentError => ExitCode::from(2), + // … all other existing variants map to 1 unchanged + _ => ExitCode::from(1), + }, + } + ``` + +The two new variants and the dispatch arm are part of this PR's +scope, not a follow-up. The sketch function signature shown later in +this spec — `fn run(...) 
-> Result>` +— is illustrative; the production shape returns +`Result<(), Report>` matching the existing convention, +with the exit code emerging from the `current_context()` match +above. + +### Why `ts dev` as the parent? + +`lint domains` and `install-hooks` are developer-workflow commands — +they only matter when working on the codebase, not when operating a +deployed Trusted Server. Grouping them under `dev` keeps the +top-level `ts` surface focused on operator concerns (`config`, +`auth`, `audit`, `provision`) and gives developer tooling a natural +home for future additions (`ts dev lint deps`, `ts dev format`, +`ts dev check`, etc.). + +Within `dev`, `lint` is itself a subcommand group (so future lints +slot in as `ts dev lint `). + +## Crate Layout + +PR #669 ships `ts dev` as a single-file leaf command +(`crates/trusted-server-cli/src/dev.rs`, ~161 lines) that starts the +local Fastly dev server. To host nested subcommands, that file is +converted into a module directory: + +``` +crates/trusted-server-cli/src/ + lib.rs # add Commands::Dev(DevArgs) variant + # if not already present; dispatch + # to dev::run + dev/ + mod.rs # Dev subcommand enum + dispatch. + # Includes the existing dev-server + # behavior as `ts dev serve` so + # the PR #669 functionality is + # preserved under the new group. + serve.rs # the existing dev.rs body moved + # under `ts dev serve` + install_hooks.rs # `ts dev install-hooks` + lint/ + mod.rs # Lint subsubcommand enum + dispatch + domains.rs # this design's implementation +``` + +Existing code touched: + +- `crates/trusted-server-cli/src/lib.rs` — extend the existing + `Commands::Dev` variant so it owns a nested `DevCommand` enum + (subcommands: `Serve`, `Lint(LintCommand)`, `InstallHooks(...)`). +- `crates/trusted-server-cli/src/dev.rs` → split into the directory + above. The existing dev-server function moves into `dev/serve.rs` + with its public API unchanged. 
**This PR must make the CLI-surface + change**: today's `ts dev` becomes `ts dev serve`. This is not a + follow-up task; `ts dev lint domains` and `ts dev install-hooks` + cannot be added cleanly while `ts dev` remains a leaf command. + + **`ts dev serve` must preserve every flag and behavior of today's + `ts dev` leaf**, byte-for-byte from a user's perspective: + + | Existing `ts dev` flag | `ts dev serve` requirement | + | ------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- | + | `--adapter / -a` (default `fastly`) | Same default, same enum | + | `--config` (`Option`) | Preserved unchanged | + | `--env` (default `local`) | Preserved unchanged | + | Trailing `passthrough` args (`trailing_var_arg = true`, `allow_hyphen_values = true`) | Preserved unchanged — the `serve` subcommand still forwards everything after the recognized flags to the underlying runner | + + In other words: any shell invocation that works today as + `ts dev --adapter=fastly --config=... --env=local -- --extra ...` + must work tomorrow as `ts dev serve --adapter=fastly +--config=... --env=local -- --extra ...` with identical effect. + The refactor is a structural rename, not a behavior change. + Verification: an end-to-end test asserts that + `ts dev serve --help` lists the same flags as today's + `ts dev --help`, and that trailing-arg passthrough still reaches + the runner. + +- `crates/trusted-server-cli/src/error.rs` — add `LintError` and + `InstallHooksError` variants if needed for typed propagation, + otherwise reuse the crate's existing `Report` plumbing. + +No changes to `trusted-server-core` or `trusted-server-adapter-fastly`. 
+ +## Allowlist (Rust constants) + +Three arrays as `const &[&str]` at module top of `dev/lint/domains.rs`: +`EXACT_HOSTS` (integration proxies + loopback), `SUBDOMAIN_HOSTS` +(allow `*.host`), and `REFERENCE_HOSTS` (well-known doc/spec +sources, exact-match, allowed everywhere). The split keeps the +security review for each group focused: integration-proxy additions +need vendor justification; reference-host additions just need "is this +a legitimate documentation source we link to repeatedly?" + +### Exact-match hosts (`EXACT_HOSTS`) + +Integration proxies and loopback. Subdomains are **not** allowed +(e.g., `anything.api.privacy-center.org` is disallowed). + +| Category | Hosts | +| ---------------------------------------------------- | ------------------------------------------------------------------------------ | +| Loopback | `127.0.0.1`, `::1`, `localhost` | +| Integration proxies (didomi) | `api.privacy-center.org`, `sdk.privacy-center.org` | +| Integration proxies (sourcepoint) | `cdn.privacy-mgmt.com` | +| Integration proxies (lockr) | `aim.loc.kr`, `identity.loc.kr` | +| Integration proxies (datadome) | `js.datadome.co`, `api-js.datadome.co` | +| Integration proxies (aps / Amazon) | `aax.amazon-adsystem.com`, `aax-events.amazon-adsystem.com` | +| Integration proxies (permutive) | `api.permutive.com`, `secure-signals.permutive.app`, `cdn.permutive.com` | +| Integration proxies (Google Tag Manager / Analytics) | `www.googletagmanager.com`, `www.google-analytics.com`, `analytics.google.com` | +| Integration proxies (adserver mock) | `securepubads.g.doubleclick.net`, `origin-mocktioneer.cdintel.com` | +| Integration proxies (Prebid CDN) | `cdn.prebid.org` | +| Integration proxies (Fastly platform) | `api.fastly.com` | + +### Subdomain-permitting hosts (`SUBDOMAIN_HOSTS`) + +The host equals one of these **or** ends with `.` + one of these. 
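As a rough sketch of the subdomain rule just stated, assuming the host has already been extracted from the URL: `matches_subdomain_list` is a hypothetical helper name, not the function used in `domains.rs`. The point it illustrates is that the suffix check must include the leading dot, otherwise look-alike hosts would slip through.

```rust
/// Hypothetical helper: true if `host` equals an entry or is a subdomain of one.
fn matches_subdomain_list(host: &str, list: &[&str]) -> bool {
    let host = host.to_ascii_lowercase();
    list.iter().any(|entry| {
        // Requiring the leading dot is what rejects look-alikes such as
        // `notexample.com` and `example.com.evil.com`.
        host == *entry || host.ends_with(&format!(".{entry}"))
    })
}

fn main() {
    let list = ["example.com", "edge.permutive.app"];
    for host in ["example.com", "a.b.example.com", "example.com.evil.com"] {
        println!("{host}: {}", matches_subdomain_list(host, &list));
    }
}
```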
+ +| Host | Allows | Why subdomain matching | +| -------------------- | --------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `example.com` | `example.com`, `foo.example.com`, `a.b.example.com` | IANA RFC 2606 reserved; arbitrary subdomains expected in test fixtures and docs | +| `example.net` | `example.net`, `assets.example.net`, etc. | IANA RFC 2606 reserved; appears in real docs (`https://assets.example.net`) | +| `example.org` | `example.org`, `*.example.org` | IANA RFC 2606 reserved | +| `edge.permutive.app` | `edge.permutive.app`, `.edge.permutive.app` | Permutive constructs the host as `{organization_id}.edge.permutive.app` at runtime (see `crates/trusted-server-core/src/integrations/permutive.rs:93`); subdomains are vendor-controlled per customer | + +### Reference / doc hosts (`REFERENCE_HOSTS`) + +Exact-match. Allowed in every scanned file (no docs-vs-code split). +These are well-known documentation and spec sources that appear as +markdown link targets, `///` doc-comment URLs, `#` config comments, +etc. + +**The table below is the seed list curated from a sampling of current +`.md` files. It is expected to be incomplete on first pass.** The +Stage 1 cleanup workstream (see +[Stage 1 Doc Cleanup Plan](#stage-1-doc-cleanup-plan)) drives the +actual final list by running the full-repo audit, sorting hosts by +frequency, and triaging each into one of: add to `REFERENCE_HOSTS`, +add to integration `EXACT_HOSTS`, rewrite to a reserved host, or +suppress per-line. 
+ +| Category | Hosts | +| ----------------------- | ----------------------------------------------------------------------------------------------- | +| Git / GitHub | `github.com`, `docs.github.com`, `help.github.com`, `token.actions.githubusercontent.com` | +| Git commit conventions | `chris.beams.io` | +| Rust | `docs.rs`, `doc.rust-lang.org`, `crates.io` | +| Web / W3C standards | `www.w3.org`, `schema.org` | +| Versioning / changelogs | `semver.org`, `keepachangelog.com` | +| IAB Tech Lab | `iab.com`, `iabtechlab.com`, `iabtechlab.github.io`, `iabeurope.github.io` | +| Specs (supply chain) | `in-toto.io`, `rslstandard.org` | +| Specs (other) | `webassembly.org` | +| Fastly docs | `www.fastly.com`, `developer.fastly.com`, `manage.fastly.com` | +| Cloudflare docs | `developers.cloudflare.com` | +| Vendor docs | `docs.datadome.co`, `docs.prebid.org` | +| Tooling docs | `vitepress.dev`, `playwright.dev`, `testcontainers.com`, `grafana.com`, `docsearch.algolia.com` | + +One-off references not on this list (e.g., a single arxiv.org link in +a security spec) should use the per-line suppression marker — +inflating `REFERENCE_HOSTS` with single-use entries defeats its review +purpose. + +### IANA-reserved TLD rule + +Any host ending in `.example`, `.test`, `.invalid`, or `.localhost` +is allowed (IANA RFC 2606 reserves these TLDs for documentation, +testing, and special use). Hard-coded suffix check, not list entries. + +### Matching summary + +| Host | Allowed? 
| +| ----------------------------------- | ------------------------------------------ | +| `example.com` | yes (subdomain-list) | +| `foo.example.com` | yes (subdomain-list) | +| `assets.example.net` | yes (subdomain-list) | +| `example.com.evil.com` | **no** (not a subdomain of `example.com`) | +| `api.fastly.com` | yes (exact) | +| `v2.api.fastly.com` | **no** (exact-only) | +| `developer.fastly.com` | yes (reference) | +| `testlight.example` | yes (reserved TLD rule) | +| `something.test` | yes (reserved TLD rule) | +| `127.0.0.1` | yes (exact) | +| `192.168.1.1` | **no** (RFC 1918 private IP, not loopback) | +| `1.2.3.4` | no | +| `[::1]` → `::1` after bracket strip | yes (exact) | + +Matching is case-insensitive on the host after lowercasing. + +### Allowlist Maintenance Policy + +All three arrays are security-relevant artifacts. Different bars +apply: + +**`EXACT_HOSTS` (integration proxies + loopback):** + +1. **Vendor + integration**: must correspond to a named integration + in the registry. No personal preferences, no test domains, no + speculative entries. +2. **Justification in a `//`-comment** above the entry, naming the + integration and role (e.g., `// didomi: config endpoint`). +3. **Narrowest workable host**: prefer the subdomain + (`api.privacy-center.org`) over the apex (`privacy-center.org`). +4. **Exact by default**: only move to `SUBDOMAIN_HOSTS` when the + vendor uses multiple subdomains in real traffic and we accept + trusting all of them. + +**`SUBDOMAIN_HOSTS`:** + +1. Same vendor-justification bar as `EXACT_HOSTS`. +2. **Plus** an explicit comment naming _why_ subdomain matching is + needed (runtime host construction, vendor-controlled subdomain + sharding, etc.). + +**`REFERENCE_HOSTS`:** + +1. Host must be a **legitimate documentation or specification source** + that we link to in multiple places. One-off references use + per-line suppression instead — inflating `REFERENCE_HOSTS` with + single-use entries defeats its review purpose. +2. 
**Justification in a `//`-comment** naming the category + (e.g., `// IAB Tech Lab spec source`). + +Changes to any array must be reviewed as part of the PR. + +### Per-Line Suppression + +Some legitimate uses are not part of any integration — most notably +security tests using attacker-controlled placeholders. Real example: +`crates/trusted-server-core/src/integrations/google_tag_manager.rs:838` +contains `"https://evil.com/?redirect=https://www.google-analytics.com/collect"`. + +The linter recognizes a **comment-anchored, host-named** marker: + +```rust +let attacker = "https://evil.com/path"; // allow-domain: evil.com +``` + +```toml +upstream = "https://evil.com" # allow-domain: evil.com +``` + +```html + +``` + +**Marker grammar (Rust regex):** + +``` +(?im)(?:^|\s)(?://|\#||$) +``` + +- The comment introducer (`//`, `#`, `` inside a ` ```bash ` fence is +displayed to readers as a literal HTML comment in their shell +example — confusing and misleading. The linter's marker regex +accepts several comment introducers; pick the one that matches the +fenced block's language: + +| Fence language | Use this marker form | +| -------------------- | ------------------------------- | +| `bash`, `sh`, `toml` | `# allow-domain: ` | +| `rust`, `ts`, `js` | `// allow-domain: ` | +| HTML (or no fence) | `` | + +**Strongly prefer rewriting the example to a reserved host instead +of suppressing** — see [Stage 1 Doc Cleanup +Plan](#stage-1-doc-cleanup-plan). Per-line suppression is for true +one-offs (security write-ups citing a real CVE host, etc.). HTML +comments are reserved for **prose** Markdown contexts outside +fenced code blocks. + +### Always excluded (paths) + +- `Cargo.lock` +- Lockfiles by **exact basename** (not glob): `package-lock.json`, + `pnpm-lock.yaml`, `pnpm-lock.json`, `yarn.lock`, + `npm-shrinkwrap.json`. Listing each by name avoids the bug where + a `*-lock.json` glob would miss `pnpm-lock.yaml` while `.yaml` is + in the scanned extensions. 
**This is a supply-chain trade-off, + not just dependency noise.** The current `package-lock.json` + files contain `registry.npmjs.org`, `funding`/`sponsor` URLs, and + many transitive package-repository URLs. Excluding lockfiles + means a malicious or unreviewed registry URL added to a lockfile + would not be flagged. Mitigated by the fact that lockfile changes + are themselves a high-signal review surface (PR reviewers should + already inspect lockfile diffs). Revisit if a real incident + occurs. +- `node_modules/` (any depth) +- `target/` +- `dist/` +- `.git/` +- `.worktrees/`, `.claude/worktrees/` +- `crates/trusted-server-cli/src/dev/lint/domains.rs` itself (so the + module's own allowlist constants and doc comments cannot self-flag) +- **`crates/trusted-server-core/src/integrations/**/fixtures/**` — + publisher-capture HTML/JS fixtures.** Real-world snapshots used as + test inputs for the HTML processor; they contain hundreds of + legitimate third-party URLs that cannot reasonably be + allowlisted. This is a narrow path exclusion, NOT the older + too-broad `**/fixtures/**` rule (that earlier draft would have + hidden the integration-test app source under + `crates/integration-tests/fixtures/frameworks/nextjs/app/*.tsx`, + which we deliberately scan). + +**Source files under `crates/integration-tests/fixtures/frameworks/*` — +including `.tsx`, `.ts`, `.json`, `next.config.mjs`, `Dockerfile` — +ARE scanned.** Only the publisher-capture path above is excluded. + +## Implementation + +### Module structure + +```rust +// crates/trusted-server-cli/src/dev/lint/domains.rs + +use core::error::Error; +use std::path::PathBuf; + +use derive_more::Display; +use error_stack::{Report, ResultExt}; +use regex::Regex; + +// gix = "gitoxide": pure-Rust git implementation. No external git binary +// required; no subprocess; typed diff/merge-base/index APIs. +use gix; + +/// Hosts that must match exactly. Subdomains are NOT allowed. 
+const EXACT_HOSTS: &[&str] = &[ + // Loopback + "127.0.0.1", + "::1", + "localhost", + // didomi + "api.privacy-center.org", + "sdk.privacy-center.org", + // ... etc. +]; + +/// Hosts that match exactly OR via subdomain (`*.host`). +const SUBDOMAIN_HOSTS: &[&str] = &[ + "example.com", +]; + +#[derive(Debug, Display)] +pub enum DomainsLintError { + #[display("failed to open git repository")] + OpenRepo, + #[display("failed to read git index")] + Index, + #[display("failed to compute diff")] + Diff, + #[display("failed to resolve reference `{_0}`")] + Reference(String), + #[display("failed to compute merge-base of `{base}` and HEAD")] + MergeBase { base: String }, + #[display("failed to read file `{_0}`")] + ReadFile(PathBuf), + #[display("invalid mode combination")] + InvalidMode, +} +impl Error for DomainsLintError {} + +pub struct DomainsLintArgs { + pub mode: LintMode, + pub format: OutputFormat, + pub verbose: bool, +} + +pub enum LintMode { + Staged, + ChangedVs(String), + Paths(Vec), + FullRepo, +} + +pub fn run(args: DomainsLintArgs) -> Result> { + let lines = collect_lines(&args.mode)?; + let violations = scan_lines(&lines); + emit_report(&violations, args.format); + Ok(if violations.is_empty() { 0 } else { 1 }) +} +``` + +### Cargo dependencies + +Add to `crates/trusted-server-cli/Cargo.toml`: + +```toml +[dependencies] +gix = { version = "0.83", default-features = false, features = [ + "blob-diff", # blob-level line diffs (gix-diff / imara-diff) + "index", # read the git index for staged-vs-HEAD diffs + "revision", # merge-base computation (gix-revision) + "sha1", # SHA backend — gix-hash refuses to compile without it + "tree-editor", # Repository::edit_tree, used by test fixtures +] } +gix-config = "0.56" # direct File-level read/write of /.git/config + # for ts dev install-hooks +regex = "1" +``` + +Notes: + +- **Versions pinned by the Phase 2 feasibility spike: `gix = 0.83`, + `gix-config = 0.56`** (the same gitoxide release family — `gix +0.83` 
depends on `gix-config 0.56`). Verified with + `cargo tree -p gix -p gix-config --duplicates`: only an unrelated + `hashbrown` appears twice; `gix` and `gix-config` each resolve to + a single version. +- **`sha1` feature is required.** With `default-features = false`, + `gix-hash` will not compile without a SHA backend and emits + `Please set either the sha1 or the sha256 feature flag`. +- **`tree-editor` feature is required for test fixtures.** The + production runtime does not call `Repository::edit_tree`, but the + Phase 2 spike and Phase 4 unit tests build fixture repos entirely + through gix (write_blob + edit_tree + commit_as), and `edit_tree` + is gated behind `tree-editor`. +- `gix-config` is pulled in **explicitly** for the durable + `/.git/config` write performed by `ts dev install-hooks`. + `gix::Repository::config_snapshot_mut()` only modifies an + in-memory snapshot and is not the persistence path; the hook + installer therefore uses `gix-config::File` directly. Do not + rely on `config_snapshot/_mut` for persistence. +- No networking, credential helpers, or worktree mutation features + are enabled — the linter only reads from the local repo and does + one targeted config write in `ts dev install-hooks`. +- The exact feature names match the `gix` crate's documented features + (`blob-diff`, `index`, `revision` — see docs.rs/gix). If a feature + has been renamed or split in the version the spike selects, the + closest documented equivalent is used and the change is flagged + in the implementation PR. + +### URL extraction (without lookahead) + +Rust's standard `regex` crate does not support lookahead. The patterns +are designed to work without it — host character classes naturally bound +the match. 
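
As a rough illustration of how a character class can bound a match without lookahead, the non-IPv6 host class `[A-Za-z0-9][A-Za-z0-9.\-]*` used in the patterns below can be mimicked by a plain forward scan. This is a regex-free sketch, not the linter's code; the helper name `host_prefix` is hypothetical.

```rust
/// Returns the longest host-shaped prefix of `s`, or `None` if `s` does not
/// start with an ASCII alphanumeric character.
/// Hypothetical helper: mirrors the class `[A-Za-z0-9][A-Za-z0-9.\-]*`.
fn host_prefix(s: &str) -> Option<&str> {
    // The first character must be alphanumeric (rejects `...`, `-foo`, `.foo`).
    let first = s.chars().next()?;
    if !first.is_ascii_alphanumeric() {
        return None;
    }
    // The scan stops at the first character outside the class, so no
    // lookahead is needed to find the end of the host.
    let end = s
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '.' || c == '-'))
        .unwrap_or(s.len());
    Some(&s[..end])
}
```

The scan ends at `/`, `:`, `?`, a quote, or any other character outside the class, which is the same stopping behaviour the greedy regex class relies on.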
+ +**Absolute URL regex:** + +``` +(?i)https?://(?:[^/?\s#]+@)?(\[[0-9a-fA-F:]+\]|[A-Za-z0-9][A-Za-z0-9.\-]*) +``` + +- `(?:[^/?\s#]+@)?` is a non-capturing optional group that consumes + any RFC 3986 `userinfo@` prefix so the captured host is the real + authority. Without it, `https://github.com@test.com/path` would + extract the allowlisted `github.com` and miss the actual host + `test.com` — a real bypass for a security-relevant linter. + Multi-`@` userinfo is handled by regex backtracking: the engine + consumes as much as possible while still finding an `@` followed + by a valid host token. +- The non-IPv6 host branch `[A-Za-z0-9][A-Za-z0-9.\-]*` requires the + host to **start with an alphanumeric** character. This rejects + placeholder noise like `https://...` (which the earlier + `[A-Za-z0-9.\-]+` would have matched, producing the bogus host + `...`). A leading `-` or `.` is rejected by the same rule; that's + fine, both are invalid per RFC 1035 anyway. +- Greedy match stops at the first character outside the class + (e.g., `/`, `:`, `?`, `"`, `>`). +- Bracketed IPv6 is captured as `[…]`; surrounding brackets stripped + in normalisation. + +**Protocol-relative URL regex:** + +``` +(?i)(?:^|[\s"'(=<>{,\[\]`])//(?:[^/?\s#]+@)?([A-Za-z0-9][A-Za-z0-9.\-]*\.[A-Za-z]{2,}) +``` + +- The non-capturing group `(?:^|[\s"'(=<>{,\[\]` + backtick + `])` + requires a boundary character before the `//`: start-of-line, + whitespace, quote (`"` or `'`), paren `(`, `=`, `<`, `>`, `{`, + `,`, `[`, `]`, or backtick (template literal). Backtick covers + JavaScript/TypeScript template literals + (`` `//cdn.example.com/${path}` ``); `{`, `[`, `,` cover + JSON / TS object literals where a URL string follows a key. +- `(?:[^/?\s#]+@)?` skips userinfo for the same bypass-prevention + reason as the absolute URL regex — `//github.com@evil.example/x` + reports `evil.example`, not `github.com`. 
+- **Why not `:`?** `:` deliberately excluded — `http://foo.com` has + `//` preceded by `:` (the URL scheme separator). Adding `:` to the + boundary class would cause the protocol-relative regex to also + match the host portion of every absolute URL, double-flagging. +- Prevents matching `// comment text` (the `//` is at column 0 or + preceded by code, but the trailing TLD constraint also filters + out comment dividers like `// foo bar`). +- The host capture `[A-Za-z0-9][A-Za-z0-9.\-]*\.[A-Za-z]{2,}` + requires at least one dot followed by a TLD-like suffix and a + leading alphanumeric character. +- **Known limitation**: back-to-back protocol-relative URLs without a + separator (`//foo.com//bar.com`) miss the second one because the + engine continues from `/bar.com` with no boundary char. Accepted + for v1; no real-world occurrence. +- **Known limitation**: an email-shaped token in a `//` comment + (e.g., `//support@test.com`) is reported as a protocol-relative + URL with userinfo `support@` and host `test.com`. The userinfo + skip cannot syntactically distinguish "URL with userinfo" from + "email in code comment" — and preserving the bypass protection + (so `//github.com@evil.example/x` reports `evil.example`, not the + allowlisted `github.com`) takes priority. Per-line suppression + (`// allow-domain: test.com`) covers the rare intentional case. + +### Suppression marker regex + +The canonical regex (single source of truth — matches the form +documented in [Per-Line Suppression](#per-line-suppression)): + +``` +(?im)(?:^|\s)(?://|\#||$) +``` + +The `(?:^|\s)` anchor is what closes the URL-content bypass (see +[Bypass-resistance](#per-line-suppression)). Any implementation must +use this exact regex; do not introduce a second variant elsewhere. 
+ +**Captured-group handling.** The host capture +`([A-Za-z0-9.\-:\[\],\s]+?)` includes `\s` (whitespace) because hosts +may be comma-separated with surrounding spaces, and an HTML-comment +marker like `` has a space before +`-->` that the lazy quantifier will pull into the capture. The +implementation **must**: + +1. Take the captured string. +2. Split on `,`. +3. Trim each resulting segment of leading/trailing whitespace + (including any spaces the lazy quantifier picked up before + `-->`). +4. Drop empty segments. +5. Lowercase each remaining host for comparison. + +Tests exercise both `` (with the +trailing space before `-->`) and +`// allow-domain: test.com, other.com` (multi-host with spaces). + +### Host normalisation + +```rust +fn normalise_host(raw: &str) -> String { + let trimmed = raw.trim_start_matches('[').trim_end_matches(']'); + trimmed.to_lowercase() +} +``` + +### Allow check + +```rust +const RESERVED_TLDS: &[&str] = &[".example", ".test", ".invalid", ".localhost"]; + +fn is_allowed(host: &str, suppressed_on_line: &HashSet) -> bool { + if suppressed_on_line.contains(host) { return true; } + if RESERVED_TLDS.iter().any(|t| host.ends_with(t)) { return true; } + if EXACT_HOSTS.iter().any(|e| host == *e) { return true; } + if REFERENCE_HOSTS.iter().any(|e| host == *e) { return true; } + if SUBDOMAIN_HOSTS.iter().any(|e| { + host == *e || host.ends_with(&format!(".{}", e)) + }) { return true; } + false +} +``` + +### Line collection: `--staged` mode (gitoxide) + +**No subprocess. No `git` binary on PATH required.** All git operations +go through `gix` APIs. + +The flow: + +1. Open the repo: `gix::open(".")`. +2. Resolve the HEAD tree. +3. Resolve the index (the staging area). +4. Compute the tree-vs-index changes — this is the set of files with + staged modifications, additions, renames, or deletions. +5. For each `Modified` / `Added` / `Renamed` change: + - Load the **old blob** from the HEAD tree (empty for additions). 
+ - Load the **new blob** from the index. + - Run a **blob diff** using `gix-diff::blob` (which wraps + `imara-diff`, the Myers diff implementation `gix` uses + internally). + - Walk the resulting hunks; for each hunk's **post-image (new) line + range**, emit `DiffLine { path, line_no, content }` for each added + line. +6. Skip `Deleted` changes (deletions cannot introduce a violation). +7. Apply the extension/path filter to the _post-image path_ before + loading blobs (cheap filter, avoids unnecessary diffing). + +Sketch (prototype-shaped — concrete `gix` API surface is identified +during implementation; helper names below are placeholders): + +```rust +fn staged_added_lines(repo_path: &Path) + -> Result, Report> +{ + let repo = gix::open(repo_path).change_context(DomainsLintError::OpenRepo)?; + let head_tree = match repo.head_commit() { + Ok(c) => repo.find_tree(c.tree_id()?.detach())?, + Err(_) => repo.empty_tree(), // unborn HEAD + }; + // Materialise the index as a tree so we can diff trees uniformly + // with rename detection enabled. + let index_tree_id = write_index_to_tree(&repo)?; + let index_tree = repo.find_tree(index_tree_id)?; + collect_added_from_trees(&repo, &head_tree, &index_tree) +} +``` + +**The `gix` API surface is RESOLVED by the implementation** — see +`crates/trusted-server-cli/src/dev/lint/domains.rs` for +`collect_added_from_trees` and `write_index_to_tree`. The conceptual +operations: + +1. Open the repository — `gix::open(path)`. +2. Resolve the HEAD commit's tree — `repo.head_commit()?.tree_id()?` + then `repo.find_tree(id)?`. On an unborn HEAD use + `repo.empty_tree()`. +3. Materialise the index as a tree — iterate `index.entries()` + filtered to `Mode::FILE`, `editor.upsert(entry.path(&index), +EntryKind::Blob, entry.id)` on `repo.edit_tree(empty)`, then + `editor.write()`. This lets the same tree-vs-tree machinery + serve both staged and `--changed-vs` modes. +4. 
Run a tree-vs-tree diff with rename detection — + `old_tree.changes()?` → + `Platform::for_each_to_obtain_tree(&new_tree, callback)` with + `track_rewrites(Some(Rewrites { copies: None, percentage: +Some(0.5), limit: 1000, track_empty: false }))`. The callback + matches `Change::{Addition, Modification, Rewrite, Deletion}` — + pure renames (same blob, new path) yield no added lines, and + rename + edit diffs the matched old blob vs the new blob. +5. Read each blob's content — `repo.find_object(id)?.data`. +6. Run a line-level diff — `gix::diff::blob::Diff::compute( +Algorithm::Myers, &InternedInput::new(old, new))`, then walk + each `hunk.after` range for new-side line numbers and content. + +**Why this is better than shelling out:** + +- No `git` binary on PATH required. +- No diff-text parsing — line numbers and content come from typed + hunk structs. +- No locale / quote-path / `b/` prefix / `/dev/null` edge cases. +- Renamed files are handled by `gix`'s built-in rename detection + (`track_rewrites` on the tree-diff `Platform`) — pure renames + introduce no added lines; rename + edit reports only the truly new + lines. +- Filenames with spaces or non-UTF8 characters: `gix` paths are + `BString` (byte strings). The script lossy-converts to UTF-8 for + output and emits a stderr warning for non-UTF-8 paths. + +### Line collection: `--changed-vs ` mode (gitoxide) + +Same blob-diff machinery, but the two trees are HEAD's tree and the +merge-base tree: + +```rust +fn changed_vs_added_lines(repo_path: &Path, reference: &str) + -> Result, Report> +{ + let repo = gix::open(repo_path).change_context(DomainsLintError::OpenRepo)?; + let head_id = repo.head_id().change_context(DomainsLintError::OpenRepo)?.detach(); + let base_id = resolve_base_ref(&repo, reference)?; + let merge_base = repo + .merge_base(base_id, head_id) + .change_context_lazy(|| DomainsLintError::MergeBase { base: reference.into() })? 
+ .detach(); + let base_tree = commit_tree(&repo, merge_base)?; + let head_tree = commit_tree(&repo, head_id)?; + + // Same tree-vs-tree diff with rename tracking as staged mode; + // the index is just swapped for the merge-base tree. + collect_added_from_trees(&repo, &base_tree, &head_tree) +} +``` + +#### Base-ref resolution order + +In CI, `$GITHUB_BASE_REF` is typically a bare branch name like +`main`. On a freshly-cloned PR working tree, `main` often **does +not exist as a local ref** — only `origin/main` (a remote-tracking +ref) does. A naive `repo.find_reference("main")` would fail. + +`resolve_base_ref(repo, reference)` tries the following candidates +in order and returns the first one that resolves to an object id: + +1. `` exactly (works when the caller passes e.g. + `refs/remotes/origin/main` directly). +2. `refs/heads/` (local branch). +3. `refs/remotes/origin/` (remote-tracking branch — the + common CI case where ` == "main"`). +4. `refs/tags/` (tag — covers release-gate use). + +If none resolve, the linter exits **2** with a message naming all +four candidates that were tried, so the CI failure mode is +diagnosable from log output alone. + +**CI requirements (documented when Stage 2 lands):** + +- `actions/checkout@v4` with `fetch-depth: 0` so the base ref and + the full PR-branch history are reachable. Without it, `gix` + cannot compute a merge-base on a shallow clone and the linter + exits 2. +- Pass the base ref as a bare branch name (`main`) — the + resolution order above handles the `origin/` lookup. Callers + may also pass `origin/main` or `refs/remotes/origin/main` + directly if they prefer to be explicit. +- For fork PRs, the base ref must still be present in the local + clone. `actions/checkout@v4 fetch-depth: 0` covers this. +- **No `git` binary required on the runner.** `gix` reads the + on-disk repo directly. 
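
The base-ref resolution order described above can be sketched as a pure function that produces the four candidate names, plus a driver that returns the first one that resolves. This is a minimal sketch under stated assumptions: `base_ref_candidates` and `resolve_first` are hypothetical names, and the `resolves` closure stands in for whatever `gix` lookup the real `resolve_base_ref` performs.

```rust
/// Candidate ref names to try, in the documented order.
/// Hypothetical helper, not the linter's actual API.
fn base_ref_candidates(reference: &str) -> [String; 4] {
    [
        reference.to_string(),                      // 1. exactly as given
        format!("refs/heads/{reference}"),          // 2. local branch
        format!("refs/remotes/origin/{reference}"), // 3. remote-tracking branch
        format!("refs/tags/{reference}"),           // 4. tag
    ]
}

/// Returns the first candidate for which `resolves` yields an object id.
/// On failure, returns every name that was tried so the caller can list
/// them in the error message before exiting with code 2.
fn resolve_first<T>(
    reference: &str,
    mut resolves: impl FnMut(&str) -> Option<T>,
) -> Result<(String, T), [String; 4]> {
    let candidates = base_ref_candidates(reference);
    for name in &candidates {
        if let Some(id) = resolves(name.as_str()) {
            return Ok((name.clone(), id));
        }
    }
    Err(candidates)
}
```

Keeping the candidate list separate from the lookup makes the order unit-testable without a repository, and the error path carries exactly the names needed for the diagnostic message.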
+ +### Line collection: full-repo (gitoxide) + +Full-repo audit enumerates tracked files via the index +(`gix::index::State::entries()`), then **reads working-tree content +from disk** (not the index/HEAD blob). + +**Working-tree semantics — explicit decision.** A full-repo audit +therefore reports hosts that appear in the _current local edits_, +including unstaged and uncommitted changes. This is the right +behavior for an interactive developer audit ("what's currently in my +files?") and matches what someone running the linter as a +diagnostic-mode sanity check would expect. It is **not** a stable +"what is committed in this repo" audit. + +If a stable, commit-state audit is needed later (e.g., for a release +gate that reports the state at a tagged commit), a separate mode like +`--at ` would scan blob content from that revision's tree +instead. Out of scope for v1; deferred to follow-up if real demand +appears. + +Untracked files are intentionally skipped — they cannot land in a +commit, and scanning them would falsely flag scratch/tmp files. + +#### Handling tracked-but-missing files and symlinks + +Because we enumerate the **index** and then read the **working +tree**, the two can disagree. Cases the implementation must handle +explicitly: + +1. **Tracked but absent from the working tree** (`rm file` without + `git rm`, or a partial checkout): `symlink_metadata` returns + `NotFound`. Skip with a stderr warning naming the path. Do not + fail — the user may be mid-task. +2. **Symlink** (`symlink_metadata().file_type().is_symlink()`): + skip with a stderr warning ("symlink not followed"). Rationale: + following symlinks would (a) potentially escape the repo + (`/etc/passwd`), (b) double-scan if the target is also tracked, + and (c) is rarely what a linter wants. If a real use case + appears, add `--follow-symlinks` later. 
**Broken symlinks fall + into this case** — `symlink_metadata` returns information about + the link itself, not the (missing) target, so `is_symlink()` is + `true` and the entry is skipped here. (If we used + `std::fs::metadata` instead, a broken symlink would yield + `NotFound`; we deliberately use `symlink_metadata` to keep + symlink detection independent of target reachability.) +3. **Non-regular file** (FIFO, socket, device): skip with a stderr + warning. Almost never in a real repo, but defensive. +4. **Non-UTF-8 path component**: `gix` returns path entries as + `BString` (byte strings). On Unix, a byte sequence that is not + valid UTF-8 is still a valid path; on Windows, paths must be + convertible to UTF-16 and arbitrary bytes are not accepted. + For consistency and simplicity, the linter **skips non-UTF-8 + entries with a stderr warning** on all platforms in v1. The + working-tree-content read is therefore safe to perform on a + `PathBuf` built from validated UTF-8 only. (A future v2 could + add Unix-only lossless handling via + `std::os::unix::ffi::OsStringExt::from_vec` if real repos hit + this; not expected for trusted-server.) +5. **Binary file** (`std::fs::read_to_string` returns + `InvalidData`): skip with a stderr warning. The extension + filter already excludes most binaries, but a `.json` file with + embedded NULs (rare) would hit this. + +All five cases are warnings, not errors — the audit continues to +the next entry. Exit code reflects only the violation count. + +```rust +fn full_repo_lines() -> Result, Report> { + let repo = gix::open(".").change_context(DomainsLintError::OpenRepo)?; + let index = repo.index().change_context(DomainsLintError::Index)?; + let work_dir = repo.work_dir().ok_or_else(|| Report::new(DomainsLintError::OpenRepo))?; + + let mut out = Vec::new(); + for entry in index.entries() { + let rel_path = entry.path(&index); // BString + // Skip non-UTF-8 paths with a warning (see case 4 above). 
+ let rel_str = match std::str::from_utf8(rel_path.as_ref()) { + Ok(s) => s, + Err(_) => { + warn_skip_bytes(rel_path.as_ref(), "non-UTF-8 path"); + continue; + } + }; + let path = work_dir.join(rel_str); + if !path_is_scanned(&rel_path) { continue; } + // See "Handling tracked-but-missing files and symlinks" above. + let meta = match std::fs::symlink_metadata(&path) { + Ok(m) => m, + Err(e) if e.kind() == std::io::ErrorKind::NotFound => { + warn_skip(&path, "tracked but missing from working tree"); + continue; + } + Err(e) => { + warn_skip(&path, &format!("metadata error: {e}")); + continue; + } + }; + if meta.file_type().is_symlink() { + warn_skip(&path, "symlink not followed"); + continue; + } + if !meta.file_type().is_file() { + warn_skip(&path, "non-regular file"); + continue; + } + let content = match std::fs::read_to_string(&path) { + Ok(c) => c, + Err(e) if e.kind() == std::io::ErrorKind::InvalidData => { + warn_skip(&path, "binary content"); + continue; + } + Err(e) => return Err(Report::new(DomainsLintError::ReadFile(path.clone())) + .attach_printable(e.to_string())), + }; + for (i, line) in content.lines().enumerate() { + out.push(DiffLine { + path: rel_path.into(), + line_no: i + 1, + content: line.into(), + }); + } + } + Ok(out) +} +``` + +### Line collection: explicit paths + +Each path the user named is processed individually. Two layered +behaviors that differ from full-repo mode: + +**Policy filters (extension, path-exclusion, symlink, non-regular, +binary) behave the same as full-repo: warn and skip.** The reason +is consistency — a file that would not be scanned in the full-repo +audit must not be scanned when named explicitly either. Specifically: + +- Path matches an always-excluded location (`node_modules/`, + `.worktrees/`, lockfile basename, etc.): warn and skip. +- Extension not in the scanned set (`.png`, `.markdown`, `.sql`, + etc. 
— anything outside the + [list above](#file-extensions-scanned)): + warn and skip with `note: is not in scanned extensions; +skipping`. The deferred `--force-scan path/...` escape hatch + remains an Open Question. +- Symlink, non-regular file, binary content (`InvalidData`): + warn and skip per the + [full-repo handling table](#handling-tracked-but-missing-files-and-symlinks). + +**Note on non-UTF-8 paths.** The non-UTF-8 handling described in +the full-repo section applies to **git/index-derived `BString` +paths** (full-repo, `--staged`, `--changed-vs` modes), where the +linter has to convert bytes back into an OS path. Explicit-path +mode receives an OS-supplied `PathBuf` from clap (which on Unix is +an `OsString` byte sequence that may not be UTF-8 but is already a +valid OS path) and passes it directly to the filesystem APIs — no +conversion step, no detection step. If the user explicitly named a +path that the OS accepts, the linter reads it; the non-UTF-8 +classification is best-effort only and primarily applies to paths +the linter discovered via git. + +**Access failures on a user-named path are hard errors, not +warnings.** Differing from full-repo here is intentional: if the +user typed `ts dev lint domains some/file.rs` and `some/file.rs` +does not exist or cannot be read for permissions reasons, that is +almost certainly a typo or a real environment problem the user +should know about — not the "tracked-but-missing during a sweep" +case full-repo handles silently. Treatment: + +- `NotFound`: exit `2` with `CliError::EnvironmentError`, message + `path not found: `. No partial-success — if any explicit + path fails to open, no violations are reported. +- `PermissionDenied` or other `io::Error`: same, with the + underlying error in the message. + +```rust +fn explicit_path_lines(paths: &[PathBuf]) -> Result, Report> { + let mut out = Vec::new(); + for path in paths { + // Policy filters first (warn-and-skip). 
+ if !path_is_scanned_named(path) { continue; } + let meta = std::fs::symlink_metadata(path) + .change_context_lazy(|| DomainsLintError::ReadFile(path.clone()))?; + if meta.file_type().is_symlink() { warn_skip(path, "symlink not followed"); continue; } + if !meta.file_type().is_file() { warn_skip(path, "non-regular file"); continue; } + let content = match std::fs::read_to_string(path) { + Ok(c) => c, + Err(e) if e.kind() == std::io::ErrorKind::InvalidData => { + warn_skip(path, "binary content"); continue; + } + Err(e) => return Err(Report::new(DomainsLintError::ReadFile(path.clone())) + .attach_printable(e.to_string())), + }; + for (i, line) in content.lines().enumerate() { + out.push(DiffLine { path: path.clone(), line_no: i + 1, content: line.into() }); + } + } + Ok(out) +} +``` + +The hard-vs-soft split is documented as the user contract: +explicit paths are "I told you to look at this file"; full-repo is +"sweep over everything the index claims exists." Different intent, +different error behavior. + +### Output Format (`human`) + +``` +crates/trusted-server-core/src/foo.rs:42: disallowed host test.com +trusted-server.toml:15: disallowed host 68.183.113.79 + +2 disallowed hosts found in 2 files. +To allow a new integration proxy, add it to EXACT_HOSTS in +crates/trusted-server-cli/src/dev/lint/domains.rs and document the +integration in a comment. +To suppress one line (e.g., security-test attacker hosts), append +`// allow-domain: ` in a comment. +Run `ts dev lint domains` (no args) for a full-repo audit. +``` + +### Output Format (`json`) + +```json +{ + "violations": [ + { + "path": "crates/trusted-server-core/src/foo.rs", + "line_no": 42, + "host": "test.com", + "line": "let x = \"https://test.com/path\";" + } + ], + "count": 1, + "files_affected": 1 +} +``` + +### Pre-commit hook + +Git invokes the hook as an executable file; the hook itself is +necessarily an OS-executable artifact (this is git's hook contract, +not "shelling out from Rust"). 
The hook is a minimal one-liner that +runs the `ts` binary. + +**PATH fragility — addressed by embedding the absolute path at install +time.** GUI git tools (Sourcetree, GitHub Desktop, VS Code's git +integration) often do not inherit the shell's PATH, so a hook that +just calls `ts` may fail to find the binary even when +`cargo install_cli` has placed it in `~/.cargo/bin`. To avoid this: + +`ts dev install-hooks` captures the absolute path of the currently-running +`ts` binary (via `std::env::current_exe()`) and writes that absolute +path into the hook: + +```sh +#!/usr/bin/env bash +# .githooks/pre-commit — installed by `ts dev install-hooks`. DO NOT EDIT. +# Generated from . +exec "/Users/example/.cargo/bin/ts" dev lint domains --staged +``` + +If the user later rebuilds or moves the binary, re-running +`ts dev install-hooks` regenerates the hook with the new absolute path. +Without this, the fallback path `exec ts dev lint domains --staged` +relying on PATH is brittle in GUI contexts. + +### Hook installer (Rust subcommand) + +To keep the workflow Rust-only — no shell scripts in `scripts/`, +no `git config` invocation from a script — install via a `ts` +subcommand: + +``` +ts dev install-hooks +``` + +This is a small Rust subcommand on the `ts` CLI that: + +1. Opens the repo via `gix::open(".")`. +2. Resolves the absolute path of the current `ts` executable via + `std::env::current_exe()`. +3. **Preflight: read the existing local `core.hooksPath`** (via + `gix-config::File`): + - **Unset, empty, or already `.githooks`:** proceed. Idempotent + re-run on an existing installation is a no-op for this check. + - **Set to a different path** (`hooks`, `.husky`, `.cargo-husky`, + anything else): **refuse unless `--force`**. The user likely + has another hook chain (husky, cargo-husky, lefthook, a + hand-rolled `hooks/` directory). Silently rewriting their + `core.hooksPath` would disable that chain. 
Message: + ``` + ts dev install-hooks: refusing to override existing core.hooksPath + current: hooks + would set: .githooks + This would disable your existing hook chain. Choose one of: + 1. Re-run with --force (your existing core.hooksPath value is + printed above; you can restore it later with + `git config --local core.hooksPath hooks`). + 2. Manually add `exec dev lint domains --staged` + to your existing pre-commit hook chain. The absolute path + for this binary is: + ``` + Exit code: 2 (environment error per the exit-code contract — + this is a configuration conflict, not a violation). +4. **Checks for an existing `.githooks/pre-commit`:** + - **Absent:** writes the file fresh. + - **Present, and contains the `# ts-install-hooks: managed` + marker on a known line:** overwrites silently. This is the + managed-file case. + - **Present, but content does not match the managed marker:** + refuses to overwrite. Prints the path of the existing hook, + suggests `--force` to overwrite or merging the contents + manually. Exits non-zero. Rationale: the user may have + hand-edited a custom hook (lint chain, secret scan, etc.); we + never silently clobber. +5. With `--force`, the existing hook (if any) is renamed to + `.githooks/pre-commit.bak.` before writing fresh, and + the existing `core.hooksPath` value (if it pointed elsewhere) is + printed in the success message so the user can restore it later. +6. Sets the executable bit via `std::fs::Permissions` / + `set_permissions` (Unix `0o755`). +7. Sets `core.hooksPath = .githooks` in the local repo config via + the `gix-config::File` write path described under "Persisting + `core.hooksPath`" below (no subprocess). +8. Prints a confirmation message including the embedded binary path + and (under `--force`) any displaced previous `core.hooksPath`. 
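
The overwrite policy in steps 4 and 5 can be sketched as a pure decision function over the existing hook's content. This is an illustrative sketch only: `HookWrite`, `has_managed_marker`, and `decide_hook_write` are hypothetical names, and the 10-line window is taken from the "first ~10 lines" note in the `is_managed` stub.

```rust
/// Marker line written into hooks generated by the installer.
const MANAGED_MARKER: &str = "# ts-install-hooks: managed";

/// What the installer should do with `.githooks/pre-commit`.
/// Hypothetical type, used only for this sketch.
#[derive(Debug, PartialEq)]
enum HookWrite {
    /// No hook exists yet: write the file.
    WriteFresh,
    /// Existing hook carries the managed marker: overwrite silently.
    OverwriteManaged,
    /// `--force` was passed: rename the existing hook to a backup, then write.
    BackupThenWrite,
    /// Existing hook is user content and `--force` was not passed.
    Refuse,
}

/// True if the marker appears as a whole line within the first 10 lines.
fn has_managed_marker(content: &str) -> bool {
    content.lines().take(10).any(|line| line.trim() == MANAGED_MARKER)
}

/// `existing` is the current hook content, or `None` if the file is absent.
fn decide_hook_write(existing: Option<&str>, force: bool) -> HookWrite {
    match existing {
        None => HookWrite::WriteFresh,
        Some(_) if force => HookWrite::BackupThenWrite,
        Some(content) if has_managed_marker(content) => HookWrite::OverwriteManaged,
        Some(_) => HookWrite::Refuse,
    }
}
```

Separating the decision from the filesystem calls keeps the "never silently clobber" rule testable without touching disk.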
+ +Pseudocode (managed-file overwrite policy elided for brevity; see +above): + +```rust +pub fn install_hooks(force: bool) -> Result<(), Report> { + let repo = gix::open(".") + .change_context(InstallHooksError::OpenRepo)?; + let work_dir = repo.work_dir() + .ok_or_else(|| Report::new(InstallHooksError::NoWorkdir))?; + let ts_path = std::env::current_exe() + .change_context(InstallHooksError::CurrentExe)?; + + // Preflight: refuse to clobber a foreign core.hooksPath. + let existing_hooks_path = read_local_config_value( + &repo, "core", None, "hooksPath", + )?; + let displaced_hooks_path = match existing_hooks_path.as_deref() { + None | Some("") | Some(".githooks") => None, // safe to proceed + Some(other) if !force => { + return Err(Report::new(InstallHooksError::ForeignHooksPath { + current: other.to_string(), + proposed: ".githooks".to_string(), + }) + .attach_printable("re-run with --force to override; existing value will be printed for manual restoration")); + } + Some(other) => Some(other.to_string()), // --force; remember to surface + }; + + let hooks_dir = work_dir.join(".githooks"); + let hook_path = hooks_dir.join("pre-commit"); + std::fs::create_dir_all(&hooks_dir) + .change_context(InstallHooksError::WriteHook)?; + + if hook_path.exists() && !is_managed(&hook_path)? && !force { + return Err(Report::new(InstallHooksError::WouldClobber { + path: hook_path, + }) + .attach_printable("re-run with --force to overwrite (existing hook is backed up)")); + } + if hook_path.exists() && force { + // Backup timestamp via std::time, no chrono dependency needed. 
+ let ts_secs = std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .map(|d| d.as_secs()) + .unwrap_or(0); + let backup = hook_path.with_extension(format!("bak.{ts_secs}")); + std::fs::rename(&hook_path, &backup) + .change_context(InstallHooksError::WriteHook)?; + } + + let content = render_hook(&ts_path); + std::fs::write(&hook_path, content) + .change_context(InstallHooksError::WriteHook)?; + #[cfg(unix)] + { + use std::os::unix::fs::PermissionsExt; + let mut perms = std::fs::metadata(&hook_path)?.permissions(); + perms.set_mode(0o755); + std::fs::set_permissions(&hook_path, perms)?; + } + + // Persistent local-repo config write: set core.hooksPath = .githooks + // in /.git/config. See "Persisting core.hooksPath" below for + // the concrete file-level write plan via the gix-config crate. + set_local_config_value(&repo, "core", None, "hooksPath", ".githooks")?; + + println!( + "Installed: pre-commit hook → {} (calls {})", + hook_path.display(), + ts_path.display(), + ); + if let Some(prev) = displaced_hooks_path { + eprintln!( + "note: previous core.hooksPath was '{prev}'. \ + To restore: git config --local core.hooksPath {prev}" + ); + } + Ok(()) +} + +fn render_hook(ts_path: &Path) -> String { + format!( + "#!/usr/bin/env bash\n\ + # Installed by `ts dev install-hooks`. DO NOT EDIT.\n\ + # ts-install-hooks: managed\n\ + exec {} dev lint domains --staged\n", + shell_quote(&ts_path.to_string_lossy()), + ) +} + +fn is_managed(hook_path: &Path) -> Result> { + // Returns true if the file contains the marker line + // `# ts-install-hooks: managed` in its first ~10 lines. +} +``` + +The `# ts-install-hooks: managed` marker on a known line is the +signal `is_managed` uses to detect prior-installed hooks. Hand-written +hooks won't have this marker, so they're treated as user content and +preserved unless `--force` is passed. + +#### Shell-safe path quoting in the hook + +`render_hook` writes the hook script's `exec` line. 
`Path::display()` +and `format!("{:?}", path)` are **not** shell-safe — paths containing +spaces, `$`, backticks, single quotes, or backslashes would break +the hook or silently misbehave (and on some systems open a command +injection through the install-time path). + +The implementation uses POSIX-shell single-quote escaping, which is +trivial and bulletproof — single quotes inside the wrapper are +escaped as `'\''`: + +```rust +fn shell_quote(s: &str) -> String { + // POSIX single-quote escaping: wrap in '...', and any embedded + // single quote becomes '\'' (close, escaped-quote, reopen). + let mut out = String::with_capacity(s.len() + 2); + out.push('\''); + for c in s.chars() { + if c == '\'' { + out.push_str(r"'\''"); + } else { + out.push(c); + } + } + out.push('\''); + out +} +``` + +Tests cover paths containing: spaces (`/Users/Alice Q/.cargo/bin/ts`), +single quotes (`/path/with'quote/ts`), `$` (`/opt/$HOME/ts`), +backticks, backslashes (on Windows-style installer outputs). + +#### Persisting `core.hooksPath` + +`gix::Repository::config_snapshot_mut()` modifies an in-memory +snapshot; persisting back to `/.git/config` is not a single +stable call in current `gix`. The plan is to write the file directly +using the `gix-config` crate's file-level API: + +```rust +fn set_local_config_value( + repo: &gix::Repository, + section: &str, + subsection: Option<&str>, + key: &str, + value: &str, +) -> Result<(), Report> { + use gix_config::File; + let config_path = repo.path().join("config"); // /.git/config + + // Read existing file. If missing, start with an empty File. + let mut file = match File::from_path_no_includes( + config_path.clone(), + gix_config::Source::Local, + ) { + Ok(f) => f, + Err(_) => File::default(), + }; + + // Set the value in the requested section/subsection/key. 
+ file.set_raw_value_by(section, subsection, key, value.as_bytes()) + .change_context(InstallHooksError::ConfigWrite)?; + + // Serialize and write back atomically (write to a temp file in + // the same directory, then rename). + let serialized = file.to_bstring(); + write_atomic(&config_path, serialized.as_slice()) + .change_context(InstallHooksError::ConfigWrite)?; + Ok(()) +} + +/// Read a single value from the local repo config. Returns Ok(None) +/// if the file or key is absent (i.e., never set). Used by the +/// install-hooks preflight to detect a foreign `core.hooksPath`. +fn read_local_config_value( + repo: &gix::Repository, + section: &str, + subsection: Option<&str>, + key: &str, +) -> Result, Report> { + use gix_config::File; + let config_path = repo.path().join("config"); + let file = match File::from_path_no_includes( + config_path, + gix_config::Source::Local, + ) { + Ok(f) => f, + Err(_) => return Ok(None), + }; + Ok(file + .raw_value_by(section, subsection, key) + .ok() + .map(|bytes| String::from_utf8_lossy(&bytes).into_owned())) +} +``` + +`write_atomic` is a small helper that writes to `config.tmp.` +then `rename`s to `config` (atomic on the same filesystem). This +matches git's own behavior of never leaving a partially-written +`.git/config`. + +This replaces the earlier sketch using `config_snapshot_mut` / +`commit()` which is in-memory only. The `gix-config` file-write +path is the documented stable way to durably modify a local repo's +git config without subprocess. + +`ts dev install-hooks` is a one-time setup contributors run after cloning, +alongside `cargo install_cli`. Documented in CONTRIBUTING.md. + +## Testing Strategy + +Following the conventions established in PR #669: unit tests live under +`#[cfg(test)] mod tests` in each module; end-to-end CLI tests use +`assert_cmd` and `tempfile`. 
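
As an illustration of the `#[cfg(test)] mod tests` convention, the path cases listed under "Shell-safe path quoting in the hook" could be written roughly as follows. This is a sketch: `shell_quote` is copied from that section so the block stands alone, and the test names are assumptions rather than the names used in the repository.

```rust
/// POSIX single-quote escaping, as described in the spec: wrap in '...'
/// and turn each embedded single quote into '\''.
fn shell_quote(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('\'');
    for c in s.chars() {
        if c == '\'' {
            out.push_str(r"'\''");
        } else {
            out.push(c);
        }
    }
    out.push('\'');
    out
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn quotes_path_with_spaces() {
        assert_eq!(
            shell_quote("/Users/Alice Q/.cargo/bin/ts"),
            "'/Users/Alice Q/.cargo/bin/ts'"
        );
    }

    #[test]
    fn escapes_embedded_single_quote() {
        // Close the quote, emit an escaped quote, reopen the quote.
        assert_eq!(
            shell_quote("/path/with'quote/ts"),
            r"'/path/with'\''quote/ts'"
        );
    }

    #[test]
    fn leaves_dollar_sign_literal() {
        // Inside single quotes the shell performs no expansion.
        assert_eq!(shell_quote("/opt/$HOME/ts"), "'/opt/$HOME/ts'");
    }
}
```

Because `shell_quote` is a pure function, these cases need no temp repository and belong with the unit tests rather than the `assert_cmd` end-to-end suite.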
+ +### Unit tests (in `dev/lint/domains.rs`) + +Pure functions tested directly: `normalise_host`, `is_allowed`, +`extract_hosts_from_line`, `parse_suppression_marker`. + +Diff-collection functions (`staged_added_lines`, +`changed_vs_added_lines`, `full_repo_lines`) are exercised via +end-to-end tests that build a real temp git repo with `gix` and assert +on the collected `DiffLine` values. + +### Allowed-host cases + +1. Plain allowed hosts — `https://example.com`, `https://foo.example.com`, + `https://api.privacy-center.org`, `http://127.0.0.1:8080`, + `https://github.com/x/y`. +2. Subdomain-list rule — `https://foo.example.com` allowed. +3. **Reserved TLDs** — `https://testlight.example`, + `https://something.test`, `https://thing.invalid`, + `https://my.localhost` all allowed. +4. Bracketed IPv6 loopback — `http://[::1]:8080` allowed. +5. Uppercase host — `HTTPS://Example.COM/path` allowed. +6. Quoted / trailing punctuation — `"https://example.com",`, + `(https://example.com)`, `` parse cleanly. +7. Multiple URLs on one line, all allowed — no violations. +8. Protocol-relative allowed — `//www.googletagmanager.com/gtm.js`. +9. Legitimate suppression — `// allow-domain: evil.com` passes when host + matches. +10. Multi-host suppression — `// allow-domain: evil.com, bad.org`. +11. Block-comment / jsdoc suppression — line beginning with ` *` and + immediately followed by the marker, e.g., + ` * allow-domain: evil.com` paired with a URL on the same line: + `let bad = "https://evil.com"; * allow-domain: evil.com` + (constructed; in practice the marker would more often be a `//` + trailing comment on the same line as the URL). The point of the + test is to confirm the `\*\s` branch of the regex fires when the + marker is adjacent to the comment introducer. + +### Disallowed-host cases + +12. Plain disallowed hosts — `https://test.com`, `https://partner.com`, + `https://1.2.3.4` → 3 violations. +13. Subdomain-attack lookalike — `https://example.com.evil.com` flagged. 
+14. Exact-only subdomain attempt — `https://anything.api.privacy-center.org` + flagged. +15. Non-loopback IPv6 — `http://[2001:db8::1]/` flagged as `2001:db8::1`. +16. Protocol-relative disallowed — `//cdn.example.evil/foo` flagged. +17. Multiple disallowed on one line — both reported. +18. **Bypass attempt via URL content** — + `fetch("https://evil.com/allow-domain")` → flagged. +19. **Bypass attempt via URL-path comment-lookalike** — + `fetch("https://evil.com/x//allow-domain: evil.com")` → flagged. +20. **Wrong host in marker** — + `https://evil.com // allow-domain: other.com` → `evil.com` flagged; + stderr warning notes `other.com` was listed but did not match. + 20a. **Placeholder URL with malformed host** — + `https://...` in a Markdown placeholder must NOT extract host + `...` (the regex requires an alphanumeric first character). + Asserts the URL is silently skipped (it is not a real URL). + 20b. **Template-literal protocol-relative URL** — + `` `//cdn.example.evil/${path}` `` (JS/TS template literal) + flagged as `cdn.example.evil`. Asserts backtick boundary works. + 20c. **JSON object value with protocol-relative URL** — + `{"src": "//cdn.example.evil/x"}` flagged. Asserts `{` and `,` + boundary characters work for JSON contexts. + 20d. **Suppression marker with trailing whitespace before `-->`** — + `` correctly trims the host + (captured group ends with spaces, but split+trim yields + `["test.com"]`). + 20e. **Suppression marker with multi-host whitespace** — + `// allow-domain: a.com , b.com , c.com` correctly yields + `["a.com", "b.com", "c.com"]`. + +### `--staged` mode cases (`assert_cmd` end-to-end) + +Each test sets up a temp git repo using `gix::init`, populates blobs +and the index with `gix` APIs (no shell), runs the binary with +`assert_cmd`, asserts exit code and stdout/stderr. + +21. New violation in staged change → exits 1 with correct `path:line`. +22. Existing violation, unrelated staged change → exits 0. +23. 
Renamed file with added violation → reported at new path. +24. File deletion of a file containing disallowed URL → exits 0. +25. Filename with spaces or non-ASCII characters — handled correctly + by `gix` (no quoting layer to fight with); reported normally. + **Non-UTF-8 path component in a staged diff: reported normally + with a stderr warning that the path is being displayed + lossy-UTF-8.** This intentionally differs from + [full-repo mode](#handling-tracked-but-missing-files-and-symlinks) + (case 4), which skips non-UTF-8 entries. The reason: a staged + diff is built from blob ids and tree entries, so the host + extraction happens against blob content regardless of how the + path renders for display. Skipping a staged change just because + its path bytes are not valid UTF-8 would silently hide a + violation the user is actively trying to commit — exactly the + opposite of what `--staged` mode exists for. Full-repo mode, + by contrast, has no commit-intent signal and the working-tree + `read_to_string` path is simpler to keep consistent by + skipping. + + Implementers: do not generalize the full-repo non-UTF-8 skip + rule to `--staged` / `--changed-vs` modes. + +26. Multiple hunks in one file — all added lines reported correctly. + +### `--changed-vs` mode cases + +27. Two commits on a branch, second adds a violation → reported. +28. Merge-base correctly computed when branch is behind base. +29. Missing remote ref → exits 2 with clear message. + +### Path-exclusion and inclusion cases + +30. `node_modules/foo.js` with `https://test.com` → ignored. +31. `.worktrees/x/y.rs` → ignored. +32. `*.html` extension → scanned. Files under + `crates/trusted-server-core/src/integrations/**/fixtures/**` are + skipped by path; other `.html` files (e.g., + `crates/trusted-server-core/src/html_processor.test.html`) are + scanned normally. +33. 
**Proves the `**/fixtures/**` blanket exclusion was removed**: + `crates/integration-tests/fixtures/frameworks/nextjs/app/page.tsx` + fixture with `https://test.com` → reported. +34. `package-lock.json` → ignored. + +### Markdown-specific cases + +35. **Allowed reference link in normal Markdown** — + `[the Fastly docs](https://developer.fastly.com/learning)` in a + `.md` file → no violation (covered by `REFERENCE_HOSTS`). +36. **Disallowed Markdown link target** — + `[bad](https://test.com)` → flagged as `test.com` at the + correct line. +37. **Autolink form** — `` flagged; the angle + brackets are wrapping, not part of the URL. +38. **HTML comment suppression in Markdown** — + a line containing `https://test.com` followed by + `` → suppressed; same line with a + wrong-host marker `` → flagged + with the stderr warning. +39. **Multiple links on one line** — + `see [a](https://github.com/x) and [b](https://test.com)` → + one violation reported (`test.com`). +40. **Fenced code block — disallowed** — + a triple-backtick block containing + `curl https://test.com/foo` is scanned and reported. Documents + that fenced blocks are NOT skipped; per-line suppression + (`` outside the fence on the + same logical line is impractical) requires either an inline HTML + comment in the code-block language's comment syntax (e.g., + `# allow-domain: test.com` for shell) or rewriting the example + to use `.example`. +41. **Fenced code block — allowed reference** — + triple-backtick block referencing `https://docs.rs/clap` → no + violation. +42. **Reference list at end of Markdown** — link-reference syntax + `[1]: https://test.com` is scanned (the URL is still extracted + by the absolute-URL regex regardless of Markdown semantics). +43. **Image link** — + `![alt](https://test.com/img.png)` flagged. + +### Environment cases + +44. **Not inside a git repo** — `gix::open` fails → + exits 2 with `DomainsLintError::OpenRepo` and a clear message. +45. 
**Bare repo / no working tree** — `gix::open` succeeds but + `repo.work_dir()` is `None` (only relevant for the full-repo + mode that reads working-tree files) → exits 2 with a clear + message. +46. **No git binary on PATH at all** — the linter still works + end-to-end (verified by running the binary under `env -i PATH=""`, + confirming `gix` is self-contained). +47. Run unit tests under `cargo test --package trusted-server-cli` + on the host target (matches PR #669's split CI lanes). + +## Trade-offs + +- **Pre-commit-only enforcement is bypassable.** `git commit --no-verify` + skips the hook. Closed by the migration plan. +- **`--staged` mode misses violations introduced via rebase/merge** that + don't go through `git commit`. CI follow-up catches them. +- **Inline allowlist requires editing the Rust source.** Each new + integration proxy requires a code change + review. Acceptable given + expected low churn. +- **Existing violations are not addressed.** They remain until those + files are touched. The full-repo audit (`ts dev lint domains` no args) is + **diagnostic-only** in Stage 1 — it will report many existing + violations; that is expected, not a failure. +- **Bare-string hostnames are not detected.** Config values like + `cookie_domain = "test-publisher.com"` are out of scope. +- **`REFERENCE_HOSTS` are allowed in every scanned file, including + production source.** This is intentional. A production `.rs` + change that introduces `let x = "https://github.com/...";` will + pass the linter. The alternative — restricting reference hosts + to comment-only contexts (`///` in `.rs`, `#` in `.toml`, + `` in `.md`) — would require a comment-aware tokenizer + per language and was rejected as over-engineering for a small + risk surface. Code review catches stray reference URLs that + matter; the linter's purpose is preventing test-pollution and + unvetted _integration_ endpoints, not policing every documentation + link. 
If a real incident shows production code routinely embedding + reference URLs as runtime values, revisit with a per-context + policy. +- **Non-UTF-8 filenames** are skipped in full-repo / explicit-path + working-tree reads with a stderr warning. `gix` preserves diff paths + as `BString` internally, but v1 intentionally avoids platform-specific + lossless path reconstruction from arbitrary bytes. +- **Back-to-back protocol-relative URLs without a separator** + (`//a.com//b.com`) miss the second host. No real-world occurrence in + this repo. +- **PR #669 hard prerequisite.** This work requires a base that already + contains #669's CLI crate and host-target CI lane. The implementation + may either wait for #669 to merge to `main` or stack on PR #669's + branch; if #669 stalls without a stackable branch, this design needs + revisiting (alternative: ship as a standalone `trusted-server-lint` + crate). +- **New top-level dependency: `gix`.** Pulls in ~15 sub-crates + (gix-diff, gix-revision, gix-index, gix-config, etc.). Adds + meaningful compile time to the host-target CLI build. Mitigation: + use `default-features = false` and enable only the needed features + (`blob-diff`, `revision`, `index`, `config`). Acceptable because the + alternative (shelling to `git`) was rejected as a hard requirement. + +## Stage 1 Doc Cleanup Plan + +Bringing `.md` into scope means the current docs have many +non-allowlisted hosts that need triage. The full-repo audit +(`ts dev lint domains` with no args) is diagnostic-only in Stage 1 +precisely so this cleanup can happen incrementally — but it is a +**committed workstream**, not "incidental noise we'll get to." 
+ +### Verified disallowed hosts in current `.md` files + +A grep against the current `docs/` and root-level Markdown surfaces +these example categories (representative, not exhaustive — the +implementation runs the full audit and produces the complete list): + +| Host | Category | Resolution | +| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- | +| `aps.amazon.com` | Real Amazon doc/product page | Add to `REFERENCE_HOSTS` if linked repeatedly, otherwise suppress per-line | +| `api.lockr.io` | Legitimate lockr integration endpoint | Add to integration `EXACT_HOSTS` (lockr) — verify it is actually proxied | +| `krk.kargo.com` | Kargo bidder host | Verify if proxied; add to integration list OR rewrite illustrative usage to `.example` | +| `sync.ssp.com`, `ec.publisher.com`, `tracker.com`, `advertiser.com`, `cdn.com`, `short.link`, `redirect1.com`, `redirect2.com`, `final.com`, `new-server.com`, `publisher.com`, `partner.com`, `web.prebidwrapper.com`, `prebid-server.com`, `your-server.com` | Illustrative placeholders in `docs/guide/creative-processing.md`, `docs/guide/first-party-proxy.md`, etc. | **Rewrite to RFC 2606 reserved hosts** (`tracker.example.com`, `advertiser.example.com`, `cdn.example.com`, `short.example`, etc.) 
| +| `formally-vital-lion.edgecompute.app` | One-off Fastly Compute test URL | Suppress per-line where it appears | +| `getpurpose.ai` | Test site in PR #669 reviewer instructions | Rewrite to `example.com` or suppress | +| `192.168.1.1` | RFC 1918 private IP example | Rewrite to a reserved host or `127.0.0.1` | + +### Cleanup policy + +1. **Strongly prefer rewriting illustrative example hosts to RFC 2606 + reserved names** (`*.example.com`, `*.example.net`, `*.example.org`, + `*.example`, `*.test`, `*.invalid`, `*.localhost`). This is the + default for placeholder URLs in tutorials, prose, and code + snippets. It is also the answer to the + "multi-line fenced-code-block suppression" pain point — the linter + has no block-level suppression mechanism (intentional: keeps the + tool simple), so multi-line examples that would otherwise need + one marker per line should be rewritten to reserved hosts instead. +2. **Add legitimate integration / vendor hosts to the appropriate + allowlist** when they appear in multiple files and have a real + integration backing them (e.g., `api.lockr.io`). +3. **Suppress per-line only for true one-offs** — security write-ups + referencing a CVE-relevant domain, attacker placeholders in + security tests (`evil.com`), single citations of an external + resource. Suppressing 20 illustrative occurrences of a placeholder + is a smell — rewrite to reserved instead. + +### Cleanup execution + +The cleanup PR(s) land **after** the linter ships (Stage 1) but +**before** Stage 2 (CI gate on changed lines), so contributors get +the protection of the local hook immediately while the doc cleanup +happens in parallel without blocking the main release. + +Suggested execution order: + +1. Land the linter and pre-commit hook (this design). +2. Produce a frequency-ordered host report. The human output + includes file paths and summary lines, so naive `sort | uniq -c` + over the human format counts _lines_, not hosts. 
Use the JSON + output and a small parser: + + ```sh + ts dev lint domains --format json \ + | jq -r '.violations[].host' \ + | sort | uniq -c | sort -rn + ``` + + This gives ` ` lines sorted by frequency, which + feeds the triage in step 3. + + **Requires `jq`** (Homebrew: `brew install jq`; most CI runners + already have it). If `jq` is not available locally, a + no-extra-tool alternative until a built-in `--summary hosts` + mode is added (deferred Open Question): + + ```sh + ts dev lint domains --format json \ + | python3 -c 'import json,sys,collections; d=json.load(sys.stdin); \ + c=collections.Counter(v["host"] for v in d["violations"]); \ + [print(f"{n:6d} {h}") for h,n in c.most_common()]' + ``` + +3. Triage the top ~80% of violations into the three categories above. +4. Submit cleanup PRs grouped by file (so each PR is reviewable): + `docs/guide/creative-processing.md`, + `docs/guide/first-party-proxy.md`, + `docs/guide/api-reference.md`, + etc. +5. Each cleanup PR runs the linter's `--changed-vs main` mode as a + self-check. +6. Once the audit is clean (or down to a small known list), enable + Stage 2 CI. + +## Migration to CI + +**Stage 1 (this design + the cleanup workstream above):** Pre-commit +hook calling `ts dev lint domains --staged`. Prevents _new_ +violations. Full-repo audit available but diagnostic-only; the doc +cleanup runs in parallel. + +**Stage 2:** GitHub Actions workflow runs +`ts dev lint domains --changed-vs $GITHUB_BASE_REF` on every PR. Same +delta-only enforcement, unbypassable. Requirements: + +- `actions/checkout@v4` with `fetch-depth: 0` (or explicit fetch of + `$GITHUB_BASE_REF`). +- Reuse the host-target CI lane introduced by PR #669 (since `ts` + binary is host-target only). + +**Stage 3 (optional, deferred):** Either (a) clean existing violations +and add full-repo audit as a CI gate, or (b) snapshot a baseline file +and run full-repo audit with baseline subtraction. Choice deferred +until Stages 1 and 2 are stable. 
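
For Stage 3 option (b), baseline subtraction could look roughly like the sketch below. Everything here is an assumption made for illustration: the `Violation` struct, its field names, and the choice to key on `(path, host)` rather than line number are not part of this design, and the baseline file format is left open.

```rust
use std::collections::HashSet;

/// Hypothetical violation record; field names are illustrative and are not
/// taken from the linter's actual output types.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Violation {
    path: String,
    host: String,
}

/// Return the violations in `current` that are not present in `baseline`.
/// Assumption: entries are matched on (path, host) and line numbers are
/// ignored, so unrelated edits that shift lines do not resurface old entries.
fn subtract_baseline(current: &[Violation], baseline: &[Violation]) -> Vec<Violation> {
    let known: HashSet<&Violation> = baseline.iter().collect();
    current
        .iter()
        .filter(|v| !known.contains(v))
        .cloned()
        .collect()
}
```

One consequence of keying on `(path, host)` is that a second occurrence of an already-baselined host in the same file would be masked; a count-aware baseline would close that gap at the cost of a more complex snapshot format.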
+ +## Resolved Decisions + +Settled choices that the implementer should not re-litigate. Kept +here as historical context with the rationale, so future readers can +see _why_ each decision went the way it did rather than re-opening +the question. + +1. **Subcommand naming and ownership.** `ts dev lint domains` and + `ts dev install-hooks`. Both `lint` and `install-hooks` are + developer-workflow commands and belong under `dev`, not on the + operator-facing top level (`config`, `auth`, `audit`, + `provision`). This PR owns the required refactor of the existing + PR #669 `ts dev` leaf into a subcommand group, with `ts dev serve` + for the existing behavior. The earlier review's suggestion to keep + `ts lint domains` top-level was explicitly rejected by the spec + owner — `dev` parent is the chosen shape. +2. **`cdn.prebid.org` on the integration allowlist** (rather than + rewriting the `prebid.rs` test code to `.example`). The tests + verify rewriting of real-world Prebid CDN URLs; converting them + to reserved hosts would weaken the test's intent. +3. **Stage 1 ships without a full cleanup of existing violations.** + Existing violations are cleaned incrementally as files are + touched, with the dedicated workstream tracked in + [Stage 1 Doc Cleanup Plan](#stage-1-doc-cleanup-plan). The + linter ships now; the doc audit happens in parallel. +4. **Suppression marker syntax: `allow-domain: `, + comment-anchored, host-validated.** Alternatives considered: + bare `allow-domain` without a host (rejected — bypassable via + URL paths), `allowed-domain:` (rejected — verbose without + benefit), block-level suppression markers (rejected — adds + state tracking and complexity; rewriting to reserved hosts + covers the multi-line case). +5. **`ts dev install-hooks` clobber-detection signature.** The + `# ts-install-hooks: managed` marker on a known line is the + detection heuristic. Unmanaged hooks are refused without + `--force`. 
A `--append-to-existing` mode is left for later if + demand surfaces. +6. **`--force-scan` escape hatch for explicit paths is NOT in + v1.** Explicit paths honour the extension filter (skipped with + stderr warning). Adding `--force-scan` is deferred until a real + workflow needs it. + +## Resolved by the Phase 2 spike + +1. **`gix` API entry points — RESOLVED.** The Phase 2 feasibility + spike (`crates/trusted-server-cli/tests/spike_gix_*.rs`) pinned + the following gix 0.83 entry points: - **Repo / objects:** `gix::open`, `gix::init`, + `Repository::write_blob`, `Repository::find_object` (→ + `Object { data: Vec, .. }`), `Repository::find_tree`, + `Repository::find_commit`, `Repository::head_commit`, + `Repository::head_id`. - **Tree construction (test fixtures):** `Repository::empty_tree`, + `Repository::edit_tree` + `Editor::upsert` + `Editor::write`, + `Repository::commit_as` (with `Signature::to_ref` + + `gix::date::parse::TimeBuf`). - **Tree traversal:** `tree.traverse().breadthfirst.files()` → + `Vec`. + Filter to blobs with `EntryMode::is_blob()`. - **Index:** `Repository::index()` → entries via + `state.entries()`, path via `entry.path(&state)`, blob id via + `entry.id`, file filter via + `entry.mode.contains(gix::index::entry::Mode::FILE)`. Building + a fixture index: `gix::index::State::new` + + `dangerously_push_entry` + `sort_entries` + + `gix::index::File::from_state` + `File::write`. - **merge-base / refs:** `Repository::merge_base(base, head)`, + `Repository::find_reference` + `Reference::peel_to_id` + (`peel_to_id_in_place` is deprecated), `Repository::reference` + for branch creation, `Repository::edit_reference` with a + `Target::Symbolic` `RefEdit` for moving HEAD. - **Blob line diff:** `gix::diff::blob::{Algorithm, Diff, +InternedInput}` — `Diff::compute(Algorithm::Myers, &input)`, + then `diff.hunks()`; each `Hunk.after` is the new-side token + (line) range. 
- **Tree-vs-tree diff with rename detection.** Both the staged + and `--changed-vs` collectors call `old_tree.changes()` → + `Platform::for_each_to_obtain_tree(&new_tree, ...)` with + `track_rewrites(Some(Rewrites { copies: None, percentage: +Some(0.5), limit: 1000, track_empty: false }))`. The callback + iterates `Change::{Addition, Modification, Rewrite, +Deletion}` — pure renames (same blob, new path) yield no + added lines; rename + edit diffs the matched old blob vs the + new blob. + + **Resolution note:** an earlier revision of this spec rejected + `Platform`/`for_each_to_obtain_tree` and prescribed a manual + map-walk. That approach silently broke renames: a renamed file + hit `(None, Some(new_id))` and was diffed against an empty + blob, reporting every line of the renamed file as added + (including pre-existing violations the author never touched). + The current spec uses the `Platform` API so rename detection + is correct by construction. + + For staged mode, the index is first materialised as a tree + via `Repository::edit_tree` → per-entry + `Editor::upsert(path, EntryKind::Blob, entry.id)` → + `Editor::write()`, then the same tree-vs-tree path serves + both modes. + + - **gix-config:** + `File::from_path_no_includes(path, Source::Local)`, + `File::set_raw_value` (dotted `AsKey` form — avoids the + `File<'event>` invariance that bites `set_raw_value_by`), + `File::raw_value`, `File::to_bstring`. + +2. **`gix` / `gix-config` version pins — RESOLVED.** `gix = 0.83`, + `gix-config = 0.56`, same gitoxide release family. See + [Cargo dependencies](#cargo-dependencies) for the full feature + set and rationale. + +## Open Questions + +Genuine unresolved items, deferred beyond v1. + +1. **Stable-commit audit mode (`--at `).** Full-repo audit + currently reads working-tree content (current local edits + included). If a release-gate use case appears that needs an + "at a tagged commit" view, add an `--at ` mode that scans + blob content from that revision's tree.
Deferred until real + demand surfaces; not part of v1.