From 9b7531b9086731400a5b14d5a4c1ab594d1fddda Mon Sep 17 00:00:00 2001
From: ukimsanov
Date: Sun, 14 Jun 2026 00:39:59 -0700
Subject: [PATCH 1/2] fix(capture/lint/producer): pipeline robustness fixes from real-AI-test runs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Surgical bugfixes accumulated across a series of real-AI-test runs
(heygen.com, huly.io, heygen-showcase). Each fix targets a specific
observed defect; happy paths are untouched.

packages/cli/src/capture/

• assetCataloger.ts: surface three structural logo signals on every
  cataloged asset (inBanner / inHomeLink / matchesTitleBrand). The
  prior class-substring-only isLogo detector caught 0/32 SVGs on
  heygen.com and 0/19 on huly.io: modern React/Tailwind builds don't
  put "logo" or "brand" in any className. The new signals catch the
  universal "site header logo" pattern. Boolean merge semantics: any
  positive sample wins through context-merge + srcset dedup.

• tokenExtractor.ts: broaden inline-SVG isLogo via the same three
  structural signals (header/nav/role=banner ancestor, root-href
  anchor parent, document.title brand-segment match in aria-label).
  No change to the existing class-substring detector: it runs first,
  and the new heuristics only fire when it misses.

• assetDownloader.ts: content-hash SVG slugs. SVG filenames are now
  `svg-<8char-sha1>.svg` (or `logo-.svg` when isLogo flags fire),
  replacing the previous label-derived slugging that mis-attributed
  brand carousels. Verified by rasterizing real captured SVGs:
  heygen-logo.svg actually contained the Google wordmark,
  hubspot-logo.svg contained Trivago, huly-logo.svg contained "Kube",
  heygen-logo.svg → "oogo". Catalog → URL label inference
  (aria-label / nearest-heading / sectionClasses) is too drift-prone
  across partner-logo carousels; content-hash names are invariant by
  construction.

• contentExtractor.ts: SVG→PNG rasterization via sharp before sending
  to Gemini Vision.
  Previous path sent raw SVG markup as text and hit pure-hallucination
  output on wordmarks (VIVIENNE for HubSpot, "wrestling" for Workday).
  Vision models can read PNG pixels reliably; they cannot
  mental-render path commands. Adds polarity detection (white-glyph vs
  dark-glyph) so an SVG that flattens to a blank PNG against the wrong
  background gets inverted automatically before captioning.

• contentExtractor.ts: LOGO tag in asset-descriptions.md lines when
  the structural signals fire (independent of Gemini). The
  no-Gemini-key fallback still emits an ⚠ banner + the LOGO-tagged
  lines so agents can grep for logos via filename pattern even without
  Vision.

• index.ts: asset-descriptions.md header branches on Gemini-key
  presence with an explicit "Vision was OFF, descriptions are
  catalog-derived" warning + a fallback recipe ("open LOGO-tagged SVGs
  in a previewer before referencing"). Progress message also reports
  catalog-fallback mode.

• capture/assetCataloger.ts + capture/tokenExtractor.ts regex escape:
  `/^https?:\\/\\/[^/]+\\/?$/` inside the page.evaluate template
  literal. The original `/^https?:\/\/[^/]+\/?$/` was collapsing `\/`
  to `/` inside the template (because backslash before a non-escape
  char is consumed), producing a parse error on every capture. Capture
  against heygen.com and huly.io both 100% blocked on this until the
  escape was fixed.

packages/core/src/lint/utils.ts

• findRootTag masks , , and ranges before tag extraction. A literal