
Add cold-outbound skill — SDR lead gen with Browserbase search and deep research#68

Open
jay-sahnan wants to merge 1 commit into main from cold-outbound

Conversation


@jay-sahnan commented Apr 3, 2026

Summary

  • New skill for automated outbound lead generation: discovers target companies via the Browserbase Search API, deeply researches each one using a Plan→Research→Synthesize pattern, scores ICP fit, finds contacts, and generates personalized cold emails compiled into a scored CSV
  • Includes 5 bundled scripts (search, smart fetch, CSV compilation, URL dedup, batch writer) and 3 reference docs (research patterns, workflow templates, email templates)
  • Supports three depth modes (quick/deep/deeper) for balancing scale vs intelligence

Note

Medium Risk
Adds a brand-new outbound automation skill plus helper scripts that fetch external web content and write/clean up files under /tmp. Failures could affect lead-output quality and local file hygiene, but the changes are isolated to the new skills/cold-outbound directory.

Overview
Introduces a new cold-outbound skill that defines an 8-step pipeline for outbound lead generation: build/confirm a persistent sender company profile, discover targets via Browserbase Search, enrich and ICP-score companies using a Plan→Research→Synthesize workflow, optionally find contacts, then generate personalized emails and compile a final scored CSV.

Adds bundled automation scripts to support the pipeline: bb_search.ts (Browserbase Search with rate-limit retry), bb_smart_fetch.ts (Fetch API fast-path with Playwright/Browserbase session fallback and optional raw-mode for sitemap.xml/llms.txt), list_discovery_urls.py (dedupe discovered URLs by domain), write_batch.py (stdin→JSON batch writer), and compile_csv.py (merge/dedupe enrichment batches, produce CSV, emit summary, and optionally clean up batch files). Includes reference docs and example profiles to standardize subagent prompts, schemas, and email templates.
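The stdin→JSON batch-writer idea can be sketched in a few lines. This is a hypothetical illustration, not the actual write_batch.py from the PR; the file-name pattern and the assumption that subagents pipe a JSON array of company records on stdin are inferred from the description above.

```python
import json
import os


def write_batch(stream, tmp_dir: str, batch_id: int) -> str:
    """Read a JSON array of company records from a stream and persist it
    as a numbered enrichment batch file under tmp_dir (sketch only)."""
    records = json.load(stream)  # subagent pipes its JSON output in
    path = os.path.join(tmp_dir, f"cold_enrichment_batch_{batch_id}.json")
    with open(path, "w") as f:
        json.dump(records, f, indent=2)
    return path
```

Writing each subagent's output to its own batch file keeps parallel enrichment runs from clobbering one another; the compile step then merges and dedupes the batches.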

Reviewed by Cursor Bugbot for commit 985032a. Bugbot is set up for automated code reviews on this repo. Configure here.


@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.



def load_batches(tmp_dir: str) -> list[dict]:
"""Load all enrichment batch files and extract company records."""
pattern = os.path.join(tmp_dir, "cold_enrichment_batch_*.json")
files = sorted(glob.glob(pattern))


Final CSV misses email data from Step 8

High Severity

load_batches only globs for cold_enrichment_batch_*.json, but Step 8 email-generation subagents write their output (with emails and contact columns) to cold_final_batch_*.json. When compile_csv.py is re-run in Step 8 to produce the final CSV, it never reads the final batch files — so the compiled CSV silently omits all personalized emails and contact info. Notably the cleanup code on line 163 does reference cold_final_batch_*.json for deletion, confirming the file pattern exists but is never loaded.
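One possible fix, sketched below, is to glob both patterns so a Step 8 re-run picks up the final batches with emails and contacts. This is a hypothetical sketch, not the actual compile_csv.py code; it assumes each batch file holds a JSON list of company records.

```python
import glob
import json
import os


def load_batches(tmp_dir: str) -> list[dict]:
    """Load both enrichment and final batch files (sketch of the fix).

    The original only globbed cold_enrichment_batch_*.json; adding
    cold_final_batch_*.json lets the final CSV include the Step 8
    email/contact columns instead of silently dropping them.
    """
    records: list[dict] = []
    for pattern in ("cold_enrichment_batch_*.json", "cold_final_batch_*.json"):
        for path in sorted(glob.glob(os.path.join(tmp_dir, pattern))):
            with open(path) as f:
                data = json.load(f)
            records.extend(data if isinstance(data, list) else [data])
    return records
```

The existing merge/dedupe logic downstream would still collapse duplicate companies, with the final-batch records supplying the extra columns.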



}

// Structured mode: parse HTML into company data
console.error(`[smart_fetch] Fetch API succeeded, parsing HTML...`);


Log message hardcodes wrong fetch method source

Low Severity

The log message on line 221 hardcodes "Fetch API succeeded" but this code path also executes when content came from the browser fallback. The fetchMethod variable already tracks the actual source. This produces misleading debug output when diagnosing fetch issues.
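A minimal sketch of the fix, deriving the log line from the tracked variable instead of hardcoding it. The `FetchMethod` type and function name are hypothetical; only the `fetchMethod` variable and the log text come from the review above.

```typescript
// Hypothetical sketch: build the success log from the tracked fetchMethod
// so browser-fallback fetches are not mislabeled as "Fetch API".
type FetchMethod = "fetch-api" | "browser";

function fetchSuccessLog(fetchMethod: FetchMethod): string {
  const source = fetchMethod === "browser" ? "Browser fallback" : "Fetch API";
  return `[smart_fetch] ${source} succeeded, parsing HTML...`;
}
```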



console.error("[smart_fetch] Falling back to browser...");
content = await fetchWithBrowser(url);
fetchMethod = "browser";
}


Raw mode bypassed by HTML content quality heuristics

Medium Severity

tryFetchApi applies HTML-focused quality heuristics (needsBrowserFallback) unconditionally, but the raw flag isn't checked until after the fallback decision in main(). For --raw fetches of sitemap.xml or llms.txt, short content (<500 chars) or low text density in XML triggers a browser fallback. The browser then returns rendered HTML via page.content() instead of raw XML/text, corrupting the output that downstream sitemap parsing depends on.
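One way to guard this, sketched below: check the raw flag before applying the HTML quality heuristics at all. The types and function names here are hypothetical stand-ins; only `needsBrowserFallback` and the raw-mode behavior are taken from the review.

```typescript
// Hypothetical sketch: raw-mode fetches (sitemap.xml, llms.txt) should never
// trigger the browser fallback, since page.content() would return rendered
// HTML instead of the raw XML/text that downstream parsing expects.
interface FetchResult {
  content: string;
  raw: boolean;
}

function needsBrowserFallback(content: string): boolean {
  // Simplified stand-in for the real HTML quality heuristics
  // (short content, low text density, etc.).
  return content.length < 500;
}

function shouldFallBackToBrowser(result: FetchResult): boolean {
  if (result.raw) return false; // raw mode: trust the Fetch API response as-is
  return needsBrowserFallback(result.content);
}
```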



@shrey150 (Contributor) left a comment


Structurally need to fix:

  • Remove Playwright dep & use Stagehand directly
  • Use bb instead of custom API wrappers
  • Confirm that you're following best skill standards
  • Use .mjs instead of .py files for custom scripts (since bb is Node-based)

@@ -0,0 +1,96 @@
// Browserbase Search API wrapper for cold outbound lead discovery.
Contributor


Can we use bb search instead of needing this script?

@@ -0,0 +1,42 @@
#!/usr/bin/env python3
Contributor


Overall, let's not mix and match Python scripts and TS/JS. I'd prefer everything be in TypeScript via .mjs files; see /cookie-sync for an example of how to do this.

"private": true,
"dependencies": {
"@browserbasehq/sdk": "^2.8.0",
"playwright": "^1.52.0",
Contributor


Can we avoid Playwright unless it's needed? Please use Stagehand instead. Note that because of this you currently need a double dependency (Browserbase SDK + Playwright) instead of just Stagehand.

@@ -0,0 +1,169 @@
#!/usr/bin/env python3
Contributor


Instead of compiling a CSV from disparate JSON files, it might be better to provide example individual company research in mini Markdown files, then instructing the LLM to compile all Markdown files into a final CSV by generating code on the fly, instead of prescribing a solution via Python this way.

TL;DR: skills should remain readable (i.e. prefer Markdown as the intermediate data format) and not overly prescriptive (i.e. avoid providing a set script to run). Open to opinion on the second point based on your benchmarking of what works best. My guess is the model is smart enough to generate its own approach given a few pointers; perhaps try auto-researching to eval what works best?
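The suggested approach could look something like this: each company gets a mini Markdown research note, and the agent generates throwaway roll-up code on the fly. The "**Key:** value" field convention and the function names below are hypothetical, chosen only to make the sketch concrete.

```python
import csv
import glob
import io
import os


def parse_company_md(text: str) -> dict[str, str]:
    """Pull '**Key:** value' fields out of a mini Markdown research note."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("**") and ":**" in line:
            key, _, value = line.partition(":**")
            fields[key.strip("* ").lower()] = value.strip()
    return fields


def compile_csv_from_md(md_dir: str) -> str:
    """Roll every per-company Markdown note in md_dir up into one CSV string."""
    rows = [parse_company_md(open(p).read())
            for p in sorted(glob.glob(os.path.join(md_dir, "*.md")))]
    columns = sorted({k for row in rows for k in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The intermediate notes stay human-readable and diffable, and the compile step is simple enough that the model can regenerate it per run rather than depending on a bundled script.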

@@ -0,0 +1,236 @@
// Smart fetch with Browserbase Fetch API fast-path and Stagehand browser fallback.
Contributor


Again, let's use bb fetch here?

@@ -0,0 +1,287 @@
---
Contributor


Can you verify in PR description that you've done the following:

  1. Followed the Agent Skills standard as best as possible (https://agentskills.io/llms-full.txt)
  2. Compared against popular skills in https://github.com/anthropics/skills
  3. Observed 5-10 runs and noticed for any steps for which the agent thrashes, distilled learnings from that into the skill, and ran again to verify it doesn't happen anymore
  4. Checked that the skill performs well on both Claude Code & Codex, ideally also benchmarking on OpenCode.

^ may be best to automate this in CI as mentioned!
