Create a course information scraper

## 🧠 Context

`src/data/courses.json` is currently hand-crafted with only 8 courses. This ticket builds a script that scrapes the Carleton undergraduate calendar to produce a full course list in the same format.

The scraper's job is data extraction only — it does not parse prerequisites into the AST. Each course it outputs should have `prereq: null` and `prereqRaw` set to the raw prerequisite text copied verbatim from the calendar. The prereq parser (separate ticket) handles converting those strings into structured AST nodes later.

The output schema is the `Course` type defined in `src/types/course.ts`. Every field must be populated:
- `code` — e.g. `"COMP 1405"`
- `title` — full course title
- `credits` — as a number (Carleton uses 0.5 for one-term courses)
- `description` — course description text
- `prereq: null` — always null, filled in by the parser later
- `prereqRaw` — raw prerequisite string from the calendar, or `null` if no prerequisite is listed
- `precludes` — array of course codes listed as preclusions, or `[]`
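
For orientation, the field list above can be written out as a type. This is only a sketch of the assumed shape: the authoritative definition is the `Course` type in `src/types/course.ts`, and the name `ScrapedCourse` and the example values below are hypothetical.

```typescript
// Assumed shape only; the real definition lives in src/types/course.ts.
interface ScrapedCourse {
  code: string; // e.g. "COMP 1405"
  title: string;
  credits: number; // 0.5 for a one-term course
  description: string;
  prereq: null; // the scraper never fills this in; the parser does later
  prereqRaw: string | null; // verbatim calendar text, or null if none is listed
  precludes: string[]; // course codes, or [] if none are listed
}

// Illustrative placeholder values, not copied from the calendar.
const exampleCourse: ScrapedCourse = {
  code: "COMP 1405",
  title: "Example Course Title",
  credits: 0.5,
  description: "Example description text.",
  prereq: null,
  prereqRaw: null,
  precludes: [],
};
```

Typing `prereq` as the literal `null` in the scraper's own output type would let the compiler reject any attempt to populate it early, if the real `Course` type allows that narrowing.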

Start with COMP courses. SYSC, MATH, and STAT are stretch goals that can be covered later; they are worth adding because CS students commonly take courses from those departments, and those courses appear in some COMP prerequisite trees.

---

## 🛠️ Implementation Plan

1. Create a `scripts/` folder at the project root. The scraper lives at `scripts/scrape-courses.ts` and is run with:
   ```sh
   pnpm run scrape:courses
   ```
   Add this script to `package.json`.

2. Install `cheerio` for HTML parsing. Use Node's built-in `fetch` (available in Node 22) for HTTP requests — do not add axios or node-fetch. This project has security settings in `pnpm-workspace.yaml` — if the cheerio install is blocked by a policy error, flag it to Jacc rather than working around it.

3. Before writing any code, inspect the Carleton undergraduate calendar course listing pages in your browser. Understand the HTML structure — how courses, titles, credits, descriptions, prerequisites, and preclusions are marked up. Save a real HTML page from the calendar as a fixture file under `scripts/fixtures/` to use in tests.

4. Write the scraper to fetch the course listing page(s), parse the HTML with Cheerio, and extract the fields above for each course entry.

5. Write tests in `scripts/scrape-courses.test.ts` that run against the saved HTML fixture — not the live network. Verify that a known course (e.g. COMP 3004) is extracted correctly with the right code, title, credits, and prereqRaw string.

6. The scraper should write its output to `scripts/output/courses-scraped.json`, not directly to `src/data/courses.json`. The merge with the existing hand-crafted entries needs a human review step.
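
The field extraction in step 4 might be split into small pure functions that take text already pulled out of the HTML, which keeps them easy to test without Cheerio. The sketch below rests on assumptions that must be checked against the real markup in step 3: that a course heading looks like `COMP 1405 [0.5 credit]`, that each labelled sentence sits on its own line, and that the labels are `Prerequisite(s):` and `Precludes additional credit for`. All function names are hypothetical.

```typescript
// Assumed course-code pattern: three or four capital letters, a space, four digits.
const COURSE_CODE = /\b[A-Z]{3,4} \d{4}\b/g;

interface HeadingFields {
  code: string;
  credits: number;
}

// Parses an assumed heading format such as "COMP 1405 [0.5 credit]".
function parseHeading(heading: string): HeadingFields | null {
  const m = /^\s*([A-Z]{3,4} \d{4})\s*\[(\d+(?:\.\d+)?) credits?\]/.exec(heading);
  if (!m) return null;
  return { code: m[1], credits: Number(m[2]) };
}

// Returns the text after the assumed "Prerequisite(s):" label, up to the end of
// that line, with only surrounding whitespace trimmed. Returns null if absent.
function extractPrereqRaw(body: string): string | null {
  const m = /Prerequisite\(s\):\s*(.+)/.exec(body);
  return m ? m[1].trim() : null;
}

// Collects every course code on the assumed "Precludes additional credit for" line.
function extractPrecludes(body: string): string[] {
  const m = /Precludes additional credit for\s*(.+)/.exec(body);
  if (!m) return [];
  return m[1].match(COURSE_CODE) || [];
}
```

If the calendar wraps prerequisites across several lines or uses different labels, the regular expressions above would need adjusting; the fixture from step 3 is the place to confirm that.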

---

## ✅ Acceptance Criteria

- [x] `pnpm run scrape:courses` runs without errors and writes `scripts/output/courses-scraped.json`
- [x] Output is a valid JSON array where every entry conforms to the `Course` type
- [x] `prereq` is `null` on every entry
- [x] `prereqRaw` contains the raw prerequisite text from the calendar, or `null` if none
- [x] `precludes` is populated where the calendar lists preclusions, or `[]`
- [x] Tests run against a saved HTML fixture (no live network calls in tests)
- [x] COMP courses are fully covered
- [x] `pnpm typecheck` passes
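
Several of the criteria above can be checked mechanically against the parsed output file. The following is a minimal sketch, not part of the ticket's required deliverables; the function name `findProblems` is hypothetical, and the field checks mirror the field list in the Context section rather than the actual `Course` type.

```typescript
// Returns a list of human-readable problems; an empty list means the data passed.
function findProblems(data: unknown): string[] {
  if (!Array.isArray(data)) return ["output is not a JSON array"];
  const problems: string[] = [];
  data.forEach((entry: unknown, i: number) => {
    if (typeof entry !== "object" || entry === null) {
      problems.push(`entry ${i}: not an object`);
      return;
    }
    const c = entry as Record<string, unknown>;
    for (const key of ["code", "title", "description"]) {
      if (typeof c[key] !== "string") problems.push(`entry ${i}: ${key} must be a string`);
    }
    if (typeof c.credits !== "number") problems.push(`entry ${i}: credits must be a number`);
    // A missing prereq key is undefined, which also fails this check.
    if (c.prereq !== null) problems.push(`entry ${i}: prereq must be null`);
    if (c.prereqRaw !== null && typeof c.prereqRaw !== "string") {
      problems.push(`entry ${i}: prereqRaw must be a string or null`);
    }
    const precludes = c.precludes;
    if (!Array.isArray(precludes) || !precludes.every((p: unknown) => typeof p === "string")) {
      problems.push(`entry ${i}: precludes must be an array of strings`);
    }
  });
  return problems;
}
```

A check like this could run on the result of `JSON.parse` over `scripts/output/courses-scraped.json` before the human review step.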
