🧠 Context
src/data/courses.json is currently hand-crafted with only 8 courses. This ticket builds a script that scrapes the Carleton undergraduate calendar to produce a full course list in the same format.
The scraper's job is data extraction only — it does not parse prerequisites into the AST. Each course it outputs should have prereq: null and prereqRaw set to the raw prerequisite text copied verbatim from the calendar. The prereq parser (separate ticket) handles converting those strings into structured AST nodes later.
The output schema is the Course type defined in src/types/course.ts. Every field must be populated:
- code: e.g. "COMP 1405"
- title: full course title
- credits: as a number (Carleton uses 0.5 for one-term courses)
- description: course description text
- prereq: always null, filled in by the parser later
- prereqRaw: raw prerequisite string from the calendar, or null if no prerequisite is listed
- precludes: array of course codes listed as preclusions, or []
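As a rough sketch, the field list above could correspond to a TypeScript shape like the one below. The real definition is the one in src/types/course.ts and takes precedence. The PrereqNode placeholder and the title and description values in exampleCourse are assumptions made up for illustration, not taken from the project or the calendar.

```typescript
// Hypothetical placeholder for the prerequisite AST produced by the
// separate prereq parser ticket; its real shape is not defined here.
type PrereqNode = unknown;

// Sketch of the Course shape implied by the field list above.
// The authoritative type is the one in src/types/course.ts.
interface Course {
  code: string;              // e.g. "COMP 1405"
  title: string;             // full course title
  credits: number;           // e.g. 0.5 for a one-term course
  description: string;       // course description text
  prereq: PrereqNode | null; // the scraper always writes null here
  prereqRaw: string | null;  // raw calendar text, or null if none is listed
  precludes: string[];       // course codes, or [] if none are listed
}

// Example entry as the scraper would emit it (placeholder text values).
const exampleCourse: Course = {
  code: "COMP 1405",
  title: "Example title (placeholder, not copied from the calendar)",
  credits: 0.5,
  description: "Example description (placeholder).",
  prereq: null,
  prereqRaw: null,
  precludes: [],
};
```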
Start with COMP courses. SYSC, MATH, and STAT are stretch goals that can be covered later; they are worth adding because CS students commonly take courses from those departments and they appear in some COMP prerequisite trees.
🛠️ Implementation Plan
- Create a scripts/ folder at the project root. The scraper lives at scripts/scrape-courses.ts and is run with pnpm run scrape:courses. Add this script to package.json.
- Install cheerio for HTML parsing. Use Node's built-in fetch (available in Node 22) for HTTP requests; do not add axios or node-fetch. This project has security settings in pnpm-workspace.yaml: if the cheerio install is blocked by a policy error, flag it to Jacc rather than working around it.
- Before writing any code, inspect the Carleton undergraduate calendar course listing pages in your browser. Understand the HTML structure: how courses, titles, credits, descriptions, prerequisites, and preclusions are marked up. Save a real HTML page from the calendar as a fixture file under scripts/fixtures/ to use in tests.
- Write the scraper to fetch the course listing page(s), parse the HTML with Cheerio, and extract the fields above for each course entry.
- Write tests in scripts/scrape-courses.test.ts that run against the saved HTML fixture, not the live network. Verify that a known course (e.g. COMP 3004) is extracted correctly with the right code, title, credits, and prereqRaw string.
- The scraper should write its output to scripts/output/courses-scraped.json, not directly to src/data/courses.json. The merge with the existing hand-crafted entries needs a human review step.
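Once Cheerio has pulled the text out of each course entry, the remaining work is turning those strings into the output fields. The sketch below covers only that step and is not the required implementation. RawFields, extractCourseCodes, parseCredits, and toCourse are hypothetical names, and the sketch assumes that course codes look like "COMP 1405" (four uppercase letters, whitespace, four digits) and that the credit text contains a decimal number. Both assumptions should be checked against the saved fixture.

```typescript
// Hypothetical intermediate shape: plain text pulled out of one course entry.
interface RawFields {
  code: string;
  title: string;
  creditsText: string;          // assumed to contain a number, e.g. "[0.5 credit]"
  description: string;
  prereqText: string | null;    // null when the entry lists no prerequisite
  precludesText: string | null; // null when the entry lists no preclusions
}

// Assumption: codes are four uppercase letters, whitespace, four digits.
// \s also matches a non-breaking space, in case the HTML uses one.
const COURSE_CODE = /\b[A-Z]{4}\s\d{4}\b/g;

// Collect unique course codes from free text, normalised to a plain space.
function extractCourseCodes(text: string): string[] {
  const matches = (text.match(COURSE_CODE) ?? []).map((code) =>
    code.replace(/\s/, " ")
  );
  return matches.filter((code, i) => matches.indexOf(code) === i);
}

// Read the first decimal number in the credit text as the credit value.
function parseCredits(text: string): number {
  const match = text.match(/\d+(?:\.\d+)?/);
  if (!match) throw new Error(`No credit value found in "${text}"`);
  return Number(match[0]);
}

// Build one output entry. prereq is always null; prereqRaw keeps the
// calendar text as-is apart from trimming surrounding whitespace.
function toCourse(fields: RawFields) {
  return {
    code: fields.code,
    title: fields.title,
    credits: parseCredits(fields.creditsText),
    description: fields.description,
    prereq: null,
    prereqRaw: fields.prereqText?.trim() || null,
    precludes: fields.precludesText
      ? extractCourseCodes(fields.precludesText)
      : [],
  };
}
```

Keeping these helpers free of network and Cheerio calls means the fixture-based tests can cover them directly with plain strings.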
✅ Acceptance Criteria
- pnpm run scrape:courses runs without errors and writes scripts/output/courses-scraped.json
- Every entry matches the Course type
- prereq is null on every entry
- prereqRaw contains the raw prerequisite text from the calendar, or null if none
- precludes is populated where the calendar lists preclusions, or []
- pnpm typecheck passes
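The per-entry criteria above can also be checked mechanically before the human review step. The following is a minimal sketch of such a check, not part of the ticket's required deliverables; findProblems is a hypothetical helper name, and the checks only mirror the field rules stated in this ticket.

```typescript
// Return a list of human-readable problems for one scraped entry.
// An empty list means the entry satisfies the per-entry criteria above.
function findProblems(entry: Record<string, unknown>): string[] {
  const problems: string[] = [];

  // Text fields must be present as strings.
  ["code", "title", "description"].forEach((key) => {
    if (typeof entry[key] !== "string") {
      problems.push(`${key} must be a string`);
    }
  });

  if (typeof entry["credits"] !== "number") {
    problems.push("credits must be a number");
  }

  // The scraper never fills in the AST.
  if (entry["prereq"] !== null) {
    problems.push("prereq must be null");
  }

  const prereqRaw = entry["prereqRaw"];
  if (prereqRaw !== null && typeof prereqRaw !== "string") {
    problems.push("prereqRaw must be a string or null");
  }

  const precludes = entry["precludes"];
  if (
    !Array.isArray(precludes) ||
    !precludes.every((code) => typeof code === "string")
  ) {
    problems.push("precludes must be an array of course codes");
  }

  return problems;
}
```

A test or a small post-scrape step could run this over every entry in the output file and fail if any entry returns a non-empty list.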