feat: Riksdag calendar API fallback (primary MCP → riksdagen.se web scrape)#2041

Merged
pethers merged 12 commits into main from copilot/implement-riksdag-calendar-api-fallback
Apr 27, 2026

Conversation

Copilot AI commented Apr 27, 2026

Adds a resilient calendar-fetching utility (scripts/fetch-calendar.ts) to keep week-ahead/month-ahead workflows supplied with Riksdag agenda/vote scheduling data when the primary MCP tool intermittently returns HTML instead of JSON.

Changes Made

Core Implementation

  • MCP JSON-RPC primary fetch with retry logic and immediate HTML-error short-circuit to a riksdagen.se HTML-scrape fallback
  • Normalizes output to a consistent CalendarEvent[] shape with optional persistence to data/calendar/{from}_{dateTo}.json (range filename — collapses to data/calendar/{from}.json for single-day ranges) and a provenance manifest snippet formatter
  • Case-insensitive, broadened HTML-error detection (HTML_PREFIX_RE) to handle uppercase <HTML>, <HEAD>, and other leading tags
  • Article parser restricted to <article class="calendar-item"> elements only to prevent false positives from non-calendar articles
  • normalizeOrgCode() preserves canonical committee casing (e.g. FiU, JuU, FöU) via trim-only normalization
  • CLI entry guard uses path.resolve(fileURLToPath(import.meta.url)) pattern consistent with the rest of the repo
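
A minimal sketch of the fallback chain described in the list above, assuming injectable primary and fallback fetchers. The names `fetchWithFallback`, `Deps`, and `looksLikeHtml` are hypothetical and only illustrate the control flow (retry on transient errors, short-circuit on HTML, `none` when both paths fail); the real `scripts/fetch-calendar.ts` may be structured differently.

```typescript
// Hypothetical shapes for illustration; the real module's types may differ.
type CalendarEvent = { dtstart: string; summary: string; source: "mcp" | "web-fallback" };
type FetchPath = "mcp" | "web-fallback" | "none";

interface Deps {
  fetchPrimary: () => Promise<string>; // raw response body from the primary endpoint
  fetchFallback: () => Promise<CalendarEvent[]>; // already-parsed fallback events
  maxRetries: number;
}

// Assumed detection rule: the body starts with a common HTML leading tag.
const looksLikeHtml = (text: string): boolean =>
  /^\s*<(?:!doctype|html|head|body)\b/i.test(text);

async function fetchWithFallback(
  deps: Deps,
): Promise<{ path: FetchPath; events: CalendarEvent[] }> {
  for (let attempt = 0; attempt <= deps.maxRetries; attempt++) {
    try {
      const body = await deps.fetchPrimary();
      if (looksLikeHtml(body)) break; // HTML error page: retrying will not help
      return { path: "mcp", events: JSON.parse(body) as CalendarEvent[] };
    } catch {
      // Transient failure (network error, malformed JSON): try the next attempt.
    }
  }
  try {
    return { path: "web-fallback", events: await deps.fetchFallback() };
  } catch {
    return { path: "none", events: [] };
  }
}
```

Injecting both fetchers keeps the chain testable without live network calls.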

HTML Attribute Robustness

  • All HTML extraction helpers (liRe, extractDatetime, extractDataAttr, extractSpanText, extractHeadingAndLinks) accept both " and ' attribute quoting styles
  • decodeHtmlEntities() (from scripts/html-utils.ts) applied to summary, org, and akt in both parseCalendarArticle and parseCalendarListItem

Safety & Testability

  • escapeRegex() helper prevents metacharacter injection in extractDataAttr and extractSpanText
  • persistCalendarJson() accepts an optional injectable outputDir parameter (defaults to data/calendar/) to avoid writing into the repo during tests

CLI Hygiene

  • All progress/status logs go to stderr (console.error / console.warn); stdout is reserved for the JSON payload via process.stdout.write, keeping the CLI output machine-readable for piping
  • parseCalendarArgs accepts --tom as a Swedish alias for --to (matches existing repo docs in analysis/methodologies/ai-driven-analysis-guide.md); --to wins when both are provided
  • Markdown manifest from formatManifestMarkdown is now written to stderr in main, never stdout
  • New typed CliArgsError class — invalid CLI arguments exit with code 2 (per module header & repo convention in scripts/statskontoret-fetch.ts); other fatal errors exit with code 1

Testing

Comprehensive Vitest suite (tests/fetch-calendar.test.ts) with 65 tests covering:

  • MCP primary path: JSON-RPC parsing, HTML-error detection, retry behavior
  • Web fallback: HTML scraping patterns (Pattern A <article>, Pattern B <li>), entity decoding, single/double quote attribute handling
  • persistCalendarJson: directory creation, JSON schema/shape validation, return value, range-based {from}_{dateTo}.json filename for multi-day ranges
  • CLI helpers: parseCalendarArgs (including --tom alias and typed CliArgsError), formatManifestMarkdown
  • Zero live network calls

All tests pass, TypeScript compiles cleanly, CodeQL reports zero alerts.

@github-actions github-actions Bot added the size-xs Extra small change (< 10 lines) label Apr 27, 2026
@github-actions

🏷️ Automatic Labeling Summary

This PR has been automatically labeled based on the files changed and PR metadata.

Applied Labels: size-xs

Label Categories

  • 🗳️ Content: news, dashboard, visualization, intelligence
  • 💻 Technology: html-css, javascript, workflow, security
  • 📊 Data: cia-data, riksdag-data, data-pipeline, schema
  • 🌍 I18n: i18n, translation, rtl
  • 🔒 ISMS: isms, iso-27001, nist-csf, cis-controls
  • 🏗️ Infrastructure: ci-cd, deployment, performance, monitoring
  • 🔄 Quality: testing, accessibility, documentation, refactor
  • 🤖 AI: agent, skill, agentic-workflow

For more information, see .github/labeler.yml.

@github-actions

🔍 Lighthouse Performance Audit

| Category | Score | Status |
| --- | --- | --- |
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

…verage

Implements scripts/fetch-calendar.ts with:
- Primary path: get_calendar_events via riksdag-regering MCP (JSON-RPC 2.0)
- HTML-error detection: isHtmlErrorResponse() identifies non-JSON MCP responses
- Web fallback: scrapes riksdagen.se/sv/kalendarium/ with regex-based HTML parser
- Two HTML patterns: article-per-event and list-item-per-event
- Exponential back-off retry; HTML errors trigger immediate fallback (no retry)
- CalendarFetchManifest records path, eventCount, errors, fetchedAt
- persistCalendarJson() writes data/calendar/{date}.json
- Injectable fetchFn + sleepFn for complete test isolation
- CLI: tsx scripts/fetch-calendar.ts --from YYYY-MM-DD --to YYYY-MM-DD [--persist]

Tests (52 passing):
- Primary MCP path + retry + HTML-error fallback
- Web HTML parser (article + list-item patterns)
- Retry exhaustion on both paths → path=none
- normalizeMcpCalendarEvent, parseCalendarArgs, formatManifestMarkdown
- CalendarMcpError kind discrimination

Coverage: 88% statements, 75% branches, 74% functions (all thresholds met)
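
For illustration, exponential back-off with an injectable sleep function can be sketched as follows. The names `backoffDelay` and `withRetry`, the 1-second base, and the 30-second ceiling are assumptions for this sketch, not values confirmed by the commit.

```typescript
type SleepFn = (ms: number) => Promise<void>;

// Delay for a zero-based attempt: base * 2^attempt, never above the cap.
function backoffDelay(attempt: number, baseMs = 1_000, capMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Runs `op` up to maxRetries + 1 times, sleeping between failed attempts.
async function withRetry<T>(
  op: () => Promise<T>,
  maxRetries: number,
  sleepFn: SleepFn,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxRetries) await sleepFn(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

Passing a no-op `sleepFn` in tests lets the retry path run instantly.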

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/e84d7514-c5ef-4034-90b4-000e256579bc

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
@github-actions github-actions Bot added testing Test coverage refactor Code refactoring size-xl Extra large change (> 1000 lines) labels Apr 27, 2026

Comment thread tests/fetch-calendar.test.ts Fixed
- Add delay cap (Math.min(..., 30_000)) to prevent unbounded retry delays
- Replace urlStr.includes('onrender.com') with URL.hostname comparison to fix
  js/incomplete-url-substring-sanitization CodeQL alert in tests
- Add test: normalizeMcpCalendarEvent omits dtend when absent from raw event
- 53 tests passing (added 1 new dtend-absent test)
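
As a sketch of the hostname fix mentioned above: a substring check accepts any URL that merely contains the host text, whereas parsing the URL and comparing `hostname` does not. The helper name `isHost` is hypothetical.

```typescript
// Returns true only when the URL's parsed hostname is the expected host
// or a subdomain of it. A substring check would also accept URLs that
// only mention the host in a path or query string.
function isHost(urlStr: string, expectedHost: string): boolean {
  try {
    const { hostname } = new URL(urlStr);
    return hostname === expectedHost || hostname.endsWith(`.${expectedHost}`);
  } catch {
    return false; // not a parseable absolute URL
  }
}
```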

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/e84d7514-c5ef-4034-90b4-000e256579bc

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement Riksdag calendar API fallback for resilience feat: Riksdag calendar API fallback (primary MCP → riksdagen.se web scrape) Apr 27, 2026
Copilot AI requested a review from pethers April 27, 2026 01:07
@pethers pethers marked this pull request as ready for review April 27, 2026 08:05
Copilot AI review requested due to automatic review settings April 27, 2026 08:05

Copilot AI left a comment

Pull request overview

Adds a resilient calendar-fetching utility to keep week-ahead/month-ahead workflows supplied with Riksdag agenda/vote scheduling data when the primary MCP tool intermittently returns HTML instead of JSON.

Changes:

  • Introduces scripts/fetch-calendar.ts implementing MCP JSON-RPC primary fetch with retry + immediate HTML-error short-circuit to a riksdagen.se HTML-scrape fallback.
  • Normalizes output to a consistent CalendarEvent[] shape, with optional persistence to data/calendar/{from}.json and a provenance manifest snippet formatter.
  • Adds a comprehensive Vitest suite (tests/fetch-calendar.test.ts) covering primary/fallback behaviors, parsing, retries, and CLI helpers with zero live network calls.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| scripts/fetch-calendar.ts | Implements the calendar fetch resilience chain (MCP → web scrape), normalization, persistence, and CLI utilities. |
| tests/fetch-calendar.test.ts | Adds unit coverage for MCP parsing/error handling, HTML scraping patterns, retry behavior, and CLI/manifest helpers. |

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +132 to +145
// HTML detection: any response whose first non-whitespace token is a tag.
const HTML_PREFIX_RE = /^\s*<!(?:DOCTYPE|doctype)|^\s*<html\b/;

// ---------------------------------------------------------------------------
// HTML detection
// ---------------------------------------------------------------------------

/**
 * Returns true when `text` looks like an HTML document rather than JSON.
 * Used to detect when the MCP endpoint returns an error page instead of JSON.
 */
export function isHtmlErrorResponse(text: string): boolean {
  return HTML_PREFIX_RE.test(text);
}

Copilot AI Apr 27, 2026

HTML_PREFIX_RE is case-sensitive and narrower than the comment suggests (it only matches <!DOCTYPE... or lowercase <html>). An HTML error page starting with uppercase <HTML> (or other leading tags like <head>) would not be detected and would instead fall through to JSON parsing/retry logic. Consider making the regex case-insensitive and/or broadening it to reliably detect HTML documents, or adjust the comment to match the actual detection behavior.
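
One way to broaden the check, sketched under the assumption that an explicit list of leading tags is acceptable. The tag list and the names `HTML_LEADING_TAG_RE` and `looksLikeHtml` are illustrative, not the merged implementation.

```typescript
// Case-insensitive; matches a doctype or one of a few common leading tags.
// The lookahead requires the tag name to end at whitespace, ">" or "/",
// so a hypothetical "<htmlx>" is not treated as "<html>".
const HTML_LEADING_TAG_RE =
  /^\s*<(?:!doctype\b|(?:html|head|body|meta|title)(?=[\s>/]))/i;

function looksLikeHtml(text: string): boolean {
  return HTML_LEADING_TAG_RE.test(text);
}
```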

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +378 to +385
// Extract <article> blocks (Pattern A).
const articleRe = /<article\b([^>]*)>([\s\S]*?)<\/article>/gi;
for (const articleMatch of html.matchAll(articleRe)) {
  const attrs = articleMatch[1] ?? '';
  const body = articleMatch[2] ?? '';
  const event = parseCalendarArticle(attrs, body);
  if (event) events.push(event);
}

Copilot AI Apr 27, 2026

parseRiksdagKalendariumHtml matches every <article> block in the page (/<article\b.../), but the documented Pattern A is specifically <article class="calendar-item" ...>. If the kalendarium page contains non-calendar <article> elements with a <time datetime=...>, this parser can produce false events. Tighten the regex to only match the calendar item articles (e.g., require a calendar-item class) to avoid accidental matches.
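
A possible shape for the tightened match, sketched as a two-pass filter: match every `<article>`, then keep only those whose `class` attribute contains `calendar-item` as a whole token. The helper names `hasClassToken` and `calendarArticleBodies` are hypothetical.

```typescript
// True when the attribute string has a class attribute (either quote style)
// containing `token` as a whole whitespace-separated token.
function hasClassToken(attrs: string, token: string): boolean {
  const m = attrs.match(/(?:^|\s)class\s*=\s*(["'])(.*?)\1/i);
  if (!m) return false;
  return (m[2] ?? "").split(/\s+/).includes(token);
}

// Returns the inner HTML of calendar-item articles only.
function calendarArticleBodies(html: string): string[] {
  const bodies: string[] = [];
  const articleRe = /<article\b([^>]*)>([\s\S]*?)<\/article>/gi;
  for (const match of html.matchAll(articleRe)) {
    if (hasClassToken(match[1] ?? "", "calendar-item")) {
      bodies.push(match[2] ?? "");
    }
  }
  return bodies;
}
```

Token comparison avoids accepting look-alike classes such as a hypothetical `calendar-item-old`.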

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +538 to +540
/** Normalize committee/organ codes (upper-case, trim). */
function normalizeOrgCode(raw: string): string {
return raw.replace(/\s+/g, ' ').trim().toUpperCase();

Copilot AI Apr 27, 2026

normalizeOrgCode() uppercases committee codes (FiU → FIU), which makes web-fallback events diverge from MCP (organ is typically FiU) and breaks lookups keyed by canonical committee codes elsewhere (e.g. scripts/data-transformers/constants/committee-names.ts uses FiU, JuU, FöU, etc.). Consider preserving the original casing (trim only) or normalizing to the canonical committee-code casing instead of forcing all-uppercase.

Suggested change
- /** Normalize committee/organ codes (upper-case, trim). */
- function normalizeOrgCode(raw: string): string {
-   return raw.replace(/\s+/g, ' ').trim().toUpperCase();
+ /** Normalize committee/organ codes by collapsing whitespace and trimming only. */
+ function normalizeOrgCode(raw: string): string {
+   return raw.replace(/\s+/g, ' ').trim();

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +827 to +829
process.argv[1] &&
(process.argv[1].endsWith('/fetch-calendar.ts') ||
process.argv[1].endsWith('/fetch-calendar.js'))

Copilot AI Apr 27, 2026

The CLI-entry guard is implemented via process.argv[1].endsWith('/fetch-calendar.ts'), which is not cross-platform (Windows paths use \) and differs from the repo’s established pattern (e.g. scripts/catalog-downloaded-data.ts:290 uses path.resolve(fileURLToPath(import.meta.url)) === path.resolve(process.argv[1] ?? '')). Aligning with the existing path.resolve(fileURLToPath(import.meta.url)) check will make the entry detection more robust and consistent.

Suggested change
-   process.argv[1] &&
-   (process.argv[1].endsWith('/fetch-calendar.ts') ||
-     process.argv[1].endsWith('/fetch-calendar.js'))
+   path.resolve(fileURLToPath(import.meta.url)) === path.resolve(process.argv[1] ?? '')

pethers commented Apr 27, 2026

@copilot apply changes based on the comments in this thread

- Broaden HTML detection to case-insensitive common HTML leading tags
- Restrict article parsing to calendar-item articles to avoid false events
- Preserve canonical organ casing for web fallback events
- Use path.resolve(fileURLToPath(import.meta.url)) CLI guard pattern
- Add tests for uppercase HTML, head fragments, non-calendar articles, and casing

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/db694a2e-6c35-47ed-bff5-c8c1cbeec028

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

- Use explicit HTML tag lookahead detection instead of word-boundary checks
- Filter article blocks with a two-pass calendar-item class-token check
- Support single-quoted class attributes in calendar article detection
- Add tests for self-closing meta fragments and single-quoted calendar-item articles

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/db694a2e-6c35-47ed-bff5-c8c1cbeec028

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Copilot AI commented Apr 27, 2026

@copilot apply changes based on the comments in this thread

Implemented the requested review-thread changes in 602df88: broadened HTML detection, restricted article parsing to calendar-item, preserved canonical organ casing, aligned the CLI guard with the repo pattern, and added targeted tests. Validation: npx vitest run tests/fetch-calendar.test.ts (58 passed), ESLint, and TypeScript type-check passed.

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +495 to +497
const re = new RegExp(`<span\\b[^>]*class="[^"]*${name}[^"]*"[^>]*>([\\s\\S]*?)<\\/span>`, 'i');
const m = html.match(re);
return m ? stripTags(m[1] ?? '').trim() || null : null;

Copilot AI Apr 27, 2026

extractSpanText() assumes class="..." and won’t match spans with single-quoted attributes (class='calendar-list__organ'). To make the fallback scraper robust, update the regex to accept both ' and " around the class attribute value.

Suggested change
-   const re = new RegExp(`<span\\b[^>]*class="[^"]*${name}[^"]*"[^>]*>([\\s\\S]*?)<\\/span>`, 'i');
-   const m = html.match(re);
-   return m ? stripTags(m[1] ?? '').trim() || null : null;
+   const re = new RegExp(`<span\\b[^>]*\\bclass\\s*=\\s*(["'])[^"']*${name}[^"']*\\1[^>]*>([\\s\\S]*?)<\\/span>`, 'i');
+   const m = html.match(re);
+   return m ? stripTags(m[2] ?? '').trim() || null : null;

Comment thread scripts/fetch-calendar.ts
Comment on lines +510 to +517
// Collect document reference links (/sv/dokument-och-lagar/… or /dokument/…).
const docRefs: string[] = [];
const hrefRe = /<a\b[^>]*\bhref="([^"]+)"[^>]*>/gi;
for (const m of html.matchAll(hrefRe)) {
const href = (m[1] ?? '').trim();
if (isRiksdagDocumentHref(href)) {
docRefs.push(href);
}

Copilot AI Apr 27, 2026

extractHeadingAndLinks() only matches href="...". If links are rendered as href='...', doc_refs will be empty and the normalized output loses document provenance. Consider matching both quote styles (e.g., href=["']([^"']+)["']).
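
A sketch of quote-agnostic `href` extraction using a back-reference, so the closing quote must match the opening one. The function name `extractHrefs` is hypothetical, and the filtering step (`isRiksdagDocumentHref`) is left out.

```typescript
// Collects href values from anchor tags, accepting "..." and '...' quoting.
// Group 1 captures the opening quote; \1 requires the same closing quote.
function extractHrefs(html: string): string[] {
  const hrefRe = /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1[^>]*>/gi;
  return Array.from(html.matchAll(hrefRe), (m) => (m[2] ?? "").trim());
}
```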

Comment thread scripts/fetch-calendar.ts
Comment on lines +428 to +433
dtstart,
org: normalizeOrgCode(org),
akt: normalizeAkt(akt),
summary: stripTags(summary).trim(),
doc_refs: docRefs,
source: 'web-fallback',

Copilot AI Apr 27, 2026

The fallback HTML path strips tags but does not decode HTML entities (e.g. &amp;, &#228;), so summary (and possibly org/akt) can be persisted with entity-encoded text. The repo already centralizes this via decodeHtmlEntities() in scripts/html-utils.ts; consider decoding extracted text before returning events.
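
To illustrate what decoding buys, here is a small stand-in decoder. It is not the repo's `decodeHtmlEntities()` from scripts/html-utils.ts, whose behaviour is not shown in this thread; `decodeEntitiesSketch` and its short named-entity table are assumptions for illustration only.

```typescript
// Hypothetical stand-in: handles numeric entities and a few named ones.
const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'], ["apos", "'"],
]);

function decodeEntitiesSketch(text: string): string {
  // Single pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (whole, body: string) => {
    if (body.startsWith("#")) {
      const isHex = body[1]?.toLowerCase() === "x";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      return code >= 0 && code <= 0x10ffff ? String.fromCodePoint(code) : whole;
    }
    return NAMED_ENTITIES.get(body.toLowerCase()) ?? whole;
  });
}
```

Unknown named entities are left untouched rather than dropped.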

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +734 to +744
export function persistCalendarJson(from: string, result: CalendarFetchResult): string {
  fs.mkdirSync(CALENDAR_DIR, { recursive: true });
  const outputPath = path.join(CALENDAR_DIR, `${from}.json`);
  const payload = {
    schema: 'riksdagsmonitor-calendar/1.0',
    manifest: result.manifest,
    events: result.events,
  };
  fs.writeFileSync(outputPath, JSON.stringify(payload, null, 2), 'utf8');
  console.log(` 💾 [fetch-calendar] Persisted ${result.events.length} events → ${outputPath}`);
  return outputPath;

Copilot AI Apr 27, 2026

persistCalendarJson() introduces filesystem-writing behavior but is currently untested (no unit tests exercise it). Given the repo’s existing coverage for persistence logic, add a focused test that verifies directory creation + JSON schema/shape, ideally by making the output directory injectable (or by isolating the path computation) to avoid writing into the real repo during tests.
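
A sketch of the injectable-directory idea: the writer takes an optional output directory, so a test can point it at a temporary folder. `persistJsonSketch` is a hypothetical simplification, not the module's `persistCalendarJson()`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Writes `payload` as pretty-printed JSON to <outputDir>/<from>.json,
// creating the directory if needed, and returns the written path.
function persistJsonSketch(
  from: string,
  payload: unknown,
  outputDir: string = path.join("data", "calendar"),
): string {
  fs.mkdirSync(outputDir, { recursive: true });
  const outputPath = path.join(outputDir, `${from}.json`);
  fs.writeFileSync(outputPath, JSON.stringify(payload, null, 2), "utf8");
  return outputPath;
}

// Test-style usage: write under the OS temp dir instead of the repo.
function makeTempDir(): string {
  return fs.mkdtempSync(path.join(os.tmpdir(), "calendar-test-"));
}
```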

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +390 to +393
const liRe = /<li\b([^>]*class="[^"]*calendar[^"]*"[^>]*)>([\s\S]*?)<\/li>/gi;
for (const liMatch of html.matchAll(liRe)) {
const attrs = liMatch[1] ?? '';
const body = liMatch[2] ?? '';

Copilot AI Apr 27, 2026

parseRiksdagKalendariumHtml's list-item regex only matches class="..." attributes. riksdagen.se markup (and this module’s own article-class parsing) may use single quotes, which would cause list-item events to be missed entirely. Consider updating liRe to accept both single and double quotes for the class attribute (e.g., class=["']...).

Suggested change
-   const liRe = /<li\b([^>]*class="[^"]*calendar[^"]*"[^>]*)>([\s\S]*?)<\/li>/gi;
-   for (const liMatch of html.matchAll(liRe)) {
-     const attrs = liMatch[1] ?? '';
-     const body = liMatch[2] ?? '';
+   const liRe = /<li\b([^>]*class=(["'])[^"']*calendar[^"']*\2[^>]*)>([\s\S]*?)<\/li>/gi;
+   for (const liMatch of html.matchAll(liRe)) {
+     const attrs = liMatch[1] ?? '';
+     const body = liMatch[3] ?? '';

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +473 to +474
const m = html.match(/<time\b[^>]*\bdatetime="([^"]+)"/i);
return m ? (m[1] ?? null) : null;

Copilot AI Apr 27, 2026

extractDatetime() only matches datetime="..." and will return null if the <time> element uses single quotes (datetime='...'). That would drop otherwise valid events during fallback parsing; update the regex to accept both quote styles.

Suggested change
-   const m = html.match(/<time\b[^>]*\bdatetime="([^"]+)"/i);
-   return m ? (m[1] ?? null) : null;
+   const m = html.match(/<time\b[^>]*\bdatetime=(["'])(.*?)\1/i);
+   return m ? (m[2] ?? null) : null;

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +479 to +481
const re = new RegExp(`\\bdata-${name}="([^"]*)"`, 'i');
const m = attrs.match(re);
return m && m[1]?.trim() ? m[1].trim() : null;

Copilot AI Apr 27, 2026

extractDataAttr() only supports data-*= "..." values. If riksdagen.se uses single-quoted attributes (data-organ='FiU'), organ/type extraction will fail and produce empty org/akt. Consider matching both quote styles (similar to HREF_RE in scripts/statskontoret-client.ts).

Suggested change
-   const re = new RegExp(`\\bdata-${name}="([^"]*)"`, 'i');
-   const m = attrs.match(re);
-   return m && m[1]?.trim() ? m[1].trim() : null;
+   const re = new RegExp(`\\bdata-${name}\\s*=\\s*(["'])(.*?)\\1`, 'i');
+   const m = attrs.match(re);
+   return m && m[2]?.trim() ? m[2].trim() : null;

pethers commented Apr 27, 2026

@copilot apply changes based on the comments in this thread

Copilot AI and others added 2 commits April 27, 2026 12:03
…ng, injectable persist dir, persist tests

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/e2c0e6bd-7c8e-45cd-be30-579cad326206

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…Attr/extractSpanText

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/e2c0e6bd-7c8e-45cd-be30-579cad326206

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

pethers commented Apr 27, 2026

Applied all changes from review thread 4180180259 in commits 6e2bdee and 2c8ad28:

  • Single-quote robustness: liRe, extractDatetime(), extractDataAttr(), extractSpanText(), and extractHeadingAndLinks() now all accept both " and ' HTML attribute quoting styles
  • HTML entity decoding: decodeHtmlEntities() (from scripts/html-utils.ts) is now applied to summary, org, and akt in both parseCalendarArticle and parseCalendarListItem
  • Injectable persist dir: persistCalendarJson() now takes an optional outputDir parameter (defaults to CALENDAR_DIR) so tests never write into the real repo
  • Persist tests: 3 new persistCalendarJson unit tests verify directory creation, correct JSON schema/shape, and return value
  • Regex injection safety: added an escapeRegex() helper and applied it in extractDataAttr() and extractSpanText() to prevent metacharacter injection via the name parameter

All 61 tests pass, TypeScript and CodeQL clean.
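
For readers unfamiliar with the regex-injection point, a minimal sketch follows. The helper names match those in the comment above, but the bodies are assumptions (the escaping pattern is the widely used one for JavaScript regex metacharacters), not the committed code.

```typescript
// Escapes regex metacharacters so `literal` is matched verbatim.
function escapeRegex(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch: reads data-<name> from an attribute string, either quote style.
function extractDataAttr(attrs: string, name: string): string | null {
  const re = new RegExp(
    `(?:^|\\s)data-${escapeRegex(name)}\\s*=\\s*(["'])(.*?)\\1`,
    "i",
  );
  const value = attrs.match(re)?.[2]?.trim();
  return value ? value : null;
}
```

Without the escape, a name such as `.*` would be interpreted as a pattern and could match unrelated attributes.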

pethers commented Apr 27, 2026

@copilot resolve the merge conflicts in this pull request

…sdag-calendar-api-fallback

# Conflicts:
#	scripts/fetch-calendar.ts
#	tests/fetch-calendar.test.ts

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Copilot AI commented Apr 27, 2026

@copilot resolve the merge conflicts in this pull request

Resolved in 0f24065. Both files had add/add conflicts (both branches created them independently). Kept our branch version since it has all the reviewed improvements, then merged in the new files that came from main (fetch-voting-records, fetch-statskontoret, etc.). All 61 tests pass and TypeScript is clean.

@pethers pethers requested a review from Copilot April 27, 2026 12:29

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +667 to +671
try {
events = await deps.fetchViaWeb(args);
log(`fetch-calendar: web_fallback returned ${events.length} event(s)`);
} catch (webErr) {
log(`fetch-calendar: web_fallback also failed (${String(webErr)}), returning empty`);
events = [];
console.log(` 🔄 [fetch-calendar] MCP primary attempt ${attempt + 1}/${maxRetries + 1}…`);
const raw = await callMcpCalendarEvents(from, to, resolved);
const events = raw.map(normalizeMcpCalendarEvent);
console.log(` ✅ [fetch-calendar] MCP primary succeeded — ${events.length} events`);

Copilot AI Apr 27, 2026

fetchCalendarWithFallback() writes progress output using console.log(...) (stdout). Since the CLI also writes JSON to stdout for piping, these log lines will corrupt the JSON stream and make it non-machine-readable. Consider routing all progress logs to stderr (e.g., console.error/console.warn), or adding an injectable logger / quiet option so library usage stays side-effect free on stdout.
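
The injectable-logger option could look roughly like this. `Logger`, `stderrLogger`, and `emitResult` are hypothetical names; the point is only that status text and the JSON payload travel on separate channels.

```typescript
type Logger = (message: string) => void;

// Default logger writes to stderr, leaving stdout free for the payload.
const stderrLogger: Logger = (message) => console.error(message);

// Emits status via `log` and the JSON payload via `write`.
// In a CLI, `write` would typically wrap process.stdout.write.
function emitResult(
  result: unknown,
  write: (chunk: string) => void,
  log: Logger = stderrLogger,
): void {
  log("[fetch-calendar] emitting result");
  write(`${JSON.stringify(result, null, 2)}\n`);
}
```

With this split, everything passed to `write` stays parseable as JSON.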

Comment thread scripts/fetch-calendar.ts Outdated
events: result.events,
};
fs.writeFileSync(outputPath, JSON.stringify(payload, null, 2), 'utf8');
console.log(` 💾 [fetch-calendar] Persisted ${result.events.length} events → ${outputPath}`);

Copilot AI Apr 27, 2026

persistCalendarJson() logs the persisted path via console.log(...) (stdout). This will also pollute stdout for callers that expect to consume JSON output from the CLI. Prefer writing status messages to stderr, or make logging optional/injectable.

Suggested change
-   console.log(` 💾 [fetch-calendar] Persisted ${result.events.length} events → ${outputPath}`);
+   console.error(` 💾 [fetch-calendar] Persisted ${result.events.length} events → ${outputPath}`);

Comment thread scripts/fetch-calendar.ts
Comment on lines +750 to +752
fs.mkdirSync(outputDir, { recursive: true });
const outputPath = path.join(outputDir, `${from}.json`);
const payload = {

Copilot AI Apr 27, 2026

persistCalendarJson() always writes to data/calendar/{from}.json. If the script is run multiple times for the same from date but different to ranges (e.g. week-ahead and month-ahead starting on the same day), later runs will overwrite earlier outputs. Consider including to in the filename (e.g. {from}_{to}.json) or persisting under a range-specific subdirectory to avoid accidental data loss.
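
One way to avoid the overwrite, sketched with a hypothetical helper `calendarFileName`: include the end date in the name when the range spans more than one day, and keep the short name for single-day ranges.

```typescript
// {from}.json for a single-day range, {from}_{to}.json otherwise.
function calendarFileName(from: string, to: string): string {
  return from === to ? `${from}.json` : `${from}_${to}.json`;
}
```

A week-ahead and a month-ahead run starting on the same day then land in different files.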

Comment thread scripts/fetch-calendar.ts
Comment on lines +814 to +823
const ISO_DATE_RE = /^\d{4}-\d{2}-\d{2}$/;
const from = flags.get('from') ?? '';
const to = flags.get('to') ?? '';
if (!ISO_DATE_RE.test(from)) {
  throw new Error(`--from must be an ISO 8601 date (YYYY-MM-DD), got: "${from}"`);
}
if (!ISO_DATE_RE.test(to)) {
  throw new Error(`--to must be an ISO 8601 date (YYYY-MM-DD), got: "${to}"`);
}
return { from, to, persist: booleans.has('persist') };

Copilot AI Apr 27, 2026

parseCalendarArgs() only accepts --to, but existing repo documentation uses --tom for the end date (e.g. analysis/methodologies/ai-driven-analysis-guide.md:112-115). To avoid breaking documented usage and any existing automation, consider supporting --tom as an alias for --to (or accepting both and erroring only if neither is provided).
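
A sketch of alias handling, assuming `--to` should win when both flags are present. `parseRangeArgs` is a hypothetical, simplified parser and does not reproduce the module's flag handling (for example `--persist`).

```typescript
const ISO_DATE_RE = /^\d{4}-\d{2}-\d{2}$/;

// Parses "--flag value" pairs; --tom is accepted as an alias for --to.
function parseRangeArgs(argv: string[]): { from: string; to: string } {
  const flags = new Map<string, string>();
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i] ?? "";
    const next = argv[i + 1];
    if (arg.startsWith("--") && next !== undefined && !next.startsWith("--")) {
      flags.set(arg.slice(2), next);
      i++; // skip the consumed value
    }
  }
  const from = flags.get("from") ?? "";
  const to = flags.get("to") ?? flags.get("tom") ?? ""; // --to takes precedence
  if (!ISO_DATE_RE.test(from)) {
    throw new Error(`--from must be YYYY-MM-DD, got: "${from}"`);
  }
  if (!ISO_DATE_RE.test(to)) {
    throw new Error(`--to (or --tom) must be YYYY-MM-DD, got: "${to}"`);
  }
  return { from, to };
}
```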

Comment thread scripts/fetch-calendar.ts Outdated
Comment on lines +828 to +838
console.log(`📅 [fetch-calendar] Fetching ${args.from} → ${args.to}`);

const result = await fetchCalendarWithFallback(args.from, args.to);

console.log(formatManifestMarkdown(result.manifest));

if (args.persist) {
persistCalendarJson(args.from, result);
} else {
// Print JSON to stdout for piping / agentic workflow consumption.
console.log(JSON.stringify(result, null, 2));

Copilot AI Apr 27, 2026

The CLI always prints formatManifestMarkdown(...) to stdout before emitting JSON (when --persist is not set). This makes stdout contain Markdown + JSON and breaks consumers that expect pure JSON (similar scripts use process.stdout.write(JSON) and send logs/errors to stderr). Consider writing the manifest to stderr, gating it behind a flag, or emitting it as a JSON field only.

Suggested change
-   console.log(`📅 [fetch-calendar] Fetching ${args.from} → ${args.to}`);
-   const result = await fetchCalendarWithFallback(args.from, args.to);
-   console.log(formatManifestMarkdown(result.manifest));
-   if (args.persist) {
-     persistCalendarJson(args.from, result);
-   } else {
-     // Print JSON to stdout for piping / agentic workflow consumption.
-     console.log(JSON.stringify(result, null, 2));
+   console.error(`📅 [fetch-calendar] Fetching ${args.from} → ${args.to}`);
+   const result = await fetchCalendarWithFallback(args.from, args.to);
+   console.error(formatManifestMarkdown(result.manifest));
+   if (args.persist) {
+     persistCalendarJson(args.from, result);
+   } else {
+     // Print JSON to stdout for piping / agentic workflow consumption.
+     process.stdout.write(`${JSON.stringify(result, null, 2)}\n`);

Comment thread scripts/fetch-calendar.ts
Comment on lines +846 to 851
// Guard: run `main()` only when this file is the direct entry point.
if (path.resolve(fileURLToPath(import.meta.url)) === path.resolve(process.argv[1] ?? '')) {
main().catch((err: unknown) => {
process.stderr.write(`fetch-calendar: fatal error: ${String(err)}\n`);
console.error('❌ [fetch-calendar] Fatal error:', err instanceof Error ? err.message : err);
process.exit(1);
});

Copilot AI Apr 27, 2026

main().catch(...) always exits with code 1, even for CLI argument errors thrown by parseCalendarArgs(). Other CLI scripts in this repo use exit code 2 for bad arguments (e.g. scripts/statskontoret-fetch.ts:169-174). Consider introducing a small typed CLI error (or returning a parse result) so invalid flags produce exit code 2 as documented in this module header.
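
The typed-error approach can be sketched as below. The class name follows the PR description; the body and the `exitCodeFor` helper are assumptions for illustration.

```typescript
// Marks failures caused by invalid CLI arguments.
class CliArgsError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CliArgsError";
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Maps an error to a process exit code: 2 for bad arguments, 1 otherwise.
function exitCodeFor(err: unknown): number {
  return err instanceof CliArgsError ? 2 : 1;
}
```

A `main().catch(...)` handler could then pass `exitCodeFor(err)` to `process.exit`.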

pethers commented Apr 27, 2026

@copilot apply changes based on the comments in this thread

Copilot AI commented Apr 27, 2026

@copilot apply changes based on the comments in this thread

Applied all 6 changes from review thread 4180927536 in a5e395f:

  • stdout/stderr discipline: all progress/status logs in fetchCalendarWithFallback, persistCalendarJson, and main now go to console.error/stderr; stdout is reserved for the JSON payload via process.stdout.write
  • Range-aware filenames: persistCalendarJson now writes {from}_{dateTo}.json when the range spans multiple days, falling back to {from}.json for single-day ranges (preserves existing test expectations)
  • --tom alias: parseCalendarArgs accepts both --to and --tom (Swedish, matches analysis/methodologies/ai-driven-analysis-guide.md); --to wins when both are provided
  • Manifest to stderr: main writes the markdown manifest to stderr, leaving stdout machine-readable
  • Typed CliArgsError: new exported error class; main().catch now exits with code 2 for CliArgsError and 1 for other failures (matches module header & repo convention)

Tests: 65 passing (added 4 new tests for --tom alias, CliArgsError typing, and range filename). TypeScript and ESLint clean.

@pethers pethers merged commit 2c7719a into main Apr 27, 2026
17 checks passed
@pethers pethers deleted the copilot/implement-riksdag-calendar-api-fallback branch April 27, 2026 13:06
Labels

refactor Code refactoring size-xl Extra large change (> 1000 lines) size-xs Extra small change (< 10 lines) testing Test coverage

Development

Successfully merging this pull request may close these issues.

[Methodology] Implement Riksdag calendar API fallback (web_fetch riksdagen.se/sv/kalendarium)

4 participants