feat: data lineage view — "where did this number come from" #107

## Context

From the AASCU Intermediary feedback session (see `docs/aascu_intermediary_feedback_summary.md`, pain points A and D):

> "The dashboard will sometimes pull from the wrong set... institutions could be missing, campuses that have submitted data may not be actually being listed."
>
> "Full transparency on what that means, what the outputs look like, but then also where that data is being stored. So, yeah, like, full data governance, full data lineage."

PDP loses institutional trust when numbers can't be traced back to source rows. Andres explicitly flagged that data is sometimes processed incorrectly at submission time. The strongest competitive differentiator we can build is **provable lineage**: click any number and see exactly which uploaded rows produced it.

## Goal

Every aggregate number in the dashboard is traceable to (a) the source rows, (b) the upload event that introduced them, (c) any transformations applied, and (d) timestamps for each step.
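A minimal sketch of the record shape this goal implies, assuming a dataclass-style model. All class and field names here (`UploadEvent`, `SourceRow`, `TransformStep`, `LineageRecord`, `upload_events_for`) are hypothetical and not an existing schema. The point is that one output value carries its row IDs and transformation steps, and each row carries the ID of the upload event that introduced it, so the chain output → rows → upload events can be walked with timestamps at every step.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical lineage model; names and fields are illustrative assumptions.

@dataclass(frozen=True)
class UploadEvent:
    event_id: str
    uploaded_at: datetime
    uploader: str

@dataclass(frozen=True)
class SourceRow:
    row_id: str
    upload_event_id: str  # provenance: which upload introduced this row

@dataclass(frozen=True)
class TransformStep:
    step_id: str
    description: str      # e.g. "filtered to cohort=2022"
    applied_at: datetime

@dataclass(frozen=True)
class LineageRecord:
    metric_id: str
    value: float
    row_ids: tuple[str, ...]           # source rows that fed this value
    steps: tuple[TransformStep, ...]   # ordered transformation chain
    computed_at: datetime

def upload_events_for(record, rows, uploads):
    """Walk output -> rows -> upload events; return distinct events, oldest first."""
    event_ids = {rows[rid].upload_event_id for rid in record.row_ids}
    return sorted((uploads[eid] for eid in event_ids), key=lambda e: e.uploaded_at)
```

Keeping the upload event ID on each row (rather than only on the output) is what lets a single number resolve to several uploads when a cohort was submitted in batches.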

## Scope

- Click any KPI / chart bar / table cell → opens a lineage drawer
- Drawer shows:
  - Source row IDs (paginated list)
  - Upload event ID + timestamp + uploader
  - Transformation chain (e.g., "filtered to cohort=2022", "aggregated mean GPA")
  - SQL or pipeline step IDs that produced the value
- New table or columns to track upload-event provenance per row
- Read-only API endpoint: `GET /api/lineage?metric=<id>&filters=<...>`
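One way to populate the drawer is to capture lineage at compute time rather than reconstructing it afterwards. The sketch below assumes each student row carries an `upload_event_id` provenance column (the "new table or columns" item above) and uses the "mean GPA for cohort=2022" example from the transformation chain bullet. `StudentRow` and `mean_gpa_with_lineage` are hypothetical names, and the returned dict is an illustrative payload, not the actual `/api/lineage` response format.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical row shape; field names are assumptions for illustration.

@dataclass(frozen=True)
class StudentRow:
    row_id: str
    upload_event_id: str   # per-row provenance column
    cohort: int
    gpa: float

def mean_gpa_with_lineage(rows, cohort):
    """Compute mean GPA for one cohort and record which rows and steps produced it."""
    steps = []
    selected = [r for r in rows if r.cohort == cohort]
    steps.append(f"filtered to cohort={cohort} ({len(selected)} of {len(rows)} rows)")
    if not selected:
        # No rows match: no value, but the (empty) lineage is still explainable.
        return None, {"row_ids": [], "upload_event_ids": [], "steps": steps}
    value = mean(r.gpa for r in selected)
    steps.append("aggregated mean GPA")
    return value, {
        "row_ids": [r.row_id for r in selected],
        "upload_event_ids": sorted({r.upload_event_id for r in selected}),
        "steps": steps,
    }
```

Recording the row count at the filter step also surfaces the "campuses that have submitted data may not be actually being listed" complaint directly: a reviewer can see how many rows each step kept or dropped.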

## Out of scope

- Editing or deleting rows from the lineage view (separate audit/correction flow)
- Lineage for ML model predictions (separate issue — partially covered by SHAP narrator #97)

## Acceptance criteria

- [ ] Lineage drawer reachable from at least 3 representative metrics (KPI, chart, table)
- [ ] Each lineage record links upload event → row → transformation → output
- [ ] Lineage queries return in < 1s on a 4K-student dataset
- [ ] All lineage views respect existing RBAC (FERPA-compliant) — see #75
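For the RBAC criterion, a minimal sketch of scoping the paginated source-row list, assuming (hypothetically) that each source row maps to an institution ID and that the viewer's permissions from #75 resolve to a set of allowed institution IDs. The real RBAC model may be finer-grained; `visible_lineage_page` and its parameters are illustrative only.

```python
# Hypothetical sketch: assumes row -> institution mapping and an RBAC scope
# expressed as a set of allowed institution IDs (actual model per #75 may differ).

def visible_lineage_page(row_ids, row_institution, allowed_institutions, page=1, page_size=50):
    """Return one page of source row IDs the viewer is permitted to see."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # Filter BEFORE paginating so page boundaries and totals only reflect
    # rows the viewer may see; unknown rows (no mapping) are excluded.
    visible = [rid for rid in row_ids if row_institution.get(rid) in allowed_institutions]
    start = (page - 1) * page_size
    return {
        "row_ids": visible[start:start + page_size],
        "total_visible": len(visible),
        "page": page,
        "page_size": page_size,
    }
```

Filtering first matters for FERPA: paginating the unfiltered list and then dropping rows would produce short pages whose gaps hint at how many records the viewer is not allowed to see.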

## Why this is P0

This is the single highest-leverage gap from the intermediary session. It directly answers (a) the data-trust complaint, (b) the AI-governance requirement, and (c) the differentiator-vs-PDP question in one feature.
