Context
From the AASCU Intermediary feedback session (see docs/aascu_intermediary_feedback_summary.md, pain points A + D):
"The dashboard will sometimes pull from the wrong set... institutions could be missing, campuses that have submitted data may not be actually being listed."
"Full transparency on what that means, what the outputs look like, but then also where that data is being stored. So, yeah, like, full data governance, full data lineage."
PDP loses institutional trust when numbers can't be traced back to source rows. Andres explicitly flagged that data is being processed incorrectly at submission time. The strongest competitive differentiator we can build is provable lineage: click any number and see exactly which uploaded rows produced it.
Goal
Every aggregate number in the dashboard is traceable to (a) the source rows, (b) the upload event that introduced them, (c) any transformations applied, and (d) timestamps for each step.
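One way to picture the four parts of the goal is as a single record attached to each dashboard number. The sketch below is illustrative only: `SourceRow`, `TransformStep`, `LineageRecord`, and `missing_parts` are hypothetical names, not existing code, and the field layout is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceRow:
    """(a) one source row, tagged with (b) the upload that introduced it."""
    row_id: int
    upload_event_id: str
    uploaded_at: datetime  # (d) timestamp of the upload step


@dataclass(frozen=True)
class TransformStep:
    """(c) one transformation, with (d) the time it was applied."""
    description: str
    applied_at: datetime


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical lineage payload for a single aggregate number."""
    metric_id: str
    source_rows: tuple[SourceRow, ...]
    steps: tuple[TransformStep, ...]

    def missing_parts(self) -> list[str]:
        """List which parts of the goal this record fails to cover."""
        missing = []
        if not self.source_rows:
            missing.append("source rows")
        if not self.steps:
            missing.append("transformations")
        return missing


# Example values are made up for illustration.
now = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
record = LineageRecord(
    metric_id="mean_gpa",
    source_rows=(SourceRow(1, "up-1", now), SourceRow(2, "up-1", now)),
    steps=(
        TransformStep("filtered to cohort=2022", now),
        TransformStep("aggregated mean GPA", now),
    ),
)
print(record.missing_parts())  # prints []
```

Keeping the upload event on each row, rather than once per metric, allows a single number to draw on rows from several uploads.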
Scope
- Click any KPI / chart bar / table cell → opens a lineage drawer
- Drawer shows:
  - Source row IDs (paginated list)
  - Upload event ID + timestamp + uploader
  - Transformation chain (e.g., "filtered to cohort=2022", "aggregated mean GPA")
  - SQL or pipeline step IDs that produced the value
- New table or columns to track upload-event provenance per row
- Read-only API endpoint:
  GET /api/lineage?metric=<id>&filters=<...>
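The per-row provenance column and the lineage lookup in the scope list could fit together roughly as follows. This is a minimal sketch using SQLite, not the actual schema: the table names `upload_events` and `student_rows`, the function `lineage_for_mean_gpa`, and the shape of the returned dictionary are all assumptions, and the sample data is invented.

```python
import sqlite3

# Hypothetical schema: every data row carries the ID of the upload that introduced it.
SCHEMA = """
CREATE TABLE upload_events (
    id          TEXT PRIMARY KEY,
    uploader    TEXT NOT NULL,
    uploaded_at TEXT NOT NULL  -- ISO 8601 timestamp
);
CREATE TABLE student_rows (
    id              INTEGER PRIMARY KEY,
    cohort          INTEGER NOT NULL,
    gpa             REAL NOT NULL,
    upload_event_id TEXT NOT NULL REFERENCES upload_events(id)
);
"""


def lineage_for_mean_gpa(conn, cohort, page=1, page_size=50):
    """Return mean GPA for a cohort plus the rows and uploads behind it."""
    rows = conn.execute(
        "SELECT r.id, r.gpa, u.id, u.uploader, u.uploaded_at "
        "FROM student_rows r JOIN upload_events u ON u.id = r.upload_event_id "
        "WHERE r.cohort = ? ORDER BY r.id",
        (cohort,),
    ).fetchall()
    if not rows:
        return None  # nothing to aggregate, so nothing to trace
    uploads = {}
    for _row_id, _gpa, upload_id, uploader, uploaded_at in rows:
        uploads[upload_id] = {
            "id": upload_id, "uploader": uploader, "uploaded_at": uploaded_at,
        }
    start = (page - 1) * page_size
    return {
        "metric": "mean_gpa",
        "value": sum(row[1] for row in rows) / len(rows),
        "source_row_ids": [row[0] for row in rows][start:start + page_size],
        "total_source_rows": len(rows),
        "upload_events": list(uploads.values()),
        "transformations": [f"filtered to cohort={cohort}", "aggregated mean GPA"],
    }


# Invented sample data, in memory only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO upload_events VALUES (?, ?, ?)",
    ("up-1", "registrar@example.edu", "2024-01-15T09:00:00Z"),
)
conn.executemany(
    "INSERT INTO student_rows (id, cohort, gpa, upload_event_id) VALUES (?, ?, ?, ?)",
    [(1, 2022, 3.0, "up-1"), (2, 2022, 4.0, "up-1"), (3, 2021, 2.0, "up-1")],
)
result = lineage_for_mean_gpa(conn, 2022)
print(result["value"], result["source_row_ids"])  # prints 3.5 [1, 2]
```

A dictionary like `result` is one plausible response body for the read-only endpoint; the value and its lineage come from the same query, so the drawer cannot show rows that did not feed the number.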
Out of scope
Acceptance criteria
Why this is P0
This is the single highest-leverage gap from the intermediary session. It directly answers (a) the data-trust complaint, (b) the AI-governance requirement, and (c) the differentiator-vs-PDP question in one feature.