PostgreSQL schemas for the Trender analytics platform with a 3-layer data pipeline architecture.
The database implements a medallion architecture (Raw → Staging → Analytics) with dimensional modeling for efficient querying and reporting.
Raw Layer (JSONB storage)
↓ Validation & Enrichment
Staging Layer (Cleaned data)
↓ Dimensional Modeling
Analytics Layer (Facts + Dimensions)
↓ Pre-aggregated Views
Dashboard Queries
| File | Purpose |
|---|---|
init.sql |
Master initialization script (imports all schema files) |
schema/01_raw_layer.sql |
Raw data storage (GitHub API responses) |
schema/02_staging_layer.sql |
Validated & cleaned data |
schema/03_analytics_layer.sql |
Dimensional model (facts + dimensions) |
schema/04_views.sql |
Pre-aggregated analytics views |
schema/05_workflow_traces.sql |
Workflow execution tracking |
Purpose: Audit trail and reprocessing capability
Tables:
Stores complete GitHub API responses in JSONB format.
CREATE TABLE raw_github_repos (
id BIGSERIAL PRIMARY KEY,
repo_full_name VARCHAR(255) NOT NULL,
api_response JSONB NOT NULL,
fetch_timestamp TIMESTAMPTZ DEFAULT NOW(),
source_language VARCHAR(50) -- 'Python', 'TypeScript', 'Go', 'render'
);Use case: Reprocess data without re-fetching from GitHub
Purpose: ETL audit trail and data quality validation
Tables:
Cleaned and validated repository data ready for analytics.
CREATE TABLE stg_repos_validated (
repo_full_name VARCHAR(255) PRIMARY KEY,
repo_url VARCHAR(500),
language VARCHAR(50) NOT NULL,
description TEXT,
stars INTEGER DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL,
updated_at TIMESTAMPTZ NOT NULL,
readme_content TEXT,
loaded_at TIMESTAMPTZ DEFAULT NOW()
);Business rules applied:
- Non-null language
- Valid timestamps
- Star count ≥ 0
Purpose: Dimensional model for high-performance analytics
Dimension tables:
Repository master data with SCD Type 2 history.
CREATE TABLE dim_repositories (
repo_key BIGSERIAL PRIMARY KEY,
repo_full_name VARCHAR(255) NOT NULL,
repo_url VARCHAR(500),
description TEXT,
readme_content TEXT,
language VARCHAR(50),
created_at TIMESTAMPTZ,
render_category VARCHAR(50),
valid_from TIMESTAMPTZ DEFAULT NOW(),
valid_to TIMESTAMPTZ,
is_current BOOLEAN DEFAULT TRUE
);SCD Type 2: Tracks changes over time (description, README updates)
Language reference data.
CREATE TABLE dim_languages (
language_key BIGSERIAL PRIMARY KEY,
language_name VARCHAR(50) UNIQUE NOT NULL,
language_category VARCHAR(50),
display_name VARCHAR(100)
);Pre-populated values:
- Python
- TypeScript
- Go
- render (for Render ecosystem repos)
Render service type reference data.
CREATE TABLE dim_render_services (
service_key BIGSERIAL PRIMARY KEY,
service_type VARCHAR(50) UNIQUE NOT NULL,
service_category VARCHAR(50),
description TEXT
);Pre-populated values:
- web, worker, cron, background-worker, private-service, static-site, postgres, redis
Fact tables:
Daily snapshots of repository metrics and momentum scores.
CREATE TABLE fact_repo_snapshots (
snapshot_key BIGSERIAL PRIMARY KEY,
repo_key BIGINT REFERENCES dim_repositories(repo_key),
language_key BIGINT REFERENCES dim_languages(language_key),
snapshot_date DATE NOT NULL,
stars INTEGER,
star_velocity INTEGER,
activity_score INTEGER,
momentum_score NUMERIC(5, 3),
rank_overall INTEGER,
rank_in_language INTEGER
);Unique constraint: (repo_key, snapshot_date)
Workflow execution traces for observability.
CREATE TABLE fact_workflow_runs (
run_id VARCHAR(50) PRIMARY KEY,
started_at TIMESTAMPTZ NOT NULL,
completed_at TIMESTAMPTZ,
status VARCHAR(20),
task_tree JSONB,
error_message TEXT,
repos_processed INTEGER,
execution_time_seconds NUMERIC(10, 2)
);Pre-aggregated views for dashboard queries.
Top trending repos across all languages.
SELECT
dr.repo_full_name,
dr.language,
frs.stars,
frs.momentum_score,
dr.description,
dr.readme_content,
dr.created_at
FROM fact_repo_snapshots frs
JOIN dim_repositories dr ON frs.repo_key = dr.repo_key
WHERE frs.snapshot_date = CURRENT_DATE
ORDER BY frs.momentum_score DESC
LIMIT 100;Render ecosystem projects.
SELECT
dr.repo_full_name,
dr.render_category,
frs.stars,
frs.momentum_score,
frs.star_velocity,
frs.activity_score
FROM dim_repositories dr
JOIN fact_repo_snapshots frs ON dr.repo_key = frs.repo_key
WHERE dr.language = 'render'
AND frs.snapshot_date = CURRENT_DATE
ORDER BY frs.momentum_score DESC;Per-language leaderboards with Render adoption stats.
SELECT
dl.language_name,
dr.repo_full_name,
frs.stars,
frs.momentum_score,
frs.rank_in_language,
(dr.language = 'render') as uses_render
FROM fact_repo_snapshots frs
JOIN dim_repositories dr ON frs.repo_key = dr.repo_key
JOIN dim_languages dl ON frs.language_key = dl.language_key
WHERE frs.snapshot_date = CURRENT_DATE
ORDER BY dl.language_name, frs.rank_in_language;./bin/db_setup.shcd database
psql $DATABASE_URL -f init.sqlcd database
psql $DATABASE_URL -f schema/01_raw_layer.sql
psql $DATABASE_URL -f schema/02_staging_layer.sql
psql $DATABASE_URL -f schema/03_analytics_layer.sql
psql $DATABASE_URL -f schema/04_views.sql
psql $DATABASE_URL -f schema/05_workflow_traces.sqlpsql $DATABASE_URL -c "\dt"
psql $DATABASE_URL -c "\dv"Expected:
- 9 tables
- 6+ views
Applies tiered retention policy (run automatically after each workflow).
Retention windows:
- Raw layer: 7 days
- Staging layer: 7 days
- Analytics layer: 30 days
Manual run:
psql $DATABASE_URL -f database/data_retention_cleanup.sqlMigration script to remove old workflow execution tables (if upgrading).
psql $DATABASE_URL -f database/cleanup_workflow_tracking.sqlDESTRUCTIVE: Truncates all tables (for testing only).
psql $DATABASE_URL -f database/clear_data.sqlDiagnostic query to verify data loading.
psql $DATABASE_URL -f database/check_analytics_load.sqlChecks:
- Row counts per table
- Latest snapshot dates
- Repository distribution by language
Get row counts for all tables.
psql $DATABASE_URL -f database/row_counts.sqlCount repos per language.
psql $DATABASE_URL -f database/repos_by_language.sqlExample filtering query (customize as needed).
psql $DATABASE_URL -f database/filter_vercel_repos.sqlKey indexes for performance:
-- Dimension lookups
CREATE INDEX idx_dim_repos_fullname ON dim_repositories(repo_full_name) WHERE is_current = TRUE;
CREATE INDEX idx_dim_languages_name ON dim_languages(language_name);
-- Fact table queries
CREATE INDEX idx_fact_snapshots_date ON fact_repo_snapshots(snapshot_date);
CREATE INDEX idx_fact_snapshots_momentum ON fact_repo_snapshots(momentum_score DESC);
-- Workflow traces
CREATE INDEX idx_workflow_runs_started ON fact_workflow_runs(started_at DESC);GitHub API → workflows/workflow.py
↓ (store_raw_repos)
raw_github_repos
↓ (store_in_staging)
stg_repos_validated
↓ (load_to_analytics_simple)
dim_repositories
fact_repo_snapshots
↓ (views)
analytics_trending_repos_current
analytics_render_showcase
↓ (queries)
dashboard/lib/db.ts
↓
Next.js components
- Add new schema files to
schema/ - Update
init.sqlto import new files - Document tables in this README
- Test with
./bin/db_setup.sh - Add indexes for frequently queried columns