Engineer with a research-science background — a Purdue PhD followed by graduate CS work at UIUC — and a multi-year track record of shipping ML and engineering systems in industry. The repos linked here put that experience to a different use: distilling the canonical machine-learning curriculum into a series of small, well-instrumented reference implementations that other people can read, run, and modify.
Each one picks a single canonical topic, strips it down to its essential moving parts, and explains it through code and visualizations rather than equations and prose.
🔭 Currently — turning the major branches of ML into reference projects, one repo per topic
🛠️ Background — research science · applied ML engineering · production systems · technical mentoring
⚡ Approach — synthetic data with known ground truth · code that reads like an explanation · dashboards designed to be scanned in 30 seconds
💬 Useful for — anyone who wants a particular ML concept implemented end-to-end, without the usual benchmark-fetishism
These projects are deliberately small. The goal isn't state-of-the-art numbers — it's to make every step of a working ML pipeline visible, modifiable, and teachable. If they help someone go from "I've read the paper" to "I can build it from scratch," they've done their job.
A reference set covering the major branches of machine learning — one project per topic, each one self-contained and built to the same recipe so the boilerplate is invisible and the content is what stands out:
- A synthetic data generator with a known generative process — so models can be evaluated against the truth, not just a holdout score
- A from-scratch implementation in PyTorch or scikit-learn — minimal dependencies, readable top-to-bottom
- A dashboard-style README with embedded charts in a unified palette
- A "What I learned" reflection at the end — not a metrics dump
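The first ingredient of that recipe, a generator whose true parameters are known, can be sketched with the standard library alone. This is a minimal illustration, not code from the repos. The names `SyntheticLinear` and `fit_ols` are hypothetical. The one-feature linear process, its coefficients, and the noise level are arbitrary assumptions. Because the true slope and noise level are known, the sketch can report coefficient-recovery error and compare the fit's MSE with the theoretical noise floor, which a holdout score alone cannot provide.

```python
import random
from dataclasses import dataclass


@dataclass
class SyntheticLinear:
    """Hypothetical toy generative process: y = intercept + slope * x + N(0, noise_sd**2)."""
    intercept: float = 1.0
    slope: float = 2.0
    noise_sd: float = 0.5

    def sample(self, n: int, seed: int = 0) -> tuple[list[float], list[float]]:
        rng = random.Random(seed)  # seeded so every run sees the same data
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        ys = [self.intercept + self.slope * x + rng.gauss(0.0, self.noise_sd) for x in xs]
        return xs, ys


def fit_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Closed-form ordinary least squares for a single feature; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


truth = SyntheticLinear()
xs, ys = truth.sample(n=2000, seed=42)
a_hat, b_hat = fit_ols(xs, ys)

# Diagnostics that exist only because the generative process is known:
slope_error = abs(b_hat - truth.slope)          # coefficient recovery
intercept_error = abs(a_hat - truth.intercept)
mse = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
noise_floor = truth.noise_sd ** 2  # expected squared error of the true function itself

print(f"slope error {slope_error:.3f}, intercept error {intercept_error:.3f}")
print(f"train MSE {mse:.3f} vs noise floor {noise_floor:.3f}")
```

A training MSE far above the noise floor points to a misspecified model. One well below it on held-out data would point to leakage, since no model can beat the noise floor on fresh samples.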
| Topic | Demonstrated skills |
|---|---|
| Supervised — regression | OLS · Ridge · Lasso · coefficient-recovery diagnostics |
| Supervised — classification | Logistic · Decision Tree · Random Forest · Gradient Boosting |
| Unsupervised — clustering | K-means · DBSCAN · Agglomerative · ARI vs Silhouette |
| Unsupervised — dim reduction | PCA · t-SNE · trustworthiness · scree plots |
| Deep learning — vision | CNN from scratch · PyTorch · synthetic image rendering |
| Deep learning — sequence | LSTM · time-series forecasting · seasonal-naive baselines |
| Deep learning — NLP | Transformer encoder from scratch · attention visualization |
| Modern AI — LLM | RAG · vector retrieval · refusal-threshold tuning · hallucination measurement |
| Production / MLOps | FastAPI · Docker · PSI / KS drift monitoring · latency probing |
| Reinforcement learning | DQN · replay buffer · target network · ε-greedy schedule |
🎮 Live · Algorithm Playground
Interactive CS algorithm visualizations — sorting, pathfinding, graphs, trees, dynamic programming, and more. Pure HTML / CSS / JS, no build tooling, runs entirely in the browser.
The complement to the ML portfolio above: where those repos demonstrate model-building from scratch, this site shows algorithmic thinking from scratch. Same blue / red / gray palette, same "boring code, sharp insights" philosophy. Currently 1 of 10 planned topics is live (sorting); the other 9 are queued.
A short tour of the older public repos on this profile — they pre-date the current ML focus and span device-physics simulation, deep-learning research, full-stack web services, a desktop application, and IoT / robotics. Together they show the breadth of programming work behind the ML reference set above.
Numerical simulations of canonical device structures: process flow, electrostatic field, carrier concentration, and IV-curve sweeps — rendered for each device type.
Skills demonstrated: semiconductor device physics · process / device simulation · electrostatic & carrier-transport solvers · IV-curve interpretation
- Heart-failure risk prediction (DG-RNN on MIMIC-III) — Domain-Knowledge-Guided Recurrent Neural Network with knowledge-graph features, compared against standard EHR risk-prediction models. PyTorch + PyHealth. Coursework for Deep Learning for Healthcare (UIUC CS 598).
Skills demonstrated: PyTorch · RNN / GRU on irregular time-stamped sequences · knowledge-graph integration · PyHealth · clinical EHR data handling
- RESTful API from scratch — Express + MongoDB; full article CRUD over GET / POST / PUT / PATCH / DELETE; tested via Postman.
- Online to-do list service — Node.js + MongoDB on Heroku, with per-user collections and weather / location enrichment.
- Newsletter sign-up service — Node.js + MailChimp on Heroku.
Skills demonstrated: Node.js · Express · REST API design · MongoDB · third-party API integration · cloud deployment
- Web Browser — three-tier C# desktop app — Object-oriented event-driven browser with bookmark / history managers backed by SQL. Built incrementally from a single button to a full multi-tab application.
Skills demonstrated: OOP · C# / WinForms · event-driven UI · multi-tier architecture · SQL persistence
- Self-driving car — environment scanning + autonomous driving — Lab for IoT Systems (UIUC CS 437). Mapping the local environment and driving around obstacles.
Skills demonstrated: IoT pipelines · sensor data processing · simple autonomous control loops
Comfortable with
Python · PyTorch · scikit-learn · pandas · NumPy · matplotlib · FastAPI · Docker · Git
Working knowledge
SQL · Bash · JavaScript / TypeScript · Hugging Face transformers · Linux
"Synthetic data first. Dashboards over benchmarks. Boring code, sharp insights. Always end with what was learned."
- Synthetic data first. When the generative process is known, models can be evaluated against the truth, not just a holdout number. Coefficient recovery, ground-truth ARI, theoretical noise floors — diagnostics that benchmark datasets cannot offer.
- Dashboards over benchmarks. Every project ends with figures a reader can scan in 30 seconds, not a single F1 score buried in a table.
- Boring code, sharp insights. Code prioritizes clarity over cleverness. The interesting part lives in the analysis and visualizations, not in the lines themselves.
- Reflection beats reporting. Metrics describe what happened. "What I learned" sections describe what I would do differently next time — and that's the part worth keeping six months later.
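The "ground-truth ARI" diagnostic from the first principle needs the true cluster labels, which only a known generative process supplies. Silhouette, by contrast, needs only the data, so it can rate a wrong partition highly. Below is a minimal standard-library sketch of the adjusted Rand index. The function name is a hypothetical illustration; scikit-learn users would typically call `sklearn.metrics.adjusted_rand_score` instead.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two labelings of the same points (1.0 = identical partitions)."""
    if len(true_labels) != len(pred_labels) or len(true_labels) < 2:
        raise ValueError("need two labelings of the same length, with at least 2 points")
    n = len(true_labels)
    pair_counts = Counter(zip(true_labels, pred_labels))  # contingency table n_ij
    row_sums = Counter(true_labels)                        # true cluster sizes a_i
    col_sums = Counter(pred_labels)                        # predicted cluster sizes b_j

    index = sum(comb(c, 2) for c in pair_counts.values())  # point pairs grouped together in both
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    expected = sum_rows * sum_cols / comb(n, 2)             # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are one cluster or all singletons
        return 1.0
    return (index - expected) / (max_index - expected)


truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
relabeled = [2, 2, 2, 0, 0, 0, 1, 1, 1]  # same partition, different label names
merged = [0, 0, 0, 0, 0, 0, 1, 1, 1]     # two true clusters merged into one

print(adjusted_rand_index(truth, relabeled))  # 1.0
print(adjusted_rand_index(truth, merged))     # 0.5
```

ARI ignores label names and is corrected for chance, so a random partition scores near 0. That makes it a direct "did the model recover the truth" check rather than a proxy.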
All nine cards are generated by scripts/generate_cards.py, run by a daily GitHub Actions workflow that queries the GitHub GraphQL API and renders SVGs in the portfolio palette. No third-party stats service is in the loop: no DEPLOYMENT_PAUSED outages, no profile data sent to anyone else's server, and any color or layout can be changed by editing one Python file.
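The render half of that pipeline needs nothing beyond string formatting. This is a minimal sketch that assumes the stats have already been fetched from the GraphQL API into a plain dict. `render_stat_card`, its layout, and the sample numbers are hypothetical placeholders and are not taken from scripts/generate_cards.py. Only the hex colors come from the palette listed here.

```python
from xml.sax.saxutils import escape

# Portfolio palette (hex values copied from this README).
PALETTE = {
    "blue": "#3B6EA8",
    "red": "#C04040",
    "gray": "#7A7A7A",
    "light_gray": "#E5E5E5",
    "white": "#FFFFFF",
}


def render_stat_card(title: str, stats: dict[str, int], width: int = 360) -> str:
    """Render a simple stats card as a standalone SVG string.

    Hypothetical layout: blue title, then one row per stat with the label in gray
    and the value right-aligned in red, on a white card with a light-gray border.
    """
    row_height = 24
    height = 50 + row_height * len(stats)
    rows = []
    for i, (label, value) in enumerate(stats.items()):
        y = 60 + i * row_height
        rows.append(
            f'<text x="20" y="{y}" fill="{PALETTE["gray"]}">{escape(label)}</text>'
            f'<text x="{width - 20}" y="{y}" fill="{PALETTE["red"]}" '
            f'text-anchor="end">{value:,}</text>'  # thousands separator, e.g. 1,234
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'font-family="sans-serif" font-size="14">'
        f'<rect x="0.5" y="0.5" width="{width - 1}" height="{height - 1}" rx="6" '
        f'fill="{PALETTE["white"]}" stroke="{PALETTE["light_gray"]}"/>'
        f'<text x="20" y="30" font-size="16" font-weight="bold" '
        f'fill="{PALETTE["blue"]}">{escape(title)}</text>'
        + "".join(rows)
        + "</svg>"
    )


# Placeholder numbers for illustration only, not real profile stats.
svg = render_stat_card("GitHub stats", {"Public repos": 12, "Stars": 1234})
```

Writing `svg` to an `.svg` file and committing it from the scheduled workflow is one way to publish such a card. Keeping the palette in a single dict is what makes a color change a one-line edit.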
#3B6EA8 blue · #C04040 red · #7A7A7A gray · #E5E5E5 light gray · #FFFFFF white. The same palette is used across every project.
