Machine learning for biology, health, and decision-making under uncertainty — from algorithms built from scratch to deep learning, LLMs, and provable guarantees.
🎓 MSc @ ETH Zürich · 🔬 ML × computational biology · 📍 Zürich / Basel
I build end-to-end machine-learning pipelines and care about getting all of it right, not just the model: leakage-aware validation, honest metrics, and reproducibility. My work spans applied ML in biology and healthcare, probabilistic and trustworthy ML, and a few data-science hackathons. I'm as comfortable implementing a method from first principles in NumPy as I am wiring up PyTorch, BoTorch, or an LLM-backed RAG system.
- 🧬 Domains: protein structure & design, genomics, clinical & mobile-health data, molecular simulation
- 🤖 Methods: deep learning, foundation-model fine-tuning, LLMs & RAG, Gaussian processes & Bayesian optimization, formal verification
- 🧪 Principle: cross-validation over leaderboard-chasing; I report what actually reproduces
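A minimal sketch of what "leakage-aware validation" usually means in practice: when samples come in groups (participants, sessions, patients), every sample from one group stays on the same side of each split. The function name `grouped_kfold` and the greedy fold-balancing rule are illustrative assumptions, not code from these repos. scikit-learn's `GroupKFold` implements the same idea.

```python
from collections import defaultdict


def grouped_kfold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs so that no group appears in both.

    Hypothetical helper: `groups` holds one hashable group ID per sample
    (e.g. a participant ID). Groups are assigned to folds greedily,
    largest group first, to keep fold sizes roughly balanced.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    if n_splits > len(by_group):
        raise ValueError("n_splits cannot exceed the number of groups")

    # Greedy balancing: place each group (largest first) into the smallest fold.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(by_group, key=lambda k: len(by_group[k]), reverse=True):
        smallest = min(range(n_splits), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_group[g])

    for f in range(n_splits):
        test_idx = sorted(folds[f])
        test_set = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in test_set]
        yield train_idx, test_idx


groups = ["p1", "p1", "p2", "p2", "p2", "p3", "p4", "p4"]
for train, test in grouped_kfold(groups, n_splits=2):
    # No participant contributes samples to both sides of the split.
    assert not {groups[i] for i in train} & {groups[i] for i in test}
```

A plain random k-fold split on the same data would let a participant's samples appear in both train and test, which typically inflates the reported score.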
| Project | What it is | Highlights |
|---|---|---|
| OpenFold3 — PDE10A Domain Adaptation | Fine-tuning the OpenFold3 structure-prediction model for PDE10A protein–ligand pose prediction via distribution-aware, PDB-scale data augmentation | +0.20 PL LDDT and −2.3 Å ligand RMSD on a held-out set · foundation-model fine-tuning |
| GP-BayesOpt for Antibody Design | Gaussian-process surrogate + multi-task Bayesian optimization over ESM-2 protein embeddings | GP predicts developability/specificity at R² ≈ 0.86 / 0.97 on real lab data; closed-loop BO beats random search (2.26 vs 1.45) · BoTorch · GPyTorch |
| DeepPoly Robustness Verifier | Sound neural-network verifier that proves L∞ robustness via convex relaxation + learnable ReLU bounds | 69/70 test cases correct, 0 unsound certifications, across 13 networks · PyTorch (autograd) |
| LLM Post-Training Recipe Lab | An autonomous, git-ratcheted loop that searches SFT/DPO post-training recipes for small instruct models | Reproducible evals and honest negative results · SFT · DPO |
| Custom RAG Challenge (NaNsense) | Retrieval-augmented generation that grounds GPT-4o answers in a scraped-web corpus, with source attribution | Cleaning pipeline cuts the corpus ~9× (17 GB → 1.9 GB); live Gradio demo · LangChain · Chroma · GPT-4o |
| Synthetic Market Sharpe | Cross-sectional position sizing for a simulated market, optimizing the Sharpe ratio | Packaged pipeline: feature builders, Bayesian/gradient-boosted regressors, and an attention-BiLSTM trained on a differentiable Sharpe loss · session-level CV |
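The DeepPoly verifier rests on a convex relaxation of each ReLU. As a minimal sketch, assuming a single neuron with plain-float pre-activation bounds `[l, u]` and a hypothetical function name `relu_relaxation`, the standard DeepPoly per-neuron transformer looks like this. The "learnable" lower-bound slope is modeled as a fixed `alpha` argument; in a PyTorch verifier it would presumably be a parameter tuned by gradient descent to tighten the final bound. Back-substitution through earlier layers is omitted.

```python
def relu_relaxation(l, u, alpha=None):
    """DeepPoly-style linear bounds for y = ReLU(x), given x in [l, u].

    Returns ((a_lo, b_lo), (a_up, b_up)) such that
        a_lo * x + b_lo <= ReLU(x) <= a_up * x + b_up   for all x in [l, u].
    `alpha` in [0, 1] is the lower-bound slope in the crossing case; if None,
    use the DeepPoly area heuristic (slope 1 if u > -l, else slope 0).
    """
    if l > u:
        raise ValueError("need l <= u")
    if u <= 0:  # always inactive: ReLU(x) = 0 exactly
        return (0.0, 0.0), (0.0, 0.0)
    if l >= 0:  # always active: ReLU(x) = x exactly
        return (1.0, 0.0), (1.0, 0.0)
    # Crossing case l < 0 < u: upper bound is the chord from (l, 0) to (u, u).
    slope = u / (u - l)
    upper = (slope, -slope * l)
    if alpha is None:
        alpha = 1.0 if u > -l else 0.0
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # y >= alpha * x is sound for any alpha in [0, 1]: it is <= x for x >= 0
    # and <= 0 for x < 0.
    lower = (alpha, 0.0)
    return lower, upper
```

For example, `relu_relaxation(-1.0, 3.0)` picks the lower slope 1 (since 3 > 1) and the chord `0.75 * x + 0.75` as the upper bound. Any `alpha` in `[0, 1]` keeps the certificate sound, which is why optimizing it cannot introduce unsound certifications.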
Machine learning & deep learning
- Machine Learning for Genomics — gene expression from chromatin signal (Spearman ρ ≈ 0.56) + single-cell clustering & bulk RNA-seq deconvolution
- Clinical ML Interpretability & Tweet NLP — explainable clinical classification (SHAP / Grad-CAM, sanity-checked) + fine-tuned BERT (F1 0.73)
- ml-regression-to-deep-learning — four ML tasks from regularized regression to deep metric & transfer learning (all cleared the course's hard baseline)
- Classical → Deep Computer Vision — five CV projects from PCA/graph-cuts to CNNs and ResNet transfer learning, built from primitives
- Probabilistic AI — Gaussian processes, Bayesian deep learning (SWAG), constrained BO, and DDPG reinforcement learning
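The genomics result above is reported as a Spearman correlation. As a minimal, stdlib-only sketch (hypothetical function names), Spearman's ρ is the Pearson correlation of the ranks, with tied values given the average of their ranks. `scipy.stats.spearmanr` computes the same quantity.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("Spearman rho is undefined for constant input")
    return cov / (sx * sy)
```

Because only ranks enter the formula, any strictly increasing transform of the predictions (e.g. predicting log-expression instead of expression) leaves ρ unchanged.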
Computational biology & from-scratch algorithms
- molecular-phylogenetics-r — alignment, tree-building & Felsenstein likelihood in R, 363 passing test assertions
- Data Mining from Scratch — distances, DTW, graph & string kernels, k-NN/Naive Bayes implemented in pure NumPy
- GROMOS Biomolecular MD — six molecular-dynamics studies benchmarked against experiment
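DTW, listed among the from-scratch algorithms above, is a short dynamic programme. The repo uses NumPy; this minimal sketch uses plain Python lists, a hypothetical name `dtw_distance`, an assumed absolute-difference local cost, and no warping-window constraint.

```python
import math


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between sequences a and b.

    Plain O(len(a) * len(b)) dynamic programme with no warping window.
    """
    n, m = len(a), len(b)
    # cost[i][j]: cheapest alignment of the prefixes a[:i] and b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only (b[j-1] is repeated)
                cost[i][j - 1],      # advance in b only (a[i-1] is repeated)
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Unlike a pointwise Euclidean distance, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero, because the warping path may align the single `2` in the first sequence with both `2`s in the second.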
Health & sensing
- Mobile-Sensing Symptom Prediction — leakage-aware, participant-grouped CV on 4 years of GLOBEM data
- IMU Step Counting & Activity Recognition — peak-detection step counter + Random-Forest activity classifier on wearable IMU data
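The peak-detection step counter can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes 3-axis accelerometer samples in m/s², and the function name `count_steps`, the 10.5 m/s² threshold, and the 0.3 s refractory interval are hypothetical defaults, not tuned values.

```python
import math


def count_steps(acc_xyz, fs, threshold=10.5, min_interval_s=0.3):
    """Count steps as peaks in the acceleration magnitude.

    acc_xyz: sequence of (x, y, z) samples in m/s^2; fs: sampling rate in Hz.
    A sample counts as a step if it is a local maximum above `threshold` and
    lies at least `min_interval_s` seconds after the previous accepted peak.
    """
    # Orientation-independent signal: Euclidean norm of each sample.
    mag = [math.sqrt(x * x + y * y + z * z) for x, y, z in acc_xyz]
    min_gap = max(1, int(round(min_interval_s * fs)))
    steps, last_peak = 0, -min_gap
    for i in range(1, len(mag) - 1):
        is_local_max = mag[i - 1] < mag[i] >= mag[i + 1]
        if is_local_max and mag[i] > threshold and i - last_peak >= min_gap:
            steps += 1
            last_peak = i
    return steps
```

On a synthetic 10 s walk sampled at 50 Hz with a 2 Hz vertical oscillation around gravity, this counts one step per oscillation. The refractory interval is what keeps small secondary bumps within a single stride from being double-counted on real, noisier data.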
Data-science hackathons
- Solar PV Forecasting & Spot-Market Trading — day-ahead Swiss PV forecasting (Ridge + XGBoost ensemble) → trading positions (Axpo Datathon 2024)
Also: BoTorch / GPyTorch · OpenFold3 · Scanpy / AnnData · XGBoost / LightGBM · SHAP / Captum · Gradio · Chroma