Senior AI Engineer · GenAI & LLM Systems · Open-Source Contributor
"Build it end-to-end. Ship it. Then explain why it works."
Fragile Safety: Automated Circuit Discovery is Vulnerable to Dormant Feature Bundling
Chandrashekar DP · Zenodo Preprint, June 2026
Automated circuit discovery tools used for AI safety verification are structurally vulnerable to adversarial input manipulation. A linear probe achieving 97.5% accuracy on clean data fails completely on adversarial inputs (0% detection rate). The adversarial distribution is geometrically inseparable from clean positives at the anchor token (cosine similarity 0.989). Reproduced on Pythia-410m. Practical mitigation: context-aware probing at the last sequence token achieves 100% adversarial detection with clean accuracy preserved. All experiments use TransformerLens and run on CPU.
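A minimal sketch of the probe-position idea, using synthetic vectors instead of real model activations. Everything in it is an assumption made for illustration: the toy width `DIM`, the `FEATURE` and `CONTEXT` directions, the helper names, and the difference-of-means linear probe, which stands in for whatever probe training the preprint uses. The separability at the last position is built into the toy data by construction, so the sketch shows the mechanism only and does not reproduce the reported numbers.

```python
import math
import random

rng = random.Random(0)
DIM = 8                                      # toy activation width (assumption)
FEATURE = [1.0] + [0.0] * (DIM - 1)          # direction carried at the anchor token
CONTEXT = [0.0, 1.0] + [0.0] * (DIM - 2)     # direction visible only with context

def sample(base, n, scale=0.05):
    """Draw n noisy copies of `base` (stand-ins for cached activations)."""
    return [[x + rng.gauss(0.0, scale) for x in base] for _ in range(n)]

def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def fit_probe(flagged, unflagged):
    """Difference-of-means linear probe: score(x) = w.x + b, score > 0 means flagged."""
    mu_f, mu_u = mean(flagged), mean(unflagged)
    w = [f - u for f, u in zip(mu_f, mu_u)]
    b = -sum(wi * (f + u) / 2 for wi, f, u in zip(w, mu_f, mu_u))
    return w, b

def flag_rate(probe, xs):
    """Fraction of vectors in xs that the probe flags."""
    w, b = probe
    return sum(1 for x in xs if sum(wi * xi for wi, xi in zip(w, x)) + b > 0) / len(xs)

# Anchor position: adversarial and clean-positive vectors share one distribution.
clean_anchor_train, clean_anchor_test = sample(FEATURE, 50), sample(FEATURE, 50)
adv_anchor_train, adv_anchor_test = sample(FEATURE, 50), sample(FEATURE, 50)

# Last position: by construction, adversarial vectors also carry CONTEXT.
adv_last_base = [f + c for f, c in zip(FEATURE, CONTEXT)]
clean_last_train, clean_last_test = sample(FEATURE, 50), sample(FEATURE, 50)
adv_last_train, adv_last_test = sample(adv_last_base, 50), sample(adv_last_base, 50)

anchor_cos = cosine(mean(clean_anchor_train), mean(adv_anchor_train))
anchor_probe = fit_probe(adv_anchor_train, clean_anchor_train)
last_probe = fit_probe(adv_last_train, clean_last_train)

anchor_detect = flag_rate(anchor_probe, adv_anchor_test)   # probe fits noise only
last_detect = flag_rate(last_probe, adv_last_test)
last_false_alarm = flag_rate(last_probe, clean_last_test)

print(f"cosine(clean, adversarial) at anchor: {anchor_cos:.3f}")
print(f"anchor probe detection rate: {anchor_detect:.2f}")
print(f"last-token probe detection: {last_detect:.2f}, false alarms: {last_false_alarm:.2f}")
```

On a real model, the vectors would come from cached activations at the chosen token positions (for example via TransformerLens `run_with_cache`) in place of `sample`.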
Preprint → Zenodo · 💻 Experiments code · arXiv submission pending endorsement
- TransformerLens: Contributing unit tests for architecture adapters in the open-source mechanistic interpretability library created by Neel Nanda (3,500+ ★). 5 PRs submitted, 4 merged, covering GPT-Neo, GPT-NeoX, Apple OpenELM, LLaVA-OneVision, and LLaMA. Each PR adds production-quality test coverage (147 tests, 1,286 lines) for AI architectures used by safety researchers at Anthropic, Google DeepMind, and universities worldwide.
- enterprise-rag-patterns: Production RAG architectures from my "GenAI in Production" newsletter. Code derived from real enterprise systems, not demo code.
  - article-01: Intelligent RCA Agent with Claude API + Databricks (80% triage reduction)
  - article-02: RAG Evaluation at scale (proxy metrics, embedding drift detection, labeled eval with Claude as judge)
- ComplianceShield: AI compliance assistant with PII detection, session audit logging, multi-provider LLM support, and HITL review workflows. FastAPI + Streamlit + Docker. Built to solve the trust problem in enterprise AI deployments.
- Working through nanoGPT → micrograd → minbpe (Karpathy's series) to understand transformers from raw math, not API calls
- Studying mechanistic interpretability via TransformerLens contributions, learning how attention heads and MLP circuits encode knowledge
- The LLM observability gap: LLMs are in production everywhere. Engineers have nothing equivalent to what researchers have in TransformerLens. No production-grade "Sentry for model reasoning." That's the problem I keep coming back to.
- Why evaluation is harder than training: Writing the RAG evaluation framework taught me that measuring whether an LLM is correct is a deeper problem than making it correct. Most teams skip it. The ones that don't skip it ship reliable AI.
- Interpretability as infrastructure: TransformerLens contributions are clarifying something: the people who build the measurement tools shape what the whole field builds next. Open-source research infrastructure is underrated leverage.
- Foundations vs. APIs: There's a big difference between engineers who use LLMs and engineers who understand them. Working through micrograd → nanoGPT is making that difference concrete for me.
AI & Research
- Concrete Problems in AI Safety by Amodei et al.
- Attention Is All You Need by Vaswani et al.
- Anthropic's Mechanistic Interpretability papers (superposition, features, circuits)
- Neel Nanda's mech interp blog posts
Engineering
- Designing Machine Learning Systems by Chip Huyen
- Building production RAG systems (hands-on, via enterprise-rag-patterns)
Just for interest
- Thinking, Fast and Slow by Daniel Kahneman
- My wife's opinions on everything (ongoing study, steep difficulty)
⚡ Fun fact: I studied both AI interpretability research infrastructure and enterprise root cause analysis in the same week, and they're the same problem.

