feat: add DeepSeek-V2 / V2-Lite / Coder-V2 architecture adapter by mukund1985 · Pull Request #1408 · TransformerLensOrg/TransformerLens

mukund1985 · 2026-06-18T16:58:32Z

Closes #1400.

Summary

Adds TransformerBridge support for DeepseekV2ForCausalLM (DeepSeek-V2, DeepSeek-V2-Lite, DeepSeek-Coder-V2). The adapter follows the same pattern as the existing V3 adapter but handles three V2-specific differences.

What changed

New files

transformer_lens/model_bridge/supported_architectures/deepseek_v2.py — the adapter
tests/integration/model_bridge/test_deepseek_v2_adapter.py — 17 integration tests covering both V2-full and V2-Lite configs

Modified files

supported_architectures/__init__.py and factories/architecture_adapter_factory.py — register the adapter
generalized_components/mla_attention.py — complex RoPE support (see below)
generalized_components/rotary_embedding.py — complex tensor pass-through

V2-specific differences from V3

1. Complex-exponential RoPE

V3 returns (cos, sin) from its rotary embedding. V2 returns freqs_cis = torch.polar(ones, freqs) — a complex tensor — and applies rotation via complex multiplication rather than the standard (q * cos) + (rotate_half(q) * sin) formula.

Two small changes to shared components:

RotaryEmbeddingBridge.forward(): when the original component returns a complex tensor, pass it through instead of raising. The hook_cos/hook_sin pattern does not apply here.
MLAAttentionBridge.forward(): detects complex position_embeddings and dispatches to a new _apply_rotary_complex() helper that mirrors apply_rotary_emb from the V2 modeling code (view_as_complex → multiply → flatten). The (cos, sin) path for V3 is unchanged. KV-cache kwargs are guarded so cos/sin are only included when present.
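
The complex rotation path described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the function name apply_rotary_complex, the (batch, seq, heads, rope_dim) layout, and the base frequency of 10000 are all assumptions chosen for the example. It shows that multiplying adjacent feature pairs (viewed as complex numbers) by freqs_cis gives the same result as an explicit 2-D rotation by cos and sin on those pairs.

```python
import torch


def apply_rotary_complex(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    """Rotate adjacent feature pairs of x by the angles encoded in freqs_cis.

    Hypothetical helper for illustration only.
    x:         (batch, seq, heads, rope_dim), real, rope_dim even (assumed layout)
    freqs_cis: (seq, rope_dim // 2), complex, unit magnitude
    """
    # Pair up adjacent features and reinterpret each pair as one complex number.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    # Broadcast the per-position rotations over the batch and head dimensions.
    rot = freqs_cis.view(1, x_c.size(1), 1, x_c.size(-1))
    # Complex multiplication rotates each pair; flatten back to real features.
    out = torch.view_as_real(x_c * rot).flatten(3)
    return out.type_as(x)


seq, rope_dim = 5, 8
inv_freq = 1.0 / (10000 ** (torch.arange(0, rope_dim, 2).float() / rope_dim))
angles = torch.outer(torch.arange(seq).float(), inv_freq)  # (seq, rope_dim // 2)
freqs_cis = torch.polar(torch.ones_like(angles), angles)   # complex, |z| == 1

x = torch.randn(2, seq, 3, rope_dim)
rotated = apply_rotary_complex(x, freqs_cis)

# Reference: explicit 2-D rotation applied to the same adjacent pairs.
cos = angles.cos()[None, :, None, :]
sin = angles.sin()[None, :, None, :]
x_even, x_odd = x[..., 0::2], x[..., 1::2]
expected = torch.stack(
    (x_even * cos - x_odd * sin, x_even * sin + x_odd * cos), dim=-1
).flatten(3)
```

Note that this pairs adjacent features, whereas the rotate_half formula pairs feature i with feature i + rope_dim / 2, so the two conventions agree only up to a permutation of the feature dimension.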

2. Optional Q LoRA path (V2-Lite)

V2-Lite sets q_lora_rank=None, skipping Q compression and using q_proj directly. The three Q-path submodules (q_a_proj, q_a_layernorm, q_b_proj) are marked optional=True so bridge setup skips them when absent. q_a_layernorm uses GeneralizedComponent (which already supports optional) rather than RMSNormalizationBridge — the layernorm call happens inside MLAAttentionBridge.forward() via the stored HF module, so a plain wrapper suffices. MLAAttentionBridge already branches on q_lora_rank at runtime.
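
The effect of optional=True can be sketched with plain Python stand-ins. Everything here is hypothetical: SubmoduleSpec and bind_submodules are not TransformerLens APIs, the SimpleNamespace objects stand in for HF attention modules, and o_proj is included only as an example of a required submodule. The sketch shows the intended behavior: optional entries are skipped when the attribute is absent, while a missing required entry raises.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, List


@dataclass
class SubmoduleSpec:
    """Hypothetical description of one submodule the bridge wants to wrap."""
    name: str
    optional: bool = False


def bind_submodules(hf_attn: Any, specs: List[SubmoduleSpec]) -> Dict[str, Any]:
    """Collect the submodules that exist; skip absent ones only if optional."""
    bound: Dict[str, Any] = {}
    for spec in specs:
        module = getattr(hf_attn, spec.name, None)
        if module is None:
            if spec.optional:
                continue  # e.g. the Q LoRA path on a model with q_lora_rank=None
            raise AttributeError(f"required submodule {spec.name!r} is missing")
        bound[spec.name] = module
    return bound


specs = [
    SubmoduleSpec("q_a_proj", optional=True),
    SubmoduleSpec("q_a_layernorm", optional=True),
    SubmoduleSpec("q_b_proj", optional=True),
    SubmoduleSpec("o_proj"),  # illustrative required entry
]

# Stand-ins: a model with the Q LoRA path, and one without it (like V2-Lite).
attn_with_q_lora = SimpleNamespace(
    q_a_proj="q_a", q_a_layernorm="ln", q_b_proj="q_b", o_proj="o"
)
attn_without_q_lora = SimpleNamespace(q_proj="q", o_proj="o")

bound_full = bind_submodules(attn_with_q_lora, specs)
bound_lite = bind_submodules(attn_without_q_lora, specs)
```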

3. Gate not hookable

DeepseekV2Moe.forward() routes via nn.functional.linear(hidden_states, self.gate.weight) rather than self.gate(hidden_states), so the gate module forward is never called and bridge hooks cannot be attached. The gate is omitted from MoEBridge submodules. shared_experts is called via __call__ and hooks correctly.
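
The reason hooks cannot fire can be reproduced with a toy module. ToyMoE below is a hypothetical stand-in, not the DeepSeek-V2 class; it only contrasts the two ways of using the gate. PyTorch forward hooks run inside nn.Module.__call__, so reading self.gate.weight and passing it to nn.functional.linear produces the same numbers without ever triggering the hook.

```python
import torch
from torch import nn

calls = []


class ToyMoE(nn.Module):
    """Toy stand-in that can route through the gate in two ways."""

    def __init__(self, d_model: int = 4, n_experts: int = 3):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, hidden_states: torch.Tensor, use_call: bool) -> torch.Tensor:
        if use_call:
            # Goes through nn.Module.__call__, so forward hooks run.
            return self.gate(hidden_states)
        # Uses the weight tensor directly; the gate module is never called.
        return nn.functional.linear(hidden_states, self.gate.weight)


moe = ToyMoE()
# The hook returns None, so it observes the output without modifying it.
moe.gate.register_forward_hook(lambda module, args, output: calls.append("gate"))

x = torch.randn(2, 4)
via_weight = moe(x, use_call=False)
calls_after_weight_path = len(calls)  # hook has not fired
via_call = moe(x, use_call=True)
calls_after_call_path = len(calls)    # hook fired once
```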

Tests

17 tests, all passing:

TestDeepSeekV2BridgeCreation   — block count, MLA type
TestDeepSeekV2ForwardPass      — output shape, HF output match (atol < 0.15)
TestDeepSeekV2DenseVsMoELayers — dense vs MoE hook presence, mlp hooks all layers
TestDeepSeekV2AttentionHooks   — attn hooks fire all layers, MLA latent hooks
TestDeepSeekV2LiteBridgeCreation — block count, MLA type for Lite
TestDeepSeekV2LiteForwardPass  — output shape, HF match for Lite
TestDeepSeekV2LiteNoQLatentHook — hook_q_latent absent, hook_kv_latent fires, no NaN

…RoPE)

Closes TransformerLensOrg#1400.

DeepSeek-V2, V2-Lite, and Coder-V2 all use DeepseekV2ForCausalLM. This adds a bridge adapter covering three V2-specific differences from V3:

1. Complex-exponential RoPE: V2's rotary embedding returns freqs_cis (a complex tensor via torch.polar) rather than a (cos, sin) tuple.
   - RotaryEmbeddingBridge.forward() now passes complex tensors through without raising, leaving them for the attention bridge to consume.
   - MLAAttentionBridge.forward() detects complex position_embeddings and dispatches to a new _apply_rotary_complex() helper that mirrors DeepSeek-V2's apply_rotary_emb (view_as_complex, multiply, flatten).
2. Optional Q LoRA path: V2-Lite sets q_lora_rank=None, skipping q_a_proj/q_a_layernorm/q_b_proj and using q_proj directly instead. All three Q-path submodules are marked optional=True in the adapter; q_a_layernorm uses GeneralizedComponent (which already supports optional) rather than RMSNormalizationBridge. MLAAttentionBridge already branches on q_lora_rank at runtime.
3. Gate not hookable: DeepseekV2Moe.forward() routes via nn.functional.linear(..., self.gate.weight) rather than self.gate(hidden_states), so the gate module's forward() is never called and bridge hooks cannot fire. The gate is omitted from MoEBridge submodules; shared_experts uses __call__ and hooks fine.

Files changed:
- supported_architectures/deepseek_v2.py (new)
- supported_architectures/__init__.py: register adapter
- factories/architecture_adapter_factory.py: map DeepseekV2ForCausalLM
- generalized_components/mla_attention.py: complex RoPE support
- generalized_components/rotary_embedding.py: complex tensor pass-through
- tests/integration/model_bridge/test_deepseek_v2_adapter.py (new, 17 tests)

mukund1985 · 2026-06-18T17:03:00Z

@jlarson4 tagging you as you have been reviewing the related MLA/bridge work — happy to take any feedback on the approach here, particularly the complex RoPE handling in MLAAttentionBridge.

jlarson4 · 2026-06-18T17:07:01Z

I will review this later today! In the future, would you please comment on issues you wish to take on? This allows other contributors to see what you are working on so we aren't repeatedly duplicating efforts and allows me to get you properly assigned to the issue.

jlarson4 · 2026-06-19T02:41:25Z

@mukund1985 This looks good to me. Excellent solution for branching the two different variations of the rotary implementation.

I can merge this once the formatting CI issues are cleared up. Please run make check-format and uv run mypy . on your code when you have a moment.

mukund1985 · 2026-06-19T04:39:51Z

@jlarson4 done — both addressed in commit 96fd927:

make check-format: black flagged spacing and line wraps in test_deepseek_v2_adapter.py and deepseek_v2.py, fixed via make format.
uv run mypy .: one error in rotary_embedding.py — forward() was annotated -> Tuple[Tensor, Tensor] but the V2 path legitimately returns a single complex freqs_cis tensor. Widened to Union[Tuple[torch.Tensor, torch.Tensor], torch.Tensor] and updated the docstring. Both checks pass clean now.
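
The widened annotation can be illustrated with a small sketch. The names RotaryOutput, rotary_forward, and describe are hypothetical and exist only for this example; the real signature lives in rotary_embedding.py. The point is that a caller receiving the Union type has to check for a complex tensor before unpacking a (cos, sin) tuple.

```python
from typing import Tuple, Union

import torch

# Hypothetical alias for the widened return type described above.
RotaryOutput = Union[Tuple[torch.Tensor, torch.Tensor], torch.Tensor]


def rotary_forward(angles: torch.Tensor, complex_form: bool) -> RotaryOutput:
    """Return either a single complex tensor or a (cos, sin) tuple."""
    if complex_form:
        return torch.polar(torch.ones_like(angles), angles)
    return angles.cos(), angles.sin()


def describe(position_embeddings: RotaryOutput) -> str:
    """Dispatch on the runtime form of the position embeddings."""
    if isinstance(position_embeddings, torch.Tensor) and torch.is_complex(
        position_embeddings
    ):
        return "complex"
    cos, sin = position_embeddings  # safe: the tuple form has two entries
    return "cos_sin"


angles = torch.linspace(0.0, 1.0, steps=4)
kind_complex = describe(rotary_forward(angles, complex_form=True))
kind_tuple = describe(rotary_forward(angles, complex_form=False))
```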

mukund1985 · 2026-06-19T04:41:23Z

@jlarson4 format and type checks are both green. The one failing check — Notebook Checks (Attribution_Patching_Demo) — does not involve any file this PR touches. Looks like a pre-existing failure in dev, possibly related to the changes merged with #1398 yesterday.

mukund1985 · 2026-06-19T04:50:05Z

@jlarson4 I just dug into the Attribution_Patching_Demo failure. Cell 3 fails because the statement "from transformer_lens.model_bridge import TransformerBridge" now writes to stderr, and nbval flags any unexpected stderr as a failure. The stderr is:

<frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
<frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

These are SwigPy C-extension deprecation warnings — not from any code in this PR. I confirmed they reproduce identically on dev itself (without our changes), so this is a pre-existing issue in the base branch, not introduced here.

style: fix formatting and mypy errors

96fd927

mukund1985 mentioned this pull request Jun 19, 2026

[Proposal] Add DeepSeek-V2 architecture adapter (DeepseekV2ForCausalLM) #1400

Open

1 task

mukund1985 wants to merge 2 commits into TransformerLensOrg:dev from mukund1985:feat/deepseek-v2-adapter
