Skip to content

fix(models): SmolVLA expert uses SiLU (matches SmolLM2 reference), not gelu-tanh#241

Open
rylinjames wants to merge 1 commit into
mainfrom
fix/smolvla-expert-silu-activation
Open

fix(models): SmolVLA expert uses SiLU (matches SmolLM2 reference), not gelu-tanh#241
rylinjames wants to merge 1 commit into
mainfrom
fix/smolvla-expert-silu-activation

Conversation

@rylinjames

Copy link
Copy Markdown
Collaborator

Bug

ExpertGQALayer.forward (the SmolVLA-family expert layer in src/tether/models/heads/expert_stack.py:212) was calling F.gelu(..., approximate="tanh") -- Gemma's MLP activation -- in the gated FFN:

# BEFORE (wrong)
return res + self.down_proj(F.gelu(self.gate_proj(x), approximate="tanh") * self.up_proj(x))

The SmolVLA expert is built from config.text_config, which is a LlamaConfig / SmolLM2Config. Its hidden_act defaults to "silu" (the Llama/SmolLM2 standard). The in-code comment acknowledged "SmolVLA / SmolLM2 uses silu" but then justified using gelu-tanh as "a different family" -- that justification is wrong. The ExpertGQALayer IS the SmolLM2 text stack expert; using gelu-tanh corrupts parity on the decomposed SmolVLA export path.

The monolithic path (which wraps lerobot directly) is unaffected.

Reference evidence

  • transformers SmolLM2Config (inherits LlamaConfig): hidden_act = "silu" (Llama default)
  • lerobot smolvlm_with_expert.py: expert MLP built from text_config, activation path is silu
  • transformers GemmaConfig: hidden_act = "gelu_pytorch_tanh" -- this is where gelu-tanh belongs

Fix

Changed line 212 from F.gelu(self.gate_proj(x), approximate="tanh") to F.silu(self.gate_proj(x)) and updated the comment to state the expert uses SiLU (matching SmolLM2/LlamaConfig).

# AFTER (correct)
return res + self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

Scoping note

Pi05ExpertGQALayer (pi0.5, Gemma-based) at line 434 intentionally keeps gelu-tanh -- it is correct for the Gemma family. That layer is NOT touched.

Test

tests/test_smolvla_expert_silu_activation.py -- 5 tests, no lerobot/model download required:

  • test_mlp_matches_silu_reference: instantiates ExpertGQALayer with tiny dimensions (hidden=8), runs a forward pass, asserts output matches the SiLU reference formula and does NOT match the old gelu-tanh formula
  • test_silu_and_geluTanh_differ_on_known_input: sanity-checks that silu and gelu-tanh are not equivalent on the test input (ensures the discriminating test above is non-vacuous)
  • test_expert_gqa_layer_no_gelu_tanh: source guard -- ExpertGQALayer.forward must not contain approximate="tanh"
  • test_expert_gqa_layer_uses_silu: source guard -- ExpertGQALayer.forward must call F.silu
  • test_pi05_expert_still_uses_gelu_tanh: source guard -- Pi05ExpertGQALayer.forward still has approximate="tanh" (Gemma, intentional)

All 5 pass. py_compile and ruff clean on both touched files.

🤖 Generated with Claude Code

…t gelu-tanh

- ExpertGQALayer.forward line 212 was calling F.gelu(..., approximate="tanh")
  (Gemma's activation) despite the SmolVLA expert being built from
  config.text_config (a LlamaConfig/SmolLM2Config) whose hidden_act defaults
  to "silu". This corrupted parity on the decomposed SmolVLA export path.
- Fixed by replacing F.gelu(self.gate_proj(x), approximate="tanh") with
  F.silu(self.gate_proj(x)) and correcting the now-inaccurate comment.
- Pi05ExpertGQALayer (Gemma-based pi0.5 expert) intentionally retains
  gelu-tanh -- not touched.
- Added tests/test_smolvla_expert_silu_activation.py with 5 tests:
  forward-pass matches SiLU reference, silu vs gelu-tanh sanity check,
  source-level guards asserting ExpertGQALayer has no gelu-tanh and
  Pi05ExpertGQALayer still does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant