fix(models): SmolVLA expert uses SiLU (matches SmolLM2 reference), not gelu-tanh by rylinjames · Pull Request #241 · FastCrest/tether

rylinjames · 2026-06-11T05:24:31Z

Bug

ExpertGQALayer.forward (the SmolVLA-family expert layer in src/tether/models/heads/expert_stack.py:212) was calling F.gelu(..., approximate="tanh") -- Gemma's MLP activation -- in the gated FFN:

# BEFORE (wrong)
return res + self.down_proj(F.gelu(self.gate_proj(x), approximate="tanh") * self.up_proj(x))

The SmolVLA expert is built from config.text_config, which is a LlamaConfig / SmolLM2Config. Its hidden_act defaults to "silu" (the Llama/SmolLM2 standard). The in-code comment acknowledged "SmolVLA / SmolLM2 uses silu" but then justified using gelu-tanh as "a different family" -- that justification is wrong. The ExpertGQALayer IS the SmolLM2 text stack expert; using gelu-tanh corrupts parity on the decomposed SmolVLA export path.

The monolithic path (which wraps lerobot directly) is unaffected.

Reference evidence

transformers SmolLM2Config (inherits LlamaConfig): hidden_act = "silu" (Llama default)
lerobot smolvlm_with_expert.py: expert MLP built from text_config, activation path is silu
transformers GemmaConfig: hidden_act = "gelu_pytorch_tanh" -- this is where gelu-tanh belongs

Fix

Changed line 212 from F.gelu(self.gate_proj(x), approximate="tanh") to F.silu(self.gate_proj(x)) and updated the comment to state the expert uses SiLU (matching SmolLM2/LlamaConfig).

# AFTER (correct)
return res + self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

Scoping note

Pi05ExpertGQALayer (pi0.5, Gemma-based) at line 434 intentionally keeps gelu-tanh -- it is correct for the Gemma family. That layer is NOT touched.

Test

tests/test_smolvla_expert_silu_activation.py -- 5 tests, no lerobot/model download required:

test_mlp_matches_silu_reference: instantiates ExpertGQALayer with tiny dimensions (hidden=8), runs a forward pass, asserts output matches the SiLU reference formula and does NOT match the old gelu-tanh formula
test_silu_and_geluTanh_differ_on_known_input: sanity-checks that silu and gelu-tanh are not equivalent on the test input (ensures the discriminating test above is non-vacuous)
test_expert_gqa_layer_no_gelu_tanh: source guard -- ExpertGQALayer.forward must not contain approximate="tanh"
test_expert_gqa_layer_uses_silu: source guard -- ExpertGQALayer.forward must call F.silu
test_pi05_expert_still_uses_gelu_tanh: source guard -- Pi05ExpertGQALayer.forward still has approximate="tanh" (Gemma, intentional)

All 5 pass. py_compile and ruff clean on both touched files.

🤖 Generated with Claude Code

…t gelu-tanh - ExpertGQALayer.forward line 212 was calling F.gelu(..., approximate="tanh") (Gemma's activation) despite the SmolVLA expert being built from config.text_config (a LlamaConfig/SmolLM2Config) whose hidden_act defaults to "silu". This corrupted parity on the decomposed SmolVLA export path. - Fixed by replacing F.gelu(self.gate_proj(x), approximate="tanh") with F.silu(self.gate_proj(x)) and correcting the now-inaccurate comment. - Pi05ExpertGQALayer (Gemma-based pi0.5 expert) intentionally retains gelu-tanh -- not touched. - Added tests/test_smolvla_expert_silu_activation.py with 5 tests: forward-pass matches SiLU reference, silu vs gelu-tanh sanity check, source-level guards asserting ExpertGQALayer has no gelu-tanh and Pi05ExpertGQALayer still does. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(models): SmolVLA expert uses SiLU (matches SmolLM2 reference), not gelu-tanh#241

fix(models): SmolVLA expert uses SiLU (matches SmolLM2 reference), not gelu-tanh#241
rylinjames wants to merge 1 commit into
mainfrom
fix/smolvla-expert-silu-activation

rylinjames commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rylinjames commented Jun 11, 2026

Bug

Reference evidence

Fix

Scoping note

Test

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant