fix(models): SmolVLA expert uses SiLU (matches SmolLM2 reference), not gelu-tanh #241
Open
rylinjames wants to merge 1 commit into
…t gelu-tanh

- ExpertGQALayer.forward line 212 was calling F.gelu(..., approximate="tanh") (Gemma's activation) despite the SmolVLA expert being built from config.text_config (a LlamaConfig/SmolLM2Config) whose hidden_act defaults to "silu". This corrupted parity on the decomposed SmolVLA export path.
- Fixed by replacing F.gelu(self.gate_proj(x), approximate="tanh") with F.silu(self.gate_proj(x)) and correcting the now-inaccurate comment.
- Pi05ExpertGQALayer (Gemma-based pi0.5 expert) intentionally retains gelu-tanh -- not touched.
- Added tests/test_smolvla_expert_silu_activation.py with 5 tests: forward pass matches the SiLU reference, silu vs gelu-tanh sanity check, and source-level guards asserting that ExpertGQALayer has no gelu-tanh and that Pi05ExpertGQALayer still does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug
`ExpertGQALayer.forward` (the SmolVLA-family expert layer in `src/tether/models/heads/expert_stack.py:212`) was calling `F.gelu(..., approximate="tanh")` -- Gemma's MLP activation -- in the gated FFN.

The SmolVLA expert is built from `config.text_config`, which is a `LlamaConfig`/`SmolLM2Config`. Its `hidden_act` defaults to `"silu"` (the Llama/SmolLM2 standard). The in-code comment acknowledged "SmolVLA / SmolLM2 uses silu" but then justified using gelu-tanh as "a different family" -- that justification is wrong. The `ExpertGQALayer` IS the SmolLM2 text stack expert; using gelu-tanh corrupts parity on the decomposed SmolVLA export path.

The monolithic path (which wraps lerobot directly) is unaffected.
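
To make the size of the mismatch concrete, here is a minimal scalar sketch of a gated FFN evaluated with each activation. It is pure Python so it runs without torch. The `w_up` and `w_down` weights and the `gated_ffn_scalar` helper are assumptions for illustration (a standard Llama-style gated MLP shape), not code from `expert_stack.py`; only `gate_proj` and the two activation names come from this PR.

```python
import math


def silu(x: float) -> float:
    """SiLU / swish: x * sigmoid(x). This is what hidden_act="silu" means."""
    return x / (1.0 + math.exp(-x))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU, the formula behind approximate="tanh"."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gated_ffn_scalar(x, w_gate, w_up, w_down, act):
    """Hypothetical 1-D gated FFN: down(act(gate(x)) * up(x)).

    The weights are plain floats standing in for the gate/up/down
    projections; the real layer uses linear layers on tensors.
    """
    return w_down * (act(w_gate * x) * (w_up * x))


x = 1.0
silu_out = gated_ffn_scalar(x, 1.0, 1.0, 1.0, silu)
gelu_out = gated_ffn_scalar(x, 1.0, 1.0, 1.0, gelu_tanh)

# With identical weights the two activations give visibly different outputs,
# which is why the wrong activation breaks numerical parity.
print(f"silu: {silu_out:.4f}  gelu-tanh: {gelu_out:.4f}")
```

At `x = 1.0` the two outputs differ by roughly 0.11, far above any floating-point tolerance a parity check would use. The scalar `silu` here is not guarded against overflow for very negative inputs; it is only meant for small illustrative values.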
Reference evidence
- `transformers` `SmolLM2Config` (inherits `LlamaConfig`): `hidden_act = "silu"` (Llama default)
- `smolvlm_with_expert.py`: expert MLP built from `text_config`, activation path is `silu`
- `transformers` `GemmaConfig`: `hidden_act = "gelu_pytorch_tanh"` -- this is where gelu-tanh belongs

Fix
Changed line 212 from `F.gelu(self.gate_proj(x), approximate="tanh")` to `F.silu(self.gate_proj(x))` and updated the comment to state the expert uses SiLU (matching SmolLM2/`LlamaConfig`).

Scoping note
`Pi05ExpertGQALayer` (pi0.5, Gemma-based) at line 434 intentionally keeps `gelu-tanh` -- it is correct for the Gemma family. That layer is NOT touched.

Test
`tests/test_smolvla_expert_silu_activation.py` -- 5 tests, no lerobot/model download required:

- `test_mlp_matches_silu_reference`: instantiates `ExpertGQALayer` with tiny dimensions (hidden=8), runs a forward pass, asserts output matches the SiLU reference formula and does NOT match the old gelu-tanh formula
- `test_silu_and_geluTanh_differ_on_known_input`: sanity-checks that silu and gelu-tanh are not equivalent on the test input (ensures the discriminating test above is non-vacuous)
- `test_expert_gqa_layer_no_gelu_tanh`: source guard -- `ExpertGQALayer.forward` must not contain `approximate="tanh"`
- `test_expert_gqa_layer_uses_silu`: source guard -- `ExpertGQALayer.forward` must call `F.silu`
- `test_pi05_expert_still_uses_gelu_tanh`: source guard -- `Pi05ExpertGQALayer.forward` still has `approximate="tanh"` (Gemma, intentional)

All 5 pass. `py_compile` and `ruff` clean on both touched files.

🤖 Generated with Claude Code
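
For reviewers unfamiliar with source-level guards, the sketch below shows one way such a check could work, using only the standard library. It is not the code in `tests/test_smolvla_expert_silu_activation.py`: the `MODULE_SOURCE` string is a hypothetical stand-in for `expert_stack.py`, the `method_source` helper is invented for illustration, and the actual tests may read the source differently (for example via `inspect.getsource`).

```python
import ast

# Hypothetical stand-in for the module under test. The real file is
# src/tether/models/heads/expert_stack.py and is not reproduced here.
MODULE_SOURCE = '''
class ExpertGQALayer:
    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class Pi05ExpertGQALayer:
    def forward(self, x):
        gate = F.gelu(self.gate_proj(x), approximate="tanh")
        return self.down_proj(gate * self.up_proj(x))
'''


def method_source(module_source: str, class_name: str, method_name: str) -> str:
    """Return the source text of one method, located by exact class name."""
    tree = ast.parse(module_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                if isinstance(item, ast.FunctionDef) and item.name == method_name:
                    segment = ast.get_source_segment(module_source, item)
                    if segment is not None:
                        return segment
    raise LookupError(f"{class_name}.{method_name} not found")


expert_forward = method_source(MODULE_SOURCE, "ExpertGQALayer", "forward")
pi05_forward = method_source(MODULE_SOURCE, "Pi05ExpertGQALayer", "forward")

# The three guards described in the test list above.
assert 'approximate="tanh"' not in expert_forward
assert "F.silu" in expert_forward
assert 'approximate="tanh"' in pi05_forward
```

Matching on the exact class name matters here because `Pi05ExpertGQALayer` contains `ExpertGQALayer` as a substring; a plain text search over the whole file would not distinguish the two layers.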