
Conversation

@JiriesKaileh (Collaborator) commented Nov 13, 2025

Summary:
This PR introduces the JAX implementation for the text portion of the Llama-Guard-4-12B model and configures its end-to-end continuous integration (CI) pipeline. The implementation achieves parity with the PyTorch/TorchAX baseline.

Relevant bugs:

b/439655882

Performance Testing and Verification

The JAX implementation now runs consistently faster than the PyTorch baseline, with higher output throughput and lower TTFT latency.

| Metric | JAX Run (Final) | TorchAX Run (Baseline) | Analysis |
| --- | --- | --- | --- |
| Output Throughput (tok/s) | 573.00 | 563.03 | JAX is 1.02x faster. |
| Mean TTFT (ms) | 1,429.90 | 1,480.30 | JAX is faster; initial latency is consistently lower. |
| Mean TPOT (ms) | 89.18 | 88.49 | Parity; token generation speed is aligned between the two backends. |
| Benchmark Duration (s) | 2.62 | 2.66 | JAX is faster; total execution time is shorter. |
| Accuracy | 31.43% | 31.43% | Parity; accuracies are identical for the same subset of prompts. |
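
For reference, TTFT, TPOT, and output throughput in serving benchmarks like the one above are typically related as sketched below. This is an illustrative sketch of the standard definitions, not the PR's benchmark code, and the numbers are made up (not taken from this run):

```python
# Illustrative sketch of common serving-benchmark metric definitions.
# Not the actual benchmark code from this PR; numbers below are made up.

def tpot_ms(e2e_latency_ms: float, ttft_ms: float, output_tokens: int) -> float:
    """Time per output token: decode time spread over tokens after the first."""
    return (e2e_latency_ms - ttft_ms) / (output_tokens - 1)

def output_throughput(total_output_tokens: int, duration_s: float) -> float:
    """Aggregate output tokens per second over the whole benchmark run."""
    return total_output_tokens / duration_s

print(tpot_ms(10000.0, 1000.0, 101))              # 90.0 (ms per token)
print(round(output_throughput(1500, 2.62), 2))    # 572.52 (tok/s)
```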

See meta-llama_Llama-Guard-4-12B.yml for the unit, integration, and performance tests.

…output of the tokenizer.encode() call in the inference script
…_template, removing the need for the .jinja file
…tances from Llama Guard 4 subclass initializations to comply with unit tests
…ile and replaced the model_loader registry key name for Llama Guard 4 with a recognized text only model
@github-actions

Description

Start with a short description of what the PR does and how this is a change from
the past.

The rest of the description should include relevant details and context, for example:

  • why is this change being made,
  • the problem being solved and any relevant context,
  • why this is a good solution,
  • some information about the specific implementation,
  • shortcomings of the solution and possible future improvements.

If the change fixes a bug or a Github issue, please include a link, e.g.,:
FIXES: b/123456
FIXES: #123456

Tests

Please describe how you tested this change, and include any instructions and/or
commands to reproduce.

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

@jrplatin (Collaborator)

Please update the title + add accuracy / performance testing in the description too

@JiriesKaileh JiriesKaileh changed the title Jiries/llama guard 4 text feat(llama_guard): Add JAX Llama-Guard-4-12B Text Portion and Achieve Torchax Performance/Accuracy Parity Nov 14, 2025
@jrplatin jrplatin changed the title feat(llama_guard): Add JAX Llama-Guard-4-12B Text Portion and Achieve Torchax Performance/Accuracy Parity [Llama4 Guard]: Add JAX Llama-Guard-4-12B Text Portion Nov 14, 2025
@jrplatin jrplatin changed the title [Llama4 Guard]: Add JAX Llama-Guard-4-12B Text Portion [Llama4 Guard] Add JAX Llama-Guard-4-12B Text Portion Nov 14, 2025
…verride in CI scripts, and simplified prompt formatting in offline inference script
…nference script, and made minor change to CI scripts to prevent breaking CI
…scripts. Still need to resolve dataset origin issue and modify buildkite yml to reflect changes
Signed-off-by: JiriesKaileh <jiries@google.com>
@JiriesKaileh JiriesKaileh force-pushed the jiries/llama-guard-4-text branch from 6fbe2ba to c57c80a Compare November 20, 2025 03:24
Signed-off-by: JiriesKaileh <jiries@google.com>
@JiriesKaileh JiriesKaileh merged commit 3a9e2d4 into main Nov 20, 2025
2 of 3 checks passed
echo -e "\n--- Running Accuracy Check (Mode: ACCURACY) ---"

CONFTEST_DIR="/workspace/tpu-inference/scripts/vllm/integration"
CONFTEST_DIR="/mnt/disks/jiries-disk_data/tpu-inference/scripts/vllm/integration"
Collaborator

Hi @JiriesKaileh, is this for your local dev environment? I don't think this works for the agents that are running the CI: link

@JiriesKaileh (Collaborator, Author) Nov 20, 2025

Yes, line 136 should be removed. Let me open a PR for that. Thank you for pointing this out.
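
A common way to avoid hardcoding a local dev path like this in CI scripts is an environment-variable override with a CI-safe default. A minimal sketch, where the `CONFTEST_DIR_OVERRIDE` variable name is hypothetical and not from this PR:

```shell
# Use the CI workspace path unless the developer explicitly overrides it.
# CONFTEST_DIR_OVERRIDE is a hypothetical variable name for illustration.
CONFTEST_DIR="${CONFTEST_DIR_OVERRIDE:-/workspace/tpu-inference/scripts/vllm/integration}"
echo "Using conftest dir: $CONFTEST_DIR"
```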

