feat: @mkopcins/gemma4 by mkopcins · Pull Request #1162 · software-mansion/react-native-executorch

mkopcins · 2026-05-21T07:57:28Z

Description

Introduces a breaking change?

Yes
No

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

Test by running apps/llm app on llm screen (for text only model) and multimodal screen (for audio-vision-text model). Text model should work as any other llm model. Multimodal can process up-to-30sec audio chunks as well as image inputs, should be able to transcribe audio, describe pictures or similar.

Screenshots

Related issues

#1062

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

benITo47 · 2026-06-02T09:02:09Z

What was changed in the binaries? Asking for the record.

Adding Vulkan to RNE

msluszniak · 2026-06-02T09:15:48Z

-        inputs.push_back(llm::make_text_input(prompt.substr(searchPos)));
-      }
+  size_t imageIdx = 0, audioIdx = 0, pos = 0;
+  constexpr int32_t kAudioSampleRate = 16000;


What if in the future some other model will accept different sample rate or multiple sample rates? Every time we add such model, we will need to break this file that suppose to be general, for every llm. This is a code smell.

msluszniak · 2026-06-02T09:23:43Z

+  std::string generateMultimodal(
+      std::string prompt, std::shared_ptr<jsi::Function> callback,
+      std::vector<std::string> imagePaths = {}, std::string imageToken = "",
+      std::vector<std::vector<float>> audioWaveforms = {},


Again, what if in the future multimodal llm will accept video as an input, we need to add another positional argument and break common header. Bad code design that was there and we just extend it. But to be fair, if we plan to refactor whole codebase using this approach proposed by Bartek: see #1208, then I think it doesn't matter as much as if we would keep this codebase as standalone one.

msluszniak · 2026-06-02T09:24:52Z

+  return module_->is_method_loaded(kAudioEncoderMethod);
+}
+
+int32_t AudioEncoder::encoderTokenCount() const { return last_token_count_; }


Probably noexcept

msluszniak · 2026-06-02T09:26:13Z

+      static_cast<long long>(n_valid), static_cast<long long>(k_blocks),
+      static_cast<long long>(kAudioBlockKMin),
+      static_cast<long long>(kAudioBlockKMax),
+      static_cast<int>(kSamplesPerBlock),


Why do you mix long long and int64_t?

msluszniak · 2026-06-02T09:41:12Z

+  inline ::executorch::runtime::Result<::executorch::aten::Tensor>
+  decode(const ::executorch::runtime::EValue &embeddings,
+         const ::executorch::runtime::EValue &ple_tok, int64_t start_pos) {
+    auto start_pos_tensor = ::executorch::extension::from_blob(
+        &start_pos, {1}, ::executorch::aten::ScalarType::Long);
+    auto outputs_result = module_->execute(
+        kTextModelMethod, {embeddings, ple_tok, start_pos_tensor});
+    if (!outputs_result.ok()) {
+      return outputs_result.error();
+    }
+    auto &outputs = *outputs_result;
+    ET_CHECK_MSG(outputs.size() == 1,
+                 "Expected 1 output from text_decoder, got %zu",
+                 outputs.size());
+    ET_CHECK_MSG(outputs[0].isTensor(), "text_decoder output is not a tensor");
+    return outputs[0].toTensor();
+  }


Similar to previous comments, PLE is related to gemma 4 or similar models. If there will be gemma 5 with another quirks, then we need to break this contract again.

msluszniak · 2026-06-02T09:46:53Z

+      std::make_unique<ProbIndex<T>[]>(vocab_size_);
+  T sum = 0;
+  for (int i = 0; i < vocab_size_; i++) {
+    T e = static_cast<T>(expf(static_cast<float>(logits[i] - max_val)));


Suggested change

T e = static_cast<T>(expf(static_cast<float>(logits[i] - max_val)));

T e = static_cast<T>(std::expf(static_cast<float>(logits[i] - max_val)));

msluszniak · 2026-06-02T09:47:21Z

+  if (sum <= T(0)) {
+    return;
+  }
+  for (int i = 0; i < vocab_size_; i++) {


And fix other places as such

Suggested change

for (int i = 0; i < vocab_size_; i++) {

for (size_t i = 0; i < vocab_size_; i++) {

msluszniak · 2026-06-02T09:54:49Z

+Sampler::Sampler(int vocab_size, float temperature, float topp, int32_t topk,
+                 unsigned long long rng_seed)
+    : vocab_size_(vocab_size),
+      inv_temperature_((temperature != 0.0f) ? (1.0f / temperature) : 0.0f),
+      topp_(topp), min_p_(0.0f), repetition_penalty_(1.0f), topk_(topk),
      rng_state_(rng_seed) {}

+Sampler::Sampler(int vocab_size, float temperature, float topp, int32_t topk)
+    : Sampler(vocab_size, temperature, topp, topk, std::time(nullptr)) {}
+
+Sampler::Sampler(int vocab_size, float temperature, float topp,
+                 unsigned long long rng_seed)
+    : Sampler(vocab_size, temperature, topp, 0, rng_seed) {}
+
 Sampler::Sampler(int vocab_size, float temperature, float topp)


Poor design, can't we somehow avoid 5 different constructors which take or not some subset of arguments?

msluszniak · 2026-06-02T10:01:57Z

+const AUDIO_SAMPLES_PER_BLOCK = 7680;
+const AUDIO_TOKENS_PER_BLOCK = 12;


Similar to cpp, LLMController is a general file for LLMs, but AUDIO_SAMPLES_PER_BLOCK = 7680 is value specific for gemma 4, shouldn't be there.

yes, absolutely agreed, those are leftovers from iterating and I missed them in self-review

msluszniak

CI needs to be fixed. Also please add testing instructions.

msluszniak · 2026-06-02T10:26:04Z

+  for (const auto &input : inputs) {
+    if (input.is_image()) {
+      ET_CHECK_OR_RETURN_ERROR(image_encoder_ != nullptr, InvalidState,
+                               "No image encoder registered");
+      const int32_t num_visual = image_encoder_->encoderTokenCount();
+      ET_CHECK_OR_RETURN_ERROR(num_visual > 0, InvalidState,
+                               "Image encoder reports 0 visual tokens");
+      image_slots.push_back(ImageSlot{&input, static_cast<int64_t>(ids.size()),
+                                      static_cast<int64_t>(num_visual)});
+      ids.insert(ids.end(), static_cast<size_t>(num_visual), 0);
+    } else if (input.is_audio()) {
+      ET_CHECK_OR_RETURN_ERROR(audio_encoder_ != nullptr, InvalidState,
+                               "No audio encoder registered");
+      const long t_aud_begin = time_in_ms();
+      auto enc = audio_encoder_->encode(input);
+      ET_CHECK_OK_OR_RETURN_ERROR(enc.error(), "Audio encoding failed");
+      audio_encode_ms += time_in_ms() - t_aud_begin;
+      audio_calls += 1;
+      // Snapshot the encoder output NOW — see AudioSlot comment above for
+      // why the returned EValue's tensor metadata can't survive past the
+      // next module_->execute(). num_audio and audio_hidden are read from


As we dicussed, this must be refactored. It is hard to read, functions are way too long. Here each if block might be a separate function. Please create issue for the refactor if the multimodal prefiller if you don't want to handle this in the PR.

mkopcins force-pushed the @mkopcins/gemma4 branch 4 times, most recently from b2b6a24 to bf62c0b Compare May 25, 2026 09:18

mkopcins added 3 commits May 25, 2026 11:34

gemma4 audio

668c37c

gemma_vulkan

27776de

uploaded production gemma with vulkan

938bc11

mkopcins force-pushed the @mkopcins/gemma4 branch from bf62c0b to 938bc11 Compare May 25, 2026 09:41

code cleanup

66b3d24

mkopcins force-pushed the @mkopcins/gemma4 branch from ec22b0d to 66b3d24 Compare June 1, 2026 09:30

cleanup

70f1e79

msluszniak marked this pull request as ready for review June 1, 2026 15:32

msluszniak assigned mkopcins Jun 1, 2026

msluszniak added the feature PRs that implement a new feature label Jun 1, 2026

msluszniak linked an issue Jun 1, 2026 that may be closed by this pull request

Gemma4 support #1062

Open

msluszniak self-requested a review June 1, 2026 15:33

cleanup

a606c79

mkopcins force-pushed the @mkopcins/gemma4 branch from 12f147e to a606c79 Compare June 2, 2026 08:16

benITo47 reviewed Jun 2, 2026

View reviewed changes

msluszniak requested changes Jun 2, 2026

View reviewed changes

	T e = static_cast<T>(expf(static_cast<float>(logits[i] - max_val)));
	T e = static_cast<T>(std::expf(static_cast<float>(logits[i] - max_val)));

	for (int i = 0; i < vocab_size_; i++) {
	for (size_t i = 0; i < vocab_size_; i++) {

		const AUDIO_SAMPLES_PER_BLOCK = 7680;
		const AUDIO_TOKENS_PER_BLOCK = 12;

Conversation

mkopcins commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Introduces a breaking change?

Type of change

Tested on

Testing instructions

Screenshots

Related issues

Checklist

Additional notes

Uh oh!

benITo47 Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

msluszniak Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

msluszniak Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

msluszniak Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

msluszniak Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

msluszniak Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

msluszniak Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

msluszniak Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

msluszniak Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

msluszniak Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

msluszniak Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mkopcins Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

msluszniak left a comment

Choose a reason for hiding this comment

Uh oh!

msluszniak Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mkopcins commented May 21, 2026 •

edited

Loading

msluszniak Jun 2, 2026 •

edited

Loading

msluszniak Jun 2, 2026 •

edited

Loading

msluszniak Jun 2, 2026 •

edited

Loading

msluszniak Jun 2, 2026 •

edited

Loading

msluszniak Jun 2, 2026 •

edited

Loading

msluszniak Jun 2, 2026 •

edited

Loading