xenova commented on Jul 31, 2025

This is the official, long-awaited PR that introduces Transformers.js V4. Although it's currently still in draft mode, I'll be posting updates here for early review!

See benchmarks

https://huggingface.co/onnx-community/all-MiniLM-L6-v2-ONNX:

(benchmark screenshot)

https://huggingface.co/onnx-community/bge-base-en-v1.5-ONNX:

(benchmark screenshot)

xenova and others added 30 commits December 23, 2024 14:10
* ONNX Runtime improvements (experimental native webgpu; fix iOS) (#1231)

* customize the wasm paths

* update implementation

* allow using 'webgpu' in nodejs binding

* update version of onnxruntime-node

* Upgrade onnxruntime-web to same version as onnxruntime-node

* Update list of supported devices

---------

Co-authored-by: Joshua Lochner <26504141+xenova@users.noreply.github.com>

* customize the wasm paths (#1250)

* customize the wasm paths

* update implementation

* [internal] Add is_decoder option to session retrieval for preferred output location

* Update tests

* Formatting

* Bump ort versions

* Bump onnxruntime-node version

* Bump versions

* Bump ORT versions

* Bump versions

* Only check webgpu fp16 for non-node environments

* Fix

* Assume node supports webgpu

* Update ORT node support comment

* Relax test strictness

* Update conversion script versions

* Downgrade onnxslim

* cleanup

* Update package-lock.json

* Update onnxruntime versions

* Update post-build script

* Use built-in session release function

* Call garbage collection after each tokenizer test

* Do not double-throw error

* Fix race-condition in build process with file removal

* Update versions

* Bump jinja version

* [version] Update to 3.6.3

* Bump jinja version to support new features

* [version] Update to 3.6.3

* Add support for LFM2 models (#1367)

* Use prefix in lfm2 output location (#1369)

* Update package-lock.json

* Run `npm audit fix`

* Add special tokens in text-generation pipeline if tokenizer requires (#1370)

* Add special tokens in text-generation pipeline if tokenizer requires

* Fix logits processors tests

* Update bundles.test.js

* Update comment

* Formatting

* Add support for ModernBERT Decoder (#1371)

* Use from/to buffer instead of string

Actually fixes #1343

* Add support for Voxtral (#1373)

* Support longform voxtral processing (#1375)

* [version] Update to 3.7.0

* Add support for Arcee (#1377)

* Optimize tensor.slice() (#1381)

* Optimize tensor.slice()

The performance of `tensor.slice()` is very poor, especially for the 'logits' tensor, whose dimensions are large.

```
const logits = outputs.logits.slice(null, -1, null);
```

This is because the current implementation of the `slice` method manually iterates through every element and computes its index, which is very time-consuming when the tensor shape is large.

For cases like `slice(null, -1, null)`, the slicing operation is contiguous along certain dimensions and can be optimized into bulk copies using `TypedArray.subarray()` and `TypedArray.set()`; a minimal sketch of this idea follows the commit notes below.

* nit

* Add a few more tensor slice unit tests

---------

Co-authored-by: Joshua Lochner <26504141+xenova@users.noreply.github.com>
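
To illustrate the bulk-copy idea from the `tensor.slice()` commit above, here is a minimal sketch (not the library's actual implementation). It assumes a row-major `Float32Array` backing a tensor of shape `[batch, seq, vocab]`, and the function name `sliceLastToken` is hypothetical:

```
// Minimal sketch of the contiguous-slice fast path described above (assumed shapes/names).
// For a row-major tensor of shape [batch, seq, vocab], slice(null, -1, null) selects the
// last sequence position, i.e. one contiguous block of `vocab` elements per batch item.
function sliceLastToken(data, [batch, seq, vocab]) {
  const out = new data.constructor(batch * vocab);
  for (let b = 0; b < batch; ++b) {
    const start = (b * seq + (seq - 1)) * vocab; // flat offset of the last token's values
    // subarray() creates a view without copying; set() then performs a single bulk copy
    out.set(data.subarray(start, start + vocab), b * vocab);
  }
  return out; // flat data for a result of shape [batch, 1, vocab]
}

// Example: batch=2, seq=3, vocab=4
const logitsData = new Float32Array(2 * 3 * 4).map((_, i) => i);
console.log(sliceLastToken(logitsData, [2, 3, 4]));
// -> Float32Array [8, 9, 10, 11, 20, 21, 22, 23]
```

The key point is that `subarray()` only creates a view, so each batch item costs one bulk `set()` copy instead of `vocab` per-element index computations.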

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Wanming Lin <wanming.lin@intel.com>