
chore(deps): update loader dependencies major (major)#194

Open
dreadnode-renovate-bot[bot] wants to merge 1 commit into main from renovate/major-loader-deps-major

Conversation


@dreadnode-renovate-bot dreadnode-renovate-bot bot commented Feb 24, 2026

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

| Package                | Type  | Update | Change                     |
| ---------------------- | ----- | ------ | -------------------------- |
| nvcr.io/nvidia/pytorch | final | major  | `24.12-py3` → `26.02-py3`  |
| psutil                 |       | major  | `==6.1.1` → `==7.2.2`      |
| transformers           |       | major  | `==4.57.6` → `==5.3.0`     |
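All three bumps in the table cross a major version boundary, which is why Renovate grouped them into this single "major" PR. A minimal stdlib-only sketch of that grouping rule (the helper name is ours, not Renovate's; only the two pypi pins are checked):

```python
def is_major_bump(old: str, new: str) -> bool:
    """Return True when the leading (major) version component increases,
    e.g. psutil 6.1.1 -> 7.2.2 or transformers 4.57.6 -> 5.3.0."""
    return int(new.split(".")[0]) > int(old.split(".")[0])

# The pypi pins changed in this PR:
PINS = {"psutil": ("6.1.1", "7.2.2"), "transformers": ("4.57.6", "5.3.0")}
majors = {name for name, (old, new) in PINS.items() if is_major_bump(old, new)}
```

Both pins land in `majors`, matching the "major" Update column above.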

Release Notes

giampaolo/psutil (psutil)

  • v7.2.2 (Compare Source)
  • v7.2.1 (Compare Source)
  • v7.2.0 (Compare Source)
  • v7.1.3 (Compare Source)
  • v7.1.2 (Compare Source)
  • v7.1.1 (Compare Source)
  • v7.1.0 (Compare Source)
  • v7.0.0 (Compare Source)

huggingface/transformers (transformers)

v5.3.0: EuroBERT, VibeVoice ASR, TimesFM 2.5, PP-DocLayoutV2, OlmoHybrid, ModernVBert, Higgs Audio V2

Compare Source

New Model additions

EuroBERT

EuroBERT is a multilingual encoder model based on a refreshed transformer architecture, akin to Llama but with bidirectional attention. It supports a mixture of European and widely spoken languages, with sequences of up to 8192 tokens.

Links: Documentation | Paper | Blog Post
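The distinguishing detail above is "akin to Llama but with bidirectional attention". A toy illustration of that difference (not EuroBERT's actual implementation): an encoder mask lets every token attend everywhere, while a causal decoder mask hides future positions.

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[int]]:
    """Build a simple attention mask: a bidirectional encoder (EuroBERT-style)
    lets every token attend to every other token; a causal decoder
    (Llama-style) masks out future positions (0 = blocked)."""
    return [[1 if (not causal or j <= i) else 0 for j in range(seq_len)]
            for i in range(seq_len)]
```

For a 3-token sequence the causal mask is lower-triangular; the bidirectional one is all ones.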

VibeVoice ASR

VibeVoice ASR is an automatic speech recognition model from Microsoft that combines acoustic and semantic audio tokenizers with a causal language model for robust speech-to-text transcription. The model uses VibeVoice's acoustic and semantic tokenizers that process audio at 24kHz, paired with a Qwen2-based language decoder for generating transcriptions. It can process up to 60 minutes of continuous audio input, supports customized hotwords, performs joint ASR/diarization/timestamping, and handles over 50 languages with code-switching support.

Links: Documentation | Paper

TimesFM 2.5

TimesFM 2.5 is a pretrained time-series foundation model that uses a decoder-only attention architecture with input patching for forecasting. The model is designed to provide accurate zero-shot forecasts across different domains, forecasting horizons and temporal granularities without requiring dataset-specific training. It builds on the original TimesFM architecture with enhancements including rotary attention, QK normalization, per-dimension attention scaling, and continuous quantile prediction.

Links: Documentation | Paper
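"Input patching" means the raw series is chopped into fixed-length chunks that become the decoder's tokens. A toy sketch of that preprocessing step (illustrative only, not TimesFM's code; real models also handle padding and normalization):

```python
def patch_series(series: list[float], patch_len: int) -> list[list[float]]:
    """Split a univariate time series into fixed-length, non-overlapping
    input patches; any trailing remainder is dropped in this toy version."""
    n_full = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n_full)]
```

Each patch is then embedded and fed to the decoder-only stack like a token.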

PP-DocLayoutV2

PP-DocLayoutV2 is a dedicated lightweight model for layout analysis, focusing specifically on element detection, classification, and reading order prediction. The model is composed of two sequentially connected networks: an RT-DETR-based detection model that performs layout element detection and classification, followed by a pointer network that orders these layout elements. It is designed to analyze document layouts by identifying and organizing various layout components in their proper reading sequence.

Links: Documentation
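To see what the pointer network replaces, here is the kind of naive reading-order baseline it improves on: sorting detected layout boxes top-to-bottom, then left-to-right. This heuristic fails on multi-column pages, which is exactly why a learned ordering model is used.

```python
def reading_order(boxes: list[tuple[int, int, int, int]]):
    """Toy reading-order baseline: sort layout boxes (x, y, w, h) by top edge,
    then left edge. PP-DocLayoutV2 replaces heuristics like this with a
    learned pointer network over the detected elements."""
    return sorted(boxes, key=lambda box: (box[1], box[0]))
```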

OlmoHybrid

OLMo Hybrid is a hybrid architecture model from Ai2 that combines standard transformer attention layers with linear attention layers using the Gated Deltanet. This hybrid approach aims to improve efficiency while maintaining model quality by interleaving full attention layers with linear attention layers. The model uses a custom cache system that handles both KV cache for attention layers and recurrent state for linear attention layers.

Links: Documentation

ModernVBert

ModernVBert is a Vision-Language encoder that combines ModernBert with a SigLIP vision encoder. It is optimized for visual document understanding and retrieval tasks, making it suitable for processing documents that contain both text and visual elements.

Links: Documentation | Paper

ColModernVBert

ColModernVBert is a model for efficient visual document retrieval that leverages ModernVBert to construct multi-vector embeddings directly from document images, following the ColPali approach. The model enables retrieval and scoring of visual documents by processing both text queries and document images to generate embeddings that can be compared for relevance scoring.

Links: Documentation | Paper
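The "multi-vector embeddings … compared for relevance scoring" part follows ColPali's late-interaction (MaxSim) scheme: each query token embedding is matched against its best document patch embedding, and the maxima are summed. A dependency-free sketch:

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColPali-style late interaction: for each query token embedding, take
    the maximum dot product over all document patch embeddings, then sum.
    Real implementations do this with batched matrix ops on normalized
    embeddings; this is the scalar version for clarity."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)
```

A document scores highly only if every query token finds at least one well-matching patch.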

Higgs Audio V2

Higgs Audio V2 is a powerful audio foundation model developed by Boson AI that was pretrained on over 10 million hours of audio data and diverse text data. Despite having no post-training or fine-tuning, the model excels in expressive audio generation thanks to its deep language and acoustic understanding. The model supports various audio generation tasks including single-speaker and multi-speaker smart voice, zero-shot voice cloning, and multi-speaker voice cloning.

Links: Documentation

Higgs Audio V2 Tokenizer

The Higgs Audio V2 Tokenizer is an audio tokenization model that operates at a low frame rate of 25 fps while maintaining high audio quality, effectively halving the frame rate of many baseline models. It uses unified 24 kHz training that mixes speech, music, and sound-event clips in one model to capture both semantic and acoustic details, facilitating the training of audio language models. The model enables fast inference by avoiding diffusion steps, with an encoder/decoder architecture that processes batches quickly for real-time or large-scale tasks.

Links: Documentation
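The numbers in the tokenizer description pin down its compression: at 25 token frames per second over 24 kHz audio, each frame covers 960 samples. A quick arithmetic check (constants taken from the release notes above; function name is ours):

```python
FRAME_RATE_HZ = 25       # token frames per second, per the release notes
SAMPLE_RATE_HZ = 24_000  # unified 24 kHz training

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FRAME_RATE_HZ  # audio samples per frame

def frames_for(seconds: float) -> int:
    """Number of token frames the tokenizer emits for a clip of this length."""
    return int(seconds * FRAME_RATE_HZ)
```

Halving a baseline's 50 fps to 25 fps halves the sequence length an audio LM must model for the same clip.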

Breaking changes

Tensor parallelism (TP) support for dense and MoE decoder-only models has been fixed and stabilized, requiring users to update their TP configurations and conversion mappings accordingly.

The Ernie4.5 VL MoE model class and configuration names have been renamed to align with vLLM/SGLang conventions, requiring users to update any references to the old model names in their code.

  • 🚨 [Ernie 4.5 VL Moe] Fix up namings to vllm/sglang convention (#44299) by @vasqu

Several pipeline tasks have been removed or updated in the V5 cleanup (including question-answering, visual-question-answering, and image-to-image), requiring users to migrate to the replacement pipelines or updated task names.
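Code that constructs pipelines by task name can guard against this removal by branching on the installed major version. A sketch of that pattern; the v5 replacement task name used here ("text-generation") is a placeholder assumption, not taken from the release notes, so substitute whatever the migration guide actually recommends:

```python
def qa_pipeline_task(transformers_version: str) -> str:
    """Pick a pipeline task name based on the transformers major version.
    "question-answering" is removed in the v5 cleanup; the replacement name
    below is a hypothetical stand-in, not confirmed by the release notes."""
    major = int(transformers_version.split(".")[0])
    return "text-generation" if major >= 5 else "question-answering"
```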

3D position IDs for vision-language models have been unified under a common interface (sourced from qwen2-vl), requiring users of affected VLMs (e.g., Ernie, GLM4V) to update their processors and any code that manually constructs position IDs.

🚨 Tokenizer x vLLM fixes 🚨:

Unigram tokenizers were missing the spm precompiled charsmap support. We ran an overall v4 vs v5 regression test and fixed what we had missed.

This was done in:

Generation

Generation input preparation was significantly refactored to stop relying on cache_position and instead pass pre-sliced input_ids/inputs_embeds directly to prepare_inputs_for_generation, simplifying the generation loop and laying groundwork for broader cache_position removal. Several bug fixes were also applied, including correct sampling for HiggsAudioV2, flaky cache-equality test stabilization for Idefics, and restored generation integration tests.
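The core idea of that refactor can be sketched in one line: rather than reconstructing the model input from a cache_position tensor, the loop passes the not-yet-cached suffix of input_ids directly. The names below are illustrative, not the actual transformers internals:

```python
def next_step_inputs(input_ids: list[int], cached_len: int) -> list[int]:
    """Sketch of the v5 generation change: pass the pre-sliced suffix of
    input_ids (everything past what the KV cache already covers) instead of
    deriving it inside the model from cache_position."""
    return input_ids[cached_len:]
```

On the first step (`cached_len == 0`) the whole prompt is passed; on later steps only the newly sampled tokens are.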

Tokenization

Several tokenization bugs were fixed in this release, including resolving an AttributeError in MLukeTokenizer caused by the v5 rename of additional_special_tokens, correcting the Fuyu tokenizer class mapping, fixing LayoutXLM tokenization test failures from the slow tokenizer removal refactor, and adding olmo_hybrid to the auto-tokenizer mapping. The tokenizer documentation was also updated to reflect the new unified v5 backend architecture and reorganized for clarity.

Kernels

Fixed several kernel-related issues including a security vulnerability, corrected Mamba kernel loading to handle incompatible import structures, ensured Liger Kernel is properly enabled during hyperparameter search, and expanded Flash Attention to support multiple compatible implementations.

Quantization

This release adds several new quantization backends and fixes, including MLX quantization support for MPS devices, Four Over Six (4/6) NVFP4 quantization integration for NVIDIA Blackwell GPUs, and CPU support for MXFP4 models, alongside a bug fix for MXFP4 model saving using reverse_op.

Vision

Fixed backward compatibility for image processors loaded from older remote code that lack valid_kwargs definitions, and resolved test failures in AMD ROCm CI by adding the missing timm dependency to the Docker image.

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library over the last release:


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

👻 Immortal: This PR will be recreated if closed unmerged. Get config help if that's undesired.


  • If you want to rebase/retry this PR, check this box

This PR has been generated by Renovate Bot.

@dreadnode-renovate-bot dreadnode-renovate-bot bot added the type/digest Dependency digest updates label Feb 24, 2026
@dreadnode-renovate-bot dreadnode-renovate-bot bot force-pushed the renovate/major-loader-deps-major branch 3 times, most recently from 07525d6 to 3ac3e72 Compare March 1, 2026 00:53
| datasource | package                | from   | to    |
| ---------- | ---------------------- | ------ | ----- |
| docker     | nvcr.io/nvidia/pytorch | 24.12  | 26.02 |
| pypi       | psutil                 | 6.1.1  | 7.2.2 |
| pypi       | transformers           | 4.57.6 | 5.3.0 |
@dreadnode-renovate-bot dreadnode-renovate-bot bot force-pushed the renovate/major-loader-deps-major branch from 3ac3e72 to 4daa5d1 Compare March 8, 2026 00:48