SkinTokens is a learned, compact, and discrete representation for skinning weights. Built on this representation, TokenRig is a unified autoregressive framework that models the entire rig, i.e., skeleton and skinning weights, as a single token sequence. Given an input 3D mesh, it generates a complete skeleton hierarchy and skin weights that can be directly imported into standard 3D pipelines for character animation and simulation.
SkinTokens is the successor to UniRig (SIGGRAPH '25). While UniRig uses separate stages for skeleton prediction and skinning, SkinTokens unifies both into a single autoregressive sequence via learned discrete skin tokens, yielding a 98%–133% improvement in skinning accuracy and a 17%–22% improvement in bone prediction over state-of-the-art baselines.
TokenRig takes a single 3D mesh as input and autoregressively produces a fully rigged asset (a coherent skeleton hierarchy plus dense per-vertex skinning weights) in a single unified sequence. The method has three stages:
- Learn SkinTokens — An FSQ-CVAE compresses sparse skinning weights into a compact discrete vocabulary.
- Unified Autoregressive Modeling — A Qwen3-0.6B-based Transformer generates the full rig (skeleton followed by SkinTokens) as one interleaved sequence.
- RL Refinement via GRPO — Tailored geometric and semantic rewards (volumetric joint coverage, bone-mesh containment, skinning sparsity, deformation smoothness) refine the model for out-of-distribution assets.
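To make the first stage more concrete, here is a minimal sketch of the Finite Scalar Quantization step in isolation: each latent dimension is bounded with `tanh`, rounded to a small set of integer levels, and the per-dimension codes are packed into one token id. The level counts in `LEVELS` and all function names are hypothetical and chosen only for illustration; the actual FSQ-CVAE configuration, its encoder and decoder, and the straight-through gradient used during training are not shown.

```python
import math

# Hypothetical per-dimension level counts (not the released model's settings).
# Odd values keep the rounding grid symmetric around zero.
LEVELS = [5, 5, 5]


def fsq_quantize(z):
    """Bound each latent value with tanh, then round to the nearest integer level."""
    codes = []
    for value, levels in zip(z, LEVELS):
        half = (levels - 1) / 2  # e.g. 2.0 for 5 levels -> codes in {-2, ..., 2}
        codes.append(round(half * math.tanh(value)))
    return codes


def codes_to_token(codes):
    """Pack per-dimension codes into a single token id (mixed-radix encoding)."""
    token, basis = 0, 1
    for code, levels in zip(codes, LEVELS):
        token += (code + (levels - 1) // 2) * basis  # shift code to 0..levels-1
        basis *= levels
    return token


def token_to_codes(token):
    """Invert codes_to_token."""
    codes = []
    for levels in LEVELS:
        codes.append(token % levels - (levels - 1) // 2)
        token //= levels
    return codes


latent = [0.3, -2.0, 1.1]
codes = fsq_quantize(latent)   # [1, -2, 2]
token = codes_to_token(codes)  # 103, out of 5 * 5 * 5 = 125 possible tokens
```

With this packing, the vocabulary size is simply the product of the level counts, which is what lets a language-model backbone treat skinning information as ordinary discrete tokens.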
Qualitative comparison of skinning prediction. TokenRig produces clean, locally coherent influence maps that closely match the ground truth, while baselines suffer from "bleeding" artifacts across disconnected mesh parts.
See the project page for the full teaser video and additional qualitative comparisons (skeleton generation and impact of GRPO).
- Hardware: An NVIDIA GPU with at least 14 GB of memory is required for inference.
- Software:
  - Python >= 3.11
  - CUDA Toolkit >= 12.1
  - uv is recommended for managing dependencies.
- Clone the repo:

      git clone https://github.com/VAST-AI-Research/SkinTokens.git
      cd SkinTokens

- Create a virtual environment and install PyTorch:

      uv venv --python 3.11
      source .venv/bin/activate
      uv pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128

  > [!NOTE]
  > Adjust the CUDA version in the PyTorch index URL to match your driver. See PyTorch Get Started for other CUDA versions.

- Install dependencies:

      uv pip install -r requirements.txt

- Install flash-attn:

      uv pip install flash-attn --no-build-isolation

- Download pretrained models:

      python download.py --model

  This downloads the TokenRig and FSQ-CVAE checkpoints to `experiments/`, and the Qwen3-0.6B config to `models/`.
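After installation, a quick sanity check against the stated requirements (Python >= 3.11, an NVIDIA GPU with at least 14 GB of memory) can save a failed first run. The script below is a sketch and not part of the repository; the function names are hypothetical, and it assumes that "14 GB" can be compared against the total device memory that PyTorch reports, expressed in GiB.

```python
import sys

MIN_PYTHON = (3, 11)
MIN_GPU_MEMORY_GB = 14  # from the hardware requirement; unit handling is an assumption


def check_environment(python_version, gpu_memory_gb):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.11 or newer is required")
    if gpu_memory_gb is None:
        problems.append("no CUDA GPU is visible to PyTorch")
    elif gpu_memory_gb < MIN_GPU_MEMORY_GB:
        problems.append(f"GPU has {gpu_memory_gb:.1f} GB, need at least {MIN_GPU_MEMORY_GB} GB")
    return problems


def detect_gpu_memory_gb():
    """Total memory of GPU 0 in GiB, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    found = check_environment(sys.version_info, detect_gpu_memory_gb())
    for problem in found:
        print("WARNING:", problem)
    if not found:
        print("Environment looks OK.")
```

Keeping the comparison logic in `check_environment` separate from the hardware probe makes it easy to test on a machine without a GPU.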
We provide the following pretrained models on Hugging Face:
| Model | Description | Download |
|---|---|---|
| `articulation_xl_quantization_256_token_4` | TokenRig autoregressive rigging model, trained on ArticulationXL 2.0 + VRoid Hub + ModelsResource and refined with GRPO (recommended) | Download |
| `skin_vae_2_10_32768` | FSQ-CVAE (SkinTokens): skin-weight tokenizer used to encode and decode skinning weights | Download |
The easiest way to try TokenRig without any local setup is the hosted Hugging Face Space — upload a mesh and get a rigged result in the browser.
To run the demo locally instead, launch the server:

    python demo.py

Then open http://127.0.0.1:1024 in your browser.
To rig meshes from the command line:

    # Rig a single model
    python demo.py --input examples/giraffe.glb --output results/giraffe.glb

    # Rig with original texture and scale preserved
    python demo.py --input examples/giraffe.glb --output results/giraffe.glb --use_transfer

    # Skin a model using its existing skeleton
    python demo.py --input examples/giraffe_skeleton.glb --output results/giraffe.glb --use_skeleton --use_transfer

    # Batch process a directory
    python demo.py --input examples/ --output results/ --use_transfer

| Parameter | Default | Description |
|---|---|---|
| `--top_k` | 5 | Top-k sampling |
| `--top_p` | 0.95 | Top-p (nucleus) sampling |
| `--temperature` | 1.0 | Sampling temperature |
| `--repetition_penalty` | 2.0 | Repetition penalty |
| `--num_beams` | 10 | Number of beams for beam search |
| `--use_skeleton` | False | Use existing skeleton (generate skin only) |
| `--use_transfer` | False | Transfer original texture and scale |
| `--use_postprocess` | False | Apply voxel-based skin postprocessing |
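When scripting many runs with different sampling settings, it can help to assemble the `demo.py` command line programmatically. The helper below is a hypothetical sketch, not part of the repository. It assumes that value options are passed as `--name value` and that the boolean options are bare switches, which matches the `--use_transfer` and `--use_skeleton` examples above but is not otherwise confirmed here.

```python
import subprocess

# Value-taking options listed in the parameter table.
SAMPLING_OPTIONS = {"top_k", "top_p", "temperature", "repetition_penalty", "num_beams"}


def build_demo_command(input_path, output_path, *, sampling=None,
                       use_skeleton=False, use_transfer=False, use_postprocess=False):
    """Build the argument list for one demo.py invocation."""
    command = ["python", "demo.py", "--input", input_path, "--output", output_path]
    for name, value in (sampling or {}).items():
        if name not in SAMPLING_OPTIONS:
            raise ValueError(f"unknown sampling option: {name}")
        command += [f"--{name}", str(value)]  # assumed form: --name value
    switches = (
        ("--use_skeleton", use_skeleton),
        ("--use_transfer", use_transfer),
        ("--use_postprocess", use_postprocess),
    )
    for flag, enabled in switches:
        if enabled:
            command.append(flag)  # assumed form: bare switch
    return command


def run_demo(command):
    """Run the command from the repository root; raises if demo.py exits with an error."""
    subprocess.run(command, check=True)


example = build_demo_command(
    "examples/giraffe.glb",
    "results/giraffe.glb",
    sampling={"top_k": 10, "temperature": 0.8},
    use_transfer=True,
)
print(" ".join(example))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.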
- Server fails to start: Make sure the `http_proxy`/`https_proxy` environment variables are unset or correctly configured.
- Blender export issues: Remove the `glTF_not_exported` node when importing results into Blender.
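For the proxy issue, one option is to launch the demo with the proxy variables removed from the child process only, leaving the surrounding shell untouched. This is a sketch with hypothetical helper names; treating the upper-case spellings (`HTTP_PROXY`, `HTTPS_PROXY`) the same way is an assumption that goes beyond the two variables named above.

```python
import os
import subprocess

PROXY_VARS = ("http_proxy", "https_proxy")


def env_without_proxies(environ):
    """Copy an environment mapping, dropping proxy variables.

    Matching is case-insensitive (an assumption), so HTTP_PROXY is dropped too.
    """
    return {key: value for key, value in environ.items() if key.lower() not in PROXY_VARS}


def launch_demo():
    """Start the demo server with a proxy-free environment."""
    subprocess.run(["python", "demo.py"], env=env_without_proxies(os.environ), check=True)
```

Calling `launch_demo()` from the repository root starts the server as before, only without the proxy settings.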
- UniRig — the predecessor to this work.
- Qwen3 — the LLM architecture used by the TokenRig autoregressive backbone.
- Michelangelo — 3D shape encoder.
- 3DShape2VecSet — shape-representation backbone used by the FSQ-CVAE.
- FSQ — Finite Scalar Quantization, the discretization scheme behind SkinTokens.
- GRPO (DeepSeekMath) — the policy-optimization method used for RL refinement.
- Tripo — the 3D generative studio from Tripo, a broader context for this line of work.
We sincerely appreciate the contributions of these excellent projects and their authors. We believe open source helps accelerate research, lower barriers to innovation, and make progress more accessible to the broader community.
This project is licensed under the MIT License.
If you find this work helpful, please consider citing our paper:
@article{zhang2026skintokens,
title = {SkinTokens: A Learned Compact Representation for Unified Autoregressive Rigging},
author = {Zhang, Jia-Peng and Pu, Cheng-Feng and Guo, Meng-Hao and Cao, Yan-Pei and Hu, Shi-Min},
journal = {arXiv preprint arXiv:2602.04805},
year = {2026}
}

