NEO Series: Native Vision-Language Models

2026/04: SenseNova-U1 with NEO-unify Architecture (Technical Report 2026)
2026/02: NEO-unify: Building Native Multimodal Unified Models End to End (HuggingFace Blog 2026)
2025/12: From Pixels to Words -- Towards Native One-Vision Models at Scale (arXiv 2026)
2025/09: From Pixels to Words -- Towards Native Vision-Language Primitives at Scale (ICLR 2026)

📜 News

[2026/05] 🔥 The paper, weights, and evaluation code of NEO-ov are released!
[2026/01] 🔥 The training code of NEO is released!
[2025/12] 💥 NEO-ov has been completed!
[2025/10] 🔥 The paper, weights, and evaluation code of NEO are released!
[2025/09] 💥 NEO has been completed!

📋 Todo List

💡 Motivation

What constraints set native VLMs apart from modular ones, and to what extent can they be overcome?
How can native VLMs be made more accessible and democratized, thereby accelerating their progress?

💡 Highlights

🔥 Native Architecture: NEO introduces a native VLM primitive that unifies pixel-word encoding, alignment, and reasoning within a single dense, monolithic model architecture.
🔥 Superior Efficiency: With only 390M image-text examples, NEO develops strong visual perception from scratch, rivaling top-tier modular VLMs and outperforming existing native ones.
🔥 Promising Roadmap: NEO opens a route toward scalable and powerful native VLMs, together with diverse reusable components that foster a cost-effective and extensible ecosystem.
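To make "native" more concrete, the toy sketch that follows illustrates the general idea behind monolithic VLMs: raw image patches are linearly projected into the same embedding width as word tokens and concatenated with them into one sequence, so a single backbone can process both modalities without a separate pretrained vision encoder. This is a generic illustration under stated assumptions, not NEO's actual primitive; the function names (`patchify`, `linear`, `build_sequence`), the grayscale input, the sizes, and the random initialisation are all hypothetical.

```python
import random
from typing import List

Vector = List[float]


def patchify(image: List[List[float]], patch: int) -> List[Vector]:
    """Split an H x W grayscale image into non-overlapping patch x patch tiles,
    flattening each tile row-major into one vector."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches


def linear(x: Vector, weight: List[Vector]) -> Vector:
    """Project x (length d_in) with a d_out x d_in weight matrix (no bias)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weight]


def build_sequence(image, token_ids, patch, d_model, vocab_size, seed=0):
    """Hypothetical helper: embed pixels and words into one shared sequence."""
    rng = random.Random(seed)
    d_in = patch * patch
    # Randomly initialised parameters, for illustration only (not trained weights).
    patch_proj = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_model)]
    word_emb = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(vocab_size)]
    # Pixel tokens: each flattened patch is projected straight into d_model.
    pixel_tokens = [linear(p, patch_proj) for p in patchify(image, patch)]
    # Word tokens: a plain embedding-table lookup into the same d_model space.
    word_tokens = [word_emb[t] for t in token_ids]
    # A single backbone would then attend over this combined sequence.
    return pixel_tokens + word_tokens


if __name__ == "__main__":
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    seq = build_sequence(img, [1, 2, 3], patch=2, d_model=8, vocab_size=10)
    print(len(seq), len(seq[0]))  # 4 pixel tokens + 3 word tokens, each of width 8
```

With a 4x4 image and 2x2 patches there are four pixel tokens, which sit next to the three word tokens in one 7-row sequence; a modular VLM would instead route the image through a separate vision encoder and adapter before it reaches the language model.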

✒️ Citation

If the NEO series is helpful for your research, please consider giving it a star ⭐ and citing it 📝:

@article{Diao2025NEO,
  title        = {From Pixels to Words--Towards Native Vision-Language Primitives at Scale},
  author       = {Diao, Haiwen and Li, Mingxuan and Wu, Silei and Dai, Linjun and Wang, Xiaohua and Deng, Hanming and Lu, Lewei and Lin, Dahua and Liu, Ziwei},
  journal      = {arXiv preprint arXiv:2510.14979},
  year         = {2025}
}

@article{Diao2026NEOov,
  title        = {From Pixels to Words--Towards Native One-Vision Models at Scale},
  author       = {Diao, Haiwen and Wang, Jiahao and Wu, Penghao and Dong, Yuhao and Niu, Yuwei and Zhu, Yue and Cai, Zhongang and Fan, Weichen and Dai, Linjun and Wu, Silei and others},
  journal      = {arXiv preprint arXiv:2605.28820},
  year         = {2026}
}

@misc{sensenova2026neounify,
  title        = {NEO-unify: Building Native Multimodal Unified Models End to End},
  author       = {SenseNova},
  howpublished = {Hugging Face blog},
  url          = {https://huggingface.co/blog/sensenova/neo-unify},
  year         = {2026}
}

@article{sensenova2026sensenovau1,
  title        = {SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture},
  author       = {Diao, Haiwen and Wu, Penghao and Deng, Hanming and Wang, Jiahao and Bai, Shihao and Wu, Silei and Fan, Weichen and Ye, Wenjie and Tong, Wenwen and Fan, Xiangyu and others},
  journal      = {arXiv preprint arXiv:2605.12500},
  year         = {2026}
}

📄 License

The content of this project itself is licensed under LICENSE.
