- 2026/04: SenseNova-U1 with NEO-unify Architecture (Technical Report 2026)
- 2026/02: NEO-unify: Building Native Multimodal Unified Models End to End (Hugging Face Blog 2026)
- 2025/12: From Pixels to Words -- Towards Native One-Vision Models at Scale (arXiv 2026)
- 2025/09: From Pixels to Words -- Towards Native Vision-Language Primitives at Scale (ICLR 2026)
[2026/05] 🔥 The paper, weights, and evaluation code of NEO-ov are released!
[2025/12] 💥 NEO-ov has been completed!
[2026/01] 🔥 The training code of NEO is released!
[2025/10] 🔥 The paper, weights, and evaluation code of NEO are released!
[2025/09] 💥 NEO has been completed!
- What constraints set native VLMs apart from modular ones, and to what extent can they be overcome?
- How can native VLMs be made more accessible and democratized, thereby accelerating their progress?
- 🔥 Native Architecture: NEO introduces a native VLM primitive that unifies pixel-word encoding, alignment, and reasoning within a dense, monolithic model architecture.
- 🔥 Superior Efficiency: With merely 390M image-text examples, NEO develops strong visual perception from scratch, rivaling top-tier modular VLMs and outperforming native ones.
- 🔥 Promising Roadmap: NEO pioneers a promising route for scalable and powerful native VLMs, paired with diverse reusable components that foster a cost-effective and extensible ecosystem.
If the NEO series is helpful for your research, please consider giving it a star ⭐ and citing it 📝:
@article{Diao2025NEO,
title = {From Pixels to Words--Towards Native Vision-Language Primitives at Scale},
author = {Diao, Haiwen and Li, Mingxuan and Wu, Silei and Dai, Linjun and Wang, Xiaohua and Deng, Hanming and Lu, Lewei and Lin, Dahua and Liu, Ziwei},
journal = {arXiv preprint arXiv:2510.14979},
year = {2025}
}
@article{Diao2026NEOov,
title = {From Pixels to Words--Towards Native One-Vision Models at Scale},
author = {Diao, Haiwen and Wang, Jiahao and Wu, Penghao and Dong, Yuhao and Niu, Yuwei and Zhu, Yue and Cai, Zhongang and Fan, Weichen and Dai, Linjun and Wu, Silei and others},
journal = {arXiv preprint arXiv:2605.28820},
year = {2026}
}
@misc{sensenova2026neounify,
title = {NEO-unify: Building Native Multimodal Unified Models End to End},
author = {SenseNova},
howpublished = {Hugging Face blog},
url = {https://huggingface.co/blog/sensenova/neo-unify},
year = {2026}
}
@article{sensenova2026sensenovau1,
title = {SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture},
author = {Diao, Haiwen and Wu, Penghao and Deng, Hanming and Wang, Jiahao and Bai, Shihao and Wu, Silei and Fan, Weichen and Ye, Wenjie and Tong, Wenwen and Fan, Xiangyu and others},
journal = {arXiv preprint arXiv:2605.12500},
year = {2026}
}

The content of this project itself is licensed under LICENSE.


