EvolvingLMMs-Lab/NEO

NEO Series: Native Vision-Language Models

📜 News

[2026/05] 🔥 The paper, weights, and evaluation code of NEO-ov are released!
[2026/01] 🔥 The training code of NEO is released!
[2025/12] 💥 NEO-ov has been completed!
[2025/10] 🔥 The paper, weights, and evaluation code of NEO are released!
[2025/09] 💥 NEO has been completed!

📋 Todo List

💡 Motivation

  • What constraints set native VLMs apart from modular ones, and to what extent can they be overcome?

  • How can native VLMs be made more accessible and democratized, thereby accelerating their progress?

💡 Highlights

  • 🔥 Native Architecture: NEO introduces a native VLM primitive that unifies pixel-word encoding, alignment, and reasoning within a single dense, monolithic model architecture.

  • 🔥 Superior Efficiency: With only 390M image-text examples, NEO develops strong visual perception from scratch, rivaling top-tier modular VLMs and outperforming existing native VLMs.

  • 🔥 Promising Roadmap: NEO charts a route toward scalable and powerful native VLMs, paired with diverse reusable components that foster a cost-effective and extensible ecosystem.

✒️ Citation

If the NEO series is helpful for your research, please consider giving it a star ⭐ and citing it 📝:

@article{Diao2025NEO,
  title        = {From Pixels to Words--Towards Native Vision-Language Primitives at Scale},
  author       = {Diao, Haiwen and Li, Mingxuan and Wu, Silei and Dai, Linjun and Wang, Xiaohua and Deng, Hanming and Lu, Lewei and Lin, Dahua and Liu, Ziwei},
  journal      = {arXiv preprint arXiv:2510.14979},
  year         = {2025}
}

@article{Diao2026NEOov,
  title        = {From Pixels to Words--Towards Native One-Vision Models at Scale},
  author       = {Diao, Haiwen and Wang, Jiahao and Wu, Penghao and Dong, Yuhao and Niu, Yuwei and Zhu, Yue and Cai, Zhongang and Fan, Weichen and Dai, Linjun and Wu, Silei and others},
  journal      = {arXiv preprint arXiv:2605.28820},
  year         = {2026}
}

@misc{sensenova2026neounify,
  title        = {{NEO-unify}: Building Native Multimodal Unified Models End to End},
  author       = {SenseNova},
  howpublished = {Hugging Face blog},
  url          = {https://huggingface.co/blog/sensenova/neo-unify},
  year         = {2026}
}

@article{sensenova2026sensenovau1,
  title        = {{SenseNova-U1}: Unifying Multimodal Understanding and Generation with {NEO-unify} Architecture},
  author       = {Diao, Haiwen and Wu, Penghao and Deng, Hanming and Wang, Jiahao and Bai, Shihao and Wu, Silei and Fan, Weichen and Ye, Wenjie and Tong, Wenwen and Fan, Xiangyu and others},
  journal      = {arXiv preprint arXiv:2605.12500},
  year         = {2026}
}

📄 License

The content of this project is licensed under the terms in the LICENSE file.