Representative Papers by OpenMOSS Research Direction

AI System

  1. Support SpeechGPT 2.0-preview: A GPT-4o-level Real-Time Spoken Dialogue System
    Hanfu Chen, Ke Chen, Qinyuan Cheng, Mingshu Chen, Ruifan Deng, Liwei Fan, Zhaoye Fei, Qinghui Gao, Yitian Gong, Ching Wing Kwok, Kexin Huang, Yaozhou Jiang, Xingyu Lu, Shimin Li, Zhengyuan Lin, Ruixiao Li, Qian Tu, Jin Wang, Yang Wang, Siyin Wang, Zhe Xu, Chenchen Yang, Donghua Yu, Yuqian Yao, Yucheng Yuan, Chufan Yu, Dong Zhang, Yiwei Zhao, Yuqian Zhang, Jun Zhan, Xin Zhang, Xingjian Zhao, Chengyang Zhu
    2025. [GitHub](https://github.com/OpenMOSS/SpeechGPT-2.0-preview)

  2. Support MOSS-TTSD: Zero-Shot Multi-Speaker Dialogue Speech Synthesis
    Cheng Chang, Ke Chen, Mingshu Chen, Qinyuan Cheng, Ruifan Deng, Liwei Fan, Zhaoye Fei, Qinghui Gao, Yitian Gong, Kexin Huang, Botian Jiang, Yaozhou Jiang, Luozhijie Jin, Ruixiao Li, Shimin Li, Zhengyuan Lin, Xipeng Qiu, Qian Tu, Jin Wang, Ruiming Wang, Wenxuan Wang, Yang Wang, Chenchen Yang, Zhe Xu, Yucheng Yuan, Donghua Yu, Jun Zhan, Dong Zhang, Wenbo Zhang, Xin Zhang, Yuqian Zhang, Yiwei Zhao, Xingjian Zhao
    2025. [GitHub](https://github.com/OpenMOSS/MOSS-TTSD)

  3. Support MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
    Hanfu Chen, Ke Chen, Mingshu Chen, Qinyuan Cheng, Zhaoye Fei, Qinghui Gao, Yang Gao, Yitian Gong, Xuanjing Huang, Yaozhou Jiang, Luozhijie Jin, Ruixiao Li, Xipeng Qiu, Ruiming Wang, Yang Wang, Yuanfan Xu, Xiaogui Yang, Zhe Xu, Donghua Yu, Wenbo Zhang, Yiyang Zhang, Xingjian Zhao, Yaqian Zhou
    2025. [ArXiv](https://arxiv.org/abs/2510.00499)

  4. Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
    Tao Ji, Bin Guo, Yuanbin Wu, Qipeng Guo, Lixing Shen, Zhan Chen, Xipeng Qiu, Qi Zhang, Tao Gui
    ACL, 2025. [ArXiv](https://arxiv.org/abs/2502.14837)


Multimodal Foundation Models

  1. SpeechGPT 2.0-preview: A GPT-4o-level Real-Time Spoken Dialogue System
    Hanfu Chen, Ke Chen, Qinyuan Cheng, Mingshu Chen, Ruifan Deng, Liwei Fan, Zhaoye Fei, Qinghui Gao, Yitian Gong, Ching Wing Kwok, Kexin Huang, Yaozhou Jiang, Xingyu Lu, Shimin Li, Zhengyuan Lin, Ruixiao Li, Qian Tu, Jin Wang, Yang Wang, Siyin Wang, Zhe Xu, Chenchen Yang, Donghua Yu, Yuqian Yao, Yucheng Yuan, Chufan Yu, Dong Zhang, Yiwei Zhao, Yuqian Zhang, Jun Zhan, Xin Zhang, Xingjian Zhao, Chengyang Zhu
    2025. [GitHub](https://github.com/OpenMOSS/SpeechGPT-2.0-preview)

  2. MOSS-TTSD: Zero-Shot Multi-Speaker Dialogue Speech Synthesis
    Cheng Chang, Ke Chen, Mingshu Chen, Qinyuan Cheng, Ruifan Deng, Liwei Fan, Zhaoye Fei, Qinghui Gao, Yitian Gong, Kexin Huang, Botian Jiang, Yaozhou Jiang, Luozhijie Jin, Ruixiao Li, Shimin Li, Zhengyuan Lin, Xipeng Qiu, Qian Tu, Jin Wang, Ruiming Wang, Wenxuan Wang, Yang Wang, Chenchen Yang, Zhe Xu, Yucheng Yuan, Donghua Yu, Jun Zhan, Dong Zhang, Wenbo Zhang, Xin Zhang, Yuqian Zhang, Yiwei Zhao, Xingjian Zhao
    2025. [GitHub](https://github.com/OpenMOSS/MOSS-TTSD)

  3. MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
    Hanfu Chen, Ke Chen, Mingshu Chen, Qinyuan Cheng, Zhaoye Fei, Qinghui Gao, Yang Gao, Yitian Gong, Xuanjing Huang, Yaozhou Jiang, Luozhijie Jin, Ruixiao Li, Xipeng Qiu, Ruiming Wang, Yang Wang, Yuanfan Xu, Xiaogui Yang, Zhe Xu, Donghua Yu, Wenbo Zhang, Yiyang Zhang, Xingjian Zhao, Yaqian Zhou
    2025. [ArXiv](https://arxiv.org/abs/2510.00499)

  4. InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems
    Kexin Huang, Qian Tu, Liwei Fan, Chenchen Yang, Dong Zhang, Shimin Li, Zhaoye Fei, Qinyuan Cheng, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2506.16381)

  5. XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs
    Yitian Gong, Luozhijie Jin, Ruifan Deng, Dong Zhang, Xin Zhang, Qinyuan Cheng, Zhaoye Fei, Shimin Li, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2506.23325)

  6. CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
    Jinlan Fu, Shenzhen Huangfu, Hao Fei, Xiaoyu Shen, Bryan Hooi, Xipeng Qiu, See-Kiong Ng
    ICLR, 2025. [ArXiv](https://arxiv.org/abs/2501.16629)

  7. Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Models
    Siyin Wang, Xingsong Ye, Qinyuan Cheng, Junwen Duan, Shimin Li, Jinlan Fu, Xipeng Qiu, Xuanjing Huang
    NAACL, 2025. [ArXiv](https://arxiv.org/abs/2406.15279)

  8. VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
    Yikun Wang, Siyin Wang, Qinyuan Cheng, Zhaoye Fei, Liang Ding, Qipeng Guo, D. Tao, Xipeng Qiu
    ACL, 2025. [ArXiv](https://arxiv.org/abs/2504.09130)

  9. RoboOmni: Proactive Robot Manipulation in Omni-modal Context
    Siyin Wang, Jinlan Fu, Feihong Liu, Xinzhe He, Huangxuan Wu, Junhao Shi, Kexin Huang, Zhaoye Fei, Jingjing Gong, Zuxuan Wu, Yu-Gang Jiang, See-Kiong Ng, Tat-Seng Chua, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2510.23763)


Agents & Reinforcement Learning

  1. Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
    Bo Wang, Qinyuan Cheng, Runyu Peng, Rong Bao, Peiji Li, Qipeng Guo, Linyang Li, Zhiyuan Zeng, Yunhua Zhou, Xipeng Qiu
    NeurIPS, 2025. [ArXiv](https://arxiv.org/abs/2507.00018)

  2. RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
    Zhiyuan Zeng, Jiashuo Liu, Zhangyue Yin, Ge Zhang, Wenhao Huang, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2511.04285)

  3. Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
    Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Bo Wang, Shimin Li, Yunhua Zhou, Qipeng Guo, Xuanjing Huang, Xipeng Qiu
    2024. [ArXiv](https://arxiv.org/abs/2412.14135)

  4. In-Memory Learning: A Declarative Learning Framework for Large Language Models
    Bo Wang, Tianxiang Sun, Hang Yan, Siyin Wang, Qinyuan Cheng, Xipeng Qiu
    2024. [ArXiv](https://arxiv.org/abs/2403.02757)

  5. Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
    Jingqi Tong, Jixin Tang, Hangcheng Li, Yurong Mou, Ming Zhang, Jun Zhao, Yanbo Wen, Fan Song, Jiahao Zhan, Yuyang Lu, Chaoran Tao, Zhiyuan Guo, Jizhou Yu, Tianhao Cheng, Zhiheng Xi, Changhao Jiang, Zhangyue Yin, Yining Zheng, Weifeng Ge, Guanhua Chen, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang
    2025. [ArXiv](https://arxiv.org/abs/2505.13886)


Embodied AI

  1. World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
    Siyin Wang, Zhaoye Fei, Qinyuan Cheng, Shiduo Zhang, Panpan Cai, Jinlan Fu, Xipeng Qiu
    ACL, 2025. [ArXiv](https://arxiv.org/abs/2503.10480)

  2. LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
    Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, Jinlan Fu, Jingjing Gong, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2510.13626)

  3. VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
    Shiduo Zhang, Zhe Xu, Peiju Liu, Xiaopeng Yu, Yuan Li, Qinghui Gao, Zhaoye Fei, Zhangyue Yin, Zuxuan Wu, Yu-Gang Jiang, Xipeng Qiu
    ICCV, 2025. [ArXiv](https://arxiv.org/abs/2412.18194)

  4. World-aware Planning Narratives Enhance Large Vision-Language Model Planner
    Junhao Shi, Zhaoye Fei, Siyin Wang, Qipeng Guo, Jingjing Gong, Xipeng Qiu
    NeurIPS, 2025. [ArXiv](https://arxiv.org/abs/2506.21230)

  5. Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning
    Zhaoye Fei, Li Ji, Siyin Wang, Junhao Shi, Jingjing Gong, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2506.23127)

Aiology (Model Interpretability)

  1. Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT
    Zhengfu He, Xuyang Ge, Qiong Tang, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu
    2024. [ArXiv](https://arxiv.org/abs/2402.12201)

  2. Automatically Identifying Local and Global Circuits with Linear Computation Graphs
    Xuyang Ge, Fukang Zhu, Wentao Shu, Junxuan Wang, Zhengfu He, Xipeng Qiu
    2024. [ArXiv](https://arxiv.org/abs/2405.13868)

  3. Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
    Junxuan Wang, Xuyang Ge, Wentao Shu, Qiong Tang, Yunhua Zhou, Zhengfu He, Xipeng Qiu
    ICLR, 2025. [ArXiv](https://arxiv.org/abs/2410.06672)

  4. Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders
    Zhengfu He, Wentao Shu, Xuyang Ge, Lingjie Chen, Junxuan Wang, Yunhua Zhou, Frances Liu, Qipeng Guo, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang, Xipeng Qiu
    2024. [ArXiv](https://arxiv.org/abs/2410.20526)

  5. Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
    Zhengfu He, Junxuan Wang, Rui Lin, Xuyang Ge, Wentao Shu, Qiong Tang, Junping Zhang, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2504.20938)

  6. Attention Layers Add Into Low-Dimensional Residual Subspaces
    Junxuan Wang, Xuyang Ge, Wentao Shu, Zhengfu He, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2508.16929)

  7. Evolution of Concepts in Language Model Pre-Training
    Xuyang Ge, Wentao Shu, Jiaxing Wu, Yunhua Zhou, Zhengfu He, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2509.17196)


New Architectures

  1. DiRL: an Efficient Training Framework for Diffusion Language Models
    Ying Zhu, Jiaxin Wan, Tianyi Liang, Xu Guo, Xiaoran Liu, Zengfeng Huang, Ziwei He, Xipeng Qiu
    2025. [GitHub](https://github.com/OpenMOSS/DiRL)

  2. Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction
    Yuerong Song, Xiaoran Liu, Ruixiao Li, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2508.02558)

  3. LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
    Xiaoran Liu, Zhigeng Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2506.14429)

  4. Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache
    Xiaoran Liu, Siyang He, Qiqi Wang, Ruixiao Li, Yuerong Song, Zhigeng Liu, Linlin Li, Qun Liu, Zengfeng Huang, Qipeng Guo, Ziwei He, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2506.11886)

  5. Thus Spake Long-Context Large Language Model
    Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu
    2025. [ArXiv](https://arxiv.org/abs/2502.17129)