
llm-export is an LLM export tool that converts LLM models to onnx or mnn.

- 🚀 Optimizes the original model code to support dynamic shapes (see the sketch after this list)
- 🚀 Optimizes the original model code to reduce the constant parts of the graph
- 🚀 Uses [OnnxSlim](https://github.com/inisis/OnnxSlim) to optimize the onnx model, improving performance by about 5%; by [@inisis](https://github.com/inisis)
- 🚀 Supports exporting lora weights to onnx and mnn
- 🚀 Onnx inference code: [OnnxLLM](https://github.com/inisis/OnnxLLM)

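The dynamic-shape support above uses the standard ONNX mechanism of symbolic axes. As a minimal illustrative sketch (not this project's actual export code, and with a stand-in module in place of a real transformer block), declaring a dynamic sequence axis looks like this:

```python
import torch

# Illustration only: dynamic shapes are declared via `dynamic_axes`, so the
# sequence length stays a free symbol instead of a constant baked into the graph.
block = torch.nn.Linear(8, 8).eval()  # stand-in for one transformer block
x = torch.zeros(1, 4, 8)              # [batch, seq_len, hidden]
torch.onnx.export(
    block, x, "block.onnx",
    input_names=["hidden_states"],
    output_names=["output"],
    dynamic_axes={"hidden_states": {1: "seq_len"}, "output": {1: "seq_len"}},
)
```
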
## Usage
1. Clone this project locally
```sh
git clone https://github.com/wangzhaode/llm-export.git
```
2. Download the LLM you want to export locally, e.g. chatglm2-6b
```sh
git clone https://huggingface.co/THUDM/chatglm2-6b
# if downloading from huggingface is slow, you can use modelscope instead
git clone https://modelscope.cn/ZhipuAI/chatglm2-6b.git
```
3. Export the model
```sh
cd llm-export
# export chatglm2-6b as an onnx model
python llm_export.py --path ../chatglm2-6b --export onnx
# export chatglm2-6b as an mnn model, quantized to 4 bits with block-wise size 128
python llm_export.py --path ../chatglm2-6b --export mnn --quant_bit 4 --quant_block 128
```

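For intuition about the quantization flags: `--quant_bit` picks the integer width and `--quant_block` the number of weights that share one scale (0 means one scale per output channel). The numpy sketch below illustrates the arithmetic behind `--quant_bit 4 --quant_block 128`; it is not the MNN converter's actual implementation, and the function name is illustrative.

```python
import numpy as np

# Illustrative sketch of asymmetric 4-bit, block-wise (block = 128) quantization.
def quant_dequant(w, bits=4, block=128):
    qmax = (1 << bits) - 1                       # 15 for 4-bit
    out = np.empty_like(w)
    for i in range(0, w.size, block):
        blk = w.flat[i:i + block]
        lo, hi = float(blk.min()), float(blk.max())
        scale = (hi - lo) / qmax or 1.0          # guard against constant blocks
        q = np.round((blk - lo) / scale)         # integer codes in [0, qmax]
        out.flat[i:i + block] = q * scale + lo   # what inference effectively sees
    return out

w = np.random.randn(512).astype(np.float32)
print(np.abs(w - quant_dequant(w)).max())        # per-block quantization error
```
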
## Features
- Supports exporting the model as an onnx or mnn model, using `--export onnx` or `--export mnn`
- Supports a chat test against the model: `--test $query` returns the llm's reply
- Runs onnx-slim on the exported onnx model by default; use `--skip_slim` to skip this step
- Supports merging lora weights before export; use `--lora_path` to point at the lora weight directory (see the sketch below)

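Merging lora weights is conceptually a low-rank update folded back into the base weight, after which the model exports like a plain one. A numpy sketch under assumed shapes and the usual `alpha / r` scaling convention (the tool performs the real merge internally when `--lora_path` is given):

```python
import numpy as np

# Conceptual sketch only: merging lora means W' = W + (alpha / r) * B @ A,
# after which the merged weight exports exactly like a plain weight.
r, alpha, d = 8, 16, 768                        # rank, scaling, hidden size (illustrative)
W = np.random.randn(d, d).astype(np.float32)    # base weight
A = np.random.randn(r, d).astype(np.float32)    # lora_A
B = np.random.randn(d, r).astype(np.float32)    # lora_B
W_merged = W + (alpha / r) * (B @ A)            # same shape as W, no extra ops at inference
```
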
## Parameters
```
usage: llm_export.py [-h] --path PATH [--type TYPE] [--lora_path LORA_PATH] [--dst_path DST_PATH] [--test TEST] [--export EXPORT] [--skip_slim] [--quant_bit QUANT_BIT] [--quant_block QUANT_BLOCK]

llm_exporter

optional arguments:
  -h, --help            show this help message and exit
  --path PATH           path(`str` or `os.PathLike`):
                        Can be either:
                        - A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
                        - A path to a *directory* cloned from a repo like `../chatglm-6b`.
  --type TYPE           type(`str`, *optional*):
                        The pretrained llm model type.
  --lora_path LORA_PATH
                        lora path; default is `None`, meaning lora is not applied.
  --dst_path DST_PATH   export the onnx/mnn model to this path; default is `./model`.
  --test TEST           test model inference with query `TEST`.
  --export EXPORT       export the model as an onnx/mnn model.
  --skip_slim           Whether or not to skip onnx-slim.
  --quant_bit QUANT_BIT
                        mnn quant bit, 4 or 8; default is 4.
  --quant_block QUANT_BLOCK
                        mnn quant block size; default is 0, meaning channel-wise.
```
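
After an onnx export, a quick sanity check with plain onnxruntime confirms the graph loads and shows its expected inputs. The file name below is an assumption about what the default `--dst_path` (`./model`) contains, so adjust it to the file the exporter actually wrote; for full inference code, see the OnnxLLM project linked above.

```python
import onnxruntime as ort

# Load the exported graph and list its expected inputs before wiring up real
# inference; the path/file name here is an assumption, not a guaranteed output.
sess = ort.InferenceSession("./model/llm.onnx", providers=["CPUExecutionProvider"])
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)
```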