Commit e2edef4

[refactor] refactor mnn export.

1 parent b25ca41

File tree

4 files changed: +598 -469 lines changed

README.md

Lines changed: 14 additions & 111 deletions
@@ -4,76 +4,12 @@
 
 llm-export is an llm model export tool that can export llm models as onnx and mnn models.
 
-- 🚀 All passed `onnxruntime` correctness tests
 - 🚀 Optimized the original code to support dynamic shapes
 - 🚀 Optimized the original code to reduce the constant portion
 - 🚀 Uses [OnnxSlim](https://github.com/inisis/OnnxSlim) to optimize the onnx model, improving performance by about 5%; by [@inisis](https://github.com/inisis)
 - 🚀 Supports exporting lora weights as onnx and mnn
 - 🚀 Onnx inference code: [OnnxLLM](https://github.com/inisis/OnnxLLM)
 
-## Supported models and downloads
-[![Download][download-chatglm-6b-onnx]][release-chatglm-6b-onnx]
-[![Download][download-chatglm2-6b-onnx]][release-chatglm2-6b-onnx]
-[![Download][download-chatglm3-6b-onnx]][release-chatglm3-6b-onnx]
-[![Download][download-codegeex2-6b-onnx]][release-codegeex2-6b-onnx]
-[![Download][download-qwen-7b-chat-onnx]][release-qwen-7b-chat-onnx]
-[![Download][download-baichuan2-7b-chat-onnx]][release-baichuan2-7b-chat-onnx]
-[![Download][download-llama2-7b-chat-onnx]][release-llama2-7b-chat-onnx]
-[![Download][download-qwen-1.8b-chat-onnx]][release-qwen-1.8b-chat-onnx]
-[![Download][download-phi-2-onnx]][release-phi-2-onnx]
-[![Download][download-internlm-7b-onnx]][release-internlm-7b-onnx]
-[![Download][download-qwen-vl-onnx]][release-qwen-vl-onnx]
-[![Download][download-bge-large-zh-onnx]][release-bge-large-zh-onnx]
-[![Download][download-tinyllama-1.1b-chat-onnx]][release-tinyllama-1.1b-chat-onnx]
-[![Download][download-yi-6b-chat-onnx]][release-yi-6b-chat-onnx]
-[![Download][download-deepseek-7b-chat-onnx]][release-deepseek-7b-chat-onnx]
-[![Download][download-qwen1.5-0.5b-chat-onnx]][release-qwen1.5-0.5b-chat-onnx]
-[![Download][download-qwen1.5-1.8b-chat-onnx]][release-qwen1.5-1.8b-chat-onnx]
-[![Download][download-qwen1.5-4b-chat-onnx]][release-qwen1.5-4b-chat-onnx]
-[![Download][download-qwen1.5-7b-chat-onnx]][release-qwen1.5-7b-chat-onnx]
-[![Download][download-llama3-8b-instruct-onnx]][release-llama3-8b-instruct-onnx]
-
-[download-chatglm-6b-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/chatglm-6b-onnx/total
-[download-chatglm2-6b-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/chatglm2-6b-onnx/total
-[download-chatglm3-6b-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/chatglm3-6b-onnx/total
-[download-codegeex2-6b-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/codegeex2-6b-onnx/total
-[download-qwen-7b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/qwen-7b-chat-onnx/total
-[download-baichuan2-7b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/baichuan2-7b-chat-onnx/total
-[download-llama2-7b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/llama2-7b-chat-onnx/total
-[download-qwen-1.8b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/qwen-1.8b-onnx/total
-[download-phi-2-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/phi-2-onnx/total
-[download-internlm-7b-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/internlm-7b-onnx/total
-[download-qwen-vl-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/qwen-vl-onnx/total
-[download-bge-large-zh-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/bge-large-zh-onnx/total
-[download-tinyllama-1.1b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/tinyllama-1.1b-chat-onnx/total
-[download-yi-6b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/yi-6b-chat-onnx/total
-[download-deepseek-7b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/deepseek-7b-chat-onnx/total
-[download-qwen1.5-0.5b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/qwen1.5-0.5b-chat-onnx/total
-[download-qwen1.5-1.8b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/qwen1.5-1.8b-chat-onnx/total
-[download-qwen1.5-4b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/qwen1.5-4b-chat-onnx/total
-[download-qwen1.5-7b-chat-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/qwen1.5-7b-chat-onnx/total
-[download-llama3-8b-instruct-onnx]: https://img.shields.io/github/downloads/wangzhaode/llm-export/llama3-8b-instruct-onnx/total
-[release-chatglm-6b-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/chatglm-6b-onnx
-[release-chatglm2-6b-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/chatglm2-6b-onnx
-[release-chatglm3-6b-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/chatglm3-6b-onnx
-[release-codegeex2-6b-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/codegeex2-6b-onnx
-[release-qwen-7b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/qwen-7b-chat-onnx
-[release-baichuan2-7b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/baichuan2-7b-chat-onnx
-[release-llama2-7b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/llama2-7b-chat-onnx
-[release-qwen-1.8b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/qwen-1.8b-onnx
-[release-phi-2-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/phi-2-onnx
-[release-internlm-7b-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/internlm-7b-onnx
-[release-qwen-vl-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/qwen-vl-onnx
-[release-bge-large-zh-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/bge-large-zh-onnx
-[release-tinyllama-1.1b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/tinyllama-1.1b-chat-onnx
-[release-yi-6b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/yi-6b-chat-onnx
-[release-deepseek-7b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/deepseek-7b-chat-onnx
-[release-qwen1.5-0.5b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/qwen1.5-0.5b-chat-onnx
-[release-qwen1.5-1.8b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/qwen1.5-1.8b-chat-onnx
-[release-qwen1.5-4b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/qwen1.5-4b-chat-onnx
-[release-qwen1.5-7b-chat-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/qwen1.5-7b-chat-onnx
-[release-llama3-8b-instruct-onnx]: https://github.com/wangzhaode/llm-export/releases/tag/llama3-8b-instruct-onnx
-
 ## Usage
 1. Clone this project locally
 ```sh
@@ -85,42 +21,24 @@ git clone https://huggingface.co/THUDM/chatglm2-6b
 # If huggingface downloads are slow, you can use modelscope
 git clone https://modelscope.cn/ZhipuAI/chatglm2-6b.git
 ```
-3. Run LLMExporter to export the model
+3. Export the model
 ```sh
 cd mnn-llm
-# Split chatglm2-6b into embedding, blocks, and lm, export each as onnx and convert to mnn, and also export tokenizer.txt
-python llm_export.py \
-    --path ../chatglm2-6b \
-    --export_split \
-    --export_token \
-    --export_mnn \
-    --onnx_path ./chatglm2-6b-onnx \
-    --mnn_path ./chatglm2-6b-mnn
+# export chatglm2-6b as an onnx model
+python llm_export.py --path ../chatglm2-6b --export onnx
+# export chatglm2-6b as an mnn model, quantized to 4 bits, block-wise = 128
+python llm_export.py --path ../chatglm2-6b --export mnn --quant_bit 4 --quant_block 128
 ```
 
 ## Features
-- Supports exporting the whole model as a single onnx model, with `--export`
-- Supports exporting the model in segments as multiple models, with `--export_split`
-- Supports exporting the model's vocabulary to a text file, one token per line, with tokens base64-encoded, with `--export_verbose`
-- Supports exporting the model's Embedding layer as an onnx model with `--export_embed`, optionally in bf16 with `--embed_bf16`
-- Supports exporting the model's blocks layer by layer: `--export_blocks` exports all layers, `--export_block $id` exports the given layer
-- Supports exporting the model's lm_head layer as an onnx model with `--export_lm`
-- Supports exporting a multimodal model's visual model as an onnx model with `--export_visual`
+- Supports exporting the model as an onnx or mnn model, with `--export onnx` or `--export mnn`
 - Supports chat-testing the model: `--test $query` returns the llm's reply
-- Supports verifying result consistency with onnxruntime after exporting the onnx model, with `--export_test`
-- Supports exporting the tokenizer to a text file, with `--export_token`
-- Supports converting the exported onnx model to an mnn model, by default with asymmetric 4bit quantization, with `--export_mnn`
-- Specify export paths with `--onnx_path` and `--mnn_path`
 - By default onnx-slim is used to optimize the onnx model; skip this step with `--skip_slim`
 - Supports exporting after merging lora weights; specify the lora weight directory with `--lora_path` (see the sketch after this list)

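The remaining flags compose on one command line. A minimal sketch, assuming `../my-lora` is a placeholder directory of lora weights and the query string is arbitrary:

```sh
# merge lora weights into the base model, export to mnn, and skip the onnx-slim pass
python llm_export.py --path ../chatglm2-6b --lora_path ../my-lora --export mnn --skip_slim
# run a single-query inference test against the model
python llm_export.py --path ../chatglm2-6b --test "hello"
```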
 ## Arguments
 ```
-usage: llm_export.py [-h] --path PATH
-                     [--type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-1_8B,Qwen-VL-Chat,Qwen1_5-0_5B-Chat,Qwen1_5-1_8B-Chat,Qwen1_5-4B-Chat,Qwen1_5-7B-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,Llama-3-8B-Instruct,internlm-chat-7b,TinyLlama-1_1B-Chat,Yi-6B-Chat,deepseek-llm-7b-chat,phi-2,bge-large-zh,lora}]
-                     [--lora_path LORA_PATH] [--onnx_path ONNX_PATH] [--mnn_path MNN_PATH] [--export_mnn] [--export_verbose] [--export_test] [--test TEST] [--export]
-                     [--export_split] [--export_token] [--export_embed] [--export_visual] [--export_lm] [--export_block EXPORT_BLOCK] [--export_blocks] [--embed_bin]
-                     [--embed_bf16] [--skip_slim]
+usage: llm_export.py [-h] --path PATH [--type TYPE] [--lora_path LORA_PATH] [--dst_path DST_PATH] [--test TEST] [--export EXPORT] [--skip_slim] [--quant_bit QUANT_BIT] [--quant_block QUANT_BLOCK]
 
 llm_exporter
 
@@ -130,31 +48,16 @@ optional arguments:
                         Can be either:
                         - A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
                         - A path to a *directory* clone from repo like `../chatglm-6b`.
-  --type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-1_8B,Qwen-VL-Chat,Qwen1_5-0_5B-Chat,Qwen1_5-1_8B-Chat,Qwen1_5-4B-Chat,Qwen1_5-7B-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,Llama-3-8B-Instruct,internlm-chat-7b,TinyLlama-1_1B-Chat,Yi-6B-Chat,deepseek-llm-7b-chat,phi-2,bge-large-zh,lora}
-                        type(`str`, *optional*):
+  --type TYPE           type(`str`, *optional*):
                         The pretrain llm model type.
   --lora_path LORA_PATH
                         lora path, default is `None`, meaning lora is not applied.
-  --onnx_path ONNX_PATH
-                        export onnx model path, default is `./onnx`.
-  --mnn_path MNN_PATH   export mnn model path, default is `./mnn`.
-  --export_mnn          Whether or not to export mnn model after onnx.
-  --export_verbose      Whether or not to export onnx with verbose.
-  --export_test         Whether or not to export onnx with test using onnxruntime.
+  --dst_path DST_PATH   export onnx/mnn model to path, default is `./model`.
   --test TEST           test model inference with query `TEST`.
-  --export              export model to an `onnx` model.
-  --export_split        export model split to some `onnx` models:
-                        - embedding model.
-                        - block models.
-                        - lm_head model.
-  --export_token        export llm tokenizer to a txt file.
-  --export_embed        export llm embedding to an `onnx` model.
-  --export_visual       export llm visual model to an `onnx` model.
-  --export_lm           export llm lm_head to an `onnx` model.
-  --export_block EXPORT_BLOCK
-                        export llm block [id] to an `onnx` model.
-  --export_blocks       export llm all blocks to `onnx` models.
-  --embed_bin           export embedding weight as bin file with dtype `bfloat16`
-  --embed_bf16          using `bfloat16` replace `float32` in embedding.
+  --export EXPORT       export model to an onnx/mnn model.
   --skip_slim           Whether or not to skip onnx-slim.
+  --quant_bit QUANT_BIT
+                        mnn quant bit, 4 or 8, default is 4.
+  --quant_block QUANT_BLOCK
+                        mnn quant block, default is 0, meaning channel-wise.
 ```
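To make the quantization options concrete, the two settings documented above can be spelled out as follows; both commands assume the chatglm2-6b checkpoint from the usage steps:

```sh
# 4-bit quantization with channel-wise scales (the documented defaults, written explicitly)
python llm_export.py --path ../chatglm2-6b --export mnn --quant_bit 4 --quant_block 0
# 8-bit quantization with one scale per block of 128 weights
python llm_export.py --path ../chatglm2-6b --export mnn --quant_bit 8 --quant_block 128
```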

README_en.md

Lines changed: 23 additions & 52 deletions
@@ -3,7 +3,6 @@
 [中文](./README_en.md)
 
 llm-export is a tool for exporting llm models, capable of converting llm models into ONNX or MNN models.
-- 🚀 All passed `onnxruntime` correctness tests
 - 🚀 Optimized the original code to support dynamic shapes
 - 🚀 Optimized the original code to reduce the constant portion
 - 🚀 Uses [OnnxSlim](https://github.com/inisis/OnnxSlim) to slim the onnx model, for roughly a 5% speedup; by [@inisis](https://github.com/inisis)
@@ -23,71 +22,43 @@ git clone https://huggingface.co/THUDM/chatglm2-6b
 # If downloading from Hugging Face is slow, you can use ModelScope
 git clone https://modelscope.cn/ZhipuAI/chatglm2-6b.git
 ```
-3. Execute LLMExporter to export the model
+3. Export the model
 ```sh
 cd mnn-llm
-# Divide chatglm2-6b into embedding, blocks, lm, export each as ONNX and convert to MNN, and also export tokenizer.txt
-python llm_export.py \
-    --path ../chatglm2-6b \
-    --export_split \
-    --export_token \
-    --export_mnn \
-    --onnx_path ./chatglm2-6b-onnx \
-    --mnn_path ./chatglm2-6b-mnn
+cd mnn-llm
+# export chatglm2-6b to onnx
+python llm_export.py --path ../chatglm2-6b --export onnx
+# export chatglm2-6b to mnn and quantize it
+python llm_export.py --path ../chatglm2-6b --export mnn --quant_bit 4 --quant_block 128
 ```
 
 ## Features
-- Supports exporting the entire model as a single ONNX model, use `--export`
-- Supports exporting the model in segments as multiple models, use `--export_split`
-- Supports exporting the model's vocabulary to a text file, each line representing a token; tokens are encoded using base64, use `--export_verbose`
-- Supports exporting the model's Embedding layer as an ONNX model, use `--export_embed`; also supports bf16 format, use `--embed_bf16`
-- Supports layered export of the model's blocks, use `--export_blocks` to export all layers; use `--export_block $id` to export a specified layer
-- Supports exporting the model's lm_head layer as an ONNX model, use `--export_lm`
-- Supports exporting the VL model's visual model as an ONNX model, use `--export_visual`
-- Supports conducting a dialogue test on the model; using `--test $query` will return the llm's response
-- Supports verifying the consistency of results using onnxruntime after exporting the ONNX model, use `--export_test`
-- Supports exporting the tokenizer as a text file, use `--export_token`
-- Supports converting the exported ONNX model to an MNN model, with default conversion to non-symmetric 4bit quantization, use `--export_mnn`
-- Specify export paths using `--onnx_path` and `--mnn_path`
-- Uses onnx-slim by default; skip it with `--skip_slim`
+- Supports exporting the entire model as an onnx or mnn model, use `--export onnx` or `--export mnn` (see the sketch after this list)
+- Uses onnx-slim by default; skip it with `--skip_slim`
+- Supports merging lora weights before export.

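For illustration, the new single-flag interface combines with the documented `--dst_path` option as below; the output directory name is a placeholder:

```sh
# export to onnx, writing the artifacts to a custom directory instead of the default ./model
python llm_export.py --path ../chatglm2-6b --export onnx --dst_path ./chatglm2-6b-export
```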
 
 ## Command Args
 ```
-usage: llm_export.py [-h] --path PATH
-                     [--type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-VL-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,internlm-chat-7b,TinyLlama-1_1B-Chat,Yi-6B-Chat,deepseek-llm-7b-chat,phi-2,bge-large-zh}]
-                     [--onnx_path ONNX_PATH] [--mnn_path MNN_PATH] [--export_mnn] [--export_verbose] [--export_test] [--test TEST] [--export] [--export_split] [--export_token] [--export_embed] [--export_visual] [--export_lm]
-                     [--export_block EXPORT_BLOCK] [--export_blocks] [--embed_bf16] [--skip_slim]
+usage: llm_export.py [-h] --path PATH [--type TYPE] [--lora_path LORA_PATH] [--dst_path DST_PATH] [--test TEST] [--export EXPORT] [--skip_slim] [--quant_bit QUANT_BIT] [--quant_block QUANT_BLOCK]
 
 llm_exporter
 
 optional arguments:
   -h, --help            show this help message and exit
   --path PATH           path(`str` or `os.PathLike`):
                         Can be either:
-                        - A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
-                        - A path to a *directory* clone from repo like `../chatglm-6b`.
-  --type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-VL-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,internlm-chat-7b,TinyLlama-1_1B-Chat,Yi-6B-Chat,deepseek-llm-7b-chat,phi-2,bge-large-zh}
-                        type(`str`, *optional*):
-                        The pretrain llm model type.
-  --onnx_path ONNX_PATH
-                        export onnx model path, default is `./onnx`.
-  --mnn_path MNN_PATH   export mnn model path, default is `./mnn`.
-  --export_mnn          Whether or not to export mnn model after onnx.
-  --export_verbose      Whether or not to export onnx with verbose.
-  --export_test         Whether or not to export onnx with test using onnxruntime.
+                        - A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
+                        - A path to a *directory* clone from repo like `../chatglm-6b`.
+  --type TYPE           type(`str`, *optional*):
+                        The pretrain llm model type.
+  --lora_path LORA_PATH
+                        lora path, default is `None`, meaning lora is not applied.
+  --dst_path DST_PATH   export onnx/mnn model to path, default is `./model`.
   --test TEST           test model inference with query `TEST`.
-  --export              export model to an `onnx` model.
-  --export_split        export model split to some `onnx` models:
-                        - embedding model.
-                        - block models.
-                        - lm_head model.
-  --export_token        export llm tokenizer to a txt file.
-  --export_embed        export llm embedding to an `onnx` model.
-  --export_visual       export llm visual model to an `onnx` model.
-  --export_lm           export llm lm_head to an `onnx` model.
-  --export_block EXPORT_BLOCK
-                        export llm block [id] to an `onnx` model.
-  --export_blocks       export llm all blocks to `onnx` models.
-  --embed_bf16          using `bfloat16` replace `float32` in embedding.
+  --export EXPORT       export model to an onnx/mnn model.
   --skip_slim           Whether or not to skip onnx-slim.
-```
+  --quant_bit QUANT_BIT
+                        mnn quant bit, 4 or 8, default is 4.
+  --quant_block QUANT_BLOCK
+                        mnn quant block, default is 0, meaning channel-wise.
+```
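A final sketch for the optional `--type` flag; the value shown is illustrative and simply mirrors the model directory name used in the steps above:

```sh
# name the pretrained model type explicitly instead of leaving --type unset
python llm_export.py --path ../chatglm2-6b --type chatglm2-6b --export mnn
```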
