
Commit a667117: open slim default.

Parent: 3bc99d7

4 files changed: 26 additions, 23 deletions


README.md (8 additions, 4 deletions)
@@ -7,6 +7,7 @@ llm-export is a tool for exporting llm models, capable of exporting llm models to onnx and mnn
 - 🚀 All passed `onnxruntime` correctness tests
 - 🚀 Optimized the original code to support dynamic shapes
 - 🚀 Optimized the original code to reduce the constant portion
+- 🚀 Use [OnnxSlim](https://github.com/WeLoveAI/OnnxSlim) to optimize the onnx model, improving performance by about 5%; by [@inisis](https://github.com/inisis)
 
 
 ## Model Support and Downloads
@@ -94,12 +95,14 @@ python llm_export.py \
 - Supports exporting the tokenizer as a text file, use `--export_token`
 - Supports converting the exported onnx model to an mnn model, defaulting to asymmetric 4-bit quantization, use `--export_mnn`
 - Specify export paths with `--onnx_path` and `--mnn_path`
+- By default, onnx-slim optimizes the onnx model; skip this step with `--skip_slim`
 
 ## Arguments
 ```
-usage: llm_export.py [-h] --path PATH [--type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-VL-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,internlm-chat-7b}] [--onnx_path ONNX_PATH]
-                     [--mnn_path MNN_PATH] [--export_mnn] [--export_verbose] [--export_test] [--test TEST] [--export] [--export_split] [--export_token] [--export_embed] [--export_visual] [--export_lm]
-                     [--export_block EXPORT_BLOCK] [--export_blocks] [--embed_bf16]
+usage: llm_export.py [-h] --path PATH
+                     [--type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-VL-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,internlm-chat-7b,TinyLlama-1_1B-Chat,Yi-6B-Chat,deepseek-llm-7b-chat,phi-2,bge-large-zh}]
+                     [--onnx_path ONNX_PATH] [--mnn_path MNN_PATH] [--export_mnn] [--export_verbose] [--export_test] [--test TEST] [--export] [--export_split] [--export_token] [--export_embed] [--export_visual] [--export_lm]
+                     [--export_block EXPORT_BLOCK] [--export_blocks] [--embed_bf16] [--skip_slim]
 
 llm_exporter
 
@@ -109,7 +112,7 @@ optional arguments:
                         Can be either:
                         - A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
                         - A path to a *directory* clone from repo like `../chatglm-6b`.
-  --type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-VL-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,internlm-chat-7b}
+  --type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-VL-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,internlm-chat-7b,TinyLlama-1_1B-Chat,Yi-6B-Chat,deepseek-llm-7b-chat,phi-2,bge-large-zh}
                         type(`str`, *optional*):
                         The pretrain llm model type.
   --onnx_path ONNX_PATH
@@ -132,4 +135,5 @@ optional arguments:
                         export llm block [id] to an `onnx` model.
   --export_blocks       export llm all blocks to `onnx` models.
   --embed_bf16          using `bfloat16` replace `float32` in embedding.
+  --skip_slim           Whether or not to skip onnx-slim.
 ```
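
With this commit the slim pass becomes opt-out rather than opt-in. For illustration, a hypothetical pair of invocations (the `../chatglm-6b` path and `chatglm-6b` type mirror the README's own examples):

```
# onnx-slim runs automatically after export
python llm_export.py --path ../chatglm-6b --type chatglm-6b --export

# same export, but skipping the slim pass
python llm_export.py --path ../chatglm-6b --type chatglm-6b --export --skip_slim
```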

README_en.md (8 additions, 4 deletions)
@@ -6,6 +6,7 @@ llm-export is a tool for exporting llm models, capable of converting llm models
 - 🚀 All passed `onnxruntime` correctness tests
 - 🚀 Optimized the original code to support dynamic shapes
 - 🚀 Optimized the original code to reduce the constant portion
+- 🚀 Use [OnnxSlim](https://github.com/WeLoveAI/OnnxSlim) to slim the onnx model, for roughly a 5% speedup; by [@inisis](https://github.com/inisis)
 
 
 ## Model Support and Downloads
@@ -47,12 +48,14 @@ python llm_export.py \
 - Supports exporting the tokenizer as a text file, use --export_token
 - Supports converting the exported ONNX model to an MNN model, with default conversion to non-symmetric 4bit quantization, use --export_mnn
 - Specify export paths using --onnx_path and --mnn_path
+- onnx-slim is used by default; skip it with --skip_slim
 
 ## Command Args
 ```
-usage: llm_export.py [-h] --path PATH [--type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-VL-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,internlm-chat-7b}] [--onnx_path ONNX_PATH]
-                     [--mnn_path MNN_PATH] [--export_mnn] [--export_verbose] [--export_test] [--test TEST] [--export] [--export_split] [--export_token] [--export_embed] [--export_visual] [--export_lm]
-                     [--export_block EXPORT_BLOCK] [--export_blocks] [--embed_bf16]
+usage: llm_export.py [-h] --path PATH
+                     [--type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-VL-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,internlm-chat-7b,TinyLlama-1_1B-Chat,Yi-6B-Chat,deepseek-llm-7b-chat,phi-2,bge-large-zh}]
+                     [--onnx_path ONNX_PATH] [--mnn_path MNN_PATH] [--export_mnn] [--export_verbose] [--export_test] [--test TEST] [--export] [--export_split] [--export_token] [--export_embed] [--export_visual] [--export_lm]
+                     [--export_block EXPORT_BLOCK] [--export_blocks] [--embed_bf16] [--skip_slim]
 
 llm_exporter
 
@@ -62,7 +65,7 @@ optional arguments:
                         Can be either:
                         - A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
                         - A path to a *directory* clone from repo like `../chatglm-6b`.
-  --type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-VL-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,internlm-chat-7b}
+  --type {chatglm-6b,chatglm2-6b,chatglm3-6b,codegeex2-6b,Qwen-7B-Chat,Qwen-1_8B-Chat,Qwen-VL-Chat,Baichuan2-7B-Chat,Llama-2-7b-chat-ms,internlm-chat-7b,TinyLlama-1_1B-Chat,Yi-6B-Chat,deepseek-llm-7b-chat,phi-2,bge-large-zh}
                         type(`str`, *optional*):
                         The pretrain llm model type.
   --onnx_path ONNX_PATH
@@ -85,4 +88,5 @@ optional arguments:
                         export llm block [id] to an `onnx` model.
   --export_blocks       export llm all blocks to `onnx` models.
   --embed_bf16          using `bfloat16` replace `float32` in embedding.
+  --skip_slim           Whether or not to skip onnx-slim.
 ```

llm_export.py (9 additions, 14 deletions)
@@ -5,6 +5,7 @@
 import argparse
 import torch
 import numpy as np
+from onnxslim import slim
 import onnxruntime as ort
 import _tools as MNNTools
 import sentencepiece as spm
@@ -83,7 +84,7 @@ def __init__(self, args):
         self.export_verbose = args.export_verbose
         self.export_test = args.export_test
         self.embed_bf16 = args.embed_bf16
-        self.slim = args.slim
+        self.skip_slim = args.skip_slim
         tokenizer_model = os.path.join(args.path, 'tokenizer.model')
         if os.path.exists(tokenizer_model):
             self.sp_model = spm.SentencePieceProcessor(tokenizer_model)
@@ -186,8 +187,7 @@ def export_lm(self):
                           output_names=['token_id'],
                           do_constant_folding=True,
                           opset_version=15)
-        if self.slim:
-            from onnxslim import slim
+        if not self.skip_slim:
             slim(onnx_model, output_model=onnx_model)
         # test lm
         if self.export_test:
@@ -217,8 +217,7 @@ def export_visual(self):
                           }},
                           do_constant_folding=True,
                           opset_version=15)
-        if self.slim:
-            from onnxslim import slim
+        if not self.skip_slim:
             slim(onnx_model, output_model=onnx_model)
         # test
         if self.export_test:
@@ -246,8 +245,7 @@ def export_embed(self):
                           }},
                           do_constant_folding=True,
                           opset_version=15)
-        if self.slim:
-            from onnxslim import slim
+        if not self.skip_slim:
             slim(onnx_model, output_model=onnx_model)
         # test
         if self.export_test:
@@ -281,8 +279,7 @@ def export_block(self, block_id: int):
                           dynamic_axes=self.block_dynamic_axes,
                           do_constant_folding=True,
                           opset_version=15)
-        if self.slim:
-            from onnxslim import slim
+        if not self.skip_slim:
             slim(onnx_model, output_model=onnx_model)
         if self.export_test:
             original_outs = model(inputs_embeds, attention_mask, position_ids, past_key_values)
@@ -322,8 +319,7 @@ def export(self):
                           dynamic_axes=self.model_dynamic_axes,
                           do_constant_folding=True,
                           opset_version=15)
-        if self.slim:
-            from onnxslim import slim
+        if not self.skip_slim:
             slim(onnx_model, output_model=onnx_model)
         if self.export_test:
             # test
@@ -961,8 +957,7 @@ def export(self):
                           dynamic_axes=self.model_dynamic_axes,
                           do_constant_folding=True,
                           opset_version=15)
-        if self.slim:
-            from onnxslim import slim
+        if not self.skip_slim:
             slim(onnx_model, output_model=onnx_model)
         if self.export_test:
             self.seq_len = 4
@@ -1042,7 +1037,7 @@ def get_attention_mask(self) -> torch.Tensor:
 parser.add_argument('--export_block', type=int, help='export llm block [id] to an `onnx` model.')
 parser.add_argument('--export_blocks', action='store_true', help='export llm all blocks to `onnx` models.')
 parser.add_argument('--embed_bf16', action='store_true', help='using `bfloat16` replace `float32` in embedding.')
-parser.add_argument('--slim', action='store_true', help='Whether or not to slim the exported onnx model.')
+parser.add_argument('--skip_slim', action='store_true', help='Whether or not to skip onnx-slim.')
 
 
 args = parser.parse_args()
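
Every hunk above applies the same pattern: the lazy `from onnxslim import slim` inside each opt-in `if self.slim:` moves to an unconditional top-level import, and each condition flips to the opt-out `if not self.skip_slim:`. A minimal self-contained sketch of that pattern; `export_onnx` and its parameters are illustrative placeholders rather than names from llm_export.py, while the `slim(onnx_model, output_model=onnx_model)` call mirrors the repository code:

```python
import torch
from onnxslim import slim  # now imported unconditionally, as in this commit


def export_onnx(model, example_inputs, onnx_model, skip_slim=False):
    # Hypothetical helper: trace the module to an onnx file the way
    # llm_export.py does (constant folding on, opset 15).
    torch.onnx.export(model, example_inputs, onnx_model,
                      do_constant_folding=True,
                      opset_version=15)
    # Slimming is opt-out after this commit: it runs by default,
    # and passing --skip_slim disables it.
    if not skip_slim:
        slim(onnx_model, output_model=onnx_model)
```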

requirements.txt (1 addition, 1 deletion)
@@ -1,4 +1,4 @@
-MNN==2.8.0
+MNN==2.8.1
 numpy==1.25.2
 onnxruntime==1.15.1
 torch==2.0.1
