
Commit 8f4ff68

Merge pull request #196 from LiTingyu1997/update_231

docs: add requirements and performance to readmes

2 parents 4c7b609 + c167744

6 files changed: +76 −75 lines

README.md
Lines changed: 6 additions & 5 deletions

@@ -24,11 +24,12 @@
 The following are the corresponding `mindaudio` versions and the supported `mindspore` versions.

-| `mindspore` | `mindaudio` |
-|--------------|-------------|
-| `master` | `master` |
-| `2.3.0` | `0.4` |
-| `2.2.10` | `0.3` |
+| mindaudio | mindspore |
+|:---------:|:-----------:|
+| main | master |
+| 0.4 | 2.3.0/2.3.1 |
+| 0.3 | 2.2.10 |

 ### data processing
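To sanity-check a local environment against the compatibility table above, one can print the installed versions side by side. A minimal sketch, assuming `mindaudio` exposes `__version__` the way `mindspore` does:

```python
# Compare installed versions against the compatibility table.
# Assumption: mindaudio exposes __version__ like mindspore does.
import mindspore
import mindaudio

print("mindspore:", mindspore.__version__)
print("mindaudio:", mindaudio.__version__)
```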
README_CN.md
Lines changed: 5 additions & 5 deletions

@@ -22,11 +22,11 @@
 The table below shows the corresponding `mindaudio` versions and the supported `mindspore` versions.

-| `mindspore` | `mindaudio` |
-|--------------|-------------|
-| `master` | `master` |
-| `2.3.0` | `0.4` |
-| `2.2.10` | `0.3` |
+| mindaudio | mindspore |
+|:---------:|:-----------:|
+| main | master |
+| 0.4 | 2.3.0/2.3.1 |
+| 0.3 | 2.2.10 |

 ### Data Processing
examples/conformer/readme.md
Lines changed: 13 additions & 25 deletions

@@ -6,6 +6,11 @@
 Conformer is a model that combines transformers and CNNs to model both local and global dependencies in audio sequences. Models based on transformers and on convolutional neural networks (CNNs) have achieved good results in automatic speech recognition (ASR): transformers capture long-range dependencies and content-based global interactions, while CNNs exploit local features effectively. The convolution-augmented transformer, Conformer, was therefore proposed for speech recognition and outperforms both. The current version supports training/testing and inference with the Conformer model on the AISHELL-1 dataset on Ascend NPU and GPU.

+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

 ### Model Structure

 The overall structure of Conformer includes SpecAug, ConvolutionSubsampling, Linear, Dropout, and ConformerBlocks×N, as shown in the structure diagram below.
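For orientation, each ConformerBlock arranges its sublayers in a "macaron" pattern: a half-step feed-forward, self-attention, a convolution module, and a second half-step feed-forward, each wrapped in a residual connection. A toy NumPy sketch of that data flow (stand-in sublayers with random weights, not the repository's MindSpore implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8  # toy sequence length and model dimension

def ffn(x):
    # stand-in feed-forward sublayer (random weights, scaled down)
    w1 = rng.standard_normal((d, 4 * d)) * 0.1
    w2 = rng.standard_normal((4 * d, d)) * 0.1
    return np.maximum(x @ w1, 0.0) @ w2

def mhsa(x):
    # stand-in single-head self-attention over the T frames
    scores = x @ x.T / np.sqrt(d)
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return attn @ x

def conv_module(x):
    # stand-in depthwise temporal convolution (kernel size 3, averaging)
    pad = np.pad(x, ((1, 1), (0, 0)))
    return (pad[:-2] + pad[1:-1] + pad[2:]) / 3.0

def conformer_block(x):
    # "macaron" layout: half-step FFN, attention, conv, half-step FFN;
    # the real block also applies layer normalization around sublayers
    x = x + 0.5 * ffn(x)
    x = x + mhsa(x)
    x = x + conv_module(x)
    x = x + 0.5 * ffn(x)
    return x

x = rng.standard_normal((T, d))
print(conformer_block(x).shape)  # (6, 8): shape is preserved, so blocks stack
```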
@@ -18,6 +23,7 @@

+
 ## Usage Steps

 ### 1. Dataset Preparation

@@ -103,35 +109,17 @@
 # using ctc prefix beam search decoder
 python predict.py --config_path ./conformer.yaml --decode_mode ctc_prefix_beam_search

-# using attention decoder
-python predict.py --config_path ./conformer.yaml --decode_mode attention
-
 # using attention rescoring decoder
 python predict.py --config_path ./conformer.yaml --decode_mode attention_rescoring
 ```

-## Model Performance
-The training config can be found in the [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml).
-
-Performance tested on ascend 910 (8p) with graph mode:
-
-| model | decoding mode | CER |
-|-----------|------------------------|--------------|
-| conformer | ctc greedy search | 5.35 |
-| conformer | ctc prefix beam search | 5.36 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 4.95 |
-- [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-548ee31b.ckpt) can be downloaded here.
-
----
-Performance tested on ascend 910* (8p) with graph mode:
-
-| model | decoding mode | CER |
-|-----------|------------------------|--------------|
-| conformer | ctc greedy search | 5.62 |
-| conformer | ctc prefix beam search | 5.62 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 5.12 |
-- [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) can be downloaded here.
+## Performance
+
+Experiments were run on ascend 910* with mindspore 2.3.1 in graph mode:
+
+| model name | decoding method | cards | batch size | jit level | graph compile | ms/step | cer | recipe | weight |
+|:----------:|:----------------------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:----:|:------:|:------:|
+| conformer | ctc greedy search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | ctc prefix beam search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | attention rescoring | 8 | bucket | O0 | 103s | 727.5 | 5.12 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |

examples/conformer/readme_cn.md
Lines changed: 16 additions & 26 deletions

@@ -6,7 +6,12 @@
 Conformer combines a transformer with a CNN to model both local and global dependencies in audio sequences. Models based on transformers and on CNNs already perform well on ASR: transformers capture long-range dependencies and content-based global interactions, while CNNs exploit local features effectively, so a convolution-augmented transformer, called conformer, was proposed for speech recognition and outperforms both. The current version supports training/testing and inference with the [conformer](https://arxiv.org/pdf/2102.06657v1.pdf) model on the aishell-1 dataset on NPU and GPU.

-### Model Structure
+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
+
+## Model Structure

 The overall structure of Conformer includes SpecAug, ConvolutionSubsampling, Linear, Dropout, and ConformerBlocks×N; see the structure diagram below.

@@ -17,6 +22,7 @@
 ![image-20230310165349460](https://raw.githubusercontent.com/mindspore-lab/mindaudio/main/tests/result/conformer.png)

+
 ## Usage Steps

 ### 1. Dataset Preparation

@@ -102,32 +108,16 @@
 # using ctc prefix beam search decoder
 python predict.py --config_path ./conformer.yaml --decode_mode ctc_prefix_beam_search

-# using attention decoder
-python predict.py --config_path ./conformer.yaml --decode_mode attention
-
 # using attention rescoring decoder
 python predict.py --config_path ./conformer.yaml --decode_mode attention_rescoring
 ```

-## **Model Performance**
-The training config can be found in [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml)
-
-Performance tested on ascend 910 (8p) in graph mode:
-
-| model | decoding mode | CER |
-| --------- | ---------------------- |--------------|
-| conformer | ctc greedy search | 5.35 |
-| conformer | ctc prefix beam search | 5.36 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 4.95 |
-- Trained [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-548ee31b.ckpt) can be downloaded here.
----
-Performance tested on ascend 910* (8p) in graph mode:
-
-| model | decoding mode | CER |
-| --------- | ---------------------- |--------------|
-| conformer | ctc greedy search | 5.62 |
-| conformer | ctc prefix beam search | 5.62 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 5.12 |
-- Trained [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) can be downloaded here.
+## Model Performance
+
+Performance tested on ascend 910* with mindspore 2.3.1 in graph mode:
+
+| model name | decoding method | cards | batch size | jit level | graph compile | ms/step | cer | recipe | weight |
+|:----------:|:----------------------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:----:|:------:|:------:|
+| conformer | ctc greedy search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | ctc prefix beam search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | attention rescoring | 8 | bucket | O0 | 103s | 727.5 | 5.12 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |

examples/deepspeech2/readme.md
Lines changed: 17 additions & 7 deletions

@@ -3,9 +3,15 @@
 ## Introduction

-DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU and GPU.
+DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU.

-### Model Architecture
+
+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
+
+## Model Architecture

 The current reproduced model includes:

@@ -15,7 +21,7 @@
 - Five bidirectional LSTM layers (size 1024)
 - A projection layer [size equal to the number of characters plus 1 (for the CTC blank symbol), 28]

-### Data Processing
+## Data Processing

 - Audio:
   1. Feature extraction: log power spectrum.
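As a rough illustration of that first step, a log power spectrum can be computed by framing the waveform, windowing each frame, and taking log(|FFT|²). A hedged NumPy sketch; the frame and hop sizes are assumptions, not mindaudio's defaults:

```python
import numpy as np

def log_power_spectrum(wav, n_fft=512, hop=160):
    """Frame the signal, window it, and take log(|FFT|^2) per frame.

    A sketch of the feature only; mindaudio's actual pipeline
    (window choice, normalization, epsilon) may differ.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames, axis=-1)) ** 2
    return np.log(spectrum + 1e-10)

# toy 1-second signal at 16 kHz
feat = log_power_spectrum(np.random.randn(16000))
print(feat.shape)  # (97, 257): frames x frequency bins
```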
@@ -94,8 +100,12 @@
 python eval.py -c "./deepspeech2.yaml"
 ```

-## **Model Performance**
-
-| Model | Machine | LM | Test Clean CER | Test Clean WER | Parameters | Weights |
-|--------------|-----------|------|----------------|----------------|------------|---------|
-| DeepSpeech2 | D910x8-G | No | 3.461 | 10.24 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
+## Performance
+
+Experiments were run on ascend 910* with mindspore 2.3.1 in graph mode:
+
+| model name | cards | batch size | jit level | graph compile | ms/step | cer | wer | recipe | weight |
+|:-----------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:-----:|:-----:|:------:|:------:|
+| deepspeech2 | 8 | 64 | O0 | 404s | 9078 | 3.461 | 10.24 | [yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
+
+- cer and wer are measured on the Librispeech `test clean` dataset.
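Both error columns are edit-distance rates: the Levenshtein distance from hypothesis to reference (over characters for cer, over words for wer) divided by the reference length. A minimal sketch of the metric, not the project's scoring code:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (characters for cer, words for wer)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(
                d[j] + 1,          # delete a reference token
                d[j - 1] + 1,      # insert a hypothesis token
                prev + (r != h),   # substitute (free if tokens match)
            )
    return d[-1]

ref, hyp = "the cat sat".split(), "the cat sit".split()
print(edit_distance(ref, hyp) / len(ref))  # wer = 1/3
```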

examples/deepspeech2/readme_cn.md
Lines changed: 19 additions & 7 deletions

@@ -4,9 +4,15 @@
 ## Introduction

-DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a wide variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU and GPU.
+DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a wide variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU.

-### Model Architecture
+
+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
+
+## Model Architecture

 The currently reproduced model includes:

@@ -16,7 +22,8 @@
 - Five bidirectional LSTM layers (size 1024)
 - A projection layer [size equal to the number of characters plus 1 (for the CTC blank symbol), 28]

-### Data Processing
+
+## Data Processing

 - Audio:

@@ -28,6 +35,7 @@
 Text encoding uses labels to convert English letters; users may substitute a tokenization model of their own.

+## Usage Steps

 ### 1. Dataset Preparation
 If the dataset has not been downloaded yet, the provided script can download and prepare it in one step, as shown below:

@@ -102,8 +110,12 @@
 python eval.py -c "./deepspeech2.yaml"

-## **Performance**
-
-| model | LM | test clean cer | test clean wer | config | weights |
-| ----------- | ---- | -------------- | -------------- |--------|---------|
-| deepspeech2 | No | 3.461 | 10.24 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
+## Performance
+
+Performance tested on ascend 910* with mindspore 2.3.1 in graph mode:
+
+| model name | cards | batch size | jit level | graph compile | ms/step | cer | wer | recipe | weight |
+|:-----------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:-----:|:-----:|:------:|:------:|
+| deepspeech2 | 8 | 64 | O0 | 404s | 9078 | 3.461 | 10.24 | [yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
+
+- cer and wer are measured on the Librispeech `test clean` dataset.
