Commit c167744

ChongWei905 authored and LiTingyu1997 committed
docs: fix readme bugs
1 parent b53587b · commit c167744

File tree

6 files changed (+54 -50 lines changed)


README.md

Lines changed: 6 additions & 5 deletions
@@ -24,11 +24,12 @@ MindAudio is a toolbox of audio models and algorithms based on [MindSpore](https

 The following is the corresponding `mindaudio` versions and supported `mindspore` versions.

-| `mindaudio` | `mindspore` |
-|-------------|---------------------|
-| `master` | `master` |
-| `0.4` | `2.3.0`/`2.3.1` |
-| `0.3` | `2.2.10` |
+| mindaudio | mindspore |
+|:-----------:|:-------------:|
+| main | master |
+| 0.4 | 2.3.0/2.3.1 |
+| 0.3 | 2.2.10 |
+

 ### data processing

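The compatibility table this hunk rewrites can be sanity-checked mechanically. Below is a minimal sketch assuming only the pairings shown in the table above; `COMPAT` and `is_compatible` are illustrative names for this note, not mindaudio APIs:

```python
# Compatibility pairs copied from the README table in the diff above.
# Hypothetical helper, for illustration only -- not part of mindaudio.
COMPAT = {
    "main": {"master"},
    "0.4": {"2.3.0", "2.3.1"},
    "0.3": {"2.2.10"},
}

def is_compatible(mindaudio_version: str, mindspore_version: str) -> bool:
    """True if the (mindaudio, mindspore) pair appears in the table."""
    return mindspore_version in COMPAT.get(mindaudio_version, set())
```

Such a check could live in CI so the table and the tested version matrix cannot drift apart silently.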
README_CN.md

Lines changed: 5 additions & 5 deletions
@@ -22,11 +22,11 @@ MindAudio 是基于 [MindSpore](https://www.mindspore.cn/) 的音频模型和算

 下表显示了相应的 `mindaudio` 版本和支持的 `mindspore` 版本。

-| `mindaudio` | `mindspore` |
-|-------------|---------------------|
-| `master` | `master` |
-| `0.4` | `2.3.0`/`2.3.1` |
-| `0.3` | `2.2.10` |
+| mindaudio | mindspore |
+|:-----------:|:-------------:|
+| main | master |
+| 0.4 | 2.3.0/2.3.1 |
+| 0.3 | 2.2.10 |

 ### 数据处理


examples/conformer/readme.md

Lines changed: 10 additions & 9 deletions
@@ -6,6 +6,11 @@

 Conformer is a model that combines transformers and CNNs to model both local and global dependencies in audio sequences. Currently, models based on transformers and convolutional neural networks (CNNs) have achieved good results in automatic speech recognition (ASR). Transformers can capture long-sequence dependencies and global interactions based on content, while CNNs can effectively utilize local features. Therefore, a convolution-enhanced transformer model called Conformer has been proposed for speech recognition, showing performance superior to both transformers and CNNs. The current version supports using the Conformer model for training/testing and inference on the AISHELL-1 dataset on ascend NPU and GPU.

+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
+
 ### Model Structure

 The overall structure of Conformer includes SpecAug, ConvolutionSubsampling, Linear, Dropout, and ConformerBlocks×N, as shown in the structure diagram below.

@@ -16,10 +21,6 @@ The overall structure of Conformer includes SpecAug, ConvolutionSubsampling, Lin

 ![image-20230310165349460](https://raw.githubusercontent.com/mindspore-lab/mindaudio/main/tests/result/conformer.png)

-## Requirements
-| mindspore | ascend driver | firmware | cann toolkit/kernel |
-|:-------------:|:----------------------:|:------------:|:-----------------------:|
-| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |



@@ -117,8 +118,8 @@ python predict.py --config_path ./conformer.yaml --decode_mode attention_rescori

 Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode:

-| model name| decoding method | cards | batch size | jit level | graph compile | ms/step | cer | recipe | weight |
-|:---------:|:---------------------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:---:|:------:|:-----:|
-| conformer |ctc greedy search | 8 | bucket | O0 | 103s | 727.5 | 5.62 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
-| conformer |ctc prefix beam search | 8 | bucket | O0 | 103s | 727.5 | 5.62 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
-| conformer |attention rescoring | 8 | bucket | O0 | 103s | 727.5 | 5.12 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| model name | decoding method | cards | batch size | jit level | graph compile | ms/step | cer | recipe | weight |
+|:----------:|:----------------------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:----:|:--------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------:|
+| conformer | ctc greedy search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | ctc prefix beam search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | attention rescoring | 8 | bucket | O0 | 103s | 727.5 | 5.12 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |

examples/conformer/readme_cn.md

Lines changed: 12 additions & 11 deletions
@@ -6,7 +6,12 @@

 conformer是将一种transformer和cnn结合起来,对音频序列进行局部和全局依赖都进行建模的模型。目前基于transformer和卷积神经网络cnn的模型在ASR上已经达到了较好的效果,Transformer能够捕获长序列的依赖和基于内容的全局交互信息,CNN则能够有效利用局部特征,因此针对语音识别问题提出了卷积增强的transformer模型,称为conformer,模型性能优于transformer和cnn。目前提供版本支持在NPU和GPU上使用[conformer](https://arxiv.org/pdf/2102.06657v1.pdf)模型在aishell-1数据集上进行训练/测试和推理。

-### 模型结构
+## 版本要求
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
+
+## 模型结构

 Conformer整体结构包括:SpecAug、ConvolutionSubsampling、Linear、Dropout、ConformerBlocks×N,可见如下结构图。

@@ -16,10 +21,6 @@ Conformer整体结构包括:SpecAug、ConvolutionSubsampling、Linear、Dropou

 ![image-20230310165349460](https://raw.githubusercontent.com/mindspore-lab/mindaudio/main/tests/result/conformer.png)

-## 版本要求
-| mindspore | ascend driver | firmware | cann toolkit/kernel |
-|:-------------:|:----------------------:|:------------:|:-----------------------:|
-| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |


 ## 使用步骤

@@ -111,12 +112,12 @@ python predict.py --config_path ./conformer.yaml --decode_mode ctc_prefix_beam_s
 python predict.py --config_path ./conformer.yaml --decode_mode attention_rescoring
 ```

-## **模型表现**
+## 模型表现

 在 ascend 910* mindspore2.3.1图模式上的测试性能:

-| model name| decoding method | cards | batch size | jit level | graph compile | ms/step | cer | recipe | weight |
-|:---------:|:---------------------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:---:|:------:|:-----:|
-| conformer |ctc greedy search | 8 | bucket | O0 | 103s | 727.5 | 5.62 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
-| conformer |ctc prefix beam search | 8 | bucket | O0 | 103s | 727.5 | 5.62 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
-| conformer |attention rescoring | 8 | bucket | O0 | 103s | 727.5 | 5.12 |[conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) |[weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| model name | decoding method | cards | batch size | jit level | graph compile | ms/step | cer | recipe | weight |
+|:----------:|:----------------------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:----:|:--------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------:|
+| conformer | ctc greedy search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | ctc prefix beam search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | attention rescoring | 8 | bucket | O0 | 103s | 727.5 | 5.12 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |

examples/deepspeech2/readme.md

Lines changed: 10 additions & 10 deletions
@@ -6,12 +6,12 @@
 DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU.


-### Requirements
-| mindspore | ascend driver | firmware | cann toolkit/kernel |
-|:-------------:|:----------------------:|:------------:|:-----------------------:|
-| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

-### Model Architecture
+## Model Architecture

 The current reproduced model includes:

@@ -21,7 +21,7 @@ The current reproduced model includes:
 - Five bidirectional LSTM layers (size 1024)
 - A projection layer [size equal to the number of characters plus 1 (for the CTC blank symbol), 28]

-### Data Processing
+## Data Processing

 - Audio:
     1. Feature extraction: log power spectrum.

@@ -100,12 +100,12 @@ Update the path to the trained weights in the Pretrained_model section of the de
 python eval.py -c "./deepspeech2.yaml"
 ```

-## **Performance**
+## Performance

 Experiments are tested on ascend 910* with mindspore 2.3.1 graph mode:

-| model name | cards | batch size | jit level | graph compile | ms/step | cer | wer | recipe | weight |
-|:----------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:---:|:---:|:-------|:------:|
-| deepspeech2| 8 | 64 | O0 | 404s | 9078 | 3.461 | 10.24 |[yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt)|
+| model name | cards | batch size | jit level | graph compile | ms/step | cer | wer | recipe | weight |
+|:-----------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:-----:|:-----:|:---------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------:|
+| deepspeech2 | 8 | 64 | O0 | 404s | 9078 | 3.461 | 10.24 | [yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |

 - cer and wer tested in Librispeech `test clean` datasets.

examples/deepspeech2/readme_cn.md

Lines changed: 11 additions & 10 deletions
@@ -7,12 +7,12 @@
 DeepSpeech2是一种采用CTC损失训练的语音识别模型。它用神经网络取代了整个手工设计组件的管道,可以处理各种各样的语音,包括嘈杂的环境、口音和不同的语言。目前提供版本支持在NPU上使用[DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf)模型在librispeech数据集上进行训练/测试和推理。


-### 版本要求
-| mindspore | ascend driver | firmware | cann toolkit/kernel |
-|:-------------:|:----------------------:|:------------:|:-----------------------:|
-| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
+## 版本要求
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

-### 模型结构
+## 模型结构

 目前的复现的模型包括:

@@ -23,7 +23,7 @@ DeepSpeech2是一种采用CTC损失训练的语音识别模型。它用神经网
 - 一个投影层【大小为字符数加 1(为CTC空白符号),28】


-### 数据处理
+## 数据处理

 - 音频:

@@ -35,6 +35,7 @@ DeepSpeech2是一种采用CTC损失训练的语音识别模型。它用神经网

 文字编码使用labels进行英文字母转换,用户可使用分词模型进行替换。

+## 使用步骤

 ### 1. 数据集准备
 如为未下载数据集,可使用提供的脚本进行一键下载以及数据准备,如下所示:

@@ -109,12 +110,12 @@ python eval.py -c "./deepspeech2.yaml"



-## **性能表现**
+## 性能表现

 在 ascend 910* mindspore2.3.1图模式上的测试性能:

-| model name | cards | batch size | jit level | graph compile | ms/step | cer | wer | recipe | weight |
-|:----------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:---:|:---:|:-------|:------:|
-| deepspeech2| 8 | 64 | O0 | 404s | 9078 | 3.461 | 10.24 |[yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt)|
+| model name | cards | batch size | jit level | graph compile | ms/step | cer | wer | recipe | weight |
+|:-----------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:-----:|:-----:|:---------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------:|
+| deepspeech2 | 8 | 64 | O0 | 404s | 9078 | 3.461 | 10.24 | [yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |

 - cer和wer由Librispeech `test clean` 数据集上测试得到。
