
Commit 8f4ff68

Merge pull request #196 from LiTingyu1997/update_231

docs: add requirements and performance to readmes

2 parents 4c7b609 + c167744

6 files changed: +76 −75 lines

README.md
Lines changed: 6 additions & 5 deletions

@@ -24,11 +24,12 @@
 The following are the corresponding `mindaudio` versions and the supported `mindspore` versions.

-| `mindspore` | `mindaudio` |
-|--------------|-------------|
-| `master` | `master` |
-| `2.3.0` | `0.4` |
-| `2.2.10` | `0.3` |
+| mindaudio | mindspore |
+|:---------:|:-----------:|
+| main | master |
+| 0.4 | 2.3.0/2.3.1 |
+| 0.3 | 2.2.10 |

 ### data processing
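To sanity-check a local environment against the compatibility table above, one can print the installed versions side by side. A minimal sketch, assuming `mindaudio` exposes `__version__` the way `mindspore` does:

```python
# Compare installed versions against the compatibility table.
# Assumption: mindaudio exposes __version__ like mindspore does.
import mindspore
import mindaudio

print("mindspore:", mindspore.__version__)
print("mindaudio:", mindaudio.__version__)
```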
README_CN.md
Lines changed: 5 additions & 5 deletions

@@ -22,11 +22,11 @@
 The table below shows the corresponding `mindaudio` versions and the supported `mindspore` versions.

-| `mindspore` | `mindaudio` |
-|--------------|-------------|
-| `master` | `master` |
-| `2.3.0` | `0.4` |
-| `2.2.10` | `0.3` |
+| mindaudio | mindspore |
+|:---------:|:-----------:|
+| main | master |
+| 0.4 | 2.3.0/2.3.1 |
+| 0.3 | 2.2.10 |

 ### Data Processing
examples/conformer/readme.md
Lines changed: 13 additions & 25 deletions

@@ -6,6 +6,11 @@
 Conformer is a model that combines transformers and CNNs to model both local and global dependencies in audio sequences. Models based on transformers and on convolutional neural networks (CNNs) have achieved good results in automatic speech recognition (ASR): transformers capture long-range dependencies and content-based global interactions, while CNNs exploit local features effectively. The convolution-augmented transformer, Conformer, was therefore proposed for speech recognition and outperforms both. The current version supports training/testing and inference with the Conformer model on the AISHELL-1 dataset on Ascend NPU and GPU.

+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

 ### Model Structure

 The overall structure of Conformer includes SpecAug, ConvolutionSubsampling, Linear, Dropout, and ConformerBlocks×N, as shown in the structure diagram below.
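For orientation, each ConformerBlock arranges its sublayers in a "macaron" pattern: a half-step feed-forward, self-attention, a convolution module, and a second half-step feed-forward, each wrapped in a residual connection. A toy NumPy sketch of that data flow (stand-in sublayers with random weights, not the repository's MindSpore implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8  # toy sequence length and model dimension

def ffn(x):
    # stand-in feed-forward sublayer (random weights, scaled down)
    w1 = rng.standard_normal((d, 4 * d)) * 0.1
    w2 = rng.standard_normal((4 * d, d)) * 0.1
    return np.maximum(x @ w1, 0.0) @ w2

def mhsa(x):
    # stand-in single-head self-attention over the T frames
    scores = x @ x.T / np.sqrt(d)
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return attn @ x

def conv_module(x):
    # stand-in depthwise temporal convolution (kernel size 3, averaging)
    pad = np.pad(x, ((1, 1), (0, 0)))
    return (pad[:-2] + pad[1:-1] + pad[2:]) / 3.0

def conformer_block(x):
    # "macaron" layout: half-step FFN, attention, conv, half-step FFN;
    # the real block also applies layer normalization around sublayers
    x = x + 0.5 * ffn(x)
    x = x + mhsa(x)
    x = x + conv_module(x)
    x = x + 0.5 * ffn(x)
    return x

x = rng.standard_normal((T, d))
print(conformer_block(x).shape)  # (6, 8): shape is preserved, so blocks stack
```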
@@ -18,6 +23,7 @@

+
 ## Usage Steps

 ### 1. Dataset Preparation

@@ -103,35 +109,17 @@
 # using ctc prefix beam search decoder
 python predict.py --config_path ./conformer.yaml --decode_mode ctc_prefix_beam_search

-# using attention decoder
-python predict.py --config_path ./conformer.yaml --decode_mode attention
-
 # using attention rescoring decoder
 python predict.py --config_path ./conformer.yaml --decode_mode attention_rescoring
 ```

-## Model Performance
-The training config can be found in the [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml).
-
-Performance tested on ascend 910 (8p) with graph mode:
-
-| model | decoding mode | CER |
-|-----------|------------------------|--------------|
-| conformer | ctc greedy search | 5.35 |
-| conformer | ctc prefix beam search | 5.36 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 4.95 |
-- [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-548ee31b.ckpt) can be downloaded here.
-
----
-Performance tested on ascend 910* (8p) with graph mode:
-
-| model | decoding mode | CER |
-|-----------|------------------------|--------------|
-| conformer | ctc greedy search | 5.62 |
-| conformer | ctc prefix beam search | 5.62 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 5.12 |
-- [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) can be downloaded here.
+## Performance
+
+Experiments were run on ascend 910* with mindspore 2.3.1 in graph mode:
+
+| model name | decoding method | cards | batch size | jit level | graph compile | ms/step | cer | recipe | weight |
+|:----------:|:----------------------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:----:|:------:|:------:|
+| conformer | ctc greedy search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | ctc prefix beam search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | attention rescoring | 8 | bucket | O0 | 103s | 727.5 | 5.12 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |

examples/conformer/readme_cn.md
Lines changed: 16 additions & 26 deletions

@@ -6,7 +6,12 @@
 Conformer combines a transformer with a CNN to model both local and global dependencies in audio sequences. Models based on transformers and on CNNs already perform well on ASR: transformers capture long-range dependencies and content-based global interactions, while CNNs exploit local features effectively, so a convolution-augmented transformer, called conformer, was proposed for speech recognition and outperforms both. The current version supports training/testing and inference with the [conformer](https://arxiv.org/pdf/2102.06657v1.pdf) model on the aishell-1 dataset on NPU and GPU.

-### Model Structure
+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
+
+## Model Structure

 The overall structure of Conformer includes SpecAug, ConvolutionSubsampling, Linear, Dropout, and ConformerBlocks×N; see the structure diagram below.

@@ -17,6 +22,7 @@
 ![image-20230310165349460](https://raw.githubusercontent.com/mindspore-lab/mindaudio/main/tests/result/conformer.png)

+
 ## Usage Steps

 ### 1. Dataset Preparation

@@ -102,32 +108,16 @@
 # using ctc prefix beam search decoder
 python predict.py --config_path ./conformer.yaml --decode_mode ctc_prefix_beam_search

-# using attention decoder
-python predict.py --config_path ./conformer.yaml --decode_mode attention
-
 # using attention rescoring decoder
 python predict.py --config_path ./conformer.yaml --decode_mode attention_rescoring
 ```

-## **Model Performance**
-The training config can be found in [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml)
-
-Performance tested on ascend 910 (8p) in graph mode:
-
-| model | decoding mode | CER |
-| --------- | ---------------------- |--------------|
-| conformer | ctc greedy search | 5.35 |
-| conformer | ctc prefix beam search | 5.36 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 4.95 |
-- Trained [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-548ee31b.ckpt) can be downloaded here.
----
-Performance tested on ascend 910* (8p) in graph mode:
-
-| model | decoding mode | CER |
-| --------- | ---------------------- |--------------|
-| conformer | ctc greedy search | 5.62 |
-| conformer | ctc prefix beam search | 5.62 |
-| conformer | attention decoder | comming soon |
-| conformer | attention rescoring | 5.12 |
-- Trained [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) can be downloaded here.
+## Model Performance
+
+Performance tested on ascend 910* with mindspore 2.3.1 in graph mode:
+
+| model name | decoding method | cards | batch size | jit level | graph compile | ms/step | cer | recipe | weight |
+|:----------:|:----------------------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:----:|:------:|:------:|
+| conformer | ctc greedy search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | ctc prefix beam search | 8 | bucket | O0 | 103s | 727.5 | 5.62 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |
+| conformer | attention rescoring | 8 | bucket | O0 | 103s | 727.5 | 5.12 | [conformer.yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/conformer/conformer.yaml) | [weights](https://download-mindspore.osinfra.cn/toolkits/mindaudio/conformer/conformer_avg_30-692d57b3-910v2.ckpt) |

examples/deepspeech2/readme.md
Lines changed: 17 additions & 7 deletions

@@ -3,9 +3,15 @@
 ## Introduction

-DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU and GPU.
+DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU.

-### Model Architecture
+
+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
+
+## Model Architecture

 The current reproduced model includes:

@@ -15,7 +21,7 @@
 - Five bidirectional LSTM layers (size 1024)
 - A projection layer [size equal to the number of characters plus 1 (for the CTC blank symbol), 28]

-### Data Processing
+## Data Processing

 - Audio:
   1. Feature extraction: log power spectrum.
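As a rough illustration of that first step, a log power spectrum can be computed by framing the waveform, windowing each frame, and taking log(|FFT|²). A hedged NumPy sketch; the frame and hop sizes are assumptions, not mindaudio's defaults:

```python
import numpy as np

def log_power_spectrum(wav, n_fft=512, hop=160):
    """Frame the signal, window it, and take log(|FFT|^2) per frame.

    A sketch of the feature only; mindaudio's actual pipeline
    (window choice, normalization, epsilon) may differ.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames, axis=-1)) ** 2
    return np.log(spectrum + 1e-10)

# toy 1-second signal at 16 kHz
feat = log_power_spectrum(np.random.randn(16000))
print(feat.shape)  # (97, 257): frames x frequency bins
```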
@@ -94,8 +100,12 @@
 python eval.py -c "./deepspeech2.yaml"
 ```

-## **Model Performance**
-
-| Model | Machine | LM | Test Clean CER | Test Clean WER | Parameters | Weights |
-|--------------|-----------|------|----------------|----------------|------------|---------|
-| DeepSpeech2 | D910x8-G | No | 3.461 | 10.24 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
+## Performance
+
+Experiments were run on ascend 910* with mindspore 2.3.1 in graph mode:
+
+| model name | cards | batch size | jit level | graph compile | ms/step | cer | wer | recipe | weight |
+|:-----------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:-----:|:-----:|:------:|:------:|
+| deepspeech2 | 8 | 64 | O0 | 404s | 9078 | 3.461 | 10.24 | [yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
+
+- cer and wer are measured on the Librispeech `test clean` dataset.
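Both error columns are edit-distance rates: the Levenshtein distance from hypothesis to reference (over characters for cer, over words for wer) divided by the reference length. A minimal sketch of the metric, not the project's scoring code:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (characters for cer, words for wer)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(
                d[j] + 1,          # delete a reference token
                d[j - 1] + 1,      # insert a hypothesis token
                prev + (r != h),   # substitute (free if tokens match)
            )
    return d[-1]

ref, hyp = "the cat sat".split(), "the cat sit".split()
print(edit_distance(ref, hyp) / len(ref))  # wer = 1/3
```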

examples/deepspeech2/readme_cn.md
Lines changed: 19 additions & 7 deletions

@@ -4,9 +4,15 @@
 ## Introduction

-DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a wide variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU and GPU.
+DeepSpeech2 is a speech recognition model trained using CTC loss. It replaces the entire manually designed component pipeline with neural networks and can handle a wide variety of speech, including noisy environments, accents, and different languages. The currently provided version supports using the [DeepSpeech2](http://arxiv.org/pdf/1512.02595v1.pdf) model for training/testing and inference on the librispeech dataset on NPU.

-### Model Architecture
+
+## Requirements
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:---------:|:-------------:|:-----------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
+
+## Model Architecture

 The currently reproduced model includes:

@@ -16,7 +22,8 @@
 - Five bidirectional LSTM layers (size 1024)
 - A projection layer [size equal to the number of characters plus 1 (for the CTC blank symbol), 28]

-### Data Processing
+
+## Data Processing

 - Audio:

@@ -28,6 +35,7 @@
 Text encoding uses labels to convert English letters; users may substitute a tokenization model of their own.

+## Usage Steps

 ### 1. Dataset Preparation
 If the dataset has not been downloaded yet, the provided script can download and prepare it in one step, as shown below:

@@ -102,8 +110,12 @@
 python eval.py -c "./deepspeech2.yaml"

-## **Performance**
-
-| model | LM | test clean cer | test clean wer | config | weights |
-| ----------- | ---- | -------------- | -------------- |--------|---------|
-| deepspeech2 | No | 3.461 | 10.24 | [yaml](https://github.com/mindsporelab/mindaudio/blob/main/example/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
+## Performance
+
+Performance tested on ascend 910* with mindspore 2.3.1 in graph mode:
+
+| model name | cards | batch size | jit level | graph compile | ms/step | cer | wer | recipe | weight |
+|:-----------:|:-----:|:----------:|:---------:|:-------------:|:-------:|:-----:|:-----:|:------:|:------:|
+| deepspeech2 | 8 | 64 | O0 | 404s | 9078 | 3.461 | 10.24 | [yaml](https://github.com/mindspore-lab/mindaudio/blob/main/examples/deepspeech2/deepspeech2.yaml) | [weights](https://download.mindspore.cn/toolkits/mindaudio/deepspeech2/deepspeech2.ckpt) |
+
+- cer and wer are measured on the Librispeech `test clean` dataset.
