This repository was archived by the owner on Aug 3, 2021. It is now read-only.

Model training stops after 1 step for Speech to Text Jasper model  #539

@conqueror7

Hi,
I am trying to train a speech-to-text model on my own dataset, using the checkpoint of the Jasper DR 10x5 model as the starting point. The Jasper model reference is here: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#decoders-ref
I created my dataset and, using the config file from the link above, ran training with this command:
python run.py --mode=train --config_file=example_configs/speech2text/jasper10x5_LibriSpeech_nvgrad_masks.py --enable_logs --continue_learning
In the config file I changed the training_params section: dataset_files now points to my CSV dataset, and I set num_gpus=1, num_epochs=4, batch_size_per_gpu=32.
The script runs to completion, but training stops after the first step. I cannot figure out what is triggering sess.should_stop() in the train function in open_seq2seq/utils/funcs.py.

As a result, training is incomplete. I have around 95K files and batch_size_per_gpu is set to 32, so training should run for roughly 2,900 steps per epoch.
Can you explain why the session stops after the first step?
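
The expected step count above can be checked with a rough sketch of the arithmetic. The helper name steps_per_epoch and the drop_remainder flag are hypothetical and not part of OpenSeq2Seq; 95,000 stands in for the approximate "95K" file count, and whether the last partial batch is dropped is an assumption.

```python
import math


def steps_per_epoch(num_samples, batch_size_per_gpu, num_gpus=1, drop_remainder=True):
    """Estimate training steps per epoch (hypothetical helper, not an OpenSeq2Seq API)."""
    global_batch = batch_size_per_gpu * num_gpus
    if drop_remainder:
        # Assumes the final partial batch is discarded.
        return num_samples // global_batch
    # Assumes the final partial batch still counts as one step.
    return math.ceil(num_samples / global_batch)


num_samples = 95_000  # approximate dataset size ("around 95K files")
per_epoch = steps_per_epoch(num_samples, batch_size_per_gpu=32, num_gpus=1)
total = per_epoch * 4  # num_epochs=4 from the modified config

print("steps per epoch:", per_epoch)  # 2968
print("total steps:", total)          # 11872
```

With these numbers, one epoch is just under 3,000 steps and the full 4-epoch run is just under 12,000, so a run that ends after a single step is far short of either figure.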

Configuration used:
Python version: 3.6.10
TensorFlow version: 1.14.0
OpenSeq2Seq commit ID: 61204b2
Model: Jasper DR 10x5
Model reference link: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#decoders-ref
GPU: 1 P100
CUDA version: V10.0.130
