Model training stops after 1 step for Speech-to-Text Jasper model #539

Hi,
I am trying to train a speech-to-text model on my own dataset, using the checkpoint of the Jasper DR 10x5 model as the starting point. The Jasper model reference is here: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#decoders-ref
I created my dataset and, using the config file from the link above, ran training with the following command:
python run.py --mode=train --config_file=example_configs/speech2text/jasper10x5_LibriSpeech_nvgrad_masks.py --enable_logs --continue_learning
In the config file, I changed dataset_files in the training parameters section to point to my CSV dataset, and set num_gpus=1, num_epochs=4 and batch_size_per_gpu=32.
The code runs to completion without errors, but training stops after the first step. I cannot figure out what triggers sess.should_stop() in the train function in open_seq2seq/utils/funcs.py.
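For reference, here is an abridged sketch of the config edits described above. It assumes the stock OpenSeq2Seq config layout (a base_params dict and a train_params dict with nested data_layer_params); only the changed keys are shown, and "my_train.csv" is a hypothetical placeholder for my actual CSV path.

```python
# Abridged sketch of the edits to
# example_configs/speech2text/jasper10x5_LibriSpeech_nvgrad_masks.py.
# Only changed keys are shown; all other keys keep their stock values.
base_params = {
    "num_gpus": 1,             # single P100
    "num_epochs": 4,           # short fine-tuning run
    "batch_size_per_gpu": 32,
    # ... remaining keys unchanged from the stock config ...
}

train_params = {
    "data_layer_params": {
        # HYPOTHETICAL placeholder for the real path to my training CSV
        "dataset_files": ["my_train.csv"],
        # ... remaining data layer keys unchanged ...
    },
}
```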

As a result, training never completes. I have around 95K files and batch_size_per_gpu is set to 32, so training should run for roughly 2,900 steps per epoch.
Can you explain why the session stops after the first step?
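To make the expected numbers concrete, here is a minimal sketch of the step arithmetic. It assumes one step consumes batch_size_per_gpu * num_gpus samples and that the last partial batch is dropped. It also illustrates one hypothesis I would like to rule out: if --continue_learning restores the checkpoint's global step and the loop stops at a fixed last step (as tf.train.StopAtStepHook(last_step=...) does), then a restored step already at or past num_epochs * steps_per_epoch would make should_stop() return True after roughly one step. The restored step value below is hypothetical, not read from the Jasper checkpoint.

```python
def expected_steps(num_files, batch_size_per_gpu, num_gpus, num_epochs):
    """Return (steps_per_epoch, last_step) for a step-based training loop.

    Assumes each step consumes batch_size_per_gpu * num_gpus samples and
    the final partial batch of each epoch is dropped.
    """
    global_batch = batch_size_per_gpu * num_gpus
    steps_per_epoch = num_files // global_batch
    return steps_per_epoch, steps_per_epoch * num_epochs


def stops_immediately(restored_global_step, last_step):
    """A stop-at-step condition ends training once global_step >= last_step."""
    return restored_global_step >= last_step


# Values from this issue: ~95K files, 32 per GPU, 1 GPU, 4 epochs.
steps_per_epoch, last_step = expected_steps(95_000, 32, 1, 4)
print(steps_per_epoch, last_step)  # -> 2968 11872

# HYPOTHETICAL restored global step, for illustration only.
print(stops_immediately(20_000, last_step))  # -> True
```

If this hypothesis holds, the stop comes from the step budget rather than from the data, which would also explain why no error is reported.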

Configuration used:
Python version: 3.6.10
TensorFlow version: 1.14.0
OpenSeq2Seq commit ID: 61204b212cfe5c9ceda2be816b9052e9caf021a9
Model: Jasper DR 10x5
Model reference link: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#decoders-ref
GPU: 1x P100
CUDA version: V10.0.130

