This repository was archived by the owner on Aug 3, 2021. It is now read-only.

Description
Hi,
I am trying to train a speech-to-text model on my own dataset, using the checkpoint file of the Jasper DR 10x5 model as the starting point. The Jasper model reference is here: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#decoders-ref
I created my dataset and, using the config file from the link above, ran the training code with this command:
python run.py --mode=train --config_file=example_configs/speech2text/jasper10x5_LibriSpeech_nvgrad_masks.py --enable_logs --continue_learning
In the config file, I changed dataset_files in the training_params section to point to my CSV dataset, and set num_gpus=1, num_epochs=4, and batch_size_per_gpu=32.
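To make the changes concrete, here is a minimal sketch of the edited parts of the config. It assumes the section I mean is the `train_params` dictionary, that `dataset_files` sits under `data_layer_params`, and that the other three keys live in `base_params`; that placement is an assumption about the stock config, and the CSV path is a placeholder. Only the four values themselves come from my setup.

```python
# Hypothetical excerpt of the edited config file.
# Assumption: key placement follows the stock Jasper example config.
base_params = {
    "num_gpus": 1,             # single P100
    "num_epochs": 4,
    "batch_size_per_gpu": 32,
}

train_params = {
    "data_layer_params": {
        # Placeholder path, not the real location of my CSV file.
        "dataset_files": ["/path/to/my_dataset/train.csv"],
    },
}
```

Everything else in the config file was left unchanged.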
The code runs to completion without errors, but training stops after the first step. I cannot figure out what is triggering sess.should_stop() in the train function in open_seq2seq/utils/funcs.py.
As a result, training is incomplete. I have around 95K files and the batch size is 32, so it should run for roughly 2,900 steps per epoch.
Can you explain why the session stops after the first step?
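For reference, the step estimate can be checked with a short sketch. The helper `expected_steps` is hypothetical (it is not part of OpenSeq2Seq), 95,000 is a rounded file count, and the sketch assumes that an incomplete final batch is dropped, which may not match what the library actually does.

```python
def expected_steps(num_samples, batch_size_per_gpu, num_gpus=1, num_epochs=1):
    """Return (steps_per_epoch, total_steps) for a simple data-parallel setup.

    Assumption: the last incomplete batch of each epoch is dropped,
    so integer (floor) division is used.
    """
    global_batch = batch_size_per_gpu * num_gpus
    steps_per_epoch = num_samples // global_batch
    return steps_per_epoch, steps_per_epoch * num_epochs


# Approximate numbers from this report: ~95K files, batch size 32, 1 GPU, 4 epochs.
steps_per_epoch, total_steps = expected_steps(95000, 32, num_gpus=1, num_epochs=4)
print(steps_per_epoch, total_steps)
```

With these numbers the estimate is a little under 3,000 steps per epoch, so a run that ends after a single step is far short of what I expect.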
Configuration used:
Python version: 3.6.10
TensorFlow version: 1.14.0
OpenSeq2Seq commit ID: 61204b2
Model: Jasper DR 10x5
Model reference link: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#decoders-ref
GPU: 1 P100
CUDA version: V10.0.130