Feedback about NLP From Scratch: Translation with a Sequence to Sequence Network and Attention #3643

@pahujam

Description

There is the following issue on this page: https://docs.pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

Isn't this code a little problematic batch-wise? The GRU in the encoder returns the latest hidden state, which could be the hidden state of a PAD token.
Also, shouldn't the CE loss skip PAD tokens when it is computed?
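For context, a minimal sketch of how both concerns are typically handled in PyTorch: packing the padded batch with `pack_padded_sequence` makes the GRU stop at each sequence's last real token (so the returned hidden state is not a PAD state), and passing `ignore_index` to the loss excludes PAD positions from the average. The `PAD_idx` value, vocabulary size, and dimensions below are hypothetical, not taken from the tutorial:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

PAD_idx = 0  # hypothetical PAD token index

# Toy batch: 2 sequences padded to length 4; the second has 2 real tokens.
batch = torch.tensor([[5, 3, 7, 2],
                      [4, 6, PAD_idx, PAD_idx]])
lengths = torch.tensor([4, 2])

embedding = nn.Embedding(10, 8, padding_idx=PAD_idx)
gru = nn.GRU(8, 8, batch_first=True)

# Packing makes the GRU skip PAD steps, so `hidden` holds the state
# after each sequence's last *real* token, not after the PADs.
packed = pack_padded_sequence(embedding(batch), lengths,
                              batch_first=True, enforce_sorted=False)
_, hidden = gru(packed)  # hidden: (num_layers, batch, hidden_size)

# Masking the loss: ignore_index drops PAD targets from the average.
criterion = nn.NLLLoss(ignore_index=PAD_idx)
log_probs = torch.log_softmax(torch.randn(2, 4, 10), dim=-1)
loss = criterion(log_probs.reshape(-1, 10), batch.reshape(-1))
```

The tutorial's training loop computes the loss over every position of the padded target, which is where the concern about PAD tokens comes from; `ignore_index` is one standard way to mask them out.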
