In the paper "Attention Is All You Need", the queries, keys, and values are linearly transformed in multi-head attention:
```python
Q = tf.layers.dense(queries, d_model, use_bias=True) # (N, T_q, d_model)
K = tf.layers.dense(keys, d_model, use_bias=True) # (N, T_k, d_model)
V = tf.layers.dense(values, d_model, use_bias=True) # (N, T_k, d_model)
```
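For reference, the paper (Section 3.2.2) defines each head with a separate learned projection for all three inputs, including the values:

$$
\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q,\; K W_i^K,\; V W_i^V), \qquad
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O
$$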
What I want to know is: must the values be linearly transformed in multi-head attention, or can they be used as-is?
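For concreteness, the variant I'm asking about would look something like the sketch below. This is hypothetical, not from the repo, and it assumes `values` already has last dimension `d_model` so the shapes still line up:

```python
# Hypothetical variant: project queries and keys as before, but feed
# `values` into scaled dot-product attention without a linear layer.
Q = tf.layers.dense(queries, d_model, use_bias=True)  # (N, T_q, d_model)
K = tf.layers.dense(keys, d_model, use_bias=True)     # (N, T_k, d_model)
V = values                                            # (N, T_k, d_model), no projection
```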