Does the selfAttentionLayer also perform softmax and scaling?
Chih
on 3 Apr 2023
Edited: cui,xingxing
on 27 Apr 2024
In https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.selfattentionlayer.html, it states that:
A self-attention layer computes single-head or multihead self-attention of its input.
The layer:
- Computes the queries, keys, and values from the input
- Computes the scaled dot-product attention across heads using the queries, keys, and values
- Merges the results from the heads
- Performs a linear transformation on the merged result
I wonder whether the layer also applies the scaling (i.e., divides (Q*K) by sqrt(dim)) and the softmax. My understanding is that both should happen within step 2.
Please clarify this for me and for other users.
Thanks.
Accepted Answer
Rohit
on 20 Apr 2023
I understand that you want to know whether ‘selfAttentionLayer’ performs the softmax and scaling operations involved in computing the attention scores.
Yes, we perform both operations: the attention scores are scaled, and softmax is then applied, as the attention mechanism requires.
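To make the two operations concrete, here is a minimal single-head sketch of scaled dot-product attention, softmax(Q*K'/sqrt(d_k))*V, written in plain Python. It is only an illustration of the standard formula and is not the selfAttentionLayer implementation; the function names `softmax` and `scaled_dot_product_attention` and the toy inputs are assumptions made for this example.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of rows, one row per sequence position.
    Returns the attended values and the attention weights.
    """
    d_k = len(K[0])                      # key dimension used for scaling
    scale = 1.0 / math.sqrt(d_k)
    weights = []
    for q in Q:
        # Scaling step: dot product of the query with each key, divided by sqrt(d_k)
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        # Softmax step: turn the scaled scores into weights that sum to 1
        weights.append(softmax(scores))
    # Weighted sum of the value rows for each query position
    output = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights

# Toy inputs (hypothetical): two positions, dimension 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, attn = scaled_dot_product_attention(Q, K, V)
```

Each row of `attn` sums to 1 because of the softmax, and the division by sqrt(d_k) keeps the scores from growing with the key dimension. A multihead layer would repeat this per head and then merge the results, as described in the documentation steps quoted in the question.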
More Answers (1)
cui,xingxing
on 11 Jan 2024
Edited: cui,xingxing
on 27 Apr 2024
Hi, @Chih
-------------------------Off-topic interlude, 2024-------------------------------
I am currently looking for a job in the field of CV algorithm development, based in Shenzhen, Guangdong, China, or a remote support position. I would be very grateful if anyone is willing to offer me a job or make a recommendation. My preliminary resume can be found at: https://cuixing158.github.io/about/ . Thank you!
Email: cuixingxing150@gmail.com