A Gentle Introduction to Attention Masking in Transformer Models
Attention mechanisms in transformer models must often be prevented from attending to certain positions, such as future tokens during autoregressive decoding or padding tokens in batched inputs. This post explores how attention masking enforces these constraints and their…
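As a minimal sketch of the idea (using NumPy rather than any particular framework, and with all names chosen for illustration), a causal mask can be applied by setting disallowed score positions to negative infinity before the softmax, so they receive zero attention weight:

```python
import numpy as np

def causal_attention_weights(scores):
    """Apply a causal mask to raw attention scores, then softmax.

    scores: (seq_len, seq_len) array of query-key dot products,
    where row i holds query i's scores against every key position.
    Position i may only attend to positions j <= i.
    """
    seq_len = scores.shape[0]
    # Strictly upper-triangular entries are "future" positions; mask them.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over keys; -inf entries become weight 0.
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# With uniform scores, row i spreads weight evenly over positions 0..i
# and gives exactly zero weight to every future position.
weights = causal_attention_weights(np.zeros((4, 4)))
```

The same pattern extends to padding masks: instead of a triangular mask, mark the padded key positions as disallowed before the softmax.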
