Transformers 3: Building and training a Transformer
Printworks, London

Having discussed how attention works and the structure of Transformers, we'll now implement a simple Transformer that translates German text into English. To do this we'll take the attention mechanism and other network components that Andrej Karpathy developed in nanoGPT for language generation, and reuse them to build a Transformer for language translation using the PyTorch framework. The translation Transformer's structure follows the example in François Chollet's book 'Deep Learning with Python', which is written in Keras/TensorFlow....
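As a taste of the kind of component we'll be reusing, here is a minimal sketch of single-head scaled dot-product self-attention in PyTorch. This is an illustrative simplification, not the actual nanoGPT code: nanoGPT's block is multi-headed, causally masked, and fused into one projection, whereas the class name `SelfAttention` and the separate query/key/value projections below are choices made here for clarity.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention.

    Illustrative sketch only; nanoGPT's real attention block is
    multi-headed and causally masked.
    """
    def __init__(self, embed_dim):
        super().__init__()
        # Separate linear projections for queries, keys, and values
        self.query = nn.Linear(embed_dim, embed_dim)
        self.key = nn.Linear(embed_dim, embed_dim)
        self.value = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        # x has shape (batch, seq_len, embed_dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        # Attention scores, scaled by sqrt of the key dimension
        scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
        # Each row of weights sums to 1 over the sequence positions
        weights = F.softmax(scores, dim=-1)
        # Weighted sum of values: same shape as the input
        return weights @ v

x = torch.randn(2, 5, 16)   # batch of 2 sequences, 5 tokens, 16-dim embeddings
attn = SelfAttention(16)
out = attn(x)
print(out.shape)            # torch.Size([2, 5, 16])
```

The output has the same shape as the input, which is what lets attention layers be stacked and mixed freely with the other Transformer components we'll build in the rest of this post.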