Transformers are the basis of GPT: Giving attention to transformers

NanoGPT

https://github.com/karpathy/nanoGPT

GPT-3

Specific changes from the original transformer architecture

Dropout is added for regularization; the layer norms are applied before the attention and MLP sub-layers (pre-LN) rather than after them (post-LN, as in the original paper); and the encoder is removed entirely, since GPT only generates text autoregressively from its input rather than translating between sequences as in the original encoder-decoder setup.
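These three changes can be sketched in a single decoder block. This is a minimal PyTorch illustration, not nanoGPT's actual code: the sizes (`d_model=64`, `n_head=4`) are arbitrary, and `nn.MultiheadAttention` stands in for the hand-rolled attention a real implementation might use. Note the layer norms sit before the attention and MLP (pre-LN), dropout regularizes the MLP output, and a causal mask replaces the encoder: each token attends only to earlier tokens.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-LN, decoder-only transformer block (illustrative sizes)."""
    def __init__(self, d_model=64, n_head=4, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)   # norm BEFORE attention (pre-LN)
        self.attn = nn.MultiheadAttention(d_model, n_head,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)   # norm BEFORE the MLP
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),           # dropout regularizes the MLP output
        )

    def forward(self, x):
        # Causal mask: True entries are blocked, so each position attends
        # only to itself and the past -- this is what makes the block a
        # decoder; there is no encoder output to cross-attend to.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                   # residual connection
        x = x + self.mlp(self.ln2(x))      # residual connection
        return x

x = torch.randn(2, 8, 64)                  # (batch, time, channels)
y = Block()(x)
print(y.shape)
```

The residual stream keeps the input shape throughout, so blocks like this can be stacked to any depth.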

Training

Regularization

Computation

Efficiency