https://github.com/fastai/fastbook/blob/master/17_foundations.ipynb
Davinci fine-tuning
From-scratch basic RNN
From-scratch transformer
Meta-learning prompt optimizer
ReLU: activation function, max(0, x). Per the universal approximation theorem, a network with a single hidden layer of non-linear (e.g. ReLU) units can approximate any continuous function, which can be demonstrated from scratch.
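A minimal from-scratch sketch (assuming PyTorch; not code from the notebook): one hidden layer of ReLU units, trained with plain SGD, learns to approximate sin(x), illustrating the universal approximation idea.

import torch

torch.manual_seed(0)
x = torch.linspace(-3, 3, 200).unsqueeze(1)      # (200, 1) inputs
y = torch.sin(x)                                 # target function to approximate

# parameters of a one-hidden-layer ReLU network
w1 = (torch.randn(1, 64) * 0.5).requires_grad_()
b1 = torch.zeros(64, requires_grad=True)
w2 = (torch.randn(64, 1) * 0.1).requires_grad_()
b2 = torch.zeros(1, requires_grad=True)
params = [w1, b1, w2, b2]

for step in range(2000):
    pred = torch.relu(x @ w1 + b1) @ w2 + b2     # hidden ReLU layer + linear output
    loss = ((pred - y) ** 2).mean()              # mean squared error
    loss.backward()
    with torch.no_grad():
        for p in params:
            p -= 0.05 * p.grad                   # plain SGD step
            p.grad.zero_()

print(loss.item())                               # loss should end up small: the ReLU net fits sin(x) closely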
Batch normalization: normalizes the inputs of a layer by subtracting the batch mean and dividing by the batch standard deviation, then applies a learnable scale and shift. Makes the learning process more stable.
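A minimal from-scratch sketch (assuming PyTorch, 2-D activations of shape (batch, features), and training mode only; a real layer also keeps running statistics for use at inference time):

import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(dim=0)                            # per-feature mean over the batch
    var = x.var(dim=0, unbiased=False)              # per-feature variance over the batch
    x_hat = (x - mean) / torch.sqrt(var + eps)      # normalize
    return gamma * x_hat + beta                     # learnable scale and shift

x = torch.randn(32, 10) * 5 + 3                     # activations with a large mean and std
out = batch_norm(x, torch.ones(10), torch.zeros(10))
print(out.mean(dim=0), out.std(dim=0))              # roughly zero mean, unit std per feature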
Dropout: randomly sets a certain fraction of a layer's activations to zero during training. This helps prevent overfitting by ensuring that the model does not rely too heavily on any particular set of neurons.
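A minimal from-scratch sketch of inverted dropout, the variant most frameworks use (assuming PyTorch): surviving activations are scaled by 1/(1-p) during training so no rescaling is needed at inference time.

import torch

def dropout(x, p=0.5, training=True):
    if not training or p == 0.:
        return x                                   # dropout is only applied during training
    mask = (torch.rand_like(x) > p).float()        # keep each activation with probability 1-p
    return x * mask / (1 - p)                      # scale survivors to preserve the expected value

x = torch.ones(4, 6)
print(dropout(x, p=0.5))                           # about half the entries zeroed, the rest become 2.0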
Batch dropout: applies dropout to the input of the network rather than to the outputs of individual neurons; the idea is to randomly drop whole samples from the input batch before they are fed into the network. Its effectiveness is debated, but it can also help prevent overfitting.
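"Batch dropout" is not a standardized layer, so this is only an illustrative sketch of the description above (assuming PyTorch): whole samples are randomly removed from the batch before the forward pass.

import torch

def drop_samples(xb, yb, p=0.2):
    keep = torch.rand(xb.shape[0]) > p             # keep each sample with probability 1-p
    return xb[keep], yb[keep]

xb, yb = torch.randn(32, 10), torch.randint(0, 2, (32,))
xb_small, yb_small = drop_samples(xb, yb)
print(xb_small.shape)                              # roughly 80% of the original batch remains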
Catastrophic forgetting: after being fine-tuned on a new task, a model forgets the information it learned from the original pre-training dataset, resulting in a significant drop in performance on the original pre-training task.
Concat pooling: concatenates the results of more than one pooling operation (typically average and max pooling) over the same activations, as used in the classification heads of fastai models. Pooling reduces the dimensionality of a layer's output; lower dimensionality means less computation, a lower risk of overfitting from an overly complex model (so it generalizes better), and easier interpretation (fewer dimensions).
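A minimal sketch of concat pooling over a sequence of hidden states, along the lines of fastai's text-classifier head (assuming PyTorch): the last time step, the mean over time, and the max over time are concatenated into one feature vector.

import torch

def concat_pool(hidden):                           # hidden: (batch, seq_len, features)
    last = hidden[:, -1]                           # last time step
    avg = hidden.mean(dim=1)                       # average pooling over time
    mx = hidden.max(dim=1).values                  # max pooling over time
    return torch.cat([last, avg, mx], dim=1)       # (batch, 3 * features)

h = torch.randn(8, 50, 128)
print(concat_pool(h).shape)                        # torch.Size([8, 384])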
Softmax activation: used in the last layer to turn the raw scores (logits) into a probability distribution over the target classes.
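A minimal sketch (assuming PyTorch): exponentiate the logits and normalize so the outputs are positive and sum to 1; subtracting the max first keeps it numerically stable.

import torch

def softmax(logits):
    z = logits - logits.max(dim=-1, keepdim=True).values   # shift for numerical stability
    e = torch.exp(z)
    return e / e.sum(dim=-1, keepdim=True)

print(softmax(torch.tensor([2.0, 1.0, 0.1])))               # probabilities that sum to 1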
Classification
This is the task of assigning a predefined category or label to a given text. Examples of text classification tasks include sentiment analysis, spam detection, and topic classification.
Add context by combining the different input fields.
# label each field so the model can tell them apart within a single input string
df['input'] = 'TEXT1: ' + df.context + '; TEXT2: ' + df.target + '; ANC1: ' + df.anchor
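A minimal sketch of what happens next (assumptions: the Hugging Face transformers library, a model checkpoint chosen only as an example, and toy values standing in for the real columns): the combined text is tokenized before being fed to a sequence-classification model.

import pandas as pd
from transformers import AutoTokenizer

# toy rows with the same columns as above, just to make the sketch runnable
df = pd.DataFrame({'context': ['some context'], 'target': ['some target'], 'anchor': ['some anchor']})
df['input'] = 'TEXT1: ' + df.context + '; TEXT2: ' + df.target + '; ANC1: ' + df.anchor

tok = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small')   # example checkpoint only
enc = tok(list(df['input']), truncation=True)                       # tokenize the combined text
print(enc['input_ids'][0][:10])                                     # token ids the classifier consumes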
Next Sequence Prediction