https://github.com/fastai/fastbook/blob/master/17_foundations.ipynb
Davinci fine-tuning
From-scratch basic RNN
From-scratch transformer
Meta-learning prompt optimizer
ReLU: activation function, max(0, x). Per the universal approximation theorem, a network with a single hidden layer of non-linear (e.g. ReLU) units can approximate any continuous function, which can be demonstrated from scratch.
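A minimal from-scratch sketch (assuming PyTorch; not code from the notebook): one hidden layer of ReLU units, trained with plain SGD, learns to approximate sin(x), illustrating the universal approximation idea.

import torch

torch.manual_seed(0)
x = torch.linspace(-3, 3, 200).unsqueeze(1)      # (200, 1) inputs
y = torch.sin(x)                                 # target function to approximate

# parameters of a one-hidden-layer ReLU network
w1 = (torch.randn(1, 64) * 0.5).requires_grad_()
b1 = torch.zeros(64, requires_grad=True)
w2 = (torch.randn(64, 1) * 0.1).requires_grad_()
b2 = torch.zeros(1, requires_grad=True)
params = [w1, b1, w2, b2]

for step in range(2000):
    pred = torch.relu(x @ w1 + b1) @ w2 + b2     # hidden ReLU layer + linear output
    loss = ((pred - y) ** 2).mean()              # mean squared error
    loss.backward()
    with torch.no_grad():
        for p in params:
            p -= 0.05 * p.grad                   # plain SGD step
            p.grad.zero_()

print(loss.item())                               # loss should end up small: the ReLU net fits sin(x) closely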
Batch normalization: normalizes the inputs of a layer by subtracting the batch mean and dividing by the batch standard deviation, then applies a learnable scale and shift. Makes the learning process more stable.
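A minimal from-scratch sketch (assuming PyTorch, 2-D activations of shape (batch, features), and training mode only; a real layer also keeps running statistics for use at inference time):

import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    mean = x.mean(dim=0)                            # per-feature mean over the batch
    var = x.var(dim=0, unbiased=False)              # per-feature variance over the batch
    x_hat = (x - mean) / torch.sqrt(var + eps)      # normalize
    return gamma * x_hat + beta                     # learnable scale and shift

x = torch.randn(32, 10) * 5 + 3                     # activations with a large mean and std
out = batch_norm(x, torch.ones(10), torch.zeros(10))
print(out.mean(dim=0), out.std(dim=0))              # roughly zero mean, unit std per feature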
Dropout: randomly sets a certain fraction of a layer's activations to zero during training. This helps prevent overfitting by ensuring that the model does not rely too heavily on any particular set of neurons.
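A minimal from-scratch sketch of inverted dropout, the variant most frameworks use (assuming PyTorch): surviving activations are scaled by 1/(1-p) during training so no rescaling is needed at inference time.

import torch

def dropout(x, p=0.5, training=True):
    if not training or p == 0.:
        return x                                   # dropout is only applied during training
    mask = (torch.rand_like(x) > p).float()        # keep each activation with probability 1-p
    return x * mask / (1 - p)                      # scale survivors to preserve the expected value

x = torch.ones(4, 6)
print(dropout(x, p=0.5))                           # about half the entries zeroed, the rest become 2.0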
Batch dropout: applies dropout to the input of the network rather than to the outputs of individual neurons; the idea is to randomly drop whole samples from the input batch before they are fed into the network. Its effectiveness is debated, but it can also help prevent overfitting.
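"Batch dropout" is not a standardized layer, so this is only an illustrative sketch of the description above (assuming PyTorch): whole samples are randomly removed from the batch before the forward pass.

import torch

def drop_samples(xb, yb, p=0.2):
    keep = torch.rand(xb.shape[0]) > p             # keep each sample with probability 1-p
    return xb[keep], yb[keep]

xb, yb = torch.randn(32, 10), torch.randint(0, 2, (32,))
xb_small, yb_small = drop_samples(xb, yb)
print(xb_small.shape)                              # roughly 80% of the original batch remains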
Catastrophic forgetting: after being fine-tuned on a new task, a model forgets the information it learned from the original pre-training dataset, resulting in a significant drop in performance on the original pre-training task.
Concat pooling: concatenates the results of more than one pooling operation (typically average and max pooling) over the same activations, as used in the classification heads of fastai models. Pooling reduces the dimensionality of a layer's output; lower dimensionality means less computation, a lower risk of overfitting from an overly complex model (so it generalizes better), and easier interpretation (fewer dimensions).
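A minimal sketch of concat pooling over a sequence of hidden states, along the lines of fastai's text-classifier head (assuming PyTorch): the last time step, the mean over time, and the max over time are concatenated into one feature vector.

import torch

def concat_pool(hidden):                           # hidden: (batch, seq_len, features)
    last = hidden[:, -1]                           # last time step
    avg = hidden.mean(dim=1)                       # average pooling over time
    mx = hidden.max(dim=1).values                  # max pooling over time
    return torch.cat([last, avg, mx], dim=1)       # (batch, 3 * features)

h = torch.randn(8, 50, 128)
print(concat_pool(h).shape)                        # torch.Size([8, 384])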
Softmax activation: used in the last layer to turn the raw scores (logits) into a probability distribution over the target classes.
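A minimal sketch (assuming PyTorch): exponentiate the logits and normalize so the outputs are positive and sum to 1; subtracting the max first keeps it numerically stable.

import torch

def softmax(logits):
    z = logits - logits.max(dim=-1, keepdim=True).values   # shift for numerical stability
    e = torch.exp(z)
    return e / e.sum(dim=-1, keepdim=True)

print(softmax(torch.tensor([2.0, 1.0, 0.1])))               # probabilities that sum to 1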
Classification
This is the task of assigning a predefined category or label to a given text. Examples of text classification tasks include sentiment analysis, spam detection, and topic classification.
Add context by combining the different input fields.
# label each field so the model can tell them apart within a single input string
df['input'] = 'TEXT1: ' + df.context + '; TEXT2: ' + df.target + '; ANC1: ' + df.anchor
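A minimal sketch of what happens next (assumptions: the Hugging Face transformers library, a model checkpoint chosen only as an example, and toy values standing in for the real columns): the combined text is tokenized before being fed to a sequence-classification model.

import pandas as pd
from transformers import AutoTokenizer

# toy rows with the same columns as above, just to make the sketch runnable
df = pd.DataFrame({'context': ['some context'], 'target': ['some target'], 'anchor': ['some anchor']})
df['input'] = 'TEXT1: ' + df.context + '; TEXT2: ' + df.target + '; ANC1: ' + df.anchor

tok = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small')   # example checkpoint only
enc = tok(list(df['input']), truncation=True)                       # tokenize the combined text
print(enc['input_ids'][0][:10])                                     # token ids the classifier consumes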
Next Sequence Prediction