Self-supervised Learning
Self-supervised learning is a form of unsupervised learning. In the example below (masking the input), we split the input into two parts: one part is fed to the model as input, and the other part serves as the label.
Masking Input
- Randomly mask some tokens, replacing each masked position with a special [MASK] token or a random token.
- Minimize the cross-entropy between the model's predictions at the masked positions and the ground-truth tokens (sketched below).
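A minimal sketch of this objective in PyTorch; the masking ratio, mask-token id, and the encoder are illustrative placeholders rather than BERT's exact recipe:

```python
import torch
import torch.nn.functional as F

def mask_tokens(token_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """Corrupt random positions with [MASK] or a random token; return inputs and labels."""
    labels = token_ids.clone()
    corrupt = torch.rand(token_ids.shape) < mask_prob        # positions to corrupt
    labels[~corrupt] = -100                                   # ignore unmasked positions in the loss
    use_mask = corrupt & (torch.rand(token_ids.shape) < 0.8)  # most corrupted positions -> [MASK]
    use_rand = corrupt & ~use_mask                            # the rest -> a random token
    corrupted = token_ids.clone()
    corrupted[use_mask] = mask_token_id
    corrupted[use_rand] = torch.randint(vocab_size, (int(use_rand.sum()),))
    return corrupted, labels

# With logits = encoder(corrupted) of shape [batch, seq_len, vocab_size]:
# loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100)
```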
Next Sentence Prediction
SOP (Sentence Order Prediction): predict the order of sentence1 and sentence2. It is more useful than Next Sentence Prediction (NSP), perhaps because SOP is a harder task than NSP.
BERT can be a powerful baseline model.
## GLUE
The GLUE score is often used to evaluate the performance of BERT.
How to use BERT
Case 1
Case 2
Case 3
Case 4
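Each of these cases typically pairs BERT with a small task-specific head. As one concrete illustration (sentence classification), here is a minimal sketch assuming the Hugging Face transformers library; the checkpoint name and label are placeholders:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# One labeled example; fine-tuning updates both BERT and the randomly initialized head.
inputs = tokenizer("This movie is great!", return_tensors="pt")
labels = torch.tensor([1])                       # 1 = positive (illustrative label)
outputs = model(**inputs, labels=labels)
outputs.loss.backward()                          # cross-entropy on a classifier over the [CLS] representation
```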
Pre-training a seq2seq model
## Training BERT is challenging
The training data contains more than 3 billion words.
Why does BERT work
BERT can consider context: the embedding it produces for a token depends on the surrounding tokens, so the same word can get different representations in different sentences.
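A quick way to see this (a sketch assuming the Hugging Face transformers library; the sentences and index handling are illustrative): the same word receives different embeddings in different contexts.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    """Contextual embedding of the first occurrence of `word` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    idx = enc.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state        # [1, seq_len, hidden_size]
    return hidden[0, idx]

v_fruit = word_vector("I ate an apple for breakfast.", "apple")
v_brand = word_vector("Apple announced a new phone.", "apple")
print(torch.cosine_similarity(v_fruit, v_brand, dim=0))   # noticeably below 1: context changes the vector
```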
GPT Series
Predict the next token
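A minimal sketch of this objective, where the logits are assumed to come from any autoregressive model: shift the targets by one position and minimize cross-entropy.

```python
import torch.nn.functional as F

def next_token_loss(logits, token_ids):
    """logits: [batch, seq_len, vocab]; token_ids: [batch, seq_len]."""
    pred = logits[:, :-1, :]                 # prediction made at position t ...
    target = token_ids[:, 1:]                # ... is scored against the token at position t + 1
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))
```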
Problems and solutions of PLMs
Labeled Data Scarcity
Data-Efficient Fine-tuning
Prompt Tuning
How prompt tuning can be used at different levels of labeled data scarcity
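One way prompting helps when labels are scarce is to phrase classification as a cloze task for the masked LM. The sketch below shows this prompt-based (cloze-style) use of BERT; the template and verbalizer words are illustrative assumptions, not a fixed recipe.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = "The movie was boring and far too long."
prompt = text + " Overall, it was " + tokenizer.mask_token + "."     # hand-written template
enc = tokenizer(prompt, return_tensors="pt")
mask_pos = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    mask_logits = model(**enc).logits[0, mask_pos]        # vocabulary scores at the [MASK] slot

verbalizer = {"positive": "great", "negative": "terrible"}  # label -> word mapping
scores = {label: mask_logits[0, tokenizer.convert_tokens_to_ids(word)].item()
          for label, word in verbalizer.items()}
print(max(scores, key=scores.get))                          # predicted label without any training
```

With a few labeled examples, the same setup can also be fine-tuned instead of used zero-shot.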
### Semi-supervised Learning
We have some labeled training data and a large amount of unlabeled data, and we want to assign labels to the unlabeled data.
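One common way to do this is self-training: train on the labeled data, predict pseudo-labels for the unlabeled data, keep only confident predictions, and retrain. The threshold and the model interface below are illustrative.

```python
import torch

def pseudo_label(model, unlabeled_loader, threshold=0.9):
    """Return (input, predicted_label) pairs the model is confident about."""
    model.eval()
    selected = []
    with torch.no_grad():
        for batch in unlabeled_loader:                   # batches of unlabeled inputs
            probs = torch.softmax(model(batch), dim=-1)  # assumes the model returns class logits
            conf, pred = probs.max(dim=-1)
            for x, c, y in zip(batch, conf, pred):
                if c >= threshold:
                    selected.append((x, int(y)))         # the prediction becomes the label
    return selected

# Loop: train on labeled data -> pseudo-label -> add confident examples -> retrain.
```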
PLMs are too big
Reduce the number of parameters that are updated during fine-tuning.
### Parameter-Efficient Fine-tuning
- Use only a small number of task-specific parameters for each downstream task
### LoRA
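A minimal sketch of the LoRA formulation: the pretrained weight W stays frozen and only a low-rank update BA is trained, so each downstream task adds just the small A and B matrices. The class, rank, and scaling values here are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W x + (alpha / r) * B A x, with W frozen and only A, B trainable."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                 # freeze the pretrained layer
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)   # down-projection
        self.B = nn.Parameter(torch.zeros(base.out_features, r))         # zero init: starts as plain W
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```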
### Prefix Tuning
Early Exit
Self-supervised Learning for Speech and Images
Predictive Approach
Speech and images contain many details that are difficult to generate, but there is a way to learn without generation.
Contrastive Learning
Find positive samples and pull their representations as close together as possible, while pushing representations of negative samples as far apart as possible. However, since the task is self-supervised, we do not know the labels of the samples.
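A minimal sketch of a contrastive (InfoNCE-style) loss over a batch, where the two augmented views of the same example form the positive pair and every other example acts as a negative; the temperature and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.5):
    """z1, z2: [batch, dim] embeddings of two augmented views of the same examples."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2B, dim], unit-length vectors
    sim = z @ z.T / temperature                          # pairwise cosine similarities
    n = z.size(0)
    sim.fill_diagonal_(float("-inf"))                    # an example is not its own negative
    targets = torch.arange(n, device=z.device).roll(n // 2)  # the positive of i is its other view
    return F.cross_entropy(sim, targets)                 # pull positives together, push negatives apart
```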
### SimCLR
### MoCo
## Contrastive Learning for Speech
Bootstrapping