# Self-supervised Learning
Self-supervised learning is a form of unsupervised learning: the supervision signal is constructed from the data itself. In the example below, we split the input into two parts, feed one part to the model, and use the other part as the label the model must predict.
## Masking Input
- Randomly mask some input tokens, replacing each masked token either with a special [MASK] token or with a random token.
- Minimize the cross-entropy between the model's predictions at the masked positions and the ground-truth tokens, as in the sketch below.
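A minimal PyTorch sketch of this recipe. The `mask_tokens` helper, the 15% masking rate, and the 90/10 split between [MASK] and random replacements are illustrative assumptions, not an exact reproduction of BERT's published procedure:

```python
import torch
import torch.nn.functional as F

def mask_tokens(input_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """Pick random positions; replace them with [MASK] or a random token."""
    labels = input_ids.clone()
    picked = torch.rand(input_ids.shape) < mask_prob        # positions to predict
    labels[~picked] = -100                                   # ignore the rest in the loss
    corrupted = input_ids.clone()
    use_mask = picked & (torch.rand(input_ids.shape) < 0.9)
    corrupted[use_mask] = mask_token_id                      # special [MASK] token
    use_rand = picked & ~use_mask
    rand_ids = torch.randint(vocab_size, input_ids.shape)
    corrupted[use_rand] = rand_ids[use_rand]                 # random token
    return corrupted, labels

input_ids = torch.randint(5, 30000, (2, 16))                 # toy batch of token ids
corrupted, labels = mask_tokens(input_ids, mask_token_id=103, vocab_size=30000)
logits = torch.randn(2, 16, 30000)                           # stand-in for the model's output
loss = F.cross_entropy(logits.view(-1, 30000), labels.view(-1), ignore_index=-100)
```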
## Next Sentence Prediction
NSP (Next Sentence Prediction): predict whether sentence2 actually follows sentence1 in the original text. SOP (Sentence Order Prediction, used in ALBERT): take two consecutive sentences and predict their original order. SOP appears more useful than NSP, perhaps because SOP is a harder task.
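A small sketch of how training pairs might be built for the two objectives. The helper names and the 50/50 split are illustrative assumptions:

```python
import random

def make_nsp_example(sent_a, sent_b, random_sent):
    """NSP: the second segment is the true next sentence or a random one."""
    if random.random() < 0.5:
        return (sent_a, sent_b), 1       # true next sentence
    return (sent_a, random_sent), 0      # random sentence from the corpus

def make_sop_example(sent_a, sent_b):
    """SOP: both sentences are consecutive; predict whether they were swapped."""
    if random.random() < 0.5:
        return (sent_a, sent_b), 1       # original order
    return (sent_b, sent_a), 0           # swapped order
```

In both cases the pair is packed as `[CLS] sentence1 [SEP] sentence2 [SEP]` and the label is predicted from the [CLS] embedding.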
BERT can serve as a powerful baseline model.
## GLUE
The GLUE (General Language Understanding Evaluation) score, averaged over its nine tasks, is often used to evaluate the performance of BERT.
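For reference, the GLUE tasks can be loaded through the Hugging Face `datasets` library (a sketch; `sst2` is just one of the nine tasks):

```python
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")      # one of the nine GLUE tasks
print(sst2["train"][0])                  # a sentence with its binary sentiment label
```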
## How to use BERT
- Case 1: input a sequence, output a class (e.g., sentiment analysis). Prepend a [CLS] token and classify from its output embedding (see the sketch after this list).
- Case 2: input a sequence, output a sequence of the same length (e.g., POS tagging). A classifier on each token's output embedding predicts that token's label.
- Case 3: input two sequences, output a class (e.g., Natural Language Inference). Pack them as [CLS] premise [SEP] hypothesis [SEP] and classify from the [CLS] embedding.
- Case 4: extraction-based question answering. Pack [CLS] question [SEP] document [SEP] and predict two positions, the start and the end of the answer span in the document.
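A minimal sketch of Case 1 with the Hugging Face `transformers` library. The checkpoint name and the 2-way head are assumptions; in practice the head is fine-tuned together with BERT on labeled data:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(bert.config.hidden_size, 2)        # randomly initialized head

inputs = tokenizer("this movie is great", return_tensors="pt")  # adds [CLS] and [SEP]
hidden = bert(**inputs).last_hidden_state                       # (1, seq_len, hidden_size)
logits = classifier(hidden[:, 0])                               # classify from the [CLS] embedding
```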
## Pre-training a seq2seq model
Corrupt the input to the encoder (e.g., mask, delete, or reorder tokens) and train the decoder to reconstruct the original sequence.
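A toy corruption function; the deletion/masking probabilities are arbitrary illustrative choices, echoing corruption schemes like those in MASS and BART:

```python
import random

def corrupt(tokens, mask_token="[MASK]", p=0.15):
    """Randomly delete or mask tokens; the decoder's target is the original sequence."""
    out = []
    for tok in tokens:
        r = random.random()
        if r < p / 2:
            continue                    # delete the token
        if r < p:
            out.append(mask_token)      # mask the token
        else:
            out.append(tok)
    return out

src = corrupt("the cat sat on the mat".split())   # corrupted encoder input
tgt = "the cat sat on the mat".split()            # decoder target: the original sentence
```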
## Training BERT is challenging
The training corpus contains more than 3 billion words.
## Why does BERT work?
BERT produces contextualized embeddings: a token's representation depends on its surrounding context, so the same word can get different embeddings in different sentences.
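A quick way to see this, sketched with the Hugging Face `transformers` library; the sentences and the cosine-similarity check are illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def word_embedding(sentence, word):
    inputs = tokenizer(sentence, return_tensors="pt")
    hidden = bert(**inputs).last_hidden_state[0]            # (seq_len, hidden_size)
    idx = inputs["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

a = word_embedding("he sat on the river bank", "bank")
b = word_embedding("she deposited cash at the bank", "bank")
print(torch.cosine_similarity(a, b, dim=0).item())          # typically well below 1.0
```

The same word "bank" gets noticeably different embeddings because its context differs.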