
Self-Supervised Learning

Self-supervised learning is a form of unsupervised learning. The unlabeled input is split into two parts: one part serves as the input to the model, while the other part carries the remaining information and acts as the label the model learns to predict.

Masking Input

  • Randomly mask some tokens, replacing each masked token either with a special mask token or with a random token.
  • Minimize the cross-entropy between the model's prediction at each masked position and the ground-truth token (see the sketch after this list).
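The two points above can be made concrete with a short sketch of the masking procedure. This is illustrative rather than BERT's exact recipe: the function name, the 15% masking probability, and the 80/20 split between the mask token and a random token are assumptions.

```python
import torch
import torch.nn.functional as F

def mask_tokens(input_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """Corrupt a batch of token ids for masked language modelling (illustrative)."""
    labels = input_ids.clone()
    masked = torch.rand(input_ids.shape) < mask_prob           # pick ~15% of positions
    labels[~masked] = -100                                      # ignore unmasked positions in the loss
    use_mask = masked & (torch.rand(input_ids.shape) < 0.8)    # most become the mask token
    use_rand = masked & ~use_mask                               # the rest become a random token
    corrupted = input_ids.clone()
    corrupted[use_mask] = mask_token_id
    corrupted[use_rand] = torch.randint(vocab_size, input_ids.shape)[use_rand]
    return corrupted, labels

# Given model logits of shape (batch, seq_len, vocab_size), the training loss is the
# cross-entropy between predictions at the masked positions and the original tokens:
# loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100)
```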

Next Sentence Prediction

SOP (Sentence Order Prediction): predict the order of sentence 1 and sentence 2. It is more useful than Next Sentence Prediction (NSP), perhaps because SOP is a harder task than NSP.

BERT can serve as a powerful baseline model.

GLUE

The GLUE score is often used to evaluate the performance of BERT.

How to use BERT

Case 1

Case 2

Case 3

Case 4
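A common recipe across these downstream cases is to keep the pre-trained BERT encoder, add a small task-specific layer on top, and fine-tune the whole model on labeled data. Below is a minimal sketch for sentence classification; the Hugging Face transformers library, the model name, and the example sentences and labels are assumptions for illustration, not part of these notes.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Assumed setup for illustration: model name, example sentences, and labels.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

sentences = ["this movie is great", "this movie is terrible"]
labels = torch.tensor([1, 0])

inputs = tokenizer(sentences, padding=True, return_tensors="pt")
outputs = model(**inputs, labels=labels)        # classification head on top of the [CLS] token

# Fine-tuning: back-propagate the loss through both the new head and BERT itself.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs.loss.backward()
optimizer.step()
```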

Pre-training a seq2seq model

Training BERT is challenging

The training data contains more than 3 billion words.

Why does BERT work

BERT can consider context: the embedding it produces for a token depends on the surrounding tokens, so the same word can receive different representations in different sentences.
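One way to see this is to compare the embeddings BERT assigns to the same word in different contexts. The sketch below again assumes the Hugging Face transformers library; the model name and the example sentences are illustrative.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embedding_of(sentence, word):
    """Return the contextual embedding of `word` in `sentence` (last hidden layer)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]           # (seq_len, hidden_size)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

# "bank" gets different embeddings depending on its context.
river = embedding_of("he sat by the bank of the river", "bank")
money = embedding_of("she deposited money at the bank", "bank")
print(torch.cosine_similarity(river, money, dim=0))
```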