Self-Supervised Learning

Self-supervised learning is a form of unsupervised learning. In the example below, we split the input x into two parts, x' and x'': x' serves as the input to the model, while x'' contains the remaining information and provides the labels the model learns to predict.

Masking Input

  • Mask some tokens at random, replacing each masked token with either a special [MASK] token or a random token.
  • Minimize the cross entropy between the model's prediction at each masked position and the ground-truth token.
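A minimal PyTorch sketch of this masking objective; random "token ids" and a tiny Transformer encoder stand in for real text and BERT, and all sizes are illustrative:

```python
import torch
import torch.nn as nn

vocab_size, mask_id, seq_len, d_model = 100, 0, 16, 32

tokens = torch.randint(1, vocab_size, (4, seq_len))      # ground-truth tokens (id 0 reserved for [MASK])
mask = torch.rand(tokens.shape) < 0.15                   # randomly pick ~15% of positions
corrupted = tokens.masked_fill(mask, mask_id)            # replace them with the [MASK] id

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
to_vocab = nn.Linear(d_model, vocab_size)

logits = to_vocab(encoder(embed(corrupted)))             # (batch, seq_len, vocab_size)

# Cross entropy only at the masked positions: other positions are ignored via -100.
labels = tokens.masked_fill(~mask, -100)
loss = nn.functional.cross_entropy(
    logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100
)
loss.backward()
```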

Next Sentence Prediction

SOP (Sentence Order Prediction): predict the order of sentence 1 and sentence 2. It is more useful than Next Sentence Prediction (NSP), perhaps because SOP is a harder task than NSP.

BERT can be a powerful baseline model.

## GLUE

The GLUE score is often used to evaluate the performance of BERT.

How to use BERT

Case 1

Case 2

Case 3

Case 4
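As an illustration of one way to use BERT downstream, here is a minimal fine-tuning sketch with the HuggingFace transformers API, assuming a sentence-classification setup; the texts, labels, and hyperparameters are made up for the example:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical two-class sentiment data, just to show the shape of the loop.
texts = ["this movie is great", "this movie is terrible"]
labels = torch.tensor([1, 0])

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tok(texts, padding=True, return_tensors="pt")
outputs = model(**batch, labels=labels)   # classification head is randomly initialized
outputs.loss.backward()                   # the pre-trained encoder is fine-tuned end to end
optimizer.step()
```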

Pre-training a seq2seq model

## Training BERT is challenging

The training data contains more than 3 billion words.

Why does BERT work

BERT can consider context: the embedding it produces for a token depends on the surrounding tokens, so the same word can get different representations in different sentences.
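A small sketch that illustrates this with contextual embeddings, assuming the HuggingFace bert-base-uncased checkpoint; the example sentences are made up:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vec(sentence, word):
    """Return the contextual embedding of `word` inside `sentence`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]            # (seq_len, 768)
    idx = enc["input_ids"][0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

v_fruit1 = word_vec("i ate an apple for breakfast", "apple")
v_fruit2 = word_vec("she bought a fresh apple at the market", "apple")
v_brand = word_vec("apple released a new phone today", "apple")

cos = torch.nn.functional.cosine_similarity
print("fruit vs fruit:", cos(v_fruit1, v_fruit2, dim=0).item())   # typically higher
print("fruit vs brand:", cos(v_fruit1, v_brand, dim=0).item())    # typically lower
```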

GPT Series

Predict the next token
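A toy PyTorch sketch of the next-token (autoregressive) objective: a tiny Transformer with a causal mask stands in for GPT, and random token ids stand in for real text:

```python
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 100, 32, 16
tokens = torch.randint(0, vocab_size, (4, seq_len))

embed = nn.Embedding(vocab_size, d_model)
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
to_vocab = nn.Linear(d_model, vocab_size)

# Causal mask: position t may only attend to positions <= t.
causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
logits = to_vocab(backbone(embed(tokens), mask=causal))

# Shift by one: the prediction at position t is scored against the token at t+1.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
)
loss.backward()
```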

Problems and solutions of PLMs

Labeled Data Scarcity

Data-Efficient Fine-tuning

Prompt Tuning
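One common form of prompt tuning learns a small set of continuous ("soft") prompt embeddings while the pre-trained model stays frozen. A toy sketch of that idea, with a stand-in encoder and illustrative names and sizes rather than the lecture's exact setup:

```python
import torch
import torch.nn as nn

vocab_size, d_model, n_prompt, n_classes, batch = 100, 32, 5, 2, 4

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
head = nn.Linear(d_model, n_classes)
for p in list(embed.parameters()) + list(encoder.parameters()):
    p.requires_grad_(False)                          # the "pre-trained" weights stay frozen

prompt = nn.Parameter(torch.randn(n_prompt, d_model) * 0.02)   # trainable soft prompt

tokens = torch.randint(0, vocab_size, (batch, 16))
labels = torch.randint(0, n_classes, (batch,))

x = torch.cat([prompt.expand(batch, -1, -1), embed(tokens)], dim=1)
logits = head(encoder(x)[:, 0])                      # classify from the first prompt position
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                                      # gradients reach only the prompt and the head
```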

How prompt tuning can be used at different levels of labeled data scarcity

### Semi-supervised Learning

We have a small amount of labeled training data and a large amount of unlabeled data, and we want to use a model trained on the labeled data to assign (pseudo-)labels to the unlabeled data.
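A minimal self-training (pseudo-labeling) sketch on synthetic data; a simple scikit-learn classifier stands in for the fine-tuned PLM, and the confidence threshold is an illustrative choice:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: a small labeled set and a large unlabeled pool.
X_labeled = rng.normal(size=(20, 5))
y_labeled = (X_labeled[:, 0] > 0).astype(int)
X_unlabeled = rng.normal(size=(1000, 5))

# 1. Train on the labeled data only.
clf = LogisticRegression().fit(X_labeled, y_labeled)

# 2. Predict on the unlabeled pool; keep only confident predictions as pseudo-labels.
proba = clf.predict_proba(X_unlabeled)
confident = proba.max(axis=1) > 0.9
X_pseudo, y_pseudo = X_unlabeled[confident], proba[confident].argmax(axis=1)

# 3. Retrain on labeled + pseudo-labeled data (one round of self-training).
clf = LogisticRegression().fit(
    np.concatenate([X_labeled, X_pseudo]), np.concatenate([y_labeled, y_pseudo])
)
```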

PLMs are too big

Reduce the number of trainable parameters during fine-tuning.

### Parameter-Efficient Fine-tuning

  • Use a small number of parameters for each downstream task

### LoRA
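A minimal sketch of the LoRA idea in PyTorch: the pre-trained weight stays frozen, and only a low-rank update B·A is trained per task; the rank and scaling values here are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                               # original weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: start from the base model
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params per task: {trainable} / {total}")        # only A and B are updated
```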

### Prefix Tuning

Early Exit

Self-supervised Learning for speech and images

Predictive Approach

Speech and images contain many details that are difficult to generate, but there is a way to learn without generation.

Contrastive Learning

Find positive sample pairs and make their representations as close as possible, while pushing negative samples as far apart as possible.
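A simplified sketch of a batch-wise contrastive (NT-Xent-style) loss, the form used by SimCLR-like methods; here z1[i] and z2[i] are assumed to be embeddings of a positive pair, and every other sample in the batch acts as a negative:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """Pull each positive pair together and push all other batch entries apart."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                                # (2n, d)
    sim = (z @ z.T) / temperature                                 # pairwise cosine similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # drop self-similarity
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])   # row i's positive is its paired view
    return F.cross_entropy(sim, targets)

# Toy usage: z1 and z2 would be encoder outputs for the two views of the same batch.
z1 = torch.randn(8, 128, requires_grad=True)
z2 = torch.randn(8, 128, requires_grad=True)
loss = nt_xent(z1, z2)
loss.backward()
```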

However, since the task is self-supervised, we do not know the labels of the samples, so positive pairs have to be constructed without labels (for example, from two augmented views of the same sample).

### SimCLR

## MoCo

## Contrastive Learning for Speech

Bootstrapping