From NLP to Computer Vision

1. Good References

  1. (Updating) Awesome BERT & Transfer Learning in NLP
    • BERT related papers
    • BERT related articles
      • Attention
      • Transformer
      • GPT (Generative Pre-trained Transformer)
      • CLIP (Contrastive Language-Image Pre-training)
      • DALL-E 2 (Text-to-Image Revolution)
  2. Transformer Attention Video
  3. (Old) Awesome BERT
  4. Transformer understanding by me (1): Attention (a minimal sketch follows this list)
  5. Transformer understanding by me (2): Transformer
  6. Awesome-CLIP: see the Applications section (Detection)
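
Items 4 and 5 above are notes on attention and the Transformer. Below is a minimal sketch of scaled dot-product attention, the core operation those notes cover; the function name, shapes, and toy tensors are illustrative assumptions, not taken from any of the linked write-ups.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)             # each row sums to 1
    return weights @ v                              # weighted sum of value vectors

# toy usage: batch of 2 sequences, length 5, model width 64
q = k = v = torch.randn(2, 5, 64)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 64])
```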

2. YouTube Contents

  1. Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
  2. BERT and GPT (explained in Korean)
  3. BERT

3. Must-Read Paper List

| Paper | Date | Institute, Citation | Note |
| --- | --- | --- | --- |
| Masked Autoencoders Are Scalable Vision Learners (MAE) | 21.11 | Kaiming He | |
| BEiT: BERT Pre-Training of Image Transformers | 21.06 | | Cited in MAE. |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 18.10 | | See the YouTube section above. |
| Improving Language Understanding by Generative Pre-Training | 18.06 | OpenAI | GPT-1. Cited in MAE. |
| Language Models are Unsupervised Multitask Learners | 19.02 | OpenAI | GPT-2. Cited in MAE. |
| Language Models are Few-Shot Learners | 20.05 | OpenAI | GPT-3. Cited in MAE. |
| Learning Transferable Visual Models From Natural Language Supervision | 21.01 | OpenAI | CLIP. See the loss sketch below. |
| Zero-Shot Text-to-Image Generation | 21.02 | OpenAI | DALL-E |
| Hierarchical Text-Conditional Image Generation with CLIP Latents | 22.04 | OpenAI | DALL-E 2. Leverages CLIP representations for text-conditional image generation. |
| Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance | 22.04 | Google Research | Survey on NLP |
| Vision Models Are More Robust and Fair When Pretrained on Uncurated Images Without Supervision | 22.02 | FAIR | SEER, scaled to 10B parameters |
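
For the CLIP row above, here is a minimal sketch of its symmetric contrastive (InfoNCE) objective, following the pseudocode in the paper. The batch size, the 512-d embedding width, and the fixed 0.07 temperature are illustrative assumptions (CLIP actually learns the temperature).

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss in the style of CLIP.

    image_emb, text_emb: (batch, dim) embeddings of paired images and captions.
    Matched pairs sit on the diagonal of the similarity matrix.
    """
    image_emb = F.normalize(image_emb, dim=-1)        # unit-norm embeddings
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(logits.size(0))            # i-th image matches i-th text
    loss_i = F.cross_entropy(logits, targets)         # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)     # text -> image direction
    return (loss_i + loss_t) / 2

# toy usage: a batch of 8 image/caption pairs in a shared 512-d space
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```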

4. MAE-Related Papers

| Paper | Date | Institute, Citation | Note |
| --- | --- | --- | --- |
| Context Autoencoder for Self-Supervised Representation Learning | | | |
| TFill: Image Completion via a Transformer-Based Architecture | | | |
| iBOT: Image BERT Pre-Training with Online Tokenizer | | | |
| Benchmarking Detection Transfer Learning with Vision Transformers | | | |
| Corrupted Image Modeling for Self-Supervised Visual Pre-Training | | | |
| PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | | | |
| BEVT: BERT Pretraining of Video Transformers | | | |
| ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond | | | |
| ReMixer: Object-aware Mixing Layer for Vision Transformers and Mixers | | | |
| A Survey on Dropout Methods and Experimental Verification in Recommendation | | | |
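
Most papers in this list build on MAE's masked-image-modeling recipe: embed image patches, hide a large random subset, and reconstruct it. Below is a minimal sketch of the random-masking step; the 75% ratio matches MAE's default, while the function and variable names are illustrative assumptions, not the authors' code.

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """MAE-style masking: keep a random subset of patch tokens.

    patches: (batch, num_patches, dim) embedded image patches.
    Returns the visible tokens and the indices needed to restore order.
    """
    b, n, d = patches.shape
    n_keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n)           # one random score per patch
    shuffle = noise.argsort(dim=1)     # random permutation of patch indices
    restore = shuffle.argsort(dim=1)   # inverse permutation, for the decoder
    keep = shuffle[:, :n_keep]         # indices of the visible patches
    visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, d))
    return visible, restore

# toy usage: 196 patches (14x14) of width 768, as in ViT-B/16
tokens = torch.randn(2, 196, 768)
visible, restore = random_masking(tokens)
print(visible.shape)  # torch.Size([2, 49, 768]): only 25% go through the encoder
```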