From NLP to Computer Vision
1. Good References
- (Updating) Awesome BERT & Transfer Learning in NLP
- BERT related papers
- BERT related articles
- Attention (see the minimal sketch after this list)
- Transformer
- GPT (Generative Pre-trained Transformer)
- CLIP (Contrastive Language-Image Pre-training)
- DALL-E 2 (Text-to-Image Revolution)
- Transformer Attention Video
- (Old) Awesome BERT
- My notes on Transformer understanding (1): Attention
- My notes on Transformer understanding (2): Transformer
- Awesome-CLIP: see the Application section (Detection)
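
For quick orientation alongside the Attention and Transformer references above, here is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, in plain NumPy; the function name and toy shapes are illustrative, not taken from any paper's code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (..., seq_q, seq_k)
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                              # (..., seq_q, d_v)

# Toy example: 4 tokens with 8-dim queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```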
2. YouTube Videos
- Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
- BERT and GPT (explained in Korean; see the mask sketch after this list)
- BERT
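
Since the videos above contrast BERT and GPT, here is a minimal sketch of the attention-mask difference between them, again in NumPy; everything here is illustrative rather than taken from either model's implementation.

```python
import numpy as np

seq_len = 5
scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))

# GPT is autoregressive: token i may only attend to positions j <= i.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# BERT is bidirectional: every token attends to every token; its learning
# signal comes instead from predicting randomly masked-out input tokens.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

# A mask is applied by setting disallowed scores to -inf before the softmax,
# so those positions receive zero attention weight.
print(np.where(causal_mask, scores, -np.inf))
```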
3. Must-Read Paper Lists
| Paper | Date | Institute, Citation | Note |
|---|---|---|---|
| Masked Autoencoders Are Scalable Vision Learners (MAE) | 21.11 | Kaiming He | |
| BEiT: BERT Pre-Training of Image Transformers | 21.06 | Microsoft | MAE cited. |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 18.10 | Google | See the YouTube section above. |
| Improving Language Understanding by Generative Pre-Training | 18.06 | OpenAI | GPT-1. MAE cited. |
| Language Models are Unsupervised Multitask Learners | 19.02 | OpenAI | GPT-2. MAE cited. |
| Language Models are Few-Shot Learners | 20.05 | OpenAI | GPT-3. MAE cited. |
| Learning Transferable Visual Models From Natural Language Supervision | 21.01 | OpenAI | CLIP. See the contrastive-loss sketch below. |
| Zero-Shot Text-to-Image Generation | 21.02 | OpenAI | DALL-E |
| Hierarchical Text-Conditional Image Generation with CLIP Latents | 22.04 | OpenAI | DALL-E 2. Leverages CLIP representations for text-conditional image generation. |
| Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance | 22.04 | Google Research | Survey on NLP. |
| Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | 22.02 | FAIR | SEER, scaled to 10B parameters. |
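
The CLIP row is the bridge from the NLP papers to the vision papers in this list. Below is a minimal sketch of its symmetric contrastive objective in the usual InfoNCE formulation, assuming PyTorch; `clip_contrastive_loss` and the toy embeddings are illustrative, not the released CLIP code.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image-text pairs.

    image_emb, text_emb: (batch, dim) outputs of the two encoders.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0))           # matched pairs on the diagonal
    loss_i = F.cross_entropy(logits, targets)        # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)    # text -> image direction
    return (loss_i + loss_t) / 2

# Toy batch of 8 pairs with 512-dim embeddings.
img, txt = torch.randn(8, 512), torch.randn(8, 512)
print(clip_contrastive_loss(img, txt))
```

The diagonal of the logits matrix holds the matched pairs, so a single batch supplies both the positives and the in-batch negatives.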
| Paper | Date | Institute, Citation | Note |
|---|---|---|---|
| Context autoencoder for self-supervised representation learning | | | |
| TFill: Image completion via a transformer-based architecture | | | |
| iBOT: Image BERT pre-training with online tokenizer | | | |
| Benchmarking detection transfer learning with vision transformers | | | |
| Corrupted image modeling for self-supervised visual pre-training | | | |
| PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | | | |
| BEVT: BERT pretraining of video transformers | | | |
| ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond | | | |
| ReMixer: Object-aware Mixing Layer for Vision Transformers and Mixers | | | |
| A Survey on Dropout Methods and Experimental Verification in Recommendation | | | |
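
Most of the papers in this second table build on BEiT/MAE-style masked image modeling, so as a reference point, here is a minimal sketch of MAE-style random patch masking, assuming PyTorch; `random_mask_patches` and the toy shapes are illustrative of the masking idea, not code from any of these papers.

```python
import torch

def random_mask_patches(patches, mask_ratio=0.75):
    """MAE-style random masking: keep a random subset of patch tokens.

    patches: (batch, num_patches, dim). Returns the visible tokens and
    the indices needed to restore the original order at decode time.
    """
    B, N, D = patches.shape
    num_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                  # one random score per patch
    ids_shuffle = noise.argsort(dim=1)        # random permutation per sample
    ids_restore = ids_shuffle.argsort(dim=1)  # inverse permutation
    ids_keep = ids_shuffle[:, :num_keep]      # indices of unmasked patches
    visible = torch.gather(
        patches, 1, ids_keep.unsqueeze(-1).expand(B, num_keep, D)
    )
    return visible, ids_restore

# Toy input: 196 patch tokens (a 14x14 grid) of dim 768, 75% masked.
x = torch.randn(2, 196, 768)
vis, ids = random_mask_patches(x)
print(vis.shape)  # torch.Size([2, 49, 768])
```

Masking a high ratio such as 75% keeps the encoder cheap, since it only processes the visible quarter of the tokens.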