@ Convincing 
 Token merging: clustering algorithms are complex >> a bipartite matching algorithm
 Offsite-Tuning: layers from shallow to deep encode different levels of features >> a sandwich design

@ Offsite-Tuning - Song Han
https://github.com/mit-han-lab/offsite-tuning
On the Effect of Dropping Layers: https://arxiv.org/abs/2004.03844

@ Magic3D: High-Resolution Text-to-3D Content Creation
- https://research.nvidia.com/labs/dir/magic3d/

@ DreamFusion: Text-to-3D using 2D diffusion
- https://dreamfusion3d.github.io/

@ GenAug
- https://genaug.github.io/

@ Diffusion model
- https://arxiv.org/abs/2209.14988
- https://www.youtube.com/watch?v=W-O7AZNzbzQ

@ OpenAI CLIP: Connecting Text and Images
- https://www.youtube.com/watch?v=T9XSU0pKX2E (21.01)
- https://arxiv.org/pdf/2210.08402.pdf >> CLIP, ALIGN, BASIC, GLIDE (22.10)

@ The lottery ticket hypothesis (ICLR 2019 best paper)
- https://arxiv.org/pdf/1803.03635.pdf

@ Token Merging: Your ViT But Faster
- https://research.facebook.com/blog/2023/2/token-merging-your-vit-but-faster/
- They emphasize that the bipartite soft matching algorithm does not involve any iterative clustering, which makes their approach more efficient than existing works that use clustering algorithms such as k-means. This is why they focus on matching rather than clustering. >> CDTTA
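- A minimal sketch of how I read the bipartite soft matching step (one pass, no iterative clustering); the alternate split, cosine similarity on raw token features, and the plain-average merge are my simplifications of the paper:

```python
# Sketch of ToMe-style bipartite soft matching (simplified: the paper uses attention
# keys as features and size-weighted averages when merging).
import torch

def bipartite_soft_merge(x: torch.Tensor, r: int) -> torch.Tensor:
    """x: (N, C) token features; r: number of tokens to remove by merging."""
    a, b = x[0::2], x[1::2]                      # alternate split into two sets
    a_n = a / a.norm(dim=-1, keepdim=True)       # cosine similarity between the sets
    b_n = b / b.norm(dim=-1, keepdim=True)
    scores = a_n @ b_n.t()                       # (len(a), len(b))

    best_val, best_idx = scores.max(dim=-1)      # each A token picks one B token
    order = best_val.argsort(descending=True)
    merged_a, kept_a = order[:r], order[r:]      # r most similar edges get merged

    # single pass, no k-means-style iterations: fold each merged A token into its match
    b = b.clone()
    b.index_add_(0, best_idx[merged_a], a[merged_a])
    counts = torch.ones(len(b), 1)
    counts.index_add_(0, best_idx[merged_a], torch.ones(len(merged_a), 1))
    b = b / counts                               # plain average of the merged tokens

    return torch.cat([a[kept_a], b], dim=0)      # N - r tokens remain

tokens = torch.randn(8, 16)
print(bipartite_soft_merge(tokens, r=2).shape)   # torch.Size([6, 16])
```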

@ Data2vec-2: Highly efficient self-supervised learning for vision, speech and text.
- version1: https://arxiv.org/pdf/2202.03555.pdf
- https://ai.facebook.com/blog/ai-self-supervised-learning-data2vec/
- 'People appear to learn much more efficiently than current AI, and also learn from different kinds of information in a similar way, rather than relying on separate learning mechanisms for text, speech, and other modalities.'
- Motivation: generalize across modalities and improve efficiency by predicting contextualized targets rather than raw pixels as in MAE (see the sketch after this block).
- Q: Are there feature channels whose variance is zero when using the self-supervised model?
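- A toy sketch of what "predicting contextualized targets" looks like, assuming an EMA teacher whose average of the last two layer outputs on the clean input serves as the regression target (the tiny encoder, zero mask token, and plain MSE loss are my assumptions, not the paper's exact recipe):

```python
import copy
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, dim=32, depth=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True) for _ in range(depth)]
        )

    def forward(self, x):
        hiddens = []
        for layer in self.layers:
            x = layer(x)
            hiddens.append(x)
        return hiddens                          # outputs of every layer

student = TinyEncoder()
teacher = copy.deepcopy(student)                # EMA copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)

x = torch.randn(2, 16, 32)                      # (batch, tokens, dim)
mask = torch.rand(2, 16) < 0.5                  # positions the student must predict

with torch.no_grad():
    # contextualized target: average of the top-K teacher layers on the clean input,
    # rather than raw pixels as in MAE
    target = torch.stack(teacher(x)[-2:]).mean(dim=0)

x_masked = x.clone()
x_masked[mask] = 0.0                            # placeholder mask token (assumption)
pred = student(x_masked)[-1]

loss = nn.functional.mse_loss(pred[mask], target[mask])
loss.backward()

# the teacher then tracks the student with an exponential moving average
tau = 0.999
with torch.no_grad():
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(tau).add_(ps, alpha=1 - tau)
```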

@ Offsite-Tuning - Song Han
- https://arxiv.org/pdf/2302.04870.pdf
- On the effect of dropping layers of pre-trained transformer models (https://arxiv.org/pdf/2004.03844.pdf)
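- Rough sketch of the sandwich setup as I understand it: trainable shallow/deep adapter layers around a frozen, layer-dropped emulator of the middle (the stand-in blocks, adapter count, and uniform-drop emulator without distillation are assumptions):

```python
import torch
import torch.nn as nn

full_layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(24)])  # stand-in blocks

n_adapter = 2
shallow = full_layers[:n_adapter]                  # sent to the data owner, trainable
deep = full_layers[-n_adapter:]                    # sent to the data owner, trainable

middle = list(full_layers[n_adapter:-n_adapter])   # stays with the model owner
emulator = nn.ModuleList(middle[::2])              # uniform layer dropping
for p in emulator.parameters():
    p.requires_grad_(False)                        # the emulator is frozen downstream

def forward(x):
    for blk in list(shallow) + list(emulator) + list(deep):
        x = torch.relu(blk(x))
    return x

# the data owner fine-tunes only the adapters; the full middle layers never leave
opt = torch.optim.AdamW([p for m in (shallow, deep) for p in m.parameters()], lr=1e-4)
loss = forward(torch.randn(4, 64)).pow(2).mean()   # dummy objective
loss.backward()
opt.step()
```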

@ Efficient transformer, CVPR22
- MetaFormer Is Actually What You Need for Vision, CVPR22
- CMT: Convolutional Neural Networks Meet Vision Transformers, CVPR22

@ Google Research: Algorithms for efficient deep learning
- https://ai.googleblog.com/2023/02/google-research-2022-beyond-algorithms.html
- My people: https://ai.googleblog.com/2023/02/google-research-2022-beyond-algorithms.html#Acknowledgements
- Mixture-of-experts is interesting (a minimal top-k gating sketch follows this list).
    - The sparsely-gated mixture-of-experts layer.
    - GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
    - Mixture-of-Experts with Expert Choice Routing
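- Minimal sketch of a sparsely-gated MoE layer with top-k token-choice routing, in the spirit of the first two papers above (expert-choice routing flips this so experts pick tokens instead). The tiny expert MLPs, k=2, and the dense loop over experts are simplifications:

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, dim=32, n_experts=4, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)]
        )

    def forward(self, x):                        # x: (tokens, dim)
        logits = self.gate(x)                    # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)        # renormalize over the chosen experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # tokens that routed to expert e in one of their top-k slots
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out

layer = MoELayer()
print(layer(torch.randn(10, 32)).shape)          # torch.Size([10, 32])
```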

@ Federated learning in the industry
- https://ai.googleblog.com/2023/02/google-research-2022-beyond-algorithmic.html#Theme2
- https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
- Google Research mainly focuses on actual data privacy, e.g., 1) degrading the shared gradient information and 2) Generative Gradient Leakage (GGL), a method that reconstructs private data from the gradients clients share with the server (a simplified gradient-inversion sketch follows this block).
- E.g., Auditing Privacy Defenses in Federated Learning, CVPR22
    - 'FL places a heavy emphasis in privacy-sensitive scenarios such as typing prediction [21], spoken language understanding [16,20], medical research [4,8,41], and financial services [32, 50].'
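- Simplified gradient-inversion sketch in the spirit of the leakage attacks above (DLG-style; GGL additionally uses a generative prior, omitted here). The tiny model and the plain L2 gradient-matching loss are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
params = list(model.parameters())

# 1) the client computes a gradient on its private example and shares it
x_private = torch.randn(1, 8)
y_private = torch.tensor([1])
shared_grads = torch.autograd.grad(loss_fn(model(x_private), y_private), params)

# 2) the server optimizes dummy data so that its gradient matches the shared one
x_dummy = torch.randn(1, 8, requires_grad=True)
y_dummy = torch.randn(1, 2, requires_grad=True)      # soft label, optimized jointly
opt = torch.optim.Adam([x_dummy, y_dummy], lr=0.1)

for step in range(300):
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(
        # soft-label targets for CrossEntropyLoss need PyTorch >= 1.10
        loss_fn(model(x_dummy), y_dummy.softmax(dim=-1)), params, create_graph=True
    )
    match = sum(((dg - sg) ** 2).sum() for dg, sg in zip(dummy_grads, shared_grads))
    match.backward()
    opt.step()

print("reconstruction error:", (x_dummy - x_private).norm().item())
```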

@ Problem: variance of several channels of the intermediate feature = Zero 
- What about the self-supervised model? (A quick per-channel variance check is sketched below.)
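- Quick diagnostic for this question: count feature channels whose variance over a batch is (near) zero. The torchvision ResNet-18 hook and the 1e-6 threshold are just illustrative assumptions:

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
feat = {}
model.layer3.register_forward_hook(lambda m, i, o: feat.update(out=o))

with torch.no_grad():
    model(torch.randn(32, 3, 224, 224))

# per-channel variance over the batch and spatial positions
var = feat["out"].permute(1, 0, 2, 3).flatten(1).var(dim=1)
dead = (var < 1e-6).sum().item()
print(f"{dead}/{var.numel()} channels have (near) zero variance")
```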

@ Vision with RL
- Tuning computer vision models with task rewards (https://arxiv.org/pdf/2302.08242.pdf)
- ChatGPT (InstructGPT) paper: https://arxiv.org/pdf/2203.02155.pdf (+ https://openai.com/blog/how-should-ai-systems-behave/)
- I am not sure that RL + vision would be effective. If we can use human feedback, what about an approach that samples failure cases, as in LabOR, and gives the model a useful optimization signal? (A REINFORCE-style reward-tuning sketch follows this list.)
- New OD approach: Pix2seq: A Language Modeling Framework for Object Detection
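- Hedged sketch of the reward-tuning idea in the first link: after supervised pretraining, sample predictions, score them with a non-differentiable task metric, and update with REINFORCE. The toy classifier and 0/1 reward are assumptions; the paper applies this to tasks such as detection and panoptic segmentation:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 5)                         # stands in for a pretrained vision model
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def task_reward(pred, label):                    # any non-differentiable metric works here
    return (pred == label).float()

x = torch.randn(64, 16)
y = torch.randint(0, 5, (64,))

for step in range(100):
    logits = model(x)
    dist = torch.distributions.Categorical(logits=logits)
    sample = dist.sample()                       # sample predictions rather than argmax
    reward = task_reward(sample, y)
    baseline = reward.mean()                     # simple baseline for variance reduction
    loss = -((reward - baseline) * dist.log_prob(sample)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```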

@ Geoffrey Hinton
- The Forward-Forward Algorithm: Some Preliminary Investigations