# Architecture Matters in Continual Learning
## Quick Look
- **Authors & Affiliation**: Seyed Iman Mirzadeh et al., Washington State University, DeepMind
- **Link**: https://arxiv.org/pdf/2202.00275.pdf
- **Comments**: arXiv paper
- **TLDR**: The choice of architecture can significantly impact continual learning performance; as width increases, forgetting decreases.
- **Relevance**: 5, highly relevant.
# Research Topic
- **Category** (General): Continual Learning
- **Category** (Specific): Continual Learning, Catastrophic Forgetting
# Paper Summary (What)
- Runs a comprehensive ablation study of architectural choices in continual learning.
- Increasing width is helpful; increasing depth is not.
- The effect of batch normalization layers is data-dependent.
- Max-pooling layers are helpful.
- Skip connections are not necessarily helpful.
- ViTs show promising robustness against distributional shifts, as evidenced by lower forgetting numbers (the forgetting metric is sketched right after this list).
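The "forgetting" numbers referenced above are, to my understanding, the standard continual-learning metric: each earlier task's best accuracy minus its accuracy after the final task, averaged over tasks. A minimal NumPy sketch, assuming a T x T accuracy matrix; the function names and example numbers are mine, not from the paper:

```python
import numpy as np

def average_accuracy(acc: np.ndarray) -> float:
    """Mean test accuracy over all tasks after training on the final task.

    acc[t, i] = accuracy on task i measured after finishing training on task t.
    """
    return float(acc[-1].mean())

def average_forgetting(acc: np.ndarray) -> float:
    """Mean drop from each earlier task's best accuracy (over the first T-1
    checkpoints) to its accuracy after the final task."""
    T = acc.shape[0]
    drops = [acc[:T - 1, i].max() - acc[-1, i] for i in range(T - 1)]
    return float(np.mean(drops))

# Hypothetical 3-task accuracy matrix (rows: after task t, columns: task i).
acc = np.array([
    [0.95, 0.00, 0.00],
    [0.80, 0.93, 0.00],
    [0.70, 0.85, 0.92],
])
print(average_accuracy(acc))    # (0.70 + 0.85 + 0.92) / 3 ≈ 0.823
print(average_forgetting(acc))  # ((0.95 - 0.70) + (0.93 - 0.85)) / 2 = 0.165
```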
# Notable References
- Vision Transformers are Robust Learners
- Effect of Model and Pretraining Scale on Catastrophic Forgetting in Neural Networks
- Wide Neural Networks Forget Less Catastrophically
The template below is used for these notes.
# [Paper Title]
## Quick Look
- **Authors & Affiliation**: [Authors][Affiliations]
- **Link**: [Paper link]
- **Comments**: [e.g. Published at X / arXiv paper / in review.]
- **TLDR**: [One or at most two line summary]
- **Relevance**: [Score between 1 and 5, stating how relevant this paper is to your work. Usually filled in at the end.]
# Research Topic
- **Category** (General):
- **Category** (Specific):
# Paper Summary (What)
[Summary of the paper - a few sentences with bullet points. What did they do?]