UQ4ML / COMETA Workshop: Bayesian Continual Learning and Forgetting in Neural Networks (30 minutes)
Date:
At the UQ4ML / COMETA Workshop, the talk "Bayesian Continual Learning and Forgetting in Neural Networks" presents recent work by Bonnet et al. introducing MESU (Metaplasticity from Synaptic Uncertainty), a Bayesian approach to continual learning that tackles both catastrophic forgetting and catastrophic remembering.
Instead of freezing important weights or relying on explicit task boundaries, MESU treats each network parameter as a probability distribution and adapts how much it learns based on its uncertainty. Parameters with high uncertainty remain plastic, while confident ones are naturally stabilized, mirroring ideas of metaplasticity observed in biological synapses.
The work was published in Nature Communications, a high-impact, peer-reviewed journal that covers interdisciplinary research across physics, biology, and machine learning. The paper proposes a new algorithm and grounds it both theoretically (Bayesian inference, variational free energy, connections to the Hessian) and empirically, with experiments on Permuted-MNIST, CIFAR-100, and out-of-distribution (OOD) detection.
MESU operates in boundary-free streaming settings, which makes it better suited to real-world machine learning systems that receive data continuously, without clean task boundaries.
MESU Update Rule (Metaplasticity from Synaptic Uncertainty)
For each parameter with mean \( \mu \) and standard deviation \( \sigma \), the MESU updates are:
\[ \Delta \mu = - \sigma_{t-1}^2 \, \frac{\partial C_t}{\partial \mu_{t-1}} + \frac{\sigma_{t-1}^2}{N \, \sigma_{\text{prior}}^2} \left( \mu_{\text{prior}} - \mu_{t-1} \right) \]
\[ \Delta \sigma = - \frac{\sigma_{t-1}^2}{2} \, \frac{\partial C_t}{\partial \sigma_{t-1}} + \frac{\sigma_{t-1}}{2N \, \sigma_{\text{prior}}^2} \left( \sigma_{\text{prior}}^2 - \sigma_{t-1}^2 \right) \]
with:
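- \( C_t = \mathbb{E}_{q(\omega)}[-\log p(D_t \mid \omega)] \) the expected loss, i.e. the negative log-likelihood of the data \( D_t \) averaged over the weight distribution \( q(\omega) \),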
- \( N \) the memory window controlling forgetting,
- \( \mu_{\text{prior}}, \sigma_{\text{prior}} \) the prior parameters.
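The two updates above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `mesu_step`, the argument `n_window` (standing for \( N \)), and the convention that the caller supplies the gradients of \( C_t \) are all assumptions made here. Because it uses only arithmetic operators, the same function should also work elementwise on NumPy arrays.

```python
def mesu_step(mu, sigma, grad_mu, grad_sigma, mu_prior, sigma_prior, n_window):
    """One MESU-style update for a single parameter.

    grad_mu and grad_sigma stand for dC_t/dmu and dC_t/dsigma. How they are
    estimated is left to the caller (hypothetical interface, assumed here).
    """
    var = sigma ** 2
    prior_var = sigma_prior ** 2

    # Learning term scaled by the variance, plus a pull toward the prior mean.
    delta_mu = -var * grad_mu + var / (n_window * prior_var) * (mu_prior - mu)

    # Learning term scaled by half the variance, plus a pull toward the prior width.
    delta_sigma = (-0.5 * var * grad_sigma
                   + sigma / (2.0 * n_window * prior_var) * (prior_var - var))

    return mu + delta_mu, sigma + delta_sigma


# Illustrative values only: the same gradient applied to a confident
# parameter (small sigma) and an uncertain one (sigma equal to the prior).
confident_mu, confident_sigma = mesu_step(
    mu=0.0, sigma=0.01, grad_mu=1.0, grad_sigma=0.0,
    mu_prior=0.0, sigma_prior=0.1, n_window=1000)
uncertain_mu, uncertain_sigma = mesu_step(
    mu=0.0, sigma=0.1, grad_mu=1.0, grad_sigma=0.0,
    mu_prior=0.0, sigma_prior=0.1, n_window=1000)
```

In this example the step in the mean scales with \( \sigma^2 \), so the uncertain parameter moves 100 times further than the confident one for the same gradient, which is the metaplasticity effect described earlier.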
This work points toward a unifying perspective in which forgetting acts as a control mechanism that preserves the network's uncertainty for long-term adaptation. MESU shows that bounding the memory of the posterior through principled Bayesian learning and forgetting sustains both plasticity and epistemic uncertainty in real-valued networks.
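To make the role of the forgetting terms concrete, consider the hypothetical case in which both gradients of \( C_t \) vanish (an assumption made here for illustration, not a scenario taken from the paper). The MESU updates given earlier then reduce to:
\[ \Delta \mu = \frac{\sigma_{t-1}^2}{N \, \sigma_{\text{prior}}^2} \left( \mu_{\text{prior}} - \mu_{t-1} \right), \qquad \Delta \sigma = \frac{\sigma_{t-1}}{2N \, \sigma_{\text{prior}}^2} \left( \sigma_{\text{prior}}^2 - \sigma_{t-1}^2 \right) \]
Since \( \Delta \sigma > 0 \) whenever \( 0 < \sigma_{t-1} < \sigma_{\text{prior}} \), a parameter that has become confident drifts back toward the prior width, and \( \Delta \sigma = 0 \) at \( \sigma_{t-1} = \sigma_{\text{prior}} \). Both terms scale as \( 1/N \), so a larger memory window means slower forgetting.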
These ideas could be extended to low-storage systems, where controlling the amount of stored information is crucial and uncertainty is decisive for learning. Applying them to large-scale sequence models could likewise reveal that transformers also require explicit mechanisms to regulate evidence accumulation if they are to adapt continually without rigidifying.
Across these settings, a common trait emerges: continual learning is less about preserving parameters than about preserving uncertainty in the right places, enabling models to decide when to learn, when to forget, and when to abstain.
