YuyaoGe's Website
Memory
Gated Differentiable Working Memory for Long-Context Language Modeling
Abstract: Long contexts challenge transformers as attention scores dilute across thousands of tokens and critical information is often lost in the middle. We reframe test-time adaptation as a budget-constrained memory consolidation problem and propose Gdwm (Gated Differentiable Working Memory), which introduces a write controller that estimates Contextual Utility, an information-theoretic measure of long-range contextual dependence, to allocate gradient steps efficiently.
Lingrui Mei, Shenghua Liu, Yiwei Wang, Yuyao Ge 葛钰峣, Baolong Bi, Jiayu Yao, Jun Wan, Ziling Yin, Jiafeng Guo, Xueqi Cheng
Jan 19, 2026