We propose Anchor Forcing, which supports prompt switches that introduce new subjects and actions while preserving context, motion quality, and temporal coherence across clips. In contrast, prior methods degrade over time and often fail to realize newly introduced interactions, as highlighted by the red boxes. Red text denotes the interaction newly specified in each segment.
Overall Framework
We denote the anchor memory used to generate frame \(t\) as \(\mathcal{M}(t)\), and apply anchor-guided re-cache at the prompt-switch points \(f_1\) and \(f_2\) to update the local KV cache under the new prompt condition.
(a) Overview of Anchor Forcing in an interactive setting with two prompt switches.
(b) Prior re-cache. It rebuilds the local cache solely from historical frame latents, which fails to retain prior KV evidence across prompt switches.
(c) Anchor-guided re-cache at \(f_2\). It augments re-cache with the anchor memory \(\mathcal{M}(6)\) and refreshes the junction cache \(x_5\).
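To make the contrast between (b) and (c) concrete, here is a toy Python sketch. It is not the authors' implementation: a KV entry is reduced to a (frame index, prompt) tag, `encode_kv`, `prior_recache`, `anchor_guided_recache`, and `keeps_prior_evidence` are hypothetical names, and which frames the anchor memory \(\mathcal{M}(6)\) holds is an illustrative assumption, not the paper's selection rule.

```python
from typing import Dict, List, Tuple

# Toy stand-in for one frame's cached keys/values (assumption): we only record
# which frame produced the entry and which prompt it was computed under.
KVEntry = Tuple[int, str]


def encode_kv(frame_idx: int, prompt: str) -> KVEntry:
    """Hypothetical placeholder for projecting a frame latent to K/V under a prompt."""
    return (frame_idx, prompt)


def prior_recache(history: List[int], prompt: str, window: int) -> List[KVEntry]:
    """(b) Rebuild the local cache solely from the latest frame latents.

    Every entry is recomputed under the new prompt, so no KV computed
    before the switch survives.
    """
    return [encode_kv(i, prompt) for i in history[-window:]]


def anchor_guided_recache(
    anchor_memory: Dict[int, KVEntry], junction: int, prompt: str
) -> List[KVEntry]:
    """(c) Reuse anchor-memory entries as-is and re-encode only the junction frame."""
    retained = [anchor_memory[i] for i in sorted(anchor_memory) if i != junction]
    return retained + [encode_kv(junction, prompt)]


def keeps_prior_evidence(cache: List[KVEntry], new_prompt: str) -> bool:
    """True if any cached entry was computed under an earlier prompt."""
    return any(p != new_prompt for _, p in cache)


if __name__ == "__main__":
    # Assumed timeline: x0..x2 under prompt "A", x3..x5 under "B",
    # and the second switch f2 to prompt "C" happens at frame 6.
    history = list(range(6))
    # Hypothetical anchor memory M(6); its contents are an illustrative choice.
    m6 = {0: encode_kv(0, "A"), 3: encode_kv(3, "B"), 5: encode_kv(5, "B")}

    prior = prior_recache(history, "C", window=3)
    anchored = anchor_guided_recache(m6, junction=5, prompt="C")
    print(prior, keeps_prior_evidence(prior, "C"))
    print(anchored, keeps_prior_evidence(anchored, "C"))
```

Running it prints `[(3, 'C'), (4, 'C'), (5, 'C')] False` for the prior re-cache and `[(0, 'A'), (3, 'B'), (5, 'C')] True` for the anchor-guided variant. Only the latter still carries KV computed under the earlier prompts, while the junction entry \(x_5\) is refreshed under the new one.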
Interactive Long Video Showcases
Comparisons
Compared with the baselines, Anchor Forcing achieves stronger prompt compliance, more coherent dynamic motion, and higher visual quality over long horizons.
LongLive
Infinity-RoPE
MemFlow
Anchor Forcing
0-10s: Sushi chef with headband slicing fresh tuna with a long knife at a minimalist bar.
10-20s: Young couple on a date admiring the chef's precision as he slices tuna at the bar.
20-30s: Chef expertly forming tuna nigiri and placing them on a wooden board for the couple.
30-40s: Chef sliding the fresh nigiri board to the smiling couple in a quiet Japanese restaurant.
40-50s: The couple tastes the sushi, and their faces show their delight at the delicious, fresh flavor.
50-60s: Chef preparing the next course with graceful, precise movements for the seated couple.
0-10s: Paris bridge at sunset; street saxophonist; distant Eiffel Tower; melancholic melody.
10-20s: Young female dancer, moved, starts dancing gracefully.
20-30s: Duo perform together; he plays with more emotion, her dance more expressive; Eiffel Tower backdrop.
30-40s: Small crowd gathers, captivated; colorful sunset sky.
40-50s: Finale: last lingering note; graceful final pose; applause.
50-60s: They smile at each other, bow to the crowd
0-10s: Sunlit museum gallery with high ceilings, polished floors; female curator in tweed adjusts a classical marble statue.
10-20s: Security guard in crisp uniform arrives to report a minor issue; statue stands majestically.
20-30s: Curator listens; both hear a faint strange noise from another exhibit.
30-40s: They cautiously head toward the noise.
40-50s: Source found: a small lost kitten hiding behind a large marble bust’s base, meowing.
50-60s: Curator gently picks up the kitten; guard smiles.
Contact Us
Feel free to contact Yang Yang at yangyang98@zju.edu.com with any questions or for collaboration and discussion.
If you find this work useful, please consider citing:
@article{yang2026anchor,
title={Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion},
author={Yang, Yang and Zhang, Tianyi and Huang, Wei and Chen, Jinwei and Wu, Boxi and He, Xiaofei and Cai, Deng and Li, Bo and Jiang, Peng-Tao},
journal={arXiv preprint arXiv:2603.13405},
year={2026}
}
We thank LongLive for providing the template for this project page!