HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions

Shuolin Xu1,2, Siming Zheng2, Ziyi Wang2, HC Yu1, Jinwei Chen2, Huaqi Zhang2, Bo Li2, Peng-Tao Jiang2
1National Centre for Computer Animation, Bournemouth University, UK, 2vivo Mobile Communication Co., Ltd
[Teaser figure] Our method generates realistic human animations of complex motions, guided by pose and a reference appearance.

Abstract

Recent advances in diffusion models have significantly improved conditional video generation, particularly for the pose-guided human image animation task. Although existing methods can generate high-fidelity, temporally consistent animations for regular motions and static scenes, they show clear limitations when faced with complex human body motions (HyperMotion) that involve highly dynamic, non-standard movements, and there is no high-quality benchmark for evaluating animation under such motions. To address these challenges, we introduce the Open-HyperMotionX Dataset and HyperMotionX Bench, which provide high-quality human pose annotations and curated video clips for evaluating and improving pose-guided human image animation models under complex motion conditions. Furthermore, we propose a simple yet powerful DiT-based video generation baseline and design Spatial Low-Frequency Enhanced RoPE, a novel module that selectively enhances low-frequency spatial feature modeling by introducing learnable frequency scaling. Our method significantly improves structural stability and appearance consistency in highly dynamic human motion sequences. Extensive experiments demonstrate the effectiveness of our dataset and the proposed approach in advancing the generation quality of complex human motion image animation.

Method

[Figure] Overview of the HyperMotion framework.

The model takes a reference image and a driving pose video as inputs and generates a human animation. The pose control and the reference image are injected via latent composition, guided by a binary mask, and Spatial Low-Frequency Enhanced RoPE is applied in the self-attention layers.
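To make the conditioning pathway concrete, below is a minimal sketch of how a composed latent input to the DiT could be assembled, assuming the reference latent occupies the first frame slot of the noisy video latent and the pose latent and binary mask are concatenated along the channel axis. The tensor layout, the function name `build_dit_input`, and the exact composition scheme are illustrative assumptions, not the released implementation.

```python
import torch

def build_dit_input(noisy_video_latent: torch.Tensor,        # (B, C, F, H, W)
                    reference_latent: torch.Tensor,           # (B, C, 1, H, W)
                    pose_latent: torch.Tensor) -> torch.Tensor:  # (B, C, F, H, W)
    B, C, F, H, W = noisy_video_latent.shape
    # Latent composition: place the clean reference latent in the first frame slot,
    # keep noisy latents for the frames that must be generated.
    composed = noisy_video_latent.clone()
    composed[:, :, :1] = reference_latent
    # Binary mask: 1 where the latent is given (reference), 0 where it is generated.
    mask = torch.zeros(B, 1, F, H, W, device=composed.device, dtype=composed.dtype)
    mask[:, :, :1] = 1.0
    # Concatenate composed latent, pose latent, and mask along the channel axis.
    return torch.cat([composed, pose_latent, mask], dim=1)    # (B, 2C + 1, F, H, W)
```

The page only states that pose and reference are "injected via latent composition and guided by a binary mask"; other placements of the reference latent (e.g. a separate token stream) would follow the same pattern with a different mask.

The Spatial Low-Frequency Enhanced RoPE is described as introducing learnable frequency scaling that selectively strengthens low-frequency spatial components. The sketch below applies standard rotary embeddings along one spatial axis and multiplies only the lowest-frequency bands by learnable scales. The class name `SpatialLowFreqRoPE`, the `low_freq_ratio` parameter, and the choice of scaling the tail of the frequency table are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SpatialLowFreqRoPE(nn.Module):
    def __init__(self, rot_dim: int, base: float = 10000.0, low_freq_ratio: float = 0.25):
        super().__init__()
        # Standard RoPE inverse frequencies for one spatial axis (rot_dim must be even).
        inv_freq = base ** (-torch.arange(0, rot_dim, 2).float() / rot_dim)
        self.register_buffer("inv_freq", inv_freq)
        # Learnable scales, initialized to 1, covering only the lowest-frequency bands
        # (the tail of inv_freq, i.e. the longest spatial wavelengths).
        self.n_low = max(1, int(low_freq_ratio * inv_freq.numel()))
        self.low_scale = nn.Parameter(torch.ones(self.n_low))

    def _scaled_freq(self) -> torch.Tensor:
        # Leave high frequencies untouched; scale the low-frequency tail learnably.
        high = torch.ones_like(self.inv_freq[: -self.n_low])
        return self.inv_freq * torch.cat([high, self.low_scale])

    def rotate(self, x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
        # x: (..., num_pos, rot_dim); positions: (num_pos,) coordinates on one spatial axis.
        angles = positions[:, None].float() * self._scaled_freq()[None, :]  # (num_pos, rot_dim/2)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]
        # Rotate each (x1, x2) pair and re-interleave along the last dimension.
        rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
        return rotated.flatten(-2)
```

The intuition, hedged as a reading of the abstract, is that low spatial frequencies carry global body layout, so giving them learnable emphasis inside self-attention can help keep structure stable under fast, large-amplitude motion.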

Result videos.

[Figure] HyperMotionX Bench content example.

BibTeX

@misc{xu2025hypermotion,
      title={HyperMotion: DiT-Based Pose-Guided Human Image Animation of Complex Motions}, 
      author={Shuolin Xu and Siming Zheng and Ziyi Wang and HC Yu and Jinwei Chen and Huaqi Zhang and Bo Li and Peng-Tao Jiang},
      year={2025},
      eprint={2505.22977},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.22977}, 
}