Abstract
We present MACE-Dance, a cascaded expert framework for music-driven dance video generation. It decouples the problem into music-to-3D motion generation and motion-guided appearance synthesis, producing dance videos with plausible motion, expressive style, coherent identity, and stable temporal details.
A Motion Expert first generates rhythm-aligned 3D SMPL motion from music; an Appearance Expert then animates the reference subject with the generated motion. We compare this motion-appearance cascade with prior work on music-driven dance video generation, 3D dance generation, and pose-driven image animation, and show results on long sequences and diverse dance genres.
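The two-stage data flow described above can be pictured with a minimal sketch. This is not the MACE-Dance implementation: every name in it (`MotionSequence`, `stub_motion_expert`, `stub_appearance_expert`, `generate_dance_video`) is a hypothetical placeholder, and the stubs return trivial outputs instead of running any model. It also assumes per-frame music features and a 72-value SMPL body pose (24 joints, each with a 3-value axis-angle rotation) purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Standard SMPL body pose size: 24 joints x 3 axis-angle values.
SMPL_POSE_DIM = 72

# Toy stand-in for an image: a 2D grid of floats (assumption for illustration).
Image = List[List[float]]


@dataclass
class MotionSequence:
    """Hypothetical container for per-frame SMPL parameters."""
    poses: List[List[float]]         # one 72-value pose vector per frame
    translations: List[List[float]]  # one 3-value root translation per frame


def stub_motion_expert(music_features: Sequence[Sequence[float]]) -> MotionSequence:
    """Placeholder Motion Expert: emits one rest pose per music frame."""
    n_frames = len(music_features)
    return MotionSequence(
        poses=[[0.0] * SMPL_POSE_DIM for _ in range(n_frames)],
        translations=[[0.0] * 3 for _ in range(n_frames)],
    )


def stub_appearance_expert(reference: Image, motion: MotionSequence) -> List[Image]:
    """Placeholder Appearance Expert: copies the reference image once per motion frame."""
    return [[row[:] for row in reference] for _ in motion.poses]


def generate_dance_video(
    music_features: Sequence[Sequence[float]],
    reference: Image,
    motion_expert: Callable[[Sequence[Sequence[float]]], MotionSequence] = stub_motion_expert,
    appearance_expert: Callable[[Image, MotionSequence], List[Image]] = stub_appearance_expert,
) -> List[Image]:
    """Cascade: music -> 3D motion -> video frames of the reference subject."""
    motion = motion_expert(music_features)       # stage 1: music-to-3D motion
    return appearance_expert(reference, motion)  # stage 2: motion-guided appearance


if __name__ == "__main__":
    music = [[0.0] * 4 for _ in range(8)]  # 8 frames of made-up 4-dim features
    reference_image = [[0.5, 0.5], [0.5, 0.5]]
    video = generate_dance_video(music, reference_image)
    print(len(video), "frames generated")
```

Passing the two experts as callables mirrors the decoupling in the abstract: under this assumed interface, either stage could be swapped without touching the other.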
Overview Video
Results
Comparison
Dance Video Generation
3D Dance Generation
Image Animation
Results
Ablation Studies
Results
Long Sequence
Results
Diverse Genres
Citation
BibTeX
@article{yang2025mace,
  title={MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation},
  author={Yang, Kaixing and Zhu, Jiashu and Tang, Xulong and Peng, Ziqiao and Zhang, Xiangyue and Wang, Puwei and Wu, Jiahong and Chu, Xiangxiang and Liu, Hongyan and He, Jun},
  journal={arXiv preprint arXiv:2512.18181},
  year={2025}
}