MACE-Dance

Abstract

We present MACE-Dance, a cascaded expert framework for music-driven dance video generation. It decouples the problem into music-to-3D motion generation and motion-guided appearance synthesis, producing dance videos with plausible motion, expressive style, coherent identity, and stable temporal details.

A Motion Expert first generates rhythm-aligned 3D SMPL motion from music; an Appearance Expert then animates the reference subject with the generated motion. The results below cover comparisons on music-driven dance video generation, 3D dance generation, and pose-driven image animation, along with long-sequence generation and diverse dance genres.
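The two-stage cascade described above can be sketched as a simple data flow. This is only an illustration under stated assumptions, not the authors' implementation: the names `Motion`, `motion_expert`, `appearance_expert`, and `generate_dance_video` are hypothetical, both experts are replaced by trivial stubs, and the one-motion-frame-per-feature-frame alignment is assumed for simplicity. The only fixed fact used is that the standard SMPL body pose has 72 values (24 joints, 3 axis-angle values each).

```python
from dataclasses import dataclass
from typing import List

SMPL_POSE_DIM = 72  # 24 joints x 3 axis-angle values in the standard SMPL body model


@dataclass
class Motion:
    """Per-frame SMPL parameters (hypothetical container for this sketch)."""
    poses: List[List[float]]         # one 72-value pose vector per frame
    translations: List[List[float]]  # one 3-value root translation per frame


def motion_expert(music_features: List[List[float]]) -> Motion:
    """Stage 1 stand-in: emit one SMPL frame per music feature frame.

    A real Motion Expert would be a learned generative model; this stub
    returns rest poses so that only the data flow is illustrated.
    """
    n = len(music_features)
    return Motion(
        poses=[[0.0] * SMPL_POSE_DIM for _ in range(n)],
        translations=[[0.0, 0.0, 0.0] for _ in range(n)],
    )


def appearance_expert(reference_image, motion: Motion) -> list:
    """Stage 2 stand-in: emit one output frame per motion frame.

    A real Appearance Expert would render the reference subject in each
    pose; this stub just repeats the reference image.
    """
    return [reference_image for _ in motion.poses]


def generate_dance_video(music_features, reference_image) -> list:
    """Cascade: music -> 3D motion -> video frames."""
    motion = motion_expert(music_features)
    return appearance_expert(reference_image, motion)


music = [[0.0] * 4 for _ in range(8)]  # 8 frames of 4-dim dummy audio features
image = [[0, 0], [0, 0]]                # dummy 2x2 reference image
frames = generate_dance_video(music, image)
print(len(frames))  # one output frame per music frame
```

The point of the sketch is the interface between the stages: the second stage consumes only the reference image and the explicit 3D motion, so in principle either expert could be swapped or evaluated on its own.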

Overview Video

Results

Comparison

Dance Video Generation

3D Dance Generation

Image Animation

Ablation Studies

Long Sequence

Diverse Genres

Citation

BibTeX

@article{yang2025mace,
  title={MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation},
  author={Yang, Kaixing and Zhu, Jiashu and Tang, Xulong and Peng, Ziqiao and Zhang, Xiangyue and Wang, Puwei and Wu, Jiahong and Chu, Xiangxiang and Liu, Hongyan and He, Jun},
  journal={arXiv preprint arXiv:2512.18181},
  year={2025}
}