FlowerDance

Abstract

Music-to-dance generation translates auditory signals into expressive human motion, yet existing approaches still struggle to balance refined 3D motion quality against strict inference budgets. FlowerDance targets both goals: motion that is physically plausible and artistically expressive, and generation that is efficient in speed and memory usage.

FlowerDance combines MeanFlow with Physical Consistency Constraints for high-quality few-step sampling, and uses a lightweight non-autoregressive BiMamba backbone with Channel-Level Fusion for long-horizon music-to-dance synthesis. It also supports motion editing through time-decayed soft masking, enabling users to refine generated dance sequences interactively.

Overview Video

Method

FlowerDance Framework

01

MeanFlow Few-Step Generation

Predicts interval-averaged velocity for stable, high-fidelity sampling.
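
To make the idea of interval-averaged velocity concrete, here is a minimal sampling sketch. It assumes the common MeanFlow convention in which t = 1 is noise and t = 0 is data, so one update is z_r = z_t - (t - r) * u(z_t, r, t). The function names and the toy velocity field are hypothetical stand-ins for illustration, not the FlowerDance network.

```python
import numpy as np

def meanflow_sample(avg_velocity, noise, num_steps=1):
    """Move from t=1 (noise) to t=0 (data) using interval-averaged velocity."""
    times = np.linspace(1.0, 0.0, num_steps + 1)
    z = np.asarray(noise, dtype=float)
    for t, r in zip(times[:-1], times[1:]):
        # One jump over [r, t]: displacement = interval length * average velocity.
        z = z - (t - r) * avg_velocity(z, r, t)
    return z

def make_toy_avg_velocity(target):
    """Hypothetical stand-in for a trained model: all data sits at `target`,
    so every trajectory is a straight line and the average velocity is exact."""
    def avg_velocity(z, r, t):
        return (z - target) / t
    return avg_velocity

rng = np.random.default_rng(0)
target = np.ones((4, 3))              # toy "motion": 4 frames, 3 channels
noise = rng.standard_normal((4, 3))
u = make_toy_avg_velocity(target)
one_step = meanflow_sample(u, noise, num_steps=1)
four_step = meanflow_sample(u, noise, num_steps=4)
```

In this toy setting a single step already lands on the target, which illustrates why predicting the average velocity over an interval, rather than the instantaneous velocity, suits few-step sampling.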

02

Physical Consistency Constraints

Regularizes reconstructed motion, velocity, and 3D joints toward plausible kinematics.
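
One way such a regularizer could be assembled is sketched below. The three terms mirror the quantities named above (motion, velocity, 3D joints), but the squared-error form, the weights, and the `to_joints` callable (a placeholder for forward kinematics) are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def physical_consistency_loss(pred_motion, true_motion, to_joints,
                              w_motion=1.0, w_vel=1.0, w_joint=1.0):
    """Sum of squared-error terms on motion, frame-to-frame velocity, and joints.

    pred_motion, true_motion: arrays of shape (frames, channels).
    to_joints: hypothetical callable mapping a motion array to 3D joint positions.
    """
    motion_term = np.mean((pred_motion - true_motion) ** 2)
    # Velocity is approximated by finite differences along the time axis.
    pred_vel = np.diff(pred_motion, axis=0)
    true_vel = np.diff(true_motion, axis=0)
    vel_term = np.mean((pred_vel - true_vel) ** 2)
    joint_term = np.mean((to_joints(pred_motion) - to_joints(true_motion)) ** 2)
    return w_motion * motion_term + w_vel * vel_term + w_joint * joint_term

# Toy check with an identity "forward kinematics" (assumption for the demo only).
true_motion = np.linspace(0.0, 1.0, 12).reshape(4, 3)
shifted = true_motion + 0.5           # constant offset: same velocity, wrong pose
loss_same = physical_consistency_loss(true_motion, true_motion, lambda m: m)
loss_shifted = physical_consistency_loss(shifted, true_motion, lambda m: m)
```

A constant offset leaves the velocity term at zero while the motion and joint terms each contribute 0.25, which shows how the terms penalize different kinds of error.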

03

BiMamba Channel-Level Fusion

Uses a lightweight non-autoregressive backbone for long-horizon music-motion alignment.
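
The sketch below gives a rough picture of the two ideas in this component. It assumes that Channel-Level Fusion means concatenating per-frame music and motion features along the channel axis, and it replaces the Mamba block with a simple decaying linear recurrence run in both directions. Both choices are illustrative assumptions; the actual BiMamba layers are more involved.

```python
import numpy as np

def channel_level_fusion(motion_feats, music_feats):
    """Concatenate per-frame features along channels: (T, Cm) + (T, Ca) -> (T, Cm + Ca)."""
    if motion_feats.shape[0] != music_feats.shape[0]:
        raise ValueError("motion and music must have the same number of frames")
    return np.concatenate([motion_feats, music_feats], axis=-1)

def linear_scan(x, decay):
    """Toy state-space recurrence h_i = decay * h_{i-1} + (1 - decay) * x_i."""
    x = np.asarray(x, dtype=float)
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        h = decay * h + (1.0 - decay) * x[i]
        out[i] = h
    return out

def bidirectional_scan(x, decay=0.9):
    """Sum a forward scan and a time-reversed scan so each frame sees both directions."""
    forward = linear_scan(x, decay)
    backward = linear_scan(x[::-1], decay)[::-1]
    return forward + backward

motion = np.ones((8, 4))   # 8 frames, 4 motion channels (toy sizes)
music = np.ones((8, 3))    # 8 frames, 3 music channels (toy sizes)
fused = channel_level_fusion(motion, music)
mixed = bidirectional_scan(fused)
```

Because the whole sequence is processed in one pass in each direction, no frame waits on a previously generated frame, which is the sense in which such a backbone is non-autoregressive.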

Efficiency

Fast Long-Horizon Generation

63M Params

4145 FPS

Results

Music-to-Dance Results

Comparison

Ours vs MEGA

Ours vs Lodge

Ours vs FineNet

Four-Way Comparison I

Four-Way Comparison II

Ablation

Generative Model

Model Backbone

Channel-Level Fusion

Physical Consistency Constraint

MeanFlow Loss

Sampling Strategy

ODE Sampling Step

Motion Editing
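
The abstract describes editing through time-decayed soft masking. One possible reading is sketched here: frames the user wants to keep are blended into the current sample with a per-frame weight in [0, 1], and that weight shrinks as sampling time t moves from 1 (noise) toward 0 (data). The function name, the `t ** gamma` decay schedule, and the blending rule are all assumptions for illustration, not the published procedure.

```python
import numpy as np

def soft_mask_blend(z, known, mask, t, gamma=2.0):
    """Blend user-constrained frames into the current sample.

    z, known: arrays of shape (frames, channels).
    mask: array of shape (frames, 1) with values in [0, 1]; 1 = keep known frame.
    t: sampling time in [0, 1]; the constraint weight decays as t -> 0.
    """
    weight = mask * (t ** gamma)      # time-decayed, per-frame soft weight
    return (1.0 - weight) * z + weight * known

z = np.zeros((6, 3))                  # current sample (toy values)
known = np.ones((6, 3))               # motion the user wants to preserve
mask = np.array([[1.0], [1.0], [0.5], [0.0], [0.0], [0.0]])  # soft edge at frame 2

early = soft_mask_blend(z, known, mask, t=1.0)   # constraint fully applied
middle = soft_mask_blend(z, known, mask, t=0.5)  # constraint partially relaxed
late = soft_mask_blend(z, known, mask, t=0.0)    # sample left untouched
```

The soft edge in `mask` avoids a hard seam between kept and regenerated frames, and the decay gives later steps more freedom to smooth the transition.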

Citation

BibTeX

@inproceedings{yang2026flowerdance,
  title={FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation},
  author={Yang, Kaixing and Tang, Xulong and Peng, Ziqiao and Zhang, Xiangyue and Chen, Chubin and Zhou, Xukun and Wang, Puwei and Liu, Hongyan and He, Jun},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2026}
}