Meet ‘EDGE’: A Diffusion-Based AI Model That Generates Realistic And Long-Form Dance Sequences Conditioned On Music

Many cultures place a high value on dance as a means of expression, communication, and social connection. However, creating new dances or dance animations is difficult because dance movements are expressive and freeform, yet tightly structured by the music. In practice, this requires either time-consuming hand animation or often impractical motion capture. Computational methods that generate dances automatically can reduce this burden. Such methods have a wide range of applications, from helping animators create new dances to giving interactive characters in video games or virtual reality realistic and varied movements driven by user-provided music. Dance generation can also shed light on how music and movement interact, an important area of study in neuroscience.

Previous research has made tremendous strides in applying machine learning to dance generation, but it has had little success producing dances from music that satisfy user-specified constraints. Evaluating generated dances is also difficult and subjective, and prior works frequently rely on quantitative metrics that the EDGE authors show to be unreliable. The paper introduces Editable Dance Generation (EDGE), a state-of-the-art dance generation method that produces realistic, physically plausible dance motions from input music. The approach pairs a transformer-based diffusion model with Jukebox, a powerful music model used here as a feature extractor.
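For readers new to diffusion models, the sketch below shows the standard DDPM forward (noising) step that a diffusion model learns to invert: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. This is the generic textbook formulation, not EDGE's code. The linear beta schedule, the 1000-step horizon, the flat per-frame pose vector, and the function names are all assumptions made for illustration; the transformer denoiser that would be trained to undo this noising, conditioned on music features, is omitted.

```python
import math
import random
from typing import List, Tuple


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: List[float], t: int, alpha_bars: List[float],
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Noise clean features x0 to step t: x_t = sqrt(ab_t) * x0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps


if __name__ == "__main__":
    alpha_bars = linear_alpha_bars()
    pose = [0.1, -0.3, 0.5]  # hypothetical flattened pose features for one frame
    noisy_pose, noise = q_sample(pose, t=500, alpha_bars=alpha_bars, rng=random.Random(0))
    print(noisy_pose)
```

In a real motion model, x_0 would be an entire motion sequence rather than a single frame, and the denoiser would receive the music features as conditioning at every step.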

EDGE generates diverse, physically plausible dance choreographies from input music

Because EDGE is diffusion-based, it supports powerful editing features such as joint-wise conditioning. Beyond the benefits that follow directly from these modeling choices, the authors propose a new metric that captures the physical plausibility of ground-contact behavior without explicit physical modeling. In summary, their contributions are as follows.
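One common way a diffusion model can support joint-wise conditioning is masked denoising, in the style of diffusion inpainting: at every reverse step, the constrained joints are overwritten with the user-specified values noised to the current level, and the model fills in everything else. The sketch below assumes that approach and is not taken from EDGE's implementation. `denoise_step` is a hypothetical stand-in for the trained network, `alpha_bars[0] == 1.0` is assumed to denote the clean level, and motion is simplified to a list of frames, each a list of per-joint scalars.

```python
import math
import random
from typing import Callable, List

Motion = List[List[float]]  # frames x per-joint values (simplified representation)


def noise_to_level(x0: Motion, alpha_bar: float, rng: random.Random) -> Motion:
    """Forward-noise known motion to a given level (alpha_bar = 1.0 leaves it clean)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[a * v + s * rng.gauss(0.0, 1.0) for v in frame] for frame in x0]


def apply_joint_mask(x_gen: Motion, x_known: Motion, mask: List[List[bool]]) -> Motion:
    """Take known values where mask is True and generated values elsewhere."""
    return [
        [k if m else g for g, k, m in zip(gen_f, known_f, mask_f)]
        for gen_f, known_f, mask_f in zip(x_gen, x_known, mask)
    ]


def sample_with_joint_constraints(
    x_known: Motion,
    mask: List[List[bool]],
    alpha_bars: List[float],  # alpha_bars[0] == 1.0 (clean) ... alpha_bars[T] (noisiest)
    denoise_step: Callable[[Motion, int], Motion],  # hypothetical model: maps x_t to x_{t-1}
    rng: random.Random,
) -> Motion:
    T = len(alpha_bars) - 1
    x = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in x_known]  # start from pure noise
    for t in range(T, 0, -1):
        x = denoise_step(x, t)  # model proposes x_{t-1}
        # Overwrite constrained joints with the user's values noised to level t-1.
        x = apply_joint_mask(x, noise_to_level(x_known, alpha_bars[t - 1], rng), mask)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    toy_alpha_bars = [1.0, 0.9, 0.6, 0.3]  # toy 3-step schedule
    known = [[0.5, 0.0, -0.2], [0.4, 0.0, -0.1]]  # 2 frames, 3 joint values
    mask = [[True, False, False], [True, False, False]]  # constrain joint 0 only
    toy_denoiser = lambda x, t: [[0.8 * v for v in f] for f in x]  # placeholder, not a real model
    out = sample_with_joint_constraints(known, mask, toy_alpha_bars, toy_denoiser, rng)
    print([f[0] for f in out])  # prints [0.5, 0.4]: the constrained joint matches the input
```

Because the final overwrite uses the clean level, constrained joints come out exactly as specified, while the unconstrained joints are left entirely to the model.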

1. They present a diffusion-based dance generation method that can produce dance sequences of arbitrary length while combining state-of-the-art performance with powerful editing capabilities.

2. They analyze the metrics used in earlier studies and show, through a large-scale user study, that these metrics do not accurately reflect human-evaluated quality.

3. They introduce the Physical Foot Contact Score, a simple new acceleration-based metric for scoring the physical plausibility of generated kinematic motions that does not require explicit physical modeling. They also propose a new Contact Consistency Loss that removes physically implausible foot sliding from generated motions.

4. They improve on earlier hand-crafted audio feature extraction methods by using music audio representations from Jukebox, a pre-trained generative model for music that has previously shown strong performance on music-specific prediction tasks.
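An acceleration-based contact score can exploit a simple physical intuition: a body's center of mass cannot accelerate horizontally unless at least one foot is planted on the ground, so a frame in which the center of mass accelerates while both feet are moving is suspicious. The sketch below scores each frame by the product of horizontal center-of-mass acceleration and the two foot speeds, and pairs it with a simple foot-sliding penalty that weights foot velocity by a predicted contact probability. Both are simplified illustrations, not the paper's exact Physical Foot Contact Score or Contact Consistency Loss. They assume finite differences at a fixed frame rate, z as the vertical axis, no normalization, and hypothetical function names.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z), z assumed to be the vertical axis


def finite_diff(seq: Sequence[Vec3], dt: float) -> List[Vec3]:
    """First-order finite difference of a position (or velocity) track."""
    return [tuple((b[i] - a[i]) / dt for i in range(3)) for a, b in zip(seq[:-1], seq[1:])]


def norm(v: Vec3) -> float:
    return sum(c * c for c in v) ** 0.5


def contact_plausibility_score(com: Sequence[Vec3], left_foot: Sequence[Vec3],
                               right_foot: Sequence[Vec3], fps: float = 30.0) -> float:
    """Mean over frames of |horizontal COM accel| * |v_left| * |v_right| (lower is better).

    High values mean the body accelerates sideways while neither foot is planted.
    """
    dt = 1.0 / fps
    com_acc = finite_diff(finite_diff(com, dt), dt)  # length T-2
    v_left = finite_diff(left_foot, dt)              # length T-1
    v_right = finite_diff(right_foot, dt)
    scores = []
    for t, a in enumerate(com_acc):
        horiz_acc = (a[0] ** 2 + a[1] ** 2) ** 0.5
        # com_acc[t] is centered on frame t+1; pair it with the foot velocity starting there.
        scores.append(horiz_acc * norm(v_left[t + 1]) * norm(v_right[t + 1]))
    return sum(scores) / len(scores) if scores else 0.0


def foot_sliding_penalty(foot: Sequence[Vec3], contact_probs: Sequence[float],
                         fps: float = 30.0) -> float:
    """Mean squared foot speed weighted by predicted contact probability.

    Penalizes a foot that moves while the model predicts it is on the ground.
    """
    vels = finite_diff(foot, 1.0 / fps)
    terms = [p * norm(v) ** 2 for v, p in zip(vels, contact_probs[:-1])]
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    com = [(float(t * t), 0.0, 1.0) for t in range(5)]   # accelerating body
    planted = [(0.0, 0.0, 0.0)] * 5                      # left foot stays still
    moving = [(float(t), 1.0, 0.0) for t in range(5)]    # right foot moves
    print(contact_plausibility_score(com, planted, moving, fps=1.0))  # one planted foot: plausible
    print(contact_plausibility_score(com, moving, moving, fps=1.0))   # both feet moving: penalized
```

Using a product means a single planted foot is enough to drive a frame's score to zero, which matches the "at least one foot on the ground" intuition.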

Their project website also hosts impressive video demonstrations that are well worth watching.




Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.