Dubbed the “Swiss Army knife for sound,” Fugatto gives users flexible, fine-grained creative control over audio, which could change how sound is produced.
Unlike previous models that specialize in limited tasks, Fugatto can generate music snippets, modify voices with different emotions or accents, and even create entirely new soundscapes. For example, it can produce a trumpet that barks or a saxophone that meows—limited only by the user’s imagination.
Developed by a global team of researchers, Fugatto is a 2.5-billion-parameter model trained on Nvidia DGX systems equipped with H100 GPUs. It supports applications across music production, marketing and advertising, education, and gaming, from prototyping customizable songs and tailoring voiceovers to adjusting and generating sound effects.
One standout feature is temporal interpolation, which lets a soundscape evolve over time. For instance, Fugatto can simulate a thunderstorm transitioning into a peaceful dawn, complete with birdsong.
Fugatto also lets users combine and fine-tune instructions through a technique called ComposableART. For example, a prompt could ask for text spoken with a French accent and a hint of sadness.
“I wanted users to blend attributes in an artistic way,” said Rohan Badlani, an Nvidia researcher. “The results often surprised me, making me feel like an artist despite being a computer scientist.”
Fugatto’s capabilities reflect Nvidia’s vision for AI-driven creativity. Nvidia describes the model as a first step toward unsupervised multitask learning in audio synthesis and transformation, with new abilities emerging from data and model scale.
“We’re writing the next chapter of music and sound,” said multi-platinum producer Ido Zmishlany. “With AI, we have a new instrument—and that’s super exciting.”