
Now Hear This: World's Most Flexible Sound Machine Debuts

25/11/2024

A team of generative AI researchers created a Swiss Army knife for sound, one that allows users to control the audio output simply using text.

While some AI models can compose a song or modify a voice, none have the dexterity of the new offering.

Called Fugatto (short for Foundational Generative Audio Transformer Opus 1), it generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files.

For example, it can create a music snippet based on a text prompt, remove or add instruments from an existing song, change the accent or emotion in a voice - even let people produce sounds never heard before.

"This thing is wild," said Ido Zmishlany, a multi-platinum producer and songwriter - and cofounder of One Take Audio, a member of the NVIDIA Inception program for cutting-edge startups. "Sound is my inspiration. It's what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible."

A Sound Grasp of Audio

"We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at NVIDIA and one of the dozen-plus people behind Fugatto, as well as an orchestral conductor and composer.

Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties - capabilities that arise from the interaction of its various trained abilities - and the ability to combine free-form instructions.

"Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle said.

A Sample Playlist of Use Cases

For example, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. They could also add effects and enhance the overall audio quality of an existing track.

"The history of music is also a history of technology. The electric guitar gave the world rock and roll. When the sampler showed up, hip-hop was born," said Zmishlany. "With AI, we're writing the next chapter of music. We have a new instrument, a new tool for making music - and that's super exciting."

An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.

Language learning tools could be personalized to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend.

Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets on the fly from text instructions and optional audio inputs.

Making a Joyful Noise

"One of the model's capabilities we're especially proud of is what we call the avocado chair," said Valle, referring to a novel visual created by a generative AI model for imaging.

For instance, Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create.

With fine-tuning and small amounts of singing data, researchers found it could handle tasks it was not pretrained on, like generating a high-quality singing voice from a text prompt.

Users Get Artistic Controls

Several capabilities add to Fugatto's novelty.

During inference, the model uses a technique called ComposableART to combine instructions that were only seen separately during training. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent.

The model's ability to interpolate between instructions gives users fine-grained control over text instructions, in this case the heaviness of the accent or the degree of sorrow.
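The article does not say how ComposableART combines or weights instructions internally. As a rough illustration only, the sketch below assumes a guidance-style weighted sum: each instruction contributes the difference between its conditioned prediction and an unconditioned one, scaled by a user-chosen weight. The function name `blend_instructions`, the toy vectors, and the weights are all hypothetical and are not part of Fugatto.

```python
from typing import Dict, List

Vector = List[float]


def blend_instructions(
    unconditional: Vector,
    conditional: Dict[str, Vector],
    weights: Dict[str, float],
) -> Vector:
    """Blend per-instruction predictions with user-chosen weights.

    Assumed (hypothetical) rule, not taken from the article:
        result = unconditional + sum_k weights[k] * (conditional[k] - unconditional)
    An instruction with no weight given contributes nothing.
    """
    blended = list(unconditional)
    for name, prediction in conditional.items():
        weight = weights.get(name, 0.0)
        for i, (cond_value, uncond_value) in enumerate(zip(prediction, unconditional)):
            blended[i] += weight * (cond_value - uncond_value)
    return blended


# Toy two-dimensional "predictions" standing in for model outputs.
uncond = [0.0, 0.0]
cond = {"sad": [1.0, 0.0], "french_accent": [0.0, 2.0]}

# A slightly sad voice with a full-strength French accent.
mild = blend_instructions(uncond, cond, {"sad": 0.25, "french_accent": 1.0})
print(mild)
```

Raising or lowering a weight moves the result toward or away from that instruction, which is one simple way to picture "the heaviness of the accent or the degree of sorrow" as a continuous dial.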

"I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one," said Rohan Badlani, an AI researcher who designed these aspects of the model.

"In my tests, the results were often surprising and made me feel a little bit like an artist, even though I'm a computer scientist," said Badlani, who holds a master's degree in computer science with a focus on AI from Stanford.

The model also generates sounds that change over time, a feature he calls temporal interpolation. It can, for instance, create the sounds of a rainstorm moving through an area with crescendos of thunder that slowly fade into the distance. It also gives users fine-grained control over how the soundscape evolves.

Plus, unlike most models, which can only recreate the training data they've been exposed to, Fugatto allows users to create soundscapes it's never seen before, such as a thunderstorm easing into a dawn with the sound of birds singing.
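One way to picture temporal interpolation is as a per-frame schedule of instruction weights. The sketch below is an assumption made for illustration, not a description of Fugatto's internals: it builds simple linear ramps so that a hypothetical "thunderstorm" instruction fades out while a hypothetical "dawn birdsong" instruction fades in. The function name `linear_schedule` and the five-frame length are invented for the example.

```python
from typing import List


def linear_schedule(start: float, end: float, num_frames: int) -> List[float]:
    """Return num_frames weights moving linearly from start to end."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [start]
    step = (end - start) / (num_frames - 1)
    return [start + step * i for i in range(num_frames)]


# Hypothetical five-frame soundscape: the storm eases while birdsong rises.
thunderstorm = linear_schedule(1.0, 0.0, 5)
birdsong = linear_schedule(0.0, 1.0, 5)

for frame, (storm_w, bird_w) in enumerate(zip(thunderstorm, birdsong)):
    print(f"frame {frame}: thunderstorm={storm_w:.2f}, birdsong={bird_w:.2f}")
```

A user wanting finer control over how the soundscape evolves could swap the linear ramp for any other curve, such as a slow start followed by a quick fade.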

A Look Under the Hood

Fugatto is a foundational generative transformer model that builds on the team's prior work in areas such as speech modeling, audio vocoding and audio understanding.

The full version uses 2.5 billion parameters and was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs.

Fugatto was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger.

One of the hardest parts of the effort was generating a blended dataset that contains millions of audio samples used for training. The team employed a multifaceted strategy to generate data and instructions that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.

They also scrutinized existing datasets to reveal new relationships among the dat
LINK: https://blogs.nvidia.com/blog/fugatto-gen-ai-sound-model/...