Now Hear This: World's Most Flexible Sound Machine Debuts

25/11/2024

A team of generative AI researchers created a Swiss Army knife for sound, one that allows users to control the audio output simply using text.

While some AI models can compose a song or modify a voice, none have the dexterity of the new offering.

Called Fugatto (short for Foundational Generative Audio Transformer Opus 1), it generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files.

For example, it can create a music snippet based on a text prompt, remove instruments from or add them to an existing song, change the accent or emotion in a voice - even let people produce sounds never heard before.

"This thing is wild," said Ido Zmishlany, a multi-platinum producer and songwriter - and cofounder of One Take Audio, a member of the NVIDIA Inception program for cutting-edge startups. "Sound is my inspiration. It's what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible."

A Sound Grasp of Audio

"We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at NVIDIA and one of the dozen-plus people behind Fugatto, as well as an orchestral conductor and composer.

Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties - capabilities that arise from the interaction of its various trained abilities - and the ability to combine free-form instructions.

"Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle said.

A Sample Playlist of Use Cases

For example, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. They could also add effects and enhance the overall audio quality of an existing track.

"The history of music is also a history of technology. The electric guitar gave the world rock and roll. When the sampler showed up, hip-hop was born," said Zmishlany. "With AI, we're writing the next chapter of music. We have a new instrument, a new tool for making music - and that's super exciting."

An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.

Language learning tools could be personalized to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend.

Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets on the fly from text instructions and optional audio inputs.

Making a Joyful Noise

"One of the model's capabilities we're especially proud of is what we call the avocado chair," said Valle, referring to a novel visual created by a generative AI model for imaging.

For instance, Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create.

With fine-tuning and small amounts of singing data, researchers found it could handle tasks it was not pretrained on, like generating a high-quality singing voice from a text prompt.

Users Get Artistic Controls

Several capabilities add to Fugatto's novelty.

During inference, the model uses a technique called ComposableART to combine instructions that were only seen separately during training. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent.

The model's ability to interpolate between instructions gives users fine-grained control over text instructions, in this case the heaviness of the accent or the degree of sorrow.
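The idea of weighting instructions against each other can be pictured with a toy example. The sketch below is not Fugatto's code and does not reproduce ComposableART, whose internals are not described here. It only assumes that each instruction has somehow been turned into a list of numbers, and it blends those lists according to user-chosen weights. The function name `blend_instructions` and the example vectors are hypothetical.

```python
def blend_instructions(vectors, weights):
    """Return the weighted average of equal-length instruction vectors.

    Hypothetical helper for illustration only: it assumes every
    instruction is already represented as a list of floats.
    """
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per instruction vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all instruction vectors must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Each output component is the weight-normalized sum of the inputs.
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dim)
    ]


# Made-up stand-ins for "sad feeling" and "French accent".
sad = [1.0, 0.0]
french = [0.0, 1.0]

# Three parts sorrow to one part accent.
print(blend_instructions([sad, french], [3, 1]))  # [0.75, 0.25]
```

Raising the first weight would lean the result toward sorrow; raising the second would lean it toward the accent, which is the kind of dial the article describes.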

"I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one," said Rohan Badlani, an AI researcher who designed these aspects of the model.

"In my tests, the results were often surprising and made me feel a little bit like an artist, even though I'm a computer scientist," said Badlani, who holds a master's degree in computer science with a focus on AI from Stanford.

The model also generates sounds that change over time, a feature he calls temporal interpolation. It can, for instance, create the sounds of a rainstorm moving through an area with crescendos of thunder that slowly fade into the distance. It also gives users fine-grained control over how the soundscape evolves.
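One simple way to picture a sound that changes over time is a schedule of weights that shifts from one description to another. The sketch below is an assumption-laden illustration, not the model's actual mechanism: the hypothetical `crossfade_schedule` function produces a linear handoff between two prompts, such as "thunder" and "distant rain", one pair of weights per time step.

```python
def crossfade_schedule(steps):
    """Return (weight_a, weight_b) pairs moving linearly from A to B.

    Hypothetical helper for illustration only; a linear ramp is an
    assumption, chosen because it is the simplest possible schedule.
    """
    if steps < 2:
        raise ValueError("need at least two steps to interpolate")
    schedule = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first step, 1.0 at the last
        schedule.append((1.0 - t, t))
    return schedule


# Five steps from all "thunder" to all "distant rain".
for weight_thunder, weight_rain in crossfade_schedule(5):
    print(weight_thunder, weight_rain)
```

A user wanting the storm to linger could swap the linear ramp for a curve that stays near the first prompt longer before fading.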

Plus, unlike most models, which can only recreate the training data they've been exposed to, Fugatto allows users to create soundscapes it's never seen before, such as a thunderstorm easing into a dawn with the sound of birds singing.

A Look Under the Hood

Fugatto is a foundational generative transformer model that builds on the team's prior work in areas such as speech modeling, audio vocoding and audio understanding.

The full version uses 2.5 billion parameters and was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs.

Fugatto was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger.

One of the hardest parts of the effort was generating a blended dataset that contains millions of audio samples used for training. The team employed a multifaceted strategy to generate data and instructions that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.

They also scrutinized existing datasets to reveal new relationships among the dat
LINK: https://blogs.nvidia.com/blog/fugatto-gen-ai-sound-model/...