Now Hear This: World's Most Flexible Sound Machine Debuts

25/11/2024

A team of generative AI researchers created a Swiss Army knife for sound, one that allows users to control the audio output simply using text.

While some AI models can compose a song or modify a voice, none have the dexterity of the new offering.

Called Fugatto (short for Foundational Generative Audio Transformer Opus 1), it generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files.

For example, it can create a music snippet based on a text prompt, add instruments to or remove them from an existing song, or change the accent or emotion in a voice, and it can even let people produce sounds never heard before.

"This thing is wild," said Ido Zmishlany, a multi-platinum producer and songwriter - and cofounder of One Take Audio, a member of the NVIDIA Inception program for cutting-edge startups. "Sound is my inspiration. It's what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible."

A Sound Grasp of Audio

"We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at NVIDIA and one of the dozen-plus people behind Fugatto, as well as an orchestral conductor and composer.

Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties - capabilities that arise from the interaction of its various trained abilities - and the ability to combine free-form instructions.

"Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle said.

A Sample Playlist of Use Cases

For example, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. They could also add effects and enhance the overall audio quality of an existing track.

"The history of music is also a history of technology. The electric guitar gave the world rock and roll. When the sampler showed up, hip-hop was born," said Zmishlany. "With AI, we're writing the next chapter of music. We have a new instrument, a new tool for making music - and that's super exciting."

An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.

Language learning tools could be personalized to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend.

Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets on the fly from text instructions and optional audio inputs.

Making a Joyful Noise

"One of the model's capabilities we're especially proud of is what we call the avocado chair," said Valle, referring to a novel visual created by a generative AI model for imaging.

For instance, Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create.

With fine-tuning and small amounts of singing data, researchers found it could handle tasks it was not pretrained on, like generating a high-quality singing voice from a text prompt.

Users Get Artistic Controls

Several capabilities add to Fugatto's novelty.

During inference, the model uses a technique called ComposableART to combine instructions that were only seen separately during training. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent.

The model's ability to interpolate between instructions gives users fine-grained control over text instructions, in this case the heaviness of the accent or the degree of sorrow.
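The article does not say how ComposableART weighs instructions internally. As a rough illustration only, the sketch below shows one generic way separately learned conditions can be mixed at inference time: a weighted sum of guidance terms in the style of classifier-free guidance. The function name `compose_instructions`, the instruction labels and the toy vectors are hypothetical, and this is not NVIDIA's code.

```python
from typing import Dict, List

Vector = List[float]


def compose_instructions(
    uncond: Vector,
    conds: Dict[str, Vector],
    weights: Dict[str, float],
) -> Vector:
    """Blend per-instruction predictions with user-chosen weights (hypothetical sketch).

    Each instruction adds weight * (cond - uncond) to the unconditional
    prediction: a weight of 0 ignores it, larger weights emphasize it.
    Instructions missing from `weights` get weight 0.
    """
    out = list(uncond)
    for name, cond in conds.items():
        if len(cond) != len(uncond):
            raise ValueError(f"prediction for {name!r} has the wrong length")
        w = weights.get(name, 0.0)
        for i, (c, u) in enumerate(zip(cond, uncond)):
            out[i] += w * (c - u)
    return out


# Toy example: a "sad" instruction and a "French accent" instruction,
# each predicted separately, then mixed with different emphasis.
uncond = [0.0, 0.0]
conds = {"sad": [1.0, 0.0], "french_accent": [0.0, 2.0]}
mixed = compose_instructions(uncond, conds, {"sad": 0.5, "french_accent": 1.5})
print(mixed)  # [0.5, 3.0]
```

In this toy picture, moving a weight between 0 and 1 plays the role of the "heaviness of the accent" or "degree of sorrow" dial described above.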

"I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one," said Rohan Badlani, an AI researcher who designed these aspects of the model.

"In my tests, the results were often surprising and made me feel a little bit like an artist, even though I'm a computer scientist," said Badlani, who holds a master's degree in computer science with a focus on AI from Stanford.

The model also generates sounds that change over time, a feature he calls temporal interpolation. It can, for instance, create the sounds of a rainstorm moving through an area with crescendos of thunder that slowly fade into the distance. It also gives users fine-grained control over how the soundscape evolves.

Plus, unlike most models, which can only recreate the training data they've been exposed to, Fugatto allows users to create soundscapes it's never seen before, such as a thunderstorm easing into a dawn with the sound of birds singing.
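Temporal interpolation can be pictured as letting instruction weights change over the length of a clip. The sketch below, again hypothetical and not taken from Fugatto, builds a linear crossfade schedule so that one instruction fades out while another fades in, as in the thunderstorm easing into a dawn chorus. The helper names `linear_schedule` and `crossfade_weights` and the frame count are assumptions for illustration.

```python
from typing import Dict, List


def linear_schedule(num_frames: int, start: float, end: float) -> List[float]:
    """Return num_frames values moving linearly from start to end (inclusive)."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [start]
    step = (end - start) / (num_frames - 1)
    return [start + step * t for t in range(num_frames)]


def crossfade_weights(num_frames: int, from_instr: str, to_instr: str) -> List[Dict[str, float]]:
    """Per-frame weights: from_instr fades from 1 to 0 while to_instr fades from 0 to 1.

    Hypothetical helper; each dict could drive a per-frame weighted blend
    of instruction-conditioned predictions.
    """
    fade_out = linear_schedule(num_frames, 1.0, 0.0)
    return [{from_instr: a, to_instr: 1.0 - a} for a in fade_out]


# Five frames moving from a thunderstorm to birdsong at dawn.
for frame, w in enumerate(crossfade_weights(5, "thunderstorm", "birds at dawn")):
    print(frame, w)
```

A linear ramp is the simplest choice; a user wanting the thunder to linger could swap in any other monotonic schedule.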

A Look Under the Hood

Fugatto is a foundational generative transformer model that builds on the team's prior work in areas such as speech modeling, audio vocoding and audio understanding.

The full version uses 2.5 billion parameters and was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs.

Fugatto was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger.

One of the hardest parts of the effort was generating a blended dataset that contains millions of audio samples used for training. The team employed a multifaceted strategy to generate data and instructions that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.

They also scrutinized existing datasets to reveal new relationships among the data...
LINK: https://blogs.nvidia.com/blog/fugatto-gen-ai-sound-model/...