
A team of generative AI researchers created a Swiss Army knife for sound, one that allows users to control the audio output simply using text.
While some AI models can compose a song or modify a voice, none have the dexterity of the new offering.
Called Fugatto (short for Foundational Generative Audio Transformer Opus 1), it generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files.
For example, it can create a music snippet based on a text prompt, add instruments to or remove them from an existing song, change the accent or emotion in a voice - even let people produce sounds never heard before.
"This thing is wild," said Ido Zmishlany, a multi-platinum producer and songwriter - and cofounder of One Take Audio, a member of the NVIDIA Inception program for cutting-edge startups. "Sound is my inspiration. It's what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible."
A Sound Grasp of Audio
"We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at NVIDIA and one of the dozen-plus people behind Fugatto, as well as an orchestral conductor and composer.
Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties - capabilities that arise from the interaction of its various trained abilities - and the ability to combine free-form instructions.
"Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle said.
A Sample Playlist of Use Cases
For example, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. They could also add effects and enhance the overall audio quality of an existing track.
"The history of music is also a history of technology. The electric guitar gave the world rock and roll. When the sampler showed up, hip-hop was born," said Zmishlany. "With AI, we're writing the next chapter of music. We have a new instrument, a new tool for making music - and that's super exciting."
An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.
Language learning tools could be personalized to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend.
Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets on the fly from text instructions and optional audio inputs.
Making a Joyful Noise
"One of the model's capabilities we're especially proud of is what we call the avocado chair," said Valle, referring to a novel visual created by a generative AI model for imaging.
For instance, Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create.
With fine-tuning and small amounts of singing data, researchers found it could handle tasks it was not pretrained on, like generating a high-quality singing voice from a text prompt.
Users Get Artistic Controls
Several capabilities add to Fugatto's novelty.
During inference, the model uses a technique called ComposableART to combine instructions that were only seen separately during training. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent.
The model's ability to interpolate between instructions gives users fine-grained control over text instructions, in this case the heaviness of the accent or the degree of sorrow.
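The idea of combining instructions with adjustable emphasis can be illustrated with a toy calculation. The sketch below is not Fugatto's code and does not claim to reproduce ComposableART; it assumes, purely for illustration, a classifier-free-guidance-style blend in which each instruction nudges an unconditional prediction by a user-chosen weight. The function name `compose_guidance`, the instruction labels and the two-element vectors are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def compose_guidance(
    unconditional: Vector,
    conditional: Dict[str, Vector],
    weights: Dict[str, float],
) -> Vector:
    """Blend per-instruction predictions around an unconditional baseline.

    Assumption (not from the article): each instruction contributes
    weight * (conditional - unconditional) to the final prediction.
    """
    combined = list(unconditional)
    for name, prediction in conditional.items():
        weight = weights.get(name, 0.0)  # instructions without a weight are ignored
        for i, (cond, uncond) in enumerate(zip(prediction, unconditional)):
            combined[i] += weight * (cond - uncond)
    return combined


# Toy stand-ins for model outputs; real predictions would be large tensors.
unconditional = [0.0, 0.0]
conditional = {
    "sad": [1.0, 0.0],            # hypothetical prediction for "sad feeling"
    "french_accent": [0.0, 2.0],  # hypothetical prediction for "French accent"
}

# Half-strength sorrow, full-strength accent.
blended = compose_guidance(
    unconditional, conditional, {"sad": 0.5, "french_accent": 1.0}
)
print(blended)  # [0.5, 2.0]
```

Raising or lowering a weight in this toy setup moves the result toward or away from that instruction, which is the kind of control over accent heaviness or degree of sorrow described above.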
"I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one," said Rohan Badlani, an AI researcher who designed these aspects of the model.
"In my tests, the results were often surprising and made me feel a little bit like an artist, even though I'm a computer scientist," said Badlani, who holds a master's degree in computer science with a focus on AI from Stanford.
The model also generates sounds that change over time, a feature he calls temporal interpolation. It can, for instance, create the sounds of a rainstorm moving through an area with crescendos of thunder that slowly fade into the distance. It also gives users fine-grained control over how the soundscape evolves.
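One way to picture a soundscape that evolves over time is as a schedule of instruction weights that shifts from one description to another. The sketch below is an assumption made for illustration, not the model's actual mechanism: it builds a simple linear crossfade between two instructions, and the name `crossfade_weights` is hypothetical.

```python
from typing import List, Tuple


def crossfade_weights(num_steps: int) -> List[Tuple[float, float]]:
    """Return (first_weight, second_weight) pairs for a linear crossfade.

    Assumption (not from the article): emphasis moves linearly from the
    first instruction to the second over `num_steps` time segments.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    schedule = []
    for step in range(num_steps):
        t = step / (num_steps - 1)  # 0.0 at the start, 1.0 at the end
        schedule.append((1.0 - t, t))
    return schedule


# Five segments: the first instruction fades out while the second fades in.
schedule = crossfade_weights(5)
for first_weight, second_weight in schedule:
    print(f"first={first_weight:.2f} second={second_weight:.2f}")
```

A non-linear schedule, for example one that holds the first instruction longer before fading, would be a different user choice over how the scene evolves.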
Plus, unlike most models, which can only recreate the training data they've been exposed to, Fugatto allows users to create soundscapes it's never seen before, such as a thunderstorm easing into a dawn with the sound of birds singing.
A Look Under the Hood
Fugatto is a foundational generative transformer model that builds on the team's prior work in areas such as speech modeling, audio vocoding and audio understanding.
The full version uses 2.5 billion parameters and was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs.
Fugatto was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger.
One of the hardest parts of the effort was generating a blended dataset that contains millions of audio samples used for training. The team employed a multifaceted strategy to generate data and instructions that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.
They also scrutinized existing datasets to reveal new relationships among the data.