
A team of generative AI researchers created a Swiss Army knife for sound, one that allows users to control the audio output simply using text.
While some AI models can compose a song or modify a voice, none have the dexterity of the new offering.
Called Fugatto (short for Foundational Generative Audio Transformer Opus 1), it generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files.
For example, it can create a music snippet based on a text prompt, add or remove instruments in an existing song, change the accent or emotion in a voice - even let people produce sounds never heard before.
"This thing is wild," said Ido Zmishlany, a multi-platinum producer and songwriter - and cofounder of One Take Audio, a member of the NVIDIA Inception program for cutting-edge startups. "Sound is my inspiration. It's what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible."
A Sound Grasp of Audio
"We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at NVIDIA and one of the dozen-plus people behind Fugatto, as well as an orchestral conductor and composer.
Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties - capabilities that arise from the interaction of its various trained abilities - and the ability to combine free-form instructions.
"Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle said.
A Sample Playlist of Use Cases
For example, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. They could also add effects and enhance the overall audio quality of an existing track.
"The history of music is also a history of technology. The electric guitar gave the world rock and roll. When the sampler showed up, hip-hop was born," said Zmishlany. "With AI, we're writing the next chapter of music. We have a new instrument, a new tool for making music - and that's super exciting."
An ad agency could apply Fugatto to quickly target an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.
Language learning tools could be personalized to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend.
Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets on the fly from text instructions and optional audio inputs.
Making a Joyful Noise
"One of the model's capabilities we're especially proud of is what we call the avocado chair," said Valle, referring to a novel visual created by a generative AI model for imaging.
For instance, Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create.
With fine-tuning and small amounts of singing data, researchers found it could handle tasks it was not pretrained on, like generating a high-quality singing voice from a text prompt.
Users Get Artistic Controls
Several capabilities add to Fugatto's novelty.
During inference, the model uses a technique called ComposableART to combine instructions that were only seen separately during training. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent.
The model's ability to interpolate between instructions gives users fine-grained control over text instructions, in this case the heaviness of the accent or the degree of sorrow.
"I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one," said Rohan Badlani, an AI researcher who designed these aspects of the model.
"In my tests, the results were often surprising and made me feel a little bit like an artist, even though I'm a computer scientist," said Badlani, who holds a master's degree in computer science with a focus on AI from Stanford.
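To make the idea of weighting instructions concrete, here is a minimal sketch of one generic way to blend several instruction-conditioned predictions around an unconditioned baseline, with a user-chosen weight per instruction. It is an illustration only: the article does not describe how ComposableART works internally, and the function name, the instruction labels and the toy two-element vectors are all hypothetical.

```python
from typing import Dict, List, Sequence


def compose_guidance(
    unconditional: Sequence[float],
    conditional: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Blend per-instruction predictions around an unconditional baseline.

    Hypothetical helper, not Fugatto's API. Each instruction contributes
    weight * (its prediction - the baseline), so a weight of 0 ignores the
    instruction and larger weights emphasize it more.
    """
    combined = list(unconditional)
    for name, prediction in conditional.items():
        weight = weights.get(name, 0.0)  # instructions without a weight are ignored
        for i, (cond, base) in enumerate(zip(prediction, unconditional)):
            combined[i] += weight * (cond - base)
    return combined


# Toy stand-ins for model outputs (assumed values, for illustration only).
baseline = [0.0, 0.0]
per_instruction = {"sad": [1.0, 0.0], "french_accent": [0.0, 2.0]}

# Half-strength sorrow, full-strength accent.
blended = compose_guidance(baseline, per_instruction, {"sad": 0.5, "french_accent": 1.0})
print(blended)
```

Raising or lowering a single weight changes only that attribute's contribution, which is the kind of per-attribute emphasis the researchers describe.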
The model also generates sounds that change over time, a feature he calls temporal interpolation. It can, for instance, create the sounds of a rainstorm moving through an area with crescendos of thunder that slowly fade into the distance. It also gives users fine-grained control over how the soundscape evolves.
Plus, unlike most models, which can only recreate the training data they've been exposed to, Fugatto allows users to create soundscapes it's never seen before, such as a thunderstorm easing into a dawn with the sound of birds singing.
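One simple way to picture a soundscape that evolves over time is a schedule of weights that shifts emphasis from one instruction to another across the length of the clip. The sketch below builds a linear crossfade schedule; it is an assumption made for illustration, not the model's actual mechanism, and the function name and frame count are hypothetical.

```python
from typing import List, Tuple


def crossfade_weights(num_frames: int) -> List[Tuple[float, float]]:
    """Return (first_weight, second_weight) pairs for a linear crossfade.

    Hypothetical helper: the first instruction (say, a thunderstorm) starts
    at full weight and fades to zero, while the second (say, birdsong at
    dawn) rises from zero to full weight.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to interpolate")
    schedule = []
    for frame in range(num_frames):
        t = frame / (num_frames - 1)  # position in the clip, from 0.0 to 1.0
        schedule.append((1.0 - t, t))
    return schedule


# Five steps from all storm to all birdsong.
for storm, birds in crossfade_weights(5):
    print(f"storm={storm:.2f} birds={birds:.2f}")
```

A non-linear schedule, such as one that holds the storm longer before fading, would give a different shape to the transition.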
A Look Under the Hood
Fugatto is a foundational generative transformer model that builds on the team's prior work in areas such as speech modeling, audio vocoding and audio understanding.
The full version uses 2.5 billion parameters and was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs.
Fugatto was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger.
One of the hardest parts of the effort was generating a blended dataset that contains millions of audio samples used for training. The team employed a multifaceted strategy to generate data and instructions that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.
They also scrutinized existing datasets to reveal new relationships among the data.