
Now Hear This: World's Most Flexible Sound Machine Debuts

25/11/2024

A team of generative AI researchers created a Swiss Army knife for sound, one that allows users to control the audio output simply using text.

While some AI models can compose a song or modify a voice, none have the dexterity of the new offering.

Called Fugatto (short for Foundational Generative Audio Transformer Opus 1), it generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files.

For example, it can create a music snippet based on a text prompt, add or remove instruments in an existing song, change the accent or emotion in a voice - even let people produce sounds never heard before.

"This thing is wild," said Ido Zmishlany, a multi-platinum producer and songwriter - and cofounder of One Take Audio, a member of the NVIDIA Inception program for cutting-edge startups. "Sound is my inspiration. It's what moves me to create music. The idea that I can create entirely new sounds on the fly in the studio is incredible."

A Sound Grasp of Audio

"We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, a manager of applied audio research at NVIDIA and one of the dozen-plus people behind Fugatto, as well as an orchestral conductor and composer.

Supporting numerous audio generation and transformation tasks, Fugatto is the first foundational generative AI model that showcases emergent properties - capabilities that arise from the interaction of its various trained abilities - and the ability to combine free-form instructions.

"Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale," Valle said.

A Sample Playlist of Use Cases

For example, music producers could use Fugatto to quickly prototype or edit an idea for a song, trying out different styles, voices and instruments. They could also add effects and enhance the overall audio quality of an existing track.

"The history of music is also a history of technology. The electric guitar gave the world rock and roll. When the sampler showed up, hip-hop was born," said Zmishlany. "With AI, we're writing the next chapter of music. We have a new instrument, a new tool for making music - and that's super exciting."

An ad agency could apply Fugatto to quickly adapt an existing campaign for multiple regions or situations, applying different accents and emotions to voiceovers.

Language learning tools could be personalized to use any voice a speaker chooses. Imagine an online course spoken in the voice of any family member or friend.

Video game developers could use the model to modify prerecorded assets in their title to fit the changing action as users play the game. Or, they could create new assets on the fly from text instructions and optional audio inputs.

Making a Joyful Noise

"One of the model's capabilities we're especially proud of is what we call the avocado chair," said Valle, referring to a novel visual created by a generative AI model for imaging.

For instance, Fugatto can make a trumpet bark or a saxophone meow. Whatever users can describe, the model can create.

With fine-tuning and small amounts of singing data, researchers found it could handle tasks it was not pretrained on, like generating a high-quality singing voice from a text prompt.

Users Get Artistic Controls

Several capabilities add to Fugatto's novelty.

During inference, the model uses a technique called ComposableART to combine instructions that were only seen separately during training. For example, a combination of prompts could ask for text spoken with a sad feeling in a French accent.

The model's ability to interpolate between instructions gives users fine-grained control over how strongly each text instruction applies, in this case the heaviness of the accent or the degree of sorrow.
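The article does not spell out how ComposableART works internally, so the following is only a minimal sketch of one common way to weight several instructions at inference time: each instruction's prediction is offset from an unconditional baseline and scaled by a user-chosen weight, in the style of classifier-free guidance. The function name `compose_guidance`, the toy two-element vectors and the weights are all hypothetical and are not taken from Fugatto.

```python
from typing import Sequence


def compose_guidance(
    unconditional: Sequence[float],
    conditionals: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> list[float]:
    """Blend per-instruction predictions around an unconditional baseline.

    Computes: unconditional + sum_k weights[k] * (conditionals[k] - unconditional)
    A weight of 0 ignores an instruction; larger weights emphasize it.
    """
    if len(conditionals) != len(weights):
        raise ValueError("need exactly one weight per instruction")
    blended = list(unconditional)
    for cond, weight in zip(conditionals, weights):
        if len(cond) != len(unconditional):
            raise ValueError("all vectors must have the same length")
        for i, (c, u) in enumerate(zip(cond, unconditional)):
            blended[i] += weight * (c - u)
    return blended


# Hypothetical toy predictions standing in for real model outputs.
uncond = [0.0, 0.0]   # no instruction
sad = [1.0, 0.0]      # "spoken with a sad feeling"
french = [0.0, 1.0]   # "in a French accent"

# Half-strength sorrow, full-strength accent.
blended = compose_guidance(uncond, [sad, french], [0.5, 1.0])
print(blended)  # [0.5, 1.0]
```

Changing the two weights is the toy equivalent of dialing the degree of sorrow or the heaviness of the accent up or down.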

"I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one," said Rohan Badlani, an AI researcher who designed these aspects of the model.

"In my tests, the results were often surprising and made me feel a little bit like an artist, even though I'm a computer scientist," said Badlani, who holds a master's degree in computer science with a focus on AI from Stanford.

The model also generates sounds that change over time, a feature he calls temporal interpolation. It can, for instance, create the sounds of a rainstorm moving through an area with crescendos of thunder that slowly fade into the distance. It also gives users fine-grained control over how the soundscape evolves.

Plus, unlike most models, which can only recreate the training data they've been exposed to, Fugatto allows users to create soundscapes it's never seen before, such as a thunderstorm easing into a dawn with the sound of birds singing.
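One way to picture time-varying control over a soundscape is as a per-step schedule of instruction weights, with one sound fading out while another fades in. The sketch below is an illustration under that assumption only; the helper `linear_schedule` and the five-step timeline are hypothetical, and the article does not describe how Fugatto implements temporal interpolation.

```python
def linear_schedule(start: float, end: float, steps: int) -> list[float]:
    """Return `steps` evenly spaced weights running from `start` to `end`."""
    if steps < 2:
        raise ValueError("a schedule needs at least two steps")
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


# Hypothetical five-step timeline: thunder recedes as birdsong arrives.
thunder = linear_schedule(1.0, 0.0, 5)
birdsong = linear_schedule(0.0, 1.0, 5)

for step, (t, b) in enumerate(zip(thunder, birdsong)):
    print(f"step {step}: thunder={t:.2f} birdsong={b:.2f}")
```

Each pair of weights could then be fed to whatever mechanism combines instructions at that moment in the audio, so that the storm dominates early and the dawn chorus dominates late.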

A Look Under the Hood

Fugatto is a foundational generative transformer model that builds on the team's prior work in areas such as speech modeling, audio vocoding and audio understanding.

The full version uses 2.5 billion parameters and was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs.

Fugatto was made by a diverse group of people from around the world, including India, Brazil, China, Jordan and South Korea. Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger.

One of the hardest parts of the effort was generating a blended dataset that contains millions of audio samples used for training. The team employed a multifaceted strategy to generate data and instructions that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.

They also scrutinized existing datasets to reveal new relationships among the dat
LINK: https://blogs.nvidia.com/blog/fugatto-gen-ai-sound-model/...