
Speaking the Language of the Genome: Gordon Bell Finalist Applies Large Language Models to Predict New COVID Variants

14/11/2022

A finalist for the Gordon Bell special prize for high performance computing-based COVID-19 research has taught large language models (LLMs) a new lingo - gene sequences - that can unlock insights in genomics, epidemiology and protein engineering.

Published in October, the groundbreaking work is a collaboration by more than two dozen academic and commercial researchers from Argonne National Laboratory, NVIDIA, the University of Chicago and others.

The research team trained an LLM to track genetic mutations and predict variants of concern in SARS-CoV-2, the virus behind COVID-19. While most LLMs applied to biology to date have been trained on datasets of small molecules or proteins, this project produced one of the first models trained on raw nucleotide sequences, built from nucleotides, the smallest units of DNA and RNA.

"We hypothesized that moving from protein-level to gene-level data might help us build better models to understand COVID variants," said Arvind Ramanathan, a computational biologist at Argonne who led the project. "By training our model to track the entire genome and all the changes that appear in its evolution, we can make better predictions about not just COVID, but any disease with enough genomic data."
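One reason gene-level data can carry more information than protein-level data is that the genetic code is redundant: several three-letter codons encode the same amino acid. The minimal sketch below, which is illustrative only and not code from the project, translates DNA using the standard genetic code. The helper name `translate` and the example sequences are hypothetical. It shows a single-nucleotide change that leaves the protein unchanged, a "synonymous" mutation that is visible only at the nucleotide level.

```python
# Standard genetic code: codons enumerated with bases in T, C, A, G order at each
# position, so codon (i, j, k) maps to AMINO_ACIDS[16*i + 4*j + k]. '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acids (hypothetical helper).

    Any trailing bases that do not fill a complete codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, usable, 3))


original = "ATGGCTGAA"  # made-up example: Met-Ala-Glu
mutated = "ATGGCCGAA"   # third base of the second codon changed (GCT -> GCC)
print(translate(original), translate(mutated))  # MAE MAE
print(original == mutated)                      # False: the genes differ
```

A protein-level model sees the two sequences as identical, while a nucleotide-level model can still register the change.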

The Gordon Bell awards, regarded as the Nobel Prize of high performance computing, will be presented at this week's SC22 conference by the Association for Computing Machinery, which represents around 100,000 computing experts worldwide. Since 2020, the group has awarded a special prize for outstanding research that advances the understanding of COVID with HPC.

Training LLMs on a Four-Letter Language

LLMs have long been trained on human languages, which usually comprise a couple dozen letters that can be arranged into tens of thousands of words and joined together into longer sentences and paragraphs. The language of biology, on the other hand, has only four letters representing nucleotides - A, T, G and C in DNA, or A, U, G and C in RNA - arranged into different sequences as genes.

While fewer letters may seem like a simpler challenge for AI, language models for biology are actually far more complicated. That's because the genome - made up of over 3 billion nucleotides in humans, and about 30,000 nucleotides in coronaviruses - is difficult to break down into distinct, meaningful units.
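To give a sense of what "breaking down" a sequence into units can mean in practice, here is a minimal tokenizer sketch. It assumes, for illustration only, that the "words" are non-overlapping three-letter codons, which turns the four-letter alphabet into a 64-word vocabulary. The names `VOCAB` and `tokenize` are hypothetical, and this is not presented as the project's actual tokenizer.

```python
from itertools import product

# Hypothetical vocabulary: every three-letter "word" (codon) over the DNA alphabet,
# numbered in lexicographic order (AAA=0, AAC=1, ..., TTT=63).
VOCAB = {"".join(codon): idx for idx, codon in enumerate(product("ACGT", repeat=3))}


def tokenize(seq: str) -> list[int]:
    """Map a DNA or RNA string to token ids over non-overlapping codons."""
    seq = seq.upper().replace("U", "T")  # treat RNA's U as DNA's T
    bad = set(seq) - set("ACGT")
    if bad:
        raise ValueError(f"unexpected characters: {sorted(bad)}")
    # Drop any trailing 1-2 bases that do not fill a whole codon.
    usable = len(seq) - len(seq) % 3
    return [VOCAB[seq[i:i + 3]] for i in range(0, usable, 3)]


print(len(VOCAB))          # 64
print(tokenize("AUGGCU"))  # [14, 39]
```

Even with codon-sized words, a coronavirus genome of about 30,000 nucleotides becomes roughly 10,000 tokens, which is why long-range context is hard. Real sequencing data also contains ambiguous bases such as N, which this sketch simply rejects.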

"When it comes to understanding the code of life, a major challenge is that the sequencing information in the genome is quite vast," Ramanathan said. "The meaning of a nucleotide sequence can be affected by another sequence that's much further away than the next sentence or paragraph would be in human text. It could reach over the equivalent of chapters in a book."

NVIDIA collaborators on the project designed a hierarchical diffusion method that enabled the LLM to treat long strings of around 1,500 nucleotides as if they were sentences.

"Standard language models have trouble generating coherent long sequences and learning the underlying distribution of different variants," said paper co-author Anima Anandkumar, senior director of AI research at NVIDIA and Bren professor in the computing + mathematical sciences department at Caltech. "We developed a diffusion model that operates at a higher level of detail that allows us to generate realistic variants and capture better statistics."
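The "sentence" idea can be pictured with a simple segmentation step. The sketch below only splits a genome into consecutive windows of about 1,500 nucleotides, the figure given above; a higher-level model could then reason over the resulting sequence of windows instead of individual bases. The function name `split_into_sentences` and the synthetic genome are hypothetical, and the diffusion model itself is not shown.

```python
def split_into_sentences(genome: str, window: int = 1500) -> list[str]:
    """Split a genome into consecutive, non-overlapping windows of `window` nucleotides.

    The final window may be shorter than `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [genome[i:i + window] for i in range(0, len(genome), window)]


# Synthetic 29,903-character string, roughly the size of a coronavirus genome.
genome = "ACGT" * 7475 + "ACG"
chunks = split_into_sentences(genome)
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # 20 1500 1403
```

At this granularity a whole coronavirus genome is only about 20 "sentences" long, a length that is far more manageable for modeling long-range structure than tens of thousands of individual letters.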

Predicting COVID Variants of Concern

Using open-source data from the Bacterial and Viral Bioinformatics Resource Center, the team first pretrained its LLM on more than 110 million gene sequences from prokaryotes, which are single-celled organisms like bacteria. It then fine-tuned the model using 1.5 million high-quality genome sequences for the COVID virus.

By pretraining on a broader dataset, the researchers also ensured their model could generalize to other prediction tasks in future projects - making it one of the first whole-genome-scale models with this capability.

Once fine-tuned on COVID data, the LLM was able to distinguish between genome sequences of the virus' variants. It was also able to generate its own nucleotide sequences, predicting potential mutations of the COVID genome that could help scientists anticipate future variants of concern.
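Comparing a generated sequence against a reference is one way to read off which mutations it proposes. The minimal sketch below lists single-nucleotide substitutions in the common reference-base, position, alternate-base style (for example, A23403G). It assumes the two sequences are already aligned and equal in length, so insertions and deletions are out of scope. The helper `point_mutations` and the example strings are hypothetical.

```python
def point_mutations(reference: str, variant: str) -> list[str]:
    """List substitutions as '<ref base><1-based position><alt base>'.

    Assumes both sequences are aligned and equal in length (no insertions or deletions).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference.upper(), variant.upper()), start=1)
        if ref != alt
    ]


print(point_mutations("ATGGCTGAA", "ATGGCCGAT"))  # ['T6C', 'A9T']
```

Tallying such substitutions across many generated genomes would indicate which positions a model treats as likely to change.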

Trained on a year's worth of SARS-CoV-2 genome data, the model can infer the distinction between various viral strains. Each dot on the left corresponds to a sequenced SARS-CoV-2 viral strain, color-coded by variant. The figure on the right zooms into one particular strain of the virus, capturing evolutionary couplings across the viral proteins specific to that strain. Image courtesy of Argonne National Laboratory's Bharat Kale, Max Zvyagin and Michael E. Papka.

"Most researchers have been tracking mutations in the spike protein of the COVID virus, specifically the domain that binds with human cells," Ramanathan said. "But there are other proteins in the viral genome that go through frequent mutations and are important to understand."

The model could also integrate with popular protein-structure-prediction models like AlphaFold and OpenFold, the paper stated, helping researchers simulate viral structure and study how genetic mutations impact a virus' ability to infect its host. OpenFold is one of the pretrained language models included in the NVIDIA BioNeMo LLM service for developers applying LLMs to digital biology and chemistry applications.

Supercharging AI Training With GPU-Accelerated Supercomputers

The team developed its AI models on supercomputers powered by NVIDIA A100 Tensor Core GPUs - including Argonne's Polaris, the U.S. Department of Energy's Perlmutter, and NVIDIA's in-house Selene system. By scaling up to these powerful systems, they achieved performance of more than 1,500 exaflops in training runs, creating the largest biological language models to date.

We're working with models today that have up to 25 billion
LINK: https://blogs.nvidia.com/blog/2022/11/14/genomic-large-language-model-...