Speaking the Language of the Genome: Gordon Bell Finalist Applies Large Language Models to Predict New COVID Variants

14/11/2022

A finalist for the Gordon Bell special prize for high performance computing-based COVID-19 research has taught large language models (LLMs) a new lingo - gene sequences - that can unlock insights in genomics, epidemiology and protein engineering.

Published in October, the groundbreaking work is a collaboration by more than two dozen academic and commercial researchers from Argonne National Laboratory, NVIDIA, the University of Chicago and others.

The research team trained an LLM to track genetic mutations and predict variants of concern in SARS-CoV-2, the virus behind COVID-19. While most LLMs applied to biology to date have been trained on datasets of small molecules or proteins, this project produced one of the first models trained on raw sequences of nucleotides - the smallest units of DNA and RNA.

"We hypothesized that moving from protein-level to gene-level data might help us build better models to understand COVID variants," said Arvind Ramanathan, computational biologist at Argonne, who led the project. "By training our model to track the entire genome and all the changes that appear in its evolution, we can make better predictions about not just COVID, but any disease with enough genomic data."

The Gordon Bell awards, regarded as the Nobel Prize of high performance computing, will be presented at this week's SC22 conference by the Association for Computing Machinery, which represents around 100,000 computing experts worldwide. Since 2020, the group has awarded a special prize for outstanding research that advances the understanding of COVID with HPC.

Training LLMs on a Four-Letter Language

LLMs have long been trained on human languages, which usually comprise a couple dozen letters that can be arranged into tens of thousands of words, and joined together into longer sentences and paragraphs. The language of biology, on the other hand, has only four letters representing nucleotides - A, T, G and C in DNA, or A, U, G and C in RNA - arranged into different sequences as genes.

While fewer letters may seem like a simpler challenge for AI, language models for biology are actually far more complicated. That's because the genome - made up of over 3 billion nucleotides in humans, and about 30,000 nucleotides in coronaviruses - is difficult to break down into distinct, meaningful units.
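To make the "meaningful units" problem concrete, here is a minimal Python sketch of one possible way to turn a four-letter sequence into tokens: cutting it into fixed-length k-mers. The article does not say which tokenization the team used, so the choice of k = 3, the function names and the toy sequence are all illustrative assumptions.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # DNA alphabet; an RNA version would use U in place of T


def build_kmer_vocab(k: int) -> dict[str, int]:
    """Map every possible k-letter string over ACGT to an integer id."""
    return {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}


def tokenize(sequence: str, k: int) -> list[str]:
    """Split a sequence into non-overlapping k-mers, dropping any short tail."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % k
    return [seq[i:i + k] for i in range(0, usable, k)]


# Hypothetical toy sequence, not real genome data.
vocab = build_kmer_vocab(3)            # 4**3 = 64 possible three-letter tokens
tokens = tokenize("ATGGCCATTGTA", 3)   # ['ATG', 'GCC', 'ATT', 'GTA']
ids = [vocab[t] for t in tokens]       # integer ids a model could consume
```

Unlike words in human text, nothing in the raw sequence marks where one unit ends and the next begins, so any fixed k is a modeling decision rather than a property of the genome.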

"When it comes to understanding the code of life, a major challenge is that the sequencing information in the genome is quite vast," Ramanathan said. "The meaning of a nucleotide sequence can be affected by another sequence that's much further away than the next sentence or paragraph would be in human text. It could reach over the equivalent of chapters in a book."

NVIDIA collaborators on the project designed a hierarchical diffusion method that enabled the LLM to treat long strings of around 1,500 nucleotides as if they were sentences.
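The segmentation idea alone, treating stretches of about 1,500 nucleotides as "sentences", can be sketched in a few lines of Python. This is not the hierarchical diffusion method itself, which the article does not describe in detail. The function name, the non-overlapping windows and the synthetic genome below are assumptions made purely for illustration.

```python
def split_into_windows(genome: str, window: int = 1500) -> list[str]:
    """Cut a genome string into consecutive windows of at most `window` letters."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [genome[i:i + window] for i in range(0, len(genome), window)]


# Synthetic stand-in for a ~30,000-nucleotide coronavirus genome (not real data).
toy_genome = "ACGT" * 7500
windows = split_into_windows(toy_genome)
print(len(windows), len(windows[0]))
```

At that window size, a genome of roughly 30,000 nucleotides becomes about 20 "sentences", and a model still has to relate windows that sit far apart.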

"Standard language models have trouble generating coherent long sequences and learning the underlying distribution of different variants," said paper co-author Anima Anandkumar, senior director of AI research at NVIDIA and Bren professor in the computing + mathematical sciences department at Caltech. "We developed a diffusion model that operates at a higher level of detail that allows us to generate realistic variants and capture better statistics."

Predicting COVID Variants of Concern

Using open-source data from the Bacterial and Viral Bioinformatics Resource Center, the team first pretrained its LLM on more than 110 million gene sequences from prokaryotes, which are single-celled organisms like bacteria. It then fine-tuned the model using 1.5 million high-quality genome sequences for the COVID virus.

By pretraining on a broader dataset, the researchers also ensured their model could generalize to other prediction tasks in future projects - making it one of the first whole-genome-scale models with this capability.

Once fine-tuned on COVID data, the LLM was able to distinguish between genome sequences of the virus' variants. It was also able to generate its own nucleotide sequences, predicting potential mutations of the COVID genome that could help scientists anticipate future variants of concern.

Trained on a year's worth of SARS-CoV-2 genome data, the model can infer the distinction between various viral strains. Each dot on the left corresponds to a sequenced SARS-CoV-2 viral strain, color-coded by variant. The figure on the right zooms into one particular strain of the virus, which captures evolutionary couplings across the viral proteins specific to this strain. Image courtesy of Argonne National Laboratory's Bharat Kale, Max Zvyagin and Michael E. Papka.

"Most researchers have been tracking mutations in the spike protein of the COVID virus, specifically the domain that binds with human cells," Ramanathan said. "But there are other proteins in the viral genome that go through frequent mutations and are important to understand."

The model could also integrate with popular protein-structure-prediction models like AlphaFold and OpenFold, the paper stated, helping researchers simulate viral structure and study how genetic mutations impact a virus' ability to infect its host. OpenFold is one of the pretrained language models included in the NVIDIA BioNeMo LLM service for developers applying LLMs to digital biology and chemistry applications.

Supercharging AI Training With GPU-Accelerated Supercomputers

The team developed its AI models on supercomputers powered by NVIDIA A100 Tensor Core GPUs - including Argonne's Polaris, the U.S. Department of Energy's Perlmutter, and NVIDIA's in-house Selene system. By scaling up to these powerful systems, they achieved performance of more than 1,500 exaflops in training runs, creating the largest biological language models to date.

We're working with models today that have up to 25 billion
LINK: https://blogs.nvidia.com/blog/2022/11/14/genomic-large-language-model-...