Booked for Brilliance: Sweden's National Library Turns Page to AI to Parse Centuries of Data

23/01/2023

For the past 500 years, the National Library of Sweden has collected virtually every word published in Swedish, from priceless medieval manuscripts to present-day pizza menus.

Thanks to a centuries-old law that requires a copy of everything published in Swedish to be submitted to the library - also known as Kungliga biblioteket, or KB - its collections span from the obvious to the obscure: books, newspapers, radio and TV broadcasts, internet content, Ph.D. dissertations, postcards, menus and video games. It's a wildly diverse collection of nearly 26 petabytes of data, ideal for training state-of-the-art AI.

"We can build state-of-the-art AI models for the Swedish language since we have the best data," said Love Börjeson, director of KBLab, the library's data lab.

Using NVIDIA DGX systems, the group has developed more than two dozen open-source transformer models, available on Hugging Face. The models, downloaded by up to 200,000 developers per month, enable research at the library and other academic institutions.

"Before our lab was created, researchers couldn't access a dataset at the library - they'd have to look at a single object at a time," Börjeson said. "There was a need for the library to create datasets that enabled researchers to conduct quantity-oriented research."

With this, researchers will soon be able to create hyper-specialized datasets - for example, pulling up every Swedish postcard that depicts a church, every text written in a particular style or every mention of a historical figure across books, newspaper articles and TV broadcasts.

Turning Library Archives Into AI Training Data

The library's datasets represent the full diversity of the Swedish language - including its formal and informal variations, regional dialects and changes over time.

"Our inflow is continuous and growing - every month, we see more than 50 terabytes of new data," said Börjeson. "Between the exponential growth of digital data and ongoing work digitizing physical collections that date back hundreds of years, we'll never be finished adding to our collections."

The library's archives include audio, text and video. Soon after KBLab was established in 2019, Börjeson saw the potential for training transformer language models on the library's vast archives. He was inspired by an early, multilingual, natural language processing model by Google that included 5GB of Swedish text.

KBLab's first model used four times as much Swedish text, roughly 20GB, and the team now aims to train its models on at least a terabyte. The lab also began experimenting with adding Dutch, German and Norwegian content to its datasets after finding that a multilingual dataset may improve the AI's performance.

NVIDIA AI, GPUs Accelerate Model Development

The lab started out using consumer-grade NVIDIA GPUs, but Börjeson soon discovered his team needed data-center-scale compute to train larger models.

"We realized we can't keep up if we try to do this on small workstations," said Börjeson. "It was a no-brainer to go for NVIDIA DGX. There's a lot we wouldn't be able to do at all without the DGX systems."

The lab has two NVIDIA DGX systems from Swedish provider AddPro for on-premises AI development. The systems are used to handle sensitive data, conduct large-scale experiments and fine-tune models. They're also used to prepare for even larger runs on massive, GPU-based supercomputers across the European Union - including the MeluXina system in Luxembourg.

"Our work on the DGX systems is critically important, because once we're in a high-performance computing environment, we want to hit the ground running," said Börjeson. "We have to use the supercomputer to its fullest extent."

The team has also adopted NVIDIA NeMo Megatron, a PyTorch-based framework for training large language models, with NVIDIA CUDA and the NVIDIA NCCL library under the hood to optimize GPU usage in multi-node systems.

"We rely to a large extent on the NVIDIA frameworks," Börjeson said. "It's one of the big advantages of NVIDIA for us, as a small lab that doesn't have 50 engineers available to optimize AI training for every project."

Harnessing Multimodal Data for Humanities Research

In addition to transformer models that understand Swedish text, KBLab has an AI tool that transcribes sound to text, enabling the library to transcribe its vast collection of radio broadcasts so that researchers can search the audio records for specific content.

AI-enhanced databases are the latest evolution of library records, which were long stored in physical card catalogs. KBLab is also starting to develop generative text models and is working on an AI model that could process videos and create automatic descriptions of their content.

"We also want to link all the different modalities," Börjeson said. "When you search the library's databases for a specific term, we should be able to return results that include text, audio and video."

KBLab has partnered with researchers at the University of Gothenburg, who are developing downstream apps using the lab's models to conduct linguistic research - including a project supporting the Swedish Academy's work to modernize its data-driven techniques for creating Swedish dictionaries.

"The societal benefits of these models are much larger than we initially expected," Börjeson said.

Images courtesy of Kungliga biblioteket
LINK: https://blogs.nvidia.com/blog/2023/01/23/sweden-library-ai-open-source...