
Booked for Brilliance: Sweden's National Library Turns Page to AI to Parse Centuries of Data

23/01/2023

For the past 500 years, the National Library of Sweden has collected virtually every word published in Swedish, from priceless medieval manuscripts to present-day pizza menus.

Thanks to a centuries-old law that requires a copy of everything published in Swedish to be submitted to the library - also known as Kungliga biblioteket, or KB - its collections span from the obvious to the obscure: books, newspapers, radio and TV broadcasts, internet content, Ph.D. dissertations, postcards, menus and video games. It's a wildly diverse collection of nearly 26 petabytes of data, ideal for training state-of-the-art AI.

"We can build state-of-the-art AI models for the Swedish language since we have the best data," said Love Börjeson, director of KBLab, the library's data lab.

Using NVIDIA DGX systems, the group has developed more than two dozen open-source transformer models, available on Hugging Face. The models, downloaded by up to 200,000 developers per month, enable research at the library and other academic institutions.

"Before our lab was created, researchers couldn't access a dataset at the library - they'd have to look at a single object at a time," Börjeson said. "There was a need for the library to create datasets that enabled researchers to conduct quantity-oriented research."

With this, researchers will soon be able to create hyper-specialized datasets - for example, pulling up every Swedish postcard that depicts a church, every text written in a particular style or every mention of a historical figure across books, newspaper articles and TV broadcasts.

Turning Library Archives Into AI Training Data

The library's datasets represent the full diversity of the Swedish language - including its formal and informal variations, regional dialects and changes over time.

"Our inflow is continuous and growing - every month, we see more than 50 terabytes of new data," said Börjeson. "Between the exponential growth of digital data and ongoing work digitizing physical collections that date back hundreds of years, we'll never be finished adding to our collections."

The library's archives include audio, text and video. Soon after KBLab was established in 2019, Börjeson saw the potential for training transformer language models on the library's vast archives. He was inspired by an early, multilingual, natural language processing model by Google that included 5GB of Swedish text.

KBLab's first model used 4x as much - and the team now aims to train its models on at least a terabyte of Swedish text. The lab began experimenting by adding Dutch, German and Norwegian content to its datasets after finding that a multilingual dataset may improve the AI's performance.

NVIDIA AI, GPUs Accelerate Model Development

The lab started out using consumer-grade NVIDIA GPUs, but Börjeson soon discovered his team needed data-center-scale compute to train larger models.

"We realized we can't keep up if we try to do this on small workstations," said Börjeson. "It was a no-brainer to go for NVIDIA DGX. There's a lot we wouldn't be able to do at all without the DGX systems."

The lab has two NVIDIA DGX systems from Swedish provider AddPro for on-premises AI development. The systems are used to handle sensitive data, conduct large-scale experiments and fine-tune models. They're also used to prepare for even larger runs on massive, GPU-based supercomputers across the European Union - including the MeluXina system in Luxembourg.

"Our work on the DGX systems is critically important, because once we're in a high-performance computing environment, we want to hit the ground running," said Börjeson. "We have to use the supercomputer to its fullest extent."

The team has also adopted NVIDIA NeMo Megatron, a PyTorch-based framework for training large language models, with NVIDIA CUDA and the NVIDIA NCCL library under the hood to optimize GPU usage in multi-node systems.

"We rely to a large extent on the NVIDIA frameworks," Börjeson said. "It's one of the big advantages of NVIDIA for us, as a small lab that doesn't have 50 engineers available to optimize AI training for every project."

Harnessing Multimodal Data for Humanities Research

In addition to transformer models that understand Swedish text, KBLab has an AI tool that transcribes sound to text, enabling the library to transcribe its vast collection of radio broadcasts so that researchers can search the audio records for specific content.

AI-enhanced databases are the latest evolution of library records, which were long stored in physical card catalogs. KBLab is also starting to develop generative text models and is working on an AI model that could process videos and create automatic descriptions of their content.

"We also want to link all the different modalities," Börjeson said. "When you search the library's databases for a specific term, we should be able to return results that include text, audio and video."

KBLab has partnered with researchers at the University of Gothenburg, who are developing downstream apps using the lab's models to conduct linguistic research - including a project supporting the Swedish Academy's work to modernize its data-driven techniques for creating Swedish dictionaries.

"The societal benefits of these models are much larger than we initially expected," Börjeson said.

Images courtesy of Kungliga biblioteket
LINK: https://blogs.nvidia.com/blog/2023/01/23/sweden-library-ai-open-source...