Booked for Brilliance: Sweden's National Library Turns Page to AI to Parse Centuries of Data

23/01/2023

For the past 500 years, the National Library of Sweden has collected virtually every word published in Swedish, from priceless medieval manuscripts to present-day pizza menus.

Thanks to a centuries-old law that requires a copy of everything published in Swedish to be submitted to the library - also known as Kungliga biblioteket, or KB - its collections span from the obvious to the obscure: books, newspapers, radio and TV broadcasts, internet content, Ph.D. dissertations, postcards, menus and video games. It's a wildly diverse collection of nearly 26 petabytes of data, ideal for training state-of-the-art AI.

"We can build state-of-the-art AI models for the Swedish language since we have the best data," said Love Börjeson, director of KBLab, the library's data lab.

Using NVIDIA DGX systems, the group has developed more than two dozen open-source transformer models, available on Hugging Face. The models, downloaded by up to 200,000 developers per month, enable research at the library and other academic institutions.

"Before our lab was created, researchers couldn't access a dataset at the library - they'd have to look at a single object at a time," Börjeson said. "There was a need for the library to create datasets that enabled researchers to conduct quantity-oriented research."

With this, researchers will soon be able to create hyper-specialized datasets - for example, pulling up every Swedish postcard that depicts a church, every text written in a particular style or every mention of a historical figure across books, newspaper articles and TV broadcasts.

Turning Library Archives Into AI Training Data

The library's datasets represent the full diversity of the Swedish language - including its formal and informal variations, regional dialects and changes over time.

"Our inflow is continuous and growing - every month, we see more than 50 terabytes of new data," said Börjeson. "Between the exponential growth of digital data and ongoing work digitizing physical collections that date back hundreds of years, we'll never be finished adding to our collections."

The library's archives include audio, text and video. Soon after KBLab was established in 2019, Börjeson saw the potential for training transformer language models on the library's vast archives. He was inspired by an early, multilingual, natural language processing model by Google that included 5 GB of Swedish text.

KBLab's first model used four times as much - and the team now aims to train its models on at least a terabyte of Swedish text. The lab began experimenting by adding Dutch, German and Norwegian content to its datasets after finding that a multilingual dataset may improve the AI's performance.

NVIDIA AI, GPUs Accelerate Model Development

The lab started out using consumer-grade NVIDIA GPUs, but Börjeson soon discovered his team needed data-center-scale compute to train larger models.

"We realized we can't keep up if we try to do this on small workstations," said Börjeson. "It was a no-brainer to go for NVIDIA DGX. There's a lot we wouldn't be able to do at all without the DGX systems."

The lab has two NVIDIA DGX systems from Swedish provider AddPro for on-premises AI development. The systems are used to handle sensitive data, conduct large-scale experiments and fine-tune models. They're also used to prepare for even larger runs on massive, GPU-based supercomputers across the European Union - including the MeluXina system in Luxembourg.

"Our work on the DGX systems is critically important, because once we're in a high-performance computing environment, we want to hit the ground running," said Börjeson. "We have to use the supercomputer to its fullest extent."

The team has also adopted NVIDIA NeMo Megatron, a PyTorch-based framework for training large language models, with NVIDIA CUDA and the NVIDIA NCCL library under the hood to optimize GPU usage in multi-node systems.

"We rely to a large extent on the NVIDIA frameworks," Börjeson said. "It's one of the big advantages of NVIDIA for us, as a small lab that doesn't have 50 engineers available to optimize AI training for every project."

Harnessing Multimodal Data for Humanities Research

In addition to transformer models that understand Swedish text, KBLab has an AI tool that transcribes sound to text, enabling the library to transcribe its vast collection of radio broadcasts so that researchers can search the audio records for specific content.

AI-enhanced databases are the latest evolution of library records, which were long stored in physical card catalogs. KBLab is also starting to develop generative text models and is working on an AI model that could process videos and create automatic descriptions of their content.

"We also want to link all the different modalities," Börjeson said. "When you search the library's databases for a specific term, we should be able to return results that include text, audio and video."

KBLab has partnered with researchers at the University of Gothenburg, who are developing downstream apps using the lab's models to conduct linguistic research - including a project supporting the Swedish Academy's work to modernize its data-driven techniques for creating Swedish dictionaries.

"The societal benefits of these models are much larger than we initially expected," Börjeson said.

Images courtesy of Kungliga biblioteket
LINK: https://blogs.nvidia.com/blog/2023/01/23/sweden-library-ai-open-source...