
At a recent Records and Information Management conference, I watched a demo of the IBM Watson natural language processing (NLP) tool that showed how police use it in criminal investigations. I was struck by the thought that this fascinating tool may have led some Records and Information Managers in the audience to believe it was the new, disruptive way to manage records.
CAN YOU RUN A RECORDS AND INFORMATION MANAGEMENT PROGRAM ON POST-CLASSIFICATION AND MACHINE LEARNING?
IBM Watson can be looked at as a post-classification tool, representing a whole group of NLP tools often marketed under the Big Data umbrella. Post-classification tools use algorithms to produce a classification for an information asset after the asset is created or stored. In fact, every major supplier and the open source community offer similar tools. The sheer choice can seem overwhelming, even for those in the e-discovery space.
The demo was really impressive. It was almost as if Watson went through the test documents by magic, identifying people, organizations, places, phone numbers and frequently used terms. A click on one of the filtered terms lists its occurrences across the various documents. The tool can also produce graphics that visually connect interrelated terms. This is, of course, quite a departure from the rather minimalistic user interfaces typically found in electronic records management systems (ERMS).
What is attractive about potentially using such an analytics tool for Records and Information Management is its ability to reduce the complexity of information management processes, in particular, capturing metadata. It could:
save us from lengthy discussions with information asset creators about providing the right metadata, and about how and in what format it should be delivered
eliminate the need to design metadata structures for document types, simplifying forms management and the optical character recognition (OCR) process during scanning
allow users to search by criteria that were not originally available for lookup
provide the ability to search for important emails in the email archive or unstructured documents.
No doubt, such a tool can look like a silver bullet to Records and Information Managers.
However, two things should be kept in mind: with a tool like Watson, you (a) buy into algorithms and (b) require these algorithms to build the structure that has not been created and provided upfront, i.e. during an on-boarding analysis process. Or, to summarize the two issues in one question: what sort of structure is actually used or produced by such algorithms?
NLP and text mining algorithms (with impressive names like Latent Semantic Indexing or Naïve Bayes) work on a mathematical-statistical basis, building trees based on the frequency and proximity of terms in a given document or set of documents. Typically, they don't work with pre-existing ontologies, and they don't really understand what they recognize. The algorithms are able to identify amounts in documents, but what sort of amounts are they? This is left to human review to sort out, but at least these tools can give us hints.
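To make the point concrete, here is a minimal sketch of what purely statistical processing sees. It is not how Watson or any specific product works; the function names and the sample sentence are my own assumptions, and real tools are far more sophisticated. The sketch counts term frequencies, counts terms that appear close to each other, and pulls out number-like strings without knowing what any of them mean.

```python
import re
from collections import Counter

def term_frequencies(text):
    """Count lower-cased word tokens; no meaning is attached to any of them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

def co_occurrences(text, window=3):
    """Count pairs of distinct terms that appear within `window` tokens of each other."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs

def find_amounts(text):
    """Return every number-like string; the pattern cannot say what each one is."""
    return re.findall(r"\d+(?:[.,]\d+)*", text)

# Hypothetical sample sentence, invented for illustration only.
sample = "Invoice 4711 for client 4711: total 1,250.00 due in 30 days."
print(find_amounts(sample))  # ['4711', '4711', '1,250.00', '30']
```

The output lists four "amounts", but only a human (or a declared structure) can tell that one is an invoice number, one a client number, one a sum of money and one a payment term.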
PRE-CLASSIFICATION IS STILL THE WAY TO GO FOR RECORDS AND INFORMATION MANAGEMENT
The classification structure for Watson is determined by its algorithms, whose outcomes are somewhat unpredictable. Tools like Watson are widespread in the forensics and e-discovery communities, as these domains are often interested in unknown bits of information in datasets. But for Records and Information Management usage, this approach is simply not good enough. Precision and recall seem too arbitrary to truly satisfy the findability requirement stipulated in ARMA's Generally Accepted Recordkeeping Principles, for example.
For simplicity, I am defining a record as something that is created by a business process. Consequently, inasmuch as business processes are defined and executed in a controlled environment, the properties found in records and their metadata are also well known, and their possible values are actually restricted. Here are some of those properties:
client ID
transaction number
transaction type
privacy level
It is fairly intuitive what the metadata terms above mean. The size of their value sets (i.e. their possible values) may differ dramatically, from a handful (privacy level) to perhaps billions (transaction numbers), but both represent restricted value sets and not free, unstructured text. There are semantics behind these properties. Additionally, client IDs and transaction numbers may overlap in terms of format and value, creating serious problems for post-classification algorithms.
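The overlap problem can be sketched in a few lines. The formats below are hypothetical: I am simply assuming that both client IDs and transaction numbers are eight-digit strings. Under that assumption, a format-based guesser has no way to decide which property a given number belongs to.

```python
import re

# Hypothetical formats: both identifiers happen to be eight digits long.
CLIENT_ID = re.compile(r"\d{8}")
TRANSACTION_NUMBER = re.compile(r"\d{8}")

def guess_properties(text):
    """Label each eight-digit token with every property whose format it fits."""
    guesses = {}
    for token in re.findall(r"\b\d{8}\b", text):
        labels = []
        if CLIENT_ID.fullmatch(token):
            labels.append("client ID")
        if TRANSACTION_NUMBER.fullmatch(token):
            labels.append("transaction number")
        guesses[token] = labels
    return guesses

# Invented example text; neither number is real data.
for token, labels in guess_properties("Payment 20250001 booked for 10004567").items():
    print(token, "could be:", " or ".join(labels))
```

Every token comes back with both labels. The ambiguity disappears only when the producing application states which value is which.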
Pre-classification, or classifying information before creating or storing it, is clearly still the way to go for Records and Information Management professionals. Declaring the semantics of each required metadata property that describes a record and promotes finding it, and having the business process owner and the producing application supply the correct values, remains an indispensable activity even in times of machine learning.
That said, I would not encourage you to dismiss NLP or text mining. These tools can be very helpful if you need to classify, say, millions of office documents on shared drives or SharePoint sites. But make sure it is you, the information professional, who supplies the terms, taxonomies, ontologies and document types of the knowledge domain to the analytics tool so the classification does not become arbitrary. It's all about creating well-understood, intended structure.
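As a rough illustration of what declaring the structure upfront could look like, here is a small sketch. The class name, the value sets and the digits-only rule are all assumptions I made for the example, not part of any standard or product. The producing application has to supply every property, and values outside the declared sets are rejected at creation time.

```python
from dataclasses import dataclass

# Hypothetical restricted value sets, declared before any record is created.
TRANSACTION_TYPES = {"deposit", "withdrawal", "transfer"}
PRIVACY_LEVELS = {"public", "internal", "confidential"}

@dataclass(frozen=True)
class RecordMetadata:
    """Metadata supplied by the producing application, one field per property."""
    client_id: str
    transaction_number: str
    transaction_type: str
    privacy_level: str

    def __post_init__(self):
        # Assumed rule for this sketch: both identifiers are digits only.
        if not self.client_id.isdigit():
            raise ValueError(f"invalid client ID: {self.client_id!r}")
        if not self.transaction_number.isdigit():
            raise ValueError(f"invalid transaction number: {self.transaction_number!r}")
        if self.transaction_type not in TRANSACTION_TYPES:
            raise ValueError(f"unknown transaction type: {self.transaction_type!r}")
        if self.privacy_level not in PRIVACY_LEVELS:
            raise ValueError(f"unknown privacy level: {self.privacy_level!r}")

# The field name, not a guess about format, says which number is which.
meta = RecordMetadata(
    client_id="10004567",
    transaction_number="20250001",
    transaction_type="transfer",
    privacy_level="internal",
)
print(meta.client_id, meta.transaction_number)
```

Because each value arrives under a named property, a search for a client ID never has to guess whether an eight-digit number is a client or a transaction.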