Pre- Vs. Post-Classification For Records Metadata by Juerg Meier

18/08/2016

At a Records and Information Management conference, I recently watched a demo of the IBM Watson natural language processing (NLP) tool that showed how police used it for criminal investigations. I was struck by the thought that this fascinating tool may have led some Records and Information Managers in the audience to believe that this was the new, disruptive way to manage records.

CAN YOU RUN A RECORDS AND INFORMATION MANAGEMENT PROGRAM ON POST-CLASSIFICATION AND MACHINE LEARNING?

IBM Watson can be viewed as a post-classification tool, representing a whole group of NLP tools often marketed under the Big Data umbrella. Post-classification tools use algorithms to produce a classification for an information asset after the asset is created or stored. In fact, every major supplier and the open source community offer similar tools. The choice can seem overwhelming even for those in the e-discovery space.

The demo was really impressive. It was almost as if Watson went through the test documents by magic, identifying people, organizations, places, phone numbers and frequently used terms. A click on one of the filtered terms lists its occurrences across various documents. The tool can also produce graphics that visually connect interrelated terms. This is, of course, quite a departure from the rather minimalistic user interfaces typically found in electronic records management systems (ERMS).

What is attractive about potentially using such an analytics tool for Records and Information Management is its ability to reduce the complexity of information management processes, in particular, capturing metadata. It could:

save us from lengthy discussions with information asset creators about providing the right metadata, and about how and in what format it should be delivered

eliminate the need to design metadata structures for document types, simplifying forms management and the optical character recognition process (OCR) during scanning

allow users to search for criteria originally not available for lookup

provide the ability to search for important emails in the email archive or unstructured documents.

No doubt, such a tool can look like a silver bullet to Records and Information Managers.

However, two things should be kept in mind: with a tool like Watson, you (a) buy into algorithms and (b) require these algorithms to build the structure that has not been created and provided upfront, i.e. during an on-boarding analysis process. Or to summarize the two issues: what sort of structure is actually used or produced by such algorithms?

NLP and text mining algorithms (with impressive names like Latent Semantic Indexing or Naïve Bayes) work on a mathematical-statistical basis, building trees based on the frequency and proximity of terms in a given document or set of documents. Typically, they don't work with pre-existing ontologies, and they don't really understand what they recognize. The algorithms are able to identify amounts in documents, but what sort of amounts are they? This is left to human review to sort out, but at least these tools can provide us hints.
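To make "frequency and proximity of terms" concrete, here is a minimal sketch in Python. It is not how Watson or any particular product works; the function names, the sample text and the window size are my own assumptions, and real tools use far more elaborate statistics.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(text):
    """Count how often each term occurs in the text."""
    return Counter(tokenize(text))


def cooccurrences(text, window=2):
    """Count unordered pairs of distinct terms that appear
    within `window` tokens of each other."""
    tokens = tokenize(text)
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


# Hypothetical sample text, made up for illustration.
sample = "Invoice amount 500. Refund amount 500. Invoice paid."
freq = term_frequencies(sample)
pairs = cooccurrences(sample)

print(freq.most_common(2))
print(pairs.most_common(3))
```

The counts show that "amount" is frequent and sits close to both "invoice" and "refund", but nothing in the statistics says which kind of amount the 500 actually is. That gap is exactly what is left to human review.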

PRE-CLASSIFICATION IS STILL THE WAY TO GO FOR RECORDS AND INFORMATION MANAGEMENT

The classification structure for Watson is determined by its algorithms, whose outcomes are somewhat unpredictable. Tools like Watson are widespread in the forensics and e-discovery communities, as these domains are often interested in unknown bits of information in datasets. But for Records and Information Management usage, this approach is simply not good enough. Precision and recall seem too arbitrary to truly satisfy the findability requirement stipulated in ARMA's Generally Accepted Recordkeeping Principles, for example.
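For readers less familiar with the two measures: precision is the share of retrieved documents that are actually relevant, and recall is the share of relevant documents that were actually retrieved. A small sketch, with made-up document IDs and a function name of my own choosing:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a search result.

    precision = relevant hits / everything retrieved
    recall    = relevant hits / everything that is relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four records belong to the client,
# the search returns five documents, only two of them correct.
relevant = {"doc1", "doc2", "doc3", "doc4"}
retrieved = {"doc1", "doc2", "doc7", "doc8", "doc9"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this invented case precision is 0.40 and recall is 0.50: more than half of the hits are noise, and half of the client's records are not found at all. For an investigator looking for leads that may be acceptable; for a records manager who has to produce every record of a client on request, it is not.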

For simplicity, I am defining a record as something that is created by a business process. Consequently, as much as business processes are defined and executed in a controlled environment, properties found in records and their metadata are also well known, and their possible values are actually restricted. Here are some of those properties:

client ID

transaction number

transaction type

privacy level

It is fairly intuitive what the metadata terms above mean. The size of their value sets (i.e. possible values) may differ dramatically, from a handful (privacy level) to perhaps billions (transaction numbers), but both represent restricted value sets and not free, unstructured text. There are semantics behind these properties. Additionally, client IDs and transaction numbers may overlap in terms of format and value, creating serious problems for post-classification algorithms.
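The overlap problem is easy to reproduce. The sketch below assumes, purely for illustration, that both client IDs and transaction numbers are eight-digit numbers; the text, the values and the function name are invented.

```python
import re

# Assumption for this example: both identifiers are 8-digit numbers.
EIGHT_DIGITS = re.compile(r"\b\d{8}\b")


def guess_identifiers(text):
    """Return every 8-digit number in the text.

    This is all a format-only extractor can do: it finds candidates,
    but it cannot tell which property each candidate belongs to.
    """
    return EIGHT_DIGITS.findall(text)


text = "Please book 20451187 against 20451190 as discussed."
candidates = guess_identifiers(text)
print(candidates)

# What the producing application knew at creation time (hypothetical values).
declared = {"client_id": "20451187", "transaction_number": "20451190"}
print(declared)
```

After the fact, the extractor finds two candidates and has no reliable way to decide which one is the client ID. The producing application knew the answer when it created the record, and only had to pass it on.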

Pre-classification, or classifying information before creating or storing it, is clearly still the way to go for Records and Information Management professionals. Declaring the semantics of each required metadata property that describes a record and promotes finding it, and having the business process owner and the producing application supply the correct values, remains an indispensable activity even in times of machine learning.
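As a rough sketch of what "declaring the semantics" can look like in practice, the following Python class fixes the four properties listed earlier and rejects values outside their declared sets at creation time. The formats and value sets are hypothetical; in a real program they would come from the business process owner.

```python
import re
from dataclasses import dataclass

# Hypothetical value sets and formats, chosen only for illustration.
PRIVACY_LEVELS = {"public", "internal", "confidential"}
TRANSACTION_TYPES = {"payment", "refund", "transfer"}
CLIENT_ID_FORMAT = re.compile(r"C\d{6}")
TRANSACTION_NUMBER_FORMAT = re.compile(r"\d{10}")


@dataclass(frozen=True)
class RecordMetadata:
    """Metadata the producing application must supply with each record."""

    client_id: str
    transaction_number: str
    transaction_type: str
    privacy_level: str

    def __post_init__(self):
        # Validate every property against its declared format or value set.
        if not CLIENT_ID_FORMAT.fullmatch(self.client_id):
            raise ValueError(f"invalid client ID: {self.client_id!r}")
        if not TRANSACTION_NUMBER_FORMAT.fullmatch(self.transaction_number):
            raise ValueError(
                f"invalid transaction number: {self.transaction_number!r}"
            )
        if self.transaction_type not in TRANSACTION_TYPES:
            raise ValueError(f"unknown transaction type: {self.transaction_type!r}")
        if self.privacy_level not in PRIVACY_LEVELS:
            raise ValueError(f"unknown privacy level: {self.privacy_level!r}")


record = RecordMetadata("C204511", "0000451190", "refund", "internal")
print(record)
```

Because each value arrives in a named field, there is nothing to guess: a search for a client ID hits the client ID field and nothing else, and a wrong value is rejected before the record is stored rather than discovered years later.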

That said, I would not encourage you to dismiss NLP or text mining. These tools can be very helpful if you need to classify, say, millions of office documents on shared drives or SharePoint sites. But make sure it is you, the information professional, who supplies the terms, taxonomies, ontologies and document types of the knowledge domain to the analytics tool so the classification does not become arbitrary. It's all about creating well-understood, intended structure.
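A toy illustration of supplying your own terms: the taxonomy below is invented, and the matching is deliberately naive (plain substring counting), so treat it as a sketch of the idea rather than a usable classifier.

```python
# Hypothetical taxonomy supplied by the information professional:
# document type -> terms that indicate it.
TAXONOMY = {
    "invoice": {"invoice", "amount due", "payment terms"},
    "contract": {"agreement", "party", "termination"},
}


def classify(text, taxonomy):
    """Return the document type whose terms occur most often in the text,
    or None if no term of any type occurs.

    On a tie, the type listed first in the taxonomy wins.
    """
    lowered = text.lower()
    scores = {
        doc_type: sum(1 for term in terms if term in lowered)
        for doc_type, terms in taxonomy.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("Invoice 4711: amount due within 30 days", TAXONOMY))
print(classify("Lunch menu for next week", TAXONOMY))
```

The point is who owns the structure: the document types and their terms come from the knowledge domain, and a document that matches nothing is flagged as unclassified instead of being assigned to whatever cluster an algorithm happens to produce.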
LINK: http://blogs.ironmountain.com/2016/service-lines/information-managemen...