Sunday, September 21, 2014

It’s Not Just About the Unstructured Data

For well over a decade, the content management world has been laying claim to unstructured data. The argument usually goes something like this:
Structured data is the information that comes in the form of numbers, words, dates, percentages, and currency amounts that all fit neatly into the rows and columns of a database. Unstructured data, on the other hand, consists of documents, images, web pages, video files, CAD drawings, and PowerPoint files for which a database is ill-suited and which thus require specialized technologies to ingest, analyze, manipulate, share, and archive them. This unstructured data – or content – represents over 80% of all the data in the enterprise. BTW, I’m pretty sure that Gartner made up that 80% number.
I admit that I was one of the early pioneers of this message and I carried it dutifully for years. The entire content management industry did the same. But the more I learn about what customers really want, the more I realize that we have had it all wrong.
Because customers don’t care about managing unstructured data.
What customers want are applications that address real business problems. Real business problems require real information, and that almost always comes in both structured and unstructured form. In fact, I can hardly think of an application that doesn’t need to combine both types of data.

Take Invoice Processing. There is structured data such as the name of the supplier, the date, the list of goods, and the total. But there is also the invoice itself, the bill of lading, the damage reports and pictures, and other unstructured data.

How about Employee File Management? You have the employee files such as the original job application, resume, contract, performance reviews, and training certificates – all of them are unstructured documents or scanned images. But you also have the reporting structure, salary data, bank account info, benefits, bonus attainment, and other structured data.
In most applications, the structured and unstructured data need to be used together. Sure, the data may need to be kept in different containers – structured data in a database and unstructured data in the repository of a content management system. But using one without the other doesn’t really solve real business problems.
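To make that concrete, here is a minimal sketch in Python of the Employee File Management case. Everything in it is an assumption made for illustration: the `employees` table, the `EmployeeFile` class, the `load_employee_file` helper, and the plain dictionary standing in for a content management repository are hypothetical and not taken from any real product. It only shows the shape of the idea, which is one application view built from a database row plus a set of documents.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class EmployeeFile:
    """Hypothetical combined view: structured fields plus document references."""
    employee_id: int
    name: str
    salary: float
    documents: list = field(default_factory=list)


def load_employee_file(db, repository, employee_id):
    """Join one database row with the documents stored for the same employee."""
    row = db.execute(
        "SELECT id, name, salary FROM employees WHERE id = ?",
        (employee_id,),
    ).fetchone()
    if row is None:
        return None
    # The repository lookup stands in for a call to a content management system.
    documents = list(repository.get(employee_id, []))
    return EmployeeFile(row[0], row[1], row[2], documents)


# Structured data: an in-memory database with an assumed schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
db.execute("INSERT INTO employees VALUES (1, 'Jane Doe', 85000.0)")

# Unstructured data: a dictionary used as a stand-in for a content repository.
repository = {1: ["resume.pdf", "contract.pdf", "review_2013.docx"]}

employee_file = load_employee_file(db, repository, 1)
print(employee_file)
```

The two stores stay separate, as they would in practice, and the application is the place where they meet.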
I think that the myopic focus on unstructured data has hurt the enterprise content management (ECM) industry. Sure, we need the specialized software that can manage the unstructured data, but ultimately customers need applications that can handle both structured and unstructured data together in a single solution.
