Sunday, September 21, 2014

It’s Not Just About the Unstructured Data

For well over a decade, the content management world has been laying claim to unstructured data. The argument usually goes something like this:
Structured data is the information that comes in the form of numbers, words, dates, percentages, and currency amounts that all fit neatly into the rows and columns of a database. Unstructured data, on the other hand, consists of documents, images, web pages, video files, CAD drawings, and PowerPoint files for which a database is ill suited and which thus require specialized technologies to ingest, analyze, manipulate, share, and archive them. This unstructured data – or content – represents over 80% of all the data in the enterprise. BTW, I’m pretty sure that Gartner made up that 80% number.
I admit that I was one of the early pioneers of this message and I carried it dutifully for years. The entire content management industry did that. But the more I learn about what customers really want, the more I realize that we have all been wrong.
Because customers don’t care about managing unstructured data.
What customers want are applications that address real business problems. Real business problems require real information, and that almost always comes in both structured and unstructured form. In fact, I can hardly think of an application that doesn’t need to combine both types of data.

Take Invoice Processing. There is the structured data like the name of the supplier, the date, the list of goods, the total, etc. But there is also the invoice itself, the bill of lading, the damage reports and pictures, and other unstructured data.

How about Employee File Management? You have the employee files such as the original job application, resume, contract, performance reviews, and training certificates – all of them are unstructured documents or scanned images. But you also have the reporting structure, salary data, bank account info, benefits, bonus attainment, and other structured data.
In most applications, the structured and unstructured data need to be used together. Sure, the data may need to be kept in different containers – structured data in a database and unstructured data in the repository of a content management system. But using one without the other doesn’t really solve real business problems.
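To make that concrete, here is a minimal sketch of what "used together" could look like for the invoice example. It is only an illustration under stated assumptions: the class names, the `invoice_id` key, and the repository paths are all hypothetical, and a real application would pull the record from a database and the documents from a content management system rather than from in-memory lists.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class InvoiceRecord:
    """Structured fields that would normally sit in database columns."""
    invoice_id: str
    supplier: str
    issued: date
    total: float


@dataclass
class ContentItem:
    """Pointer to an unstructured item held in a content repository."""
    invoice_id: str
    kind: str      # e.g. "invoice scan", "bill of lading", "damage photo"
    location: str  # hypothetical repository path, for illustration only


@dataclass
class InvoiceCase:
    """What the application works with: the record plus its documents."""
    record: InvoiceRecord
    documents: list = field(default_factory=list)


def build_case(record, repository):
    """Join one structured record with its documents on the shared invoice_id."""
    docs = [item for item in repository if item.invoice_id == record.invoice_id]
    return InvoiceCase(record=record, documents=docs)


# Made-up sample data standing in for a database row and a content repository.
record = InvoiceRecord("INV-001", "Acme Supplies", date(2014, 9, 1), 1250.00)
repository = [
    ContentItem("INV-001", "invoice scan", "repo/invoices/INV-001.pdf"),
    ContentItem("INV-001", "damage photo", "repo/photos/INV-001-1.jpg"),
    ContentItem("INV-002", "invoice scan", "repo/invoices/INV-002.pdf"),
]

case = build_case(record, repository)
print(case.record.supplier, [d.kind for d in case.documents])
```

The two kinds of data still live in separate containers here; the application simply joins them on a shared key so that the user sees one invoice case rather than a database row and a folder of files.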
I think that the myopic focus on unstructured data has hurt the enterprise content management (ECM) industry. Sure, we need the specialized software that can manage the unstructured data, but ultimately customers need applications that can handle both structured and unstructured data together in a single solution.

Thursday, September 18, 2014

If Scotland votes ‘Yes’ to independence in the referendum, what will happen to our data?

The following article originally appeared in the IT Pro Portal on September 18, 2014:

Scotland goes to the polls today to vote on independence from the Union. If the ‘Yes’ vote wins, it will raise the massive question of data sovereignty.

It’s a curious notion, and one that neither Whitehall nor Holyrood has publicly answered. If we consider the consequences from a data protection perspective, they are incredibly complex, especially as the EU Data Protection Directive mandates that data cannot be transferred outside the territory of the 28 member states. This means that organisations need to prove where their data is at all times. Scotland, as part of the UK, is presently in the EU, but this could quickly change: only last night the Telegraph reported that Spain’s European Foreign Affairs Minister said that a separate Scotland would need to wait five years for EU membership and would have to join the single currency.

The next wave of EU data protection reforms will introduce further enforcement around information crossing borders. Because the majority of business communications are digital by nature – such as email and productivity tools like Microsoft Word and Excel – and are effectively borderless, they pose more problems for data sovereignty compliance. When these reforms were first suggested last year, the Direct Marketing Association said they were “strict and unworkable” and claimed that they would cost UK businesses an eye-watering £47 billion in lost sales and regulatory costs.

Today your data may be based in the UK, but tomorrow, will it be in England or Scotland? The key is allowing companies from either country to pick where they want their data to live and guaranteeing that it resides there.

A ‘Yes’ vote will mean that the data will have to be migrated. Where it will go can, at this stage, only be speculated. What we do know is that for companies that want to move their data from Scotland to England, capacity and power availability within the Greater London M25 belt will be under extreme pressure. If Scottish data needs to head north, then perhaps the challenge isn’t quite so unwieldy. Scotland’s climate is well suited to cooling power-hungry server farms. It boasts a thriving data centre economy, with substantial investment pumped into the hundreds across the country, and some areas are even building new data centres that run on locally generated renewable energy and are expected to go online next year. Scotland’s digital sector is worth around £3 billion to the economy and accounts for over 73,000 jobs. For a population of just over five million, it’s certainly a healthy industry.

The nationalist campaigners suggest they could transfer Scotland’s data from Whitehall systems by 2018, but this is likely to result in considerable disruption to public services, not to mention commercial implications for organisations that own or host from data centres based there. Over time, these issues will have to be remedied. The result will be that data sovereignty becomes a board issue and part of future business and legal operations as a matter of principle. But this is not necessarily a bad thing.

The recent spotlight on data sovereignty originates from the much-reported WikiLeaks and Snowden affairs and the US’ National Security Agency spying revelations. These stories have created such a wake that, according to ResearchNow, a quarter of UK companies are now expected to pull their data out of US data centres. Protecting the integrity of data is definitely at the top of the corporate agenda and it requires sovereignty and security embedded by design.

If Scotland does separate from the UK and takes a while to decide how it wants to pursue its membership of the EU, all data originating from another country and housed in Scotland would need to move inside the EU. An alternative is that Scotland could make provision for its own data protection regulation, but this would need to be written and ratified – a pretty costly and complicated exercise, never mind the process of data migration.

Because England and Scotland speak the same language, the information doesn’t need to be translated, but that also makes it more difficult to separate Scottish data from English data. The challenge will be organising and sorting through petabytes of data and establishing whether it originates from England or Scotland. There are obvious clues, such as place names and cultural references, that will help with labelling, but in reality this particular job falls to humans, who unfortunately are inherently unreliable when it comes to organising information. Auto-classification tools, on the other hand, typically deliver 80–90 per cent accuracy, as opposed to human classification, which on average results in 60 per cent of information being properly classified.

Adhering to data protection laws and dealing with the sovereignty of data across the whole of the United Kingdom requires a strategic view of managing enterprise information. As far as Scottish, Welsh and English borders are concerned, the task of migrating so much information will be a tremendous undertaking. Let’s hope we don’t get to that stage.