Lubor On Tech: content management


Sunday, March 30, 2025

Enterprise Content and AI Security

You can’t talk to a business customer today without AI coming up. While most people seem to have embraced the power of public generative AI tools like OpenAI’s ChatGPT or Google’s Gemini, there’s a lot of hesitation when it comes to using generative AI on enterprise content. The one concern that comes up again and again?

Security.

Rightfully so. Public AI tools don’t have to worry about security. They’re gobbling up all the data on the public internet with the motto: “Train first, worry about intellectual property rights later.” Technically, nothing is stopping them from doing that, and their models are fed by scrapers and crawlers that grab everything they can find.

In an enterprise, however, that doesn’t work. Enterprise data is privileged, confidential, and subject to privacy laws. The data cannot be shared with everyone, and AI models must respect that. It means two users must receive different answers to the same question, depending on the data they’re authorized to access.

The problem is, if your content isn’t well secured and governed in the first place, AI will expose those holes quickly. You may have been able to hide some data behind cryptic file names, but that won’t stop the AI models. Having solid data governance with granular, clean permissions is imperative. Otherwise, it’s “bad security in, bad security out,” to paraphrase Fuechsel’s Law ("garbage in, garbage out").

It also means you need to bring AI tools to your content rather than trying to bring your content to the AI tools. It’s hard enough to secure your content in the first place, and the idea of copying a snapshot into a separate container for AI would obliterate any of that security.

Don’t expect public AI vendors like OpenAI, Google, Anthropic, Meta, or DeepSeek to solve this problem. Enterprise content is a different animal—one they neither understand nor care to understand. None of these vendors has any enterprise DNA. Security is not their concern, and their models aren’t built with the assumption that data access should vary by user.

To illustrate this point, let me remind you of what happened with enterprise search. Web search, which we all use many times a day, is based on an index created by crawlers that scour the internet to deliver the best content match for your keywords—the same results for everyone. That’s what Google does in simplest terms. But in the enterprise, that approach doesn’t work. Enter enterprise search.

About 20 years ago, Google, the heavyweight search champion, entered the enterprise search space with a bright yellow, rack-mounted Google Search Appliance. It drew a lot of attention with its promise that managing content wasn’t necessary: “Wherever it is, you can find it with Google.” Or something like that.

It sounded great—except it didn’t work. Google eventually discontinued the product after a decade of trying. Interestingly, other major players in enterprise search met similar fates. There was FAST, which Microsoft acquired in 2008—only to discover a year later that FAST had been cooking the books. And then there was Autonomy, which HP acquired in 2011, only to eventually sue the CEO and CFO for—you guessed it—cooking the books. The Hollywood-worthy Autonomy saga ended with the CFO in jail and the CEO dying in a freak boating accident. (I described that story in more detail last year in “Mike Lynch, Autonomy, and Incredible Coincidences”.)

Today, search is provided by the companies that own the data. Enterprise search is hard, and usually, only the company that built the repository has a chance of doing it well. On the web, Google finds content that wants to be found—literally. Millions of companies spend billions of dollars each year on SEO to make their content easily discoverable. And there’s no security to worry about.

Enterprise content is different. It’s not optimized for search engines, and security is not optional. This is hard to get right. Eventually, the open-source Apache Lucene solved the problem well enough, and that’s what many enterprise applications use today. Still, you rarely hear anyone say, “Wow, this search is amazing”—because it doesn't match Google’s web search, which sets the expectations bar.

Now, let’s come back to AI. The vector databases at the heart of enterprise AI models must respect data security, just like search indexes do. That’s incredibly difficult for anyone other than the companies that hold the data. Only they understand the data structures, the users, and their permissions. For any external application, sure, it’s possible, but it's really hard to make that work. If you don’t believe me, think back to enterprise search.
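To make the point concrete, here is a minimal sketch of permission-aware ("security-trimmed") retrieval, the step where an enterprise AI assistant picks which documents to feed the model. It assumes a hypothetical `Doc` record carrying a toy embedding and the set of groups allowed to read it; no real vector database or embedding model is shown. The access check runs before ranking, so the same question returns different results for different users.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    embedding: tuple[float, ...]    # toy vector; a real system would use an embedding model
    allowed_groups: frozenset[str]  # hypothetical ACL: groups permitted to read this document


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, docs, user_groups, k=2):
    """Return IDs of the top-k documents this user may read, most similar first.

    Permissions are applied before ranking, so unreadable documents never
    reach the model's context.
    """
    readable = [d for d in docs if d.allowed_groups & user_groups]
    ranked = sorted(readable, key=lambda d: cosine(query_vec, d.embedding), reverse=True)
    return [d.doc_id for d in ranked[:k]]


docs = [
    Doc("salary-bands.xlsx", (0.9, 0.1, 0.0), frozenset({"hr"})),
    Doc("benefits-faq.docx", (0.8, 0.2, 0.1), frozenset({"all-staff"})),
    Doc("roadmap.pptx", (0.1, 0.9, 0.2), frozenset({"all-staff"})),
]
query = (1.0, 0.0, 0.0)  # stands in for "What are the salary bands?"

print(retrieve(query, docs, {"hr", "all-staff"}, k=1))  # ['salary-bands.xlsx']
print(retrieve(query, docs, {"all-staff"}, k=1))        # ['benefits-faq.docx']
```

The filtering itself is trivial. The hard part, as argued above, is keeping `allowed_groups` in sync with the live permissions of the source repository, which this sketch simply assumes are correct.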

AI in the context of enterprise data will be extremely valuable, with the potential to dramatically boost productivity—whether it’s through assistants, agents, or whatever comes next. But in an enterprise, the first rule will always be: respect the data’s security.

And that makes it hard.

Sunday, October 6, 2024

Why I Joined Egnyte

Yes, after two years of working on my own as an independent consultant, it happened. One of my clients made me an offer to join their company, and I decided to take the leap. Why did I do that?

"Lubor, I thought you were done with this space!" said Marko Sillanpaa from Gartner when I joined a recent briefing. Marko and I go way back to our Documentum days.

Well, it’s true. After spending nearly 15 years at content management companies like Vignette, Documentum, and OpenText, I was starting to get bored. For years, I had been trying to convince customers to manage their content to avoid compliance, governance, and litigation risks. But, aside from companies in highly regulated or litigious industries, most didn’t care.

Things became even clearer when the new upstarts began pushing Enterprise File Sync and Share (EFSS). In the spirit of "consumerization" and "enterprise 2.0", they claimed that IT was becoming irrelevant, and many companies decided to stop worrying about how their content was managed. Employees were indiscriminately sharing files straight from personal drives. It was madness, but I knew it would take time for people to realize that.

So, I left the enterprise content management (ECM) world and ventured into business applications, to finally get the experience of marketing to business buyers. Funnily enough, the companies I worked for had a horrible way of sharing and managing content. Their content chaos was a massive productivity drain, and they didn’t even know about it.

But, I kept an eye on the content platforms.

For a while, it seemed like nothing would change. Most of the old vendors faded away, and the EFSS ‘boxes’ became just as boring once they started selling to enterprises. They went on and on about compliance, but no one cared. Content was seen as more of a liability than an asset, with "secure collaboration" being the most exciting positioning they could muster.

Then, generative AI arrived.

It quickly became clear that while ChatGPT is awesome, for business use, it needs to work with a company’s private, often sensitive, content. That changed the game for content platforms since they own the content repositories. Even if a lot of content still lives in personal drives, content management suddenly matters again. The focus shifted from reducing risk to boosting productivity. Now that’s cool—AI used for something truly useful! I want to be part of that.

When I started working with Egnyte as one of my clients, I found a company with leading-edge cloud technology, with all the bells and whistles you’d expect from a modern content platform. Unlike EFSS vendors, Egnyte has always focused on control and security, eliminating duplicates, data leaks, and privacy issues. This becomes crucial when applying generative AI to private company content. Random duplicates can lead to incorrect answers, which isn’t acceptable in business. Egnyte figured this out long ago.

On top of that, Egnyte has powerful and unique capabilities for managing highly complex content with massive files like CAD files, BIM models, and Adobe creative projects. I saw a company with amazing technology, strong prospects, a great team, good culture, and solid financials. I knew I could make a difference here. So, I said yes.

Yes, I’m running product marketing again, which is what I love. I’m diving deep into the technology, working closely with product teams, and using all my marketing craft to tell Egnyte’s story. There’s a lot of work ahead, but the opportunity is huge.

I’m excited to be back in the content management space. It’ll be fun reconnecting with industry analysts like Marko Sillanpaa, Cheryl McKinnon, Craig LeClair, Marci Maddox, Alan Pelz-Sharpe, Dan Lucarini, and others. I might check in with AIIM, where I served on the Board for five years, to see what they’re up to. And I’m really looking forward to meeting with customers in person.

It’s great to be back. We’re just getting started!

Saturday, November 4, 2023

The Lost Art of Content Management

After leaving the ECM space a few years ago, I had the chance to see how companies that don't sell content management products manage their content. What I discovered is a disturbing mess.

In most companies, content like documents, spreadsheets, images, and more is left unmanaged. Thanks to cloud-based office suites like Microsoft 365 and Google Workspace, sharing documents has become incredibly easy. You no longer need to send documents via email (which was a mess by itself), and you don't have to put documents in shared folders either. Nowadays, you can share documents right from where they were created. Unfortunately, most people's default sharing location is their personal folder.

That's right, most knowledge workers share their documents directly from their personal folders, with some disastrous consequences. Not using a shared, well-organized repository creates many issues. Because each worker controls their own sharing permissions, they tend to be either overly restrictive or overly open. Sharing with everyone by default creates obvious security vulnerabilities. Sharing only with a specific list of individuals is more secure, but it prevents new colleagues from leveraging the content later.

The concept of a central repository is quite simple. It is a structure of containers (folders) from which documents inherit their properties, including access permissions. When organized logically, it is easy to navigate. If, for example, you need to find all Engineering projects from Q2 2022, you'd navigate through the folders Engineering -> Projects -> 2022 -> Q2 to find them. Since you might not know the project names or topics, relying on search would be a poor option, not to mention dealing with those pesky access permissions (you can't find what you can’t access). This is where the hierarchical structure shines. A decent content management system handles access permissions while helping employees navigate the structure and locate relevant content without the need to search.
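Here is a minimal sketch of permission inheritance, under a deliberately simple model: each folder either carries an explicit access list or inherits its parent's, and the nearest explicit list wins. The `Folder` class, group names, and paths are hypothetical, and real content management systems differ in how they merge or break inheritance. A document stored in a folder would simply take that folder's effective permissions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Folder:
    name: str
    parent: Folder | None = field(default=None, repr=False)
    acl: set[str] | None = None  # explicit ACL; None means "inherit from parent"
    children: dict[str, Folder] = field(default_factory=dict, repr=False)

    def add(self, name: str, acl: set[str] | None = None) -> Folder:
        """Create a subfolder; without an explicit ACL it inherits from this folder."""
        child = Folder(name, parent=self, acl=acl)
        self.children[name] = child
        return child

    def effective_acl(self) -> set[str]:
        """Walk up the tree to the nearest folder with an explicit ACL."""
        node = self
        while node is not None:
            if node.acl is not None:
                return node.acl
            node = node.parent
        return set()  # no ACL anywhere on the path: deny by default


def can_read(root: Folder, path: str, group: str) -> bool:
    """Navigate a path like 'Engineering/Projects/2022/Q2' and check access."""
    node = root
    for part in path.split("/"):
        node = node.children[part]
    return group in node.effective_acl()


root = Folder("root", acl={"all-staff"})
engineering = root.add("Engineering", acl={"engineering"})
engineering.add("Projects").add("2022").add("Q2")  # all inherit Engineering's ACL
root.add("Handbook")                               # inherits the root ACL

print(can_read(root, "Engineering/Projects/2022/Q2", "engineering"))  # True
print(can_read(root, "Engineering/Projects/2022/Q2", "all-staff"))    # False
print(can_read(root, "Handbook", "all-staff"))                        # True
```

The appeal of this design is that permissions are set once, on a handful of folders, instead of by each author on each file.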

Another effective way to organize content is tagging, or classification. It allows users to find content using filters as an alternative to hierarchical navigation and search. That is how good content libraries are organized. Unfortunately, few bother to tag content today.
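Tag-based filtering can be sketched in a few lines, assuming each document carries a small set of facet tags. The `TaggedDoc` class and the `department`, `quarter`, and `type` facets are hypothetical. Filters find matching documents regardless of which folder they landed in, which is what makes tagging a useful complement to the hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedDoc:
    title: str
    folder: str                                         # where the file happens to live
    tags: dict[str, str] = field(default_factory=dict)  # hypothetical facet -> value


def filter_by_tags(docs, **facets):
    """Return titles of documents whose tags match every requested facet."""
    return [d.title for d in docs if all(d.tags.get(k) == v for k, v in facets.items())]


library = [
    TaggedDoc("Pump redesign spec", "Engineering/Projects/2022/Q2",
              {"department": "Engineering", "quarter": "2022-Q2", "type": "spec"}),
    TaggedDoc("Pump test results", "Shared/Misc",
              {"department": "Engineering", "quarter": "2022-Q2", "type": "report"}),
    TaggedDoc("Q2 hiring plan", "HR/Planning",
              {"department": "HR", "quarter": "2022-Q2", "type": "plan"}),
]

print(filter_by_tags(library, department="Engineering", quarter="2022-Q2"))
# ['Pump redesign spec', 'Pump test results']
print(filter_by_tags(library, type="report"))  # ['Pump test results']
```

The catch, as noted above, is that the filters are only as good as the tagging discipline behind them.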

With no efficient way to find and organize documents, employees are left to manage links to documents on their own. I've seen some crazy approaches forced by necessity: extensive browser bookmark collections, documents full of links, and browsers with over 100 tabs open. The only alternative is search, assuming you know what you are searching for. Search will not tell you what else you should know when working on this project.

When an employee leaves, their documents are typically inherited by their supervisor. However, supervisors rarely have the time to review those documents, so they end up in a subfolder, never to be looked at again.

It's a nightmare.

Of course, you can create a shared folder structure in Google or Microsoft; you can even use tags. In reality, though, few people do that because nobody told them to do so. The IT department doesn't pay any attention to this issue (although they should) because they are too busy supporting hundreds of cloud apps, from Anaplan to Zuora.

Knowledge workers don't worry about content management because they don't know any better. They were promised consumerization, where using software at work should be as easy as using Facebook at home. Nobody thought they needed training. But it turns out, they do. Without proper training, you end up with a mess.

This chaos has opened the door for a range of new companies. Project Management software like Asana or Wrike is essentially a bunch of shared folders with a dashboard on top. Brand Management tools like Brandfolder and Bynder are just simple digital asset management systems in a shared container. Intralinks Deal Rooms are shared folders marketed to investment bankers. Seismic is just a shared folder with better tagging. All of these could be created in a regular content management system like Box. The reason these companies exist is NOT because Box is lacking some features.

Box understands the concept of content management and has built a fantastic product. However, Box, along with other content management vendors (sorry, I use Box and so I like to pick on it), has failed to create a market. Yes, companies in heavily regulated industries like Life Sciences and Financial Services have no choice but to use a content management system. But most companies don't see the point. Instead of using Box, they end up purchasing Asana, Brandfolder, and Seismic and then wonder why they have high IT costs and low productivity.

That is the problem that Box should be solving. It should establish a market by educating companies and knowledge workers on the importance of managing information. It should stop chasing the latest trends like generative AI because those won't result in a single new customer. The key to creating a market is explaining to people why they need to care about managing content.

Because right now, nobody cares.