End of Taxonomy

In Enterprise Content Management (ECM), using a taxonomy to define the folder structure stops scaling once the system holds a large number of files. Consequently, some leading ECM products, such as OpenText ECM and Microsoft SharePoint, limit the number of items in a folder. While subfolders can ease this problem somewhat, you don’t want your taxonomy to require convoluted folder structures such as “Contract X > A > B > ABC Company Contracts”.

Large ECM providers are making the leap to replace taxonomy with powerful search engines combined with OCR (optical character recognition) and document metadata. Over the past decade, OCR technology has become very reliable, but its development has reached a relative plateau, partly because it already meets most user needs. It still struggles with harder tasks, such as differentiating between an uppercase ‘I’ and a ‘1’ or parsing handwritten letters. All the big names now allow, to some degree, searching of custom metadata and full text. For the end user, the biggest improvement is parametric and faceted search, which lets users search by tags in the search box rather than running a simple query on document content. This feature replaces the need for taxonomy, but it also requires users to create that searchable metadata for their documents.
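To make the idea of faceted search concrete, here is a minimal in-memory sketch in Python. It is not how any particular ECM product works; the documents, the tag names (`doc_type`, `customer`, `year`) and the helper functions are all made up for illustration. It only shows the two pieces a user sees: filtering by tags instead of by folder, and counting the tag values among the results.

```python
from collections import Counter

# Hypothetical documents: extracted text plus user-supplied metadata tags.
DOCUMENTS = [
    {"id": 1, "text": "Master services agreement with ABC Company",
     "tags": {"doc_type": "contract", "customer": "ABC Company", "year": "2015"}},
    {"id": 2, "text": "Invoice for consulting services",
     "tags": {"doc_type": "invoice", "customer": "ABC Company", "year": "2016"}},
    {"id": 3, "text": "Services agreement renewal",
     "tags": {"doc_type": "contract", "customer": "XYZ Corp", "year": "2016"}},
]


def search(docs, text=None, **facets):
    """Return docs whose text contains `text` and whose tags match every facet."""
    hits = []
    for doc in docs:
        if text and text.lower() not in doc["text"].lower():
            continue
        if all(doc["tags"].get(key) == value for key, value in facets.items()):
            hits.append(doc)
    return hits


def facet_counts(docs, field):
    """Count how many docs carry each value of one metadata field."""
    return Counter(doc["tags"][field] for doc in docs if field in doc["tags"])


# Full-text term plus a tag filter, with no folder path involved.
contracts = search(DOCUMENTS, text="agreement", doc_type="contract")
print([doc["id"] for doc in contracts])
print(facet_counts(contracts, "customer"))
```

Note that the search is only as good as the tags: a document saved without a `customer` tag simply never shows up under that facet.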

Advances in enterprise search technology have made the move away from taxonomy possible. Solr is the most popular enterprise search engine on the market, and it has kept innovating to stay competitive and up to date. Elasticsearch, first released in 2010, is now the second most common enterprise search tool and can add a lot of power to ECM. The two are very similar: both are based on Apache Lucene, support faceting, and allow API access over HTTP. Solr allows for a customized search workflow, but it does not handle complex document structures as well as Elasticsearch. Elasticsearch also takes greater advantage of faceted and tag search.
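As a rough illustration of how similar the two HTTP APIs are for faceting, the sketch below builds the same search twice: once as an Elasticsearch request body (a `bool` query with a `terms` aggregation) and once as Solr query parameters (`q`, `fq`, `facet`, `facet.field`). The field names `content`, `customer` and `doc_type` are assumptions, and the `term` filter assumes those fields are indexed as exact-match keywords. Nothing is sent over the network; the code only prints what would be sent to an Elasticsearch `_search` endpoint or a Solr `select` handler.

```python
import json
from urllib.parse import urlencode


def elasticsearch_body(text, filters, facet_field):
    """Full-text match, exact-tag filters, and a terms aggregation for facet counts."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                "filter": [{"term": {field: value}} for field, value in filters.items()],
            }
        },
        "aggs": {f"by_{facet_field}": {"terms": {"field": facet_field}}},
    }


def solr_query_string(text, filters, facet_field):
    """The same search expressed as Solr request parameters."""
    params = [("q", f"content:{text}")]
    params += [("fq", f'{field}:"{value}"') for field, value in filters.items()]
    params += [("facet", "true"), ("facet.field", facet_field)]
    return urlencode(params)


# Hypothetical search: the word "agreement", restricted to one customer,
# with result counts broken down by document type.
filters = {"customer": "ABC Company"}
es_body = elasticsearch_body("agreement", filters, "doc_type")
solr_qs = solr_query_string("agreement", filters, "doc_type")

print(json.dumps(es_body, indent=2))
print(solr_qs)
```

In both cases the tag filter is kept separate from the full-text query, which is what lets the engine return facet counts alongside the hits.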

To leverage all the benefits of this new search technology, ECM designs should replace locally stored enterprise file systems with cloud bucket storage (such as Amazon S3). Some ECM tools can also provide a visual representation of the folder structure, stored along with the metadata in a database, to help users navigate search results. Ultimately, search will eliminate folder capacity planning and end taxonomy in favor of powerful new tools like Solr and Elasticsearch.
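One way to picture this design is a flat set of object keys in the bucket, with the metadata and the "folder" a user sees kept in a database. The sketch below uses Python's built-in `sqlite3` as a stand-in for that database. The table layout, column names and object keys are invented for illustration, and no real bucket is involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE documents (
           object_key   TEXT PRIMARY KEY,  -- flat key in the bucket, not a folder path
           display_path TEXT,              -- virtual folder shown to users
           doc_type     TEXT,
           customer     TEXT
       )"""
)
rows = [
    ("objects/0001", "/Contracts/ABC Company/msa.pdf", "contract", "ABC Company"),
    ("objects/0002", "/Invoices/2016/inv-17.pdf", "invoice", "ABC Company"),
    ("objects/0003", "/Contracts/XYZ Corp/renewal.pdf", "contract", "XYZ Corp"),
]
conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?)", rows)


def find(conn, **metadata):
    """Look up objects by metadata columns; the folder path is display-only."""
    allowed = {"doc_type", "customer"}
    if not metadata or not set(metadata) <= allowed:
        raise ValueError("unsupported metadata field")
    where = " AND ".join(f"{column} = ?" for column in metadata)
    sql = (
        "SELECT object_key, display_path FROM documents "
        f"WHERE {where} ORDER BY object_key"
    )
    return conn.execute(sql, tuple(metadata.values())).fetchall()


for object_key, display_path in find(conn, customer="ABC Company"):
    print(object_key, "shown as", display_path)
```

Because the folder path is just another column, moving a document to a different "folder" is a metadata update and never touches the stored object, so no folder ever fills up.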

Thank you,

Tim
