End of Taxonomy
In Enterprise Content Management (ECM), using a taxonomy to define the folder structure stops scaling once a repository grows to a large number of files. Consequently, some leading ECM products, such as OpenText ECM and Microsoft SharePoint, limit the number of items a folder can hold. Subfolders can partially work around these limits, but you don’t want your taxonomy to depend on convoluted folder structures such as “Contract X > A > B > ABC Company Contracts”.
Large ECM providers are making the leap from taxonomy to powerful search engines combined with OCR (optical character recognition) and document metadata. Over the past decade, OCR technology has become very reliable, but its development has reached a relative plateau, partly because it already meets most user needs. However, it still struggles with harder tasks, such as distinguishing an uppercase ‘I’ from a ‘1’ or reading handwriting. All the major vendors now support, to some degree, searching custom metadata and full text. For end users, the biggest improvement is parametric and faceted search, which lets them narrow results by tags and metadata fields rather than running a simple query against document content. This feature removes the need for a taxonomy, but it also requires users to create that searchable metadata for their documents.
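To make the difference concrete, here is a minimal in-memory sketch of faceted search. It is not how any particular ECM product or search engine implements the feature; the `Document` class, the `faceted_search` function, and the `client`/`type` metadata fields are hypothetical. A free-text query is matched against the (for example, OCR-extracted) full text, selected facet values filter on metadata, and the function returns counts of each metadata value among the hits so a user interface could offer them as clickable filters.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str  # full text, e.g. produced by OCR
    metadata: dict[str, str] = field(default_factory=dict)


def faceted_search(docs, query="", filters=None, facet_fields=()):
    """Return (hits, facets): matching documents and value counts per facet field."""
    filters = filters or {}
    terms = query.lower().split()
    hits = [
        d
        for d in docs
        # every query term must occur in the full text (simple substring match)
        if all(t in d.text.lower() for t in terms)
        # every selected facet value must equal the document's metadata value
        and all(d.metadata.get(k) == v for k, v in filters.items())
    ]
    # count how many hits carry each value of each requested facet field
    facets = {
        f: Counter(d.metadata[f] for d in hits if f in d.metadata)
        for f in facet_fields
    }
    return hits, facets


# Hypothetical sample repository
docs = [
    Document("1", "Master services contract with ABC Company",
             {"client": "ABC Company", "type": "Contract"}),
    Document("2", "Invoice for ABC Company, March",
             {"client": "ABC Company", "type": "Invoice"}),
    Document("3", "Contract amendment for XYZ Corp",
             {"client": "XYZ Corp", "type": "Contract"}),
]

hits, facets = faceted_search(docs, "contract", facet_fields=("client", "type"))
print([d.doc_id for d in hits])  # ['1', '3']
print(dict(facets["client"]))    # {'ABC Company': 1, 'XYZ Corp': 1}

# Selecting a facet value narrows the results, much like opening a folder
narrowed, _ = faceted_search(docs, "contract", {"client": "ABC Company"})
print([d.doc_id for d in narrowed])  # ['1']
```

Selecting a facet such as `client = ABC Company` narrows the hits the way opening a folder would, without the document having to live in that folder; the catch, as noted above, is that the metadata has to exist.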
Advances in enterprise search technology have made the move away from taxonomy possible. Solr and Elasticsearch are the two most widely used enterprise search engines. Solr has kept innovating to stay competitive, while Elasticsearch, first released in 2010, can add a lot of power to an ECM system. The two are very similar: both are built on Apache Lucene, support faceting, and expose their functionality through an HTTP API. Solr allows for a customized search workflow, but it handles complex, nested document structures less naturally than Elasticsearch, which also makes heavier use of faceted and tag-based search.
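Because both engines expose faceting over HTTP, the same faceted query can be sketched for each. The code below only builds the requests and sends nothing; it assumes an index with hypothetical `content`, `client`, and `doc_type` fields, with `client` and `doc_type` indexed as exact-match (string/keyword) fields. The Solr version uses the standard `q`, `fq`, `facet`, and `facet.field` parameters of the `/select` handler; the Elasticsearch version uses a `bool` query with a `filter` clause plus `terms` aggregations, which is how Elasticsearch expresses facets.

```python
import json
from urllib.parse import urlencode


def solr_facet_params(text, filters, facet_fields):
    """Build a query string for Solr's /select handler with field faceting."""
    params = [("q", f"content:{text}"), ("facet", "true")]
    # each fq (filter query) restricts results without affecting relevance scoring
    params += [("fq", f'{k}:"{v}"') for k, v in filters.items()]
    # facet.field may be repeated, once per field whose values should be counted
    params += [("facet.field", f) for f in facet_fields]
    return urlencode(params)


def es_facet_body(text, filters, facet_fields):
    """Build a JSON body for Elasticsearch's _search endpoint."""
    return {
        "query": {
            "bool": {
                # full-text match on the document content
                "must": [{"match": {"content": text}}],
                # exact-value filters on metadata (assumed keyword fields)
                "filter": [{"term": {k: v}} for k, v in filters.items()],
            }
        },
        # terms aggregations play the role of facets
        "aggs": {f: {"terms": {"field": f}} for f in facet_fields},
    }


filters = {"client": "ABC Company"}
solr_qs = solr_facet_params("contract", filters, ["doc_type"])
es_body = es_facet_body("contract", filters, ["doc_type"])

print(solr_qs)
print(json.dumps(es_body, indent=2))
```

In practice the Solr string would be appended to a URL of the form `http://<host>:8983/solr/<collection>/select?` and the Elasticsearch body sent to `<index>/_search`; the host, collection, and index names here are placeholders.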
To get the full benefit of this search technology, ECM designs should replace locally stored enterprise file systems with cloud object storage buckets (such as Amazon S3). Some ECM tools can also display a virtual folder structure, stored alongside the metadata in a database, to help users navigate search results. Ultimately, search will eliminate folder capacity planning and end taxonomy in favor of powerful new tools like Solr and Elasticsearch.
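The idea of keeping the folder view as metadata can be sketched as follows, assuming each object sits under a flat bucket key and its database record carries a hypothetical `display_path` field. No AWS calls are made: the `records` list stands in for rows returned from the metadata database, and the `Folder` class and the `build_virtual_tree` and `render` functions are illustrative only. Because the folders exist only in metadata, reorganizing the tree means updating a database field rather than moving files, and there is no physical folder whose item limit needs planning.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Folder:
    subfolders: dict[str, Folder] = field(default_factory=dict)
    files: list[str] = field(default_factory=list)


def build_virtual_tree(records):
    """Build a folder tree from (object_key, metadata) pairs.

    The hypothetical metadata field "display_path" (e.g. "Contracts/ABC Company")
    decides where a file appears; the object key itself stays flat.
    """
    root = Folder()
    for key, meta in records:
        node = root
        for part in meta.get("display_path", "").split("/"):
            if part:  # skip empty segments from leading/trailing slashes
                node = node.subfolders.setdefault(part, Folder())
        node.files.append(key)
    return root


def render(folder, depth=0):
    """Return indented lines for the tree, folders first, sorted by name."""
    lines = []
    for name in sorted(folder.subfolders):
        lines.append("  " * depth + name + "/")
        lines.extend(render(folder.subfolders[name], depth + 1))
    lines.extend("  " * depth + key for key in folder.files)
    return lines


# Stand-in for rows fetched from the metadata database
records = [
    ("3f9a1c.pdf", {"display_path": "Contracts/ABC Company"}),
    ("8b2e7d.pdf", {"display_path": "Contracts/ABC Company"}),
    ("c41d05.pdf", {"display_path": "Invoices"}),
]

tree = build_virtual_tree(records)
print("\n".join(render(tree)))
```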
Thank you,
Tim