By Heather Hedden

March 2019

Taxonomies in Support of Search

Taxonomies, or any controlled vocabularies of index terms, support consistent tagging and accurate retrieval. They normalize concepts, bringing together synonyms and other variant forms of the same intended concept, so that desired content is not missed when searching on alternate terms, and inappropriate content is not retrieved because of words with multiple meanings. Additionally, grouping taxonomy terms into structured hierarchies or into facets can guide users to the desired taxonomy term.

Taxonomies require effort to create and then to apply in tagging, however, so their benefits over search alone are often questioned. Yet search has limitations. While improvements have been made in search technologies in the past, there have been no new improvements recently, and methods of search that work well for web search engines do not necessarily apply to search on the relatively smaller collections of content within an organization. Taxonomies and search are not mutually exclusive methods for users to find content. Rather, they can be used in combination in various ways for improved results and an improved user experience.

When search was initially introduced to enterprise and published digital content systems, it was seen as an inexpensive alternative to developing a taxonomy and manually indexing content. In some cases, where more accurate results were desired, users would be presented with the choice to either browse the taxonomy topics or search the full text of content, but the two functions were separate. The browsable multi-level hierarchical taxonomy has meanwhile become less common, as taxonomies have grown too large to be practical to display fully. Now, there are various methods to combine taxonomies and search for a unified experience with the best results: searching on the taxonomy, taxonomies in faceted search/browse, taxonomy terms to refine post-search results, and knowledge graphs.

Search on taxonomy terms
Searching on taxonomy terms is typically enabled within the standard search feature, so the user may not even be aware that the search is on the taxonomy. A single search box is presented to the user. There could be an option to select “search type,” offering the user a choice between full text/keyword search and taxonomy/subject search, but a simpler option is to have the default search run on taxonomy terms, and when no taxonomy term (or any of its synonyms) matches the user-entered search string, the search automatically falls back to full-text search. Thus, it is a seamless experience for the user, who does not need to decide whether or not to search the taxonomy.
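
The fallback behavior described above can be sketched in a few lines of Python. This is only a minimal illustration, not any particular search product's implementation: the taxonomy, synonyms, documents, and the function names build_lookup and search are all hypothetical, and exact matching on the whole query string is a simplifying assumption.

```python
# Hypothetical toy data: preferred taxonomy terms with their synonyms,
# and documents that have been tagged with preferred terms.
TAXONOMY = {
    "Automobiles": ["cars", "autos", "motor vehicles"],
    "Mergers and acquisitions": ["M&A", "takeovers"],
}

DOCUMENTS = [
    {"id": 1, "text": "Quarterly sales of electric cars rose.", "tags": ["Automobiles"]},
    {"id": 2, "text": "The takeover bid was rejected.", "tags": ["Mergers and acquisitions"]},
    {"id": 3, "text": "New bicycle lanes opened downtown.", "tags": []},
]


def build_lookup(taxonomy):
    """Map every preferred term and synonym (lowercased) to its preferred term."""
    lookup = {}
    for preferred, synonyms in taxonomy.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup


LOOKUP = build_lookup(TAXONOMY)


def search(query, documents=DOCUMENTS):
    """Search on taxonomy terms first; fall back to full text if no term matches."""
    needle = query.strip().lower()
    term = LOOKUP.get(needle)
    if term is not None:
        # The query matched a term or synonym: retrieve by tag, not by wording.
        return "taxonomy", [d["id"] for d in documents if term in d["tags"]]
    # No taxonomy match: plain substring search over the document text.
    return "full-text", [d["id"] for d in documents if needle in d["text"].lower()]
```

In this sketch, a search on "autos" retrieves document 1 through its "Automobiles" tag even though the word "autos" never appears in the text, while a search on "bicycle" matches no taxonomy term and quietly falls back to full text.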

When search is on taxonomy terms, taxonomy terms may or may not be displayed to the user. A common option is to present a short list of taxonomy terms, just under the search box, in a type-ahead feature. As the user enters characters into the search box, taxonomy terms that begin with the same letters are presented in the type-ahead list. A more sophisticated version, known as “search suggest,” includes taxonomy terms that contain a user-entered word anywhere within the taxonomy term and not merely at its beginning. Variants/synonyms consisting of completely different words are typically not included in “search suggest,” but they could be.
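
The difference between the two behaviors can be shown with a short Python sketch. The term list and the function names type_ahead and search_suggest are hypothetical, and matching on the start of any word is just one plausible reading of "search suggest."

```python
# Hypothetical list of taxonomy terms.
TERMS = ["Data governance", "Data mining", "Database design", "Metadata", "Text mining"]


def type_ahead(typed, terms=TERMS, limit=5):
    """Terms that begin with the characters typed so far."""
    prefix = typed.lower()
    return [t for t in terms if t.lower().startswith(prefix)][:limit]


def search_suggest(typed, terms=TERMS, limit=5):
    """Terms in which any word, not just the first, begins with the typed characters."""
    prefix = typed.lower()
    return [
        t for t in terms
        if any(word.startswith(prefix) for word in t.lower().split())
    ][:limit]
```

With these terms, typing "min" yields nothing from type_ahead, since no term begins with those letters, whereas search_suggest offers both "Data mining" and "Text mining."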

Faceted search/browse
Faceted search is an increasingly common implementation of search in combination with taxonomies. Facets are filters that limit search results based on different aspects, such as document type, location, organization, sector, topic, function, etc. A faceted taxonomy is designed to have facets and sets of terms within each facet that are appropriate for the content. The list of terms within each facet is usually short enough to be browsed. Content items are tagged with taxonomy terms from each of multiple facets, and users select terms from any number of facets to narrow the results. Faceted search gives control to the user to limit or broaden the search results further. The user can select terms from facets after or before executing a keyword search.
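
As a rough sketch of how facet selections narrow a result set, consider the following Python example. The items, facet names, and the function filter_by_facets are hypothetical. The sketch also assumes one common convention that is not stated above: selections are combined with AND across facets and OR within a single facet.

```python
# Hypothetical content items, each tagged with terms from several facets.
ITEMS = [
    {"id": "A", "facets": {"Document type": {"Report"}, "Location": {"Boston"}, "Topic": {"Budgeting"}}},
    {"id": "B", "facets": {"Document type": {"Policy"}, "Location": {"Boston"}, "Topic": {"Hiring"}}},
    {"id": "C", "facets": {"Document type": {"Report"}, "Location": {"Chicago"}, "Topic": {"Hiring"}}},
    {"id": "D", "facets": {"Document type": {"Report"}, "Location": {"Boston"}, "Topic": {"Hiring", "Budgeting"}}},
]


def filter_by_facets(items, selections):
    """Return ids of items matching the user's facet selections.

    selections maps a facet name to the set of terms chosen in that facet.
    Assumed convention: an item must match every selected facet (AND),
    and matches a facet if it carries any one of the chosen terms (OR).
    """
    results = []
    for item in items:
        if all(item["facets"].get(facet, set()) & chosen
               for facet, chosen in selections.items()):
            results.append(item["id"])
    return results
```

Selecting "Report" alone returns items A, C, and D; adding "Boston" from the Location facet narrows the results to A and D; clearing a selection broadens them again.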

Despite their benefits, faceted taxonomies are not always practical. The content may lack a set of consistent aspects that would serve as facets. For example, products share similar attributes that serve as facets, such as product category, user or use, size, material type, color, etc. However, a set of research articles may have only a single attribute, namely “topic.” Even when “topic” is one of several facets, it could be problematic if there are too many topics for the user to easily browse through, since facets and their terms are meant to be browsed.

Post-search refinements
Implementing taxonomy terms as post-search refinements is a practical solution when there are too many topical terms to be easily browsed. Topical taxonomies may sometimes have hundreds or thousands of terms. A post-search refinements feature displays (in the left or right margin) only the taxonomy terms that have been used in indexing the records retrieved in a search. The terms are typically listed in order of their usage counts, that is, the number of retrieved records indexed with each term. Thus, the user executes a traditional search first, then sees which taxonomy terms are linked to the retrieved set of records, and then selects one of these terms to get a more focused subset of the results: those that have been indexed with that term.

Post-search refinements offer a user experience similar to that of facets. They differ from facets in that they display only the terms that are found indexed on the result set, not the entire taxonomy. As such, post-search refinements are suitable for very large topical taxonomies or thesauri.
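
The mechanics of post-search refinement can be sketched as follows. The records, index terms, and function names are hypothetical, and a simple substring match stands in for a real search engine. The sketch counts only the terms indexed on the retrieved records and lists them by usage count.

```python
from collections import Counter

# Hypothetical indexed records: free text plus assigned taxonomy terms.
RECORDS = [
    {"id": 1, "text": "solar panel subsidies expand", "terms": ["Solar energy", "Energy policy"]},
    {"id": 2, "text": "wind and solar output hits record", "terms": ["Solar energy", "Wind energy"]},
    {"id": 3, "text": "new solar farm approved", "terms": ["Solar energy"]},
    {"id": 4, "text": "coal plant closures announced", "terms": ["Coal", "Energy policy"]},
]


def keyword_search(query, records=RECORDS):
    """Stand-in for a traditional search: substring match on the record text."""
    q = query.lower()
    return [r for r in records if q in r["text"].lower()]


def refinements(results):
    """Taxonomy terms used on the retrieved records, most frequent first."""
    counts = Counter(term for r in results for term in r["terms"])
    # Sort by descending count, then alphabetically for a stable display order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def refine(results, term):
    """Subset of the retrieved records that were indexed with the chosen term."""
    return [r for r in results if term in r["terms"]]
```

A search on "solar" retrieves records 1 to 3, so the refinement list shows "Solar energy," "Energy policy," and "Wind energy" but never "Coal," which was not used on any retrieved record. Choosing "Wind energy" then narrows the results to record 2.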

Knowledge graphs
Knowledge graphs are a more recent development in displaying information to users after they execute a search. In addition to the traditional display of search results with links to records, the knowledge graph displays a quick fact box, serving up information about the searched topic, without the user having to follow links. Knowledge graphs, which are based on graph databases instead of relational databases and can make use of data in RDF triple stores, extract data based both on entities/things and their relationships to each other to create the display of information.
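
To make the idea of entities and relationships concrete, the following Python sketch stores a few subject-predicate-object triples as plain tuples and assembles a fact box for a searched entity. It is a toy illustration under stated assumptions, not a graph database or an RDF triple store: the people, project, predicates, and the function fact_box are all hypothetical.

```python
# Hypothetical triples in (subject, predicate, object) form.
TRIPLES = [
    ("Jane Doe", "has role", "Taxonomist"),
    ("Jane Doe", "works on", "Project Atlas"),
    ("Jane Doe", "based at", "Boston office"),
    ("Project Atlas", "managed by", "John Smith"),
    ("Project Atlas", "located at", "Boston office"),
]


def fact_box(entity, triples=TRIPLES):
    """Collect the facts stated about an entity, grouped by relationship."""
    box = {}
    for subject, predicate, obj in triples:
        if subject == entity:
            box.setdefault(predicate, []).append(obj)
    return box
```

Calling fact_box("Jane Doe") gathers her role, project, and location into one structure that could be rendered beside the ordinary search results, and each object, such as "Project Atlas," is itself an entity with its own fact box.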

Just as search engines were first introduced for the web and later brought into the enterprise, so are knowledge graphs, popularized by Google on the web, beginning to come into the enterprise, in combination with enterprise search. For example, an enterprise knowledge graph could display information about an employee expert, a project, or a facility. Knowledge graphs rely on metadata, and while content on the web is not consistently tagged with metadata, enough content on certain topics on the web has metadata so that a knowledge graph fact box can be created. Inside the enterprise, however, for knowledge graphs to work, content needs to be more consistently tagged with metadata, which comes from some kind of taxonomy.