By Heather Hedden

March 2021


Metadata and Taxonomies

Metadata and taxonomies are distinct but related, and both are very important for making information, data, and content easier to manage and find. 

The expanding role of metadata management

We all know that data is both important and growing exponentially. That is why metadata, which is summary data about data, is increasingly valuable. However, if we think of metadata simply as “data about data,” then we might think that it is only important for managing data and leave metadata management to people who manage databases or types of database management systems. Metadata is more than that, and the responsibility for it is thus greater. 

Metadata is really standardized data about anything with shared attributes that needs to be organized and retrieved. These could be documents, spreadsheets, presentation files, images, multimedia files, or specialized files, such as engineering drawings. Such documents and digital content usually reside within an information system or file system. These systems are not limited to database management systems. They also include content management systems, document management systems, digital asset management systems, authoring and publishing systems, collaboration systems, intranet systems, and workflow and project management systems, in addition to those that are forms of database management systems, such as customer relationship management systems and product information management systems.

All these systems have features for managing metadata, but those features are often underused or misused, because system users often do not understand metadata management well. As a result, content is harder to find and manage. It is not difficult to learn good metadata management, but it takes a little effort and attention. The basics include making metadata comprehensive, consistent, justified by business use cases, and user-friendly.

The expanding role of taxonomies

Taxonomies are organized arrangements of controlled terms/concepts that are associated with content to make it easier to find and retrieve the desired content. People use taxonomies to find the concepts they want, so that they do not have to rely solely on keyword matches, which are sometimes inadequate or inaccurate.

Taxonomies are often thought of as classification systems, such as those used in libraries, research collections, government statistics, or manufacturing specifications. While taxonomies have their origins in classification schemes, they have been adapted and thus go beyond the limited uses and formalities of such schemes. Taxonomies can be customized to a set of content, the needs of its users, and the requirements of a system and its user interface. We are all familiar with taxonomies for browsing for products in ecommerce websites, but each online store has a different set of products and thus a different taxonomy.

Taxonomies are useful for finding all kinds of digital content, not just ecommerce products. You have probably noticed categories, tags, and filters on websites, intranets, and various information systems. Sometimes they function well and sometimes they do not. This depends on whether there is a well-designed taxonomy behind the structure of categories, tags, or filters. In fact, taxonomies can be implemented in most of the same systems that utilize metadata: content management systems, document management systems, digital asset management systems, authoring and publishing systems, collaboration systems, intranet systems, etc. For this reason, taxonomies and metadata often need to be designed in combination.

The connection between metadata and taxonomies

In any given implementation, metadata is organized into types, also called properties, elements, or fields, which are common across a set of content. For each individual content item, there are distinct values for each metadata property. The values can be text, dates, numbers, or a mere Boolean yes/no. Text can be a description, title, or key words. Key words can be open and user-created, or they can be restricted to terms in a controlled vocabulary. Controlled vocabularies can be short lists of just a few options for a metadata property (such as a list of five format types), or a controlled vocabulary could be a hierarchical taxonomy of several hundred terms. 
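The relationship between properties, value types, and controlled vocabularies can be illustrated with a minimal Python sketch. The property names, the five format terms, and the validate helper are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MetadataProperty:
    """One metadata property (field/element) shared across a set of content."""
    name: str
    value_type: type
    # None means free text or free values; a set means a controlled vocabulary.
    controlled_vocabulary: Optional[frozenset] = None


# Hypothetical schema: text, date, number, Boolean, and one controlled list.
SCHEMA = [
    MetadataProperty("Title", str),
    MetadataProperty("Published", date),
    MetadataProperty("Page Count", int),
    MetadataProperty("Approved", bool),
    MetadataProperty(
        "Format",
        str,
        frozenset({"Report", "Presentation", "Spreadsheet", "Image", "Video"}),
    ),
]


def validate(item: dict) -> list:
    """Return a list of problems found in one content item's metadata values."""
    problems = []
    for prop in SCHEMA:
        if prop.name not in item:
            problems.append(f"{prop.name}: missing")
            continue
        value = item[prop.name]
        if not isinstance(value, prop.value_type):
            problems.append(f"{prop.name}: expected {prop.value_type.__name__}")
        elif (prop.controlled_vocabulary is not None
              and value not in prop.controlled_vocabulary):
            problems.append(f"{prop.name}: '{value}' is not a controlled term")
    return problems
```

In this sketch, an item whose Format value is "Memo" would be reported as using an uncontrolled term, while any string is accepted for Title.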

Most taxonomies, with the exception of purely navigational taxonomies, serve to populate terms/values in metadata fields/elements; and some, but not all, metadata fields are populated by terms/values from controlled vocabularies. In sum, a metadata schema has more breadth, whereas a taxonomy has more depth.

The definition of “taxonomy” is not strict. While traditionally it means a hierarchical arrangement of terms in broader/narrower relationships, a set of multiple metadata properties and their values, even if there is no further hierarchy, may also be considered a taxonomy. It is considered a taxonomy, because there is a structured organization of terms into various metadata properties that apply to the same set of content. 

Designing taxonomies and metadata together

The question remains whether to start by creating the overall metadata strategy and schema and then build taxonomies as part of it as needed, or to start by creating a taxonomy and then, in the process, identify the various descriptive metadata properties that require controlled vocabularies. Ideally the two are developed for implementation in combination, as part of an integrated strategy. However, an expert in taxonomy development (a taxonomist) and an expert in metadata design (a metadata architect) are usually not the same person.

A metadata architect can design taxonomies, and a taxonomist can design metadata, or the two experts can work together on the same project. Actually, it is rare for an organization to have both such experts on staff. Whether an organization has a metadata architect or taxonomist depends on the nature of the organization’s content and content organization needs.

A hierarchical taxonomy can be integrated with metadata, when one of the metadata fields is for “Topic” or “Subject,” and there is a hierarchical taxonomy of subject terms associated with that field. However, it is the faceted type of taxonomy in particular that unites the tasks of taxonomy creation and metadata design.
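As a rough sketch of a hierarchical taxonomy attached to a "Subject" field, the Python below stores broader/narrower relationships and lets a search on a broader term also retrieve content tagged with its narrower terms. The subject terms, document identifiers, and function names are all hypothetical, and expanding a search to narrower terms is one common design choice rather than a requirement.

```python
# Hypothetical subject taxonomy: each term maps to its broader (parent) term.
BROADER = {
    "Renewable Energy": "Energy",
    "Solar Power": "Renewable Energy",
    "Wind Power": "Renewable Energy",
    "Fossil Fuels": "Energy",
}

# Hypothetical content items, each with a "Subject" metadata field.
CONTENT = {
    "doc-1": {"Subject": ["Solar Power"]},
    "doc-2": {"Subject": ["Fossil Fuels"]},
    "doc-3": {"Subject": ["Wind Power", "Energy"]},
}


def with_narrower(term: str) -> set:
    """Return the term together with all of its narrower terms."""
    found = {term}
    for child, parent in BROADER.items():
        if parent == term:
            found |= with_narrower(child)
    return found


def search_by_subject(term: str) -> list:
    """Find items tagged with the term or with any of its narrower terms."""
    wanted = with_narrower(term)
    return sorted(
        doc_id
        for doc_id, metadata in CONTENT.items()
        if wanted & set(metadata["Subject"])
    )
```

Here a search on "Renewable Energy" also returns the items tagged only with "Solar Power" or "Wind Power", which a plain keyword match on the field value would miss.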

Faceted taxonomies and metadata

A faceted taxonomy comprises a set of facets, each of which is an individual controlled vocabulary whose terms are generally not linked/related to terms in the other facets. Terms from several facets are combined to tag the same set of content, and users search/filter on terms in combination from various facets. Examples of facets may be Product/Service, Market Segment, Location, Document Type, Supplier, etc. A faceted taxonomy is a common type for both enterprise taxonomies and ecommerce or product review taxonomies. It is called a “taxonomy” even though it differs from the classical hierarchical “tree” type of taxonomy, because it involves controlled vocabulary and classification. The name of each facet and the terms within the facet constitute a simple two-level hierarchy.
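Filtering on a faceted taxonomy can be sketched in a few lines of Python. The facet names follow the examples above, but the terms, the tagged items, and the filter_items function are hypothetical. The sketch also assumes a common convention that is not stated in this article: terms selected within one facet are alternatives (OR), while selections across facets must all match (AND).

```python
# Each facet is its own small controlled vocabulary (terms are hypothetical).
FACETS = {
    "Product/Service": {"Consulting", "Software"},
    "Market Segment": {"Healthcare", "Retail"},
    "Location": {"Europe", "North America"},
    "Document Type": {"Case Study", "Proposal"},
}

# Hypothetical content items tagged with terms from several facets.
ITEMS = {
    "item-1": {"Product/Service": {"Software"}, "Market Segment": {"Retail"},
               "Location": {"Europe"}, "Document Type": {"Case Study"}},
    "item-2": {"Product/Service": {"Consulting"}, "Market Segment": {"Healthcare"},
               "Location": {"North America"}, "Document Type": {"Proposal"}},
    "item-3": {"Product/Service": {"Software"}, "Market Segment": {"Healthcare"},
               "Location": {"Europe"}, "Document Type": {"Proposal"}},
}


def filter_items(selections: dict) -> list:
    """Return item ids matching the selected terms.

    Assumed convention: OR within a facet, AND across facets.
    """
    for facet, terms in selections.items():
        unknown = set(terms) - FACETS[facet]
        if unknown:
            raise ValueError(f"Not controlled terms in {facet}: {sorted(unknown)}")
    return sorted(
        item_id
        for item_id, tags in ITEMS.items()
        if all(tags.get(facet, set()) & set(terms)
               for facet, terms in selections.items())
    )
```

Selecting "Software" alone would return item-1 and item-3 in this sketch; adding "Healthcare" from the Market Segment facet narrows the result to item-3.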

Each facet is also a metadata field/element. The person designing a faceted taxonomy is thus also designing metadata, at least some of it. There are usually more metadata fields to describe the content beyond those that make up the taxonomy facets. For a faceted taxonomy to best serve the user who is trying to find content based on what it is and what it is about, the number of facets should be limited, for example to between four and eight. Metadata, however, can serve additional purposes beyond helping users find content. Metadata may describe content for purposes of full identification, source citation, or information on how the content can be used, including rights data. It needs to be decided which metadata fields will constitute a displayed faceted taxonomy for the end-user to utilize in search, and which metadata fields will not be part of it but will instead display on a selected content record.

On the other hand, there may be additional metadata fields beyond the scope and definition of “taxonomy” that are nevertheless made available to the end-user to filter/refine results alongside the taxonomy facets. These could be for author/creator, date, title keyword, text keyword, file format, etc. Sometimes the distinction between a taxonomy facet and other metadata is not so clear in this case, such as for Document/Content Type, Audience, or Language, when these fields utilize controlled vocabularies. Due to this overlap and blurred distinction between taxonomy facets and displayed metadata for filtering, it is a good idea to design the taxonomy and metadata together as an integrated strategy.
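One way to record these design decisions is to assign each metadata field a role. The Python sketch below is only an illustration under assumed names: the Role categories, the field list, and the helper functions are hypothetical, and the default limits in facet_count_ok simply reflect the range of four to eight facets suggested above.

```python
from enum import Enum


class Role(Enum):
    TAXONOMY_FACET = "taxonomy facet"       # controlled vocabulary shown as a facet
    OTHER_FILTER = "other filter"           # filterable, but outside the taxonomy
    RECORD_ONLY = "record display only"     # shown only on the content record


# Hypothetical assignment of roles to metadata fields.
FIELD_ROLES = {
    "Product/Service": Role.TAXONOMY_FACET,
    "Market Segment": Role.TAXONOMY_FACET,
    "Location": Role.TAXONOMY_FACET,
    "Document Type": Role.TAXONOMY_FACET,
    "Author": Role.OTHER_FILTER,
    "Date": Role.OTHER_FILTER,
    "File Format": Role.OTHER_FILTER,
    "Source Citation": Role.RECORD_ONLY,
    "Usage Rights": Role.RECORD_ONLY,
}


def fields_with(role: Role) -> list:
    """List the field names assigned to one role, in schema order."""
    return [name for name, assigned in FIELD_ROLES.items() if assigned is role]


def search_filters() -> list:
    """Fields offered to the end-user for filtering: facets first, then others."""
    return fields_with(Role.TAXONOMY_FACET) + fields_with(Role.OTHER_FILTER)


def facet_count_ok(low: int = 4, high: int = 8) -> bool:
    """Check that the number of taxonomy facets stays within the chosen limits."""
    return low <= len(fields_with(Role.TAXONOMY_FACET)) <= high
```

Keeping the roles in one place makes the overlap visible: moving a field such as Document Type between TAXONOMY_FACET and OTHER_FILTER changes how it is governed, not whether users can filter on it.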