As taxonomies and other knowledge organization systems expand, the need for a shared space to collect and describe these terminologies is becoming more pressing. A solution is the TaxoBank Terminology Registry, sponsored by Access Innovations, Inc. TaxoBank borrows from a terminology registry metadata model developed in the United Kingdom, offering a set of metadata to describe a vocabulary, its major and minor topics, languages, conditions for use, provider and more. TaxoBank’s coverage is broad, ranging from large and formal vocabularies from governmental entities to small and quirky thesauri submissions from library and information science students. The registry is a collaborative effort, welcoming all to contribute and describe a vocabulary, as well as to review, borrow and learn from other taxonomy submissions.

terminology
terminology registries
controlled vocabularies
metadata
collaboration

Bulletin, April/May 2011


Developing an Eclectic Terminology Registry

by Marjorie Hlava

This era of online sharing of resources is also an era that has seen a proliferation of terminology resources, which have become indispensable in knowledge management. Inevitably, there is a need for shared means of sorting out the characteristics of existing terminologies. The information world has started to see the establishment of terminology registries to answer this need. These annotated directories of existing terminology resources are still few and far between, however. 

This article describes the genesis and establishment of one terminology registry. One aspect that sets it apart is its open, collaborative nature. I urge you, our readers, to consider exploring it and then contributing to it in some way. The TaxoBank Terminology Registry (www.taxobank.org/) contains information about controlled vocabularies of all types and complexities. We invite you to both browse and contribute. Enjoy term lists for special purpose use, get ideas for building your own vocabulary and perhaps find one that can give you a quicker start.

Why Did We Decide to Implement a Terminology Registry?
I’m the president of a company that offers taxonomy-related software and services. Sometimes, our customers ask where they can find instructive examples of taxonomies or thesauri. They also ask about taxonomies and thesauri that might be available to serve as a basis for their own organization’s controlled vocabulary, of whatever kind, to help them avoid re-inventing the wheel.

This situation prompted me to think about having our company sponsor and maintain a terminology registry. As it turns out, my motivations were perfectly in line with most of the purposes identified by Gail Hodge and her colleagues for a terminology resource registry [1]:

  • Make traditional resources more visible.
  • Provide key characteristics of resources.
  • Encourage human assessment of these resources by applicability to semantic projects.
  • Promote information exchange and knowledge sharing.

Initially, we called our registry Taxonomy ShareSpace. We eventually settled on TaxoBank since other terminology banks exist. The “bank” aspect was played up in the motto on the registry website: “Access, deposit, save, share and discuss taxonomy resources.”

How to Describe the Terminologies?
A substantial portion of the budding literature on terminology registries focuses on what metadata to include and on what metadata is essential for each terminology resource. We based the TaxoBank metadata model on the comprehensive recommendations of the United Kingdom’s Joint Information Systems Committee’s (JISC) Terminology Registry Scoping Study (TRSS), recommendations developed for planning its own registry [2]. That study, in turn, was based partly on the work and recommendations of such experts as Gail Hodge and Marcia Zeng [3,4], as well as on observations of metadata present in existing terminology registries. Our initial metadata model was based on the core metadata fields that the TRSS recommended; the expanded model included some (but not all) of the optional metadata fields mentioned in the TRSS. (See also the paper on KOS metadata by Zeng and Hodge in this issue [5].)

Decisions on the metadata items to be included involved quite a lot of discussion among my staff. We ended up using dozens, but not all, of the TRSS recommended or suggested items in our terminology record template. (I won’t list all of them here; you can go to the website to see them.) However, only two metadata items are required for entry of a record: the name of a terminology and a descriptive sentence or paragraph. 
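To make the required/optional split concrete, here is a minimal sketch of such a record in Python. It is not TaxoBank's actual template or code: the class name, the field names (loosely taken from the kinds of metadata mentioned in this article) and the helper methods are all assumptions made for illustration. The only rule drawn from the text is that a name and a description are required.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TerminologyRecord:
    """Hypothetical registry record; field names are illustrative only."""

    # The two items that the article says are required for entry.
    name: str
    description: str
    # A few optional items (assumed names, not the real template).
    major_subjects: list[str] = field(default_factory=list)
    minor_subjects: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)
    conditions_of_use: str = ""
    provider: str = ""

    def missing_required(self) -> list[str]:
        """Return the names of required fields that are empty or blank."""
        required = {"name": self.name, "description": self.description}
        return [key for key, value in required.items() if not value.strip()]

    def is_complete(self) -> bool:
        """True only when every field, required or optional, has a value."""
        return all(bool(value) for value in vars(self).values())


# A record with a blank description fails the required-field check.
draft = TerminologyRecord(name="Example Thesaurus", description="")
print(draft.missing_required())  # ['description']
```

Keeping the required set this small lowers the barrier to contributing, at the cost of many records being only partly filled in.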

One debatable metadata item was “Minor subjects covered.” Sometimes the vocabulary coverage includes well-specified corollary areas. For those vocabularies, knowing what these areas are can be valuable. For most terminologies, though, this metadata item can be downright silly: "Here are the subjects that aren't so important." Because of the item’s value for some terminologies, we nonetheless ended up including it.

And then there’s the consideration of how to implement metadata entry. We were, for example, going to have a pick list for languages, based on one of the standardized lists. Then we found that some of the richest multilingual terminologies have languages that the lists don’t cover. Moreover, at least one of the terminologies that TaxoBank covers has terms in over 100 languages. That one would demand a lot of picking away at a pick list!
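One way to accommodate languages that a standardized list does not cover is to accept free text and merely flag unfamiliar names instead of rejecting them. The Python sketch below illustrates that idea under stated assumptions: the tiny `KNOWN_LANGUAGES` table and the `parse_languages` function are hypothetical, and nothing here describes how TaxoBank actually handles language entry.

```python
from __future__ import annotations

# Illustrative stand-in for a standardized language list (assumed subset).
KNOWN_LANGUAGES = {"english": "English", "french": "French", "spanish": "Spanish"}


def parse_languages(raw: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated entry into recognized and unrecognized names."""
    recognized: list[str] = []
    unrecognized: list[str] = []
    for item in raw.split(","):
        label = item.strip()
        if not label:
            continue  # Skip empty pieces such as a trailing comma.
        canonical = KNOWN_LANGUAGES.get(label.lower())
        if canonical is not None:
            recognized.append(canonical)
        else:
            # Keep the contributor's wording rather than rejecting it.
            unrecognized.append(label)
    return recognized, unrecognized
```

With this approach, a contributor describing a vocabulary in 100 languages can paste a single list, and an editor can review only the unrecognized names.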

While we hope that thesaurus sponsors and editors will provide metadata in the future, we got TaxoBank off to a running start by researching the websites of the terminologies that we wanted to cover. This research was time-consuming, and the results varied depending on what information was available on those websites. Of the many terminologies covered in the registry, we have been able to provide entries for all the metadata fields for only one: the National Agricultural Library Thesaurus, or NAL Thesaurus. (Thank you, NAL Thesaurus coordinator Lori Finch, for documenting the thesaurus so well!)

Serious Ones and Fun Ones: Which Terminologies to Cover?
We deal with a range of taxonomy and thesaurus needs across many knowledge domains, so we wanted to cover a good cross-section of what exists in the world of terminologies. In addition to varied subject matter, we wanted to include a variety of languages and KOS structures (thesauri, ontologies, glossaries and so forth).

We definitely wanted to cover the major terminologies, the ones of national or international significance. Sure, other registries have already included many of these, but usually without the metadata that we wanted to include. Some of the major terminologies are proprietary but can be licensed, while others are freely available; this kind of information is included in the records for the individual terminologies. The larger terminologies were created and are maintained by such organizations as the United Nations, major universities, the Getty Foundation, the Library of Congress and federal agencies of various countries.

We also wanted the registry to be interesting and, yes, fun. So in addition to the must-have thesauri sponsored by prominent national and international organizations, we also cover the unusual, creative ones that show what it is possible to do with even a small terminology. One of the more unusual ones (although totally serious within the cheese industry) is the Affineur’s Concept Map, which explores types of cheese by rind type. 

The people who are perhaps in the best position to create the most interesting thesauri are library and information science students. While their creations may not be completely typical of KOS, they do exemplify what can be done with skill, thought and a bit of creativity. As I write this, TaxoBank’s Vocabulary Spotlight (a regular feature of the website) is on the 85-term Thesaurus of Bellydancing, a student group project authored by “the League of Bellydancing Librarians, Indexers and Archivists (LoBeLIA).” Student thesauri are usually hosted on university websites and, as a result, are in danger of being purged. We therefore invite the authors of such thesauri to have them hosted on TaxoBank. So to a limited extent, TaxoBank serves as a KOS repository, as do some other terminology registries.

Some of the more unusual terminologies also show some of the more unusual possibilities of KOS implementation. These implementations can give us a new perspective on the world. For example, the Antarctic Thesaurus is an audiovisual thesaurus. The purpose field explains: “Antarctica is a continent of extremes, and notoriously indescribable. … Art and science offer different ways for us to connect with Antarctica and to what it is telling us about significant changes that are happening in our environment. … This thesaurus offers images, sounds, animations and words that reveal some different ways people are thinking and feeling about Antarctica today.” 

In the case of the Antarctic Thesaurus, we received an updated description from the author, who wanted to emphasize in the description field that the thesaurus uses a new language of lines and gestures. We have also received recommendations from the “outside world” concerning what terminologies to include.

You’re All Welcome: An Open Terminology Registry
Part of the value of having a collaborative platform is that the authors or owners of a vocabulary can contribute and edit the metadata. They can also contribute a metadata set for a terminology not already in TaxoBank. We welcome TaxoBank website visitors to edit and enter content directly onto the appropriate web pages. Visitors can be participants and authors. They can start by requesting a user name and password, and users can create the following content:

  • blog entries
  • event announcements
  • stories/articles
  • terminology metadata (using the template discussed above)
  • eventually, downloaded terminologies.

We do monitor changes to guard against the unlikely possibility of vandalism.

Where Are We Going With This Registry?
We’re still not sure which of the following features we’ll end up implementing. All are possible.

  • Storage of full contributed terminologies
  • Use of SKOS wherever practicable
  • Search capability across terminologies
  • Automated indexing of descriptions
  • A navigation tree based on the NICEM Thesaurus as a “spine” for the collection.

We do know that as KOS continue to be developed, TaxoBank will expand its coverage in response to this growth. And we hope that the community of contributors and collaborators will also grow. You are invited.

Resources Mentioned in the Article
[1] Hodge, G., Salokhe, G., Zolly, L., & Anderson, N. (2007). Terminology Resource Registry: Descriptions for humans and computers [PowerPoint slides]. Presented at Integrating Standards in Practice, 10th Open Forum on Metadata Registries, New York City, NY, USA, July 9-11, 2007.

[2] Golub, K., & Tudhope, D. (2008, rev. 2009). Terminology Registry Scoping Study (TRSS): Final report. Bath, Eng.: UKOLN, 2008. Retrieved March 5, 2011, from www.jisc.ac.uk/media/documents/programmes/sharedservices/trss-report-final.pdf.

[3] Zeng, M. L. (2008). Metadata and terminology registries: Synergies and differences [PowerPoint slides]. Presented at the NKOS Workshop, DC-2008: International Conference on Dublin Core and Metadata Applications, Berlin, Sept. 22-26, 2008. Retrieved March 5, 2011, from www.slideshare.net/mzeng/metadata-and-terminology-registries.

[4] Zeng, M. L. (2008). Metadata for terminology/KOS resources [PowerPoint slides]. Presented at New Dimensions in Knowledge Organization Systems, a Joint NKOS/CENDI Workshop, September 11, 2008, The World Bank, Washington, DC. Retrieved March 5, 2011, from www.slideshare.net/mzeng/metadata-for-terminology-kos-resources.

[5] Zeng, M. L., & Hodge, G. (In press). Developing a Dublin Core Application Profile for the knowledge organization systems (KOS) resources. Bulletin of the American Society for Information Science & Technology, 37(4), 30-34.


Marjorie Hlava is president and chairman of Access Innovations/Data Harmony. She can be reached by e-mail at mhlava<at>accessinn.com.