Started in 2005 as the metadata registry for the National Science Digital Library (NSDL), the NSDL Registry has undergone significant changes. It has expanded, moved from its original funding source to the support of JES & Co. and, in 2010, been renamed the Open Metadata Registry (OMR). While serving end users seeking vocabularies available for a project or group, the OMR focuses on supporting the primary providers of vocabularies by facilitating the publishing and version tracking of their evolving vocabularies. Work is ongoing to enable the OMR to support multilingual vocabularies and separate but associated language versions. To keep identifiers stable as term labels change, the OMR encourages the use of uniform resource identifiers (URIs) in which concepts are represented by numeric strings. The OMR is currently being reengineered in Drupal. As it undergoes dramatic changes, the OMR is offered as a valuable infrastructure tool for data sharing.

metadata
metadata registries
URI
collaboration
controlled vocabularies

Bulletin, April/May 2011


The Open Metadata Registry: An Update

by Jon Phipps and Diane I. Hillmann

Metadata registries have been a part of the landscape for several years, as the need for enabling infrastructure for data sharing has become more critical. Most research implementations have approached registries from the standpoint of particular projects and domains and have encountered serious issues in attempting to move from research demonstration to production services. The Open Metadata Registry began its development as the NSDL Registry, attempting to address the big question: What should these registries do and how can they operate in an open services environment?

Registry Scope and Rationale
Initial funding for the NSDL Registry was provided in 2005 by the U.S. National Science Foundation, intended to support the large National Science Digital Library (NSDL), then approaching the end of its first funding and development cycle. The project group decided to start by building services around controlled vocabularies, then the major concern of the NSDL community. As the draft specifications for the Simple Knowledge Organization System (SKOS) had just been introduced, the timing seemed propitious and the NSDL Registry became the first system to effectively adopt SKOS as its core standard.

The NSDL Registry was renamed the Open Metadata Registry (OMR) in July of 2010 in recognition of the broader scope of the registry and the end of its funding from the NSF. In 2010 the OMR became a fully supported project of JES & Co., a non-profit active in the education sector. 

Registry Services
At an initial planning level, two sets of service categories were identified. One set of categories was based on the idea that vocabularies have both users and owners, and each depends on the activities of the other. The second set of categories, further refining the idea of users, recognized that the consumer of the services could be a human or a machine. 

A typical use case for human users begins with the need to search or browse the registry for vocabularies that might suit the needs of a project or community, most often during planning phases for projects or application profiles. For users who have already chosen a vocabulary, the registry project decided to provide services that allow for the optimal maintenance of the chosen vocabulary within instance data. Services to both human and machine users depend on the ability to provide configurable notifications that can be acted on with minimal intervention. Currently the OMR provides a simple RSS feed as the basis for more sophisticated notification to come.

Ultimately, registry success relies much more on the utility of its services to vocabulary providers than on the utility of its services to vocabulary consumers. If vocabulary providers cannot find a reason to continue to update their vocabularies in the registry, users will need to find other ways outside the registry to maintain their data. Given that reality, it is critical for this category of services that the registry be an integral part of the document/publish strategy for vocabulary owners and managers and not just another task with little or no immediate payback. The current OMR has therefore focused on two things: providing important and distinctive services to owners, in support of group-based vocabulary development, versioning and change tracking, and multi-lingual vocabularies; and helping to bridge the technical gaps between the XML and RDF worlds.

Versioning Challenges
Because the early development planning for the registry included a significant amount of study of versioning issues, the registry functionality stands out as a model for other service providers. The Registry strategy for tracking change relies partially on the software development model, where recognition of “diffs,” or differences between one version and the next, is the norm. Use of this model allows a complete history of all changes, including who made each change, to be maintained and accessed by administrators, maintainers and users.

Part of the challenge of a dynamic, public vocabulary maintenance environment involves providing support for systems that are dependent on static, rarely changing vocabularies. In addition to the basic history tracking, the OMR also enables vocabulary consumers to access fixed-point-in-time “timeslices” by use of timestamp-filtered URIs (Uniform Resource Identifiers). Vocabulary owners may deliberately identify and publish a timeslice by tagging it with a named version alias. 
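The relationship between a change history, a timeslice and a named version alias can be pictured with a minimal sketch. It is an illustration only, not the OMR's implementation: the Change record, the timeslice and resolve_alias functions, the example.org URI and the "v1.0" alias are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One recorded edit: who changed which property of which concept, and when."""
    when: datetime
    who: str
    concept_uri: str
    prop: str
    new_value: Optional[str]  # None records the removal of the property


def timeslice(history, as_of):
    """Replay every change made at or before `as_of` to rebuild the vocabulary state."""
    state = {}
    for change in sorted(history, key=lambda c: c.when):
        if change.when > as_of:
            break  # later changes are invisible to this timeslice
        props = state.setdefault(change.concept_uri, {})
        if change.new_value is None:
            props.pop(change.prop, None)
        else:
            props[change.prop] = change.new_value
    return state


def resolve_alias(history, aliases, name):
    """A named version alias is simply a label attached to a fixed timestamp."""
    return timeslice(history, aliases[name])


# Hypothetical data: one concept whose preferred label is later revised.
URI = "http://example.org/vocab/1001"
history = [
    Change(datetime(2010, 1, 5), "alice", URI, "prefLabel", "Maps"),
    Change(datetime(2010, 6, 1), "bob", URI, "prefLabel", "Cartographic materials"),
]
aliases = {"v1.0": datetime(2010, 3, 1)}

frozen = resolve_alias(history, aliases, "v1.0")    # state as published under "v1.0"
current = timeslice(history, datetime(2011, 1, 1))  # state after both changes
```

Because the history is never overwritten, a consumer that depends on a static vocabulary can keep asking for the aliased timeslice while maintainers continue to edit the live version.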

Multilingual Vocabularies
At present, vocabularies with more than one language represented are treated as single vocabularies. This allows each URI to identify a single concept containing labels in multiple languages and decreases the need for synchronization. Other strategies (separate but related vocabularies with defined lexical relationships), as well as additional options for building and retrieving specific language versions, have been specified but not yet enabled. The single-vocabulary strategy is currently being used successfully within the RDA Vocabularies (http://metadataregistry.org/rdabrowse.htm).
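A small sketch may help picture the single-vocabulary approach, in which one URI carries labels in several languages. The Concept class, the language_version helper and the example.org URIs are hypothetical and are not taken from the OMR; the helper only suggests what retrieving a specific language version could look like.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A single concept, identified by one URI, with one label per language tag."""
    uri: str
    labels: dict = field(default_factory=dict)  # e.g. {"en": "Map", "fr": "Carte"}

    def label(self, lang, fallback="en"):
        """Return the label in `lang`, falling back to another language if absent."""
        return self.labels.get(lang, self.labels.get(fallback))


def language_version(concepts, lang):
    """Extract a single-language view: URI -> label, for concepts labeled in `lang`."""
    return {c.uri: c.labels[lang] for c in concepts if lang in c.labels}


# Hypothetical vocabulary with partial French coverage.
vocabulary = [
    Concept("http://example.org/vocab/1001", {"en": "Map", "fr": "Carte"}),
    Concept("http://example.org/vocab/1002", {"en": "Globe"}),
]
french_view = language_version(vocabulary, "fr")
```

Since every language shares the same concept URI, adding or correcting a translation touches one record, which is why this arrangement needs less synchronization than separate per-language vocabularies.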

The Challenge of URIs
The OMR enables vocabulary owners to use previously established URIs, to build new URIs using a specific domain, or to use the metadataregistry.org domain as a functional permanent URL (PURL) resolution service. Resolution services for non-metadataregistry.org domains may be accomplished by using a provided script and the OMR’s REST (Representational State Transfer) services. Regardless of the domain used, the registry software provides assistance in building URIs to minimize the possibility of typos.

As part of the effort to analyze the implications of vocabulary changes on the OMR, it became clear that URIs built from language-specific term names or labels (a practice common in some vocabularies in an effort to improve the “human readability” of URIs) could eventually fall out of sync with the term labels as vocabularies evolved, and that the practice had further implications for multi-lingual vocabularies. For this reason, the controlled vocabulary side of the registry uses numeric strings in URIs by default and encourages vocabularies that have not already committed to using term names to follow suit. Given the rarity of multi-lingual labeling and the more static nature of property names, the default for “Element Set” URIs is the property/element name.
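The difference between the two URI styles can be shown with a brief sketch. Everything in it is assumed for illustration: the example.org namespace, the label_based_uri function, the NumericMinter class and the starting number 1001 are hypothetical and do not describe how the OMR actually mints identifiers.

```python
import re

BASE = "http://example.org/vocab/"  # hypothetical namespace


def label_based_uri(label):
    """Build a 'human readable' URI by squeezing the label into a CamelCase token."""
    token = re.sub(r"[^A-Za-z0-9]+", "", label.title())
    return BASE + token


class NumericMinter:
    """Hand out sequential numeric identifiers that carry no language or label."""

    def __init__(self, base, start=1001):
        self.base = base
        self._next = start

    def mint(self):
        uri = f"{self.base}{self._next}"
        self._next += 1
        return uri


# A concept is created with the label "Maps" and later relabeled.
minted_from_label = label_based_uri("Maps")
would_be_today = label_based_uri("Cartographic materials")
label_uri_is_stale = minted_from_label != would_be_today  # URI no longer matches the label

numeric_uri = NumericMinter(BASE).mint()  # unaffected by any relabeling or translation
```

The label-based URI must either stay frozen with an outdated English term embedded in it or change and break existing references, whereas the numeric URI remains valid regardless of how the labels evolve or which languages are added.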

Next Steps
One of the significant disadvantages of being an early adopter of technology, especially Semantic Web technology, is that the technology upon which you base your systems may move forward (or backward) in unpredictable ways. Over the last few years it has become increasingly clear that some of the technology upon which the OMR depends needs to be refreshed. The financial support and stability provided by JES & Co. have enabled a thorough technological review of the OMR and, after more than a year of studying various options and a number of private experiments, the decision was made to completely re-engineer the OMR in Drupal.

All of the current functionality of the OMR will be retained and enhanced by the inclusion of modules provided by the highly active Drupal development community. Vocabulary, Element Set and ontology import (in multiple formats) will be enabled along with indexing provided by SOLR-based services. The existing OMR vocabulary services will be extended to the development and maintenance of OWL (Web Ontology Language) ontologies and Dublin Core Application Profiles. Multi-lingual development will be enhanced and simplified, enabling maintainers to work exclusively in the language of their choice and providing utilities to display incomplete translations. Vocabulary visualization and graphical browsing will finally come to the OMR. The OMR REST-based services will be overhauled and content-negotiation enhanced. Integrated vocabulary mapping and cross-walk services will be provided.

One of the advantages of being an early provider of services is recognizing that there’s no substitute for real experience and real users. Bringing this experience, and our users, to the next iteration of the Open Metadata Registry provides an exciting challenge for the coming year.


Jon Phipps is lead scientist for Internet strategy, JES & Co. He can be reached by email at jonP<at>jesandco.org.

Diane I. Hillmann is director of metadata at Metadata Management Associates. She can be reached by email at metadata.maven<at>gmail.com or dih1<at>cornell.edu.