
Bulletin, June/July 2007


FaceTag: Integrating Bottom-up and Top-down Classification in a Social Tagging System

by Emanuele Quintarelli, Andrea Resmini and Luca Rosati

Emanuele Quintarelli (info<at>infospaces.it) is an IT consultant, customer experience expert and information architect at Reed Business Information. Andrea Resmini (root<at>resmini.net) holds a master's degree in architecture and is currently a Ph.D. candidate in legal informatics at CIRSFID, Università di Bologna. Luca Rosati (luca<at>lucarosati.it) is a freelance information architect and assistant professor in informatics for humanistic science at Università per Stranieri di Perugia in Italy.

Disclaimer
Though the paper is the result of a collaborative effort, Rosati wrote the sections "Faceted Analysis: FaceTag's Faceted Scheme Construction" and "User Research", Resmini wrote "Using FaceTag", and Quintarelli wrote the others.

Collaborative tagging systems are powerful tools for organizing, browsing and publicly sharing personal collections of resources on the World Wide Web. They have enjoyed widespread adoption by end-users.

Collaborative tagging produces aggregations of user metadata, often referred to as folksonomies. These user-generated classifications emerge through bottom-up consensus as users assign free-form keywords to online resources for personal or social benefit. Del.icio.us, Flickr, 43things, Furl and Technorati are examples of web-based collaborative systems for building shared databases of items. The users of these systems create a flat metadata vocabulary that can be used to perform metadata-driven queries, to monitor change in areas of interest or to discover emergent trends, such as the hottest/most popular topics in the system. In the past, folksonomies have often been seen as orthogonal to taxonomies and controlled vocabularies: the latter being rigid, hierarchical and hand-crafted a priori by professionals; the former being flat, inclusive and emerging from users' bottom-up input and consensus [1].

Despite their low cognitive cost, their ability to match users' real needs and language, and their great value for serendipitous browsing, folksonomies are haunted by a number of important issues: intrinsic language variability and imprecision, and the lack of good tools to help users navigate through the mass of tags. Because the process of associating words and meanings is inherently inconsistent, evolving and quite variable, tagging systems are plagued by polysemy, homonymy, plurals, synonymy and basic level variation, linguistic issues which do not appear easy to solve [2]. Any of these problems can dramatically reduce the effectiveness and benefits of tagging systems.

From an information visualization perspective, tag clouds are widely used as visual interfaces for information retrieval in tagging systems. They provide a global contextual view of tags assigned to resources in the system. In such a structure, the most popular tags are usually displayed through an alphabetically ordered list in which the font size increases with the tag's relevance. Users browse the cloud, scanning hyperlinks to recognize information of interest [3].
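The tag-cloud construction just described can be sketched in a few lines of Python. This is a minimal sketch, not code from any of the systems mentioned. The function name `tag_cloud`, the 20-tag cutoff and the linear scaling between hypothetical 10px and 32px font sizes are all illustrative assumptions; many real clouds scale logarithmically instead.

```python
from collections import Counter


def tag_cloud(tag_counts, max_tags=20, min_size=10, max_size=32):
    """Return (tag, font_size) pairs for the most used tags, in alphabetical order."""
    # Keep only the max_tags most frequently used tags.
    popular = Counter(tag_counts).most_common(max_tags)
    if not popular:
        return []
    counts = [count for _, count in popular]
    low, high = min(counts), max(counts)
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    cloud = []
    # Alphabetical (case-insensitive) order, as in a typical tag cloud.
    for tag, count in sorted(popular, key=lambda pair: pair[0].lower()):
        # Linear interpolation between min_size and max_size (an assumption).
        size = min_size + (max_size - min_size) * (count - low) / span
        cloud.append((tag, round(size)))
    return cloud


usage = {"folksonomies": 120, "navigation": 45, "taxonomy": 80, "breadcrumbs": 10}
for tag, size in tag_cloud(usage):
    print(f"{tag}: {size}px")
```

Because tags are selected purely by count, the few most popular tags always dominate the output. This is the semantic-density problem described in the list of issues that follows.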

The problem with this approach is that flat tag clouds are not sufficient to provide a semantic, rich and multidimensional browsing experience over large tagging spaces. There are several reasons for this:

  • Tag clouds do little to address the language variability issue, so the findability and scalability of the system remain very low.
  • Choosing tags by frequency of use inevitably causes a high semantic density with very few well-known and stable topics dominating the scene.
  • Providing only an alphabetical criterion for sorting tags severely limits users' ability to quickly navigate and scan the cloud, to extract information from it, and hence to build a coherent mental model out of the tags.
  • A flat tag cloud cannot visually support semantic relationships between tags. We suggest that these relationships are needed to improve the user experience and general usefulness of the system.

The FaceTag collaborative tagging tool addresses these issues and contributes to social tagging systems in three ways:

  • Optional tag hierarchies are possible. Users have the opportunity to organize their resources by means of parent-child relationships.
  • Tag hierarchies are semantically assigned to editorially established facets that can be leveraged later to flexibly navigate the resource domain.
  • Tagging and searching can be mixed to maximize findability, browsability and discovery.

Even if FaceTag doesn’t promise to solve all problems, we believe our approach can limit the impact of some of them, while introducing an innovative, multidimensional and more semantic paradigm for organizing, navigating and searching large information spaces through tags.

Overview of Semantic Structures in Tagging Systems
Usability studies show that information seekers in domains with a large number of objects prefer that related items be in meaningful groups to enable them to quickly understand relationships and thus decide how to proceed [4]. In other words, it seems quite clear that without any means to explore and make sense of large quantities of similar items, users feel lost and fail to complete their information-seeking tasks.

Providing ways to generate and navigate such groups from a flat set of objects is a challenge. Both clustering and faceted classification have been proposed in the past as useful techniques to allow searchers to easily browse and navigate information spaces.

Clusters. Document clustering refers to the act of grouping items according to some measure of similarity, typically by searching for identifiable repetitive patterns of words and phrases. Some advanced tagging systems like Rawsugar and Flickr already use clusters to address the issues that plagued the first generation of folksonomy-based applications. Clusters help reduce the semantic density and improve the visual consistency of tag clouds. Moreover, clustering is automatable and can be used to refine vague queries and disambiguate search keywords.
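To make the idea concrete, the following Python sketch clusters tags by how much their resource sets overlap, measured as Jaccard similarity. It merges any pair of tags above a threshold using a union-find structure. This is a deliberately naive, hypothetical illustration, not the method used by Rawsugar, Flickr or any other system. The function names, the data layout and the 0.5 threshold are assumptions.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tags(tag_resources, threshold=0.5):
    """Single-link clustering of tags by shared resources.

    tag_resources maps each tag to the set of resource ids it labels.
    Two tags share a cluster if a chain of pairs with Jaccard
    similarity >= threshold connects them.
    """
    tags = sorted(tag_resources)
    parent = {tag: tag for tag in tags}  # union-find forest

    def find(tag):
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]  # path halving
            tag = parent[tag]
        return tag

    for i, first in enumerate(tags):
        for second in tags[i + 1:]:
            if jaccard(tag_resources[first], tag_resources[second]) >= threshold:
                parent[find(first)] = find(second)

    clusters = {}
    for tag in tags:
        clusters.setdefault(find(tag), []).append(tag)
    return sorted(clusters.values())


tag_resources = {
    "ia": {1, 2, 3, 4},
    "information architecture": {1, 2, 3},
    "usability": {5, 6},
    "ux": {5, 6, 7},
    "python": {8},
}
print(cluster_tags(tag_resources))
```

The resulting groups carry no labels. Naming them is left to a human, which is one of the weaknesses of clustering.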

Nonetheless, clustering techniques and algorithms are not perfect and often generate messy groups which are generally hard to predict. These groups also tend to conflate many different dimensions, which makes them hard to label in ways that are meaningful for users. Moreover, clustering does not generally allow issuing refinement and follow-up queries, thus heavily limiting the explorative capabilities of the system.

For these reasons, usability results show that users prefer clear hierarchies with categories at uniform levels of granularity over the messy, unpredictable and unlabeled groupings typical of clustering techniques [4].

Hierarchical Facets. At the other end of the classification continuum, traditional hierarchical categories are coherent and complete systems of meaningful labels that systematically organize a domain. The main drawback of this approach is that a single, monolithic hierarchy defined a priori rarely matches the varied ways in which different users think about and organize the world.

Hierarchical faceted metadata has been shown to be a promising middle ground, able to satisfy the needs of a wide range of users with different mental models and vocabularies [5]. Facets are orthogonal categories of terms (here tags) within a metadata system. Each facet has a name, and it addresses a different conceptual dimension or feature type relevant to the collection, such as activities, components, geographical locations, forms or languages. Facets can be flat or hierarchical. A faceted search interface requires that each object in the collection be classified using one or more tags/terms (or foci, as they are technically called in faceting) from one or more different facets.

In a hierarchical faceted navigation tool, choosing a term that has sub-terms from one of the facets is equivalent to performing a disjunction (Boolean OR) over all the terms subordinate to the selected one. For example, choosing "navigation design" from the Themes facet would provide a search over all the navigation types listed in the facet, such as "breadcrumbs," unless the user chose to narrow the search. When the user chooses terms (tags) from different facets such as Themes and Forms, however, systems typically conjoin them automatically (Boolean AND), for example "breadcrumbs" AND "case study." The complete search thus combines a disjunction of all the terms selected within each facet with a conjunction across the facets. In this kind of interface, users can navigate multiple faceted hierarchies at the same time. Usability studies show that users prefer this approach over single hierarchies because they feel in control without getting lost [5] [6].
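The Boolean logic described above can be sketched in Python as follows. The data layout is an illustrative assumption: each resource is mapped to a set of (facet, tag) pairs, and hierarchies are held in one shared parent-to-children mapping. All function names are assumptions too. The sketch does not reproduce any particular system's implementation.

```python
def descendants(hierarchy, tag):
    """Return the tag itself plus every tag below it in a parent -> children mapping."""
    found = {tag}
    for child in hierarchy.get(tag, []):
        found |= descendants(hierarchy, child)
    return found


def faceted_search(resources, hierarchy, selection):
    """Return the names of resources matching the selected tags, sorted.

    resources: resource -> set of (facet, tag) pairs assigned to it
    hierarchy: parent tag -> list of child tags
    selection: facet -> list of tags the user selected in that facet
    """
    def matches_facet(assigned, facet, chosen):
        # OR within a facet: any selected tag or any of its sub-tags will do.
        allowed = set().union(*(descendants(hierarchy, tag) for tag in chosen))
        return any((facet, tag) in assigned for tag in allowed)

    # AND across facets: every facet with a selection must be satisfied.
    return sorted(
        resource
        for resource, assigned in resources.items()
        if all(matches_facet(assigned, facet, chosen) for facet, chosen in selection.items())
    )


hierarchy = {"navigation design": ["breadcrumbs", "menus"]}
resources = {
    "Breadcrumbs case study": {("Themes", "breadcrumbs"), ("Resource type", "case study")},
    "Menu patterns": {("Themes", "menus"), ("Resource type", "article")},
    "Folksonomies: power to the people": {("Themes", "folksonomies"), ("Resource type", "article")},
}
# Picking "navigation design" matches anything tagged with one of its sub-tags.
print(faceted_search(resources, hierarchy, {"Themes": ["navigation design"]}))
# Adding a tag from another facet narrows the result (AND).
print(faceted_search(resources, hierarchy,
                     {"Themes": ["navigation design"], "Resource type": ["case study"]}))
```

Selecting two tags in the same facet widens the result, while selecting tags in different facets narrows it. This mirrors the zooming-out and zooming-in behavior discussed next.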

For these reasons, faceted metadata can be used to support navigation along several dimensions simultaneously, allowing seamless integration between browsing and free text searching and an easy alternation between refining (zooming in) and broadening (zooming out) [6]. The major benefits of this approach include a strong reduction in mental effort, since the interface favors recognition over recall, and better support for exploration, discovery and iterative query refinement [4].

Again, usability studies attest that hierarchical faceted interfaces are preferred over simpler keyword based search interfaces, and they document that such interfaces can be easily understood by the average user [5] if iteratively designed and tested to address usability issues [7].

Faceted Analysis: FaceTag's Faceted Scheme Construction
FaceTag is a working prototype of a semantic collaborative tagging tool conceived for bookmarking information architecture resources. It aims to show how the flat keyword space of user-generated tags can be effectively combined with a richer faceted classification scheme to improve the information architecture of a social tagging system.

The choice of facets in the FaceTag system is based on the CRG (Classification Research Group) theory [8]. Although we have remained faithful to the CRG standard categories [9], we have reviewed the resulting facets to fit into a more semantic perspective. Indeed, an aspect often underestimated on the World Wide Web is that both Ranganathan and the CRG described a generic schema for faceted classification which every actual schema can refer to. 

Thus, in a faceted classification project one does not have to rebuild the schema from scratch every time, but can follow a consistent guideline when building the main categories (facets). The CRG postulates 11 to 13 general categories. In Table 1 we show how the CRG standard categories map to the IA-related categories used to define our facets. Table 2 shows actual tags or other terms used in those facets. Terms are yet to be formalized, and facets will be verified in future work. It is our belief that Ranganathan's list and the CRG postulates should be considered guidelines and not a restrictive framework. User needs are paramount and should be used to determine the most valuable facets for every system.


Table 1. FaceTag facets definition by CRG standard categories. (Facets in brackets have been considered of secondary importance and discarded.)



Table 2. Actual FaceTag facets and examples of tags. Hierarchical groups of tags are set off with commas. Tags following a greater than (>) symbol are on the next lower level of the hierarchy.

In the actual implementation, since tags are our foci, facets will be user-generated with the exception of the language facet, which will use a predefined list of languages in the ISO 639-2 notation.

Using FaceTag
FaceTag deals with tags, facets, resources and users in two distinct ways: a browsing/searching mode and a bookmarking/editing mode. The user interface adapts to these two different activities, providing different support tools (navigation, resource and user management) and different behaviors (zooming, tag autocompletion and tag suggestion), respectively. The browsing interface is used to navigate the resources. The bookmarking interface can only be accessed by authenticated users and is used to add new resources to the system. It also provides tools to administer the user’s profile.

Figure 1 illustrates the default browsing mode of FaceTag. This is the first screen a user who is not logged in will see. At the top of the interface is the header area, which contains the main site-wide navigation tabs. Below it are the facet area, holding the facets Resource type, Themes, People and Purposes, and the resource area. In default mode, the resource area lists the most recently added bookmarks and is paginated if necessary.


Figure 1. FaceTag homepage

The left-most container in the facet area holds the search box and the filters for language and publication date. Language and publication date are actual facets. However, they are flat and cannot currently be part of a tag hierarchy, so they are used primarily as simple filters.
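Because language and publication date are flat, they can be applied as plain predicates rather than as hierarchical facets. The sketch below is a hypothetical illustration. The `Bookmark` record, the function name and the five-entry subset of ISO 639-2 bibliographic codes are assumptions, not FaceTag's actual schema.

```python
from dataclasses import dataclass
from datetime import date

# A small, illustrative subset of ISO 639-2 (bibliographic) language codes.
LANGUAGES = {"eng": "English", "fre": "French", "ger": "German", "ita": "Italian", "spa": "Spanish"}


@dataclass
class Bookmark:
    title: str
    language: str  # ISO 639-2 code, e.g. "ita"
    published: date


def apply_flat_filters(bookmarks, language=None, since=None, until=None):
    """Keep bookmarks matching an optional language code and publication-date range."""
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unknown ISO 639-2 code: {language}")
    return [
        b for b in bookmarks
        if (language is None or b.language == language)
        and (since is None or b.published >= since)
        and (until is None or b.published <= until)
    ]


bookmarks = [
    Bookmark("Folksonomies: power to the people", "eng", date(2005, 6, 24)),
    Bookmark("Architettura dell'informazione", "ita", date(2006, 1, 10)),
    Bookmark("Faceted classification primer", "eng", date(2004, 3, 1)),
]
recent_english = apply_flat_filters(bookmarks, language="eng", since=date(2005, 1, 1))
print([b.title for b in recent_english])
```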

The remaining facet containers present, left to right, the most used tags (foci) for Resource type, Themes, People and Purposes. Each facet occupies its own color-coded space, and query previews are provided: the number of resources associated with each tag, gathered automatically from the information stored in the database. First-level tags in a hierarchy are followed by a + (plus) sign, which can be clicked to expand the set and access the complete tree of child tags.
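Query previews amount to counting how many of the resources currently in view carry each tag. Below is a minimal Python sketch. It assumes each resource is stored with a set of (facet, tag) pairs, which is a hypothetical layout and not FaceTag's database schema.

```python
from collections import Counter


def query_previews(resources, matching):
    """Group tag counts by facet for the resources in the current result set.

    resources: resource -> set of (facet, tag) pairs assigned to it
    matching:  the resources currently shown in the resource area
    Returns facet -> [(tag, count), ...], most used first, ties alphabetical.
    """
    counts = Counter()
    for resource in matching:
        counts.update(resources[resource])
    previews = {}
    for (facet, tag), count in sorted(counts.items(), key=lambda item: (-item[1], item[0][1])):
        previews.setdefault(facet, []).append((tag, count))
    return previews


resources = {
    "r1": {("Resource type", "article"), ("Themes", "folksonomies")},
    "r2": {("Resource type", "article"), ("Themes", "navigation")},
    "r3": {("Resource type", "book"), ("Themes", "folksonomies")},
}
print(query_previews(resources, resources))     # everything in view
print(query_previews(resources, ["r1", "r2"]))  # after engaging "article"
```

Recomputing the previews over a narrower result set produces smaller figures. When a facet's only remaining tag matches every resource in view, that facet no longer offers a useful refinement.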

Now suppose a user wants to look for a specific subject. She is on the homepage, and she starts typing Inf into the search entry field. This field uses an autocompletion widget, so as soon as she enters the third character, FaceTag starts to suggest possible choices drawn from the existing tag data stored in the database. In this case it suggests Informatics, Information architecture and Information design.
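The search field stays silent until three characters are typed and then offers stored tags that share the typed prefix. This behavior can be sketched as follows. The class name, the case-insensitive prefix matching and the suggestion limit are assumptions, and FaceTag's actual widget may rank or match differently.

```python
import bisect


class TagAutocompleter:
    """Prefix-based tag suggestions, activated after a minimum number of characters."""

    def __init__(self, tags, min_chars=3):
        # Keep (lowercase form, original form) pairs sorted for binary search.
        self._index = sorted((tag.lower(), tag) for tag in set(tags))
        self.min_chars = min_chars

    def suggest(self, typed, limit=10):
        if len(typed) < self.min_chars:
            return []  # too few characters: offer nothing yet
        prefix = typed.lower()
        # First position whose lowercase form is >= the typed prefix.
        start = bisect.bisect_left(self._index, (prefix,))
        suggestions = []
        for lower, original in self._index[start:]:
            if not lower.startswith(prefix) or len(suggestions) == limit:
                break
            suggestions.append(original)
        return suggestions


completer = TagAutocompleter([
    "Informatics", "Information architecture", "Information design",
    "Interaction design", "Folksonomies",
])
print(completer.suggest("Inf"))
```

Keeping the tags sorted means each keystroke costs a binary search plus a short scan, rather than a pass over every stored tag.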

She decides that Information architecture is what she wants to look for, so she selects that, confirms and clicks Search.

FaceTag engages the tag, makes it active, and both the facet containers and the resource area adjust (zooming in) (Figure 2). What the user sees is now a filtered view, a subset of all available resources based on this active selection. In our example, the returned result set is small, as only a short list of bookmarks has been tagged Information architecture.


Figure 2. Bookmarks tagged with Information architecture

In a real system, the result sets might be orders of magnitude larger, so she might decide she needs to refine her search. She looks at the facet containers and clicks on article from the Resource type facet: tags from different facets can be seamlessly used alongside search tags in one single engagement. FaceTag again engages the tag, and both the facet containers and the resource area adjust (Figure 3). The Resource type facet container is no longer useful to refine this search, and it provides no further possible selections, while the other facets list a smaller number of tags and smaller query preview figures. The resource area is now showing fewer bookmarks.


Figure 3. Refining the results: looking for articles on information architecture

Again, she is not satisfied and clicks folksonomies from the Themes facet (Figure 4). Once more, FaceTag engages the tag. The facet containers and the resource area adjust themselves. This step is the final one for this search, since the result set consists of a single resource that complies with our engaged tags and search tags. As Figure 4 demonstrates, the bookmark "Folksonomies: power to the people" has been tagged with information architecture, article, folksonomies.


Figure 4. The final result set

Note that the facet Purposes in Figure 4 provides no further possible selections, just like Resource type before. The facets Themes and People do not list links, but simple grayed-out tags: these are the tags pertaining to the zoomed-in resource.

It is worth mentioning again that searching and browsing in FaceTag are one seamless process, and users can use either way to proceed, mixing them as they deem appropriate. The engage system keeps track of every step taken during a search, and engaged tags are listed just below their pertaining facet in the facet area. A snapshot feature is available to freeze or bookmark a query zoom state and its result set. 

Users can disengage any tag at any time, in any order. This feature set supports berrypicking search strategies and lets users follow the information scent. Alternatively, a user may decide to deselect all tags at once and start anew by clicking the "Disengage all tags" quick link.
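The engage system can be modeled as a small piece of state. It holds the engaged tags grouped by facet, a history of the steps taken, and a snapshot that freezes the current zoom state. This Python sketch is hypothetical. The class and method names are assumptions made for illustration, as is the use of a "Search" pseudo-facet for tags entered through the search box.

```python
from dataclasses import dataclass, field


@dataclass
class EngageState:
    """Engaged tags grouped by facet, plus the order in which they were engaged."""
    engaged: dict = field(default_factory=dict)  # facet -> list of engaged tags
    history: list = field(default_factory=list)  # (facet, tag) pairs, oldest first

    def engage(self, facet, tag):
        tags = self.engaged.setdefault(facet, [])
        if tag not in tags:
            tags.append(tag)
            self.history.append((facet, tag))

    def disengage(self, facet, tag):
        """Remove one engaged tag, regardless of when it was engaged."""
        if tag in self.engaged.get(facet, []):
            self.engaged[facet].remove(tag)
            if not self.engaged[facet]:
                del self.engaged[facet]
            self.history.remove((facet, tag))

    def disengage_all(self):
        self.engaged.clear()
        self.history.clear()

    def snapshot(self):
        """Return an immutable copy of the current zoom state, e.g. to bookmark it."""
        return tuple(sorted((facet, tuple(tags)) for facet, tags in self.engaged.items()))


state = EngageState()
state.engage("Search", "Information architecture")
state.engage("Resource type", "article")
state.engage("Themes", "folksonomies")
frozen = state.snapshot()                     # bookmarkable zoom state
state.disengage("Resource type", "article")  # any tag, in any order
print(state.engaged)
```

Because the snapshot is a tuple of tuples, it is hashable and can be stored or compared independently of later changes to the live state.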

A user who is allowed to add new resources must first log in (Figure 5). The "New bookmark" page asks the user to enter a title for the bookmark, its URL (filled in automatically if FaceTag is accessed through a bookmarklet), a rich description written with an embedded WYSIWYG editor, and any number of tags or hierarchical tags, one facet at a time. Hierarchies can be built on the fly using a simple tag > tag > tag notation. As Figure 5 shows, an autocompletion widget helps the user by suggesting tags and tag placement within existing hierarchies.
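Parsing the tag > tag > tag notation into parent-child links is straightforward. In this hypothetical sketch, hierarchies are kept per facet to reflect the one-facet-at-a-time entry. The function names and data layout are assumptions rather than FaceTag's implementation.

```python
def parse_tag_path(text, separator=">"):
    """Split 'parent > child > grandchild' into a list of trimmed tag names."""
    parts = [part.strip() for part in text.split(separator)]
    if any(not part for part in parts):
        raise ValueError(f"empty tag in hierarchy: {text!r}")
    return parts


def add_tag_path(hierarchies, facet, text):
    """Record each parent -> child link of a tag path under a facet; return the leaf tag."""
    hierarchy = hierarchies.setdefault(facet, {})
    path = parse_tag_path(text)
    for parent, child in zip(path, path[1:]):
        children = hierarchy.setdefault(parent, [])
        if child not in children:
            children.append(child)
    return path[-1]


hierarchies = {}
add_tag_path(hierarchies, "Themes", "navigation design > breadcrumbs")
add_tag_path(hierarchies, "Themes", "navigation design > menus > mega menus")
print(hierarchies)
```

Re-entering an existing path is harmless, because duplicate links are skipped. This makes it safe to extend a hierarchy incrementally as new bookmarks are tagged.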


Figure 5. Adding a resource using FaceTag's editor

User Research
In our current stage of development, we have outlined two groups of preliminary testing: facets evaluation and user interface evaluation.

Facet Evaluation. The preliminary facets (as in Table 2) will be revised through an iterative bottom-up procedure to elicit the possible and most popular facets from a wide set of IA bookmarks already online. To this end, we are collecting samples of IA-related bookmarks from the IAI Library and from Del.icio.us in order to run iterative card sorting tests with different user groups. The purpose of these tests is to identify the different mental models through which users represent the IA knowledge domain. These results, combined with an analysis of IA-related tags in selected social tagging spaces (such as Del.icio.us, Technorati, Magnolia or Flickr), will provide the basis for tuning the facet architecture.

User Interface Evaluation. The user interface has been designed through documented heuristics and patterns and verified at each iterative step by small usability tests. More extensive user research will involve the use of think-aloud protocol sessions with more than five testers for each session. Scenarios will include storing bookmarks and retrieving them. Looking at preliminary results, a critical task addressed by the application is the assignment of new bookmarks and the association of tags to relevant facets. The current interface is rather simple but a number of alternatives, leveraging advanced tag suggestion and tag/facet association, are under evaluation.

Resources
Resources: Web-based Collaborative Systems That Incorporate Folksonomies

Del.icio.us http://del.icio.us/
Flickr www.flickr.com/
43things www.43things.com/
Furl www.furl.net/
Technorati www.technorati.com/
Rawsugar www.rawsugar.com

Resources: FaceTag 

The FaceTag Collaborative Tagging Tool www.facetag.org/

Resources Mentioned in the Article

[1] Quintarelli, E. (2005). Folksonomies: Power to the people. Proceedings of the 1st ISKO Italy-UniMIB Meeting (Milano, June 24, 2005). www.iskoi.org/doc/folksonomies.htm
[2] Golder, S.A., & Huberman, B.A. (2005). The structure of collaborative tagging systems. [E-print]. Available April 21, 2007, from arXiv at http://arxiv.org/pdf/cs.DL/0508082
[3] Hassan-Montero, Y., & Herrero-Solana, V. (2006). Improving tag-clouds as visual information retrieval interfaces. International Conference on Multidisciplinary Information Sciences and Technologies, InSciT2006. Available April 21, 2007, at www.nosolousabilidad.com/hassan/improving_tagclouds.pdf
[4] Hearst, M.A. (2006, April). Clustering versus faceted categories for information exploration. Communications of the ACM, 49(4), 59-61. Available April 21, 2007, at http://flamenco.berkeley.edu/papers/cacm06.pdf and at http://portal.acm.org.
[5] Yee, K.P., Swearingen, K., Li, K., & Hearst, M. (2003). Faceted metadata for image searching and browsing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2003 (pp. 401-408). New York: ACM Press. Available April 21, 2007, at http://flamenco.berkeley.edu/papers/flamenco-chi03.pdf or at http://portal.acm.org.
[6] English, J., Hearst, M., Sinha, R., Swearingen, K., & Yee, P. (2002b). Flexible search and browsing using faceted metadata. Unpublished manuscript. Available April 21, 2007, at http://flamenco.berkeley.edu/papers/flamenco02.pdf
[7] English, J., Hearst, M., Sinha, R., Swearingen, K., & Yee, P. (2002a). Hierarchical faceted metadata in site search interfaces. In Conference on Human Factors in Computing Systems, CHI '02: Extended abstracts on human factors in computing systems (pp. 628-629). Available April 21, 2007, at http://flamenco.berkeley.edu/papers/chi02_short_paper.pdf or at http://portal.acm.org.
[8] Vickery, B. C. (1960). Faceted classification: A guide to construction and use of special schemes. London: Aslib.
[9] Broughton, V. (2001). Klasifikacija za 21. stoljece: nacela i struktura Blissove bibliografske klasifikacije (A classification for the 21st century: principles and structure of the Bliss bibliographic classification). Vjesnik bibliotekara Hrvatske, 44(1-4), 38-51. Italian translation (Una classificazione per il 21° secolo: principî e struttura della Classificazione bibliografica Bliss). Available April 21, 2007, at www.aib.it/aib/contr/broughton1.htm