
Bulletin, August/September 2008


Exploratory Search in Different Information Architectures


by Tingting Jiang and Sherry Koshman

Tingting Jiang is a Ph.D. student at the School of Information Sciences, University of Pittsburgh. Her research interests include exploratory search, information architecture and social tagging. She can be reached at tij4<at>pitt.edu. 

Sherry Koshman is an assistant professor at the School of Information Sciences, University of Pittsburgh. Her research interests include investigating user interaction with information visualization tools and online information structures. She teaches graduate courses in information architecture, information visualization and information technology. She can be contacted at skoshman<at>sis.pitt.edu.

The ASIS&T 2008 IA Summit, Experiencing Information, emphasized users who want to know, do or share something. A user experiences information by creating, organizing, browsing and searching for it. These actions contribute to the notion of exploratory search, which can be described as an information process in which the query-document matching power of the search system matters less than the user's own, more assertive role in making decisions about the search results and the next steps toward fulfilling his or her information needs [1]. A straightforward and common way to recognize an exploratory search system is to examine how it presents search results. Typically, browsing facilities supplement or replace the popular list-based result pages, and they feature grouping as the primary mechanism for displaying results.

In a less dense space with information organized into perceptible groups, users are able to more easily detect and concentrate on the most relevant information for each search session in order to determine the direction of the next search step. In studying this concept further, we investigate the information architectures of current exploratory search systems and identify four primary organizing strategies to assist users in searching and browsing information. They are hierarchical classification, faceted categorization, clustering and social tagging.

Hierarchical Classification
Hierarchical classification refers to a system of fixed classes organized in a hierarchical enumerative structure, offering a structural representation of a vocabulary that is typically applied to formal and stable resources. This structure is a familiar one found in many formal classification systems. An online example is CitiViz (http://feathers.dlib.vt.edu/CitiViz/index.html), a visual search project for computing literature [2]. A search result does not appear individually; it appears under each classification that it belongs to, together with other related results. Only the classifications that contain search results are included for each query. This grouping is effective in reducing a considerable volume of results to a small number of groups that are rational and familiar to users, greatly lessening the burden of browsing. In CitiViz specifically, users can also manipulate the classifications by deleting unwanted headings.
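
The grouping behavior described above can be illustrated with a minimal Python sketch. It is not CitiViz's implementation: the class headings, the result titles and the function name group_by_class are all invented for illustration. The sketch only shows the three properties mentioned in the text: a result is listed under every class it belongs to, classes without results never appear, and the user can remove an unwanted heading.

```python
from collections import defaultdict

# Hypothetical search results. Each one carries the class paths assigned to it
# in advance; a path runs from a top-level heading down to a narrower class.
RESULTS = [
    ("Paper A", [("Information Systems", "Retrieval")]),
    ("Paper B", [("Information Systems", "Retrieval"),
                 ("Computing Methodologies", "Visualization")]),
    ("Paper C", [("Software", "Interfaces")]),
]

def group_by_class(results, hidden=frozenset()):
    """Group result titles under the class paths they belong to.

    Only classes that contain at least one result show up, because groups
    are created from the results themselves. `hidden` holds the top-level
    headings the user has chosen to delete.
    """
    groups = defaultdict(list)
    for title, paths in results:
        for path in paths:
            if path[0] not in hidden:
                groups[path].append(title)
    # Sort the class paths so the display order is stable.
    return dict(sorted(groups.items()))

if __name__ == "__main__":
    for path, titles in group_by_class(RESULTS, hidden={"Software"}).items():
        print(" > ".join(path), "->", titles)
```

Because the classes are fixed ahead of time, all of the intellectual work sits in the class assignments attached to each result; the grouping step itself is trivial.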

Creating non-overlapping classes and arranging them hierarchically requires a high investment in time, money and expertise, so not many domains can afford a well-built, comprehensive hierarchical classification system. Even where it is affordable, not every field can be properly reduced to a few classes with clear edges. Once established, the system becomes the authority, and it may not be responsive to change. As a result, such classification hardly applies at all to certain information domains, such as user-contributed content, where old vocabularies wither and new ones grow at a fast pace.

Faceted Categorization
Faceted categorization, for the purpose of exploratory search, represents different attributes of an information collection by offering small categorical hierarchies that correspond to concepts contained within the repository [3]. A subject-oriented scheme such as the ACM classification of computing literature cannot, for example, satisfy users who want to explore the literature by author, year or method. By taking multiple facets into consideration, faceted categorization better reflects the fact that different people understand the world differently.

Faceted categories have become a standard information architecture that is readily implemented across many sites, especially online shopping sites. For instance, in finding a PC laptop on eBay, users can constrain their search by brand, processor speed, memory, hard drive capacity, screen size and condition. The interface is flexible because information can be retrieved along any number of facets in any order. In addition, faceted hierarchical categories need much less manual work to develop than one large classification system.
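
A minimal Python sketch of faceted filtering follows. The laptop records, the facet names and the functions apply_facets and facet_counts are invented for illustration and do not describe eBay's actual system. The sketch shows why selection order does not matter: each chosen facet value is an independent constraint, and an item is kept only if it satisfies all of them.

```python
from collections import Counter

# Hypothetical catalog; every key of a record is a facet.
LAPTOPS = [
    {"brand": "Dell",   "memory_gb": 2, "screen_in": 15, "condition": "new"},
    {"brand": "Dell",   "memory_gb": 4, "screen_in": 13, "condition": "used"},
    {"brand": "Lenovo", "memory_gb": 2, "screen_in": 15, "condition": "new"},
    {"brand": "HP",     "memory_gb": 4, "screen_in": 17, "condition": "new"},
]

def apply_facets(items, selections):
    """Keep the items that match every selected facet value."""
    return [item for item in items
            if all(item.get(facet) == value
                   for facet, value in selections.items())]

def facet_counts(items, facet):
    """Count how many of the given items fall under each value of a facet."""
    return Counter(item[facet] for item in items)

if __name__ == "__main__":
    new_items = apply_facets(LAPTOPS, {"condition": "new"})
    # Counts like these are what an interface would show next to each
    # remaining choice, so the user can see where each step leads.
    print(facet_counts(new_items, "brand"))
    print(apply_facets(new_items, {"memory_gb": 2}))
```

Showing the counts for the remaining facet values after each selection lets the user decide the next step without running into an empty result set.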

Clustering
Clustering has become a popular technique for grouping search results. For each query, the clustering algorithm generates a unique taxonomy of the search results based on their content. Result items are assigned to clusters in the taxonomy right after they are retrieved and before they are presented to the searcher [4]. Compared with hierarchical classification and faceted categories, the clustering approach appeals to many information providers. First and most importantly, everything is done automatically. Second, a dynamic taxonomy with clusters generated in real time eliminates the complexity and cost of building and maintaining a fixed taxonomy.
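
To make the idea of a taxonomy generated per query concrete, here is a toy Python sketch of single-level clustering. It is not the algorithm used by any of the systems named in this article; it is a simple greedy grouping assumed for illustration, in which the most frequent remaining term becomes a cluster label and claims every unassigned result that contains it. The stopword list, the snippets and the function names are all hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with"}

def tokens(text, query_terms):
    """Lowercased content words of a snippet, minus stopwords and query terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOPWORDS and w not in query_terms}

def cluster_results(snippets, query, min_size=2):
    """Greedy one-level clustering: label -> list of snippets."""
    query_terms = set(query.lower().split())
    remaining = {i: tokens(s, query_terms) for i, s in enumerate(snippets)}
    clusters = {}
    while remaining:
        counts = Counter(t for toks in remaining.values() for t in toks)
        if not counts:
            break
        # Most frequent term wins; ties are broken alphabetically.
        label, size = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        if size < min_size:
            break
        members = [i for i, toks in remaining.items() if label in toks]
        clusters[label] = [snippets[i] for i in members]
        for i in members:
            del remaining[i]
    if remaining:
        # Results that share no frequent term fall into a catch-all group.
        clusters["(other)"] = [snippets[i] for i in sorted(remaining)]
    return clusters

if __name__ == "__main__":
    snippets = [
        "Jaguar car dealers and prices",
        "Used Jaguar car reviews",
        "Jaguar habitat in the rainforest",
        "Rainforest cats: the jaguar",
        "Jaguar operating system release",
    ]
    for label, group in cluster_results(snippets, "jaguar").items():
        print(label, "->", group)
```

Even this toy version hints at the labeling weakness of clustering: a label is merely whichever word happens to be most frequent, which may not be the name a person would give the group.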

Depending on the algorithm, the taxonomy may have only one level or multiple levels. As the algorithms evolve, several taxonomies could be created for each query, each reflecting a different aspect. WebClust in Figure 1a is one of the few search engines adopting the simplest form, that is, one taxonomy with one level. Carrot2 is a little more complex, with three facets (topics/sources/sites) introduced and a one-level taxonomy for each one (Figure 1b). Including the same facets, Clusty in Figure 1c offers hierarchies that can be expanded to an unpredictable number of levels. Unfortunately, the disadvantages of clustering, such as the mislabeling, misplacing or overlapping of clusters within the taxonomy, are sometimes evident. In robustness, a cluster-based post-retrieval taxonomy falls short of the predetermined architectures discussed earlier.

Figure 1a. WebClust (http://www.webclust.com/)

Figure 1b. Carrot2 (http://demo.carrot2.org/demo-stable/main)

Figure 1c. Clusty (http://clusty.com/)

Figure 1. Taxonomies of search results in WebClust, Carrot2 and Clusty with ascending complexity

Social Tagging
Tagging refers to individual users assigning meaning to online objects, including bookmarks, texts, images and videos, in the form of tags, keywords or metadata. Users can be resource providers, consumers or both. The social nature of tagging means that tags are not only personal labels for categorizing individual collections but also public clues that help others reach those collections. Folksonomies result from this bottom-up social tagging process. They are distributed classification systems, best described as a flat name space without rigid hierarchies or exclusive categories [5]. Today, almost all social tagging systems rely on tag clouds to represent a folksonomy. Tag clouds usually display tags in alphabetical order with little attention to term relationships (Figure 2).

Figure 2. A tag cloud from Flickr (www.flickr.com/). It demonstrates a very loose structure, with only font size implying the use frequency of the alphabetically ordered tags.
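
The structure of a tag cloud is simple enough to sketch in a few lines of Python. The tag counts, the point-size range and the linear scaling below are assumptions made for illustration; they are not taken from Flickr, and real systems may scale sizes differently (logarithmically, for example). The function name tag_cloud is likewise hypothetical.

```python
def tag_cloud(tag_counts, min_pt=10, max_pt=32):
    """Return (tag, font size in points) pairs in alphabetical order.

    Font size is scaled linearly between min_pt and max_pt according to how
    often each tag is used. Nothing else about the tags is encoded.
    """
    if not tag_counts:
        return []
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return [
        (tag, round(min_pt + (count - lo) * (max_pt - min_pt) / span))
        for tag, count in sorted(tag_counts.items())
    ]

if __name__ == "__main__":
    # Hypothetical usage counts.
    counts = {"wedding": 120, "beach": 60, "nyc": 20, "cat": 20}
    for tag, size in tag_cloud(counts):
        print(f"{tag} ({size}pt)")
```

The output carries no information about how tags relate to one another, which is exactly the looseness the figure illustrates.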

The advantages and disadvantages of folksonomies are usually weighed against those of the other taxonomies mentioned above. Self-driven tagging produces two benefits: low creation cost and responsiveness to change [6]. Furthermore, a democratic folksonomy gives everyone the opportunity to express his or her personal viewpoints through tags and allows distinct or even contrary ones to co-exist. The other side of the coin, however, is that liberal and distributed tagging by everyone leads to the “vocabulary problem,” in which different people choose different words for the same thing, as well as the “basic level” problem, in which people describe the same thing at different levels of specificity [7] [8].

Conclusion
Exploratory search systems, compared to current mainstream web search engines, pay closer attention to the presentation of search results, a critical factor that determines search effectiveness. Recognizing that a linear list of ranked results is insufficient for sophisticated exploratory tasks, they aim to satisfy users’ information needs by grouping results. The four major grouping strategies at present, which constitute this discussion, are hierarchical classification, faceted categorization, clustering and social tagging. They give rise to four different information architectures, each with its own advantages and disadvantages. In our plan for future research, an in-depth comparison of these architectures is an important step forward that will not only measure the effectiveness of each one in its applicable information domain, but also seek possible solutions to offset their weaknesses.

Resources Mentioned in the Article
[1] White, R. W., Muresan, G., & Marchionini, G. (2006, August). Evaluating exploratory search systems. In R. W. White, G. Muresan, & G. Marchionini (Eds.), Proceedings of the ACM SIGIR 2006 Workshop on Evaluating Exploratory Search Systems (EESS 2006). Retrieved July 1, 2008, from http://research.microsoft.com/~ryenw/eess/eess2006_proceedings.pdf. Article also retrieved July 1, 2008, from www.scils.rutgers.edu/~muresan/Publications/wshsigirWhite2006.pdf

[2] Fox, E. A., Neves, F. D., Yu, X., Shen, R., Kim, S., & Fan, W. (2006). Exploring the computing literature with visualization and stepping stones & pathways. Communications of the ACM, 49(4), 53-58.

[3] Hearst, M. A. (2006). Clustering versus faceted categories for information exploration. Communications of the ACM, 49(4), 59-61.

[4] Vivisimo, Inc. (2006). Tagging vs. clustering in enterprise search. Retrieved June 10, 2008, from http://vivisimo.com/html/download-tagging.

[5] Hammond, T., Hannay, T., Lund, B., & Scott, J. (2005). Social bookmarking tools (I). D-Lib Magazine, 11(4).

[6] Chi, E. H., & Mytkowicz, T. (2006). Understanding navigability of social tagging systems. Retrieved June 10, 2008, from www.viktoria.se/altchi/submissions/submission_edchi_0.pdf.

[7] Furnas, G.W., Landauer, T.K., Gomez, L.M., & Dumais, S.T. (1987). The vocabulary problem in human-system communication. Communications of the ACM, 30(11), 964-971.

[8] Tanaka, J., & Taylor, M. (1991). Object categories and expertise: Is the basic level in the eye of the beholder? Cognitive Psychology, 23(3), 457-482.