Bulletin of The American Society for Information Science

Vol. 26, No. 6

August/September 2000

Meeting Review

Notes from the Boston 2000 Search Engine Meeting

by Candy Schwartz

Just under 300 people attended the fifth North American Search Engines conference, sponsored by Infonortics, and held in Boston, April 10-11, immediately after the ASIS Information Architecture Summit. With Ev Brenner as a genial master of ceremonies, and Infonortics President Harry Collier making sure everything ran as scheduled, some 25 speakers representing the search engine industry and the research community reviewed the current state of search engines and looked around the corner to the future. Most of the PowerPoint presentations are available at the conference site

www.infonortics.com/searchengines/boston2000pro.html

and I will restrict my comments to a general overview. 

Providers of search services face several interesting design conflicts. The foremost, from the commercial point of view, is the desire to please users by giving them satisfactory answers (resulting in their clicking away from the search site) weighed against the need to keep users at the site so that their eyeball count (i.e., viewing of advertising) can be maximized. Another design problem pits the desire to assure users that the search service includes "everything on the Web" against the difficulty of performing good ranking in an environment of huge databases and two-word queries. While attempts to solve these problems are many and various, the following themes emerged during the conference:

    • Size matters.
    • Magnetism (the new stickiness) is what everyone is aiming for.
    • Context (interpreted many ways) counts.
    • The human individual, on both the producer and user side, is going to be the focus of attention in attempts to improve satisfaction - if not necessarily retrieval.

These themes provide some common threads in the discussion below.

Coverage

"We have the biggest search engine" is a nice marketable, concrete and understandable statement, certainly more so than "we give you a better search experience." No matter what the presentation topic, most search engine representatives managed to slip in some indication of database size. Various figures were used to suggest Web and Web user growth - for instance, Knut Risvik (FAST Search & Transfer, Norway) reported that the Web is doubling in size every eight months, and by 2005, 95% of over one billion wireless devices will be Internet-enabled. Even though less than 25% of Internet content is accessible to search engine discovery tools, trawling through that portion in a timely matter presents a large hurdle and will require adaptive crawling algorithms and scalable search engine architecture.

In his first-session overview, Danny Sullivan (Search Engine Watch) pointed out that self-reporting is always suspect and that size may not be the same as what is actually being searched, as some may report what has been crawled but not what has been indexed or retained. He also observed that although professional searchers familiar with search tools may actively seek out larger databases, bigger is not necessarily better for most casual users suffering from overload (the phrase used was "dumping a haystack on their heads"). Nonetheless, search logs indicate that at least 25% of queries are unique, and coverage is therefore important. Both Inktomi and Northern Light representatives alluded to plans for capturing, de-duplicating, analyzing and organizing the entire Web (however that might be defined).

Search engine logs suggest that queries are getting richer, or at least more "natural," in expression (someone called this the Ask Jeeves effect), but most are still one or two words long and often typographically challenged. Tweaking ranking algorithms can only go so far in meeting the challenge of producing relevant results in answer to poorly formed queries, and so a host of alternative or collateral approaches have emerged.

Popularity

Direct Hit has proven that incorporating popularity, that is, the behavior of past users - including not only that a page was viewed, but also how much time was spent viewing it - can be a profitable strategy, and many other services are either using Direct Hit or developing similar methods. Link popularity, very successfully implemented by Google, is based on the degree to which a page is linked to by others, and here again success is breeding widespread adoption.
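
As a rough illustration of the link popularity idea (a minimal sketch in Python under simplifying assumptions, not Google's actual algorithm), a page's score can be computed iteratively from the scores of the pages that link to it; the link graph and damping factor below are invented for the example.

    # Minimal sketch of link-based popularity scoring, in the spirit of
    # (but far simpler than) Google's approach; the link graph is invented.
    def link_popularity(links, damping=0.85, iterations=50):
        """links maps each page to the list of pages it links to."""
        pages = set(links) | {p for targets in links.values() for p in targets}
        n = len(pages)
        score = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new = {p: (1.0 - damping) / n for p in pages}
            for page, targets in links.items():
                if targets:
                    share = damping * score[page] / len(targets)
                    for target in targets:
                        new[target] += share
            score = new
        return score

    # Pages with more (and better-scoring) in-links float to the top.
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(sorted(link_popularity(web).items(), key=lambda x: -x[1]))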

Concept-Based Searching

In the past, the notion of using "concepts" has mostly been directed toward expanding queries. Excite, for example, incorporated a form of query expansion almost from the beginning, though largely behind the scenes, and any number of services present users with suggested associated terms following a search. What seems to be new now are more proactive attempts to use associations (or intellectually constructed term taxonomies) to help users disambiguate query terms. While some services are still in development, Simpli.com and Oingo are two recent working examples.
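
A toy sketch of how such disambiguation might work (the sense taxonomy below is invented for illustration and is not how Simpli.com or Oingo actually implement it): an ambiguous term is mapped to labeled senses, the user picks one, and the chosen sense's related terms are folded into the query.

    # Toy concept-based disambiguation; the taxonomy is invented for illustration.
    SENSES = {
        "jaguar": {
            "animal": ["big cat", "wildlife", "habitat"],
            "car": ["automobile", "sedan", "dealer"],
        },
    }

    def expand_query(query, chosen_senses):
        """chosen_senses maps an ambiguous term to the sense the user picked."""
        expanded = query.lower().split()
        for term in list(expanded):
            sense = chosen_senses.get(term)
            if term in SENSES and sense in SENSES[term]:
                expanded.extend(SENSES[term][sense])
        return " ".join(expanded)

    # A user searching for "jaguar speed" who picks the "animal" sense:
    print(expand_query("jaguar speed", {"jaguar": "animal"}))
    # -> "jaguar speed big cat wildlife habitat"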

Human Power

The phrase human powered was used frequently to refer to intellectual creation or augmentation of search tools. One obvious manifestation is the directory approach, popularized by Yahoo and presented to some degree or other by almost every search engine. Of course, not all classification methods are entirely human-powered - Northern Light has been using a combination of intellectual and algorithmic means for classification for some years, and taxonomy building tools (see, for example, Semio Corporation) are becoming popular and widespread. Another example of human power is the use of editors or "guides," as found with About.com and Ask Jeeves.

Vortals

One way of reducing the sheer size of what is being searched while not omitting relevant content is to develop a "vortal" - a service restricted to a particular vertical (the "v" in vortal) market - preferably one that will be attractive to advertisers (for example, specific business sectors, law, health care or regional information). Specialty crawlers perform the same function as their generic search engine siblings, but they are loaded with terminology and search strategies specific to the target market, and human power adds quality control to the final result.
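
As a hypothetical sketch of the selection step in such a specialty crawler (the vocabulary and threshold are invented; real vortals rely on far richer term lists and strategies), a crawled page might be admitted only if it matches enough of the vertical's terminology:

    # Sketch of a vertical crawler's page filter; vocabulary and threshold are invented.
    HEALTH_TERMS = {"diagnosis", "clinical", "patient", "treatment", "symptom", "therapy"}

    def domain_score(text, vocabulary=HEALTH_TERMS):
        """Fraction of the domain vocabulary that appears in the page text."""
        words = set(text.lower().split())
        return len(words & vocabulary) / len(vocabulary)

    def keep_page(text, threshold=0.3):
        """Admit the page to the vertical index only if it looks on-topic."""
        return domain_score(text) >= threshold

    print(keep_page("the patient began treatment after a clinical diagnosis"))  # True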

Bonding

Most interesting to me in various presentations was the blurring of the border between the search service and the user's information space. The ability to run desktop search agents that can combine search on a user's hard drive with meta-searches on the Web is not new, but the past year has seen dramatic changes in the functionality of these agents in areas such as results organization, repackaging and dissemination. Intelliseek's Bullseye is a popular example.

In addition, we have a flurry of new tools that can search the Web (or other networked public and proprietary content) for text selected from virtually any form of desktop document - several speakers mentioned Flyswat, Gurunet, Kenjin and WebCheck as examples. Eric Brewer (Inktomi Inc.) suggested that searches could be ordered by "zones," for example, the desktop first, then the intranet, then a specialty engine and finally the Web. Ordering could be individually selected or possibly calculated automatically based on query content. Whether used for zone ordering, filtering or ranking, Brewer sees context ("customizing the haystack") as an essential element of search, and one that will differentiate new services.
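
A minimal sketch of the zone idea as described (the zone names and search functions are hypothetical placeholders, not Inktomi's API): search the nearest sources first and only fall back outward when they do not satisfy the query.

    # Sketch of zone-ordered searching; the search functions are placeholders.
    def search_by_zone(query, zones, enough=5):
        """zones is an ordered list of (name, search_fn) pairs, nearest first."""
        results = []
        for name, search_fn in zones:
            for hit in search_fn(query):
                results.append((name, hit))
            if len(results) >= enough:
                break  # nearer zones satisfied the query; skip the wider Web
        return results

    # Example ordering per Brewer: desktop, then intranet, then a specialty engine, then the Web.
    zones = [
        ("desktop", lambda q: ["resume.doc"]),
        ("intranet", lambda q: ["hr policy page"]),
        ("specialty", lambda q: ["trade journal article"]),
        ("web", lambda q: ["ten million hits"]),
    ]
    print(search_by_zone("vacation policy", zones, enough=2))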

Related to the bonding of search to the desktop is the bonding between the user and the service. Aside from good results, what else can encourage a user to return to the same service? Personal portal pages (e.g., MyExcite) have been somewhat successful, but Steve Arnold (AIT) reviewed newer approaches in "identity bonding" - including interactive chat, user-based voting and recommendations, analysis of click-through data to learn more about the specific search and searcher, and generally far more personalization. Searching is situational, and the more a service can understand about the situation, the better the user experience is likely to be.

Advances in IR

Advances in information retrieval (IR) and natural language processing (NLP) were not ignored, despite the emphasis on "off the page" elements of improving search. According to Bill Bliss (Microsoft), MSN is investigating the use of linguistic enhancements for disambiguation, query expansion and clustering. Alison Huettner (Claritech) discussed question-answering applications supported by named entity extraction and semantic type detection, using a combination of lists, thesauri and NLP techniques. Claritech's work on filtering, described by David Evans, uses a combination of NLP, field restriction, example documents and threshold calculations to compile user profiles. Chris Buckley (SABIR) and David Hawking (CSIRO Mathematical and Information Sciences, Australia) reviewed the recent TREC experiments, which generally indicate that vocabulary mismatch remains a primary problem, suggesting continued work on query expansion, document expansion, collection or query-based thesauri, passage retrieval and relevance feedback. New performance measures are being developed, and long-standing ones modified. TREC has added a Web track with a collection of documents and queries that are more representative of the typical search engine environment.
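
One classical response to vocabulary mismatch mentioned here, relevance feedback, can be illustrated with a textbook Rocchio-style update (a generic sketch, not the specific TREC systems discussed): terms from documents the user marked relevant are added to the query, and terms from non-relevant documents are down-weighted.

    # Textbook Rocchio-style relevance feedback; vectors are dicts of term weights.
    from collections import defaultdict

    def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
        new_query = defaultdict(float)
        for term, weight in query.items():
            new_query[term] += alpha * weight
        if relevant:
            for doc in relevant:
                for term, weight in doc.items():
                    new_query[term] += beta * weight / len(relevant)
        if nonrelevant:
            for doc in nonrelevant:
                for term, weight in doc.items():
                    new_query[term] -= gamma * weight / len(nonrelevant)
        # keep only terms whose weights stayed positive
        return {t: w for t, w in new_query.items() if w > 0}

    # The expanded query picks up "automobile" from the relevant document.
    print(rocchio({"car": 1.0},
                  relevant=[{"car": 0.5, "automobile": 0.8}],
                  nonrelevant=[{"cartoon": 0.9}]))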

Evaluation and User Studies

There was a refreshing attention to the more academic side of evaluation and user study. Carol Hert (Syracuse University) suggested that search engine designers have not capitalized on the user studies research found in the academic world and may not be making the most of search logs and click-through analysis. She urged them to consider different types of data collection (e.g., interviews, focus groups, critical incident techniques and content analysis of feedback) to acquire a better understanding of the motivations and actions of various types of users and non-users. Admittedly, this type of research is costly and often obtrusive, and knowledge acquisition is slow, but the payback is a richer and deeper understanding of search in context. Paul Kantor (Rutgers University) was ostensibly part of a panel on filtering, but his presentation covered the importance of modeling users and user behaviors. His research focuses on developing criteria for predicting the point during an ongoing search at which a user quits because the updated estimate of the probability of failure exceeds the chance of success. This could reveal worthwhile early warning indicators for when a user is approaching the threshold of "infuriation," a state that could be avoided by presenting alternative interaction paths based on if-then rules and a modeled understanding of the particular user.
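
As a toy illustration of that quit-point idea (my own beta-prior simplification for the example, not Kantor's actual model), an estimate of the probability that the search is failing can be updated after each result the user judges and flagged when it crosses a threshold:

    # Toy running estimate of search failure; the beta prior is an invented simplification.
    def failure_probability(judgments, prior_successes=1.0, prior_failures=1.0):
        """judgments is a list of booleans: True if a viewed result was useful."""
        successes = sum(judgments) + prior_successes
        failures = len(judgments) - sum(judgments) + prior_failures
        return failures / (successes + failures)

    def approaching_infuriation(judgments, threshold=0.8):
        """Early warning that the user may be about to give up."""
        return failure_probability(judgments) > threshold

    # Ten unhelpful results in a row push the estimate past the threshold.
    print(approaching_infuriation([False] * 10))  # True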

Work responsibilities called me away from several presentations, but I was glad to have been able to attend most of the sessions. Although the sense of novelty and excitement that this conference generated several years ago has worn off, Infonortics provides a good forum for catching up with new developments, seeing the principal players and realizing that there are still opportunities for improvement and innovation.

Candy Schwartz, immediate past president of ASIS, is professor of Library & Information Science at Simmons College, 300 The Fenway, Boston, MA 02115-5898. She can be reached by e-mail at candy.schwartz@simmons.edu or on the web at web.simmons.edu/~schwartz/; by phone at 617/521-2849; or by fax at 617/521-3192.


© 2000, American Society for Information Science