Bulletin of the American Society for Information Science and Technology, Vol. 27, No. 3, February/March 2001

Annual Meeting Coverage

ASIST Award of Merit Acceptance Speech

On the Fragmentation of Knowledge, the Connection Explosion, and Assembling Other People's Ideas

by Don R. Swanson

Editor's Note: Donald R. Swanson, professor emeritus at the University of Chicago, is the 2000 recipient of the ASIST Award of Merit, the society's highest honor. For additional information about the award and Dr. Swanson, please see awards coverage in Inside ASIST in this issue of the Bulletin and in the December/January 2001 issue.

Many thanks for the introduction, Gene; if that had been an obit, it would have been worth dying for.

I am pleased and honored to receive the ASIST Award of Merit. Among all the people whose writings have influenced and inspired me, an astonishingly high proportion of them have received an ASIST award, among them Derek de Solla Price, Manfred Kochen, Eugene Garfield, Henry Small, Wilfred Lancaster, Cyril Cleverdon, William Cooper, Abraham Bookstein, David Blair, Marcia Bates, Dagobert Soergel and Stephen Harter. Meriting special mention, though having no ASIST connection, is my colleague and co-author of many recent papers, Neil Smalheiser, a neurobiologist. I am indebted also to many students formerly in the University of Chicago Graduate Library School who have been a source of intellectual stimulation.

More than 40 years ago the fragmentation of scientific knowledge was a problem actively discussed but without much visible progress toward a solution; perhaps people then had the consummate wisdom to know that no problem is so big that you can't run away from it. Three aspects of the context and nature of this fragmentation seem notable:

    1. The disparity between the total quantity of recorded knowledge, however it might be measured, and the limited human capacity to assimilate it, is not only enormous now but grows unremittingly. Exactly how the limitations of the human intellect and life span affect the growth of knowledge is unknown. Metaphorically, how can the frontiers of science be pushed forward if, someday, it will take a lifetime just to reach them? Wigner has perceptively explored this question (1950).

    2. In response to the information explosion, specialties are somehow spontaneously created, then grow too large and split further into subspecialties without even a declaration of independence. One unintended result is the fragmentation of knowledge owing to inadequate cross-specialty communication. And as knowledge continues to grow, fragmentation will inevitably get worse because it is driven by the human imperative to escape inundation.

    3. Of particular interest to me is the possibility that information in one specialty might be of value in another without anyone becoming aware of the fact. Specialized literatures, or other "units" of knowledge, that do not intercommunicate by citing one another may nonetheless have many implicit textual interconnections based on meaning. Indeed the number of unintended or implicit text-based connections within the literature of science may greatly exceed the number that are explicit, because there are far more possible combinations of units (that potentially could be related) than there are units. The connection explosion may be more portentous than the information explosion.
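
The arithmetic behind the third point can be made concrete with a minimal sketch. It assumes, purely for illustration, that a "unit" is one specialized literature and that a potential connection is any unordered pair of distinct units; the function name and the sample sizes are hypothetical and do not come from any study.

```python
from math import comb


def potential_pairs(n_units: int) -> int:
    """Count unordered pairs of distinct units: n * (n - 1) / 2."""
    return comb(n_units, 2)


# Illustrative sizes only: the pair count grows roughly with the square
# of the number of units, so it soon dwarfs the number of units itself.
for n in (10, 100, 1000):
    print(n, potential_pairs(n))
```

Under this pairwise assumption the number of possible connections exceeds the number of units as soon as there are more than three units, and the gap widens rapidly thereafter.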

Undiscovered Public Knowledge

In 1985, I was struck by lightning and have never recovered. I encountered, partly by accident, two pieces of information from two different articles in the medical literature that together suggested an answer to a question for which I could find no single article that provided an answer. It seemed that I might have found out something that no one else knew and that the medical literature might be full of such undiscovered connections. That incident led to the idea of connecting two different sets of problem-oriented specialized articles, or literatures, that are noninteractive (that is, do not cite each other) and complementary in that together they suggest new information not apparent in either of the sets considered separately. Most of my efforts since then have been directed toward developing a computer-assisted literature-based approach to scientific discovery (1991, 1999). The purpose of the computer here is to organize and display information in a way that helps the user see new connections of scientific interest.

By working through three specific examples and publishing results in the biomedical literature (1986, 1988, 1990) that were later corroborated, I was able to show that complementary but disjoint noninteractive structures in the literature of science do exist and can lead to novel scientific hypotheses that are worth testing. Neil Smalheiser and I went on to develop four more examples, published in neurology journals (1994, 1996a, 1996b, 1998), and to describe a Web-based set of software aids called Arrowsmith, publicly available at http://kiwi.uchicago.edu (1999).

Assembling Other People's Ideas

It is clearly appropriate for information scientists to develop new methods, techniques and tools to be used by subject specialists, but a new question is also raised: can information scientists, working on their own, presume to contribute to the substantive content of disciplines such as medicine and biology? The ASIST award-winning paper by Marcia Bates is perceptive and cogent on this question (1999). She calls attention to the important distinction between form (or structure) and subject-content, with information science being concerned primarily with the former and subject experts with the latter.

I believe that the dividing line between form and subject-content is itself a challenging research problem. Subject content as represented in text and bibliographic records can contain structures not ordinarily noticed, but nonetheless crucial for our purposes. The central idea of complementarity straddles the dividing line; it is a structural concept that is text-based and is about subject content.

For example, a link between two complementary passages of natural language text can be largely a matter of form: "A causes B" and "B causes C" can be seen as linked by B irrespective of the meaning of A, B or C. Yet the apparent implication that A causes C is not a foregone conclusion. In the complex biological world of multiple causation, the above construction is not transitive, the syllogistic appearance notwithstanding. Any conclusion would in general depend on understanding the biological meaning of the two premises. However, the two premises do suggest that the hypothesis "A causes C" might be worth testing. Complementarity within text, as I have defined it, is based on suggestivity rather than logical deduction (1991).

The above "ABC" model is a useful point of departure for a more systematic study of text-based complementarity; the seven examples of undiscovered public knowledge mentioned earlier and cited in the attached reference list may be of value as case material for such a study. 
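
The purely formal part of the "ABC" model can be sketched in a few lines. This is only an illustration under stated assumptions, not a description of Arrowsmith: it assumes that statements have already been extracted from each literature as (cause, effect) pairs, and the function name and the placeholder terms A, B1, B2, C, X and Y are hypothetical. The output is a list of suggestions that subject experts would have to judge, not a set of deductions.

```python
from collections import defaultdict


def suggest_hypotheses(literature_ab, literature_bc):
    """Return {(A, C): sorted list of B terms that link A to C}.

    Each argument is an iterable of (cause, effect) pairs drawn from
    published statements. A shared B term only suggests that
    "A causes C" might be worth testing; it proves nothing.
    """
    # Index the second literature by its B term.
    effects_of_b = defaultdict(set)
    for b, c in literature_bc:
        effects_of_b[b].add(c)

    # Join the two literatures on B, ignoring the meaning of the terms.
    links = defaultdict(set)
    for a, b in literature_ab:
        for c in effects_of_b.get(b, ()):
            if a != c:
                links[(a, c)].add(b)
    return {pair: sorted(bs) for pair, bs in links.items()}


# Placeholder statements only; no real findings are encoded here.
literature_ab = [("A", "B1"), ("A", "B2"), ("X", "B3")]
literature_bc = [("B1", "C"), ("B2", "C"), ("B4", "Y")]
print(suggest_hypotheses(literature_ab, literature_bc))
```

The join is on form alone, which is why the result can be computed without knowing what A, B or C mean. Whether any suggested pair deserves a laboratory test still depends on the biological meaning of the two premises.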

If we, as information scientists, try to contribute to a subject specialty within a completely different field such as biology or medicine, it would seem prudent first to define and accept certain rhetorical constraints that are based on the distinction between form and content. I suggest the following two constraints.

First, our stated goal should be to produce new hypotheses or suggestions, not discoveries. We want the lab scientists to test our hypotheses, and their view is that real discoveries come out of the lab, not the literature. Second, when writing for publication, subject content should be limited to reporting factually, or simply quoting, selected passages from what the experts in that subject have already put into print in reputable journals. In effect, our job is to assemble other people's ideas. The main point is to call attention to possible implicit links between the various text passages that are selected, wherein we focus especially on form and structure more than on subject content. Whether the links are plausible and persuasive enough to merit testing is then a judgment call by readers with subject expertise.

Complementarity Versus Novelty

The idea of complementarity, as I have presented it, is embedded in the text of scientific articles; novelty, on the other hand, is evidenced by the mutual isolation of the two sets of articles that are thought to be complementary. The detection of novelty thus can be based on citation searching, where the object of the search is to find sets of articles that do not cite each other and are not co-cited. However, I think that the seminal idea of co-citation analysis, pioneered by Henry Small (1973), has thus far been under-exploited in my work, and I foresee a more central role for it in future work.
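
The citation-based test for novelty described above can be sketched as follows. This is a minimal illustration, assuming citation data is available as a mapping from each article identifier to the set of identifiers it cites; that data structure, the function names and the sample identifiers are all hypothetical.

```python
def cites_across(set_a, set_b, citations):
    """True if any article in set_a cites any article in set_b."""
    return any(citations.get(a, set()) & set_b for a in set_a)


def co_cited(set_a, set_b, citations):
    """True if some article cites members of both sets."""
    return any(
        bool(refs & set_a) and bool(refs & set_b)
        for refs in citations.values()
    )


def mutually_isolated(set_a, set_b, citations):
    """True if the sets neither cite each other nor are co-cited."""
    return not (
        cites_across(set_a, set_b, citations)
        or cites_across(set_b, set_a, citations)
        or co_cited(set_a, set_b, citations)
    )


# Hypothetical data: "review" cites one article from each set.
set_a, set_b = {"a1", "a2"}, {"b1"}
citations = {
    "a1": {"r1"},
    "a2": {"r2"},
    "b1": {"r3"},
    "review": {"a1", "b1"},
}
print(mutually_isolated(set_a, set_b, citations))

# Without the co-citing review, the two sets count as isolated.
without_review = {k: v for k, v in citations.items() if k != "review"}
print(mutually_isolated(set_a, set_b, without_review))
```

A single co-citing article is enough to fail the test in this sketch. A practical search might tolerate a small number of such links, which would be a further assumption.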

Novelty can also be detected through subject searching in large databases. If two separate sets of articles that are complementary have a substantial intersection, the complementary relationship is likely to be reflected in that intersection and hence would probably be well-known. Ideally we seek pairs of sets that are disjoint (i.e., have a null intersection) or nearly so. Such a quest requires a high-recall search strategy, as does any search for something rare or that may not even exist. The point is worth mentioning if only because it is sometimes said that, in this era of inundation, we no longer have any use for high-recall searching.
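
The intersection test can be illustrated with a short sketch. It assumes that two high-recall subject searches have each returned a set of record identifiers; the function name, the identifiers and the choice to express the overlap as a fraction of the smaller set are illustrative assumptions, not part of any database's interface.

```python
def overlap(records_a, records_b):
    """Return (shared records, shared fraction of the smaller set)."""
    a, b = set(records_a), set(records_b)
    shared = a & b
    smaller = min(len(a), len(b))
    fraction = len(shared) / smaller if smaller else 0.0
    return shared, fraction


# Hypothetical record identifiers from two separate searches.
search_a = {1, 2, 3, 4}
search_b = {4, 5, 6, 7, 8}
shared, fraction = overlap(search_a, search_b)
print(shared, fraction)
```

An empty or very small shared set is what one hopes to find. The result is only as trustworthy as the recall of the two searches, since a missed record could be the very article that already connects the two literatures.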

Skilled database searching, in a quest for either novelty or complementarity, is a prerequisite for conducting any investigation of implicit connections between literatures, for it is necessary first to identify the literatures as well as the explicit links that might already exist. The issue of whether professional searching without subject expertise is better than end-user searching echoes the question raised earlier about the proper role of the information scientist in identifying implicit linkages. In both cases I suspect that there are more unanswerable questions than unquestionable answers.

I am grateful to ASIST for the award and hope that it might stimulate other information scientists to embark on similar adventures in literature exploration. They should be warned that it can be addictive. But it also can be fun.

References

Bates, M. J. (1999). The invisible substrate of information science. Journal of the American Society for Information Science, 50, 1043-1050.

Smalheiser, N. R. & Swanson, D. R. (1994). Assessing a gap in the biomedical literature: Magnesium deficiency and neurologic disease. Neuroscience Research Communications, 15, 1-9.

Smalheiser, N. R. & Swanson, D. R. (1996a). Indomethacin and Alzheimer's disease. Neurology, 46, 583.

Smalheiser, N. R. & Swanson, D. R. (1996b). Linking estrogen to Alzheimer's disease: An informatics approach. Neurology, 47, 809-810.

Smalheiser, N. R. & Swanson, D. R. (1998). Calcium-independent phospholipase A2 and schizophrenia. Archives of General Psychiatry, 55, 752-753.

Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24, 265-269.

Swanson, D. R. (1986). Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30, 7-18.

Swanson, D. R. (1988). Migraine and magnesium: Eleven neglected connections. Perspectives in Biology and Medicine, 31, 526-557.

Swanson, D. R. (1990). Somatomedin C and arginine: Implicit connections between mutually-isolated literatures. Perspectives in Biology and Medicine, 33, 157-186.

Swanson, D. R. (1991). Complementary structures in disjoint science literatures. In A. Bookstein, et al. (Eds.), SIGIR91: Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, Chicago, Oct 13-16, 1991 (pp. 280-289). New York: Association for Computing Machinery.

Swanson, D. R. & Smalheiser, N. R. (1999). Implicit text linkages between Medline records: Using Arrowsmith as an aid to scientific discovery. Library Trends, 48, 48-59.

Wigner, E. P. (1950). The limits of science. Proceedings of the American Philosophical Society, 94, 422-427.

Don R. Swanson is professor emeritus at the University of Chicago. He can be reached by mail at 1010 E. 59th St., Chicago, IL 60637; by telephone at 773/702-8267; or by e-mail at d-swanson@uchicago.edu

American Society for Information Science and Technology
8555 16th Street, Suite 850, Silver Spring, Maryland 20910, USA
Tel. 301-495-0900, Fax: 301-495-0810 | E-mail: asis@asis.org

Copyright 2001, American Society for Information Science and Technology