Title: Topological aspects of information retrieval
Authors: EGGHE, Leo
Issue Date: 1998
Publisher: Wiley
Citation: Journal of the American Society for Information Science, 49(13). p. 1144-1160
Abstract: Let (DS, QS, sim) be a retrieval system consisting of a document space DS, a query space QS, and a function sim expressing the similarity between a document and a query. Following D. M. Everett and S. C. Cater (1992), we introduce topologies on the document space. These topologies are generated by the similarity function sim and the query space QS. Three topologies are studied: the retrieval topology, the similarity topology, and the (pseudo-)metric topology. It is shown that the retrieval topology is the coarsest of the three, while the (pseudo-)metric topology is the strongest. These three topologies are generally different, reflecting distinct topological aspects of information retrieval. We present necessary and sufficient conditions for these topological aspects to be equal. Several examples of topological retrieval systems are presented. One of these examples is a vector space model that simplifies the Everett-Cater model yet has a more diversified spectrum of topological properties. Finally, it is shown that information retrieval based on Boolean operators is an intrinsic part of the general topological model. This is a major motivation for the introduction of topologies in theoretical IR models.
