Multiple Collection Searching: An approach with two term layers in the Bayesian network retrieval model


With hundreds or even thousands of collections available on the Internet, the IR community must cope with the problem of searching multiple collections. Building a single index over all collections is impractical: searching such a gigantic index is slow, and with hundreds of collections the search may not complete at all because of network resource limits. This paper describes how a Bayesian inference network, a probabilistic approach, can be used to address the problems of searching multiple collections. It also presents an efficient learning method that captures the relationships among the terms in a given document collection, and shows how these relationships are used at retrieval time to improve retrieval performance.