Dynamic Signature File Partitioning Based on Term Characteristics
Abstract
Signature files act as a retrieval filter that discards a large number of non-qualifying data items. Linear hashing with superimposed signatures (LHSS) provides an effective retrieval filter for processing queries in dynamic databases. This study analyzes the effects of reflecting the query and occurrence characteristics of terms in the signatures used by LHSS. This approach relaxes the unrealistic uniform frequency assumption and lets terms with high discriminatory power set more bits in signatures. Simulation experiments based on the derived formulas show that incorporating term characteristics in LHSS improves retrieval efficiency. The paper also discusses how this approach can help alleviate the potential imbalance between the levels of efficiency and relevancy.
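
As a rough illustration of the idea of letting some terms set more bits than others, the following Python sketch builds superimposed signatures in which each term sets a term-specific number of bits. It is not the paper's method: the signature width, the hash construction, the per-term bit counts in `weights`, and all function names are assumptions chosen for illustration, and the linear hashing partitioning step of LHSS is not modeled.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed signature width, not taken from the paper


def term_signature(term, bits_to_set, width=SIGNATURE_BITS):
    """Return a signature with exactly `bits_to_set` bits set for `term`."""
    if not 0 < bits_to_set <= width:
        raise ValueError("bits_to_set must be between 1 and width")
    sig = 0
    counter = 0
    # Keep hashing (term, counter) until enough distinct positions are set.
    while bin(sig).count("1") < bits_to_set:
        digest = hashlib.sha256(f"{term}:{counter}".encode()).digest()
        position = int.from_bytes(digest[:4], "big") % width
        sig |= 1 << position
        counter += 1
    return sig


def record_signature(terms, weights, default_bits=2):
    """Superimpose (bitwise OR) the term signatures of one record.

    `weights` maps a term to the number of bits it sets; these values are
    hypothetical stand-ins for whatever the term characteristics dictate.
    """
    sig = 0
    for term in terms:
        sig |= term_signature(term, weights.get(term, default_bits))
    return sig


def may_match(record_sig, query_sig):
    """Filter test: every query bit must also be set in the record."""
    return (record_sig & query_sig) == query_sig


# Hypothetical weights: a discriminating term sets more bits than a common one.
weights = {"hashing": 6, "the": 1}
record = record_signature(["hashing", "the", "file"], weights)
query = term_signature("hashing", weights["hashing"])
print(may_match(record, query))
```

A record that contains the query term always passes the filter, so there are no false dismissals. A record that does not contain it can still pass by chance (a false drop), and the number of bits each term sets is what influences how often that happens.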