Scholarly Commons at Miami University

    Analysis of Multiterm Queries in Partitioned Signature File Environments

    Date
    1993-04-01
    Author
    Aktug, Deniz
    Abstract
    This study concerns signature files, which are used for information storage and retrieval in both formatted and unformatted databases. The analysis combines signature extraction and signature file organization, which have usually been treated as separate issues. Both the uniform frequency and single-term query assumptions are relaxed, and a comprehensive analysis is presented for multiterm query environments where terms can be classified based on their query and database occurrence frequencies. The performance of three superimposed signature generation schemes is explored as they are applied to a dynamic signature file organization based on linear hashing: Linear Hashing with Superimposed Signatures (LHSS). The first scheme (SM) lets all terms set the same number of bits regardless of their discriminatory power, whereas the second and third schemes (MMS and MMM) emphasize terms with high query and low database occurrence frequencies. Of these three schemes, only MMM takes the probability distribution of the number of query terms into account in finding the optimal mapping strategy. The main contribution of the study is the derivation of the performance evaluation formulas, which is provided together with an analysis of various experimental settings. Results indicate that MMM outperforms the other methods as the gap between the discriminatory power of the terms gets larger. The absolute value of the savings provided by MMM reaches a maximum for the high query weight case. However, as database size increases, the extra savings decline sharply for high weight queries and moderately for low weight queries. The applicability of the derivations to other partitioned signature organizations is discussed, and a detailed analysis of Fixed Prefix Partitioning (FPP) is provided as an example. An approximate formula that is shown to estimate the performance of both FPP and LHSS within an acceptable margin of error is also modified to account for the multiterm case.
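    As a rough illustration of the superimposed signatures the abstract refers to, the sketch below builds a record signature by ORing together the bits set by each term, and tests a multiterm query against it. It is a minimal sketch, not the report's method: the hashing scheme, the function names (term_bits, signature, may_match), the signature length, and the per-term bit counts are all assumptions made for illustration. The bits_for parameter only mimics the general idea that a scheme may give every term the same number of bits (as SM does) or give more bits to terms with higher discriminatory power; the actual MMS and MMM allocations are derived in the report and are not reproduced here.

```python
import hashlib


def term_bits(term, sig_len, n_bits):
    """Return n_bits distinct bit positions for a term.

    The SHA-256 based position choice is an assumption for illustration only.
    """
    if not 0 < n_bits <= sig_len:
        raise ValueError("n_bits must be between 1 and sig_len")
    positions = set()
    counter = 0
    while len(positions) < n_bits:
        digest = hashlib.sha256(f"{term}:{counter}".encode()).digest()
        positions.add(int.from_bytes(digest[:4], "big") % sig_len)
        counter += 1
    return positions


def signature(terms, sig_len, bits_for):
    """Superimpose (bitwise OR) the term signatures into one integer."""
    sig = 0
    for term in terms:
        for pos in term_bits(term, sig_len, bits_for(term)):
            sig |= 1 << pos
    return sig


def may_match(record_sig, query_sig):
    """True if every bit set by the query is also set in the record.

    A True result can still be a false drop; a False result is definite.
    """
    return record_sig & query_sig == query_sig


SIG_LEN = 64  # hypothetical signature length in bits


def same_weight(term):
    """Every term sets the same number of bits (in the spirit of SM)."""
    return 4


# Hypothetical weights: terms assumed to be more discriminatory set more bits.
HYPOTHETICAL_WEIGHTS = {"hashing": 8, "signature": 6}


def by_weight(term):
    """Term-dependent bit counts; the values are made up for illustration."""
    return HYPOTHETICAL_WEIGHTS.get(term, 2)


record_terms = ["signature", "hashing", "query"]
record_sig = signature(record_terms, SIG_LEN, same_weight)
weighted_record_sig = signature(record_terms, SIG_LEN, by_weight)
```

    A query must be encoded with the same bit allocation as the records it is compared against; for example, may_match(record_sig, signature(["hashing", "query"], SIG_LEN, same_weight)) tests a two-term query against the equal-weight record signature.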
    URI

    http://hdl.handle.net/2374.MIA/199
    Collections
    • Computer Science and Software Engineering Technical Reports
