Author Topic: Indexing Relational Databases for Efficient Keyword Search  (Read 2296 times)

0 Members and 1 Guest are viewing this topic.

IJSER Content Writer

  • Sr. Member
  • ****
  • Posts: 327
  • Karma: +0/-1
    • View Profile
Indexing Relational Databases for Efficient Keyword Search
« on: November 23, 2011, 02:04:27 am »
Author : Phyo Thu Thu Khine, Htwe Pa Pa Win, Khin Nwe Ni Tun
International Journal of Scientific & Engineering Research Volume 2, Issue 10, October-2011
ISSN 2229-5518
Download Full Paper : PDF

Abstractó Keyword search is a widely accepted mechanism for querying in Information Retrieval (IR) systems and Internet search engines on the Web. They offer convenient keyword-based search interfaces. But searching in relational database systems the user needs to learn SQL and to know the schema of the underlying data even to pose simple searches. A system that can eliminate these requirements is needed. Therefore we proposed an efficient keyword-based search system for relational databases. The proposed system provides online search and offline indexing method to do efficient keyword based search. Firstly, a relational database is indexed in advance using the proposed indexing algorithm. At searching time, the index supports keyword-based searches with interactive response. The index size is manageable and database updates do not significantly hinder query performance. As long as the database table records can be extended, this system can be easily extendable for further searching records from tables. Experimental results show that the proposed algorithm provides less storage space and short offline indexing time and also reduces query processing time significantly compared to previous approaches.

Index Termsósearch, relational database, information retrieval, indexing, searching, ranking.

1   INTRODUCTION                                                                      
Keyword search provides a simple yet effective way for the users to query and explore the underlying documents. The last decade has seen ever expanding adoption of the key-word search technology, and it has become a de facto standard for user interaction on the World Wide Web (WWW). In the recent years, there has been a great deal of research and development activities on extending keyword search capabilities to handle relational data, the dominant form in which business data are stored. One important advantage of keyword search is that it enables users to search for information without having to know complex query languages such as SQL and XQuery or prior know-ledge about the structure of the underlying data. Keyword search provides an alternative means of querying relational databases, which is simple to most internet users. It lowers the access barrier for average users.
The alternative approaches of keyword search on relational databases are classified into candidate network based methods [8, 9, 10, 11], graph based algorithms [6, 12], and tuple unit based approaches [1, 2, 4, 5]. Li et al. [2] (Su et al. [1]) proposed the concept of tuple units (text objects) to efficiently answer keyword queries, which are composed of the most relevant tuples. The tuple units can be generated and indexed to improve search efficiency. Most previous approaches to keyword-based search in structured databas-es (reviewed in Section 2) perform a significant amount of database computation at search time. Our offline indexing approach does all significant work in advance so it can provide interactive answers.
This paper is organized as follows: Section 2 summarizes the literature reviews. Section 3 shows the overview of proposed
system. Section 4 shows experimental results and section 6 is the conclusion and future work.

Verity [15] crawls the content of relational databases and builds an external text index for keyword searches, as well as external auxiliary indexes to enable parametric searches. DataSpot [16] extracts database content and builds an external, graph-based representation called a hyperbase to support keyword search. Graph nodes represent data objects such as relations, tuples, and attribute values. Query answers are connected subgraphs of the hyperbase whose nodes contain all of the query keywords.   
DbSurfer [17] indexes the textual content of each relational tuple as a virtual web page. For querying and navigation, the database foreign-key constraints between tuples are treated as hyperlinks between virtual web pages. Given a keyword query, the system computes a ranked set of virtual web pages that match at least one keyword. Then a best-first expansion algorithm finds a ranked set of naviga-tion paths originating from the starting web pages.
Three systems, DBXplorer [8], BANKS [6], and DISCOVER [9], share a similar approach: At query time, given a set of keywords, first find tuples in each relation that contain at least one of the keywords, usually using auxiliary indexes. Then use graph-based approaches to find tuples among those from the previous step that can be joined together, such that the joined tuple contains all keywords in the query. All three systems use foreign-key relationships as edges in the graph, and point out that their approach could be extended to more general join conditions.

Su et al [1] proposed a technique of indexing relational databases to improve the search efficiency. They indexes interconnected textual content in relational databases, and do keyword search over this content. A relational database is crawled in advance, text-indexing virtual documents that correspond to interconnected database content. At query time, the text index supports keyword-based searches with interactive response, identifying database objects corresponding to the virtual documents matching the query.    

Li et al. [2] proposes the concept of tuple units (text objects) to efficiently answer keyword queries, which are composed of the most relevant tuples. The tuple units can be generated and indexed to improve search efficiency. However, existing methods identify a single tuple unit to answer keyword queries. They neglect the fact that in many cases a single tuple unit cannot answer a keyword query. It is promising to integrate several related tuple units to effectively answer keyword queries.
Saint (Structure-Aware INdexing for finding and ranking Tuple units) [5] proposes a structure-aware index based me-thod to integrate multiple related tuple units to effectively answer keyword queries. The structural relationships between different tuple units are discovered and stored them into structure-aware indices, and progressively found the top-k answers using such indices.
ITREKS (Indexing Tuple Relationship for Efficient Key-word Search) [4] supports efficient keyword-based search over relational database by indexing tuple relationship: A basic database tuple relationship, FDJT, is established in advance and then a FDJTTuple-Index table is created, which records relationships between each tuple and FDJT. At query time, for each of keywords, system first finds tuples in every relation that contain it, using full text indexes offered by database management system. Then FDJT-Tuple-Index table is used to find the joinable tuples contain all keywords in the query.

Read More: Click here...