Author Topic: A Stochastic Simulation of Optimized Access Strategies for a Distributed Databas  (Read 2487 times)

0 Members and 1 Guest are viewing this topic.

IJSER Content Writer

  • Sr. Member
  • ****
  • Posts: 327
  • Karma: +0/-1
    • View Profile
Author : Rajinder Singh, Gurvinder Singh, Varinder Pannu virk
International Journal of Scientific & Engineering Research Volume 2, Issue 11, November-2011
ISSN 2229-5518
Download Full Paper : PDF

Abstract—This paper highlights a design of a probabilistic solution to the operation allocation problem of Distributed Databases. Most of the present day commercial vendors of Distributed DBMS use deterministic procedures along with certain heuristics on exhaustive enumeration procedures like Dynamic Programming, Greedy Techniques, Randomized strategies etc. These procedures have a lot of scope for improvements when problem domain is increased from the point of view of ‘number of sites’ or ‘number of joins’ involved in a distributed query.  Recently great interest has been shown by researchers to apply Genetic Algorithms to achieve this. This paper highlights design and implementation of one such model, Genetic Algorithm for Subquery Allocation (GA_SA), which is a modest effort to stochastically simulate optimization of retrieval transactions for a distributed query.

Index Terms— Stochastic Simulation, Genetic Algorithms, Distributed Database, Access Strategies, Query Optimization,  Fragments, sub-query operation, Computer Network, Random Number Generation, Query Tree.

1   INTRODUCTION                                                                     
ONE of the important key components of a distributed database query processing & optimization process is the allocation of the sites of a network to various operations or sub-queries involved in a particular Query. A distributed query is first broken into various sub operations like Selec-tions/Projection or Joins/Semi-joins, then these operations may be performed at many different sites of the network in many different sequences. These two components of the dis-tributed query optimization process are popularly referred as Operation Order Sequence Problem (OSP) & Operation allocation Problem (OAP). OSP involves finding the optimal ordering of operations e.g. Join order sequence.OAP involves finding the optimal placement of these operations to different permutations of network sites. Both of them are proven to be NP Complete and NP Hard problems while trying to find an optimal solution for a combination of large number of sites and operations [1]. For this reason one prefers a stochastic solution over deterministic ones. Because enumerative and deterministic procedures go intractable quite quickly as soon as the number of sites or number of complex operations like joins are increased. So this has been an active area of research for database community since last few decades and  the hunt for better techniques or heuristics is still actively pursued [2].
   In this paper focus is on sub-query operation alloca-tion problem. An innovative Genetic Algorithm (GA_SA) is proposed as a solution and a stochastic simulator is designed based on this algorithm. Paper is organized as follows. Section 2 discusses various prevalent techniques and earlier research in the area. Section 3-5 describe step by step, all the details of the genetic algorithm and simulator’s design like Query representation by Query Tree, mathematical formulation of the objective function, Data File Design, Flowchart of GA_SA, Genetic Algorithm and Graphical Analysis.

Finally in section 6 we analyze and validate the solution by taking a set of Queries on a Winconsin Benchmark Database and comparing the GA_SA costs and execution times with other widely followed methods like Exhaustive Enumeration, Dynamic Programming, Branch & Bound and Simulated Annealing etc. Due to the NP-Hard nature of the problem, there are no standard benchmarks for comparing efficiency of distributed database design and query optimization algorithm variants. So, to validate the results of the proposed genetic solution GA_SA, some of above stated approaches have been compared by using benchmark analysis & results from the classic works of Martin & Lam [4], other good comparisons are available in [5], [6], and [7].

2.1   Introduction:
Research activities in this field can be broadly divided into following three categories, which have been extensively elaborated and compared in earlier works of [8],[9],[10],and [11].

a)   Deterministic techniques: Exhaustive Enumeration with heuristics (Dynamic   Programming), Branch & Bound, Greedy, Iterative Dynamic Programming.

b)   Randomized Techniques: Iterative Improvement, Simu-lated Annealing, 2PO (Two phase Optimization).
c)   Evolutionary Techniques: Genetic Algorithms, Multi objective Genetic Programming.

2.2 Exhaustive Enumeration
It is the most primitive technique, when used along some heuristics to prune suboptimal plans is then called Dynamic Programming Technique. This was the first approach in System R project of IBM and was later used for query optimization by most of DBMS vendors. Working in a bottom-up fashion it builds more complex sub-plans from earlier simple sub-plans until the completion of Plan. Access plans are build for every relation in the first phase and then algorithm enumerates all two way join plans using access plans as building blocks in the second phase. Third phase (finalizePlans) finalizes the plans by attaching the operations to make a complete plan. The function prunePlans helps in discarding suboptimal plans as early as possible. A brief outline of the general procedure is given below in figure 2.1.Distributed version does table scans at different sites and plan pruning is postponed till covering all sites.

Read More: Click here...