Extensible/Rule Based Query Rewrite Optimization in Starburst Hamid Pirahesh Joseph M. Hellerstein \Lambda Waqar Hasan y IBM Almaden Research Center San Jose, CA 95120, USA pirahesh@almaden.ibm.com, joey@cs.berkeley.edu, hasan@hpl.hp.com Abstract This paper describes the Query Rewrite facility of the Starburst extensible database system, a novel phase of query optimization. We present a suite of rewrite rules used in Starburst to transform queries into equivalent queries for faster execution, and also describe the production rule engine which is used by Starburst to choose and execute these rules. Examples are provided demonstrating that these Query Rewrite transformations lead to query execution time improvements of orders of magnitude, suggesting that Query Rewrite in general --- and these rewrite rules in particular --- are an essential step in query optimization for modern database systems. 1 Introduction In traditional database systems, query optimization typically con­ sists of a single phase of processing in which access methods, join orders and join methods are chosen to provide an efficient plan for executing a user's declarative query. We refer to this phase as plan optimization. In this paper we present a distinct phase of query optimization, Query Rewrite, which has been implemented in the Starburst DBMS [HCL + 90] to precede plan optimization in the pro­ cessing of a query. The goal of Query Rewrite is twofold: 1. Make queries as declarative as possible: In database languages such as SQL, it is often possible for a poorly expressed query, though ostensibly declarative, to force typical plan optimizers into choosing sub­optimal execution plans. A major goal of Query Rewrite is the transformation of such ``procedural'' queries into equivalent but more declarative queries. 2. Perform natural heuristics: Certain heuristics can be performed in Query Rewrite and are generally accepted in the literature as being valuable. A typical example is that of ``predicate push­ down'', in which predicates are applied as early as possible in the query (i.e. they are ``pushed'' from their original positions into table accesses, subqueries, views, etc.) Such rules can signifi­ cantly improve query execution time, and while a few of these \Lambda Current address: CS Division, Department of EECS, University of California, Berkeley, CA 94720 y Current address: Hewlett­Packard Laboratories, Palo Alto, CA 94305 heuristics are done in typical plan optimizers, they often can be applied in a more general way during Query Rewrite. Although it is accepted doctrine that query languages should be declarative, we shall see in our examples that alternative but equiv­ alent expressions of a query can have widely varying performance, often differing by orders of magnitude. It is therefore our conviction that Query Rewrite is an essential step in query optimization, since it further ensures that the expression of the query will be insignificant with respect to its performance. 1.1 Queries with Path Expressions The goals of Query Rewrite explained above are even more signifi­ cant in Object Oriented applications, which typically generate com­ plicated queries with ``path expressions'' connecting various collec­ tions of objects [BTA90, LLOW91, LLPS91]. In such applications, both the complexity of the logic and the volume of the data are far greater than in traditional DBMS applications [HSS88]. As a result, query optimization becomes increasingly important. The following is an example of a query involving path expres­ sions, using the Object SQL syntax defined in [BTA90]. 1 This query is a small variation of an example presented in [BTA90]. The ex­ ample database contains records of patients. Medical records are set attributes of patients. All accesses to data are via methods. Given a patient's record, the medical records are returned by the function get medical records. The example retrieves male patients who have been diagnosed with malaria or smallpox prior to '10/10/89'. The FROM clause enumerates the patients and the WHERE clause re­ stricts the patients to males and checks for the existence of a malaria or smallpox diagnosis prior to a given date. SELECT DISTINCT P FROM Patient p IN Patient Set WHERE p.sex == 'male' && EXISTS ( SELECT r FROM Medical record r IN p.get medical record() WHERE r.get date() ! '10/10/89' && ( r.get diagnosis() == 'Malaria' jj r.get diagnosis() == 'Smallpox' ); Queries such as this essentially involve path expressions in which, given a record, the related information is obtained through a path (e.g., getting the medical records of a patient). Queries involving path expressions are very common in complex applications such as CAD/CAM. In general, many path expressions may be involved in a query, and each may have a length of more than one. Finding an efficient execution plan for such path expressions is a problem very similar to that of optimizing (nested) SQL subqueries. The above query is commonly executed by enumerating the male patients, 1 This is a proposal for a standard OO query language, and is implemented by some commercial OODBMSs and checking the medical records for each patient, testing for the predicates on date and diagnosis. This amounts to a nested­loop join, with the join order dictated by the user --- the subquery is necessarily the inner of the join. However, if there are a large number of male patients, this could be very inefficient, since most of the records accessed are not qualified. Assuming both smallpox and malaria are rare diseases, the performance of such a query can be improved by orders of magnitude by first searching the medical records through an index on the diagnosis attribute of the medical records, and then finding the associated patients. This basically requires converting such queries to joins if possible, which can then be executed by a greater variety of plans. Changing the join order in the above example improves the per­ formance considerably, but has an undesirable side­effect. A given patient may have been diagnosed with both smallpox and malaria, and for each diagnosis, the associated patient record is sent to the output, resulting in duplicate patient records in the output. This is not correct; for example, this would cause the wrong result to be given by a query that counted the output records of the above query. Duplicates may occur even without query transformations. Du­ plicate records frequently appear in base or intermediate results in applications, and are of great significance to queries involving ag­ gregate functions, such as a query requesting an average of values of some column. Duplicates are part of SQL [ISO91] and OO models such as [LLOW91] (as ``bags of objects''). The need and importance of having duplicates in a realistic implementation is widely recog­ nized, particularly in relational DBMSs (RDBMSs). Hence, careful treatment of duplicates is essential, and is one focus of the work presented in this paper. 1.2 System Design Naturally we do not propose to find an optimal expression of a query, just as traditional plan ``optimizers'' do not find an optimal execu­ tion plan for a query. Rather, we have built a set of Query Rewrite heuristics, expressed as production rules, which work together to address both of the rewrite goals stated above. These production rules are controlled by a rule engine written for the purpose and inte­ grated into Starburst. Our rule system design has been instrumental in facilitating our experimentation with query transformations for two reasons. First, the rule system paradigm has made it easy for us to exploit the complicated triggering interactions between rewrite rules, saving us from the task of explicitly laying out the flow of control between rules. This is often cited as a dangerous complexity of rule systems ([ZH90], [Ras90], [HH91], etc.) but in our experi­ ence has been not only manageable but inherently useful. Second, a rule system is an excellent platform for extensibility, one of the key goals of Starburst's design. This extensibility has allowed us to write and test dozens of query transformations over the past two years, including those presented in this paper, magic sets transfor­ mations [MFPR90a, MPR90, MFPR90b], and numerous others. As we shall see, the rules presented in this paper demonstrate that Query Rewrite can often speed up query execution by orders of magnitude, suggesting that query transformation schemes form a ripe area of research. Our extensible Query Rewrite system is designed with this in mind, and we expect to continue adding transformations to it. 1.3 Related Work Designers of early RDBMSs such as System R [ABC + 76] and IN­ GRES [SWK76] recognized the importance of merging views, and achieved this under limited circumstances. In spite of the acknowl­ edged importance of such transformations, few systems have ex­ panded upon these early transformation designs. Kim [Kim82] originally studied the question of when quantified subqueries could be replaced by joins (or anti­joins). Ganski and Wong [GW87] and Dayal [Day87] did additional work on eliminat­ ing nested subqueries. These papers recognize the importance of merging of subqueries. [Kim82, GW87] also deal with subqueries containing aggregation. We have reported our set of rules that deal with such subqueries in [MFPR90a, MPR90, MFPR90b]. Ganski's paper illustrates the complexity of query rewrite, since it has to emend some previous transformations which were incorrect. This complexity supports our approach of decomposing transforma­ tions into an extensible set of distinct rules, such that each rule can be shown to be correct separately. We have paid particular attention to language orthogonality. Hence, operations such as UNION, INTERSECT, and EXCEPT (SQL's equivalent of set difference) may appear in subqueries, as is required by the SQL2 standard [ISO91]. The work mentioned above does not deal with these more complicated subqueries. Furthermore, our rules guarantee the merge of existential subquery conjuncts con­ sisting of restriction, projection and join. This is partially due to our careful treatment of duplicates. Anfindsen [Anf89] also has done a study of subquery transfor­ mations using IBM's DB2 RDBMS for performance measurements. Like us, Anfindsen reports orders of magnitude improvement in per­ formance. However, [Anf89] restricts itself to those transformations which result in a SQL query that can be handled by DB2. In contrast, our approach is an internal and integrated part of an RDBMS, taking advantage of a richer internal language, and hence allowing for con­ siderably more optimization. Anfindsen defines a concept similar to our one­tuple­condition, explained below, and gives sufficient conditions for which it is satisfied. Many extensions have been added to Starburst [LLPS91], includ­ ing XNF, a system supporting complex object queries which often generates extremely complicated SQL queries. Our Query Rewrite system has withstood the test of being used by these extensions, and in fact is key to making some of them work efficiently. The work presented here should not be confused with the query rewrite facility of POSTGRES [SJGP90]. POSTGRES's query rewrite is part of an implementation for an active database. In POST­ GRES, one can define a rule stating that certain incoming queries should perform additional or entirely different actions from what the user has specified. In some situations, POSTGRES implements this notion by rewriting the user's query. Note well that POSTGRES's query transformations are intended to change a query's semantics, not its performance. For example, POSTGRES's rewrite may be used to implement user­defined semantics for update of views. In contrast, our emphasis is on transformation for the purpose of opti­ mizing query execution. An earlier design of the Starburst Query Rewrite rule system is reported in [HP88]. 1.4 Structure of the Paper Section 2 presents the abstract representation of queries used by the rewrite rules. The rules themselves are presented in Section 3. Section 4 describes the rule engine designed for Query Rewrite. Summary and conclusions appear in Section 5. 2 Starburst's Query Graph Model Queries are internally represented in a Query Graph Model (QGM). The goal of QGM is to provide a more powerful and conceptually more manageable representation of queries in order to reduce the complexity of query compilation and optimization. QGM supports arbitrary table operations, where the inputs are tables and outputs are tables. Examples of operations are SELECT, GROUP BY, UNION, INTERSECT and EXCEPT. The operation SELECT is that part of Starburst SQL which handles restriction, projection and join. We present the QGM graph through an example. Suppose we have the following SQL query: inventory quotations partno, desc QUANTIFIER COLUMNS partno, price q1.partno = q2.partno q1.descr ='engine' suppno partno q1(F) q2(F) =q2.suppno HEAD BODY QUANTIFIER SELECT distinct= ENFORCE distinct =TRUE price q2.partno = q3.partno SELECT q2.price<=q4.price =q3.price (3) (1) (2) (4) q4(A) QUERY SUBQUERY q3(F) distinct= PERMIT =q1.partno =q1. descr descr distinct =FALSE Figure 1: Example QGM graph SELECT DISTINCT q1.partno, q1.descr, q2.suppno FROM inventory q1, quotations q2 WHERE q1.partno = q2.partno AND q1.descr='engine' AND q2.price Ÿ ALL ( SELECT q3.price FROM quotations q3 WHERE q2.partno=q3.partno); This query gives information about suppliers and parts for which the supplier price is less than that of all other suppliers. Figure 1 shows the QGM for this query. The graph contains four boxes. Boxes 1 and 2 are associated with base tables inventory and quotations. Box 3 is a SELECT box associated with the main part of the query, and Box 4 is a SELECT box associated with the subquery. Each box has a head and a body. The head describes the output table produced by the box, and the body specifies the operation required to compute the output table. Base tables can be considered to have empty or non­existent bodies. Let's study Box 3. The head specifies output columns partno, descr and suppno, as specified in the select list of the query. The specification of these columns includes column names, types, and output ordering information. The head has a Boolean attribute called distinct which indicates whether the associated table contains only distinct tuples (head.distinct = TRUE), or whether it contains dupli­ cates (head.distinct = FALSE). The body of a box contains a graph. The vertices of this graph (dark circles in our diagrams) represent quantified tuple variables, called quantifiers. In Box 3, we have quantifiers q1, q2, and q4. Quantifiers q1 and q2 range over the base tables inventory and quo­ tations respectively, and correspond to the table references in the FROM clause of the SQL query. Note that nodes q1 and q2 are connected via an inter­box edge to the head of the inventory and quotations boxes. The edge between q1 and q2 specifies the join predicate. The (loop) edge attached to q1 is the local predicate on q1. In fact, each inter­quantifier edge represents a conjunct of the WHERE clause in the query block --- the conjuncts being repre­ sented in the diagram by the labelled rectangle along the edge. Such edges are also referred to as Boolean factors [SAC + 79]. Quantifier 3 is a universal quantifier, associated with the ALL subquery in the WHERE clause. This represents that for all tuples associated with q4, the predicate represented by the edge between q2 and q4 is true. In Box 3, q1 and q2 participate in joins, and their columns are used in the output tuples. These quantifiers have type F, since they come from the query's FROM clause. Quantifier 4 has type A, representing a universal (ALL) quantifier. SQL's predicates EXISTS, IN, ANY and SOME are true if at least one tuple of the subquery satisfies the predicate. Hence, all of these predicates are existential, and the quantifiers associated with such subqueries have type E. Each quantifier is labeled with the columns that it needs from the table it ranges over. Box 4 represents the subquery. It contains an F quantifier q3 over the quotations table, and has a predicate that refers to q2 and q3. The body of every box has an attribute called distinct which has a value of ENFORCE, PRESERVE or PERMIT. ENFORCE means that the operation must eliminate duplicates in order to enforce head.distinct = TRUE. PRESERVE means that the operation can preserve the number of duplicates it generates. This could be because head.distinct = FALSE, or because head.distinct = TRUE and no duplicates could exist in the output of the operation even without duplicate elimination. PERMIT means that the operation is permitted to eliminate (or generate) duplicates arbitrarily. For example, the distinct attribute of Box 4 can have the value PERMIT because its output is used in a universal quantifier (q4 in Box 3), and universal quantifiers are insensitive to duplicate tuples. This will be covered in more detail in Section 3. Like each box body, each quantifier also has an attribute called distinct which has a value of ENFORCE, PRESERVE or PERMIT. ENFORCE means that the quantifier requires the table over which it ranges to enforce duplicate elimination. PRESERVE means that the quantifier requires that the exact number of duplicates in the lower table be preserved. PERMIT means that the table below may have an arbitrary number of duplicates. Existential and universal quantifiers can always have distinct = PERMIT, since they are insensitive to duplicates. In the body, each output column may have an associated expres­ sion corresponding to expressions allowed in the select list of the query. In Figure 1, all of these expressions are simple identity func­ tions over the referenced quantifier columns. SQL2 has table expressions, which are similar to view definitions, and can be defined anywhere a table can be used. In Starburst, table expressions and views, just like queries and subqueries, have a QGM, with one or many boxes, and become part of the QGM graph of queries referring to them. The output of a box can be used multiple times (e.g., a view may be used multiple times in the same query), creating common subexpressions. Recursive queries create cycles in QGM. As the size of the graph grows, the cost of optimization also grows. The number of QGM boxes in a query typically ranges from 2 to 10. For much more complex queries, such as those produced by XNF, this number often ranges from 10 to 100. 2.1 Environment for Performance Measurements It is not uncommon for queries to take hours or even days to com­ plete. Query Rewrite can improve performance by several orders of magnitude --- in many cases converting an over­night query to an interactive one. We will be demonstrating this fact during the course of the discussion by measuring the performance effect of our rewrite rules on various queries. In this section we present the environment used for these measurements. A comprehensive performance evaluation requires a definition of a benchmark database and a set of queries for a particular work­ load. We focus on a complex query workload (involving subqueries, views, etc), rather than a transaction workload, where queries are relatively simple. There is no accepted standard complex query workload, although several have been proposed ([TOB89, O'N89]). To measure the performance effect of the rewrite rules, we employ a Table Tuple Size #Tuples #4K Pgs #Indices itm 34 170 000 1 850 1 itl 78 2 550 000 57 980 2 itp 43 339 440 4 250 3 pur 398 128 000 11 640 1 wor 119 120 000 4 000 1 Table 1: Benchmark Database version of the IBM DB2 benchmark database described in [Loo86], scaled up by a factor of 10. The DB2 benchmark database is based on an inventory tracking and stock control application. Workcenters have locations (locatn). Items (itm) are worked on at locations within workcenters, and the table itl captures this relationship. The record of the items worked on by a particular employee is captured in table wor. Each item may have orders (itp). Some physical characteristics of the database are shown in Table 1. Since Starburst's Query Rewrite system can produce SQL repre­ sentations of its output, we can easily measure its effects on a widely­ used commercial DBMS. This allows us to demonstrate the general applicability of Query Rewrite to typical DBMSs, not just Starburst. Our performance measurements were done on the DB2 relational DBMS. We measured the elapsed time (total time taken by system to evaluate the query), and CPU time (the time for which CPU is busy) of each query both before and after applying the rewrite rules. Both representations of the query went through the usual DB2 query com­ pilation process, including plan optimization. Performance figures for several of the queries we measured are given in Section 3. 3 A Suite of Rewrite Rules for Guaranteeing SELECT Merge In designing our rewrite engine and rules, we attempted to abide by the following: Rewrite Philosophy Whenever possible, a query should be converted to a single SELECT operator. A single SELECT operator ranging over base tables represents the prototypical relational query, involving straightforward restric­ tion, projection, and join. There are a variety of high­performance algorithms for executing such queries, and methods for choosing among these are well understood. Also, as noted above, more com­ plex queries often force the plan optimizer into choosing a particular plan --- for example subqueries force particular join methods and orders, and views (as we shall see) unnecessarily restrict the possible join orders for the query. Finally, plan optimizers typically can only make decisions based on the environment of a single query block (i.e. a single QGM box). As a result, multi­operator queries usually do not result in optimal plans, and should be converted to single SELECT operators whenever possible. There are, of course, many optimizations that can be applied to SELECT operators themselves. But we consider conversion to a single SELECT, when possible, to be among the most important goals of query transformation. As a result, we focus in this paper on those rewrite rules in Star­ burst which are used to guarantee that all views (table expressions) and existential subqueries are merged whenever possible. The only queries in Starburst which do not get rewritten as single SELECT queries are those which contain non­existential or non­Boolean fac­ tor subqueries, set operators, 2 aggregates, or user­defined extension operators (such as OUTER JOIN). The system allows rewrite rules 2 Even some of these get converted to a single SELECT, as we shall see! DISTPDTO SELMERGE EtoF ADDKEYS INT2EXIST BOXCOPY DISTPU EXC2NEXIST DISTPDFR EorAPDFR Figure 2: Triggering Interactions Between Rules if (in a SELECT box (upper box) a quantifier has type F AND ranges over a SELECT box (lower box) AND no other quantifier ranges over lower box AND ( upper.head.distinct = TRUE OR upper.body.distinct = PERMIT OR lower.body.distinct != ENFORCE)) fmerge the lower box into the upper box; if ( lower.body.distinct = ENFORCE AND upper.body.distinct != PERMIT) fupper.body.distinct = ENFORCE;gg Table 2: Rule 1 --- SELMERGE to be defined for such queries as well, so even though these are not converted to a single SELECT, they are often subject to some useful transformation. For example, Rosenthal [RGL90] defines a set of such transformations for outer join. In this section we will describe the set of rules in Starburst which lead up to the merger of SELECT boxes. In Figure 2 we show the dependencies among the rules, particularly when the execution of one rule (at the tail of an arrow) can cause another (at the arrow's head) to have its condition satisfied. We make no claim to be exhaustive in presenting these dependencies. The goal is rather to present the most important dependencies as an illustration of the utility of each rule in causing the merger of SELECT boxes. Since the goal of this suite of rules is the merger of SELECT boxes, the ``SELECT merge'' (SELMERGE) rule presented first will be transitively dependent on each of the other rules here, and will form the focus of our measurements. Before proceeding it should be noted that the number of rules we require is kept tractably low by enforcing locality of reference: each rule is written with reference to a specific context (e.g. a box, or a quantifier), and as a result rules which involve more than one box can be written in a box­by­box manner, rather than a paired­box manner. This keeps the number of rules on the order of the number of box types, rather than the square of that number or worse. We shall see many examples of rule locality in the suite of rules presented here, and in the next section we will see how our rule engine supports it. 3.1 Rules to Guarantee View Merge Rule 1. SELECT Merge The SELMERGE rule (Table 2) takes two SELECT boxes con­ nected by an F quantifier (e.g. a query over a view) and merges them into one box. The benefit of this transformation is that it makes more join orders possible; in the resulting single SELECT box the plan optimizer can choose as a join order any permutation of the tables under F quantifiers, whereas in the original query the tables refer­ enced in the table expression (lower box) could not be interleaved with those of the box above. Note that if we can apply this rule to all boxes in a query, we end up with a single SELECT box, so this rule leads directly to the realization of our Rewrite Philosophy. In order to exploit the utility of this rule, the rest of the rules in this section will attempt to make the condition of this rule satisfied in as many situations as possible. 3 The issue of duplicates forces us to introduce some complexity to ensure the rule's correctness. Ignoring duplicates for a moment, it should be clear why this rule works: it follows directly from the commutativity of joins and predicate applications. Since joins and predicate applications are commutative, we can interleave those of the lower and upper boxes, which is equivalent to saying that we can merge the two. This argument applies directly to the case in which neither the upper nor the lower box removes duplicates. Some analysis is required to see that this rule handles duplicates correctly. We break down the cases for duplicates and argue the correctness of each case: ffl upper.head.distinct = TRUE: This can happen in one of two ways: -- If upper.body.distinct = ENFORCE, then any duplicates pro­ duced by the lower box in the original query will be removed by the upper box, and thus no duplicates are lost or intro­ duced by the merge. -- If upper.body.distinct = PRESERVE, then all the F quan­ tifiers in the upper box produce sets without duplicates. If lower.body.distinct = PRESERVE then we can simply merge the lower box into the upper box, without any effect on duplicates. If lower.body.distinct = ENFORCE then we must set upper.body.distinct to ENFORCE to ensure that no duplicates will be produced after merge occurs. Note that we cannot have lower.body.distinct = PERMIT --- if we did, then the lower box could produce as many dupli­ cates as it found convenient, meaning that we could not have upper.head.distinct = TRUE and upper.body.distinct = PRESERVE, a contradiction. ffl upper.body.distinct = PERMIT: In this case we may ignore the issue of duplicates by definition. ffl lower.body.distinct != ENFORCE: In this remaining case, the previous two conditions must be false, i.e. we know that upper.head.distinct = FALSE and upper.body.distinct = PRESERVE. As a result we cannot merge the boxes if lower.body.distinct = ENFORCE, since we would be unable in a single box to remove the duplicates from the quantifiers of the lower box, and preserve those of the remaining quantifiers of the upper box. However, if lower.body.distinct != ENFORCE we need not worry about this issue, and thus can merge. Note that the only cases in which we cannot apply SELMERGE to two SELECT boxes connected by F quantifiers are when the lower box has multiple quantifiers ranging over it, or when up­ per.head.distinct = FALSE, upper.body.distinct = PRESERVE and lower.body.distinct = ENFORCE. We shall see that these cases are handled by the BOXCOPY and ADDKEYS rules respectively, guar­ anteeing that SELMERGE will eventually get to be executed. We have chosen a relatively simple query to measure the effect of the above rule in the performance environment explained in Sec­ tion 2. In practice, queries are typically more complicated, and the 3 As noted in [HP88], the importance of triggering this rule is emphasized when we remember that early relational systems such as System R supported only mergable views. Query CPU Time Elapsed time Before Rewrite 20 min 34.51 sec 24 min 19.80 sec After Rewrite 0 min 1.10 sec 0 min 7.20 sec Table 3: Example 1, Before and After Rewrite if ( in a SELECT box either quantifier­nodup­condition or one­tuple­condition holds for all F quantifiers) f head.distinct = TRUE; body.distinct = PRESERVE;g Table 4: Rule 2 --- DISTPU merge rule only becomes applicable after many of the rules enu­ merated below are applied. Although this example is simple, many commercial DBMSs miss this optimization. Consider a view which gives the item number and vendors for an item which vendors have supplied since the year 85. This view is used in a query which gives information about certain items and their vendors. Example 1. CREATE VIEW itpv AS ( SELECT DISTINCT itp.itemn, pur.vendn FROM itp, pur WHERE itp.ponum = pur.ponum AND pur.odate ? '85'); SELECT itm.itmn, itpv.vendn FROM itm, itpv WHERE itm.itemn = itpv.itemn AND itm.itemn – '01' AND itm.itemn ! '20'; The rewrite logic first recognizes that the result of the query is DISTINCT by applying the DISTPU rule explained below. Then the merge rule is applied. The resulting query is: SELECT DISTINCT itm.itmn, pur.vendn FROM itm, itp, pur WHERE itp.ponum = pur.ponum AND itm.itemn = itp.itemn AND pur.odate ? '85' AND itm.itemn – '01' AND itm.itemn ! '20'; As a result of merging the view with the query, the plan optimizer can use an index to access the tables within the view, and therefore it chooses a plan which exploits this fact while doing a join on behalf of the query. The results of executing this query with and without rewrite are shown in Table 3. After applying the rewrite rule, we get an 1100\Theta improvement in CPU time (and hence in pathlength) and a 200\Theta improvement in the elapsed time. Rule 2. Distinct Pullup In the DISTPU rule (Table 4) a SELECT box upper infers that no duplicate elimination is needed to guarantee that its output tuples are distinct. It does this by isolating the following properties: ffl one­tuple­condition: given a quantifier and a set of predicates, this condition is TRUE iff at most one tuple of the quantifier satisfies the set of predicates. ffl quantifier­nodup­condition: given an F quantifier in a SE­ LECT box, this condition is TRUE iff at least the primary key or a candidate key of the F quantifier appears in the output. Upper must find that either quantifier­nodup­condition or one­ tuple­condition holds for each of its F quantifiers and their asso­ ciated predicates. If this is not true of some F quantifier, then the projection of the Cartesian product of the boxes below will have /* DISTPDFR */ if ( in a box with type SELECT, UNION, INTERSECT or EXCEPT, body.distinct = PERMIT or ENFORCE) f for (each F quantifier in the body) quantifier.distinct = PERMIT;g /* DISTPDTO */ if ( in a box with type SELECT, UNION, INTERSECT or EXCEPT, all quantifiers ranging over the box have quantifier.distinct = PERMIT) fbody.distinct = PERMIT;g Table 5: Rule 3 --- DISTPDFR/DISTPDTO duplicates, and upper must remove them; if one of the two condi­ tions is true for each F quantifier in upper, then the projection of the Cartesian product will have no duplicates. Note the ``locality'' of this rule -- in writing the rule we need not worry about the type of the boxes below us, rather we focus on the F quantifiers over those boxes. Rule 3. Distinct Pushdown From/To In this pair of rules (Table 5), a box informs the boxes under it that it does not require them to eliminate duplicates. It does so by ``pushing'' the distinct attribute from itself to the boxes below it. This may save the lower boxes below from needing a sort or hash for duplicate elimination, and may also allow the lower boxes to be subject to rules which can introduce duplicates (such as EtoF below.) For the DISTINCT set operators (UNION, INTERSECT, and EX­ CEPT) the DISTPDFR rule is correct because of the semantics of duplicate elimination in those operators. The DISTINCT set opera­ tors are defined as removing duplicates from all their inputs before any further processing [ISO91]. Thus these boxes will disregard any duplicates produced by boxes below them, and can safely signal this by pushing DISTINCT down along their quantifiers. In the case of a SELECT box with body.distinct = PERMIT, we do not worry about the issue of duplicates. To see the correctness of the DISTPDFR rule for a SELECT box with body.distinct = ENFORCE, it suffices to notice that any tuple resulting from such a box is a projection of the concatenation of tuples t 1 ; : : : ; tn from the n inputs (under F quantifiers) to the box. Regardless of how many copies of each t i there are in the corresponding input table i, no more than one tuple projected from t 1 \Delta : : : \Delta tn will be in the output of the SELECT DISTINCT box. Thus each input can safely remove or introduce duplicates without affecting the output of the SELECT box above. The DISTPDTO rule is quite simple --- if all boxes ranging over a given box indicate their indifference to the number of duplicates produced by their inputs, then that box may introduce or remove du­ plicates at will, and hence can set its body's distinct flag to PERMIT, and its head's distinct flag to FALSE. Note the use of rule locality here: the task involves two boxes, and thus is broken into two separate rules, with the information passed via the quantifier between the boxes. Each operator need only concern itself with its own behavior in the activity of pushing down the DISTINCT attribute, and need not know anything about the other operators involved in the activity. Note further that if processing halts after DISTPDFR but before DISTPDTO (as can happen in our rule engine, see below), the QGM is still consistent and valid. Rule 4. E or A Distinct Pushdown From This rule (Table 6) is a special case of distinct pushdown ``from'', which exploits the fact that existential and universal quantifications are blind to duplicates. That is, the number of tuples in a subquery if ( in a SELECT box a quantifier has type = E or A) fquantifier.distinct = PERMIT;g Table 6: Rule 4 --- EorAPDFR if ( in a SELECT box more than one quantifier ranges over the box) f Make a copy of the box; Take one of the quantifiers ranging over the original box and change it to range over the new copy;g Table 7: Rule 5 --- BOXCOPY which satisfy the existential predicate is insignificant; existential predicates merely require that one of the tuples of the subquery match. Similarly the number of duplicates in a subquery has no bearing on a universal predicate; either all tuples in the subquery match the universal predicate, or not all do. Example 2. CREATE VIEW richemps AS ( SELECT DISTINCT empno, salary, workdept FROM employee WHERE salary ? 50000); SELECT mgrno FROM department dept WHERE NOT (EXISTS ( SELECT * FROM richemps rich, project proj WHERE proj.deptno = rich.workdept AND rich.workdept = dept.deptno)); This example returns those managers who have no rich employees in their department. By applying EorAPDFR and DISTPDTO, we make the subquery have body.distinct = PERMIT, which results in the view richemps being merged into the subquery. After rewrite, the query is: SELECT mgrno FROM department dept WHERE NOT (EXISTS ( SELECT * FROM employee emp, project proj WHERE proj.deptno = emp.workdept AND emp.workdept = dept.deptno AND emp.salary ? 50000)); Rule 5. Common Subexpression Replication This rule (Table 7) breaks common subexpressions in a QGM by replicating them. Doing so can allow one or both of the resulting boxes to merge. 4 Rule 6. Add Keys Given two SELECT boxes upper and lower, such that lower is ranged over only by an F quantifier in upper, ADDKEYS (Table 8) guarantees that upper and lower will be merged. It does so by modifying any SELECT box which preserves duplicates to be able to safely eliminate duplicates. We achieve this by adding ``key'' columns (or unique tuple ID's) to the inputs, which are passed up into the SELECT box. Once this is done, we can eliminate duplicates from the SELECT box without any effect, since each tuple in the 4 If queries are correlated, the copy logic is more complicated. This issue is beyond the scope of this paper, but is treated correctly in Starburst's version of this rule. if ( in a SELECT box head.distinct = FALSE) f for(each F quantifier) if ( the key of the F quantifier does not appear in the output) f Add the key to the head; head.distinct = TRUE;gg Table 8: Rule 6 --- ADDKEYS box has a unique key formed by the concatenation of the keys of the inputs. Again, note the ``locality'' of this rule -- boxes below are referenced by the F quantifiers which range over them, and thus the types of the boxes below become insignificant. In the following example, a view is declared, giving distinct nego­ tiated prices of ordered items. The query uses the view to calculate the negotiated price for each item type. Example 3. CREATE VIEW itemprice AS ( SELECT DISTINCT itp.itemno, itp.NegotiatedPrice FROM itp WHERE NegotiatedPrice ? 1000); SELECT itemprice.NegotiatedPrice, itm.type FROM itemprice, itm WHERE itemprice.itemno = itm.itemno; The ADDKEYS rule is applied to the (upper) SELECT box of the query, allowing SELMERGE to merge the view itemprice into the query. Note that SELMERGE changes the query's body.distinct attribute to be ENFORCE, thus removing the duplicates originally removed in the view. The resulting query is: SELECT DISTINCT itp.NegotiatedPrice, itm.type, itm.itemno FROM itp, itm WHERE itp.NegotiatedPrice ? 1000 AND itp.itemno = itm.itemno; This SQL representation of the rewritten query does not exactly capture the semantics of the transformed QGM. In the actual rewrit­ ten query, the output column itm.itemno is used during duplicate elimination, but its values are not delivered to the output of the query. 3.2 Guaranteeing Existential Subquery Merge The rules in the previous section guarantee that SELECT boxes get merged whenever the only quantifiers over the lower box are F quantifiers. The following rule attempts to facilitate merging by creating this situation as often as possible. In particular, we shall see that the next rule guarantees the merger of existential subquery conjuncts and the SELECT boxes above them. Rule 7. E to F Quantifier Conversion In this rule (Table 9) we convert Boolean factor existential sub­ queries to table expressions, by changing the type of quantifier over the subquery from E to F. Note that the ADDKEYS rule guarantees that the condition of this rule will eventually be satisfied for all such subqueries. As noted above, converting a subquery to a table ex­ pression (and hence a member of a join) increases possible orders of join execution. It may also allow for additional merging, if the subquery is another SELECT box. This rule is the QGM equivalent of a rule proven correct in [Day87]. 5 We do not prove its correctness here, but an intuition of 5 The exact rule is Semijoin(R; S; J) = Delta­Project(Join if ( in a SELECT box there is a quantifier of type E forming a Boolean factor AND ( head.distinct = TRUE OR body.distinct = PERMIT OR one­tuple­condition)) f set quantifier type to F; if ( one­tuple­condition is FALSE AND head.distinct = TRUE) fbody.distinct = ENFORCE;gg Table 9: Rule 7 --- EtoF Query CPU Time Elapsed time Before Rewrite 88 min 01.25 sec 91 min 49.20 sec After Rewrite 2 min 42.97 sec 6 min 24.60 sec Table 10: Example 4, Before and After Rewrite the rule's correctness can be seen by considering the two­quantifier case. As an example, consider the following query, which gives the order information for items built at certain locations and worked on at certain workcenters. Note that the itp table has a key, and hence contains no duplicates. Example 4. SELECT * FROM itp WHERE itp.itemn IN ( SELECT itl.itemn FROM itl WHERE itl.wkcen = 'WK468' AND itl.locan = 'LOCA000IN'); In order to execute this query we must output one copy of a tuple from itp iff there is at least one tuple in itl.itemn which satisfies the appropriate predicates. If we apply DISTPU to convert this subquery to a table expression, and then apply SELMERGE, we get a single SELECT query, i.e. SELECT DISTINCT itp.* FROM itp, itl WHERE itp.itemn = itl.itemn AND itl.wkcen = 'WK468' AND itl.locan = 'LOCA000IN'; In the transformed query, we output one copy of a tuple from itp \Theta itl such that the appropriate predicates are satisfied (including ``itp.itemn = itl.itemn'', which was implied previously by the ``IN'' construct). Since all columns from itl are projected away and du­ plicates are removed, this produces the same results as the original query. The above example is from the performance environment ex­ plained in Section 2. The results of the performance measurements are shown in Table 10. After rewrite we get a 32\Theta improvement in CPU time and a 14\Theta improvement in elapsed time. 6 During transformation, the DISTPU rule recognizes that the result of the query is distinct. Then the EtoF rule converts the subquery to a table expression, without adding any extra keys to the output, since it has already recognized that the output is distinct. Then the SELMERGE rule merges the table expression, greatly enhancing the performance of the query. We can now observe why Boolean factor existential subqueries over SELECT boxes are guaranteed to merge. Consider any SE­ (R; S; J); R:\Lambda): We actually generalize slightly here by isolating the case where one­tuple­condition is satisfied. 6 Again, many RDBMS, including the commercial ones, are un­ able to perform this optimization. if ( in an INTERSECT box body.distinct != PRESERVE) f set the box to be of type SELECT; choose an arbitrary quantifier Q1; /* Q1 will keep type F */ for ( each quantifier Q != Q1 in the box) f Q.type = E; add the predicate EXISTS(SELECT * FROM Q WHERE Q1.c1 = Q.c1 AND Q1.c2 = Q.c2 AND ... AND Q1.cn = Q.cn);gg Table 11: Rule 8 --- INT2EXIST LECT box upper with a Boolean factor SELECT subquery (i.e. a SE­ LECT box lower over which it ranges with an E quantifier). Because of the EorAPDFR and DISTPDFR/TO rules, we can assume that the subquery has body.distinct = PERMIT. Now, we want to be able to fire the EtoF rule, but we cannot do so if upper.head.distinct =FALSE, upper.body.distinct = PRESERVE and the one­tuple­condition does not hold for the quantifier between the two boxes. In this case, we can apply the ADDKEYS rule to force upper.head.distinct = TRUE, and at that point we can apply EtoF. After EtoF is applied, we have up­ per ranging over lower with an F quantifier, and lower.body.distinct != ENFORCE, so the conditions for SELMERGE are satisfied, and lower can be merged into upper. Rule 8. INTERSECT to Exists This rule (Table 11) converts a set INTERSECT operator (which may be n­ary) into an existential subquery, which can subsequently be converted (via EtoF and SELMERGE) into a single SELECT box. Typically, RDBMSs execute the INTERSECT operation by sorting the operands and then merging them. This method of execution is a variant of the sort merge join. We rewrite the INTERSECT operation as a join, and therefore benefit from other join methods besides sort­merge. This can improve the performance by many orders of magnitude, and hence is an essential query transformation. 7 Recall that the semantics of SQL's INTERSECT operator are to first remove duplicates from the inputs and then send to the output one copy of every tuple that appears in all of the inputs. This is equivalent to choosing any one input and sending to the output one copy of each of its tuples which appears in all other inputs. This rule simply captures that equivalence --- it produces a SELECT DISTINCT box with one F quantifier (the arbitrarily chosen input) and E quantifiers over all other inputs, which filter out tuples of the F­quantified input that do not have a match in all of the other inputs. Note that matching tuples require the large conjunction in the predicate of the subquery --- tuples must match on all columns to be equal. 8 As an example, consider the following query which finds the intersection of the items that employee 1279 works on and the items that are scheduled to be worked on in workcenter WK195 on date 9773. 7 A similar rule (EXC2NEXIST) exists in Starburst for converting an EXCEPT operator into a negated existential subquery, which can subsequently be involved in SELECT merge. 8 This can be simplified by including in the conjunction only the key columns of the tables. Query CPU Time Elapsed time Before Rewrite 9.65 sec 13.92 sec After Rewrite .42 sec 1.77 sec Table 12: Example 5, Before and After Rewrite Example 5. SELECT items FROM wor WHERE empno = 'EMPN1279' INTERSECT SELECT itemn FROM itl WHERE entry time = '9773' AND wkctr = 'WK195'; The intersect rewrite rule converts the query to an existential subquery, which in turn is converted to join by the EtoF rule, and merged by the SELMERGE rule. The query after rewrite is: SELECT DISTINCT itemn FROM itl, wor WHERE empno = 'EMPN1279' AND entry time = '9773' AND wkctr = 'WK195' AND itl.itemn = wor.itemn; The results of executing this query with and without rewrite are shown in Table 12. 9 After conversion to join, the plan optimizer considers both merge join and nested loop join methods. Due to the presence of an index on the join column, itemn, nested loop is chosen, resulting in much better performance. 10 4 The Rule Engine In keeping with the extensibility goals of the Starburst project, it was decided that a rule system was the appropriate platform for allowing query transformations to be easily added to the system, and subsequently reordered or modified. Existing rule engines did not appear to be appropriate for our needs, and thus we designed our own. As will become apparent in the following discussion, we required numerous capabilities not available in typical rule systems (such as OPS5 [BFKM85]). Starburst's Query Rewrite rule engine incorporates the following features: 1. Rules of Arbitrary Complexity: Rules in our engine are pairs of functions in a procedural language such as C: a condition function, which does an arbitrary check and sets a flag TRUE or FALSE, and an action function, which, if the condition function sets the flag TRUE, is invoked to take arbitrary action. The fact that our rules are C functions is essential --- we require our rules to be able to manipulate QGM, which is represented in Starburst as a network of C structures. Although rules are written in a language such as C, they are compiled into native machine language for efficient execution. 2. Context Facility: The data structure passed into any invocation of the rule system includes a pointer for user information, which is in turn passed into the rule functions themselves, where it can be read and/or modified. For our purposes in Query Rewrite, we use this pointer to store a current ``context'' in the QGM (a box, quantifier, or predicate), and our rewrite rules are written with reference to this context. This allows for the rule local­ ity discussed earlier. Special rules are added to the rule set to 9 For this experiment we used the original benchmark database, not the one scaled up by 10. 10 DB2 does not support INTERSECT. In the experiment, we chose UNION instead, which can be executed by a nearly identical strategy. Obviously, the number of output tuples for UNION is different, however, since this number is small in our experiment, the error in the cost difference is negligible. In fact, DB2 chose a better execution strategy for UNION than the one sketched above, and therefore the performance numbers for the original query are conservative. ``advance'' the context once we exhaust possibilities of modifi­ cation to the current context. Thus these rules ``traverse'' a QGM graph. 11 3. Rule Classes and Extensible Conflict Resolution: We partition our set of rules into rule classes, each of which can be thought of as a separate rule set. This gives us a number of advantages. First, it allows us to group rules into sensible units, allowing for better comprehension of rule activity. Second, since rules are arbitrary C procedures, a rule can invoke another rule class as a subroutine. This results in better modularity and increased comprehensibility. Finally, different rule sets can have different conflict resolution schemes [MF78] for choosing the next rule to fire. Starburst currently supports two schemes, a sequential scheme which cycles through a set of ordered rules, and a priority scheme, which always fires the highest order rule that has its condition satisfied. New conflict resolution schemes can be easily added to the system. The rule class mechanism provides some optional structure for the rules, giving us a measure of control without resorting to a fully procedural system. A glance at Figure 2 should illus­ trate that our rule interactions make it very difficult to explicitly express the order in which rules should be applied. The con­ cept of production rules frees us of this burden. However, we do wish to exercise some control over the order in which rules are fired. For instance, we want to apply INT2EXIST before SELMERGE, in order to ensure that an INTERSECT box above a SELECT box will be converted and merged into one SELECT box. By carefully organizing our rule classes and their conflict resolution schemes, we ensure this ordering. Note that the full spectrum of control is supported by our engine, from totally data­driven to totally procedural: a totally data­ driven scheme could be modelled with a single rule class in which the control scheme was randomized, and a totally procedural scheme would be a rule set made up of one rule which was triggered once and performed an arbitrary procedure or program. In Starburst's Query Rewrite, we use the rule engine to strike a balance between these extremes --- we have multiple rule classes, some of which use the sequential control scheme, others of which use the priority scheme. The resulting system retains much of the organization of a procedural program, while taking advantage of the easy extensibility and flexible interactions inherent in a rule system. 4. Guaranteed Termination: After a specified number of rules have been considered, our rule engine will terminate execution, whether or not there remain rules eligible for execution. This number is under the control of the rule programmer. Note that since execution can end after any rule is checked or executed, we are forced to make each rule be an atomic change mapping a valid QGM to an equivalent valid QGM. There can be no invalid or ``transitory'' states of the QGM between rules. This restric­ tion turns out to be quite positive, as it enforces a conceptual cleanliness on each rule. 5. Rule Engine Controls: Starburst users can locally enable or disable rules ``on the fly'' without affecting other users. This allows rule developers to enjoy the convenience of the rule sys­ tem paradigm without affecting concurrent users of the database. The controls also allow us to trace and explain rule activity, a use­ ful method for debugging the potentially complex interactions between rules. Our experience with this rule system has been quite positive. Once all the rule engine controls were added to the system, it became fairly easy to add new rules to the existing set without causing unusual rule 11 The choice of traversal is itself extensible. We currently support both depth­first and breadth­first traversal of QGM. interactions. The time required to add a rule to the system was determined largely by the complexity of the rule's transformation to QGM; typically we did not need to spend much time debugging our rule set as a whole. The success we have had with our rule system is unusual, and can be attributed both to the unique features of the system design listed above, and also to our application of the rule system. Since all of our rewrite rules produce valid QGMs, and since each individual rule does not degrade query performance, the worst behavior our rule system can display is to leave a query untouched. In practice this happens only to simple queries which require no optimization. 5 Conclusions: Engine and Rules We have built an extensible Query Rewrite system for Starburst, and shown that it can provide query execution improvements of orders of magnitude. Others have proposed query transformations before ( [Kim82, GW87, Day87, Anf89]) but our work subsumes many of these transformations, and is the first system implementation (to our knowledge) to organically incorporate query transformation schemes into a full RDBMS. We have put considerable effort into design­ ing the system to address the problems faced by comprehensive research prototypes and industrial grade RDBMSs, particularly in handling of complex queries, and queries associated with complex objects [LLPS91]. The capabilities of our Query Rewrite system sur­ pass by a significant margin those of current RDBMSs, commercial systems included. Among the query transformations presented in this paper, we generalize previous work to handle duplicates correctly, and thus are able to guarantee the merge of existential subquery conjuncts. Furthermore, we convert set operators (INTERSECT, EXCEPT) to subqueries, allowing a rich set of join methods to be used for exe­ cuting set operators. Although this is quite simple, it has not been dealt with in the past. Our decision to design a rule engine for Query Rewrite seems to have been sound. We have encountered few of the oft­cited problems of rule systems (nondeterminism of outcome, difficulty of comprehending system execution, slow performance, etc.), and made significant use of the inherent advantages of a rule system (ease of extensibility, abstraction of rule programmer from control flow, etc.) The extensibility of a Query Rewrite system is key --- it allows new functionality to be added easily both to the query language and the underlying technology (e.g. faster and cheaper memory), and should also allow plan optimizers to be ``taught'' to avoid their shortcomings, which may only be discovered when the system is used in production [Pir89, HCL + 90]. Our extensible Query Rewrite system addresses both of these issues. 6 Acknowledgments As usual, George Lapis was a great help in numerous implementation details for many of the rules. Don Haderle, Yun Wang of the DB2 organization, and Josephine Cheng of DBTI were of great help in our understanding of query transformations in the context of industrial grade systems. This lead to significant improvement in the complete­ ness of our rewrite rules. Many discussions with Sheldon Finkelstein lead to better conceptualization of QGM and the rules. We benefited considerably from the plan optimizer, query refinement, and runtime environment of Starburst. Thanks are due to Bruce Lindsay, Guy Lohman, John McPherson, Mavis Lee, Hanh Nguyen, and Kiyoshi Ono. We also thank John McPherson and Adam Glass for comments on an earlier version of this paper. References [ABC + 76] M. Astrahan, M. Blasgen, D. Chamberlin, K. Eswaran, J. Gray, P. Griffiths, W. King, R. Lorie, P. McJones, J. Mehl, G. Putzolu, I. Traiger, B. Wade, and V. Wat­ son. System R: Relational Approach to Database Man­ agement. ACM Transactions on Database Systems, 1(2):97--137, June 1976. [Anf89] Ole Jirgen Anfindsen. A Study of Access Path Selection in DB2. Technical report, Norwegian Telecommunica­ tions Administration and University of Oslo, Norway, October 1989. [BFKM85] L. Brownston, R. Farrell, E. Kant, and N. Martin. Pro­ gramming Expert Systems in OPS5. Addison­Wesley Publishing Co., 1985. [BTA90] Jose Blakeley, Craig Thompson, and Abdallah Alashqur. Strawman reference model for object query language. In First OODB Standardization Workshop, X3/SPARC/DBSSG/OODBTG, Atlantic City, New Jer­ sey, May 1990. [Day87] Umeshwar Dayal. Of Nests and Trees: A Unified Approach to Processing Queries that Contain Nested Subqueries, Aggregates, and Quantifiers. In Proc. 13th International Conference on Very Large Data Bases, pages 197--208, Brighton, September 1987. [GW87] Richard A. Ganski and Harry K. T. Wong. Optimiza­ tion of Nested SQL Queries Revisited. In Proc. ACM­ SIGMOD International Conference on Management of Data, pages 23--33, San Francisco, May 1987. [HCL + 90] L.M. Haas, W. Chang, G.M. Lohman, J. McPher­ son, P.F. Wilms, G. Lapis, B. Lindsay, H. Pirahesh, M. Carey, and E. Shekita. Starburst Mid­Flight: As the Dust Clears. IEEE Transactions on Knowledge and Data Engineering, pages 143--160, March 1990. [HH91] Joseph M. Hellerstein and Meichun Hsu. Determinism in Partially Ordered Production Systems. Research Re­ port RJ 8089, IBM Almaden Research Center, March 1991. [HP88] Waqar Hasan and Hamid Pirahesh. Query Rewrite Optimization in Starburst. Research Report RJ 6367 , IBM Almaden Research Center, August 1988. [HSS88] T. Haerder, H. Schoning, and A. Sikeler. Parallelism in Processing Queries on Complex Object. In Proc. International Symposium on Databases in Parallel and Distributed Systems, Austin, December 1988. [ISO91] ISO ANSI. Database Language SQL ISO/IEC 9075:1992, 1991. [Kim82] W. Kim. On Optimizing an SQL­like Nested Query. ACM Transactions on Database Systems, 7(3), September 1982. [LLOW91] Chares Lamb, Gordon Landis, Jack Orenstein, and Dan Weinreb. The Objectstore Database System. Commu­ nications of the ACM, October 1991. [LLPS91] Guy Lohman, Bruce Lindsay, Hamid Pirahesh, and Bernhard Schiefer. Extensions to Starburst: Objects, Types, Functions, and Rules. Communications of the ACM, October 1991. [Loo86] Chris Loosley. Measuring IBM Database 2 Release 2 ­ The Benchmark Game. InfoDB, 1(2), 1986. [MF78] J. McDermott and C. Forgy. Production System Con­ flict Resolution Strategies. In D.A. Waterman and Fredrick Hayes­Roth, editors, Pattern Directed Infer­ ence Systems, pages 177--199. Academic Press, 1978. [MFPR90a] Inderpal Singh Mumick, Sheldon J. Finkelstein, Hamid Pirahesh, and Raghu Ramakrishnan. Magic is Rele­ vant. In Proc. SIGMOD 90 [Pro90], pages 247--258. [MFPR90b] Inderpal Singh Mumick, Sheldon J. Finkelstein, Hamid Pirahesh, and Raghu Ramakrishnan. Magic Condi­ tions. In Proc. 9th ACM SIGACT­SIGMOD­SIGART Symposium on Principles of Database Systems, pages 314--330, Nashville, March 1990. [MPR90] Inderpal Singh Mumick, Hamid Pirahesh, and Raghu Ramakrishnan. The Magic of Duplicates and Aggre­ gates. In Proc. 16th International Conference on Very Large Data Bases, Brisbane, August 1990. [O'N89] P. O'Neil. Revisiting DBMS Benchmarks. Datama­ tion, pages 47--54, September 15, 1989. [Pir89] Hamid Pirahesh. Early Experience with Rule­Based Query Rewrite Optimization. In G. Graefe, edi­ tor, Workshop on Database Query Optimization, CSE Technical Report 89­005. Oregon Graduate Center, May 1989. [Pro90] Proc. ACM­SIGMOD International Conference on Management of Data, Atlantic City, May 1990. [Ras90] Louiqa Raschid. Maintaining Consistency in a Strat­ ified Production System Program. In Proc. AAAI Na­ tional Conference on Artificial Intelligence, 1990. [RGL90] Arnon Rosenthal and Cesar Galindo­Legaria. Query graphs, implementing trees, and freely­reorderable out­ erjoins. In Proc. SIGMOD 90 [Pro90]. [SAC + 79] Patricia G. Selinger, M. Astrahan, D. Chamberlin, Ray­ mond Lorie, and T. Price. Access Path Selection in a Relational Database Management System. In Proc. ACM­SIGMOD International Conference on Manage­ ment of Data, Boston, June 1979. [SJGP90] M. Stonebraker, A. Jhingran, Jeffrey Goh, and Spyros Potamianos. On rules, procedures, caching and views in data base systems. In Proc. SIGMOD 90 [Pro90]. [SWK76] M.R. Stonebraker, E. Wong, and P. Kreps. The design and implementation of ingres. ACM Transactions on Database Systems, 1(3):189--222, September 1976. [TOB89] C. Turbyfill, C. Orji, and Dina Bitton. AS3AP ­ A Comparative Relational Database Benchmark. In Proc. IEEE Compcon Spring '89, February 1989. [ZH90] Yuli Zhou and Meichun Hsu. A Theory for Rule Trig­ gering Systems. In Francois Bancilhon, Costantino Thanos, and Dennis Tsichritzis, editors, Proc. Interna­ tional Conference on Extending Data Base Technology, Advances in Database Technology ­ EDBT '90. Lec­ ture Notes in Computer Science, Volume 416, Venice, March 1990. Springer­Verlag.