I should point out that the opinions expressed here are my own and that I accept no liability for mistakes and/or omissions: caveat lector. A Standard disclaimer applies.


term[LN] Title: Lecture Notes
Author(s): Pavle Mogin
Comments:

  1. Issues in Database and Information Systems
  2. Introduction to the Data Warehouse,
    Readings: Chapter 23 § 23.1 + [ODWOT]
  3. Main Characteristics of a Data Warehouse,
    Readings: Chapter 23 §§ 23.1, 23.2 + [ODWOT]
  4. On-Line Analytical Processing (OLAP)
    Readings: Chapter 23 § 23.3 + [ODWOT]
  5. OLAP Queries
    Readings: Chapter 23 §§ 23.3, 23.6 + [ODWOT]
  6. Indexing the Data Warehouse,
    Readings: Chapter 23 § 23.4 + [ODWOT] [AIT]
  7. Materialized Views,
    Readings: Chapter 23 § 23.4 + [ODWOT]
  8. Query Generator,
    Readings: [ODWOT]
    IntLinks: [QG] [MG]
  9. Populating a Data Warehouse,
    Readings: Chapter 23 § 23.2.1 + [ODWOT]
  10. OLAP and Data Warehouse Architectures,
    Readings: [ODWOT] [MABDW]
    IntLinks: [VDW] [DMBOX] [SPDM] [EDWA]
  11. Data Mining,
    Readings: Chapter 24
    IntLinks: [DMINE]
  12. An Introduction to Object-Relational Databases,
    Readings: Chapter 25
    IntLinks: [OBDB]
  13. Abstract Data Types,
    Readings: Chapter 25 § 25.2
    IntLinks: [ADT]
  14. Structured Data Types,
    Readings: Chapter 25 § 25.3
    IntLinks: [SDTYPE]
  15. Oids and Reference Types,
    Readings: Chapter 25 § 25.4
    IntLinks: [OIDS]
  16. Inheritance,
    Readings: Chapter 25 § 25.4
    IntLinks: [INHER]
  17. An Introduction to Internet Databases,
    Readings: Chapter 22 §§ 22.1, 22.2
    IntLinks: [IDB]
  18. XML and Related Technologies,
    Readings: Chapter 22 § 22.3
    IntLinks: [XML]
  19. XML Storage,
    Readings: Ronald Bourret
    IntLinks: [XSTORE]
  20. An Introduction to XML Query Languages
    IntLinks: [XQUERY]

See Also [CPS216]


term[NF] Title: Normal Forms including BCNF
Remote: http://www.mcs.vuw.ac.nz/courses/COMP302/LectureNotes/NormalizationSlides.pdf
Comments: Normalisation is a process for designing a set of relation schemas to give optimal update performance for an operational database.

  1. First normal form - a relation schema is in first normal form if the domain of every attribute contains only atomic values, i.e. no attribute of the relation schema may be composite or multi-valued.
  2. Second normal form - a relation schema is in second normal form if it is in first normal form and no non-prime attribute is partially functionally dependent on any relation schema key.
    That is, an attribute which is not part of any key may not depend on a proper subset of any key.
  3. Third normal form - a relation schema is in third normal form if it is in second normal form and no non-prime attribute is transitively functionally dependent on any relation schema key.
  4. Boyce-Codd normal form (BCNF) - the relation schema (R, F) is in BCNF if the left-hand side of every non-trivial functional dependency in F contains a relation schema key.

Note that for second, third, and Boyce-Codd normal form, every relation schema key functionally determines every relation schema attribute.
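
As a worked illustration (a hypothetical Enrolment schema, not from the course material):

-- Enrolment(StudentID, CourseID, StudentName, Grade), key (StudentID, CourseID).
-- StudentName depends on StudentID alone, so a non-prime attribute is partially
-- dependent on the key and the schema violates second normal form.
-- Decomposing removes the partial dependency:

CREATE TABLE Student (
  StudentID   INTEGER PRIMARY KEY,
  StudentName VARCHAR(40)
);

CREATE TABLE Enrolment (
  StudentID INTEGER REFERENCES Student,
  CourseID  VARCHAR(10),
  Grade     CHAR(2),
  PRIMARY KEY (StudentID, CourseID)
);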


term[ODWOT] Title: An Overview of Data Warehousing and OLAP Technology
Author(s): Surajit Chaudhuri, Umeshwar Dayal
Publication Date: (March 1997) - Publisher: ACM.
Local: misc/decision-datawarehousingoverview-sigmodrecord97.pdf
Remote: http://www.dvs1.informatik.tu-darmstadt.de/DVS1/staff/wu/Overview_Papers/
Found/Read: 20/07/2002
Comments: Google version.


term[OLTP] Title: On-Line Transaction Processing (OLTP)


term[OLAP] Title: On-Line Analytical Processing (OLAP)
Comments: "OLAP applications are dominated by ad hoc, complex queries. In SQL terms, these are queries that involve group-by and aggregaion operators."

See Also [TDBMS](pg 682 § 23.3)


link[DWDSO] Title: Data Warehousing, Decision Support & OLAP
Remote: http://redbook.cs.berkeley.edu/lec28.html


book[TDBMS] Title: Database Management Systems, (2nd Edition)
Author(s): Raghu Ramakrishnan, Johannes Gehrke
Publication Date: (2000) - Publisher: McGraw-Hill International Editions.
ISBN: 0-07-232206-3 - Library: QA76.9.D3R237
Comments: Course Textbook

See Also [QTDBMS]


book[FDBS] Title: Fundamentals of Database Systems, (3rd Edition)
Author(s): Ramez Elmasri, Shamkant B. Navathe
Publication Date: (2000) - Publisher: Addison-Wesley.
ISBN: 0-201-54263-3
Found/Read: 2001
Comments: COMP302 textbook


paper[PIR92] Title: Extensible/Rule Based Query Rewrite Optimization in Starburst
Author(s): Hamid Pirahesh, Joseph M. Hellerstein, Waqar Hasan
Publication Date: (1992)
Local: misc/pirahesh92extensiblerule.pdf
Remote: http://citeseer.nj.nec.com/pirahesh92extensiblerule.html
Found/Read: 23/07/2002
Comments: Essay to Review

Publications by Hamid Pirahesh, Joseph Hellerstein, Waqar Hasan

See Also [QRWDB2] [SGB] [PGSQLRS] [RBQOR]
Reference 519 in [TDBMS]


link[DWO] Title: Data Warehousing - An Overview
Remote: http://www.peterindia.com/DataWarehousingView.html
Found/Read: 23/07/2002


pdf[DWP] Title: Data Warehouses
Author(s): Srikanth Pullela
Remote: http://lambda.uta.edu/cse6331/talks/d2.pdf
Found/Read: 23/07/2002
Comments: General presentation of Data Warehousing


link[DWFAQ] Title: Oracle Data Warehousing FAQ
Author(s): Frank Naudé
Publication Date: (04 March 2002)
Remote: http://www.orafaq.com/faqwh.htm
Found/Read: 24/08/2002
Comments: "How can Oracle Materialized Views be used to speed up data warehouse queries?
With Query Rewrite (QUERY_REWRITE_ENABLED=TRUE in INIT.ORA) Oracle can direct queries to use pre-aggregated tables instead of scanning large tables to answer complex queries."


redbookcoverpaper[QRWDB2] Title: Query rewrite optimization rules in IBM DB2 universal database
Author(s): T. Y. Cliff Leung, Hamid Pirahesh, Joseph M. Hellerstein, Praveen Seshadri
Publication Date: (1998) - Publisher: Morgan Kaufmann Publishers Inc.
ISBN: 1-55860-523-1 - Library: QA76.9 D3 R287 I
Remote: http://portal.acm.org/citation.cfm?id=302119&coll=GUIDE&dl=GUIDE&CFID=4008255&CF...
Found/Read: 24/08/2002
Comments: Readings in Database Systems, Third Edition (1998)
edited by Michael Stonebraker and Joseph M. Hellerstein
Morgan Kaufmann Publishers, Inc.
ISBN: 1-55860-523-1

See Also [PIR92], and Eric Anderson's notes on Extensibility & Object-Relational Systems.
I was unable to obtain a copy of the third edition of the red book but Joe Hellerstein was kind enough to send me a copy of this paper. I'm not sure if I can link to it from the web as the authors have not yet done so.


link[PGSQLRS] Title: The PostgreSQL Rule System
Local: http://developer.postgresql.org/docs/postgres/overview.html
Remote: http://developer.postgresql.org/docs/postgres/rule-system.html
Found/Read: 24/08/2002
Comments: "Thw work presented here should not be confused with the query rewrite facility of POSTGRESQL. POSTGRES's query rewrite is part of an implementation of an active database, In POSTGRES, one can define a rule stating that certain incoming queries should perfrom additional or entirely different actions from what the user has specified. In some situations, POSTGRES implements this notion by rewriting the user's query. Note well that POSTGRES's query transformations are intended to change a query's semantics, not its performance." quote from [PIR92]

See Also [PGSQL]


link[QRWIQRT] Title: Use Query Rewrite to Improve Query Response Time
Remote: http://otn.oracle.com/products/oracle9i/daily/jul17.html
Found/Read: 24/08/2002
Comments: "When query rewrite is enabled, all SQL queries will be checked to see if a materialized view can be used. If this is possible, then your SQL query will be transparently rewritten by Oracle to use the materialized view."


link[ODWGQRW] Title: Oracle9i Data Warehousing Guide - Query Rewrite
Publication Date: (2001) - Publisher: Oracle Corporation.
Remote: http://download-west.oracle.com/otndoc/oracle9i/901_doc/server.901/a90237/qr.htm
Found/Read: 24/08/2002


quote[QTDBMS] Title: Quote from [TDBMS] on rule based systems.
Found/Read: 26/08/2002
Comments: "However, once the number of joins becomes greater than about 15, the cost of optimization using this exhaustive approach becomes prohibitively high, even if we consider only left-deep plans.

Such complex queries are becoming important in decision-support environments, and other approaches to query optimization have been proposed. These include rule-based optimizers, which use a set of rules to guide the generation of candidate plans, and randomized plan generation, which uses probabilistic algorithms such as simulated annealing to explore a large space of plans quickly, with a reasonable likelihood of finding a good plan."

See Also [TDBMS]


paper[RBVQO] Title: A rule-based view of query optimization
Author(s): J. Freytag
Publication Date: (1987) - Publisher: In Proc. ACM SIGMOD Conf. on the Management of Data.
Local: misc/p173-freytag.pdf
Remote: http://portal.acm.org/citation.cfm?id=38735&dl=ACM&coll=portal
Found/Read: 28/08/2002
Comments:

See Also [PIR92] [EXG] citeseer
Reference 246 in [TDBMS]


paper[EXG] Title: The Exodus optimizer generator
Author(s): G. Graefe, D. DeWitt
Publication Date: (1987) - Publisher: In Proc. ACM SIGMOD Conf. on the Management of Data.
Local: misc/sigmod87.pdf
Remote: http://www.cs.wisc.edu/~dewitt/queryopt/sigmod87.pdf
Found/Read: 28/08/2002
Comments:

See Also [PIR92] [RBVQO] [GLFRRQOA] citeseer
Reference 277 in [TDBMS]


paper[GLFRRQOA] Title: Grammar-like functional rules for representing query optimization alternatives
Author(s): G. Lohman
Publication Date: (1988) - Publisher: In Proc. ACM SIGMOD Conf. on the Management of Data.
Local: misc/p18-lohman.pdf
Remote: http://portal.acm.org/citation.cfm?id=50204&dl=ACM&coll=portal
Found/Read: 28/08/2002
Comments:

See Also [PIR92] [RBVQO] [EXG] citeseer
Reference 425 in [TDBMS]


paper[CEQO] Title: Control of an extensible query optimizer: A planning-based approach
Author(s): G. Mitchell, U. Dayal, S. Zdonik
Publication Date: (1993) - Publisher: In Proc. Intl. Conf. on Very Large Databases.
Local: misc/mitchell93control.pdf
Remote: http://citeseer.nj.nec.com/mitchell93control.html
Found/Read: 28/08/2002
Comments:

See Also [PIR92] [RBVQO] [EXG] [GLFRRQOA]
Reference 468 in [TDBMS]


link[PGSQL] Title: PostgreSQL
Local: http://www.mcs.vuw.ac.nz/technical/software/PostgreSQL/
Comments: The database used in the course.

See Also [PGSQLRS]


paper[QETLD] Title: Query Evaluation Techniques for Large Databases
Author(s): G. Graefe
Publication Date: (1993) - Publisher: ACM Computing Surveys, 25(2), Sections 1-8.
Local: misc/graefe-acmcs.pdf
Remote: http://www.cs.duke.edu/education/courses/fall01/cps216/papers/graefe-acmcs.pdf


link[CPS216] Title: CPS 216: Advanced Database Systems
Local: misc/16-qo.pdf
Remote: http://www.cs.duke.edu/education/courses/fall01/cps216/
Comments: A course at Duke University that covers many of the same topics.

See Also [LN] [CS346] [DB2UDO]


paper[QOPMA] Title: Query Optimization by Predicate Move-Around
Author(s): Alon Y. Levy, Inderpal Singh Mumick, Yehoshua Sagiv
Publication Date: (1994) - Publisher: Proceedings of the Twentieth International Conference on Very Large Databases.
Local: misc/levy94query.ps
Remote: http://citeseer.nj.nec.com/levy94query.html
Comments: The pdf is available but the pages are in reverse order.


link[COSI] Title: COSI 227b: Advanced Topics in Database Systems
Remote: http://www.cs.brandeis.edu/~cs227b/papers.html#extensible
Found/Read: 30/08/2002
Comments: Has a good selection of papers.


pdf[SGB] Title: Starburst Grows Bright
Author(s): Stephen Brobst, Bob Vecchione
Local: misc/starbust.pdf
Remote: http://www-3.ibm.com/software/data/pubs/starbust.pdf
Found/Read: 30/08/2002
Comments: Refer to the section on Query Rewrite Optimization.

See Also [PIR92]


paper[IFSOQ] Title: Inferring Function Semantics to Optimize Queries
Author(s): Mitch Cherniack, Stan Zdonik
Publication Date: (1998)
Local: misc/cherniack98inferring.pdf
Remote: http://citeseer.nj.nec.com/cherniack98inferring.html
Found/Read: 30/08/2002
Comments: Refers to [PIR92] - Internal reference 20. Start of section 1.2 Semantic Transformations

See Also [RLIARBO]


paper[RLIARBO] Title: Rule languages and internal algebras for rule-based optimizers
Author(s): Mitch Cherniack, Stanley B. Zdonik
Publication Date: (1996)
Local: misc/cherniack96rule.pdf
Remote: http://citeseer.nj.nec.com/cherniack96rule.html
Found/Read: 30/08/2002
Comments: Refers to an earlier version of [PIR92] - Internal reference 20
"The Starburst [20] optimizer and EXODUS [8] optimizer generator are example rulebased systems that permit rules to be supplemented with code. Code appears in two places:
* Head Routines (called “conditions” in [8] and “condition functions” in [20]) are invoked in the heads (lefthand sides) of rules and analyze query representations to decide if they should be transformed by rules.
* Body Routines (called “support functions” in [8] and “action routines” in [20]) are invoked in the bodies (righthand sides) of rules and are used to transform query representations into alternative forms. Code fragments are restrictive, making the quality and correctness of generated optimizers depend on the quality and correctness of included code. Code fragments also make rules difficult to formulate, prove correct and reason about. This helps to explain why for example, transformations of nested queries do not typically get implemented as instances of rules. Nested query optimization is particularly important and particularly difficult when nested queries are expressed over data with complex structure, as in nested relational [33], complex object [1] and objectoriented [27] databases. Such data models exacerbate both the classification and manipulation of nested queries by allowing tuples and objects to refer to sets and to each other. This potentially introduces data dependencies into queries, complicating their transformation as we later show. ..."

See Also [IFSOQ]


paper[RBQOR] Title: Rule-Based Query Optimization, Revisited
Author(s): Lane Warshaw, Daniel P. Miranker
Publication Date: (1999)
Local: misc/warshaw99rulebased.pdf
Remote: http://citeseer.nj.nec.com/warshaw99rulebased.html
Comments: Refers to [PIR92] - Internal reference PHH92
"The first generation of this research comprised a number of rule-based query optimizers [GRD87, GRM93, DID95, PHH92]. Rule-based representation is attractive since it closes the semantic gap between the specification of an optimizer and its implementation. By virtue of a declarative specification, it follows that rule-based optimizers are much easier to extend than their predecessors. However, each of these first generation optimizers considered using the general-purpose rule-based languages that were available at the time and determined that they were unsuitable as the basis of a query optimizer (we agree with those findings)."


link[CS346] Title: CS346 - Database System Implementation
Author(s): Prof. Widom
Publication Date: (Autumn 2001)
Remote: http://www-db.stanford.edu/~widom/cs346/
Comments: Of particular note are Guy Lohman's (IBM) slides and notes on query rewrite (qg2.txt).

See Also [CPS216] [DB2UDO]


pdf[DB2UDO] Title: The DB2 Universal Database Optimizer
Author(s): Guy M. Lohman
Publication Date: (1997) - Publisher: IBM Research Division.
Local: misc/ibmDB2QRW.pdf
Remote: http://www-db.stanford.edu/~widom/cs346/ibm.pdf
Comments: Good notes on Query Rewrite with several examples given (some cut off towards the end)

See Also [CS346] [CPS216]


link[OVSQLC] Title: DB2 Administration Guide - Overview of the SQL Compiler
Publisher: IBM.
Remote: http://webdocs.caspur.it/ibm/web/udb-6.1/db2d0/db2d0165.htm
Found/Read: 31/08/2002
Comments: Details of how DB2 goes from SQL to an access plan in several steps. Includes Query Rewrite examples.


paper[QPCTQIDO] Title: Query Plans for Conventional and Temporal Queries Involving Duplicates and Ordering
Author(s): Giedrius Slivinskas, Christian S. Jensen, Richard T. Snodgrass
Publication Date: (1999) - Publisher: ICDE.
Local: misc/slivinskas99query.pdf
Remote: http://citeseer.nj.nec.com/slivinskas99query.html
Found/Read: 31/08/2002
Comments: "Recent work on query optimization by Leung et al. [QRWDB2] emphasizes the importance of considering duplicates in DB2’s query rewrite rules. However, duplicates are addressed as special cases when defining rewrite rules, and no formal foundation for reasoning about these is offered."


paper[FCTQO] Title: A Foundation for Conventional and Temporal Query Optimization Addressing Duplicates and Ordering
Author(s): Giedrius Slivinskas, Christian S. Jensen, Richard T. Snodgrass
Publication Date: (2001) - Publisher: Knowledge and Data Engineering.
Local: misc/slivinskas00foundation.pdf
Remote: http://citeseer.nj.nec.com/slivinskas00foundation.html
Comments: "Leung et al. [QRWDB2](Leu98) present query rewrite rules for decorrelating complex queries, as implemented in IBM’s DB2. Queries are represented in a query graph model, which is a graph of nodes, each representing a table operation whose inputs and outputs are tables. Duplicates are addressed in a query graph model and in query rewrite rules; in this graph model, each operation can eliminate, preserve, or permit duplicates. Duplicates should be preserved when, for example, the DISTINCT clause is not specified, and duplicates are permitted when the operation produces an argument for a universal quantifier, e.g., ALL. Consequently, duplicates are addressed as special cases in query rewrite rules. Our algebra and transformation rules incorporate the handling of duplicates and order. We consider operations that eliminate or preserve duplicates. The "


paper[MABDW] Title: Mistakes to Avoid in Building Data Warehouses
Author(s): Pieter R. Mimno
Publication Date: (June 1999) - Publisher: Cutter IT Journal, Vol. 12, No. 6, pp. 36-50.
Found/Read: 01/08/2002


term[QG] Title: Query Generators
Comments: Nasty notation: Aπ(X) - the set of non-aggregated attributes in the SELECT clause of X. Agg(X) is the set of pairs of attributes and aggregate functions.

Using Pavle's stricter definition the requirements for a query generator are:

  1. Aπ(Q) ⊆ Aπ(V)
    Attributes required by the query need to be present in the view.
  2. (∀(Ai, Fi)∈Agg(Q))(Agg(V) ]= (Ai, Fi) ∨ Ai ∈ Aπ(V))
    Ai is an attribute and Fi is an aggregate function like SUM/AVERAGE/COUNT. So, for each aggregate required by the query one of the following needs to be true: either the aggregate may be inferred from the aggregates of the view, or Ai appears among the non-aggregated attributes of the view.
  3. j(Q) ⊆ j(V)
    j(X) is the set of join conditions in the WHERE clause, e.g. f.shopID = s.shopID. The set of query dimensions is a subset of the view dimensions, and the join conditions pair-wise match.
  4. σ(V) |= σ(Q)
    σ(X) is the selection part of the WHERE clause, e.g. AGE > 45, NAME LIKE '%s%', joined together in conjunctive normal form (giving several disjunctive elements).
    CONDITION (2) Each disjunctive component in the query must be covered by the disjunctive elements in the view, e.g. (query) Name LIKE 'J%s%n' => (view) Name LIKE '%s%' - this holds as the rhs is satisfied whenever the lhs is satisfied (meaning the view contains at least as many tuples as required by the query). It is also important that the attributes that appear in this mapping are in Aπ(V).
    With disjunctive components, for a query condition A ∨ B and a view condition C ∧ D, one must be able to show that (A => C and B => C) or (A => D and B => D) in order to show that A ∨ B is covered by C ∧ D.
    If the query contains a disjunctive element with an attribute that is not present in any of the disjunctive elements of the view, then the attribute must appear in the non-aggregated attributes of the view.
    CONDITION (3) The last condition is that the selection in the view cannot be stricter than the selection in the query, e.g. one cannot have Name LIKE '%s%' AND Age > 35 in the view and only Name = 'Jason' in the query, as the view will be missing any Jason with an Age ≤ 35.

a ]= b - b may be inferred from a
a |= b - a covers b, that is, a tuple which satisfies a also satisfies b.

From the assignment 2 model answers: to satisfy condition 2 one must be able to find a mapping from all the elementary disjunctions in the query to those of the view; for condition 3, a mapping from the view to the query is required.
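
A sketch of the conditions in SQL, using a hypothetical Sales fact table (not from the model answers):

-- View V: Aπ(V) = {ShopID, ProdID}, Agg(V) = {(Amount, SUM)}, σ(V) = Amount > 0.
CREATE VIEW V AS
  SELECT ShopID, ProdID, SUM(Amount) AS SumAmt
  FROM Sales
  WHERE Amount > 0
  GROUP BY ShopID, ProdID;

-- Query Q groups more coarsely, uses the same selection, and its aggregate can
-- be inferred from SUM, so V is a generator for Q and Q may be rewritten as:
SELECT ShopID, SUM(SumAmt)      -- was: SELECT ShopID, SUM(Amount)
FROM V                          -- was: FROM Sales WHERE Amount > 0
GROUP BY ShopID;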


term[MG] Title: minimal generators (views)
Found/Read: 20/08/2002
Comments: Let M be a set of views and Q be a query. The set of minimal generators for Q is a subset of M such that each element in the subset is a generator for Q and does not generate any other view in the subset. Minimal generators are desirable as they contain the data closest to the query: less superfluous data, giving faster answers to queries.

A generator for Q contains sufficient data that all the data required for Q is present or may be inferred (i.e. the generator contains a lower level of aggregation).


term[pred] Title: predicate
Remote: http://www.dictionary.com/search?q=predicate
Found/Read: 28/08/2002
Comments: From [LN](Data Mining, Slide 19) "P1(X1) ∧ P2(X2) ∧ P3(X3) ∧ … ∧ Pk(Xk) ⇒ Y = c",
where the Xi (i = 1, 2, …, k) are so-called predictor attributes, the Pi(Xi) are predicates, and Y is called the dependent attribute.

"A Predicate is a Unary Function whose result represents the truth or falsehood of some condition. A Predicate might, for example, be a function that takes an argument of type int and returns true if the argument is positive." from here.
Dictionary "2. Logic. That part of a proposition that is affirmed or denied about the subject. For example, in the proposition We are mortal, mortal is the predicate."


term[DSS] Title: Decision Support Systems
Comments: "a programming system that is aimed to help managers during the process of business decision making" [LN]


term[DM] Title: Data Mart
Comments: "a small Data Warehouse that satisfies the needs of part of the company" [LN]. Focused on a single subject.


term[DW] Title: Data Warehouse
Comments: Contains information that supports decision support systems [DSS].
Main Characteristics (Bill Inmon 1993): subject-oriented, integrated, time-variant, and non-volatile.

Online Enterprise Reporting [OLER] alters this definition.

Design may be either top-down or bottom-up. The top-down approach has a long-term focus and will consider: projected needs for multiple business units, a central data warehouse used to store transaction-level data, integration of multiple data marts and an ECTL [ECTL] tool to avoid dirty-data problems, and a data mart exchange architecture to avoid stovepipe data marts [SPDM]. In contrast, bottom-up development first develops a single data mart as a pilot project. It will find the business drivers, functional requirements, and data sources, use an off-the-shelf ECTL tool and a data-mart-in-a-box capable of metadata exchange, and define logical and physical structures. It acts as a proof of concept over a time frame of 90 to 120 days. Bottom-up development with supporting top-down design is generally considered better, as it can be difficult to predict all the enterprise requirements.


term[STARS] Title: Star Schema
Comments: Data for a single fact is contained in a central fact table (usually BCNF [NF]) that is extended using dimension tables (not BCNF). Attribute hierarchies are used to form aggregates.
The fact table is normalised as it makes up the bulk of the database; joins with the fact table are kept cheap by not decomposing the dimension tables.
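
A minimal sketch in SQL, with hypothetical table and column names:

-- Dimension tables (not BCNF in general; kept small and unbroken).
CREATE TABLE Product  (ProdID INTEGER PRIMARY KEY, Name VARCHAR(40), Category VARCHAR(20));
CREATE TABLE Location (LocID  INTEGER PRIMARY KEY, City VARCHAR(40), Country VARCHAR(40));
CREATE TABLE TimeDim  (TimeID INTEGER PRIMARY KEY, Day DATE, Month INTEGER, Year INTEGER);

-- Central fact table: one row per (product, location, time) combination, BCNF.
CREATE TABLE Sales (
  ProdID INTEGER REFERENCES Product,
  LocID  INTEGER REFERENCES Location,
  TimeID INTEGER REFERENCES TimeDim,
  Amount NUMERIC,
  PRIMARY KEY (ProdID, LocID, TimeID)
);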

See Also [SNOWF]


term[SNOWF] Title: Snowflake Schema
Comments: A variation on the star schema [STARS] in which dimension tables are in third or Boyce-Codd normal form [NF]. This refinement provides support for attribute hierarchies, but the components of an attribute hierarchy are fragmented over more relational tables than in the denormalised structure of a star schema. Accordingly, browsing a snowflake schema requires more joins.


term[CONSTS] Title: Constellation Schema
Comments: A collection of many star schemas. It will have fact tables at different levels of aggregation and shared dimension tables.
Equivalently: a collection of fact tables, stored as either star or snowflake schemas, that reside in the same data warehouse and may share common dimensions.

Components are:



term[RUP] Title: Roll-up
Comments: Perform (or use) higher levels of aggregation to look for more general/coarser data. Can lower the number of dimensions (a level of aggregation can encompass an entire dimension).
From 2001 exam: "An OLAP query that takes the current level of Data Warehouse fact table and performs a further aggregation using an attribute hierarchy of one (or more) dimension."

See also [DDOWN]


term[DDOWN] Title: Drill-down
Comments: Analyse data of finer granularity (more specific). Often occurs on a star schema that contains multiple aggregates (lecture 5, slide 18). This technique is limited by the smallest granularity stored in the data warehouse.
From 2001 Exam: "An OLAP query that looks for more detailed data using an attribute hierarchy of one (or more) dimension."

See also [RUP]


term[SLICE] Title: Slice
Comments: Equality select condition. Does not involve aggregation. Just cuts out one dimension with a particular value from a hypercube.
From 2001 Exam: "An OLAP query that is performed on a fact table using a project operation on a proper subset of dimensions and a select operation with equality condition."

See also [RUP] [DDOWN]


term[DICE] Title: Dice
Comments: Range select condition.
From 2001 Exam: "An OLAP query that is performed on a fact table using a project operation on a proper subset of dimensions and a select operation with range condition."

See also [RUP] [DDOWN] [SLICE] [PIVOT]


term[PIVOT] Title: Pivot (cross tabulation)
Comments: Selects multiple dimensions to aggregate over.
From 2001 Exam: "An OLAP query that is performed on a fact table by selecting two dimensions, aggregating the measure, and representing the aggregated measure in a grid having two selected dimensions as coordinates."

See also [RUP] [DDOWN] [SLICE] [DICE] [CUBE]


term[CUBE] Title: Cube
Comments: A proposed extension to the SQL language. It is equivalent to a collection of GROUP BY statements, with one GROUP BY statement for each subset of the k dimensions (a total of 2^k possible aggregation levels). All the possible levels of aggregation may be considered as a lattice of GROUP BY queries.

Example:

CUBE ProdID, LocID, TimeID BY SUM Amnt

Is equivalent to eight queries of the form:

SELECT SUM (Q.Amnt) FROM Quantity Q GROUP BY grouping-list

Where each query has one of the eight subsets of {ProdID, LocID, TimeID} as its grouping-list.
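
CUBE was later standardised in SQL:1999 as a GROUP BY modifier; a sketch of the example above in that syntax:

SELECT ProdID, LocID, TimeID, SUM(Q.Amnt)
FROM Quantity Q
GROUP BY CUBE (ProdID, LocID, TimeID);  -- one result group per subset of the three dimensions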

See also [RUP] [DDOWN] [SLICE] [DICE] [PIVOT]


term[TOPN] Title: Ranking / Top-N
Comments: OPTIMIZE FOR N ROWS rather than ordering all the tuples. Usually requires a histogram to be maintained on the table column of interest. Effectiveness depends on the ability to choose a cutoff value.

From 2001 Exam: "An OLAP query that returns the top N items."

See also [RUP] [DDOWN] [SLICE] [DICE] [PIVOT] [CUBE]


term[MOLAP] Title: MOLAP (Multidimensional)
Comments: Advantage: excellent speed and support for small and medium volume databases. Disadvantage: low flexibility - adding new dimensions requires rebuilding the hypercube. Volume is limited to small and medium databases.

See Also [ROLAP]


term[ROLAP] Title: ROLAP (Relational)
Comments: "The multidimensional data model consists of measures and dimensions. The relation that relates the dimensions to the measures is called the fact table." [TDBMS]

Advantage: Large flexibility with an unlimited number of dimensions. Also capable of storing a medium to large database. Disadvantage: Slower speed than MOLAP for small and medium sized databases.

See Also [MOLAP]


term[DOLAP] Title: Distributed OLAP architectures
Comments: Distributed Enterprise Data Warehouse with centralized control [EDWA] - distributed (fragmented) for load balancing, scalability, and higher availability. The centrally controlled metadata repository [MDR] is replicated with each fragment. This centralized architecture guarantees better data consistency, but is hard to implement.

Federation of Data Marts - loosely coupled; each mart has its own metadata repository [MDR]. This approach is cheaper, and easier to administer and implement.


term[SCALE] Title: Scalability
Comments:



term[SAOP] Title: Server architectures for OLAP processing
Comments:



term[DMIG] Title: Data migration
Comments: Simple data transformation rules to be specified - e.g. replace man by male, pound by kg.

See Also - Data cleaning tools [DAUDIT] [DSCRUB] as part of [ECTL]


term[DSCRUB] Title: Data scrubbing
Comments: Using domain specific knowledge (rules of behaviour in the real system) to do cleaning of data from various sources - e.g. Identifying a single concept with multiple names. Like "Win95", "Windows 95", and "Microsoft Windows 95", which all represent the same entity/concept.

See Also - Data cleaning tools [DAUDIT] [DMIG] as part of [ECTL]


term[DAUDIT] Title: Data auditing
Comments: Tools used to scan a database for strange patterns - e.g. products that have never been sold.

See Also - Data cleaning tools [DSCRUB] [DMIG] as part of [ECTL]


term[ILOAD] Title: incremental loading
Comments: An alternative to full loading, where the change from old data to new is treated as a single atomic transaction. Provides incremental data warehouse refresh. Only those operational database and external source tuples that were updated between two successive loads are used, since only they alter the contents of the data warehouse.

The source operational database must support incremental loading for propagating updates from the primary database to its replicas (the data warehouse included). If a legacy database is in use, the whole source database may need to be transferred.

See Also [ECTL]


term[DATAS] Title: Data shipping
Comments: A technique that uses a snapshot log file to record the updates of source data and to refresh the data warehouse data at a later time.
More appropriate when the operational database and data warehouse are from different vendors, as the transaction log files will not be standardised and the access APIs differ between DBMS vendors.

See Also [TRANSS]


term[TRANSS] Title: Transaction Shipping
Comments: A technique that uses regular transaction log files to record the changes to source data. These changes are transferred to the data warehouse at a later time. More appropriate for homogeneous systems, because the transaction files will be standardised. Also preferable as transaction shipping requires fewer operational database server resources.

See Also [DATAS]


term[BI] Title: Bitmap Index
Comments: A bitmap index for a single table column contains as many bit vectors as there are different values in the column.
Each bit vector represents the presence(1) or absence(0) of one possible value in the column.
There is one bit in a vector for every row in the table.
If the column contains the value that the bit vector represents then the corresponding row bit is set to 1, otherwise it is 0.

Bitmap indexes are usually only used when the column has a small number of distinct values.

Advantages: efficient bit operations to answer queries - logical operators such as AND, OR, and NOT; set-oriented operators such as intersection, union, and difference; join and aggregate operations. A 1 in a bitmap index can be directly translated to a record using its relative position in the vector and the address of the disk block containing the required tuples.

An encoded bitmap index can be used to map strings to unique binary representations. Many searches will then first query the bitmap index.
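
A small worked sketch with a hypothetical Quarter column, plus the Oracle DDL for such an index:

-- Rows:    r1  r2  r3  r4  r5
-- Quarter: Q1  Q2  Q1  Q4  Q2
--
-- Bit vectors (one per distinct value, one bit per row):
--   Q1: 1 0 1 0 0
--   Q2: 0 1 0 0 1
--   Q4: 0 0 0 1 0
--
-- Quarter = 'Q1' OR Quarter = 'Q2' is answered by OR-ing the Q1 and Q2 vectors.
CREATE BITMAP INDEX Sales_Quarter_bix ON Sales (Quarter);  -- Oracle syntax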

See also [TDBMS](pg 691) [BS]


term[BS] Title: Bit-slice
Comments: A bit-slice index involves creating a binary code representation of attribute values. For numerical values this involves conversion to binary. For non-numerical values, like strings, a hash function can be used.
The bit-slice index has one bit vector for each binary digit (i.e. 2^0, 2^1, 2^2, ...). Some queries become easier using bit-slices, such as finding values greater than X or summing the values.
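
A worked sketch of the SUM case, with made-up values {5, 6, 3} in a three-bit column: 5 = 101, 6 = 110, 3 = 011, so the slice for 2^2 holds 110, the slice for 2^1 holds 011, and the slice for 2^0 holds 101 - two 1-bits each. SUM = 2^2·2 + 2^1·2 + 2^0·2 = 8 + 4 + 2 = 14 = 5 + 6 + 3.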


term[JI] Title: Join Index
Comments: A join index is an ordered tree structure whose elements are n-tuples of the form (r1, r2, ..., rn).
[AIT] "A join index is a list of the disk addresses of the rows for matching columns."
A pre-computed auxiliary structure containing the rids of tuples from different tables that join together.
In the context of a star schema, a join index can relate the values of an attribute of a dimension table to the matching rows of the fact table.

See also [BI]


term[MV] Title: Materialised View
Comments: A query is evaluated and its result is stored in a (physical) table, often containing summary and other aggregated data. Materialised views need to be kept consistent with the underlying tables when those tables are updated.
They are important, as data warehouses are often terabytes in size and are mainly queried; materialised views can considerably improve the response times of frequently asked queries. The drawback is that MVs need to be updated, which substantially slows the database system down when used with operational databases.

A materialised view is refreshed to make it consistent with its underlying tables. An incremental refreshing algorithm has an update cost proportional to the amount of change in the base tables. "The view maintenance policy determines when a view should be refreshed. In immediate view maintenance the view is updated within the same transaction that modifies the underlying tables; otherwise the policy is deferred view maintenance." Lazy maintenance - update at query time. Periodic maintenance/snapshots - update periodically at an interval. Forced maintenance - the view is refreshed after a certain number of changes in the base tables.
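
For reference, later PostgreSQL releases (9.3 onwards) support this directly with deferred, full recomputation; a sketch reusing the hypothetical Sales and TimeDim tables from [STARS]:

CREATE MATERIALIZED VIEW MonthlySales AS
  SELECT ProdID, Year, Month, SUM(Amount) AS Total
  FROM Sales JOIN TimeDim USING (TimeID)
  GROUP BY ProdID, Year, Month;

REFRESH MATERIALIZED VIEW MonthlySales;  -- deferred maintenance: recomputes the whole view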


term[MDR] Title: Metadata Repository
Comments: A database containing data about the data warehouse data, programs, users,... A dedicated bookkeeping database.

Tasks include:

When creating a central metadata repository for an architected Data Mart it will also contain:



term[VDW] Title: Virtual Data warehouse
Comments: No separate data warehouse or metadata. Directly uses the operational databases through simple OLAP front-end tools. Minimal setup costs, but no historical data, aggregation, cleaning and transforming, or central metadata for enterprise-wide definitions of the business data semantics. OLTP and OLAP compete for the same resources.


term[DMBOX] Title: Data Marts in a box
Comments: Builds a data warehouse from various data sources. Uses a local meta-data repository. Proliferation leads to multiple, non-integrated, independent, local data marts from different vendors. No common data mart. May lack cleaning tools [ECTL]. May use differing architectures.


term[SPDM] Title: "Stovepipe" Data Marts
Comments: Uses a single (central) Extraction, Cleansing, Transformation and Loading software package (ECTL tool) [ECTL]. This tool has its own centralized metadata repository, provides admin facilities, and can perform aggregations.
ECTL is good for dealing with dirty data and providing coordinated access to source data. However, the metadata repositories of the individual data marts are not connected to that of the ECTL tool. This lack of integration creates many mutually independent data marts that support individual business units but cannot support corporate-level needs. This is addressed by having a central metadata repository linked to all the local metadata repositories [MDR].

"Stovepipe data marts are small data warehouses that support the needs of individual business units but cannot be integrated at the corporate level because they do not conform with enterprise-wide data definitions." [MABDW]


term[EDWA] Title: Enterprise Data Warehouse Architecture
Comments: This model has a large central database holding almost all data from the operational source databases (detailed at the level of transactions): consolidated data that can be used for detailed drill-down queries.
The distributed (fragmented) nature is good for load balancing, scalability, and higher availability. The centrally controlled metadata repository is replicated with each fragment.

From 2001 Exam: "A number of packaged products that allow building and using many local Data Warehouse databases together with their local metadata repositories integrated to a central one."


term[HOLAP] Title: Hybrid OLAP Data Mart Architecture
Comments: Combines elements from ROLAP (stores basic data in relational tables) and MOLAP (stores aggregated data - materialised views). Exploits the speed benefits of each according to data volume.

From 2001 Exam: "A Data Warehouse that keeps basic data in relational tables and stores aggregated data in MOLAP structures."


term[OLER] Title: On-line Enterprise Reporting (OLER)
Comments: OLER - a new acronym coined for the contemporary view of data warehousing as a technology used to build summary databases that can manage all the data the organisation needs for all kinds of reporting.
The OLER system (aka Enterprise Information Portal (EIP)) provides a single point of entry to integrated corporate data for employees, clients, and trusted business partners. Requirements include:

From 2001 Exam: "A Data Warehouse that is near real-time, contains:



term[ECTL] Title: Extraction, Cleaning, Transforming and Loading (ECTL)
Remote: http://www.brio.com/pdfs/achieving_data_warehouse_success.pdf
Comments: Extraction - gateways to the operational databases and external sources of information. Gateways such as Open Database Connectivity (ODBC), Object Linking and Embedding for Databases (OLE-DB), and Java Database Connectivity (JDBC) allow client programs to generate SQL statements to be executed on the server.

Cleaning - tools for detecting and correcting frequent source data errors. Tools include data migration [DMIG], data scrubbing [DSCRUB], and data auditing [DAUDIT].

Transforming - reconciling semantic mismatches, such as different currency units, different names for the same attribute, and differences in how tables are normalised or structured. Transforming data is typically accomplished by defining a relational view over the tables in the data sources (the operational databases and other external sources).

Loading - may need to build a time dimension to satisfy the need for a unique and immutable TimeID. Other preprocessing can include sorting, summarisation, aggregation, and the building of indexes and materialised views. May require incremental loading [ILOAD], pipelining, and parallel execution to deal with the volumes. Full loads avoid using partially updated data but may take longer than the time available.


term[DMINE] Title: Data Mining
Comments: Generally used to find interesting trends in large datasets, using techniques from statistics (exploratory data analysis) and AI (knowledge discovery and machine learning).



term[FI] Title: Frequent Itemsets
Comments: Itemsets are items that appear together; an itemset's support is the fraction of transactions in which all the items of the set appear together (= #occurrences / #transactions). A frequent itemset is one whose support reaches a threshold minimal support level.

Apriori property: every subset of a frequent itemset must also be a frequent itemset.

As a result, the algorithm for constructing frequent itemsets first finds all the singleton sets with the minimal support, then constructs combinations using first pairs, then triples, and so on. Each new tuple is kept if it meets the minimal support. Note that if a candidate tuple contains any sub-tuple that does not satisfy the minimal support, then the candidate tuple cannot satisfy the minimal support either.
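
A sketch of the first two passes in SQL, assuming a hypothetical Basket(TransID, Item) table with unique (TransID, Item) pairs and a minimal support count of 20:

-- Pass 1: frequent single items.
SELECT Item, COUNT(*) AS Cnt
FROM Basket
GROUP BY Item
HAVING COUNT(*) >= 20;

-- Pass 2: frequent pairs. By the Apriori property, any pair that survives the
-- HAVING filter is necessarily built from items that were themselves frequent.
SELECT B1.Item AS Item1, B2.Item AS Item2, COUNT(*) AS Cnt
FROM Basket B1 JOIN Basket B2
  ON B1.TransID = B2.TransID AND B1.Item < B2.Item
GROUP BY B1.Item, B2.Item
HAVING COUNT(*) >= 20;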

See also [DMINE]


term[ICEQ] Title: Iceberg queries
Comments: Refers to queries with many possible candidate results where something like a HAVING SUM(X) >= 5 clause exposes only the tip of all results. For very large data sets this can be a problem, as the database will first sort and sum up all the data and only then eliminate the groups that do not satisfy the condition in the HAVING clause. Again using the Apriori rule, we consider only those tuples whose individual components satisfy the HAVING condition; the tuples that do satisfy the requirements form candidate tuples that are then tested against the full HAVING clause.
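
A sketch of the query shape being described, with hypothetical names:

SELECT P.CustID, P.Item, SUM(P.Qty)
FROM Purchases P
GROUP BY P.CustID, P.Item
HAVING SUM(P.Qty) >= 5;  -- only the "tip of the iceberg" is returned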

See also [DMINE]


term[MRULE] Title: Mining Rules
Comments:

See also [DMINE]


term[ARULE] Title: Association rules
Comments: LHS => RHS, where both sides are sets of items, interpreted as: "If every item in LHS is purchased in a transaction, then it is likely that the items in RHS are purchased as well."
The support of an itemset S is the percentage of transactions that contain all items from S. The support for LHS => RHS is the support of LHS ∪ RHS (= support{LHS, RHS}). The confidence for a rule is support(LHS ∪ RHS) / support(LHS) and provides a measure of the strength of that rule.
Finding association rules requires two threshold values: minimum support and minimum confidence. The algorithm first finds all itemsets that satisfy the minimal support, and then checks against the minimal confidence.
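
A worked example with made-up numbers: in 1000 transactions, suppose {milk} appears in 200 and {milk, bread} in 50. Then support(milk => bread) = 50/1000 = 5% and confidence(milk => bread) = support{milk, bread} / support{milk} = 50/200 = 25%.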

See also [DMINE]


term[SEQPAT] Title: Sequential patterns
Comments: ||X − Y|| = √(Σi=1..k (xi − yi)²) is used to determine the distance between two sequences of numbers, referred to as data sequences or time series. Note that both sequences must be of the same length.
For a simple similarity search, all sequences in the database are retrieved that fall within a threshold value of some user-given sequence.

See also [DMINE]


term[CRRULE] Title: Classification and regression rules
Comments: Classification and regression rules are expressed as P1(X1) ∧ P2(X2) ∧ ... ∧ Pk(Xk) => Y = c.
Pi(Xi) is a predicate [pred], the Xi (i = 1..k) are predictor attributes, and Y is the dependent attribute. A predictor attribute may be numerical (taking values in a range) or categorical (taking values in a set). A numerical dependent attribute gives a regression rule, and a categorical one gives a classification rule. Support and confidence are defined in the same way as for association rules.
Once rules are found they may be represented as structured trees.

See also [DMINE]


term[CLUST] Title: Clustering
Comments: Clustering identifies groups of records with similar properties. Records within a group are similar according to the chosen attributes, while separate groups are dissimilar according to the same attributes. Uses a threshold attribute radius. Each cluster has a centre, defined by the average of its members' attributes, and a radius, defined by the average distance of the members from the centre. A new instance attempts to join the closest current cluster; if the new radius would exceed the threshold, a new cluster is formed rather than joining the existing one.

See also [DMINE]


term[OBDB] Title: Object-Relational Databases
Comments: Objects consist of a pair that represents the entity's state and behaviour. In contrast, the relational paradigm separates an entity into tuples of simple attribute values and does not consider behaviour. When complex objects are stored in a relational data model they are scattered through many tables and have to be recovered every time they are needed, resulting in costly joins. Newer applications require complex objects, long transactions, new data types, versioning, fast execution of applications, ...
Object-relational databases extend the traditional relational database model with the features required for object-oriented storage. These include user-defined methods, ADTs (row, collection, object id, and reference types), Binary Large Objects (BLOBs), Character Large Objects (CLOBs), inheritance, overloading of methods, and triggers.

Relational versus object-oriented database systems:
Lecture 17 (Inheritance), slides 14-16. An important point is that an OODBMS allows users to work with large, complex objects for a long period of time, while an ORDBMS expects only relatively short periods (changes are committed when stored) with moderately complex objects.


term[ADT] Title: Abstract Data Types
Comments: As part of an object-relational database the user is able to define new data types. The combination of an atomic data type and its associated methods forms an abstract data type. ADTs enforce encapsulation and are hence sometimes referred to as opaque types. Encapsulation allows the DBMS to invoke methods and accept the expected return type, while remaining blind to the actual implementation.
When defined, an ADT will be either atomic/base or composite/structured. Users declare the functions available on the ADT as well as their parameter and return types.

CREATE FUNCTION <funct_name>([<ftype> [, ...]]) RETURNS <return_type> AS <funct_definition> LANGUAGE <language_name>
CREATE ABSTRACT DATA TYPE <type_name> (INPUT = <input_function>, OUTPUT = <output_function>, [INTERNALLENGTH = <{internallength | VARIABLE}>]);

Import, export, and size functions are typically required for any ADT.
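
A sketch of how the template above maps onto actual PostgreSQL, assuming C implementations of complex_in/complex_out compiled into a (hypothetical) complex.so:

-- Referencing the not-yet-defined type creates a "shell" type automatically.
CREATE FUNCTION complex_in(cstring) RETURNS complex
  AS 'complex.so', 'complex_in' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION complex_out(complex) RETURNS cstring
  AS 'complex.so', 'complex_out' LANGUAGE C IMMUTABLE STRICT;

CREATE TYPE complex (
  INPUT  = complex_in,   -- import function
  OUTPUT = complex_out,  -- export function
  INTERNALLENGTH = 16    -- size: two 8-byte doubles
);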


term[SDTYPE] Title: Structured Data Types
Comments:

Structured types mean tables are no longer in first normal form [NF]. Set operators are supported: ∪ ∩ ⊃ ⊇ = and ε/IN. List operators: head, tail, and append. Array types support indexing using postfix square brackets.


term[OIDS] Title: Object identifiers
Comments: In PostgreSQL each tuple in a table is assigned a unique id.

SELECT oid,* FROM Class_Flat WHERE CourseID = 'COMP442' AND Student = 'Craig';

When using a reference type (ref(base)) it needs to be associated with a structured type, and the scope of the reference must be a table known at compilation time (example on slide 6).
Dereferencing (deref(label).Attribute or label→Attribute) allows the object ids contained in reference types to be followed.
References vs. structured embedding:
References have the advantage for updates (automatic, since each object is stored only once), but may leave dangling pointers. Structured types, on the other hand, are usually clustered on disk rather than scattered as pointers are, which favours higher retrieval speed. So referencing favours small storage space and fast updates, while structured embedding favours better query performance due to clustering.
The use of WITH SCOPE restricts all references from one column to just one relation. This ensures referenced and referencing tuples belong to precisely known relations.


term[INHER] Title: Inheritance
Comments: Used for reusing and refining types and creating hierarchies of similar but not identical objects.
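
A minimal sketch using PostgreSQL table inheritance (the cities/capitals example from the PostgreSQL documentation):

CREATE TABLE Cities (
  Name       TEXT,
  Population REAL,
  Altitude   INTEGER
);

-- Capitals refines Cities: it inherits all of its columns and adds one more.
CREATE TABLE Capitals (
  State CHAR(2)
) INHERITS (Cities);

SELECT Name FROM Cities;       -- also returns the rows of Capitals
SELECT Name FROM ONLY Cities;  -- excludes rows of subtables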



term[IDB] Title: Internet Databases
Comments: Commerce and other online sites require large numbers of concurrent users (scalability), unstructured and structured documents, and ranked keyword searches.
Internet applications require: data management, application logic, and presentation. Organisation can be:



term[XML] Title: Extensible Markup Language (XML)
Remote: http://www.w3.org/XML/
Comments: HTML-style tag-based markup that helps describe content (adding semantics). Consists of rules and conventions.
Elements are enclosed in start and end tags and may be (properly) nested within other elements. Elements help make documents self-describing to humans.
Descriptive attributes can be added inside start tags, e.g. <ELM att_name="value">.
Entity references allow for special characters, e.g. &lt; is <. Comments are the same as in HTML except only one dash (-) is allowed for opening and closing (e.g. <!- comment text ->).
An XML document is valid if a DTD is associated with it and it is structured according to the rules in the DTD.

See Also [DTD] [XNAME] [XSTORE]


term[XSTORE] Title: XML Storage
Comments: By its nature most web media is semistructured - free text and hyperlinks with some structure imposed by HTML tags. Semistructured data can be represented by a labelled graph with nodes for compound objects and atomic values; edges indicate the relationships between nodes. An XML document may be classed as either data centric or document centric.
Data centric - XML is used as a transport mechanism for machine consumption. It possesses a fairly regular structure with little or no mixed element/character content, and the order of elements is insignificant.
Document centric - usually built for human consumption, with a less regular structure and with subordinate elements and character data interspersed in the content of an element. The order of elements and character data becomes significant for human comprehension. It must be possible to round-trip (store and retrieve) the same document.

Storage in a relational database is achieved by mapping the DTD or XML Schema to the database schema. Storage may occur in:

A fine-grained mapping is outlined on slides 36-40.
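
One common coarse-grained alternative is a generic "edge" table; a sketch with hypothetical names (the fine-grained mapping instead gives each element type its own table):

-- Every element, attribute, or text node of the document becomes one row.
CREATE TABLE Edge (
  NodeID   INTEGER PRIMARY KEY,
  ParentID INTEGER REFERENCES Edge(NodeID),  -- NULL for the root
  Ordinal  INTEGER,      -- position among siblings (document order)
  Tag      VARCHAR(50),  -- element or attribute name
  Value    VARCHAR(200)  -- character data; NULL for interior nodes
);

-- Reconstruction walks the tree, e.g. the children of node 1 in order:
SELECT * FROM Edge WHERE ParentID = 1 ORDER BY Ordinal;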


term[DTD] Title: Document Type Definition (DTD)
Remote: http://www.w3schools.com/dtd/default.asp
Comments: A DTD is a grammar defined by a set of rules for elements, attributes, and entities. It indicates which tags are allowed, the permissible orderings, and valid nestings. An XML document without a DTD is indicated as standalone in the XML declaration [XML Guide].
An XML document is well formed if it does _not_ have an associated DTD, starts with an XML declaration, has all elements embedded in a single root element, and has all elements properly nested.
Example given in lecture 19 slide 23 or [W3SCHOOLS, examples]

See Also [XML]


term[XSCHEMA] Title: XML Schema
Remote: http://www.w3.org/XML/Schema
Comments: Aims to provide a superset of DTD [DTD] functionality. Supports data type definitions, uniqueness constraints, and key constraints. Uses the XML Schema Definition Language (XSDL), which provides mechanisms for deriving new data types by either extending or restricting existing ones, defining uniqueness constraints, defining key constraints, ...


term[DOM] Title: Document Object Model
Remote: http://www.w3.org/DOM/
Comments: An interoperable set of classes for manipulating XML documents from programming languages. DOM objects:



term[XSL] Title: Extensible Stylesheet Language (XSL)
Remote: http://www.w3.org/Style/XSL/
Comments: "XSL is a language for expressing stylesheets. It consists of three parts: XSL Transformations (XSLT): a language for transforming XML documents, the XML Path Language (XPath), an expression language used by XSLT to access or refer to parts of an XML document. (XPath is also used by the XML Linking specification). The third part is XSL Formatting Objects: an XML vocabulary for specifying formatting semantics. An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary"


term[XLINK] Title: XML Linking
Comments: XLink is a mechanism for creating hyperlinks in XML documents that link between resources.


term[XNAME] Title: XML Namespaces
Remote: http://www.jclark.com/xml/xmlns.htm
Comments: A collection of names, identified by a Uniform Resource Locator (URL), used to define the context of elements. This allows two elements with the same name but different namespaces to have differing semantics.


term[XQUERY] Title: XML Query
Remote: http://www.w3.org/XML/Query
Comments: XQuery is the W3C standard query language for XML. For a well formed document:

FOR $t IN doc("path/class.xml")//STUDENTS/FIRSTNAME RETURN <RESULT> {$t} </RESULT>

// specifies nesting _anywhere_ within the preceding element, while / means immediately below it.
For-Let-Where-Return (FLWR) expressions: FOR binds a variable to each element specified by the path expression, while LET binds a variable to the whole collection of elements. The WHERE clause qualifies the bound values. The RETURN clause constructs the XML result fragment.

