I should point out that the opinions expressed here are my own and that I accept no liability for mistakes and/or omissions: caveat lector. A Standard disclaimer applies.


term[LN] Title: Lecture Notes
Author(s): Pavle Mogin
Comments:

  1. Issues in Database and Information Systems
  2. Introduction to the Data Warehouse,
    Readings: Chapter 23 § 23.1 + [ODWOT]
  3. Main Characteristics of a Data Warehouse,
    Readings: Chapter 23 §§ 23.1, 23.2 + [ODWOT]
  4. On-Line Analytical Processing (OLAP)
    Readings: Chapter 23 § 23.3 + [ODWOT]
  5. OLAP Queries
    Readings: Chapter 23 §§ 23.3, 23.6 + [ODWOT]
  6. Indexing the Data Warehouse,
    Readings: Chapter 23 § 23.4 + [ODWOT] [AIT]
  7. Materialized Views,
    Readings: Chapter 23 § 23.4 + [ODWOT]
  8. Query Generator,
    Readings: [ODWOT]
    IntLinks: [QG] [MG]
  9. Populating a Data Warehouse,
    Readings: Chapter 23 § 23.2.1 + [ODWOT]
  10. OLAP and Data Warehouse Architectures,
    Readings: [ODWOT] [MABDW]
    IntLinks: [VDW] [DMBOX] [SPDM] [EDWA]
  11. Data Mining,
    Readings: Chapter 24
    IntLinks: [DMINE]
  12. An Introduction to Object-Relational Databases,
    Readings: Chapter 25
    IntLinks: [OBDB]
  13. Abstract Data Types,
    Readings: Chapter 25 § 25.2
    IntLinks: [ADT]
  14. Structured Data Types,
    Readings: Chapter 25 § 25.3
    IntLinks: [SDTYPE]
  15. Oids and Reference Types,
    Readings: Chapter 25 § 25.4
    IntLinks: [OIDS]
  16. Inheritance,
    Readings: Chapter 25 § 25.4
    IntLinks: [INHER]
  17. An Introduction to Internet Databases,
    Readings: Chapter 22 §§ 22.1, 22.2
    IntLinks: [IDB]
  18. XML and Related Technologies,
    Readings: Chapter 22 § 22.3
    IntLinks: [XML]
  19. XML Storage,
    Readings: Ronald Bourret
    IntLinks: [XSTORE]
  20. An Introduction to XML Query Languages
    IntLinks: [XQUERY]

See Also [CPS216]


term[NF] Title: Normal Forms including BCNF
Remote: http://www.mcs.vuw.ac.nz/courses/COMP302/LectureNotes/NormalizationSlides.pdf
Comments: Normalisation is a process for designing a set of relation schemas to give optimal update performance for an operational database.

  1. First normal form - a relation schema is in first normal form if the domain of every attribute contains only atomic values, i.e. no attribute of the relation schema may be composite or multi-valued.
  2. Second normal form - a relation schema is in second normal form if it is in first normal form and no non-prime attribute is partially functionally dependent on any relation schema key.
    That is, an attribute which is not part of any key may not depend on a proper subset of any key.
  3. Third normal form - a relation schema is in third normal form if it is in second normal form and no non-prime attribute is transitively functionally dependent on any relation schema key.
  4. Boyce-Codd normal form (BCNF) - the relation schema (R, F) is in BCNF if the left-hand side of every non-trivial functional dependency in F contains a relation schema key.

Note that for second, third, and Boyce-Codd normal form, every relation schema key functionally determines every relation schema attribute.
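
As a worked illustration (a hypothetical Enrolment schema, not from the course material):

-- Enrolment(StudentID, CourseID, StudentName, Grade), key (StudentID, CourseID).
-- StudentName depends on StudentID alone, so a non-prime attribute is partially
-- dependent on the key and the schema violates second normal form.
-- Decomposing removes the partial dependency:

CREATE TABLE Student (
  StudentID   INTEGER PRIMARY KEY,
  StudentName VARCHAR(40)
);

CREATE TABLE Enrolment (
  StudentID INTEGER REFERENCES Student,
  CourseID  VARCHAR(10),
  Grade     CHAR(2),
  PRIMARY KEY (StudentID, CourseID)
);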


term[ODWOT] Title: An Overview of Data Warehousing and OLAP Technology
Author(s): Surajit Chaudhuri, Umeshwar Dayal
Publication Date: (March 1997) - Publisher: ACM.
Local: misc/decision-datawarehousingoverview-sigmodrecord97.pdf
Remote: http://www.dvs1.informatik.tu-darmstadt.de/DVS1/staff/wu/Overview_Papers/
Found/Read: 20/07/2002
Comments: Google version.


term[OLTP] Title: On-Line Transaction Processing (OLTP)


term[OLAP] Title: On-Line Analytical Processing (OLAP)
Comments: "OLAP applications are dominated by ad hoc, complex queries. In SQL terms, these are queries that involve group-by and aggregaion operators."

See Also [TDBMS](pg 682 § 23.3)


link[DWDSO] Title: Data Warehousing, Decision Support & OLAP
Remote: http://redbook.cs.berkeley.edu/lec28.html


book[TDBMS] Title: Database Management Systems, (2nd Edition)
Author(s): Raghu Ramakrishnan, Johannes Gehrke
Publication Date: (2000) - Publisher: McGraw-Hill International Editions.
ISBN: 0-07-232206-3 - Library: QA76.9.D3R237
Comments: Course Textbook

See Also [QTDBMS]


book[FDBS] Title: Fundamentals of Database Systems, (3rd Edition)
Author(s): Ramez Elmasri, Shamkant B. Navathe
Publication Date: (2000) - Publisher: Addison-Wesley.
ISBN: 0-201-54263-3
Found/Read: 2001
Comments: COMP302 textbook


paper[PIR92] Title: Extensible/Rule Based Query Rewrite Optimization in Starburst
Author(s): Hamid Pirahesh, Joseph M. Hellerstein, Waqar Hasan
Publication Date: (1992)
Local: misc/pirahesh92extensiblerule.pdf
Remote: http://citeseer.nj.nec.com/pirahesh92extensiblerule.html
Found/Read: 23/07/2002
Comments: Essay to Review

Publications by Hamid Pirahesh, Joseph Hellerstein, Waqar Hasan

See Also [QRWDB2] [SGB] [PGSQLRS] [RBQOR]
Reference 519 in [TDBMS]


link[DWO] Title: Data Warehousing - An Overview
Remote: http://www.peterindia.com/DataWarehousingView.html
Found/Read: 23/07/2002


pdf[DWP] Title: Data Warehouses
Author(s): Srikanth Pullela
Remote: http://lambda.uta.edu/cse6331/talks/d2.pdf
Found/Read: 23/07/2002
Comments: General presentation of Data Warehousing


link[DWFAQ] Title: Oracle Data Warehousing FAQ
Author(s): Frank Naudé
Publication Date: (04 March 2002)
Remote: http://www.orafaq.com/faqwh.htm
Found/Read: 24/08/2002
Comments: "How can Oracle Materialized Views be used to speed up data warehouse queries?
With Query Rewrite (QUERY_REWRITE_ENABLED=TRUE in INIT.ORA) Oracle can direct queries to use pre-aggregated tables instead of scanning large tables to answer complex queries."


redbookcoverpaper[QRWDB2] Title: Query rewrite optimization rules in IBM DB2 universal database
Author(s): T. Y. Cliff Leung, Hamid Pirahesh, Joseph M. Hellerstein, Praveen Seshadri
Publication Date: (1998) - Publisher: Morgan Kaufmann Publishers Inc.
ISBN: 1-55860-523-1 - Library: QA76.9 D3 R287 I
Remote: http://portal.acm.org/citation.cfm?id=302119&coll=GUIDE&dl=GUIDE&CFID=4008255&CF...
Found/Read: 24/08/2002
Comments: Readings in Database Systems, Third Edition (1998)
edited by Michael Stonebraker and Joseph M. Hellerstein
Morgan Kaufmann Publishers, Inc.
ISBN: 1-55860-523-1

See Also [PIR92], and Eric Anderson's notes on Extensibility & Object-Relational Systems.
I was unable to obtain a copy of the third edition of the red book but Joe Hellerstein was kind enough to send me a copy of this paper. I'm not sure if I can link to it from the web as the authors have not yet done so.


link[PGSQLRS] Title: The PostgreSQL Rule System
Local: http://developer.postgresql.org/docs/postgres/overview.html
Remote: http://developer.postgresql.org/docs/postgres/rule-system.html
Found/Read: 24/08/2002
Comments: "Thw work presented here should not be confused with the query rewrite facility of POSTGRESQL. POSTGRES's query rewrite is part of an implementation of an active database, In POSTGRES, one can define a rule stating that certain incoming queries should perfrom additional or entirely different actions from what the user has specified. In some situations, POSTGRES implements this notion by rewriting the user's query. Note well that POSTGRES's query transformations are intended to change a query's semantics, not its performance." quote from [PIR92]

See Also [PGSQL]


link[QRWIQRT] Title: Use Query Rewrite to Improve Query Response Time
Remote: http://otn.oracle.com/products/oracle9i/daily/jul17.html
Found/Read: 24/08/2002
Comments: "When query rewrite is enabled, all SQL queries will be checked to see if a materialized view can be used. If this is possible, then your SQL query will be transparently rewritten by Oracle to use the materialized view."


link[ODWGQRW] Title: Oracle9i Data Warehousing Guide - Query Rewrite
Publication Date: (2001) - Publisher: Oracle Corporation.
Remote: http://download-west.oracle.com/otndoc/oracle9i/901_doc/server.901/a90237/qr.htm
Found/Read: 24/08/2002


quote[QTDBMS] Title: Quote from [TDBMS] on rule based systems.
Found/Read: 26/08/2002
Comments: "However, once the number of joins becomes greater than about 15, the cost of optimization using this exhaustive approach becomes prohibitively high, even if we consider only left-deep plans.

Such complex queries are becoming important in decision-support environments, and other approaches to query optimization have been proposed. These include rule-based optimizers, which use a set of rules to guide the generation of candidate plans, and randomized plan generation, which uses probabilistic algorithms such as simulated annealing to explore a large space of plans quickly, with a reasonable likelihood of finding a good plan."

See Also [TDBMS]


paper[RBVQO] Title: A rule-based view of query optimization
Author(s): J. Freytag
Publication Date: (1987) - Publisher: In Proc. ACM SIGMOD Conf. on the Management of Data.
Local: misc/p173-freytag.pdf
Remote: http://portal.acm.org/citation.cfm?id=38735&dl=ACM&coll=portal
Found/Read: 28/08/2002
Comments:

See Also [PIR92] [EXG] citeseer
Reference 246 in [TDBMS]


paper[EXG] Title: The Exodus optimizer generator
Author(s): G. Graefe, D. DeWitt
Publication Date: (1987) - Publisher: In Proc. ACM SIGMOD Conf. on the Management of Data.
Local: misc/sigmod87.pdf
Remote: http://www.cs.wisc.edu/~dewitt/queryopt/sigmod87.pdf
Found/Read: 28/08/2002
Comments:

See Also [PIR92] [RBVQO] [GLFRRQOA] citeseer
Reference 277 in [TDBMS]


paper[GLFRRQOA] Title: Grammar-like functional rules for representing query optimization alternatives
Author(s): G. Lohman
Publication Date: (1988) - Publisher: In Proc. ACM SIGMOD Conf. on the Management of Data.
Local: misc/p18-lohman.pdf
Remote: http://portal.acm.org/citation.cfm?id=50204&dl=ACM&coll=portal
Found/Read: 28/08/2002
Comments:

See Also [PIR92] [RBVQO] [EXG] citeseer
Reference 425 in [TDBMS]


paper[CEQO] Title: Control of an extensible query optimizer: A planning-based approach
Author(s): G. Mitchell, U. Dayal, S. Zdonik
Publication Date: (1993) - Publisher: In Proc. Intl. Conf. on Very Large Databases.
Local: misc/mitchell93control.pdf
Remote: http://citeseer.nj.nec.com/mitchell93control.html
Found/Read: 28/08/2002
Comments:

See Also [PIR92] [RBVQO] [EXG] [GLFRRQOA]
Reference 468 in [TDBMS]


link[PGSQL] Title: PostgreSQL
Local: http://www.mcs.vuw.ac.nz/technical/software/PostgreSQL/
Comments: The database used in the course.

See Also [PGSQLRS]


paper[QETLD] Title: Query Evaluation Techniques for Large Databases
Author(s): G. Graefe
Publication Date: (1993) - Publisher: ACM Computing Surveys, 25(2), Sections 1-8.
Local: misc/graefe-acmcs.pdf
Remote: http://www.cs.duke.edu/education/courses/fall01/cps216/papers/graefe-acmcs.pdf


link[CPS216] Title: CPS 216: Advanced Database Systems
Local: misc/16-qo.pdf
Remote: http://www.cs.duke.edu/education/courses/fall01/cps216/
Comments: A course at Duke University that covers many of the same topics.

See Also [LN] [CS346] [DB2UDO]


paper[QOPMA] Title: Query Optimization by Predicate Move-Around
Author(s): Alon Y. Levy, Inderpal Singh Mumick, Yehoshua Sagiv
Publication Date: (1994) - Publisher: Proceedings of the Twentieth International Conference on Very Large Databases.
Local: misc/levy94query.ps
Remote: http://citeseer.nj.nec.com/levy94query.html
Comments: The pdf is available but the pages are in reverse order.


link[COSI] Title: COSI 227b: Advanced Topics in Database Systems
Remote: http://www.cs.brandeis.edu/~cs227b/papers.html#extensible
Found/Read: 30/08/2002
Comments: Has a good selection of papers.


pdf[SGB] Title: Starburst Grows Bright
Author(s): Stephen Brobst, Bob Vecchione
Local: misc/starbust.pdf
Remote: http://www-3.ibm.com/software/data/pubs/starbust.pdf
Found/Read: 30/08/2002
Comments: Refer to the section on Query Rewrite Optimization.

See Also [PIR92]


paper[IFSOQ] Title: Inferring Function Semantics to Optimize Queries
Author(s): Mitch Cherniack, Stan Zdonik
Publication Date: (1998)
Local: misc/cherniack98inferring.pdf
Remote: http://citeseer.nj.nec.com/cherniack98inferring.html
Found/Read: 30/08/2002
Comments: Refers to [PIR92] - Internal reference 20. Start of section 1.2 Semantic Transformations

See Also [RLIARBO]


paper[RLIARBO] Title: Rule languages and internal algebras for rule-based optimizers
Author(s): Mitch Cherniack, Stanley B. Zdonik
Publication Date: (1996)
Local: misc/cherniack96rule.pdf
Remote: http://citeseer.nj.nec.com/cherniack96rule.html
Found/Read: 30/08/2002
Comments: Refers to an earlier version of [PIR92] - Internal reference 20
"The Starburst [20] optimizer and EXODUS [8] optimizer generator are example rulebased systems that permit rules to be supplemented with code. Code appears in two places:
* Head Routines (called “conditions” in [8] and “condition functions” in [20]) are invoked in the heads (lefthand sides) of rules and analyze query representations to decide if they should be transformed by rules.
* Body Routines (called “support functions” in [8] and “action routines” in [20]) are invoked in the bodies (righthand sides) of rules and are used to transform query representations into alternative forms. Code fragments are restrictive, making the quality and correctness of generated optimizers depend on the quality and correctness of included code. Code fragments also make rules difficult to formulate, prove correct and reason about. This helps to explain why for example, transformations of nested queries do not typically get implemented as instances of rules. Nested query optimization is particularly important and particularly difficult when nested queries are expressed over data with complex structure, as in nested relational [33], complex object [1] and objectoriented [27] databases. Such data models exacerbate both the classification and manipulation of nested queries by allowing tuples and objects to refer to sets and to each other. This potentially introduces data dependencies into queries, complicating their transformation as we later show. ..."

See Also [IFSOQ]


paper[RBQOR] Title: Rule-Based Query Optimization, Revisited
Author(s): Lane Warshaw, Daniel P. Miranker
Publication Date: (1999)
Local: misc/warshaw99rulebased.pdf
Remote: http://citeseer.nj.nec.com/warshaw99rulebased.html
Comments: Refers to [PIR92] - Internal reference PHH92
"The first generation of this research comprised a number of rule-based query optimizers [GRD87, GRM93, DID95, PHH92]. Rule-based representation is attractive since it closes the semantic gap between the specification of an optimizer and its implementation. By virtue of a declarative specification, it follows that rule-based optimizers are much easier to extend than their predecessors. However, each of these first generation optimizers considered using the general-purpose rule-based languages that were available at the time and determined that they were unsuitable as the basis of a query optimizer (we agree with those findings)."


link[CS346] Title: CS346 - Database System Implementation
Author(s): Prof. Widom
Publication Date: (Autumn 2001)
Remote: http://www-db.stanford.edu/~widom/cs346/
Comments: Of particular note are Guy Lohman's (IBM) slides and notes on query rewrite (qg2.txt).

See Also [CPS216] [DB2UDO]


pdf[DB2UDO] Title: The DB2 Universal Database Optimizer
Author(s): Guy M. Lohman
Publication Date: (1997) - Publisher: IBM Research Division.
Local: misc/ibmDB2QRW.pdf
Remote: http://www-db.stanford.edu/~widom/cs346/ibm.pdf
Comments: Good notes on Query Rewrite with several examples given (some cut off towards the end)

See Also [CS346] [CPS216]


link[OVSQLC] Title: DB2 Administration Guide - Overview of the SQL Compiler
Publisher: IBM.
Remote: http://webdocs.caspur.it/ibm/web/udb-6.1/db2d0/db2d0165.htm
Found/Read: 31/08/2002
Comments: Details of how DB2 goes from SQL to an access plan in several steps. Includes Query Rewrite examples.


paper[QPCTQIDO] Title: Query Plans for Conventional and Temporal Queries Involving Duplicates and Ordering
Author(s): Giedrius Slivinskas, Christian S. Jensen, Richard T. Snodgrass
Publication Date: (1999) - Publisher: ICDE.
Local: misc/slivinskas99query.pdf
Remote: http://citeseer.nj.nec.com/slivinskas99query.html
Found/Read: 31/08/2002
Comments: "Recent work on query optimization by Leung et al. [QRWDB2] emphasizes the importance of considering duplicates in DB2’s query rewrite rules. However, duplicates are addressed as special cases when defining rewrite rules, and no formal foundation for reasoning about these is offered."


paper[FCTQO] Title: A Foundation for Conventional and Temporal Query Optimization Addressing Duplicates and Ordering
Author(s): Giedrius Slivinskas, Christian S. Jensen, Richard T. Snodgrass
Publication Date: (2001) - Publisher: Knowledge and Data Engineering.
Local: misc/slivinskas00foundation.pdf
Remote: http://citeseer.nj.nec.com/slivinskas00foundation.html
Comments: "Leung et al. [QRWDB2](Leu98) present query rewrite rules for decorrelating complex queries, as implemented in IBM’s DB2. Queries are represented in a query graph model, which is a graph of nodes, each representing a table operation whose inputs and outputs are tables. Duplicates are addressed in a query graph model and in query rewrite rules; in this graph model, each operation can eliminate, preserve, or permit duplicates. Duplicates should be preserved when, for example, the DISTINCT clause is not specified, and duplicates are permitted when the operation produces an argument for a universal quantifier, e.g., ALL. Consequently, duplicates are addressed as special cases in query rewrite rules. Our algebra and transformation rules incorporate the handling of duplicates and order. We consider operations that eliminate or preserve duplicates. The "


paper[MABDW] Title: Mistakes to Avoid in Building Data Warehouses
Author(s): Pieter R. Mimno
Publication Date: (June 1999) - Publisher: Cutter IT Journal, Vol. 12, No. 6, pp. 36-50.
Found/Read: 01/08/2002


term[QG] Title: Query Generators
Comments: Nasty notation: Aπ(X) - the set of non-aggregated attributes in the SELECT clause of X. Agg(X) is the set of pairs of attributes and aggregate functions.

Using Pavle's stricter definition the requirements for a query generator are:

  1. Aπ(Q) ⊆ Aπ(V)
    Attributes required by the query need to be present in the view.
  2. (∀(Ai, Fi)∈Agg(Q))(Agg(V) ]= (Ai, Fi) ∨ Ai ∈ Aπ(V))
    Ai is an attribute and Fi is an aggregate function like SUM/AVERAGE/COUNT. So, for each aggregate required by the query one of the following needs to be true: either the aggregate may be inferred from the aggregates of the view, or Ai appears among the non-aggregated attributes of the view.
  3. j(Q) ⊆ j(V)
    j(X) is the set of join conditions in the WHERE clause, e.g. f.shopID = s.shopID. The set of query dimensions is a subset of the view dimensions, and the join conditions pair-wise match.
  4. σ(V) |= σ(Q)
    σ(X) is the selection part of the WHERE clause, e.g. AGE > 45, NAME LIKE '%s%', joined together in conjunctive normal form (giving several disjunctive elements).
    CONDITION (2) Each disjunctive component in the query must be covered by the disjunctive elements in the view, e.g. (query) Name LIKE 'J%s%n' => (view) Name LIKE '%s%' - this holds as the rhs is satisfied whenever the lhs is satisfied (meaning the view contains at least as many tuples as required by the query). It is also important that the attributes that appear in this mapping are in Aπ(V).
    With disjunctive components, for a query condition A ∨ B and a view condition C ∧ D, one must be able to show that (A => C and B => C) or (A => D and B => D) in order to show that A ∨ B is covered by C ∧ D.
    If the query contains a disjunctive element with an attribute that is not present in any of the disjunctive elements of the view, then the attribute must appear in the non-aggregated attributes of the view.
    CONDITION (3) The last condition is that the selection in the view cannot be stricter than the selection in the query, e.g. one cannot have Name LIKE '%s%' AND Age > 35 in the view and only Name = 'Jason' in the query, as the view will be missing any Jason with an Age ≤ 35.

a ]= b - b may be inferred from a
a |= b - a covers b, that is, a tuple which satisfies a also satisfies b.

From the assignment 2 model answers: to satisfy condition 2 one must be able to find a mapping from all the elementary disjunctions in the query to those of the view; for condition 3, a mapping from the view to the query is required.
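
A sketch of the conditions in SQL, using a hypothetical Sales fact table (not from the model answers):

-- View V: Aπ(V) = {ShopID, ProdID}, Agg(V) = {(Amount, SUM)}, σ(V) = Amount > 0.
CREATE VIEW V AS
  SELECT ShopID, ProdID, SUM(Amount) AS SumAmt
  FROM Sales
  WHERE Amount > 0
  GROUP BY ShopID, ProdID;

-- Query Q groups more coarsely, uses the same selection, and its aggregate can
-- be inferred from SUM, so V is a generator for Q and Q may be rewritten as:
SELECT ShopID, SUM(SumAmt)      -- was: SELECT ShopID, SUM(Amount)
FROM V                          -- was: FROM Sales WHERE Amount > 0
GROUP BY ShopID;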


term[MG] Title: minimal generators (views)
Found/Read: 20/08/2002
Comments: Let M be a set of views and Q be a query. The set of minimal generators for Q is a subset of M such that each element in the subset is a generator for Q and does not generate any other view in the subset. Minimal generators are desirable as they contain the data closest to the query: less superfluous data, giving faster answers to queries.

A generator for Q contains sufficient data that all the data required for Q is present or may be inferred (i.e. the generator contains a lower level of aggregation).


term[pred] Title: predicate
Remote: http://www.dictionary.com/search?q=predicate
Found/Read: 28/08/2002
Comments: From [LN](Data Mining, Slide 19) "P1(X1) ∧ P2(X2) ∧ P3(X3) ∧ … ∧ Pk(Xk) ⇒ Y = c",
where the Xi (i = 1, 2, …, k) are so-called predictor attributes, the Pi(Xi) are predicates, and Y is called the dependent attribute.

"A Predicate is a Unary Function whose result represents the truth or falsehood of some condition. A Predicate might, for example, be a function that takes an argument of type int and returns true if the argument is positive." from here.
Dictionary "2. Logic. That part of a proposition that is affirmed or denied about the subject. For example, in the proposition We are mortal, mortal is the predicate."


term[DSS] Title: Decision Support Systems
Comments: "a programming system that is aimed to help managers during the process of business decision making" [LN]


term[DM] Title: Data Mart
Comments: "a small Data Warehouse that satisfies the needs of part of the company" [LN]. Focused on a single subject.


term[DW] Title: Data Warehouse
Comments: Contains information that supports decision support systems [DSS].
Main Characteristics (Bill Inmon 1993): subject-oriented, integrated, time-variant, and non-volatile.

Online Enterprise Reporting [OLER] alters this definition.

Design may be either top-down or bottom-up. The top-down approach has a long-term focus and will consider: projected needs for multiple business units, a central data warehouse used to store transaction-level data, integration of multiple data marts and an ECTL [ECTL] tool to avoid dirty-data problems, and a data mart exchange architecture to avoid stovepipe data marts [SPDM]. In contrast, bottom-up development first develops a single data mart as a pilot project. It will find the business drivers, functional requirements, and data sources, use an off-the-shelf ECTL tool and a data-mart-in-a-box capable of metadata exchange, and define logical and physical structures. It acts as a proof of concept over a time frame of 90 to 120 days. Bottom-up development with supporting top-down design is generally considered better, as it can be difficult to predict all the enterprise requirements.


term[STARS] Title: Star Schema
Comments: Data for a single fact is contained in a central fact table (usually BCNF [NF]) that is extended using dimension tables (not BCNF). Attribute hierarchies are used to form aggregates.
The fact table is normalised as it makes up the bulk of the database; joins with the fact table are kept cheap by not decomposing the dimension tables.
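
A minimal sketch in SQL, with hypothetical table and column names:

-- Dimension tables (not BCNF in general; kept small and unbroken).
CREATE TABLE Product  (ProdID INTEGER PRIMARY KEY, Name VARCHAR(40), Category VARCHAR(20));
CREATE TABLE Location (LocID  INTEGER PRIMARY KEY, City VARCHAR(40), Country VARCHAR(40));
CREATE TABLE TimeDim  (TimeID INTEGER PRIMARY KEY, Day DATE, Month INTEGER, Year INTEGER);

-- Central fact table: one row per (product, location, time) combination, BCNF.
CREATE TABLE Sales (
  ProdID INTEGER REFERENCES Product,
  LocID  INTEGER REFERENCES Location,
  TimeID INTEGER REFERENCES TimeDim,
  Amount NUMERIC,
  PRIMARY KEY (ProdID, LocID, TimeID)
);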

See Also [SNOWF]


term[SNOWF] Title: Snowflake Schema
Comments: A variation on the star schema [STARS] in which dimension tables are in third or Boyce-Codd normal form [NF]. This refinement provides support for attribute hierarchies, but the components of an attribute hierarchy are fragmented over more relational tables than in the denormalised structure of a star schema. Accordingly, browsing a snowflake schema requires more joins.


term[CONSTS] Title: Constellation Schema
Comments: A collection of many star schemas. It will have fact tables at different levels of aggregation and shared dimension tables.
Equivalently: a collection of fact tables, stored as either star or snowflake schemas, that reside in the same data warehouse and may share common dimensions.

Components are:



term[RUP] Title: Roll-up
Comments: Perform (or use) higher levels of aggregation to look for more general/coarser data. Can lower the number of dimensions (a level of aggregation can encompass an entire dimension).
From 2001 exam: "An OLAP query that takes the current level of Data Warehouse fact table and performs a further aggregation using an attribute hierarchy of one (or more) dimension."

See also [DDOWN]


term[DDOWN] Title: Drill-down
Comments: Analyse data of finer granularity (more specific). Often occurs on a star schema that contains multiple aggregates (lecture 5, slide 18). This technique is limited by the smallest granularity stored in the data warehouse.
From 2001 Exam: "An OLAP query that looks for more detailed data using an attribute hierarchy of one (or more) dimension."

See also [RUP]


term[SLICE] Title: Slice
Comments: Equality select condition. Does not involve aggregation. Just cuts out one dimension with a particular value from a hypercube.
From 2001 Exam: "An OLAP query that is performed on a fact table using a project operation on a proper subset of dimensions and a select operation with equality condition."

See also [RUP] [DDOWN]


term[DICE] Title: Dice
Comments: Range select condition.
From 2001 Exam: "An OLAP query that is performed on a fact table using a project operation on a proper subset of dimensions and a select operation with range condition."

See also [RUP] [DDOWN] [SLICE] [PIVOT]


term[PIVOT] Title: Pivot (cross tabulation)
Comments: Selects multiple dimensions to aggregate over.
From 2001 Exam: "An OLAP query that is performed on a fact table by selecting two dimensions, aggregating the measure, and representing the aggregated measure in a grid having two selected dimensions as coordinates."

See also [RUP] [DDOWN] [SLICE] [DICE] [CUBE]


term[CUBE] Title: Cube
Comments: A proposed extension to the SQL language. It is equivalent to a collection of GROUP BY statements, with one GROUP BY statement for each subset of the k dimensions (a total of 2^k possible aggregation levels). All the possible levels of aggregation may be considered as a lattice of GROUP BY queries.

Example:

CUBE ProdID, LocID, TimeID BY SUM Amnt

Is equivalent to eight queries of the form:

SELECT SUM (Q.Amnt) FROM Quantity Q GROUP BY grouping-list

Where each query has one of the eight subsets of {ProdID, LocID, TimeID} as its grouping-list.
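
CUBE was later standardised in SQL:1999 as a GROUP BY modifier; a sketch of the example above in that syntax:

SELECT ProdID, LocID, TimeID, SUM(Q.Amnt)
FROM Quantity Q
GROUP BY CUBE (ProdID, LocID, TimeID);  -- one result group per subset of the three dimensions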

See also [RUP] [DDOWN] [SLICE] [DICE] [PIVOT]


term[TOPN] Title: Ranking / Top-N
Comments: OPTIMIZE FOR N ROWS rather than ordering all the tuples. Usually requires a histogram to be maintained on the table column of interest. Effectiveness depends on the ability to choose a cutoff value.

From 2001 Exam: "An OLAP query that returns the top N items."

See also [RUP] [DDOWN] [SLICE] [DICE] [PIVOT] [CUBE]


term[MOLAP] Title: MOLAP (Multidimensional)
Comments: Advantage: excellent speed and support for small and medium volume databases. Disadvantage: low flexibility - adding new dimensions requires rebuilding the hypercube. Volume is limited to small and medium databases.

See Also [ROLAP]


term[ROLAP] Title: ROLAP (Relational)
Comments: "The multidimensional data model consists of measures and dimensions. The relation that relates the dimensions to the measures is called the fact table." [TDBMS]

Advantage: Large flexibility with an unlimited number of dimensions. Also capable of storing a medium to large database. Disadvantage: Slower speed than MOLAP for small and medium sized databases.

See Also [MOLAP]


term[DOLAP] Title: Distributed OLAP architectures
Comments: Distributed Enterprise Data Warehouse with centralized control [EDWA] - distributed (fragmented) for load balancing, scalability, and higher availability. The centrally controlled metadata repository [MDR] is replicated with each fragment. This centralized architecture guarantees better data consistency, but is hard to implement.

Federation of Data Marts - loosely coupled; each mart has its own metadata repository [MDR]. This approach is cheaper, and easier to administer and implement.


term[SCALE] Title: Scalability
Comments:



term[SAOP] Title: Server architectures for OLAP processing
Comments:



term[DMIG] Title: Data migration
Comments: Simple data transformation rules to be specified - e.g. replace man by male, pound by kg.

See Also - Data cleaning tools [DAUDIT] [DSCRUB] as part of [ECTL]


term[DSCRUB] Title: Data scrubbing
Comments: Using domain specific knowledge (rules of behaviour in the real system) to do cleaning of data from various sources - e.g. Identifying a single concept with multiple names. Like "Win95", "Windows 95", and "Microsoft Windows 95", which all represent the same entity/concept.

See Also - Data cleaning tools [DAUDIT] [DMIG] as part of [ECTL]


term[DAUDIT] Title: Data auditing
Comments: Tools used to scan a database for strange patterns - e.g. products that have never been sold.

See Also - Data cleaning tools [DSCRUB] [DMIG] as part of [ECTL]


term[ILOAD] Title: incremental loading
Comments: An alternative to full loading, where the change from old data to new is treated as a single atomic transaction. Provides incremental data warehouse refresh. Only those operational database and external source tuples that were updated between two successive loads are used, since only they alter the contents of the data warehouse.

The source operational database must support incremental loading for propagating updates from the primary database to its replicas (the data warehouse included). If a legacy database is in use, the whole source database may need to be transferred.

See Also [ECTL]


term[DATAS] Title: Data shipping
Comments: A technique that uses a snapshot log file to record the updates of source data and to refresh the data warehouse data at a later time.
More appropriate when the operational database and data warehouse are from different vendors, as the transaction log files will not be standardised and the access APIs differ between DBMS vendors.

See Also [TRANSS]


term[TRANSS] Title: Transaction Shipping
Comments: A technique that uses regular transaction log files to record the changes to source data. These changes are transferred to the data warehouse at a later time. More appropriate for homogeneous systems, because the transaction files will be standardised. Also preferable as transaction shipping requires fewer operational database server resources.

See Also [DATAS]


term[BI] Title: Bitmap Index
Comments: A bitmap index for a single table column contains as many bit vectors as there are different values in the column.
Each bit vector represents the presence(1) or absence(0) of one possible value in the column.
There is one bit in a vector for every row in the table.
If the column contains the value that the bit vector represents then the corresponding row bit is set to 1, otherwise it is 0.

Bitmap indexes are usually only used when the column has a small number of distinct values.

Advantages: efficient bit operations to answer queries - logical operators such as AND, OR, and NOT; set-oriented operators such as intersection, union, and difference; join and aggregate operations. A 1 in a bitmap index can be directly translated to a record using its relative position in the vector and the address of the disk block containing the required tuples.

An encoded bitmap index can be used to map strings to unique binary representations. Many searches will then first query the bitmap index.
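
A small worked sketch with a hypothetical Quarter column, plus the Oracle DDL for such an index:

-- Rows:    r1  r2  r3  r4  r5
-- Quarter: Q1  Q2  Q1  Q4  Q2
--
-- Bit vectors (one per distinct value, one bit per row):
--   Q1: 1 0 1 0 0
--   Q2: 0 1 0 0 1
--   Q4: 0 0 0 1 0
--
-- Quarter = 'Q1' OR Quarter = 'Q2' is answered by OR-ing the Q1 and Q2 vectors.
CREATE BITMAP INDEX Sales_Quarter_bix ON Sales (Quarter);  -- Oracle syntax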

See also [TDBMS](pg 691) [BS]


term[BS] Title: Bit-slice
Comments: A bit-slice index involves creating a binary code representation of attribute values. For numerical values this involves conversion to binary. For non-numerical values, like strings, a hash function can be used.
The bit-slice index has one bit vector for each binary digit (i.e. 2^0, 2^1, 2^2, ...). Some queries become easier using bit-slices, such as finding values greater than X or summing the values.
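
A worked sketch of the SUM case, with made-up values {5, 6, 3} in a three-bit column: 5 = 101, 6 = 110, 3 = 011, so the slice for 2^2 holds 110, the slice for 2^1 holds 011, and the slice for 2^0 holds 101 - two 1-bits each. SUM = 2^2·2 + 2^1·2 + 2^0·2 = 8 + 4 + 2 = 14 = 5 + 6 + 3.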


term[JI] Title: Join Index
Comments: A join index is an ordered tree structure whose elements are n-tuples of the form (r1, r2, ..., rn).
[AIT] "A join index is a list of the disk addresses of the rows for matching columns."
A pre-computed auxiliary structure containing the rids of tuples from different tables that join together.
In the context of a star schema, a join index can relate the values of an attribute of a dimension table to the matching rows of the fact table.

See also [BI]


term[MV] Title: Materialised View
Comments: A query is evaluated and its result is stored in a (physical) table, often containing summary and other aggregated data. Materialised views need to be kept consistent with the underlying tables when those tables are updated.
They are important, as data warehouses are often terabytes in size and are mainly queried; materialised views can considerably improve the response times of frequently asked queries. The drawback is that MVs need to be updated, which substantially slows the database system down when used with operational databases.

A materialised view is refreshed to make it consistent with its underlying tables. An incremental refreshing algorithm has an update cost proportional to the amount of change in the base tables. "The view maintenance policy determines when a view should be refreshed. In immediate view maintenance the view is updated within the same transaction that modifies the underlying tables; otherwise the policy is deferred view maintenance." Lazy maintenance - update at query time. Periodic maintenance/snapshots - update periodically at an interval. Forced maintenance - the view is refreshed after a certain number of changes in the base tables.
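
For reference, later PostgreSQL releases (9.3 onwards) support this directly with deferred, full recomputation; a sketch reusing the hypothetical Sales and TimeDim tables from [STARS]:

CREATE MATERIALIZED VIEW MonthlySales AS
  SELECT ProdID, Year, Month, SUM(Amount) AS Total
  FROM Sales JOIN TimeDim USING (TimeID)
  GROUP BY ProdID, Year, Month;

REFRESH MATERIALIZED VIEW MonthlySales;  -- deferred maintenance: recomputes the whole view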


term[MDR] Title: Metadata Repository
Comments: A database containing data about the data warehouse data, programs, users,... A dedicated bookkeeping database.

Tasks include:

When creating a central metadata repository for an architected Data Mart it will also contain:



term[VDW] Title: Virtual Data warehouse
Comments: No separate data warehouse or metadata. Directly uses the operational databases through simple OLAP front-end tools. Minimal setup costs, but no historical data, aggregation, cleaning and transforming, or central metadata for enterprise-wide definitions of the business data semantics. OLTP and OLAP compete for the same resources.


term[DMBOX] Title: Data Marts in a box
Comments: Builds a data warehouse from various data sources. Uses a local meta-data repository. Proliferation leads to multiple, non-integrated, independent, local data marts from different vendors. No common data mart. May lack cleaning tools [ECTL]. May use differing architectures.


term[SPDM] Title: "Stovepipe" Data Marts
Comments: Uses a single (central) Extraction, Cleansing, Transformation and Loading software package (ECTL tool) [ECTL]. This tool has its own centralized metadata repository, provides admin facilities, and can perform aggregations.
ECTL is good for dealing with dirty data and providing coordinated access to source data. However, the metadata repositories of the individual data marts are not connected to that of the ECTL tool. This lack of integration creates many mutually independent data marts that support individual business units but cannot support corporate-level needs. This is addressed by having a central metadata repository linked to all the local metadata repositories [MDR].

"Stovepipe data marts are small data warehouses that support the needs of individual business units but cannot be integrated at the corporate level because they do not conform with enterprise-wide data definitions." [MABDW]


term[EDWA] Title: Enterprise Data Warehouse Architecture
Comments: This model has a large central database holding almost all data from the operational source databases (detailed at the level of transactions): consolidated data that can be used for detailed drill-down queries.
The distributed (fragmented) nature is good for load balancing, scalability, and higher availability. The centrally controlled metadata repository is replicated with each fragment.

From 2001 Exam: "A number of packaged products that allow building and using many local Data Warehouse databases together with their local metadata repositories integrated to a central one."


term[HOLAP] Title: Hybrid OLAP Data Mart Architecture
Comments: Combines elements from ROLAP (stores basic data in relational tables) and MOLAP (stores aggregated data - materialised views). Exploits the speed benefits of each according to data volume.

From 2001 Exam: "A Data Warehouse that keeps basic data in relational tables and stores aggregated data in MOLAP structures."


term[OLER] Title: On-line Enterprise Reporting (OLER)
Comments: OLER - a new acronym coined for the contemporary view of data warehousing as a technology used to build summary databases that can manage all the data the organisation needs for all kinds of reporting.
The OLER system (aka Enterprise Information Portal (EIP)) provides a single point of entry to integrated corporate data for employees, clients, and trusted business partners. Requirements include:

From 2001 Exam: "A Data Warehouse that is near real-time, contains:



term[ECTL] Title: Extraction, Cleaning, Transforming and Loading (ECTL)
Remote: http://www.brio.com/pdfs/achieving_data_warehouse_success.pdf
Comments: Extraction - gateways to the operational databases and external sources of information. Gateways such as Open Database Connectivity (ODBC), Object Linking and Embedding for Databases (OLE-DB), and Java Database Connectivity (JDBC) allow client programs to generate SQL statements to be executed on the server.

Cleaning - tools for detecting and correcting frequent source data errors. Tools include data migration [DMIG], data scrubbing [DSCRUB], and data auditing [DAUDIT].

Transforming - reconciling semantic mismatches, such as different currency units, different names for the same attribute, and differences in how tables are normalised or structured. Transforming data is typically accomplished by defining a relational view over the tables in the data sources (the operational databases and other external sources).

Loading - may need to build a time dimension to satisfy the need for a unique and immutable TimeID. Other preprocessing can include sorting, summarisation, aggregation, and the building of indexes and materialised views. May require incremental loading [ILOAD], pipelining, and parallel execution to deal with the volumes. Full loads avoid using partially updated data but may take longer than the time available.


term[DMINE] Title: Data Mining
Comments: Generally used to find interesting trends in large datasets, using techniques from statistics (exploratory data analysis) and AI (knowledge discovery and machine learning).



term[FI] Title: Frequent Itemsets
Comments: Itemsets are items that appear together; an itemset's support is the fraction of transactions in which all the items of the set appear together (= #occurrences / #transactions). A frequent itemset is one whose support reaches a threshold minimal support level.

Apriori property: every subset of a frequent itemset must also be a frequent itemset.

As a result, the algorithm for constructing frequent itemsets first finds all the singleton sets with the minimal support, then constructs combinations using first pairs, then triples, and so on. Each new tuple is kept if it meets the minimal support. Note that if a candidate tuple contains any sub-tuple that does not satisfy the minimal support, then the candidate tuple cannot satisfy the minimal support either.
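
A sketch of the first two passes in SQL, assuming a hypothetical Basket(TransID, Item) table with unique (TransID, Item) pairs and a minimal support count of 20:

-- Pass 1: frequent single items.
SELECT Item, COUNT(*) AS Cnt
FROM Basket
GROUP BY Item
HAVING COUNT(*) >= 20;

-- Pass 2: frequent pairs. By the Apriori property, any pair that survives the
-- HAVING filter is necessarily built from items that were themselves frequent.
SELECT B1.Item AS Item1, B2.Item AS Item2, COUNT(*) AS Cnt
FROM Basket B1 JOIN Basket B2
  ON B1.TransID = B2.TransID AND B1.Item < B2.Item
GROUP BY B1.Item, B2.Item
HAVING COUNT(*) >= 20;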

See also [DMINE]


term[ICEQ] Title: Iceberg queries
Comments: Refers to queries with many possible candidate results where something like a HAVING SUM(X) >= 5 clause exposes only the tip of all results. For very large data sets this can be a problem, as the database will first sort and sum up all the data and only then eliminate the groups that do not satisfy the condition in the HAVING clause. Again using the Apriori rule, we consider only those tuples whose individual components satisfy the HAVING condition; the tuples that do satisfy the requirements form candidate tuples that are then tested against the full HAVING clause.
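
A sketch of the query shape being described, with hypothetical names:

SELECT P.CustID, P.Item, SUM(P.Qty)
FROM Purchases P
GROUP BY P.CustID, P.Item
HAVING SUM(P.Qty) >= 5;  -- only the "tip of the iceberg" is returned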

See also [DMINE]


term[MRULE] Title: Mining Rules
Comments:

See also [DMINE]


term[ARULE] Title: Association rules
Comments: LHS => RHS, where both sides are sets of items, interpreted as: "If every item in LHS is purchased in a transaction, then it is likely that the items in RHS are purchased as well."
The support of an itemset S is the percentage of transactions that contain all items from S. The support for LHS => RHS is the support of LHS ∪ RHS (= support{LHS, RHS}). The confidence for a rule is support(LHS ∪ RHS) / support(LHS) and provides a measure of the strength of that rule.
Finding association rules requires two threshold values: minimum support and minimum confidence. The algorithm first finds all itemsets that satisfy the minimal support, and then checks against the minimal confidence.
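
A worked example with made-up numbers: in 1000 transactions, suppose {milk} appears in 200 and {milk, bread} in 50. Then support(milk => bread) = 50/1000 = 5% and confidence(milk => bread) = support{milk, bread} / support{milk} = 50/200 = 25%.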

See also [DMINE]


term[SEQPAT] Title: Sequential patterns
Comments: ||X − Y|| = √(Σi=1..k (xi − yi)²) is used to determine the distance between two sequences of numbers, referred to as data sequences or time series. Note that both sequences must be of the same length.
For a simple similarity search, all sequences in the database are retrieved that fall within a threshold value of some user-given sequence.

See also [DMINE]


term[CRRULE] Title: Classification and regression rules
Comments: Classification and regression rules are expressed as P1(X1) ∧ P2(X2) ∧ ... ∧ Pk(Xk) => Y = c.
Pi(Xi) is a predicate [pred], the Xi (i = 1..k) are predictor attributes, and Y is the dependent attribute. A predictor attribute may be numerical (taking values in a range) or categorical (taking values in a set). A numerical dependent attribute gives a regression rule, and a categorical one gives a classification rule. Support and confidence are defined in the same way as for association rules.
Once rules are found they may be represented as structured trees.

See also [DMINE]


term[CLUST] Title: Clustering
Comments: Clustering identifies groups of records with similar properties. Records within a group are similar according to the chosen attributes, while separate groups are dissimilar according to the same attributes. Uses a threshold attribute radius. Each cluster has a centre, defined by the average of its members' attributes, and a radius, defined by the average distance of the members from the centre. A new instance attempts to join the closest current cluster; if the new radius would exceed the threshold, a new cluster is formed rather than joining the existing one.

See also [DMINE]


term[OBDB] Title: Object-Relational Databases
Comments: Objects consist of a pair that represents the entity's state and behaviour. In contrast, the relational paradigm separates an entity into tuples of simple attribute values and does not consider behaviour. When complex objects are stored in a relational data model they are scattered through many tables and have to be recovered every time they are needed, resulting in costly joins. Newer applications require complex objects, long transactions, new data types, versioning, fast execution of applications, ...
Object-relational databases extend the traditional relational database model with the features required for object-oriented storage. These include user-defined methods, ADTs (row, collection, object id, and reference types), Binary Large Objects (BLOBs), Character Large Objects (CLOBs), inheritance, overloading of methods, and triggers.

Relational versus object-oriented database systems:
Lecture 17 (Inheritance), slides 14-16. An important point is that an OODBMS allows users to work with large, complex objects for a long period of time, while an ORDBMS expects only relatively short periods (changes are committed when stored) with moderately complex objects.


term[ADT] Title: Abstract Data Types
Comments: As part of an object-relational database the user is able to define new data types. The combination of an atomic data type and its associated methods forms an abstract data type. ADTs enforce encapsulation and are hence sometimes referred to as opaque types. Encapsulation allows the DBMS to invoke methods and accept the expected return type, while remaining blind to the actual implementation.
When defined, an ADT will be either atomic/base or composite/structured. Users declare the functions available on the ADT as well as their parameter and return types.

CREATE FUNCTION <funct_name>([<ftype> [, ...]]) RETURNS <return_type> AS <funct_definition> LANGUAGE <language_name>
CREATE ABSTRACT DATA TYPE <type_name> (INPUT = <input_function>, OUTPUT = <output_function>, [INTERNALLENGTH = <{internallength | VARIABLE}>]);

Import, export, and size functions are typically required for any ADT.
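
A sketch of how the template above maps onto actual PostgreSQL, assuming C implementations of complex_in/complex_out compiled into a (hypothetical) complex.so:

-- Referencing the not-yet-defined type creates a "shell" type automatically.
CREATE FUNCTION complex_in(cstring) RETURNS complex
  AS 'complex.so', 'complex_in' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION complex_out(complex) RETURNS cstring
  AS 'complex.so', 'complex_out' LANGUAGE C IMMUTABLE STRICT;

CREATE TYPE complex (
  INPUT  = complex_in,   -- import function
  OUTPUT = complex_out,  -- export function
  INTERNALLENGTH = 16    -- size: two 8-byte doubles
);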


term[SDTYPE] Title: Structured Data Types
Comments:

Structured types mean tables are no longer in first normal form [NF]. Set operators are supported: ∪ ∩ ⊃ ⊇ = and ε/IN. List operators: head, tail, and append. Array types support indexing using postfix square brackets.


term[OIDS] Title: Object identifiers
Comments: In PostgreSQL each tuple in a table is assigned a unique id.

SELECT oid,* FROM Class_Flat WHERE CourseID = 'COMP442' AND Student = 'Craig';

When using a reference type (ref(base)) it needs to be associated with a structured type, and the scope of the reference must be a table known at compilation time (example on slide 6).
Dereferencing (deref(label).Attribute or label→Attribute) allows the object ids contained in reference types to be followed.
References vs. structured embedding:
References have the advantage for updates (automatic, since each object is stored only once), but may leave dangling pointers. Structured types, on the other hand, are usually clustered on disk rather than scattered as pointers are, which favours higher retrieval speed. So referencing favours small storage space and fast updates, while structured embedding favours better query performance due to clustering.
The use of WITH SCOPE restricts all references from one column to just one relation. This ensures referenced and referencing tuples belong to precisely known relations.


term[INHER] Title: Inheritance
Comments: Used for reusing and refining types and creating hierarchies of similar but not identical objects.
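
A minimal sketch using PostgreSQL table inheritance (the cities/capitals example from the PostgreSQL documentation):

CREATE TABLE Cities (
  Name       TEXT,
  Population REAL,
  Altitude   INTEGER
);

-- Capitals refines Cities: it inherits all of its columns and adds one more.
CREATE TABLE Capitals (
  State CHAR(2)
) INHERITS (Cities);

SELECT Name FROM Cities;       -- also returns the rows of Capitals
SELECT Name FROM ONLY Cities;  -- excludes rows of subtables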



term[IDB] Title: Internet Databases
Comments: Commerce and other online sites require large numbers of concurrent users (scalability), unstructured and structured documents, and ranked keyword searches.
Internet applications require: data management, application logic, and presentation. Organisation can be:



term[XML] Title: Extensible Markup Language (XML)
Remote: http://www.w3.org/XML/
Comments: HTML-style tag-based markup that helps describe content (adding semantics). Consists of rules and conventions.
Elements are enclosed in start and end tags and may be (properly) nested within other elements. Elements help make documents self-describing to humans.
Descriptive attributes can be added inside start tags, e.g. <ELM att_name="value">.
Entity references allow for special characters, e.g. &lt; is <. Comments are the same as in HTML except only one dash (-) is allowed for opening and closing (e.g. <!- comment text ->).
An XML document is valid if a DTD is associated with it and it is structured according to the rules in the DTD.

See Also [DTD] [XNAME] [XSTORE]


term[XSTORE] Title: XML Storage
Comments: By its nature most web media is semistructured - free text and hyperlinks with some structure imposed by HTML tags. Semistructured data can be represented by a labelled graph with nodes for compound objects and atomic values; edges indicate the relationships between nodes. An XML document may be classed as either data centric or document centric.
Data centric - XML is used as a transport mechanism for machine consumption. It possesses a fairly regular structure with little or no mixed element/character content, and the order of elements is insignificant.
Document centric - usually built for human consumption, with a less regular structure and with subordinate elements and character data interspersed in the content of an element. The order of elements and character data becomes significant for human comprehension. It must be possible to round-trip (store and retrieve) the same document.

Storage in a relational database is achieved by mapping the DTD or XML Schema to the database schema. Storage may occur in:

A fine-grained mapping is outlined on slides 36-40.
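
One common coarse-grained alternative is a generic "edge" table; a sketch with hypothetical names (the fine-grained mapping instead gives each element type its own table):

-- Every element, attribute, or text node of the document becomes one row.
CREATE TABLE Edge (
  NodeID   INTEGER PRIMARY KEY,
  ParentID INTEGER REFERENCES Edge(NodeID),  -- NULL for the root
  Ordinal  INTEGER,      -- position among siblings (document order)
  Tag      VARCHAR(50),  -- element or attribute name
  Value    VARCHAR(200)  -- character data; NULL for interior nodes
);

-- Reconstruction walks the tree, e.g. the children of node 1 in order:
SELECT * FROM Edge WHERE ParentID = 1 ORDER BY Ordinal;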


term[DTD] Title: Document Type Definition (DTD)
Remote: http://www.w3schools.com/dtd/default.asp
Comments: A DTD is a grammar defined by a set of rules for elements, attributes, and entities. It indicates which tags are allowed, the permissible orderings, and valid nestings. An XML document without a DTD is indicated as standalone in the XML declaration [XML Guide].
An XML document is well formed if it does _not_ have an associated DTD, starts with an XML declaration, has all elements embedded in a single root element, and has all elements properly nested.
Example given in lecture 19 slide 23 or [W3SCHOOLS, examples]

See Also [XML]


term[XSCHEMA] Title: XML Schema
Remote: http://www.w3.org/XML/Schema
Comments: Aims to provide a superset of DTD [DTD] functionality. Supports data type definitions, uniqueness constraints, and key constraints. Uses the XML Schema Definition Language (XSDL), which provides mechanisms for deriving new data types by either extending or restricting existing ones, defining uniqueness constraints, defining key constraints, ...


term[DOM] Title: Document Object Model
Remote: http://www.w3.org/DOM/
Comments: An interoperable set of classes for manipulating XML documents from programming languages. DOM objects:



term[XSL] Title: Extensible Stylesheet Language (XSL)
Remote: http://www.w3.org/Style/XSL/
Comments: "XSL is a language for expressing stylesheets. It consists of three parts: XSL Transformations (XSLT): a language for transforming XML documents, the XML Path Language (XPath), an expression language used by XSLT to access or refer to parts of an XML document. (XPath is also used by the XML Linking specification). The third part is XSL Formatting Objects: an XML vocabulary for specifying formatting semantics. An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary"


term[XLINK] Title: XML Linking
Comments: XLink is a mechanism for creating hyperlinks in XML documents that link between resources.


term[XNAME] Title: XML Namespaces
Remote: http://www.jclark.com/xml/xmlns.htm
Comments: A collection of names, identified by a Uniform Resource Locator (URL), used to define the context of elements. This allows two elements with the same name but different namespaces to have differing semantics.


term[XQUERY] Title: XML Query
Remote: http://www.w3.org/XML/Query
Comments: XQuery is the W3C standard query language for XML. For a well formed document:

FOR $t IN doc("path/class.xml")//STUDENTS/FIRSTNAME RETURN <RESULT> {$t} </RESULT>

// specifies nesting _anywhere_ within the preceding element, while / means immediately below it.
For-Let-Where-Return (FLWR) expressions: FOR binds a variable to each element specified by the path expression, while LET binds a variable to the whole collection of elements. The WHERE clause qualifies the bound values. The RETURN clause constructs the XML result fragment.

