A Recursive Algebra and Query Optimization
L&ha S. Colby
for Nested Relations
Department of
Cornpzller Science
Indiana Universily,
Bloomington,
IN, 47405
Abstract
The
nested relational model
provides
a better way to
represent complexobjects than the (flat) relationalmod-
el, by
allowing
relations to have relation-valued attri-
butes. A recursive algebra for nested relations that al-
lows tuples at all levels of nesting in a nested relation
to be accessed and modified without any
special nav-
igational operators and without having to flatten the
nested relation has been developed.
In this
algebra,
the operators of the nested relational algebra are ex-
tended
with
recursive definitions so that
they can be
applied not only to relations but also to subrelations
of a relation. Tn this paper, we show that
queries are
more efficient and succinct when expressed in the recur-
sive algebra than in languages that require restructur-
ing in order to access subrelations of relations. We also
show that
most of the query
optimization techniques
that have been developed for the relational algebra can
be
easily
extended for the recursive algebra and that
queries
are
more easily optimizable
when expressed in
the recursive
algebra than
when they
are
expressed in
languages like
the non-recursive algebra.
1, Introduction
The traditional relational model introduced by Codd [4]
in 1970 requires that relations be in first normal form
(lNF), i.e., all values in a relation must be atomic or
non-decomposable.
Although this model is sufficient
for representing objects that have simple domains, the
first normal form assumption makes it difficult to model
complex objects. Database applications in the areas of
office automation, textual data and engineering designs,
involve complex objects and the relational model is un-
suitable for such applications. Normalization in the re-
Permission to copy without fee all or part of this material is granted provided that
the copies are not made or distributed for direct commercial advantage, the ACM
copyright notice and the title of the publication and its date appear, and notice is
given that copying is by permission of tbe Association for Computing Machinery.
To copy otherwise, or to republish, requires a fee and/or specific permission.
0 1989 ACM O-89791-317-5/89/0005/0273 $1.50
lational model causes a lot of fragmentation in the rep-
resentation of objects. Information about objects and
their relationships is distributed over several different
flat tables. This in turn causes queries to be slow and
complicated since excessive joins have to be performed
among the various relations in the database.
The nested relational model, also sometimes called
NF’ (non-first normal form), is a generalization of the
traditional relational model without the first normal
form assumption. Attributes of a nested relation can
have non-atomic values. Their values can be relations
which can hence be viewed as subrelations of the nested
relation. The nested relational model allows users to
view the database in a way that is closer to their con-
cept of the real world since complex objects can be
represented as a whole in a single nested relation in-
stead of being distributed over several different flat re-
lations. This model was first proposed by Makinouchi
[13] who suggested that the 1NF assumption be re-
laxed since it was too restrictive. Jaeschke and Schek
[ll] proposed a generalization of the relational model
by allowing relations to have non-atomic or set-valued
attributes. Thomas and Fischer [25] generalized the
model of Jaeschke and Schek by allowing relations to
have relation-valued attributes, and since then a num-
ber ofresearchers [1,10,14,16,17,18,19,22,26,27] have ex-
tended the relational database theory to nested rela-
tions. Several groups [2,6,7,9,15,24] are attempting to
implement the nested relational model, either directly
or on top of an existing DBMS.
The tables in Figure 1 are an example of a database
for a stock-brokerage firm represented in the (flat) re-
lational model. Figure 2 shows the same information
in a nested relational schema. In Figure 2, the relation
CLIENTS has the relation-valued attribute INVEST-
MENTS, which in turn has SHARES aa a relation-valued
attribute. The relation CLIENTS has only two levels of
nesting. In general, relations can be nested to any arbi-
trary but finite depth, i.e.,
relations can have relation-
valued attributes, which can have relation-valued at-
tributes and so on.
273
CLIENTS
1 NAME ( COMPANY 1 ;;;;HAsE ( DATE 1 ~0. 1
1 John Smith 1 XEROX ) 64.50 1 02/10/83 1 100 I
1 John Smith 1 XEROX 1 92.50
1 08110/87 1 500 I
Jill Brady
--[~-&iiN 1 64.50 1 01/30/82 1 100 1
Jill Brady
I
EXXON 1 59.50 1 02/10/83 1 200 1
Jill Brady 1 FOR0 1 35.50 1 02/10/83 1 200 1
I JillBrody I SEARS I 35.75 I 12125iE7 1 100 )
STOCK DATA
I I I
I
COMPANY
t-
XEROX
IBM
97.50 1.25
EXXON
90.00
0.82
CLIENT INFO.
EXCHANGE DATA
COMPANY
EXCHANGE5
1
XEROX
) NEWYORK 1
I
EXXON
NEW YORK
Figure 1. An example of a flat relational database.
CLIENTS
NAME
ADDRESS
John Smith 311 East 2nd. St.
Bloomington. IN
47401
!
Jill Brady
41 North Main St.
Db&l, OH
INVESTMENTS
STOCK DATA
Figure 2. An example of a nested relational database.
274
Algebra and calculus based query languages have been
proposed for the nested relational model by Roth, Korth
and Silberschatz 1201, Thomas and Fischer [25], Schek
and Scholl [22],
and several others [8,10,12,14,16,18].
Some of them [10,14,20,25] are merely simple extensions
of the query languages that were developed for the rela-
tional model. The nested relational algebra of Thomas
and Fischer [25] is one such example where operators
like selection and projection have similar definitions for
both the relational and the nested relational model, i.e.,
they can only operate at the outermost level of a rela-
tion even if it is a nested relation with relations embed-
ded at several different levels. If a query on a nested
relation involves attributes that are nested deep inside
the structure of the relation, the relation has to be jIa&
tened
until the attributes of interest are at the outermost
level. After the necessary operations are performed on
the flattened relation, the result may have to be trans-
formed back into the structure of the original nested
relation. Restructuring a nested relation to a relation
that has one more or one less level of nesting is done
in the nested relational algebra using
nest
and
unnest
operators. So, although the nested relational model pro-
vides a better way of modeling complex objects, query
languages like the nested relational algebra of Thomas
and Fischer cause the structures representing these ob-
jects to be restructured while being queried upon; thus
defeating the main objective of letting the user operate
in an environment that models the real world as closely
as possible.
Deshpande and Larson [8] have proposed an alge-
bra for nested relations in which queries can be formu-
lated in a way such that relations do not have to be flat-
tened in order to access and manipulate data at interior
levels of relations. This is done by means of an extended
selection and a subrelation constructor. The condition
for the selection can contain any algebraic expression
over the relation-valued attributes of the relation. The
subrelation constructor allows relations to be accessed
and modified at all levels by creating new subrelations
while traversing the different levels of a relation. Al-
though this construct provides navigational ability, it is
not very convenient for users to have to introduce new
relations and names for these new relations within the
query.
Schek and Scholl [22] introduced an algebra for
nested relations that enables subrelations at all levels of
a relation to be accessed without nesting and unnesting.
However, their algebra has several limitations. Firstly,
only one operator, viz.
?r,
is used
as a navigator.
For in-
stance, if the user wants to unnest a subrelation nested
deep inside a relation, he would have to write the query
in terms of a series of projections and an unnest. Sec-
ondly, renaming is necessary in a number of situations,
which is an inconvenience for the user. Finally, the al-
gebra is rather complicated.
Roth, Korth and Batory [18] have proposed an ex-
tension to SQL for nested relations which allows nested
application of queries so that relations do not have to be
restructured. The VERSO algebra, proposed by Abite-
boul and Bidoit [l] is probably closest to the algebra
presented in this paper although not all of the oper-
ators in their algebra can access subrelations without
restructuring. Also, their algebra is defined only on a
subclass of nested relations called Verso relations, which
are nested relations in Partitioned Normal Form [19].
In this paper, we present an algebra for nested rela-
tions in which the operators of the classical nested rela-
tional algebra [25]
are extended so that they can be ap-
plied not only at the outermost level of a nested relation,
but also to subrelations at all levels in a nested relation.
This algebra, which is equivalent in expressive power
to the nested relational algebra, enables queries to be
expressed more naturally and succinctly with fewer re-
structuring operations. We then consider query opti-
mization principles for this recursive algebra and show
that most of them are a natural extension of the op-
timization principles that have been developed for the
relational algebra. Although query optimization for the
relational model is an area in which extensive research
has been done, the same is not true for the nested re-
lational model. Some results have been shown in the
context of efficiently processing relational select-project-
join queries by equivalent queries in the NF’ model by
Scholl [23] and in the Verso model by Bidoit [3]. In the
relational algebra, query optimization is based primar-
ily on the commutativity and associativity properties of
operators. Although most of these properties also hold
in the case of the operators of the nested relational al-
gebra, queries are often difficult to optimize since most
expressions in this algebra involve the nest and unnest
operators which do not commute with each other and
with several other operators [25]. In the recursive alge-
bra, nest and unnest are used only when restructuring
is really necessary,
i.e.,
they are not needed for access
ing the interior levels in a nested relation and so most
queries can be formulated by expressions that do not
involve these two operators, allowing queries to be op-
timized more efficiently than in other query languages.
2. The Nested Relational Model
We first give a few definitions for the nested relational
model and then introduce the recursive algebra. In this
paper, we give definitions for only selection, projection
and join. A full set of definitions for all the operators
can be found in [5]. We will use the (non-recursive) al-
gebra of Thomas and Fischer [25] as a basis for compar-
ing the advantages of the recursive algebra over query
languages whose operators can be applied only at the
275
outermost level of a nested relation. The definitions for
selection, projection and join in the non-recursive alge-
bra are very similar to the corresponding definitions in
the relational algebra (except for minor extensions like
allowing set comparisons in the condition for selection
etc.) and are hence omitted here.
Let A be the universal set of attribute names and
relation scheme names. A relation scheme of a nested
relation is of the form R(S) where R E A is the relation
scheme name and S is a list of the form (Al,
AZ, . . . . A,,,)
where each
Ai
is either an atomic attribute or a relation
scheme of a subrelation. If
Ai
is a relation scheme of
the form a($), then
Rip
the name of the scheme, is
called a relation-valued attribute of
R.
For each atomic attribute
Ai E
A let Di be the
corresponding domain of values. An instance r of a
relation scheme
R(S),
where S =
(Al, AZ, . . . . Am),
is a
set of ordered m-tuples of the form (al, as, . . . . a,) such
that (i) if
Ai
is an atomic attribute, then oi E
Di.
(ii)
if
Ai
is a relation scheme, then si is an instance of
Ai.
An instance of a relation scheme is also referred to as
a (nested) relation. If
R(S)
is a relation scheme, then
Attr(R)
is the set of all (atomic and relation-valued)
attribute names in
S. RAttr(R)
is the set of all relation-
valued attributes in S and
FAttr(R)
is the set of all
atomic attributes in S. Henceforth, when we refer to a
relation scheme, we will refer to it by its name alone,
e.g.,
R
instead of
R(S).
Let r be an instance of
R
and let t E r (a tuple in
relation
r).
If
A E Attr(R)
then
t[A]
is the value oft in
the column corresponding to
A.
If
B C Attr(R)
then
t[B] = t[Al]t[Aa]...t[A,]
where
Ai E B
(1 5 i 5 n).
In the example nested relation shown in Figure
3,
R(A, B(C, D(E)), F(G, H))
is the scheme of the re-
lation,
Attr(R) = {A, B,F), FAttr(R)
=
{A},
and
RAttr(R) = {B, F}.
2.1 The Recursive Algebra
The main motivation for the recursive algebra is to pro-
vide a query language which can extract and manipulate
data at all levels of a nested relation without having to
restructure the relation in order to access data that is
nested at various levels in the relation. The operators
in this algebra are all recursively defined so that each
operator can be applied to subrelations at all levels thus
eliminating the need for a special operator to serve as a
“navigator”.
Selection u
The selection operator introduced here allows selection
conditions to be specified not only on the entire rela-
tion, but also on subrelations at all levels of nesting in
a relation.
8
F
A
C
D
E
G H
Figure 3. The relation 21.
A selection condition is a boolean expression in-
volving comparisons between attributes and between
attributes and constants. Comparison operators also
include set comparison and set membership operators,
C, C, =, _>,I, E, and Q in addition to the usual compari-
son operators, <, 5, =, f, 2, and >. Let
R
be
a
relation
scheme. Then,
L
is a ‘select list’ of R if (i)
L
is empty
(ii)
L
is of the form
(Rlel LI, Rac,La, . . . . Rnc,,L,,),
(1 <
n 51 RAttr(R) I)
h
w ere each R; is a relation-valued at-
tribute of
R,
ci is a condition on &,
Li
is a select list
ofRi,and&#Rj ifi#jVRjcjLjEL.
Let r be a relation with relation scheme
R.
Let
L
be
a select list of
R
and let c be a condition on
R.
Then
u(rcL)
is defined as follows.
(i)
a(r,) = {t E r 1 c(t) = true}
(ii) ~(*c(RlelLl, Rs&, .-., L,L))
= (t 1 (3, E r)A
(t[Attr(R) - {RI, R2, . . . . Rn)]
= t,[Attr(R) - (RI,
R2,
. . . . %)I)
A (c(t,) = true)
A WI = u ((Whl),lL1) # 4)
Figure 4(a) shows an example of a recursive selection,
u(q(Bczc,)),
on the relation 21, shown in Figure 3.
276
u(a(Bc=c,))
46% W))xl)
A
al
i
a2
a3
B
D
E
er
R
!
c
6
el
q
e3
I(
et
Figure 4. Examples of recursive selection and
projection on 21.
Proiection A
The recursive definition of projection given here allows
projections to be performed on attributes at all levels
without restructuring.
Let
R
be a relation scheme. Then,
L
is a ‘project list’ of
R
if
(i) L
is empty (ii)
L
is of the form
(RlLl, . . . . &L,)
where each fi is an attribute of
R, Li
is a project list
of &
(Li
is empty if & is an atomic-attribute), and
&#Rj
ifi#jVRjLjEL.
Let
r
be a relation with relation scheme
R
and let
L
be a project list of
R.
Then
+Lr)
is defined as follows.
(i)
r(r) = r
(ii) r((RlLl, . . . . %L,)r)
= {t 1 (3, E r) A (t[Rl] = f(G, &&))A
A (d&l = f(tr , %Ln)))
where
f(tr, &Li) = tc[RJ
if
& E FAttr(R)
= n(Li(t,[&]))
if
R.+ E RAttr(R)
Figure 4(b) shows an example of a recursive projection,
+A, B(D)), on 21.
Joins, like all the other operators are defined recursively
so that a relation can be joined to another relation or
any subrelation of the relation.
(i) L
is empty. (ii)
L
is of the form (&
Li)
where fi is
a relation-valued attribute of
R
and
Li
is
a
join-path of
Ri-
Let
r
and
q
be two relations with relation schemes
R
and Q respectively and let
L
be
a
join-path of
R.
Then
w
(rL, q)
is defined as follows.
(i) MT, 4 =
~tI(~W3tp~~)
A (Wl, . . . . &I = tr[Rl, . . . . &I = t,[Rl, . . . . %I)
A
(t[Attr(R) - {RI, . . . . R,,}] =
t,[Attr(R) - (RI, . . . . &}I)
A
(t[Attr(Q)
- {RI, . . . . Rn)] =
t,Wr(Q) - {RI, .-, %Il)~
where {RI, . . . . R,,} is the set of common attributes
of
R
and Q
(ii) w(r(&Li), 4) =
{t I (3h E r)
A
(t[Attr(R) - {&}I = t,[Attr(R) - {Ri}])
A (t[Ri] = w (tr[RiILi, Q) # 4))
Figure 5 shows an example of a join between a subrela-
tion of a relation and a relation.
x-2
x3
5
W
V
A
“_
Wl ;-n
tl at
Vl
/ _jl(1/
/ v1 /
W(Q(S), 23)
/ a3 / ml
r I
5
I1
1
L
I I
1
I
Let
R
be a relation scheme. L is a join-path of
R
if
Figure 5. An example of a recursive join.
277
The definitions for Cartesian product, nest, unnest,
union, difference, and intersection, which are also recur-
sive, are described in [5], but omitted here for brevity.
Even though the recursive definitions for the operators
allow each operator to apply itself recursively to tuples
at all levels without restructuring, the restructuring op-
erators nest and unnest are, however, necessary since
users may wish to view the data in a different format.
2.2 A Comparison of the Recursive and
Non-Recursive Algebras
As mentioned earlier all the operators of the non-recursive
algebra, i.e., the algebra of Thomas and Fischer, except
nest and unnest, have definitions that are very similar
to the definitions in the relational algebra. Nest and
unnest are restructuring operators which add another
level of nesting to a relation and flatten a relation by
one level respectively. Since all the non-recursive opera-
tors can access and modify only tuples at the outermost
level of a relation, the unnest operator is used to flatten
the relation so that these operators can be applied to
the flattened relation. The nest operator is then used to
bring the relation back into its original form. We give
the definitions for nest and unnest, as defined in the
non-recursive algebra, before considering some example
queries to see how they are expressed in the recursive
and the non-recursive algebras.
The nest operator, also sometimes called ‘pack’ [14],
groups together tuples which agree on all the attributes
that are not in a given set of attributes, say B. It forms
a single tuple, for every such group, which has a new
attribute, say
A,
in place of
B,
whose value is the set
of all the
B
values of the tuples being grouped together.
VB+A(f)
=
{t
I(% E r)A
(t[Attr(R) - B] = u[Attr(R) -
B])A
WI = WI I (8 E $A
(s[Attr(R) - B] = u[Attr(R) - B])})}
where
B s Attr(R)
and
A
is a new attribute name
p Unnest
The unnest or ‘unpack’ operator does the inverse of the
nest operator by ‘ungrouping’ or flattening out the
B
value of the tuples.
PB(r) = {t
l(3u E r)A
(u[Attr(R) - {B}] = t[Attr(R) - {B}])A
(t[Attr(B)]
E
u[B])}
whereB
E RAttr(R)
Example 1
Consider the example database shown in Figure 2. Let
us suppose that we want the set of all the shares pur-
chased on 02/10/83 and we want them listed by their
owner’s name and the company of the share. We would
formulate this query in the recursive algebra as follows:
?r((NAME,INVESTMENTS)
(u(CLIENTS(INVEST~~ENTS
(SHARES~ATE=~02,10,s3')))))
Figure 6 shows the result of this query. In the non-
recursive algebra, the same query would have to be ex-
pressed using nest and unnest as shown below.
WOMPANY,SHARES+INVESTMENTS
(WJRCHASE-PRICE,DATE,NO.-SHARES
(~NAME,coMPANY,P~RcHASE-P~cE,DATE,N~.
(~DATE='02/10/W(/&HARES
(~~NvEsTMENTs(CLIENTS ))))))
The two unnests transform the relation CLIENTS into
a flat relation. Then, after the selection and projection
have been performed on the flat relation, the nest op-
erations, restructure the flat relation back into a nested
one.
NAME INVESTMENTS
I
SHARES
COMPANY
PURCHASE
PRICE
DATE NO.
John Smith
Jill Brody
Figure 6. The result of the query in Example 1.
Examole 2
Let us suppose that we want the investments that clients
have in stock that is traded in London, listed by the
client’s name and address. This query, whose result is
shown in Figure 7, can be expressed in the non-recursive
algebra, using the recursive join which allows joins to be
performed between a subrelation and a relation as fol-
lows.
w(CLIENTS(INVESTMENTS) ,?r((COMPANY)
(fl(sTocK-DATALONDONEEXCHANGES-TRADED))))
278
NAME
ADDRESS
INVESTMENTS
SHARES
COMPANY
PURCHASE
PRICE
DATE
NO.
John Smoth 311 East 2nd. St.
Bloomington. IN
4740’ *,
Jill body
~3:oMH”i”st~ mj
Figure 7. The result of the query in Example 2.
The same query would be expressed in the non-recursive
algebra as follows.
Although the expressions in the non-recursive algebra,
for the above examples, work in this particular case, in
general relations have to be indexed first before unnest-
ing and nesting. These expressions may not give us the
correct result if the relations in Figure 2 were not in
Partitioned Normal Form (a relation is in Partitioned
Normal Form (PNF) if (i) all or a subset of the atomic
attributes form a key for the relation and (ii) every sub-
relation of the relation is in PNF). In [26] Van Gucht
has shown that if r is a relation with relation scheme
R
and A is a relation-valued attribute of R, then
VAttr(A)+A (PA (r>> = 9.
if and only if
r
satisfies the functional dependency
Attr(R) - {A} + A
In [28], Van Gucht and Fischer define an Met oper-
ator, which is basically a way of tagging the tuples of
a relation to preserve the information about how tu-
ples were grouped together before unnesting. A slightly
modified definition of the
Index
operator is given below.
Definition
Index(r) = {t 1 (3, E r) A (t[Attr(R)] = t,[Attr(R)]) A
WI = {We
This operator can be expressed in terms of nest, selec-
tion and Cartesian product. In&z(r) = VA’-rlCrA’=A(rx
r), where
A
is the set of attributes of
R
and
A’
is the set
of new attributes in r x
r.
Tagging the tuples of a re-
lation is required to save information about how tuples
were grouped together before an unnest was performed.
The correct expression for the query in Example 1 is
then
*NAME,INVESTMENT&‘COMPANY,SHARES4NVESTMENTS
(~NAME,C~MPANY,SHARES,I~
(VPURCHASE-PRIcE,DATE,NO.-SHARES
(1TN
AME,COMPANY,PUFlCHASE-PFUCE,DATE,NO.,I&
(uDATE=103/10/83’(~SHARES(~ndex
@INVESTMENTS (IndeX( CLIENTS ))))))))))
where, 11 and Is are the index columns added by the
two
Indez
operations.
In some cases, even indexing a relation before unnest-
ing will not give back the original relation after nesting.
Unnesting on a relation-valued attribute that contains
empty sets as values causes those tuples to be lost. In
such cases expressions for queries that involve unnesting
become even more complicated since special considera-
tion must be given to such tuples. Null values can be
used to overcome this problem and the interested reader
is refered to [21] for a discussion of null values in the
context of the NF’ model.
Thus, queries in the non-recursive algebra are long
and complicated and cause data to be restructured while
performing operations like selection and projection. This
restructuring is necessary because the algebraic opera-
tors operate only at the outermost level of a relation.
Hence, although the first normal form assumption has
been relaxed to include relation-valued attributes, the
algebraic operators essentially treat all the components
of a tuple as non-decomposable or atomic units. The re-
cursive algebra, on the other hand, has operators that
can apply themselves recursively to the subrelations of
nested relations. Queries are processed much more ef-
ficiently when expressed in the recursive algebra since
restructuring is not necessary for accessing the subrela-
tions of a relation. They are also much more succinct
than the corresponding queries in the non-recursive al-
gebra and allow users to remain within the framework
of the model for complex objects while querying the
database, by obviating the need for restructuring these
objects during the querying process.
Theorem
The recursive and the non-recursive algebras are each
expressible in terms of each other and are hence equiv-
alent to each other in terms of their expressive powers.
From the definitions of the recursive operators, it is
obvious that the non-recursive operators can be easily
expressed in terms of the recursive operators. Each re-
cursive operator can be expressed in the non-recursive
algebra by applying a sequence of unnests until the at-
tributes of interest are at the outermost level, applying
279
the operator, and then applying a sequence of nests.
As mentioned earlier, if a relation is not in PNF, it
must be indexed before every unnest. Also, unnesting
causes tuples that contain empty sets as values (for the
relation-valued attribute on which the unnest is per-
formed) to be lost.
These tuples must be accounted
for while expressing a recursive operator in the non-
recursive algebra (except in the case of selection). In
the case of difference tuples can be lost not only as a re-
sult of performing an unnest on an attribute containing
empty sets as values, but also when taking the difference
after unnesting. The proof for the equivalence theorem
can be found in [5].
3. Query Optimization
Optimizing an expression denoting a query involves de-
termining an equivalent expression for that query which
can be processed in lesser time than the original ex-
pression. Finding common subexpressions, performing
selections and projections before joins, combining sev-
eral selections into one selection, etc., are some of the
strategies for finding equivalent but more efficient ex-
pressions for queries in the relational algebra. Laws
about the commutativity and the associativity of oper-
ators are used to rearrange the operators in order to find
equivalent expressions. Most of these laws which apply
to the operators of the relational algebra also apply to
the recursive operators for nested relations. The only
exceptions are the commutativity laws for joins and se-
lections when these operations involve subrelations. We
list some of these properties for the recursive algebra.
We first define
merge, common-merge,
and
join-lists
Definitions:
(i) Let
r
be a relation with relation scheme R and let
~5 = (Rate., L, , . . . . R,,,., L,) and L2 =
(Ram1 Lb,, ---, Rbmet.,
Lb,,,) be select lists of
R.
result of
merge( L1, Lz)
would be very similar.
(ii) If
L1 = (Rbll L,, , . . . .
&,J%,) and La = (RbtLbl, . . . .
&Lb,,,) are project lists of
R,
then
common - merge( L1, La)
=
L1
if
La
is empty
=
La
if
L1
is empty
=
L
otherwise, where
L
is such that
R,,; common
- merge(&, Lb,) E L if
Rbj = R,i for some
Rai Lai E L1, RbiLbj E Ls
nothing else is in
L
(iii) If
L1
is a join path of
R
and
La
is a project-list or
join-path of
R,
then
join
-
lists(Ll , La)
=
LI
if
La
is empty
=
L3
if
L1
is empty
= (& join -
lists( Li, La))
if
L1
is of the form
(Ri Li)
Alaebraic Equivalences:
Let cl and cz be two conditions on
R
and let
L1
and
La
be two select lists on
R.
Then,
1. 4(4-ct)M = 4+,~~~))
2. u((u(rc,
Ll)),,L2) =
where cp = ck A I$!, ch contains only attributes that
are not in
L1
and
L2
can be split up into two lists
L’,
and
L’,’
such that
L’,
contains only attributes
that are not in Lr and not in c$.
merge(Ll, L2)
=
LI
if
La
is empty
=
La
if
L1
is empty
=
L
otherwise, where
L
is such that
3. If
L1
and
La
are project lists on
R,
then
x(Ll(xL2(r))) = ?r(common - merge(Ll,
L2)f)
Raie,iL,i E L if \JRbjcbjLbj E L2, Rbj # R,i
Rbiebi Lbi E L if VR,jc,jL.j E
L1, R,j # Rbi
Rai(caihcbj)
merge(L,, , Lbj) E L
if
Rbj = R,,
for SOme Raitai Lai E LI, Rbjebj Lbj E L2
nothing else is in
L
4. If
L1
is a project list on
R, La
a select list on
?r(Llr)
and c a condition on
R,
then
u((n(Llr))J2)
=
n(Ll(u(rJ2)))
provided that
?r(Llr)
does not change the struc-
ture of the attributes in c and
La.
If
L1
and
L2
were project lists instead of select-lists, the
5. If
L1
is a project list on
R, La a
select list on
R
280
6.
7.
8.
9.
and c a condition on
R,
then
7(Ll(u(rJ2)))
=
u((r(Llr))&2)
if c and
L2
do not contain any attributes that are
not in
L1
and provided that
s(Llr)
does not change
the structure of the attributes in c and
L2.
If
L2
and c are the same as above but
L1
is a join
path of
R,
then
u((W (rL1, q))J2) = w (u(rcL2)L1, q)
if c and
L2
contain no attributes in common with
L1
and
q-
Let
r, q,
and p be relations with schemes
R, Q,
and
P
respectively.
(4 w (r, d = w (q, 4
(b) w ((w (T qhd = w 0.9 w (q, p>>
(d) If
L1
is a join path of
R,
then
w (w (rL1,
dL2,
p) =
where
L
is a non-empty
w
(rL1,
w
(qL’,,p))
if
L2
is a join-path to an
attribute of
q
in w
(rL1, q)
where
Lk
is the
join-path in
q
to that attribute and that at-
tribute is not involved in the join between
r
and
q.
w (w
(rL2, p)Ll, q)
if
Lp
is a join-path to an
attribute in r and that attribute is not involved
in the join between
r
and
q.
If
La
is a join path of
R
and
L1
is a project list of
w (r&t, q), then
r(-h(w
(rL2,
q))) = r(Ll(w
(+W2,4Gd)))
if
L1
can be split up into
Li
and
LL
such that they
contain attributes of
r
and
q
in
L1
respectively,
and they each also contain all common attributes
involved in the join.
If
L1
is a project list of
R
and
L2
is a join path of
~(Llrl),
then
w (7@179La, q) =
Ir(merge( L1, join -
lists(L2, Attr(q)))(w
(~~52,
q)))
if
L1
contains all the common attributes between
r
and
q
and if
?r(Llr)
does not change the structure
of these common attributes.
As shown above, most of the commutative and as-
sociative laws for the recursive operators are a natural
extension of the laws for the operators of the relational
algebra. Even though these laws also apply to the oper-
ators of the non-recursive algebra for nested relations,
since the operators nest and unnest do not commute
with most operators, queries that contain these opera-
tors are not easily optimized. Consider the two expres-
sions in the non-recursive algebra given below, where
rl,
rp,
and
r3
are the relations shown in Figure 8.
(i) VA~A’((~A’(VD~C(~A’,B,D,F(~C(fl Cd ‘-a))))) w r3)
6) vD,C(~A,,B,~,F(~C((vA-Al(~Al(rl) w ‘3)) w “a)))
rl 12
tu
d2 el
t
m
d2 e2
m
d3 el
C
F
Q E
fl
m
d2 e2 f2
.m
d2 e2 f3
m
d3 el f4
Figure 8.
The relation
rq
in Figure 8 shows the result of these
equivalent expressions. If
r3
is likely to have far lesser
matches with rl as compared to
rg,
then the second ex-
pression will be more efficient than the first. But since
join does not commute with nest, it is not possible to
derive the second expression from the first by apply-
ing commutative and associative laws on the operators.
281
On the other hand, in the recursive algebra, this prob-
lem does not often arise since nest and unnest are not
used very often in queries, as they are not needed for
accessing subrelations but are used only when the data
is restructured. Now, let us consider the equivalent ex-
pressions in the recursive algebra for the ones in the
above example.
(i) w (x((A’, B, C(D), F)(w (rl, W(O 4
(ii) n((A’, B, C(D), F)(w (w (rl(A’), r3), 4))
The second expression can be easily derived from the
first by applying laws, 9 and 7d.
So, since queries in the non-recursive algebra are
typically long and complicated with a number of nests
and unnests, they are not easily optimized. The same
queries can be optimized more efficiently in the recur-
sive algebra, since nest and unnest are used only when
restructuring is really necessary.
4.
Conclusions
The nested relational model is more suitable for repre-
senting complex objects than the traditional relational
model. However, most query languages that have been
proposed for this model require either restructuring op-
erators or special navigational operators for accessing
tuples that are nested at different levels in a nested re-
lation. In this paper, we have described a recursive al-
gebra for nested relations which allows queries to be ex-
pressed succinctly without any restructuring or special
navigational operators. The nested relational model,
is a recursive extension of the relational model; it is
hence only natural that query languages for this model
be extended similarly. We have also developed query
optimization principles for this algebra and shown that
most of them can be viewed as extensions of the opti-
mization principles for the relational algebra. Finally,
we have pointed out the problems in optimizing queries
written in query languages which require restructuring
operations in order to access subrelations. In conclu-
sion, we claim that the recursive algebra is better suited
for the nested relational model than other query lan-
guages that have been proposed for this model.
Acknowledgements
I would like to thank Marc Gyssens, Dirk Van Gucht
and the refrees for their helpful suggestions and com-
ments.
References
1. Abiteboul, S., and Bidoit, N., “Non First Normal
Form Relations to Represent Hierarchically Orga-
nized Data,,,
Proc. 3rd PODS,
1984, pp. 191-200.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
Bancilhon, F., Richard, P., and Scholl, M., “On
Line Processing of Compacted Relations,”
Proc.
8th VLDB,
Mexico City, 1982, pp. 21-37.
Bidoit, N., “The Verso Algebra or How to Answer
Queries with Fewer Joins,”
Journal
of
Computer
and System
Sciences, 1987, pp. 321-364
Codd, E.F., “A Relational Model for Large Shared
Data Banks,
Communications ACM
Vol. 6, No.
13, June 1970, pp. 377-387.
Colby, L.S., “A Recursive Algebra for Nested Re-
lations,,, Technical
Report,
No. 259, August 1988,
University of Indiana
Dadam, P., Kuespert, F., Andersen, F., Blanken,
H., Erbe, R., Guenauer, J., Lum, V., Pistor, P.,
and Walch,G., “A DBMS Prototype to Support Ex-
tended NF2 Relations: An Integrated View on Flat
Tables and Hierarchies,”
Proc. Annual SIGMOD
Conf., Austin, 1986, pp. 356-366.
Deppisch, U., Paul, H.B., and Schek, H.J., “A Stor-
age System for Complex Objects,”
Proc. Int. Worh-
shop on Object- Oriented Database Systems,
Pacific
Grove, 1986, pp. 183-195.
Deshpande, V., and Larson, P.A., “An Algebra for
Nested Relations,”
Tech. Report, CS-87-6.5,
1987,
University of Waterloo.
Deshpande, A., and Van Gucht, D., “A Storage
Structure for Unnormalized Relations,”
Proc. GI
Conf. on Database Systems
for
Ofice Automation,
Engineering and Scientific Applications,
Darmstadt,
April 1987, pp. 481-486.
Gyssens, M. and Van Gucht, D., “The Powerset
Algebra as a Result of Adding Programming Con-
structs to the Nested Relational Algebra,”
Proc.
Annual SIGMOD Conf.,
Chicago, IL, June 1988
Jaeshke, G., and Schek, H.J., “Remarks on the Al-
gebra on Non-First Normal Form Relations,”
Proc.
1st PODS,
Los Angeles, 1982, pp. 124-138.
Linnemann, V., “Non First Normal Form Relations
and Recursive Queries: An SQL-Based Approach,”
Proc. 3rd IEEE Int. Conf. on Data
Engineering,
Los Angeles, 1987.
Makinouchi, A., “A Consideration of Normal Form
of Not- Necessarily-Normalized Relations in the Re-
lational Data Model,”
Proc. 3rd. VLDB,
Tokyo
1977, pp. 447-453.
Ozsoyoglu, G., Ozsoyoglu, Z.M., and Matos, V.,
“Extending Relational Algebra and Relational Cal-
culus with Set-Valued Attributes and Aggregate
Functions,”
ACM Transactions on Database Sys-
tems,
Vol. 12, No. 4, December 1987, pp. 566-592.
Paul, H.B., Schek, H.J., Scholl, M.H., Weikum,
G., and Deppisch, U.,
“Architecture and Imple-
mentation of the Darmstadt Database Kernel Sys-
282
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
tern,”
Proc., Annual SIGMOD Conf.,
San Fran-
cisco , 1987, pp. 196-207.
Pistor, P., and Andersen, F., “Designing a gener-
alized NF2 model with an SQL-Type Language In-
terface,”
Proc. 12th VLDB,
Kyoto, Japan, 1986,
pp. 278-288.
Pistor, P., and Traunmueller, R., “A Database Lan-
guage for Sets, Lists and Tables,,,
Information Sys-
tems,
Vol 11, No. 4, 1986, pp. 323-336.
Roth, M-A., Korth, H.F., and Batory, D.S., “SQL/-
NF: A Query Language for 71NF Relational Data-
bases,”
Information Systems,
Vol 12, No. 1, 1987,
pp. 99-114.
Roth, M.A., Korth, H.F., and Silberschatz, A., “The-
ory of Non-First-Normal-Form Relational Databa-
ses,” Tech. Report TR-84-36 (Revised
January
1986),
University of Texas at Austin, 1984.
Roth, M.A., Korth, H.F., and Silberschatz, A., “Ex-
tended Algebra and Calculus for +NF Relational
Databases,”
Tech. Report TR-85-19,
University of
Texas at Austin, 1985.
Roth, M.A., Korth, H-F., and Silberschatz, A., “Null
Values in -1NF Relational Databases,,,
Tech. Re-
port
TR-85-32,
University of Texas at Austin, De-
cember, 1985.
Schek, H.J., and Scholl, M.H., “The Relational Mod-
el with Relation-Valued Attributes,”
Information
Systems,
Vol. 11, No. 2, 1986, pp. 137-147.
Scholl, M.H., “Theoretical Foundation of Algebraic
Optimization Utilizing Unnormalized Relations,”
Proceedings
of
the International Conference on Data-
base Theory,
September, 1986, Rome, Italy, pp.
380-396
Scholl, M-H., Paul, H.B., and Schek, H.J., “Sup-
porting Flat Relations by a Nested Relational Ker-
nel,,,
Proc. 13th VLDB,
London, 1987.
Thomas, S-J., and Fischer, P.C., “Nested Rela-
tional Structures,,,
Advances
in
Computing Research
III, The Theory
of
Databases,
P.C. Kanellakis, ed.,
JAIpress, 1986, pp. 269-307.
Van Gucht, D.,
“Theory of Unnormalized Rela-
tional Structures,”
Ph.D. Dissertation,
Vanderbilt
University, 1985.
Van Gucht, D., “On the Expressive Power of the
Extended Relational Algebra for the Unnormalized
Relational Model,”
Proc. 6th PODS,
San Diego,
CA, March 1987, pp. 302-312.
Van Gucht, D., and Fischer, P.C., “Multilevel Nested
Relational Structures,”
Journal
of
Computer and
System Sciences,
Vol. 36, No. 1, February 1988,
pp. 77-105
283