The purpose of this section is to describe the syntax and semantics of xsdb assertions. In xsdb, assertions represent both queries and databases (databases are also called contexts). Assertions are expressed syntactically using XML notation. Every assertion corresponds to a set of models, and this set is identified with the meaning of the assertion. We write Models(E) for the models of an expression E. Models(E) is a subset of the xsdb MODELS, where each model is a set of partial mappings from NAMES to VALUES.
<anything/>
Models("<anything/>"), the set of models for this assertion, is the set of all xsdb MODELS.
<nothing/>
Models("<nothing/>"), the set of models for this assertion, is the empty set.
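The definitions that follow are easier to follow with a concrete representation in hand. Here is a minimal Python sketch of one possible representation, which is not part of xsdb itself. It assumes that a mapping is a dict from attribute names to values and that a model is a list of such mappings. Models(E) is represented by a membership test rather than by an explicit (generally infinite) set: the test returns True exactly when a model belongs to Models(E). The names `Mapping`, `Model`, `Assertion`, `anything` and `nothing` are hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical representation: a mapping is a partial function from NAMES
# to VALUES, and a model is a collection of such mappings.
Mapping = Dict[str, Any]
Model = List[Mapping]
# Models(E) is represented by its membership test: model M belongs to
# Models(E) exactly when the test returns True for M.
Assertion = Callable[[Model], bool]


def anything() -> Assertion:
    # Models("<anything/>") is the set of all xsdb MODELS.
    return lambda model: True


def nothing() -> Assertion:
    # Models("<nothing/>") is the empty set.
    return lambda model: False


lincoln: Model = [{"FirstName": "Abraham", "LastName": "Lincoln"}]
print(anything()(lincoln), nothing()(lincoln))  # True False
```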
Object definitions define correspondences between identity strings and the content that the identity strings represent. Strictly speaking these constructs are external to the logical semantics, but they allow the database to archive possibly large data items and to refer to them by possibly much shorter names.
An identity may be associated with a string, for example as follows.
<object id="US_President16"> Abraham Lincoln was born Sunday, February 12, 1809, in a log cabin near Hodgenville, Kentucky... </object>
In this case an occurrence of the identity US_President16 in other assertions is understood to refer to the content enclosed within the object tag. However, two identity strings are judged to be equal if and only if their names are equal: if two differently named identity strings refer to identical objects, they are still considered unequal. NOTE: This tag should allow for multiple encodings and data types (e.g. Unicode/byte). The content of an object may also be specified by an external reference.
<object id="US_President16" href="http://home.att.net/~rjnorton/Lincoln77.html"/>
The basic atoms of information used to construct xsdb expressions are attribute value restrictions. These atomic assertions specify the type and value for a named attribute.
An object reference specifies that a given named attribute names an identity and also specifies the identity string named by that attribute. For example
<id at="PresidentId">US_President16</id>
The intuition for this assertion is roughly "this context discusses the object identified as US_President16".
The set Models("<id at='X'>Y</id>") contains exactly those xsdb models M where for every mapping m in M, m["X"]= "Y" (a name for an identity).
A string attribute restriction specifies that a given named attribute maps to a given string value. For example
<s at="FirstName">Abraham</s>
The intuition for this assertion is roughly "in this context the first name is always 'Abraham'".
The set Models("<s at='X'>Y</s>") contains exactly those xsdb models M where for every mapping m in M, m["X"] = "Y".
An integer attribute restriction specifies that a given named attribute maps to a given integer value. For example
<i at="Birthyear">1861</i>
The intuition for this assertion is roughly "in this context the birth year is always 1861."
The set Models("<i at='X'>Y</i>") contains exactly those xsdb models M where for every mapping m in M, m["X"] = integer("Y"), where the integer function converts the string Y to an integer value using standard C programming language conventions.
A float attribute restriction specifies that a given named attribute maps to a given floating point numerical value. For example
<f at="HeightInMeters">1.93</f>
The intuition for this assertion is roughly "in this context the height in meters is always 1.93."
The set Models("<f at='X'>Y</f>") contains exactly those xsdb models M where for every mapping m in M, m["X"] = float("Y"), where the float function converts the string Y to a floating point value using standard C programming language conventions.
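The four atomic restrictions can be sketched as membership tests, using the same hypothetical representation in which a model is a list of dicts. Two choices here are assumptions rather than part of the definitions above. First, Python's `int()` and `float()` stand in for C-style string conversion. Second, identity names are wrapped in a small hypothetical `Ident` class, so that an `<id>` restriction is not satisfied by a plain string with the same text. Note that a model with no mappings satisfies every such restriction vacuously.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Mapping = Dict[str, Any]
Model = List[Mapping]
Assertion = Callable[[Model], bool]


@dataclass(frozen=True)
class Ident:
    # Hypothetical wrapper marking a value as an identity name; equality is by name only.
    name: str


def restrict(at: str, value: Any) -> Assertion:
    # Every mapping m in the model must define `at` with exactly this value and type
    # (the type check keeps, e.g., the int 1861 distinct from the float 1861.0).
    return lambda model: all(
        at in m and type(m[at]) is type(value) and m[at] == value for m in model
    )


def id_(at: str, text: str) -> Assertion:   # <id at='X'>Y</id>
    return restrict(at, Ident(text))


def s(at: str, text: str) -> Assertion:     # <s at='X'>Y</s>
    return restrict(at, text)


def i(at: str, text: str) -> Assertion:     # <i at='X'>Y</i>; int() stands in for C conversion
    return restrict(at, int(text))


def f(at: str, text: str) -> Assertion:     # <f at='X'>Y</f>; float() stands in for C conversion
    return restrict(at, float(text))


president: Model = [{"FirstName": "Abraham", "Birthyear": 1861,
                     "HeightInMeters": 1.93, "PresidentId": Ident("US_President16")}]
print(s("FirstName", "Abraham")(president))              # True
print(i("Birthyear", "1861")(president))                 # True
print(f("HeightInMeters", "1.93")(president))            # True
print(id_("PresidentId", "US_President16")(president))   # True
print(s("PresidentId", "US_President16")(president))     # False: an identity is not a string
```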
In addition to atomic restrictions, attribute values may also be restricted to ranges. For example
<i at="Birthyear"><gt>1860</gt><lt>1869</lt></i>
The intuition of this assertion is roughly "in this context the Birthyear is an integer greater than 1860 and less than 1869".
The range predicates include both inclusive and exclusive variants:
ge -- greater than or equal.
le -- less than or equal.
gt -- strictly greater than.
lt -- strictly less than.
The set Models("<i at='X'><gt>Y</gt></i>") contains exactly those xsdb models M where for every mapping m in M, m["X"] is an integer greater than integer("Y"), where the integer function translates the string Y to an integer value using standard C programming language conventions. String values are interpreted analogously using the Unicode culture-neutral string ordering [XXXX what about supporting different cultures? what about bytes?]. Floating point ranges are interpreted analogously.
The "greater" and "less" predicates may be combined in a single assertion. For example
<i at="Birthyear"><gt>1860</gt><lt>1869</lt></i>
is interpreted the same as the conjunction (described below)
<and> <i at="Birthyear"><gt>1860</gt></i> <i at="Birthyear"><lt>1869</lt></i> </and>
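The following Python sketch implements the integer range restriction in the same hypothetical representation, with bounds given as (predicate, text) pairs. The loop spot-checks, on a handful of single-mapping models, that the combined form agrees with the explicit conjunction. This is a sanity check, not a proof. String and float ranges would follow the same pattern with a different conversion and type check.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Mapping = Dict[str, Any]
Model = List[Mapping]
Assertion = Callable[[Model], bool]

# The four range predicates and the comparison each one denotes.
RANGE_OPS = {"ge": operator.ge, "le": operator.le, "gt": operator.gt, "lt": operator.lt}


def int_range(at: str, bounds: List[Tuple[str, str]]) -> Assertion:
    # <i at='X'><gt>Y</gt>...</i>: every mapping must define X as an int that
    # satisfies every listed bound. int() is a stand-in for C conversion.
    checks = [(RANGE_OPS[op], int(text)) for op, text in bounds]
    return lambda model: all(
        at in m and type(m[at]) is int and all(cmp(m[at], b) for cmp, b in checks)
        for m in model
    )


def and_(*parts: Assertion) -> Assertion:
    # Conjunction: a model must satisfy every part.
    return lambda model: all(p(model) for p in parts)


combined = int_range("Birthyear", [("gt", "1860"), ("lt", "1869")])
split = and_(int_range("Birthyear", [("gt", "1860")]),
             int_range("Birthyear", [("lt", "1869")]))
for year in (1859, 1860, 1861, 1868, 1869):
    model = [{"Birthyear": year}]
    assert combined(model) == split(model)
print(combined([{"Birthyear": 1861}]), combined([{"Birthyear": 1869}]))  # True False
```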
Another non-atomic single attribute restriction is the string prefix restriction. This assertion is useful for certain types of queries and for building indices. For example
<s at="FirstName"><prefix>A</prefix></s>
The intuition of this assertion is roughly "in this context the first name always begins with 'A'."
The set Models("<s at='X'><prefix>Y</prefix></s>") contains exactly those xsdb models M where for every mapping m in M, m["X"] is a string that starts with the string "Y".
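In the same hypothetical dict-based representation, the prefix restriction reduces to `str.startswith` applied to every mapping in the model, as in this Python sketch.

```python
from typing import Any, Callable, Dict, List

Mapping = Dict[str, Any]
Model = List[Mapping]
Assertion = Callable[[Model], bool]


def s_prefix(at: str, prefix: str) -> Assertion:
    # <s at='X'><prefix>Y</prefix></s>: every mapping must define X as a string
    # beginning with Y. m.get() yields None for a missing X, which fails the isinstance test.
    return lambda model: all(
        isinstance(m.get(at), str) and m[at].startswith(prefix) for m in model
    )


a_names: Model = [{"FirstName": "Abraham"}, {"FirstName": "Andrew"}]
mixed: Model = [{"FirstName": "Abraham"}, {"FirstName": "George"}]
print(s_prefix("FirstName", "A")(a_names), s_prefix("FirstName", "A")(mixed))  # True False
```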
Any assertion may be captured as an atomic attribute restriction value, for example:
<a at="drinkers"> <or> <s at="drinker">adam</s> <s at="drinker">lola</s> <s at="drinker">norm</s> <s at="drinker">woody</s> </or> </a>
Here the value of the drinkers attribute is restricted to the assertion
<or> <s at="drinker">adam</s> <s at="drinker">lola</s> <s at="drinker">norm</s> <s at="drinker">woody</s> </or>
The assertion tag a has a special relationship with the group, ungroup, and subquery predicates, as well as with the computation of aggregates using the calc function.
The na tag specifies that an attribute has no meaningful value. For example the song "happy birthday to you" has no (literal) thickness, and therefore the value for its thickness attribute is "not applicable", written
<na at="thickness"/>
as in
<and> <s at="songname">happy birthday to you</s> <s at="songcategory">traditional</s> <na at="thickness"/> </and>
The set Models("<na at='X'/>") contains exactly those xsdb models M where for every mapping m in M, m["X"] is not defined.
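In the same hypothetical dict-based representation, the na restriction becomes a test that the attribute is absent from every mapping of the model, as in this Python sketch.

```python
from typing import Any, Callable, Dict, List

Mapping = Dict[str, Any]
Model = List[Mapping]
Assertion = Callable[[Model], bool]


def na(at: str) -> Assertion:
    # <na at='X'/>: X must be undefined in every mapping of the model.
    return lambda model: all(at not in m for m in model)


song: Model = [{"songname": "happy birthday to you", "songcategory": "traditional"}]
measured: Model = [{"songname": "happy birthday to you", "thickness": 0.1}]
print(na("thickness")(song), na("thickness")(measured))  # True False
```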
"Not applicable" is one of 3 different "null" values permitted by the xsdb framework at this writing -- the other two are "unknown" and "empty". The value for an attribute is "unknown" if the attribute is missing. For example the "lyrics" attribute is unknown in the information on "happy birthday to you" above. Furthermore an aggregate group may be empty. For example to indicate that the collection of drinkers is empty write
<a at="drinkers"> <nothing/> </a>
The "empty aggregate" can be considered another form of null value.
Basic attribute restrictions may be combined using conjunction and disjunction combinators.
Assertions may be conjoined to allow additional restrictions using the and combinator. For example
<and> <s at="FirstName">Abraham</s> <s at="LastName">Lincoln</s> <i at="Birthyear">1861</i> <f at="HeightInMeters">1.93</f> <id at="PresidentId">US_President16</id> </and>
The intuition of this assertion is roughly "in this context we are describing the object with PresidentId US_President16 named Abraham Lincoln born in 1861 with height 1.93 meters."
The set Models("<and>U V ... Y Z</and>") contains exactly those xsdb models which are models for each of the assertions U, V, ..., Y, Z. Equivalently
Models("<and>U V ... Y Z</and>") = Models(U) intersect Models(V) intersect ... intersect Models(Y) intersect Models(Z)
The empty conjunction <and/> is equivalent to <anything/>.
Assertions may be disjoined to allow multiple alternatives using the or combinator. For example
<or> <and> <s at="FirstName">Abraham</s> <s at="LastName">Lincoln</s> <i at="Birthyear">1861</i> <f at="HeightInMeters">1.93</f> <id at="PresidentId">US_President16</id> </and> <and> <s at="FirstName">George</s> <s at="LastName">Washington</s> <i at="Birthyear">1732</i> <id at="PresidentId">US_President1</id> </and> </or>
The intuition of this assertion is roughly "in this context we are describing two possibilities. In the first possibility the PresidentId is US_President16 and the name is Abraham Lincoln, born in 1861, with height 1.93 meters. In the second possibility the PresidentId is US_President1 and the name is George Washington, born in 1732."
The set Models("<or>U V ... Y Z</or>") contains exactly those xsdb models which model any of the assertions U, V, ..., Y, Z. Equivalently
Models("<or>U V ... Y Z</or>") = Models(U) union Models(V) union ... union Models(Y) union Models(Z)
The empty disjunction <or/> is equivalent to <nothing/>.
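Continuing the hypothetical representation in which an assertion is a membership test on models, intersection and union of model sets become `all` and `any` over the parts. This Python sketch also shows the boundary cases: Python's `all([])` is True and `any([])` is False, matching <and/> as <anything/> and <or/> as <nothing/>.

```python
from typing import Any, Callable, Dict, List

Mapping = Dict[str, Any]
Model = List[Mapping]
Assertion = Callable[[Model], bool]


def s(at: str, text: str) -> Assertion:
    # <s at='X'>Y</s>: every mapping maps X to the string Y.
    return lambda model: all(m.get(at) == text for m in model)


def and_(*parts: Assertion) -> Assertion:
    # Intersection of model sets; and_() with no parts accepts every model, like <anything/>.
    return lambda model: all(p(model) for p in parts)


def or_(*parts: Assertion) -> Assertion:
    # Union of model sets; or_() with no parts accepts no model, like <nothing/>.
    return lambda model: any(p(model) for p in parts)


lincoln = and_(s("FirstName", "Abraham"), s("LastName", "Lincoln"))
washington = and_(s("FirstName", "George"), s("LastName", "Washington"))
model: Model = [{"FirstName": "George", "LastName": "Washington"}]
print(or_(lincoln, washington)(model), and_(lincoln, washington)(model))  # True False
print(and_()(model), or_()(model))  # True False
```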
Possibilities may be explicitly excluded using the exclude combinator which is a form of logical negation. For example
<exclude> <or> <id at="PresidentId">US_President16</id> <id at="PresidentId">US_President1</id> </or> </exclude>
specifies an expression that does not include US_President16 or US_President1. Used in a larger context, this expression may explicitly specify the scope of a database context. For example
<or> <exclude> <or> <id at="PresidentId">US_President16</id> <id at="PresidentId">US_President1</id> <id at="PresidentId">US_President100</id> </or> </exclude> <and> <s at="FirstName">Abraham</s> <s at="LastName">Lincoln</s> <i at="Birthyear">1861</i> <f at="HeightInMeters">1.93</f> <id at="PresidentId">US_President16</id> </and> <and> <s at="FirstName">George</s> <s at="LastName">Washington</s> <i at="Birthyear">1732</i> <id at="PresidentId">US_President1</id> </and> </or>
may be read "in this context, if the PresidentId is not US_President1 or US_President16 or US_President100 there is no information, but if the PresidentId is US_President16 then the first name is Abraham ..., or if the PresidentId is US_President1 then the first name is George ..., and there is no US_President100."
The set Models("<exclude> X </exclude>") contains exactly those xsdb models which are not models of X.
For convenience, define <exclude> U V ... Y Z </exclude> to be equivalent to <exclude> <or> U V ... Y Z </or> </exclude> (multiple exclusions are interpreted as a disjunction).
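Exclusion is complement with respect to whole models, so in the membership-test representation it is simply a negation. This Python sketch also applies the multi-argument shorthand by negating the disjunction of the parts. For brevity, identity names are shown as plain strings here. The third example shows that the complement is taken at the level of whole models: a model that mixes a US_President16 mapping with another mapping is not a model of the inner restriction, so it is not excluded.

```python
from typing import Any, Callable, Dict, List

Mapping = Dict[str, Any]
Model = List[Mapping]
Assertion = Callable[[Model], bool]


def restrict(at: str, value: Any) -> Assertion:
    # Every mapping maps `at` to `value`.
    return lambda model: all(m.get(at) == value for m in model)


def exclude(*parts: Assertion) -> Assertion:
    # <exclude> U V ... </exclude> means <exclude><or> U V ... </or></exclude>:
    # the models that satisfy none of the parts.
    return lambda model: not any(p(model) for p in parts)


not_16_or_1 = exclude(restrict("PresidentId", "US_President16"),
                      restrict("PresidentId", "US_President1"))
other: Model = [{"PresidentId": "US_President2"}]
lincoln: Model = [{"PresidentId": "US_President16"}]
mixed: Model = [{"PresidentId": "US_President16"}, {"PresidentId": "US_President2"}]
print(not_16_or_1(other), not_16_or_1(lincoln), not_16_or_1(mixed))  # True False True
```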
Conditionals represent a sequence of mutually exclusive possibilities. This construct is derived from the other constructs. The purpose of conditionals is to allow explicit control of query evaluation.
<if> <id at="PresidentId">US_President16</id> <then> <consult href="http://www.whitehouse.gov/lincoln.xml"/> </then> <id at="PresidentId">US_President1</id> <then> <consult href="http://www.whitehouse.gov/washington.xml"/> </then> <else> <consult href="http://www.whitehouse.gov/others.xml"/> </else> </if>
The above conditional is intended to read: all information about US_President16 is in the file lincoln.xml, all information about US_President1 is in the file washington.xml, and all other information for this context is in others.xml.
Conditionals are defined as equivalent to other assertions using the following transforms. The boundary case assertion
<if> <else> X </else> </if>
is equivalent to X. The assertion
<if> C <then> Y </then> I </if>
where <if> I </if> is itself a conditional, is defined to be equivalent to
<or> <and> C Y </and> <and> <exclude> C </exclude> <if> I </if> </and> </or>
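The two transforms can be applied mechanically. This Python sketch desugars a conditional into the or/and/exclude form. The conditional is given as a list of (condition, then-part) pairs plus an else part. The tuple-based syntax tree and the placeholders C1, Y1, C2, Y2 and E are hypothetical. In the example from the text, C1 and C2 would be the two PresidentId restrictions and E the consultation of others.xml.

```python
from typing import List, Tuple, Union

# Hypothetical AST: a leaf is any assertion written as a string; an inner
# node is a tuple (tag, child, child, ...).
Expr = Union[str, Tuple]


def desugar_if(branches: List[Tuple[Expr, Expr]], otherwise: Expr) -> Expr:
    # Boundary case: <if> <else> X </else> </if> is equivalent to X.
    if not branches:
        return otherwise
    (cond, then), rest = branches[0], branches[1:]
    # <if> C <then> Y </then> I </if> becomes
    # <or> <and> C Y </and> <and> <exclude> C </exclude> <if> I </if> </and> </or>,
    # where the remaining conditional <if> I </if> is desugared recursively.
    return ("or", ("and", cond, then),
                  ("and", ("exclude", cond), desugar_if(rest, otherwise)))


def to_xml(e: Expr) -> str:
    # Render the tuple form in the space-separated XML style used in this section.
    if isinstance(e, str):
        return e
    tag, *children = e
    return f"<{tag}> " + " ".join(to_xml(c) for c in children) + f" </{tag}>"


print(to_xml(desugar_if([("C1", "Y1"), ("C2", "Y2")], "E")))
# <or> <and> C1 Y1 </and> <and> <exclude> C1 </exclude> <or> <and> C2 Y2 </and> <and> <exclude> C2 </exclude> E </and> </or> </and> </or>
```

Each later branch is conjoined with the exclusion of every earlier condition, which is what makes the possibilities mutually exclusive.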
Consultation assertions allow the present database context to refer to all or part of another database. For example
<consult href="http://www.whitehouse.gov/presidents.xml"/>
refers to the assertion provided by the content of the URI "http://www.whitehouse.gov/presidents.xml".
The conditional consultation
<consult href="http://www.elvisfans.org/discography.xml"> <s at="Artist">Elvis Presley</s> </consult>
refers to the conjunction of the assertion provided by the content of the URI "http://www.elvisfans.org/discography.xml" with the restriction that the artist is Elvis Presley.
If TEXT is the assertion text associated with a URI, then the models for consultations of that URI are defined as follows:
Models("<consult href='URI'/>") = Models(TEXT)
Models("<consult href='URI'> X </consult>") = Models("<and> X TEXT </and>")
Models("<consult href='URI'> U V W ... Y Z </consult>") = Models("<and> U V W ... Y Z TEXT </and>")
The final form is added as a convenience (multiple conditions are assumed to be conjoined).
The directed graph formed from consultations between a set of contexts must always be acyclic [until some later version of this specification which may lift this restriction].
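One way an implementation might enforce this restriction is a depth-first search over the consultation graph. In this Python sketch the graph is assumed to have been built already, by reading each context and collecting its consult href values. The file names in the example are hypothetical.

```python
from typing import Dict, List


def has_consult_cycle(graph: Dict[str, List[str]]) -> bool:
    # graph maps each context URI to the URIs it consults. Depth-first search with
    # three colors: reaching a GREY node again means it is on the current path (a cycle).
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GREY
        for target in graph.get(node, []):
            state = color.get(target, WHITE)
            if state == GREY or (state == WHITE and visit(target)):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in list(graph))


ok = {"presidents.xml": ["lincoln.xml", "washington.xml"],
      "lincoln.xml": [], "washington.xml": []}
bad = {"a.xml": ["b.xml"], "b.xml": ["a.xml"]}
print(has_consult_cycle(ok), has_consult_cycle(bad))  # False True
```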
Naming transforms allow simple conversions between databases that use different naming conventions. Unfortunately the model theoretic interpretation of transforms is somewhat tricky.
Renaming allows one database context to refer to the data of another database context by different attribute names. For example, the following specifies that for data from the "http://www.whitehouse.gov/presidents.xml" context the attribute "fn" should be renamed "FirstName" and the attribute "ln" should be renamed "LastName" in the present database context.
<rename to="FirstName LastName" from="fn ln"> <consult href="http://www.whitehouse.gov/presidents.xml"/> </rename>
To define Models("<rename to='A' from='B'> X </rename>"), say that model M1 is Rename(A,B) compatible with model M2 if and only if there is a relation R with domain M1 and range M2 such that whenever m1 R m2:
for every name n which is neither A nor B, either m1(n) and m2(n) are both undefined or m1(n) = m2(n); and
either m1(A) and m2(B) are both undefined or m1(A) = m2(B).
In this case we define Models("<rename to='A' from='B'> X </rename>") to be the set of all models M1 which are Rename(A,B) compatible with some M2 in Models(X).
Define
<rename to="A B C" from = "I J K"> X </rename>
as a shorthand for
<rename to="A" from = "I"> <rename to="B" from = "J"> <rename to="C" from = "K"> X </rename></rename></rename>
As a boundary case, <rename to="" from = ""> X </rename> is equivalent to X.
When rename is applied to multiple expressions, the expressions are interpreted as a disjunction. That is,
<rename ...> U V ... Z </rename>
is defined to be the same as
<rename ...> <or> U V ... Z </or> </rename>
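This Python sketch covers the mapping-level part of the rename definition, using the same hypothetical dict representation. The model-level condition additionally requires a relation that pairs every mapping of M1 with some mapping of M2 and vice versa; the sketch only checks or constructs individual pairs. `rename_all` applies the multi-name shorthand innermost rename first, following the nesting above.

```python
from typing import Any, Dict, List

Mapping = Dict[str, Any]
_MISSING = object()  # sentinel: "undefined" compares equal only to itself


def rename_compatible(m1: Mapping, m2: Mapping, a: str, b: str) -> bool:
    # Mapping-level condition of Rename(A,B): m1 agrees with m2 on every name other
    # than A and B, and m1(A) agrees with m2(B); "both undefined" counts as agreement.
    others = (set(m1) | set(m2)) - {a, b}
    return (all(m1.get(n, _MISSING) == m2.get(n, _MISSING) for n in others)
            and m1.get(a, _MISSING) == m2.get(b, _MISSING))


def rename_mapping(m2: Mapping, a: str, b: str) -> Mapping:
    # Build one compatible m1. The definition leaves m1(B) unconstrained;
    # this sketch chooses to leave it undefined.
    m1 = {n: v for n, v in m2.items() if n not in (a, b)}
    if b in m2:
        m1[a] = m2[b]
    return m1


def rename_all(m: Mapping, to: List[str], frm: List[str]) -> Mapping:
    # to="A B C" from="I J K" nests the renames, so the innermost (C from K) applies first.
    for a, b in reversed(list(zip(to, frm))):
        m = rename_mapping(m, a, b)
    return m


source = {"fn": "Abraham", "ln": "Lincoln", "by": 1861}
print(rename_all(source, ["FirstName", "LastName"], ["fn", "ln"]))
# {'by': 1861, 'LastName': 'Lincoln', 'FirstName': 'Abraham'}
```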
Selects allow a database context to extract only certain named data elements from another database context. In the example below the current database context is only interested in the attributes named fn, ln, by, and nm from the "http://www.whitehouse.gov/presidents.xml" context.
<select names="fn ln by nm"> <consult href="http://www.whitehouse.gov/presidents.xml"/> </select>
To define Models("<select names='N'> X </select>"), where N is a set of attribute names, say that model M1 is Select(N) compatible with model M2 if and only if there is a relation R with domain M1 and range M2 such that if m1 R m2 then m1[n] = m2[n] for every n in N. In this case define Models("<select names='N'> X </select>") to be the set of models M1 which are Select(N) compatible with some model M2 in Models(X).
When select is applied to multiple expressions, the expressions are interpreted as a disjunction. That is,
<select ...> U V ... Z </select>
is defined to be the same as
<select ...> <or> U V ... Z </or> </select>
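This Python sketch checks the Select(N) compatibility definition at the model level, using the same hypothetical dict representation. A relation R with domain M1 and range M2, all of whose pairs are compatible, exists exactly when every m1 has some compatible m2 and every m2 has some compatible m1. The sketch relies on that equivalence. It also treats "undefined in both" as agreement, which is an assumption, since the text writes only m1[n] = m2[n].

```python
from typing import Any, Dict, List, Set

Mapping = Dict[str, Any]
Model = List[Mapping]
_MISSING = object()  # sentinel for an undefined name


def select_compatible(m1: Mapping, m2: Mapping, names: Set[str]) -> bool:
    # m1[n] = m2[n] for every n in N ("both undefined" counts as equal here).
    return all(m1.get(n, _MISSING) == m2.get(n, _MISSING) for n in names)


def select_models_compatible(M1: Model, M2: Model, names: Set[str]) -> bool:
    # Equivalent to the existence of a relation with domain M1 and range M2
    # whose related pairs are all select-compatible.
    return (all(any(select_compatible(m1, m2, names) for m2 in M2) for m1 in M1)
            and all(any(select_compatible(m1, m2, names) for m1 in M1) for m2 in M2))


names = {"fn", "ln", "by", "nm"}
M2: Model = [{"fn": "Abraham", "ln": "Lincoln", "by": 1861, "party": "Republican"}]
M1: Model = [{n: v for n, v in m.items() if n in names} for m in M2]
print(M1, select_models_compatible(M1, M2, names))
# [{'fn': 'Abraham', 'ln': 'Lincoln', 'by': 1861}] True
```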
Predicates and functions allow external extensions to the inference process to support access to non native data sources or to generalized computations.
The fn and pred tags have the syntax
<fn at="attributeName" name="functionName" args="argument description"/>
<pred name="predicateName" args="argument description"> Assertion </pred>
The precise model theoretic characterization of these constructs is deferred for the moment because it is not clear which definition is best. XXXXX FIX.
Intuitively, functions generate attribute assignments from mappings. For example the following function (if associated with an appropriate implementation) assigns the value of x+y to z.
<fn at="z" name="calc" args="x+y"/>
In this case the conjunction
<and> <i at="x">3</i> <i at="y">2</i> <fn at="z" name="calc" args="x+y"/> </and>
should be equivalent to
<and> <i at="x">3</i> <i at="y">2</i> <i at="z">5</i> </and>
Intuitively, predicates derive additional assertions from other assertions. For example the simple predicate
<pred name="eq" args="x y z"/>
might be intended to mean that the attributes x, y, and z should all have the same value. In this case the conjunction
<and> <i at="x">1</i> <pred name="eq" args="x y z"/> </and>
should be equivalent to
<and> <i at="x">1</i> <i at="y">1</i> <i at="z">1</i> </and>
Predicates may also operate on a subordinate assertion. For example the predicate
<pred name="group" args="DepartmentNumber DepartmentName; dept_groups"> <consult href="http://www.bigco.com/org.xml"/> </pred>
might form groups from the information in org.xml, grouped by DepartmentNumber and DepartmentName, with the groups assigned to the attribute dept_groups.
There are a number of special predicates and functions, such as the group and ungroup predicate and the calc function. These are given their own shorthand notations which translate to the longer notation described above. See the Guide for additional information on these predicates and functions.
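The intended behavior of the calc function and the eq predicate in the examples above can be sketched on individual mappings. In this Python sketch the evaluator for args strings is hypothetical: it accepts only names, numeric constants, +, - and *, parsed with the standard `ast` module. How `apply_eq` handles conflicting values is also an assumption, since the text does not specify it.

```python
import ast
import operator
from typing import Any, Dict, List, Optional

Mapping = Dict[str, Any]
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def calc(expr: str, m: Mapping) -> Any:
    # Evaluate an args string such as "x+y", reading names from the mapping.
    def ev(node: ast.AST) -> Any:
        if isinstance(node, ast.Name):
            return m[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval").body)


def apply_fn(at: str, expr: str, m: Mapping) -> Mapping:
    # <fn at="z" name="calc" args="x+y"/>: extend the mapping with z = x+y.
    return {**m, at: calc(expr, m)}


def apply_eq(names: List[str], m: Mapping) -> Optional[Mapping]:
    # <pred name="eq" args="x y z"/>: copy the common value to every listed name.
    # Returns None when the mapping already gives the names conflicting values.
    values = [m[n] for n in names if n in m]
    if any(v != values[0] for v in values):
        return None
    return {**m, **{n: values[0] for n in names}} if values else m


print(apply_fn("z", "x+y", {"x": 3, "y": 2}))  # {'x': 3, 'y': 2, 'z': 5}
print(apply_eq(["x", "y", "z"], {"x": 1}))     # {'x': 1, 'y': 1, 'z': 1}
```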
A number of special tags support testing inference engines and advice to inference engine optimizers.
The test tag is used to explicitly enclose test cases for an inference engine. If the test case is an instance of the same tag then all enclosed elements are expected to represent equivalent model sets. If the test case is an instance of the different tag then all enclosed elements are expected to represent inequivalent model sets. Any other enclosed cases should not be equivalent to <nothing/>.
<test> ... </test>
The test tag evaluates to <nothing/> only if the tests succeed. If the tests fail, it is permitted to evaluate to any value not equivalent to <nothing/>, and that value may explain the problem encountered.
The same tag asserts several different ways to derive information and asserts that they are all equivalent. In the context of verifying a test case an inference engine must evaluate each of the expressions and check that they are equivalent. In the context of other inferences an inference engine may assume that the enclosed expressions are equivalent and choose the most convenient one to evaluate.
<same> ... </same>
Outside of the testing interpretation, the same tag is intended to help different queries make efficient use of alternatively indexed data sets. For example, a database context about books may provide several different strategies for looking up a book as follows
<same> <ifknown atts="isbn"> <consult href="http://db.isbn.org/db.xml"/> </ifknown> <ifknown atts="author"> <consult href="http://db.amazon.com/authors.xml"/> </ifknown> <otherwise> <consult href="http://db.loc.gov/all.xml"/> </otherwise> </same>
The pragmatic reading of the above assertion is that there are three databases that contain the same information, but if you are looking for a known isbn it is best to consult "http://db.isbn.org/db.xml". Failing that, if you know the author you should consult "http://db.amazon.com/authors.xml". Otherwise look to the "http://db.loc.gov/all.xml" context.
The ifknown tag is semantically an alias for the and tag which may only occur directly within a same tag. Beyond the logical semantics the ifknown tag represents advice to an inference engine: an inference engine attempting to derive a result where the attributes listed by the ifknown tag have known values should assume that the enclosed conjunction will be inexpensive to evaluate.
<ifknown atts="isbn"> ... </ifknown>
The otherwise tag is semantically an alias for the and tag which may only occur directly within the same tag, and it must be the last child of the same tag. This tag is provided to allow the specification of a fallback strategy for evaluating an expression.
<otherwise> ... </otherwise> (and)
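Because every branch of a same element is declared equivalent, any branch gives a logically correct answer. The ifknown and otherwise tags only advise which branch is cheap to evaluate. The following Python sketch shows how an inference engine might follow that advice. It assumes the same element has already been parsed into an ordered list of (attribute set, target) pairs, with None standing for the otherwise branch. The function name and this encoding are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (attributes listed by ifknown, or None for <otherwise>; branch to evaluate)
Branch = Tuple[Optional[Set[str]], str]


def choose_branch(branches: List[Branch], known: Set[str]) -> str:
    # Take the first ifknown branch whose attributes are all known; otherwise
    # (which must come last) matches unconditionally.
    for atts, target in branches:
        if atts is None or atts <= known:
            return target
    # No advice applies: any branch is still logically correct, so use the first.
    return branches[0][1]


books = [({"isbn"}, "http://db.isbn.org/db.xml"),
         ({"author"}, "http://db.amazon.com/authors.xml"),
         (None, "http://db.loc.gov/all.xml")]
print(choose_branch(books, {"author", "title"}))  # http://db.amazon.com/authors.xml
print(choose_branch(books, {"title"}))            # http://db.loc.gov/all.xml
```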
The different tag is provided for testing purposes only. It is the counterpart of the same tag: it asserts that the first contained element should have a semantically different interpretation from the following elements.
<different> ... </different>
The different tag only makes sense as a child of the test tag.
There are a number of attributes that are shared by all assertions and are external to the semantics of the assertions. These attributes attach to assertions additional information that is useful for processing.
XXX not completely implemented at this time.
Any expression may have a bookmark attribute
<and bookmark="expression5">...</and>
This provides a name for the expression. A bookmark name may only be used once in a given context.
Any expression may have a weak attribute, which is assumed to be "false" if it is omitted. If it is present and set to anything other than "false", it means that this expression was derived using an inference process that was not able to give a complete answer but chose to provide a weaker response instead.
<or weak="unable to resolve http://a.b.c/x.xsdb">...</or>
When the weak attribute is non-false, the expression may have been provided in place of another, stronger expression which has fewer models.
A bookmarked expression may be used in another location in a context using the reference tag.
<reference bookmark="expression5"/>
The models of the reference tag are the same as the models of the expression the bookmark indicates. It is an error to use a bookmark that names no bookmarked expression.
The directed graph formed from references between a set of expressions must always be acyclic [until some later version of this specification which may lift this restriction].
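The following Python sketch expands references and enforces both rules above. A reference to a missing bookmark is an error, and a reference that leads back to a bookmark already being expanded on the current path signals a cycle. The tuple-based syntax tree is hypothetical, and bookmarked expressions are assumed to have been collected into a dictionary beforehand. Reusing the same bookmark in two separate places is fine, because only the current expansion path is tracked.

```python
from typing import Dict, Tuple, Union

# Hypothetical AST: a leaf string, a ("reference", bookmark) node, or (tag, child, ...).
Expr = Union[str, Tuple]


def resolve(expr: Expr, bookmarks: Dict[str, Expr], active: Tuple[str, ...] = ()) -> Expr:
    # Replace each <reference bookmark="..."/> by the bookmarked expression.
    if isinstance(expr, str):
        return expr
    if expr[0] == "reference":
        name = expr[1]
        if name not in bookmarks:
            raise KeyError(f"reference to unknown bookmark {name!r}")
        if name in active:  # already being expanded on this path
            raise ValueError(f"cyclic reference through bookmark {name!r}")
        return resolve(bookmarks[name], bookmarks, active + (name,))
    tag, *children = expr
    return (tag, *(resolve(c, bookmarks, active) for c in children))


bookmarks = {"expression5": ("and", "A", "B")}
print(resolve(("or", ("reference", "expression5"), "C"), bookmarks))
# ('or', ('and', 'A', 'B'), 'C')
```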
The name tag may only occur within a top level container and is not interpreted in itself but serves only to name an expression which may be referenced elsewhere.
<name bookmark="expression6">...</name>
<context> <title>Presidents of the United States</title> .... </context>