]>
Larisa Soldatova
Ontology of General Purpose Datatypes
Pance Panov
This ontology contains entities such as datatype, datatype generator, datatype quality, and others, making it possible to represent arbitrarily complex datatypes. This is important for a general data mining ontology that needs to represent and query over modelling algorithms for mining structured data.
The ontology was first developed as part of OntoDM, the Ontology of Data Mining (available at http://kt.ijs.si/panovp/OntoDM), but for generality and reuse it was exported as a separate ontology. Additionally, the OntoDT ontology is based on the ISO/IEC 11404 standard (http://www.iso.org/iso/catalogue_detail.htm?csnumber=39479) and can be reused independently by any domain ontology that requires representation of and reasoning about general purpose datatypes.
Saso Dzeroski
13.01.2011
Pance Panov
SVN revision 58
has_information_quality
is-about
denotes
bearer_of
has_part
part_of
data item specification
OntoDM
information content entity quality
table datatype
datatype whose values are collections of values in
the product space of one or more field datatypes, such that each value in the product space represents
an association among the values of its fields. Although the field datatypes may be infinite, any given value
of a table datatype contains a finite number of associations.
aggregate imposed ordering
An aggregate datatype has the ordering property, if and only if there is a canonical first element of each nonempty
value in its value-space. This ordering is (externally) imposed by the aggregate value, as distinct from
the value-space of the element datatype itself being (internally) ordered (see 6.3.2). It is also distinct from the
value-space of the aggregate datatype being ordered.
primitive datatype
A datatype whose value space is defined either axiomatically or by enumeration is said to be a primitive
datatype.
identifiable datatype that cannot be decomposed into other identifiable datatypes without loss of all semantics
associated with the datatype
boolean datatype
boolean is the mathematical datatype associated with two-valued logic.
graph aggregate component
field component
datatype which is a parametric datatype to a datatype generator
parametric datatype
datatype on which a datatype generator operates to produce a generated datatype
ordered structured datatype
value space specification
a value space is the collection of values for a given datatype
ISO/IEC 11404:2007(E)
tree datatype generator
array datatype
datatype whose values are associations between
the product space of one or more finite datatypes, designated the index datatypes, and the value space of the element datatype, such that every value in the product space of the index datatypes associates to
exactly one value of the element datatype.
scaled datatype
Scaled is a family of datatypes whose value spaces are subsets of the rational value space, each
individual datatype having a fixed denominator, but the scaled datatypes possess the concept of
approximate value.
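As a rough illustration of the fixed denominator and the notion of approximate value, the sketch below rounds a rational number to the nearest multiple of 1/denominator. The function name `to_scaled` and the rounding rule (Python's built-in `round`) are assumptions made for illustration and are not taken from the ontology or the standard.

```python
from fractions import Fraction


def to_scaled(x: Fraction, denominator: int) -> Fraction:
    """Approximate x by the nearest value with the given fixed denominator."""
    numerator = round(x * denominator)  # integer numerator of the scaled value
    return Fraction(numerator, denominator)


one_third = Fraction(1, 3)
approx = to_scaled(one_third, 100)  # 1/3 is not representable with denominator 100
print(approx)                       # 33/100
```

Every value produced this way is a rational number, but only rationals that are multiples of 1/100 are represented without approximation.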
excluding generator
excluding creates a subtype of any exact datatype by enumerating the values which are to be
excluded in constructing the subtype value-space.
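A minimal sketch of the idea, assuming the base value space is finite and can be held in a Python `frozenset`. The function name `excluding` mirrors the generator name, but the signature and the weekday values are hypothetical.

```python
def excluding(base_values, excluded_values):
    """Return the subtype value space: the base values minus the excluded ones."""
    base = frozenset(base_values)
    excluded = frozenset(excluded_values)
    if not excluded <= base:
        # Only values of the base datatype can be excluded from it.
        raise ValueError("excluded values must belong to the base value space")
    return base - excluded


days = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
weekdays = excluding(days, ["sat", "sun"])
```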
real datatype
real is a family of datatypes which are computational approximations to the mathematical
datatype comprising the “real numbers”. Specifically, each real datatype designates a collection of
mathematical real values which are expressed to some finite precision and must be distinguishable to at
least that precision.
aggregate-imposed identifier uniqueness
An aggregate-value has the identifier uniqueness property if and only if no identifier (e.g., label, index) of the
element datatype occurs more than once in the aggregate-value. The aggregate datatype has the identifier
uniqueness property, if and only if all values in its value space do.
non-aggregate generator
exactness
The computational model of a datatype may limit the degree to which values of the datatype can be
distinguished. If every value in the value space of the conceptual datatype is distinguishable in the
computational model from every other value in the value space, then the datatype is said to be exact.
Certain mathematical datatypes having values which do not have finite representations are said to be
approximate.
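The distinction can be observed in most programming languages. The sketch below uses Python's binary floating-point `float` as an example of an approximate computational model and `fractions.Fraction` as an exact one; the choice of these two types is an illustration, not something the ontology prescribes.

```python
from fractions import Fraction

# Approximate model: 0.1 and 0.2 have no finite binary representation,
# so the computed sum is not distinguishable as exactly 0.3.
approx_sum = 0.1 + 0.2
print(approx_sum == 0.3)              # False

# Exact model: every rational value is distinguishable from every other.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
print(exact_sum == Fraction(3, 10))   # True
```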
pointer generator
pointer generates a datatype, called a pointer datatype, each of whose values constitutes a
means of reference to values of another datatype, designated the element datatype. The values of a
pointer datatype are atomic.
generated datatype
A generated datatype is a datatype resulting from an application of a datatype generator.
numeric quality
A datatype is said to be numeric if its values are conceptually quantities (in some mathematical number
system). A datatype whose values do not have this property is said to be non-numeric.
homogeneity
An aggregate datatype is homogeneous, if and only if all components must belong to a single datatype. If
different components may belong to different datatypes, the aggregate datatype is said to be heterogeneous.
The component datatype of a homogeneous aggregate is also called the element datatype.
vector datatype
directed labeled graph datatype generator
set generator
synonym: set datatype constructor
set generates a datatype, called a set datatype, whose value-space is the set of all subsets of
the value space of the element datatype, with operations appropriate to the mathematical set.
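For a finite element datatype, the value space of the generated set datatype can be enumerated directly. The sketch below does this for boolean; the helper name `set_value_space` is hypothetical, and the approach only works when the element value space is finite.

```python
from itertools import chain, combinations


def set_value_space(element_values):
    """Enumerate all subsets of a finite element value space."""
    elements = list(element_values)
    subsets = chain.from_iterable(
        combinations(elements, size) for size in range(len(elements) + 1)
    )
    return {frozenset(subset) for subset in subsets}


space = set_value_space([True, False])
print(len(space))  # 4
```

The four values are the empty set, {True}, {False}, and {True, False}.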
pointer datatype
datatype each of whose values constitutes a
means of reference to values of another datatype, designated the element datatype. The values of a
pointer datatype are atomic.
tree datatype
aggregate size
The size of an aggregate-value is the number of component values it contains. The size of the aggregate
datatype is fixed, if and only if all values in its value space contain the same number of component values.
The size is variable, if different values of the aggregate datatype may have different numbers of component
values. Variability is the more general case; fixed-size is a constraint.
table generator
table generates a datatype, called a table datatype, whose values are collections of values in
the product space of one or more field datatypes, such that each value in the product space represents
an association among the values of its fields. Although the field datatypes may be infinite, any given value
of a table datatype contains a finite number of associations.
uniqueness
An aggregate-value has the uniqueness property if and only if no value of the element datatype occurs more
than once in the aggregate-value. The aggregate datatype has the uniqueness property, if and only if all
values in its value space do.
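The property on a single aggregate-value can be checked mechanically. The following sketch assumes the element values are hashable; the function name `has_unique_values` is an illustrative choice.

```python
def has_unique_values(aggregate_value):
    """True if no element value occurs more than once in the aggregate-value."""
    values = list(aggregate_value)
    return len(values) == len(set(values))


print(has_unique_values(("a", "b", "c")))  # True
print(has_unique_values(("a", "b", "a")))  # False
```

A datatype such as a set has the property for all of its values, whereas a sequence or bag datatype does not, because some of its values repeat an element.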
extending generator
Extended creates a datatype whose value-space contains the value-space of the base datatype
as a proper subset.
selection generator
selecting creates a subtype of any exact datatype by enumerating the values in the subtype
value-space.
aggregate datatype generator quality
ordered aggregate datatype generator
cardinality
A value space has the mathematical concept of cardinality: it may be finite, denumerably infinite (countable),
or non-denumerably infinite (uncountable). A datatype is said to have the cardinality of its value space. In the
computational model, there are three significant cases:
⎯ datatypes whose value spaces are finite,
⎯ datatypes whose value spaces are exact and denumerably infinite,
⎯ datatypes whose value spaces are approximate and therefore have a finite or denumerably
infinite computational model, although the conceptual value space may be non-denumerably infinite.
Every conceptually finite datatype is necessarily exact. No computational datatype is non-denumerably
infinite.
character datatype
character is a family of datatypes whose value spaces are character-sets.
record generator
synonym: tuple datatype constructor
record generates a datatype, called a record datatype, whose values are heterogeneous
aggregations of values of component datatypes, each aggregation having one value for each component
datatype, keyed by a fixed field-identifier.
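One way to picture a record generator is as a function that takes field identifiers with their component datatypes and returns a new type. The sketch below uses `dataclasses.make_dataclass` from the Python standard library; the function name `record` and the `Measurement` example are hypothetical.

```python
from dataclasses import fields, make_dataclass


def record(name, **field_types):
    """Generate a record type with one component per field identifier."""
    return make_dataclass(name, list(field_types.items()), frozen=True)


# Heterogeneous components: a float and a str, each keyed by a fixed identifier.
Measurement = record("Measurement", value=float, unit=str)
m = Measurement(value=2.5, unit="kg")
print([f.name for f in fields(Measurement)])  # ['value', 'unit']
```

Note that Python does not enforce the declared field types at run time, so this models the structure of a record rather than full membership in its value space.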
non-directed labeled graph generator
non-ordered aggregate datatype generator
ordinal datatype
ordinal is the datatype of the ordinal numbers, as distinct from the quantifying numbers
(datatype integer). ordinal is the infinite enumerated datatype.
ordering
A datatype is said to be ordered if an order relation is defined on its value space.
set discrete
aggregate datatype
synonym: structured datatype
An aggregate datatype is a generated datatype, each of whose values is, in principle, made up of values of
the parametric datatypes. The parametric datatypes of an aggregate datatype or its generator are also called
component datatypes.
enumerated datatype
enumerated is a family of datatypes, each of which comprises a finite number of distinguished values having an intrinsic order.
synonym: discrete datatype
integer datatype
integer is the mathematical datatype comprising the exact integral values.
choice datatype
datatype each of whose values is a single value
from any of a set of alternative datatypes. The alternative datatypes of a choice datatype are logically
distinguished by their correspondence to values of another datatype, called the tag datatype.
datatype quality
indirect access
An aggregate datatype is said to have only indirect access methods if there is no aggregate-imposed index
mapping. Indirect access may be by position (if the aggregate datatype has ordering), by value of the element
(if the aggregate datatype has uniqueness), or by some implementation-dependent selection mechanism,
modeled as random selection.
boundedness
A datatype is said to be bounded above if it is ordered and there is a value U in the value space such that, for
all values s in the value space, s ≤ U . The value U is then said to be an upper bound of the value space.
Similarly, a datatype is said to be bounded below if it is ordered and there is a value L in the space such that,
for all values s in the value space, L ≤ s . The value L is then said to be a lower bound of the value space. A
datatype is said to be bounded if its value space has both an upper bound and a lower bound.
datatype generator
A datatype generator is a conceptual operation on one or more datatypes which yields a datatype. A datatype
generator operates on datatypes to generate a datatype, rather than on values to generate a value.
Specifically, a datatype generator is the combination of:
⎯ a collection of criteria for the number and characteristics of the datatypes to be operated upon,
⎯ a construction procedure which, given a collection of datatypes meeting those criteria, creates a new
value space from the value spaces of those datatypes, and
⎯ a collection of characterizing operations which attach to the resulting value space to complete the
definition of a new datatype.
The application of a datatype generator to a specific collection of datatypes meeting the criteria for the
datatype generator forms a generated datatype. The generated datatype is sometimes called the resulting
datatype, and the collection of datatypes to which the datatype generator was applied are called its parametric
datatypes.
synonym: datatype constructor
ISO/IEC 11404:2007(E)
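The key point, that a generator operates on datatypes rather than on values, can be sketched in code. In the hedged example below a datatype is modeled as a membership test for its value space plus a dictionary of characterizing operations. The class `Datatype`, the function `sequence_of`, and the two operations shown are illustrative assumptions, not the ontology's or the standard's definitions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Datatype:
    name: str
    contains: Callable[[Any], bool]            # membership in the value space
    operations: Dict[str, Callable[..., Any]]  # characterizing operations


boolean = Datatype(
    name="boolean",
    contains=lambda v: isinstance(v, bool),
    operations={"Not": lambda a: not a},
)


def sequence_of(element: Datatype) -> Datatype:
    """Datatype generator: maps a parametric datatype to a generated datatype."""

    def contains(v: Any) -> bool:
        # New value space built from the value space of the element datatype.
        return isinstance(v, tuple) and all(element.contains(x) for x in v)

    operations = {
        "Head": lambda s: s[0],
        "Append": lambda s, x: s + (x,),
    }
    return Datatype(f"sequence of {element.name}", contains, operations)


sequence_of_boolean = sequence_of(boolean)
print(sequence_of_boolean.contains((True, False)))  # True
```

Here `boolean` plays the role of the parametric datatype and `sequence_of_boolean` is the generated (resulting) datatype.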
bag generator
bag generates a datatype, called a bag datatype, whose values are collections of instances of
values from the element datatype. Multiple instances of the same value may occur in a given collection;
and the ordering of the value instances is not significant.
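Python's `collections.Counter` behaves like a bag in both respects, which makes it convenient for a short illustration. Using `Counter` here is an assumption of this sketch, not a mapping defined by the ontology.

```python
from collections import Counter

a = Counter(["x", "y", "x"])
b = Counter(["y", "x", "x"])

print(a["x"])   # 2: the same value may occur more than once
print(a == b)   # True: ordering of the instances is not significant
```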
date and time datatype
time is a family of datatypes whose values are points in time to various common resolutions:
year, month, day, hour, minute, second, and fractions thereof.
component mandatoriness
The components of an aggregate datatype may not all be required to have a valid value of the datatype, i.e.,
the actual value space of the datatype may include values for which some of the component values are
unspecified.
When a component of the datatype is required to have a valid value in order for the aggregate value to be a
valid value of the datatype, the component is said to be a mandatory component.
When a component of the datatype is not required to have a valid value in order for the aggregate value to be
a valid value of the datatype, the component is said to be an optional component.
ordinal datatype
datatype
Since the collection of datatypes is unbounded, four formal methods are used in the definition of the datatypes:
⎯ explicit specification of primitive datatypes, which have universal well-defined abstract notions, each
independent of any other datatype.
⎯ implicit specification of generated datatypes, which are syntactically and in some ways semantically
dependent on other datatypes used in their specification. Generated datatypes are specified implicitly by
means of explicit specification of datatype generators, which themselves embody independent abstract
notions.
⎯ specification of the means of datatype declaration, which permits the association of additional identifiers
and refinements to primitive and generated datatypes and to datatype generators.
⎯ specification of the means of defining subtypes of the datatypes defined by any of the foregoing methods.
ISO/IEC 11404:2007(E)
set of distinct values, characterized by properties of those values, and by operations on those values
tuple of primitives
non-directed labeled graph datatype
complex datatype
complex is a family of datatypes, each of which is a computational approximation to the
mathematical datatype comprising the “complex numbers”. Specifically, each complex datatype
designates a collection of mathematical complex values which are known to certain applications to some
finite precision and must be distinguishable to at least that precision in those applications.
labeled graph datatype
class datatype
state datatype
state is a family of datatypes, each of which comprises a finite number of distinguished but unordered values.
range generator
range creates a subtype of any ordered datatype by placing new upper and/or lower bounds on
the value space.
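A small sketch of a range subtype as a membership test on an ordered base datatype. It assumes that both bounds are inclusive and that a missing bound is passed as `None`; the name `range_subtype` is hypothetical.

```python
def range_subtype(lower=None, upper=None):
    """Return a membership test for the subtype bounded by lower and/or upper."""

    def contains(value):
        if lower is not None and value < lower:
            return False
        if upper is not None and value > upper:
            return False
        return True

    return contains


percentage = range_subtype(lower=0, upper=100)  # both bounds given
non_negative = range_subtype(lower=0)           # lower bound only
```

The base datatype must be ordered because the test relies on the comparison operators.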
structuredness
Aggregate datatypes are:
⎯ conceptually structured, having both the component datatypes and the access method specified, or
⎯ conceptually semi-structured, having either the component datatypes or the access method specified, but
not both, or
⎯ conceptually unstructured, having neither the component datatype nor the access method specified.
DAG datatype
sequence datatype
datatype whose values are ordered
sequences of values from the element datatype. The ordering is imposed on the values and not intrinsic
in the underlying datatype; the same value may occur more than once in a given sequence.
size generator
size creates a subtype of any sequence, set, bag, or table datatype by specifying bounds on
the number of elements any value of the base datatype may contain.
bag datatype
procedure generator
procedure generates a datatype, called a procedure datatype, each of whose values is an
operation on values of other datatypes, designated the parameter datatypes. That is, a procedure
datatype comprises the set of all operations on values of a particular collection of datatypes. All values of
a procedure datatype are conceptually atomic.
void datatype
void is the datatype representing an object whose presence is syntactically or semantically
required, but carries no information in a given instance.
choice generator
Choice generates a datatype called a choice datatype, each of whose values is a single value
from any of a set of alternative datatypes. The alternative datatypes of a choice datatype are logically
distinguished by their correspondence to values of another datatype, called the tag datatype.
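A choice value can be sketched as a pair of a tag and a value, where the tag selects which alternative datatype the value belongs to. The names `Choice`, `ALTERNATIVES`, and `make_choice`, and the two alternatives shown, are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Choice:
    tag: str       # value of the tag datatype
    value: object  # single value from the alternative selected by the tag


# Hypothetical mapping from tag values to alternative datatypes.
ALTERNATIVES = {"number": float, "text": str}


def make_choice(tag, value):
    """Build a choice value, checking the value against the tagged alternative."""
    expected = ALTERNATIVES[tag]
    if not isinstance(value, expected):
        raise TypeError(f"value for tag {tag!r} must be {expected.__name__}")
    return Choice(tag, value)


reading = make_choice("number", 1.5)
note = make_choice("text", "n/a")
```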
explicit subtype generator
Explicit subtyping identifies a datatype as a subtype of the base datatype and defines the
construction procedure for the subset value space in terms of general-purpose datatypes or datatype
generators.
procedure datatype
datatype each of whose values is an
operation on values of other datatypes, designated the parameter datatypes. That is, a procedure
datatype comprises the set of all operations on values of a particular collection of datatypes. All values of
a procedure datatype are conceptually atomic.
vector generator
aggregate generator
An aggregate datatype generator generates a datatype by
⎯ applying an algorithmic procedure to the value spaces of its component datatypes to yield the value space
of the aggregate datatype, and
⎯ providing a set of characterizing operations specific to the generator.
synonym: aggregate datatype constructor
recursiveness
A datatype is said to be recursive if a value of the datatype can contain (or refer to) another value of the
datatype.
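A tree is a typical recursive datatype: each value contains further values of the same datatype. The class `Tree` and the helper `count_nodes` below are hypothetical names used only to illustrate the definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    label: str
    # A value of Tree contains other values of Tree.
    children: List["Tree"] = field(default_factory=list)


def count_nodes(tree: Tree) -> int:
    """Count the nodes by following the recursive structure."""
    return 1 + sum(count_nodes(child) for child in tree.children)


root = Tree("root", [Tree("left"), Tree("right", [Tree("leaf")])])
print(count_nodes(root))  # 4
```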
node component
set datatype
datatype whose value-space is the set of all subsets of
the value space of the element datatype, with operations appropriate to the mathematical set.
direct access
An aggregate datatype has a direct access method, if and only if there is an aggregate-imposed mapping
between values of one or more “index” (or “key”) datatypes and the component values of each aggregate
value. Such a mapping is required to be single-valued, i.e. there is at most one element of each aggregate
value which corresponds to each (composite) value of the index datatype(s). The dimension of an aggregate
datatype is the number of index or key datatypes the aggregate has.
An aggregate datatype is said to be indexed, if and only if it has a direct access method, every index datatype
is ordered, and an element of the aggregate value is actually present and defined for every (composite) value
in the value space of the index datatype(s). Every indexed aggregate datatype has a fixed size, because of
the 1-to-1 mapping from the index value space. In addition, an indexed datatype has a “partial ordering” in
each dimension imposed by the order relationship on the index datatype for that dimension; in particular, an
aggregate datatype with a single ordered index datatype implicitly has the ordering imposed by sequential
indexing.
An aggregate datatype is said to be keyed, if and only if it has a direct access method, but either the index
datatypes or the mapping do not meet the requirements for indexed. That is, the index (or key) datatypes
need not be ordered, and a value of the aggregate datatype need not have elements corresponding to all of
the key values.
sequence generator
Sequence generates a datatype, called a sequence datatype, whose values are ordered
sequences of values from the element datatype. The ordering is imposed on the values and not intrinsic
in the underlying datatype; the same value may occur more than once in a given sequence.
subtype generator
A subtype is a datatype derived from an existing datatype, designated the base datatype, by restricting the
value space to a subset of that of the base datatype whilst maintaining all characterizing operations. Subtypes
are created by a kind of datatype generator which is unusual in that its only function is to define the
relationship between the value spaces of the base datatype and the subtype.
non-aggregate datatype
array generator
array generates a datatype, called an array datatype, whose values are associations between
the product space of one or more finite datatypes, designated the index datatypes, and the value space of the element datatype, such that every value in the product space of the index datatypes associates to
exactly one value of the element datatype.
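The requirement that every index value maps to exactly one element can be made concrete with a dictionary keyed by the product space of the index value spaces. This is a sketch under the assumption that the index datatypes are given as finite Python iterables; `make_array` is a hypothetical helper.

```python
from itertools import product


def make_array(index_spaces, initial):
    """Associate every value of the index product space with one element value."""
    return {index: initial for index in product(*index_spaces)}


# Two index datatypes with 2 and 3 values give a product space of 6 indices.
grid = make_array([range(2), range(3)], 0.0)
grid[(1, 2)] = 7.5
print(len(grid))  # 6
```

Because the index datatypes are finite, every array value has the same fixed number of elements.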
rational datatype
Rational is the mathematical datatype comprising the “rational numbers”.
access
The access method for an aggregate datatype is the property which determines how component values can
be extracted from a given aggregate-value.
edge component
equality
ISO/IEC 11404:2007
In every value space there is a notion of equality, for which the following rules hold:
⎯ for any two instances (a, b) of values from the value space, either a is equal to b, denoted a = b , or a is
not equal to b, denoted a ≠ b ;
⎯ there is no pair of instances (a, b) of values from the value space such that both a = b and a ≠ b ;
⎯ for every value a from the value space, a = a ;
⎯ for any two instances (a, b) of values from the value space, a = b if and only if b = a ;
⎯ for any three instances (a, b, c) of values from the value space, if a = b and b = c , then a = c .
On every datatype, the operation Equal is defined in terms of the equality property of the value space, by:
⎯ for any values a, b drawn from the value space, Equal(a,b) is true if a = b , and false otherwise.
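On a finite sample of values, the reflexivity, symmetry, and transitivity rules above can be tested by brute force. The function name `check_equality_rules` is an assumption of this sketch; it checks a candidate `equal` predicate and does not prove anything about an infinite value space.

```python
from itertools import product


def check_equality_rules(values, equal):
    """Check reflexivity, symmetry and transitivity of equal on sample values."""
    values = list(values)
    reflexive = all(equal(a, a) for a in values)
    symmetric = all(
        equal(a, b) == equal(b, a) for a, b in product(values, repeat=2)
    )
    transitive = all(
        equal(a, c)
        for a, b, c in product(values, repeat=3)
        if equal(a, b) and equal(b, c)
    )
    return reflexive and symmetric and transitive


sample = [0, 1, 2]
print(check_equality_rules(sample, lambda a, b: a == b))           # True
print(check_equality_rules(sample, lambda a, b: abs(a - b) <= 1))  # False
```

The second predicate fails because it is not transitive: 0 is "equal" to 1 and 1 to 2, but 0 is not "equal" to 2.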
record (tuple) datatype
whose values are heterogeneous
aggregations of values of component datatypes, each aggregation having one value for each component
datatype, keyed by a fixed field-identifier.
synonym: tuple datatype
DAG datatype generator
sequence of real
class generator
class generates a datatype, called a class datatype, whose values are heterogeneous
aggregations of values of component datatypes, each aggregation having one value for each component
datatype, keyed by a fixed field-identifier. Components of a class may include procedure definitions.
The override type qualifier specifies that the labeled class attribute definition that follows replaces the
prior class attribute definition with the same label.
non-ordered structured datatype
primitive field component list
aggregate field component
field identifier
primitive field component
field component list
label
IAO
bounded below
non equal
unbounded
variable size
unbounded below
non-unique values
ordered
semi-structured
non-ordered
equal
finite
approximate
unstructured
countable
component mandatory
key access
component non-mandatory
uncountable
homogeneous
structured
index access
access by value
non-numeric
recursive
unbounded above
bounded above
non-recursive
exact
unordered aggregate
unique values
identifier not unique
numeric
heterogeneous
position access
ordered aggregate
bounded
fixed size
real
homogeneous set generator
2 element record generator
identifier unique
implementation dependent access
boolean
ident2:real
enumerated{class1, class2, class3}
ident4
tuple(ident1:real,ident2:real,ident3:real,ident4:real,ident5:enumerated)
ident5:enumerated
ident3:real
ident4:real
ident2
ident5
ident1:real
5 element record generator
ident1
list(ident1:real,ident2:real,ident3:real,ident4:real,ident5:enumerated)
ident3
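The example instances listed above (four real-valued fields and one enumerated field with the values class1, class2, class3) could be mirrored in code roughly as follows. This is a sketch only: the Python names `ClassLabel` and `FiveFieldRecord` are hypothetical, `float` stands in for real, and the field values in `row` are arbitrary placeholders.

```python
from enum import Enum
from typing import NamedTuple


class ClassLabel(Enum):
    """Stand-in for enumerated{class1, class2, class3}."""
    CLASS1 = "class1"
    CLASS2 = "class2"
    CLASS3 = "class3"


class FiveFieldRecord(NamedTuple):
    """Stand-in for the 5 element record with fields ident1..ident5."""
    ident1: float
    ident2: float
    ident3: float
    ident4: float
    ident5: ClassLabel


row = FiveFieldRecord(1.0, 2.0, 3.0, 4.0, ClassLabel.CLASS1)
print(row._fields)  # ('ident1', 'ident2', 'ident3', 'ident4', 'ident5')
```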