Inductive Queries for Mining Patterns and Models
Induktivno povpraševanje za rudarjenje vzorcev in modelov

No. of contract:

Type of project:
Duration: from 01.09.2005 to 31.08.2008


Data mining currently lacks a generally accepted framework, and the search for one is a major research priority. The most promising approach is that of inductive databases (IDBs), which contain not only data but also patterns. Patterns can be either local patterns, such as frequent itemsets, which are descriptive, or global models, such as decision trees, which are predictive. In an IDB, inductive queries can be used to generate (mine), manipulate, and apply patterns. The IDB framework is appealing as a theory for data mining because it employs declarative queries instead of ad hoc procedural constructs. Declarative queries are often formulated using constraints, so inductive querying is closely related to constraint-based data mining. The IDB framework is also appealing for data mining applications, as it supports the process of knowledge discovery in databases (KDD): the results of one (inductive) query can be used as input for another, so nontrivial multistep KDD scenarios can be supported, rather than just single data mining operations. The state of the art in IDBs comprises various effective approaches to constraint-based mining (inductive querying) of local patterns, such as frequent itemsets and sequences, most of which work in isolation. The proposed project aims to significantly advance the state of the art by developing the theory of, and practical approaches to, inductive querying (constraint-based mining) of global models, as well as approaches to answering complex inductive queries that involve both local patterns and global models. Based on these, showcase applications (IDBs) in the area of bioinformatics will be developed, in which users will be able to query data about drug activity, gene expression, gene function, and protein sequences, as well as frequent patterns (e.g., subsequences in proteins) and predictive models (e.g., for drug activity or gene function).
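
To make the notion of an inductive query more concrete, the following minimal Python sketch shows a constraint-based query for frequent itemsets whose result is then used as input to a second query. It is an illustration only and not part of the project's software: the function names (`frequent_itemsets`, `covered_by`), the toy transactions, and the choice of a simple level-wise (Apriori-style) search with the extra constraint applied as a post-filter are all assumptions made for this example.

```python
# Illustrative sketch only: names and data are hypothetical, not from the project.

def frequent_itemsets(transactions, min_support, constraint=lambda s: True):
    """Inductive query 1: itemsets with support >= min_support that satisfy
    an additional user-supplied constraint. Returns {itemset: support}."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]  # candidate itemsets of size 1
    result = {}
    while level:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Build candidates one item larger by joining frequent itemsets.
        size = len(next(iter(level))) + 1
        level = list({a | b for a in frequent for b in frequent
                      if len(a | b) == size})
    # For simplicity the extra constraint is applied as a post-filter.
    return {s: n for s, n in result.items() if constraint(s)}


def covered_by(transactions, patterns):
    """Inductive query 2: apply mined patterns to the data by selecting
    the transactions that contain at least one of the given patterns."""
    return [t for t in transactions if any(p <= t for p in patterns)]


# Toy data: each transaction is a set of items.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c"}),
]

# Query 1: frequent itemsets (support >= 3) that must contain item "a".
patterns = frequent_itemsets(transactions, min_support=3,
                             constraint=lambda s: "a" in s)

# Query 2: use the result of query 1 (patterns of size >= 2) as input.
pairs = [p for p in patterns if len(p) >= 2]
selected = covered_by(transactions, pairs)
```

Here the first query is declarative in spirit: the user states a support threshold and a constraint rather than a mining procedure. The second query consumes the patterns returned by the first, which is the kind of composition that multistep KDD scenarios rely on.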