Methods for Developing Hierarchical Models

Research project, 1997-99, financed by the Ministry of Science and Technology of the Republic of Slovenia under grant 3411-97-22-9076.

Project Proposal

Background

To solve a complex problem, one of the most general approaches is to decompose it to smaller and less complex subproblems. Such decomposition presents a foundation for hierarchical models, which are intended for classification or evaluation of objects described in the attribute language. Subproblems or intermediate concepts are there represented by distinct variables, organized into a hierarchical structure. In general, such a structure is an acyclic graph or, in most cases, a tree. For every internal node in the structure there is a rule (function, relation, procedure) that defines how the value of this node is determined from the values of its descendants.

Hierarchical models are used in various domains of science and technology. They are extensively used within decision support systems for problems in which an option is to be selected from a given set so as to best satisfy the goals of the decision maker. The procedure for option evaluation is defined by a multi-attribute hierarchical model. In addition to evaluation, such models are used also for various analyses and simulations. In artificial intelligence, the hierarchical models are used in (1) expert systems for inference and (2) machine learning to decrease the complexity and increase the transparency of the learned classification rule. Hierarchical models and corresponding methods were also developed and are used within pattern recognition and computer aided design of switching circuits.

The most difficult problem in the development of hierarchical models is the design of model structure. In most cases, the structure is defined manually in a tiresome and lengthy process based on knowledge about the problem domain, which requires a lot of skill and experience. Only within few specific domains this process has been automated, for example in switching cirucuit design. However, these methods are typically limited to two-valued variables and a predefined set of logical operators.

Another important problem is due to the fact that the research of hierarchical models took place separately within different areas of research; resulting in etremely specific solutions. For example, in multi-attribute decision making, the models incorporate only numerical variables, which are then interrelated by functions based on utility theory, Bayesian probability or continuous logic. On the other hand, artificial intelligence generally uses the models with symbolic variables, and operators, defined by rules, decision tables, or decision trees. Many real-world problems require both, and this can be achieved only by an appropriate combination of the approaches.

Within this project we propose research, development, and verification of the methods that would, compared to existing methods, provide a better and more active support to the development of hierarchical classification and decision models. The proposed solutions are based on four approaches:

automatic development of hierarchical models from examples;
supervised development of models;
methods for developing operators/functions within the model from examples and known structure of variables;
manual methods of development and transformation of hybrid (qualitative and quantitative) hierarchical models.

Related Work

This project integrates methods and techniques from two principal areas: artificial intelligence and decision support systems. In the context of artificial intelligence, the project involves machine learning, and more specifically, constructive induction and structural decomposition of functions. From the viewpoint of decision support, the project is focused to multi-attribute decision making.

Machine learning methods are used to derive a concept that is described with learning examples. The methods are based on the identification of problem subspaces, for which the concept can be described with given description language. In this project we treat the problem of developing hierarchical models as a special case of machine learning from examples that automatically derives both intermediate concepts (model variables) and operators that define the relationships between the concepts. Such a method belongs to constructive induction (Michalski 83), which, in order to induce a concept description, generates new intermediate concepts. Some well-known constructive induction systems are DUCE (Muggleton 90), LFC (Ragavan, Rendell 93), CIGOL (Muggleton, Buntine 92). The main limitation of these systems is that the generation of new concepts does not discover the operators of the model, but rather selects them from a predefined set. In general, such methods generate new concepts in a way that makes it difficult to order them into a hierarchy.

The development of hierarchical models was addressed by a pioneer of artificial intelligence, Samuel (59). Given the model's structure, his algorithm used learning examples to develop models for playing checkers. This approach was later improved by Biermann et al. (82). Although their method could automatically derive the structure of variables, the approach was inefficient and could only handle cases with a complete set of learning examples. Recently, the method was substantially improved by research groups of Ross (94) and Perkowski (95). Their approach is based on Ashenhurst-Curtis function decomposition (Ashenhurst 52). Besides the fact that these methods are designed for specific application of switching circuit design and use two-valued variables, they are much more efficient and can be used also with incomplete sets of learning examples.

An important contribution to the use of hierarchical models in machine learning is structured induction, developed by Shapiro (87). His approach is based on a manual decomposition of the problem. For every intermediate concept either a special set of learning examples is used or an expert is consulted to build a corresponding decision tree. In comparison with standard decision tree induction techniques, Shapiro's approach exhibits about the same classification accuracy, but increased transparency and lower complexity of the developed models. Michie (95) emphasizes the important role the structural induction will have in the future developments and presents several real problems that where solved by structured induction.

Decision support is an area where hierarchical models have been used extensively. Most common are the models that use numerical variables and analytical (utility) functions as operators (Mallach 94); a typical example is AHP (Saaty 93). In all cases, the structure of the model is defined manually. However, the acquisition of utility functions is well supported by various methods based on interactive dialogs, graphs, and tables. Within this project we will make use of these methods primarily as a reference to generalize the learning algorithms for (1) continuous variables, (2) different types of operators, and (3) the derivation of hybrid models that include both numerical and symbolical variables.

The members of the Project group have actively participated in the research and development of both machine learning and multi-attribute decision making methods. They have developed several original machine learning methods, such as Assistant (Cestnik et al. 87) and Retis (Karalič 92), and proposed several improvements of existing methods, like m-estimate of probability (Cestnik 90) and an optimal tree-pruning algorithm (Bohanec, Bratko 94). Special emphasis was also payed on real application of these methods in medicine, machine engineering, metallurgy, process control and others (Urbančič et al. 91).

Within the domain of decision support systems we have developed an expert system shell for multi-attribute decision making DEX (Bohanec, Rajkovič 90). The system was successfully applied in Slovenia and abroad in more than 50 realistic decision making problems. Typical application areas were: selection and evaluation of technologies, evaluation of enterprises and business partners, personnel management and project evaluation. For the proposed project, specifically relevant are the approaches for combining qualitative and quantitative decision models (Bohanec et al. 92, 95, Rajkovič et al. 95). In this project, these methods will be upgraded, generalized and supported with appropriate methods.

References

Ashenhurst, R.L.: The decomposition of switching functions. Technical report, Bell Laboratories BL-1(11), 541-602, 1952.

Biermann, A.W., Fairfield, J., Beres, T.: Signature table systems and learning, IEEE Trans. System, Man, and Cybernetics 12(5), 635-684, 1982.

Bohanec, M., Rajkovič, V.: DEX: An expert system shell for decision support, Sistemica 1(1), 145-157, 1990.

Bohanec, M., Urh, B., Rajkovič, V.: Evaluating options by combined qualitative and quantitative methods, Acta Psychologica 80, 67-89, North-Holland, 1992.

Bohanec, M., Bratko, I.: Trading accuracy for simplicity in decision trees, Machine Learning 15, Kluwer Academic Publishers, 223-250, 1994.

Bohanec, M., Rajkovič, V., Semolič, B., Pogačnik, A.: Knowledge-based portfolio analysis for project evaluation, Information & Management 28, 293-302, Elsevier, 1995.

Cestnik, B., Kononenko, I., Bratko, I.: ASSISTANT 86: A knowledge-elicitation tool for sophisticated users, Progress in Machine Learning (eds. I. Bratko, N. Lavrač), Sigma Press, 1987.

Cestnik, B.: Estimating probabilities: A crucial task in machine learning, Proc. ECAI 90, 1990.

Karalič, A.: Linear regression in regression tree leaves, Proc. ISSEK 92, Bled, 1992.

Mallach, E.G.: Understanding decision support systems and expert systems, Irwin, 1994.

Michalski, R.: Understanding the nature of learning: issues and research directions, Machine Learning: An Artificial Intelligence Approach (eds. R. Michalski, J. Carbonell, T. Mitchell), Kaufmann, 3-25, 1986.

Michie, D.: Problem decomposition and the learning of skills, Lecture Notes in Artificial Intelligence 912 (eds. N. Lavrač, S. Wrobel), Springer, 17-31, 1995.

Muggleton, S.: Inductive Acquisition of Expert Knowledge, Addison-Wesley, 1990.

Muggleton, S., Buntine, W.: Machine invention of first-order predicates by inverting resolution, Machine Learning: An Artificial Intelligence Approach (ed. S. Muggleton), Academic Press, 1992.

Perkowski, M.A., et al.: Unified approach to functional decomposition of switching functions, Technical report, Warsaw University of Technology and Eindhoven University of Technology, 1995.

Ragavan, H., Rendell, L.: Lookahead feature construction for learning hard concepts, Proc. 10th International Machine Learning Conference, Morgan Kaufmann, 252-259, 1993.

Rajkovič, V., Bohanec, M., Leskošek, B., Kapus, V.: Zasnova ekspertnega sistema za usmerjanje šolske mladine v športne panoge, Računalniška analiza medicinskih podatkov CADAM-95 (ed. Lavrač, N.), IJS Scientific Publishing IJS-SP-95-1, 234-244, 1995.

Ross, T.D., Noviskey, M.J., Gadd, D.A., Goldman, J.A.: Pattern theoretic feature extraction and constructive induction, Proc. ML-COLT-94 Workshop on Constructive Induction and Change of Representation, New Brunswick, New Jersey, 1994.

Samuel, A.: Some studies in machine learning using the game of checkers, IBM Journal 3(3), 535-554, 1959.

Saaty, T.L.: Multicriteria decision making: The analytic hierarchy process, RWS Publications, 1993.

Shapiro, A.D.: Structured induction in expert systems, Turing Institute Press with Addison-Wesley, 1987.

Urbančič, T., Kononenko, I., Križman, V.: Review of applications by Ljubljana Artificial Intelligence Laboratories, Technical Report DP-6218, Institut Jožef Stefan, Ljubljana, 1991.

Work Programme

The primary goal of this project is to develop methods for an active support of designers, experts and decision makers in developing hierarchical models. Special attention will be paid to methods for (1) automatic and supervised learning of model structure from examples, and (2) methods for development and modification of hybrid qualitative and quantitative models. We wish to develop these methods to the level that would allow their application in realistic problems, i.e., problems with about 20 attributes, involving incomplete and noisy data, and requiring a combination of categorical and numerical attributes. With respect to the automation of model development process, we will consider four categories of methods in the range from manual to completely automated. We also intend to (1) integrate methods from these four categories, (2) develop prototype supporting computer programs, and (3) verify and evaluate the methods in artificial and realistic problem domains.

1. Methods for automatic learning of models from examples

In this category of methods, we formulate the problem in terms of machine learning: given a set of learning examples, represented as attribute-value vectors and classified into one of the predefined classes, find a hierarchical model that represents the examples. This process requires the construction of intermediate concepts (model variables), development of the hierarchy of variables, and learning of operators (functions, tables, rules) that interrelate the variables. We take the approach of function decomposition, where the goal is to decompose a function y=F(x1,...,xn) into y=G(A,H(B)). Here, x1,...,xn represent input attributes, y the target concept, and F, G and H functions represented as tables. A and B are subsets of input attributes such that their union incorporates all the input attributes. The joint complexity of the newly developed functions G and H should be lower than the complexity of F. The functions G and H are developed by the algorithm, and are not predefined in any way. The decomposition can be applied recursively on H and G, so that the result of this process is a hierarchical model.

The most important problems with existing methods of this type are: (1) high time complexity, (2) inability to cope with multi-valued attributes, and (3) high sensitivity to incomplete and/or noisy learning examples. In this project we intend to:

generalize the approach to multi-valued attributes,
investigate heuristic methods to improve the efficiency of algorithms for
- finding sets A and B: in order to guide and bound the search, we will investigate various measures that are used in existing machine learning algorithms and which can be computed efficiently: informativity, Gini index, Relief, and MDL.
- development of functions G and H: the basic idea is to try sub-optimal, but efficient, algorithms for graph coloring,
propose robust measures of complexity of machine-developed models,
develop methods for the identification of redundant attributes and/or their values,
introduce the capability of dealing with continuous attributes based on interval logic and discretization of variables,
develop methods for learning from noisy examples based on:
- pre-processing of learning examples,
- gradual elimination of noise during the development of functions G and H.

2. Methods for supervised learning of models from examples

Supervised methods extend their automatic counterparts in the way that they can be interrupted in particular steps and interactively controlled by an expert. By this, the expert can possibly guide the development process towards a more suitable or more comprehensible structure, which may also improve the classification accuracy of the final model. We plan to provide the following controlling capabilities:

giving names to newly discovered model variables and their values,
guiding the selection of attributes in the sets A and B,
determining the number of discrete values that can be assigned to a newly discovered variable,
partial definition of the function H, taking into account the degrees of freedom left by the selection of A and B, and available learning examples.

3. Methods for the development of model operators and functions based on known model structure

The learning problem in this case is: given a structure of variables (sets A and B), find the functions G and H. This is a subproblem of the automatic model learning. Since the methods of this type are known and well-developed (ref. Samuel, Biermann, Shapiro), we will not develop them further in this project. However, we intend to use them within automatic and supervised learning algorithms, and also provide them as a part of an integral solution.

4. Methods for manual development and reorganization of hybrid qualitative and quantitative models

With these methods, we intend to provide an extended set of tools for manipulating hierarchical models in the stages of their initial development and later reorganization of their components. The main goal is to extend the expressive power of models based on:

integration of numerical and symbolic variables within a model,
using probabilistic and other types of distributions to represent uncertain attributes' values,
extension of operators available to the designer: in addition to common operators used in multi-attribute decision making (i.e., various types of utility functions), the operators will include decision trees and rules.

For this purpose, we propose to:

define a formal description language for such extended hierarchical models,
develop methods for supporting the designer in the tasks of:
- model structure development,
- acquisition of operators, and
- dynamic reorganization of models,
combine the above methods with the ones developed for automatic and supervised learning of models, and
implement a prototype object-oriented supporting system.

All the methods developed in the project will be evaluated on artificial and real-world problems. The latter will specifically include:

problems of structured induction (Shapiro), where we will try to reconstruct by automatic learning the original manually-developed models,
multi-attribute decision problems, already solved by DEX: again, the aim is to compare the reconstructed models with the original ones,
medical domains (diagnosis and prognosis, neurobiology): the developed models will be evaluated with respect to classification accuracy, comprehensibility, and relevance of the newly discovered concepts,
dynamic system control: in addition to the above criteria, the developed models will be evaluated with respect to the successfulness of their real-time control of a dynamic system.

Timescale

1997

generalization of function decomposition methods to multi-valued attributes,
definition of measures for estimating the complexity of hierarchical models,
investigation of heuristic methods in function decomposition in the stages of finding the sets A and B, and functions G and H,
design of control options available to the designer in supervised learning,
definition of static and dynamic methods for manual model development,
development of a formal language for the description of extended hierarchical models,
evaluation of the developed methods using some artificially-generated problems and realistic decision models developed by DEX.

1998

learning from noisy examples based on the pre-processing of examples,
identification of redundant attributes and/or their values in the learning process,
dealing with continuous input attributes,
dynamic reorganization of hierarchical models.
evaluation of these methods on realistic problems of dynamic systems control.

1999

development of methods for eliminating noise during the construction of functions G and H,
integration of the approaches to automatic, supervised and manual model development,
implementation of a prototype supporting computer program,
evaluation of the integral approach in the fields of medicine and neurobiology.

Evaluation Criteria

time complexity of the developed heuristic algorithms, which should be sufficiently low to facilitate learning in problem domains with about 20 attributes, and should cope with incomplete and/or noisy data.
quality of various model components, developed by fast heuristic algorithms, which should not differ substantiallty from the quality achieved by optimal but inefficient algorithms,
classification accuracy and comprehensibility of models developed by automatic and supervised methods,
relevance of newly discovered concepts,
difference between original manually-developed DEX models and the models reconstructed from examples by automatic and supervised methods,
the level of expressiveness and flexibility of hybrid models.

Project Team

Assist. Prof. Dr. Marko Bohanec, Principal Investigator
Prof. Dr. Ivan Bratko
Dr. Bojan Cestnik
Assoc. Prof. Dr. Matjaž Gams
Prof. Dr. Vladislav Rajkovič
Dr. Tanja Urbančič
Dr. Blaž Zupan