Ana Kostovska successfully defended her doctoral thesis

18/03/2025

Ana Kostovska successfully defended her doctoral thesis, titled "Representing and Exploiting Benchmarking Data for Optimisation and Learning".

Congratulations!

Abstract:

The rapid advancements in Machine Learning (ML) and Black-Box Optimization (BBO) have led to an increased reliance on benchmarking data for evaluating and comparing algorithms across tasks from diverse domains. However, the effective exploitation of this data is hindered by challenges such as syntactic variability, semantic ambiguity, and a lack of standardization. In this dissertation, we address these challenges by advocating for a formal semantic representation of benchmarking data through the use of ontologies. By providing standardized vocabularies and ontologies, we improve knowledge sharing and promote data interoperability across studies in ML and BBO.

In the ML domain, focusing on multi-label classification (MLC), we design an ontology-based framework for semantic annotation of benchmarking data, facilitating the creation of MLCBench, a semantic catalog that enhances data accessibility and reusability. In the BBO domain, we introduce the OPTION (OPTImization algorithm benchmarking ONtology) ontology to formally represent benchmarking data, including performance data, algorithm metadata, and problem landscapes. This ontology enables the automatic integration and interoperability of knowledge and data from diverse benchmarking studies.
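To give a flavour of what such semantic annotation can look like in practice, here is a minimal sketch using Python's rdflib that records a single benchmark run as RDF triples. The namespace, class, and property names are hypothetical placeholders for illustration only; the actual OPTION ontology defines its own IRIs and vocabulary.

```python
# Illustrative sketch: annotating one benchmark run with RDF triples.
# Namespace, class, and property names below are hypothetical placeholders,
# not the actual OPTION ontology vocabulary.
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import XSD

OPT = Namespace("http://example.org/option#")  # placeholder namespace

g = Graph()
run = OPT["run_001"]
g.add((run, RDF.type, OPT.BenchmarkRun))
g.add((run, OPT.hasAlgorithm, OPT["CMA-ES"]))
g.add((run, OPT.hasProblem, OPT["BBOB_f1_dim5"]))
g.add((run, OPT.hasBudget, Literal(10000, datatype=XSD.integer)))
g.add((run, OPT.reachedTarget, Literal(1e-8, datatype=XSD.double)))

print(g.serialize(format="turtle"))
```

Once benchmark runs are expressed as triples like these, data from different studies can be merged and queried uniformly, which is the interoperability the ontology is meant to provide.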

Building upon the semantically annotated benchmarking data, we conduct various empirical studies, including tasks such as algorithm performance prediction and automated algorithm selection (AAS). In the MLC domain, we propose a data-driven AAS pipeline that exploits the MLC benchmarking data. We evaluate the predictive power of dataset metafeatures for AAS and explore various ML approaches, including regression, classification, and pairwise methods, to identify the most effective one.
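As an illustration of the regression variant of such an AAS pipeline, the sketch below fits one performance model per algorithm on dataset metafeatures and selects the algorithm with the best predicted score. The data shapes, algorithm names, and the choice of a random forest are assumptions for the example, not the dissertation's actual setup.

```python
# Illustrative sketch of regression-based algorithm selection from metafeatures.
# Synthetic data; algorithm names and model choice are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
meta_features = rng.normal(size=(100, 20))   # 100 datasets x 20 metafeatures
performance = rng.uniform(size=(100, 3))     # score of 3 MLC algorithms (higher = better)
algorithms = ["RAkEL", "ML-kNN", "BR-RandomForest"]

# One regressor per algorithm, mapping metafeatures to expected performance.
models = [RandomForestRegressor(random_state=0).fit(meta_features, performance[:, i])
          for i in range(len(algorithms))]

def select_algorithm(x_meta):
    """Return the algorithm with the highest predicted performance."""
    preds = [m.predict(x_meta.reshape(1, -1))[0] for m in models]
    return algorithms[int(np.argmax(preds))]

print(select_algorithm(rng.normal(size=20)))
```

The classification and pairwise variants mentioned in the abstract replace the per-algorithm regressors with, respectively, a single classifier that predicts the best algorithm directly, or models that compare algorithms two at a time.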

In the BBO domain, we exploit benchmarking data about modular BBO algorithms to conduct a comprehensive analysis of how individual algorithm modules influence overall performance. We develop algorithm representations derived from performance and feature importance values, effectively linking algorithm behavior to problem landscape features. Using these representations, we also relate module configurations to performance, providing deeper insights into the impact of different modules on algorithm performance.
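A minimal sketch of the feature-importance idea, under assumed synthetic data: a regressor predicts a configuration's performance from problem landscape features, and its importance vector is then used as a representation of that configuration, so configurations can be compared by how they respond to the landscape.

```python
# Illustrative sketch: feature-importance-based representation of an algorithm
# configuration. All data is synthetic; the dissertation's exact setup may differ.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
landscape_features = rng.normal(size=(200, 16))  # 200 problem instances x 16 landscape features

def representation(performance_on_problems):
    """Fit a performance predictor and return its feature-importance vector."""
    model = RandomForestRegressor(random_state=0)
    model.fit(landscape_features, performance_on_problems)
    return model.feature_importances_            # length-16 vector describing the configuration

config_a = representation(rng.uniform(size=200))
config_b = representation(rng.uniform(size=200))
similarity = np.dot(config_a, config_b) / (np.linalg.norm(config_a) * np.linalg.norm(config_b))
print(f"cosine similarity between configurations: {similarity:.3f}")
```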

Furthermore, the semantically annotated benchmarking data on modular BBO algorithms serves as the backbone for creating various knowledge graphs (KGs). The KGs are then examined for their predictive power in algorithm performance prediction. By applying scoring-based KG embedding methods and graph neural networks, we predict algorithm performance in transductive and inductive setups, respectively.
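For intuition on the scoring-based embedding side, here is a small numpy sketch of TransE-style triple scoring in a transductive setting, where performance prediction amounts to scoring candidate triples. The entities, relation, and performance classes are hypothetical, and real experiments would train the embeddings with a dedicated KG embedding library rather than use random vectors.

```python
# Illustrative sketch of translational (TransE-style) triple scoring:
# a triple (head, relation, tail) is plausible when head + relation is close to tail.
# Embeddings here are random stand-ins; in practice they are learned from the KG.
import numpy as np

rng = np.random.default_rng(2)
dim = 32
entities = {"modCMA_cfg_17": rng.normal(size=dim),
            "perf_class_good": rng.normal(size=dim),
            "perf_class_poor": rng.normal(size=dim)}
relations = {"achievesPerformanceClass": rng.normal(size=dim)}

def transe_score(head, relation, tail):
    """Higher (less negative) score means a more plausible triple."""
    return -np.linalg.norm(entities[head] + relations[relation] - entities[tail])

for tail in ("perf_class_good", "perf_class_poor"):
    print(tail, transe_score("modCMA_cfg_17", "achievesPerformanceClass", tail))
```

Scoring-based embeddings like this only cover entities seen during training (the transductive setup); the graph neural networks mentioned above are what allow predictions for unseen configurations in the inductive setup.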

Overall, the contributions of this dissertation include the development of ontology-based frameworks for managing benchmarking data in the ML and BBO domains, the creation of semantic data catalogs, and novel methodologies for algorithm selection and performance prediction. By addressing challenges in the representation and exploitation of benchmarking data, this work advances both ML and BBO. It provides tools for improved data management and algorithm selection, as well as insights into algorithm behavior.