Jure Brence successfully defended his master’s degree


Jure Brence successfully defended his master’s degree titled Probabilistic grammar-based equation discovery.




In this thesis, we introduce novel methods for equation discovery (ED), based on the use of probabilistic grammars. ED and symbolic regression address the task of finding a symbolic mathematical model that best describes observed data. Models can be as simple as an algebraic equation or as complex as a system of differential equations. Traditionally, domain experts derive equations based on theory and use regression methods to estimate their parameters. ED methods seek to automate the identification of equation structure as well as its parameters. The advantage of discovering closed-form equations over black-box models, popular in machine learning, lies in their inherent interpretability and correspondence with domain theory.

Our methods focus on the use of probabilistic context-free grammars (PCFGs) as a tool for generating mathematical expressions, constraining the space of expressions and encoding background knowledge. We demonstrate that PCFGs parametrize the parsimony principle inherently and intuitively. Furthermore, in addition to the hard constraints imposed by CFG, PCFGs allow us to impose soft constraints on the search space of mathematical expressions. To aid analysis, we introduce a novel method for visualizing the search space of expressions, useful for any ED approach. We introduce a Monte-Carlo algorithm that enables the use of PCFGs in ED and perform extensive computational experiments using an established benchmark database. The results demonstrate that our approach can be used to discover equations, but performs worse than existing methods.

To improve the performance of ED, we introduce dimensional attribute grammars, an extension of PCFGs, that generate only dimensionally consistent mathematical expressions. Our computational experiments demonstrate the impact of dimensional consistency in ED, resulting in performance, comparable to state-of-the-art approaches in the field.

We further extend the ideas of attribute grammars into a general-purpose framework for encoding background knowledge. The framework relies on probabilistic attribute grammars to overcome the limitations of PCFGs in expressing complex types of background knowledge. We demonstrate the utility of the framework by designing and analyzing grammars that encode three different types of background knowledge: dimensional consistency, systems of differential equations for chemical kinetics, and systems of differential equations describing electronic circuits.