Formant combinatorics in Slovenian
Kombinatorika besedotvornih obrazil v slovenščini

Slovenian, like other Slavic languages, has an extremely rich morphemic word structure, which results from multistage word formation. For example, the adjective mlad ‘young’ yields the noun mladost ‘youth’ in the first stage; mladost yields the adjective mladosten ‘youthful’ in the second; mladosten yields the noun mladostnik ‘adolescent’ in the third; and mladostnik yields the possessive adjective mladostnikov ‘adolescent’s’ in the fourth. This example shows the compatibility of four suffixal formants: -ost + -en + -ik + -ov. Compatibility of formants is understood here as the ability of different word-formation morphemes to coexist within multistage formation, taking into account the semantic-extension aspect.
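The chain in this example can be written down as data, which makes the notion of formant compatibility concrete. The following is a minimal sketch only, not the project's method: the names `CHAIN`, `formant_sequence`, `adjacent_pairs`, and `count_pairs` are hypothetical, and the formants are labelled by hand from the example above rather than segmented automatically.

```python
from collections import Counter

# Derivation chain from the example in the text. Each stage pairs the
# derived word with the suffixal formant the text attributes to that stage.
# The labels are hand-assigned for illustration (an assumption of this sketch).
BASE = "mlad"
CHAIN = [
    ("mladost", "-ost"),       # stage 1: noun 'youth'
    ("mladosten", "-en"),      # stage 2: adjective 'youthful'
    ("mladostnik", "-ik"),     # stage 3: noun 'adolescent'
    ("mladostnikov", "-ov"),   # stage 4: possessive adjective
]


def formant_sequence(chain):
    """Return the formants of a chain in derivation order."""
    return [formant for _, formant in chain]


def adjacent_pairs(formants):
    """Return (earlier, later) pairs of formants from consecutive stages."""
    return list(zip(formants, formants[1:]))


def count_pairs(chains):
    """Count how often each adjacent formant pair occurs across chains."""
    counts = Counter()
    for chain in chains:
        counts.update(adjacent_pairs(formant_sequence(chain)))
    return counts


sequence = formant_sequence(CHAIN)
pairs = adjacent_pairs(sequence)
pair_counts = count_pairs([CHAIN])

print(" + ".join(sequence))  # -ost + -en + -ik + -ov
print(pairs)
```

With many chains instead of one, a table like `pair_counts` would show which formant pairs are attested in consecutive stages and which are not; a single chain only demonstrates the representation.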

The proposed project analyzes and describes the compatibility of word-formation morphemes (formants) within multistage formation (mlad → mladost → mladosten → mladostnik → mladostnikov). It will thereby establish morphotactics as a new research field, one that does not yet exist in Slovenian linguistics. By determining (a) the systemic predictability of formation in terms of the compatibility of suffixal formants and (b) its limitations, the analysis will make it possible to present the word-formation and semantic-extension mechanisms of Slovenian. It will draw on contemporary language material, including all contemporary Slovenian dictionaries and corpora, and integrate state-of-the-art research methods in linguistics and language technologies (including deep learning). In so doing, we will perform the first comprehensive analysis of Slovenian word formation in 70 years.

The language technology objective of the project is to compile the first training set and build the first language technology application for automatic morpheme segmentation of Slovenian words. This is of key importance for the development of semantic language resources and language technologies for Slovenian, and it is important for linguistics as well.