Machine learning (knowledge discovery) studies learning algorithms in general, while data mining uses a variety of mostly automatic processes to analyse large amounts of data. In these areas, we focus on inductive, relational and constraint-based methods (databases, inductive logic programs), meta-learning (combining classifiers), subgroup discovery and equation discovery. We have developed a series of systems for learning logic programs and various kinds of equations (polynomial algebraic, difference and partial differential equations) that learn both the structure and the parameter values of equations.
Contact:
Nada Lavrač,
Sašo Džeroski
Projects:
- H2020 RESILOC - Resilient Europe and Societies by Innovating Local Communities (Odporna evropska družba z inovativnimi lokalnimi skupnostmi)
Duration: 2019 - 2022 Contact: Aljaž Osojnik, Martin Žnidaršič Areas: Data mining and machine learning
- COST European Soil-Biology Data Warehouse for Soil Protection (Evropska zbirka talnih bioloških podatkov za varstvo tal)
Duration: 2019 - 2023 Contact: Marko Debeljak Areas: Decision support, Data mining and machine learning
- H2020 FNS-Cloud - Food Nutrition Security Cloud (Računalniški oblak in storitve za obdelavo podatkov iz področja ved o hrani, prehrani in varnosti)
Duration: 2019 - 2023 Contact: Nada Lavrač Areas: Data mining and machine learning
- Basic Research Project: SESAME - Automating the Synthesis and Analysis of Scientific Models (Avtomatizirana sinteza in analiza znanstvenih modelov)
Duration: 2019 - 2022 Contact: Sašo Džeroski Areas: Data mining and machine learning
- H2020 TAILOR - Foundations of Trustworthy AI Integrating Learning, Optimisation and Reasoning (Temelji umetne inteligence vredne zaupanja, vključno z učenjem, optimizacijo in sklepanjem)
Duration: 2020 - 2023 Contact: Sašo Džeroski Areas: Data mining and machine learning
- Basic Research Project: FORMICA 2 - Quantitative and qualitative analysis of the unregulated corporate financial reporting (Kvantitativna in kvalitativna analiza nereguliranih delov finančnega poročanja podjetij)
Duration: 2020 - 2023 Contact: Senja Pollak, Martin Žnidaršič Areas: Human language technologies, Data mining and machine learning
- Basic Research Project: Predictive clustering on data streams (Napovedno razvrščanje na podatkovnih tokovih)
Duration: 2020 - 2023 Contact: Sašo Džeroski Areas: Data mining and machine learning
- CRP Clinical course and outcome of Covid-19 (Klinični potek in izid Covid-19)
Duration: 2020 - 2022 Contact: Sašo Džeroski, Nada Lavrač Areas: Data mining and machine learning
- Basic Research Project: CRISPR/CAS9-mediated targeted mutagenesis for resistance of grapevine and potato against phytoplasmas (Ciljana mutageneza s CRISPR/CAS9 za odpornost vinske trte in krompirja proti fitoplazmam)
Duration: 2020 - 2023 Contact: Nada Lavrač Areas: Data mining and machine learning
- Basic Research Project: Determining the origin of liver metastases from liquid biopsy (Določanje izvora jetrnih zasevkov iz tekočinskih biopsij)
Duration: 2021 - 2024 Contact: Sašo Džeroski Areas: Data mining and machine learning
- Applied Research Project: Application of single cell sequencing and machine learning in mammary gland biology (Aplikacija sekvenciranja posameznih celic in strojnega učenja v biologiji mlečnih celic)
Duration: 2021 - 2024 Contact: Sašo Džeroski Areas: Data mining and machine learning
- Basic Research Project: Innovative isotopic techniques for identification of sources and biogeochemical cycling of mercury in contaminated sites - IsoCont (Inovativne izotopske tehnike za identifikacijo virov in biogeokemijskega kroženja živega srebra na kontaminiranih območjih - IsoCont)
Duration: 2021 - 2024 Contact: Sašo Džeroski Areas: Data mining and machine learning
- Basic Research Project: Intelligent inference system for biological discoveries and its application to cancer research (Inteligentni sistem sklepanja za biološka odkritja in njegova uporaba pri raziskavah raka)
Duration: 2022 - 2024 Contact: Sašo Džeroski Areas: Data mining and machine learning
- Research Programme: KT - Knowledge Technologies (Tehnologije znanja)
Duration: 2022 - 2027 Contact: Sašo Džeroski Areas: Data mining and machine learning, Decision support, Human language technologies
- HE PARC - Partnership for the Assessment of Risks from Chemicals (Partnerstvo za oceno tveganj zaradi kemikalij)
Duration: 2022 - 2029 Contact: Sašo Džeroski, Panče Panov Areas: Data mining and machine learning
Software:
- CIPER - Constrained Inductive Polynomial Equation for Regression
Regression methods aim at inducing models of numeric data. While most state-of-the-art machine learning methods for regression focus on inducing piecewise regression models (regression and model trees), we investigate the predictive performance of regression models based on polynomial equations. We present Ciper, an efficient method for inducing polynomial equations, and empirically evaluate its predictive performance on standard regression tasks.
- Lagrange/Lagramge
Lagrange and Lagramge are programs for inducing algebraic and ordinary differential equations from observational data. While Lagrange takes a completely data-driven approach to inducing equations, Lagramge allows for knowledge-driven induction, where the user can tailor the space of candidate equation structures according to background knowledge from the domain of interest.
- MLC4.5 and MLJ4.8
Learn to combine classifiers with meta decision trees.
- LINUS
ILP learning of constrained logic programs.
- RSD
Relational Subgroup Discovery through first-order feature construction. The source code of the system, written in Yap Prolog, is available for download, together with samples and a user manual.
- SEGS
SEGS (Search for Enriched Gene Sets) is a web tool for the descriptive analysis of microarray data. The analysis is performed by searching for descriptions of gene sets that are statistically significantly over- or under-expressed between different scenarios in genome-scale experiments (DNA microarrays).
- CLUS
Clus is a decision tree and rule induction system that implements the predictive clustering framework. This framework unifies unsupervised clustering and predictive modeling and allows for a natural extension to more complex prediction settings such as multi-task learning and multi-label classification. While most decision tree learners induce classification or regression trees, Clus generalizes this approach by learning trees that are interpreted as cluster hierarchies. We call such trees predictive clustering trees, or PCTs. Depending on the learning task at hand, different criteria are optimized when creating the clusters, and different heuristics are suitable for achieving this.
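For Ciper, which induces polynomial equations, the following minimal sketch shows the general idea of learning both the structure and the coefficients of a polynomial. It is not Ciper's actual algorithm: it assumes a plain greedy forward search over monomials up to a maximum degree, ordinary least squares for the coefficients, and a simple score (sum of squared errors plus a fixed per-term penalty) standing in for Ciper's own heuristics. The names `greedy_polynomial` and `penalty` are hypothetical.

```python
from itertools import combinations_with_replacement

def monomials(n_vars, max_degree):
    """All monomials up to max_degree, each a tuple of variable indices; () is the constant."""
    terms = []
    for degree in range(max_degree + 1):
        terms.extend(combinations_with_replacement(range(n_vars), degree))
    return terms

def eval_term(term, x):
    """Value of a monomial at point x, e.g. (0, 1) -> x[0] * x[1]."""
    value = 1.0
    for i in term:
        value *= x[i]
    return value

def solve(a, b):
    """Solve the square system a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(structure, xs, ys):
    """Least-squares coefficients for a fixed structure (normal equations); returns (coefs, SSE)."""
    rows = [[eval_term(t, x) for t in structure] for x in xs]
    k = len(structure)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coefs = solve(ata, aty)
    sse = sum((sum(c * v for c, v in zip(coefs, r)) - y) ** 2 for r, y in zip(rows, ys))
    return coefs, sse

def greedy_polynomial(xs, ys, max_degree=2, penalty=1e-3):
    """Add one monomial at a time while SSE + penalty * (number of terms) decreases."""
    candidates = monomials(len(xs[0]), max_degree)
    structure = [()]  # start from the constant term only
    coefs, sse = fit(structure, xs, ys)
    best_score = sse + penalty * len(structure)
    while True:
        best_trial = None
        for term in candidates:
            if term in structure:
                continue
            trial = structure + [term]
            try:
                trial_coefs, trial_sse = fit(trial, xs, ys)
            except ValueError:
                continue  # term is collinear with the current terms on this data
            score = trial_sse + penalty * len(trial)
            if score < best_score:
                best_score, best_trial = score, (trial, trial_coefs)
        if best_trial is None:
            return structure, coefs
        structure, coefs = best_trial
```

On data generated from y = 2 + 3x² for x = -3..3, the search should keep only the constant and the x² term (structure `[(), (0, 0)]`) with coefficients close to 2 and 3. The penalty is what stops the search from adding further terms once the fit is already exact.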
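For Lagramge, the knowledge-driven part is that the space of candidate equation structures is restricted by the user. The sketch below assumes that this restriction is expressed as a small context-free grammar and enumerates all structures derivable within a depth bound. The grammar, the variables `S` and `I`, and the `const` placeholder are hypothetical; fitting the constants to data is omitted, and this is not Lagramge's actual grammar format.

```python
from itertools import product

# Hypothetical domain knowledge: the right-hand side of an equation is a sum of
# terms, each a constant times one or two of the variables S and I.
GRAMMAR = {
    "E": [["E", "+", "F"], ["F"]],
    "F": [["const", "*", "V"], ["const", "*", "V", "*", "V"]],
    "V": [["S"], ["I"]],
}

def expand(symbol, depth):
    """All token sequences derivable from `symbol` with at most `depth` nested rule applications."""
    if symbol not in GRAMMAR:
        return [(symbol,)]  # terminal symbol
    if depth == 0:
        return []
    results = []
    for production in GRAMMAR[symbol]:
        options = [expand(s, depth - 1) for s in production]
        for combo in product(*options):
            results.append(tuple(token for part in combo for token in part))
    return results

for structure in expand("E", 3):
    print(" ".join(structure))
```

With depth 3 this prints six single-term structures, from `const * S` to `const * I * I`; depth 4 also admits sums of two terms (42 structures in total). A practical enumerator would additionally remove equivalent structures such as `const * S * I` and `const * I * S` before fitting.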
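For MLC4.5 and MLJ4.8, the idea of a meta decision tree is that its leaves do not predict a class directly but name the base classifier whose prediction should be used, based on properties of the base classifiers' class probability distributions. The sketch below is a deliberately reduced version under stated assumptions: two hypothetical base classifiers A and B, a single meta-attribute (the highest class probability given by A), and a one-split tree whose threshold is chosen to maximise accuracy. A real learner would grow a full tree over several such meta-attributes.

```python
def max_prob(dist):
    """Meta-attribute: the highest class probability a base classifier assigns."""
    return max(dist.values())

def predict(dist):
    """Class with the highest probability in a base classifier's output."""
    return max(dist, key=dist.get)

def learn_mdt_stump(meta_examples):
    """Learn 'if max_prob(A) >= t then use A else use B', choosing t to maximise accuracy.

    meta_examples: list of (distribution from A, distribution from B, true class).
    """
    thresholds = sorted({max_prob(a) for a, _, _ in meta_examples}) + [float("inf")]

    def accuracy(t):
        return sum(predict(a if max_prob(a) >= t else b) == y for a, b, y in meta_examples)

    return max(thresholds, key=accuracy)

def mdt_classify(t, dist_a, dist_b):
    """Each leaf of the stump names a base classifier; that classifier makes the prediction."""
    return predict(dist_a if max_prob(dist_a) >= t else dist_b)

meta_examples = [
    ({"x": 0.9, "y": 0.1}, {"x": 0.2, "y": 0.8}, "x"),
    ({"x": 0.55, "y": 0.45}, {"x": 0.3, "y": 0.7}, "y"),
    ({"x": 0.15, "y": 0.85}, {"x": 0.6, "y": 0.4}, "y"),
    ({"x": 0.6, "y": 0.4}, {"x": 0.1, "y": 0.9}, "y"),
]
threshold = learn_mdt_stump(meta_examples)
```

In this toy data A is right whenever it is confident and wrong otherwise, so the learned threshold (0.85) sends A's uncertain cases to B and classifies all four meta-examples correctly.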
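For LINUS, the general approach of turning an ILP problem into an attribute-value one can be illustrated on a classic family example: background predicates are applied to the arguments of the target relation, each resulting literal becomes a Boolean attribute, a propositional learner is run on the table, and the result is read back as a clause. The sketch assumes hypothetical facts and replaces the attribute-value learners LINUS actually calls with a tiny brute-force conjunction search. Because no new variables are introduced, every body variable also appears in the head, which is the kind of restriction the term "constrained" refers to here.

```python
from itertools import combinations, product

# Hypothetical background knowledge for learning daughter(X, Y)
PARENT = {("ann", "mary"), ("ann", "tom"), ("tom", "eve")}
FEMALE = {"ann", "mary", "eve"}

def linus_features(x, y):
    """Apply each background predicate to the target's arguments X and Y (no new variables)."""
    args = (("X", x), ("Y", y))
    feats = {f"female({n})": v in FEMALE for n, v in args}
    for (n1, v1), (n2, v2) in product(args, repeat=2):
        feats[f"parent({n1},{n2})"] = (v1, v2) in PARENT
    feats["X=Y"] = x == y
    return feats

def learn_conjunction(table, max_len=2):
    """Tiny attribute-value learner: first conjunction of features consistent with all examples."""
    names = sorted(table[0][0])
    for k in range(1, max_len + 1):
        for conj in combinations(names, k):
            if all(all(f[n] for n in conj) == label for f, label in table):
                return conj
    return None

examples = [(("mary", "ann"), True), (("eve", "tom"), True),
            (("tom", "ann"), False), (("eve", "ann"), False)]
table = [(linus_features(x, y), label) for (x, y), label in examples]
body = learn_conjunction(table)
print("daughter(X,Y) :- " + ", ".join(body) + ".")
```

On these facts the search finds the body `female(X), parent(Y,X)`, so the printed clause is `daughter(X,Y) :- female(X), parent(Y,X).`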
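For RSD, once first-order features have been constructed, each example becomes a row of Boolean feature values and subgroups can be scored propositionally. The sketch below uses weighted relative accuracy, WRAcc = p(Cond) · (p(Class | Cond) − p(Class)), a common subgroup quality measure; we do not claim it is exactly the heuristic RSD uses. The features `f1` and `f2` are hypothetical stand-ins for constructed first-order features.

```python
def wracc(examples, feature, target):
    """Weighted relative accuracy of the subgroup 'feature is true' for class `target`."""
    n = len(examples)
    covered = [label for feats, label in examples if feats[feature]]
    if not covered:
        return 0.0
    p_cond = len(covered) / n
    p_class = sum(label == target for _, label in examples) / n
    p_class_given_cond = sum(label == target for label in covered) / len(covered)
    return p_cond * (p_class_given_cond - p_class)

# Hypothetical examples described by two constructed features
examples = [
    ({"f1": True, "f2": True}, "pos"), ({"f1": True, "f2": False}, "pos"),
    ({"f1": True, "f2": False}, "pos"), ({"f1": True, "f2": False}, "neg"),
    ({"f1": False, "f2": False}, "pos"), ({"f1": False, "f2": False}, "neg"),
    ({"f1": False, "f2": False}, "neg"), ({"f1": False, "f2": False}, "neg"),
]
best = max(["f1", "f2"], key=lambda f: wracc(examples, f, "pos"))
```

Here `f2` is more precise (its single covered example is positive) but `f1` covers half of the data with a clearly raised share of positives, so WRAcc prefers `f1` (0.125 versus 0.0625). This trade-off between coverage and distributional unusualness is what makes WRAcc suitable for subgroup discovery.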
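For SEGS, deciding whether a described gene set is over-represented among the selected (for example, differentially expressed) genes requires a statistical test. One common choice, shown here as an illustration rather than as SEGS's exact procedure, is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The function name and arguments are hypothetical; testing for under-representation would use the lower tail instead.

```python
from math import comb

def enrichment_p_value(n_genes, n_selected, set_size, overlap):
    """One-sided hypergeometric test for over-representation.

    Returns P(X >= overlap), where X counts how many of `n_selected` genes drawn
    without replacement from `n_genes` fall into a gene set of size `set_size`.
    """
    upper = min(set_size, n_selected)
    favourable = sum(comb(set_size, k) * comb(n_genes - set_size, n_selected - k)
                     for k in range(overlap, upper + 1))
    return favourable / comb(n_genes, n_selected)
```

For instance, with 10 genes, 3 selected and a gene set of size 3, observing all 3 selected genes inside the set has probability 1/120 under random selection. In a real analysis over many candidate gene-set descriptions, the resulting p-values would also need a multiple-testing correction.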
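For Clus, a predictive clustering tree chooses splits that make the resulting clusters as homogeneous as possible with respect to all targets at once. The sketch below shows one such heuristic for multi-target regression: it picks the threshold on a single numeric attribute that maximises the weighted reduction of the sum of per-target variances. The unweighted sum over targets is an assumption of this sketch; Clus's actual heuristics depend on the task and may, for example, normalise the targets.

```python
def impurity(targets):
    """Sum of per-target variances of a list of target vectors (multi-target regression)."""
    n = len(targets)
    if n == 0:
        return 0.0
    total = 0.0
    for j in range(len(targets[0])):
        mean = sum(t[j] for t in targets) / n
        total += sum((t[j] - mean) ** 2 for t in targets) / n
    return total

def best_split(xs, targets):
    """Threshold on one numeric attribute that maximises the weighted impurity reduction."""
    n = len(targets)
    base = impurity(targets)
    best_threshold, best_reduction = None, 0.0
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, targets) if x <= t]
        right = [y for x, y in zip(xs, targets) if x > t]
        reduction = base - len(left) / n * impurity(left) - len(right) / n * impurity(right)
        if reduction > best_reduction:
            best_threshold, best_reduction = t, reduction
    return best_threshold, best_reduction
```

With attribute values 1, 2, 3, 4 and target vectors (0, 0), (0, 0), (1, 1), (1, 1), the best threshold is 2: it separates the two clusters perfectly and removes all of the impurity (0.5). Applied recursively, such splits yield a cluster hierarchy whose leaves can predict, for instance, the mean target vector of their examples.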
Text mining, which aims at extracting useful information from document collections, is a well-developed field of computer science, driven by the growth of document collections available in corporate and governmental environments and especially on the Web. In many real-life scenarios, documents are also available in information networks. Examples of such networks include multimedia repositories (containing multimedia descriptions, subtitles, slide titles, etc.), social networks of professionals (containing CVs), citation networks (containing publications), and even software code (heterogeneously interlinked software artifacts containing code comments). The abundance of such document-enriched networks motivates the development of new methodologies that join the two worlds, text mining and mining of heterogeneous information networks, and handle both types of data in a common data mining framework. Handling vast document streams is a relatively new challenge, arising mainly from the self-publishing activities of Web users (e.g., blogging, tweeting, and participating in discussion forums). Furthermore, news streams (e.g., Dow Jones, BusinessWire, Bloomberg, Reuters) are growing in number and rate, making it impossible for users to systematically follow the topics of their interest. One of the challenges is thus to investigate techniques for online data mining, machine learning, and sentiment analysis that support decision making in near-real time over vast amounts of constantly evolving data.
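One simple way to learn from a constantly evolving document stream is an incremental classifier that updates its statistics with each new document instead of being retrained from scratch. The sketch below uses multinomial Naive Bayes with Laplace smoothing as an illustrative choice for sentiment classification; it is not a method named above, and the whitespace tokenisation and the tiny labelled stream are assumptions.

```python
from collections import Counter, defaultdict
from math import log

class OnlineNaiveBayes:
    """Multinomial Naive Bayes whose counts are updated one document at a time."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()

    def update(self, tokens, label):
        """Incorporate one labelled document from the stream; no retraining pass is needed."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        """Most probable label under Laplace-smoothed multinomial Naive Bayes."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)

        def log_score(label):
            counts = self.word_counts[label]
            n_tokens = sum(counts.values())
            score = log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += log((counts[token] + 1) / (n_tokens + vocab_size))
            return score

        return max(self.doc_counts, key=log_score)

stream = [("great product love it", "pos"), ("terrible awful", "neg"),
          ("love this", "pos"), ("awful service", "neg")]
model = OnlineNaiveBayes()
for text, label in stream:
    model.update(text.lower().split(), label)
```

After these four updates the model labels `["love"]` as `pos` and `["awful"]` as `neg`. For high-rate streams one would additionally cache the per-label token totals and possibly decay old counts so that the model follows topic drift.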
Contact:
Miha Grčar,
Igor Mozetič
Software:
- MLC4.5 and MLJ4.8
Learn to combine classifiers with meta decision trees.
- RSD
Relational Subgroup Discovery through first-order feature construction. The source code of the system, written in Yap Prolog, is available for download, together with samples and a user manual.
- SEGS
SEGS (Search for Enriched Gene Sets) is a web tool for the descriptive analysis of microarray data. The analysis is performed by searching for descriptions of gene sets that are statistically significantly over- or under-expressed between different scenarios in genome-scale experiments (DNA microarrays).
Most of the information humans deal with consists of text, and Human Language Technologies enable computers to help us exploit and manage this information. Texts, in whatever language, need to be processed in various ways, from ensuring uniform encoding to complex linguistic analyses such as assigning syntactic and semantic structure. Such methods find application in text mining, machine translation, search engines, exploratory tools for linguists and lexicographers, digital publishing, etc. In this research area, the department develops general methods for text processing and mark-up, with a special focus on the Slovene language. We are especially concerned with the production of standardised and available language resources, such as annotated monolingual and multilingual corpora, lexica, and complex digital editions, e.g. of Slovenian literature (ZRC eLibrary). While such resources can be used directly for language study, they are for the most part targeted towards machine learning programs that automatically induce various language models from them.
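"Ensuring uniform encoding" matters in practice for Slovene: a letter such as č can be stored either as one precomposed character (U+010D) or as c followed by a combining caron (U+030C), and the two look identical but compare as different strings. The sketch below normalises text to Unicode NFC before a naive regular-expression tokenisation; the function names and the example sentence are illustrative assumptions, not the department's tools.

```python
import re
import unicodedata

def normalize_text(text):
    """Map canonically equivalent spellings (c + combining caron vs. precomposed c-caron) to NFC."""
    return unicodedata.normalize("NFC", text)

def tokenize(text):
    """Naive word tokenizer applied after normalization."""
    # For str patterns, \w is Unicode-aware in Python 3, so precomposed letters such as
    # U+010D count as word characters; a separate combining mark may not, which is
    # why the text is normalized first.
    return re.findall(r"\w+", normalize_text(text))

decomposed = "c\u030cas je zlato."  # "čas je zlato." typed with a combining caron
tokens = tokenize(decomposed)
```

After normalisation the first token is the single three-character word `čas`, so lookups in a lexicon or corpus index behave the same regardless of how the input was typed.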
Contact:
Tomaž Erjavec
Projects:
- Basic Research Project: The linguistic landscape of hate speech on social media (Jezikovna krajina sovražnega govora na družbenih omrežjih)
Duration: 2019 - 2023 Contact: Tomaž Erjavec Areas: Human language technologies
- RSDO - Development of Slovene in a Digital Environment (Razvoj slovenščine v digitalnem okolju)
Duration: 2020 - 2022 Contact: Tomaž Erjavec Areas: Human language technologies
- Basic Research Project: FORMICA 2 - Quantitative and qualitative analysis of the unregulated corporate financial reporting (Kvantitativna in kvalitativna analiza nereguliranih delov finančnega poročanja podjetij)
Duration: 2020 - 2023 Contact: Senja Pollak, Martin Žnidaršič Areas: Human language technologies, Data mining and machine learning
- Basic Research Project: CANDAS - Computer-assisted multilingual news discourse analysis with contextual embeddings (Računalniško podprta večjezična analiza novičarskega diskurza s kontekstualnimi besednimi vložitvami)
Duration: 2020 - 2023 Contact: Senja Pollak Areas: Human language technologies
- Basic Research Project: Tradition and Innovation: Traditional Paremiological Units in Dialogue with Contemporary Use (Tradicionalne paremiološke enote v dialogu s sodobno rabo)
Duration: 2020 - 2023 Contact: Tomaž Erjavec Areas: Human language technologies
- CEF MaCoCu - Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages (Obsežno zbiranje in kuriranje eno- in dvojezičnih podatkov s poudarkom na manj podprtih jezikih)
Duration: 2021 - 2023 Contact: Nikola Ljubešić Areas: Human language technologies
- Basic Research Project: SOVRAG - Hate speech in contemporary conceptualizations of nationalism, racism, gender and migration (Sovražni govor v sodobnih konceptualizacijah nacionalizma, rasizma, spola in migracij)
Duration: 2021 - 2024 Contact: Senja Pollak Areas: Human language technologies
- Basic Research Project: Formant combinatorics in Slovenian (Kombinatorika besedotvornih obrazil v slovenščini)
Duration: 2021 - 2024 Contact: Senja Pollak, Tomaž Erjavec Areas: Human language technologies
- ParlaMint II - Towards Comparable Parliamentary Corpora (K primerljivim parlamentarnim korpusom)
Duration: 2021 - 2023 Contact: Tomaž Erjavec, Nikola Ljubešić Areas: Human language technologies
- Research Programme: KT - Knowledge Technologies (Tehnologije znanja)
Duration: 2022 - 2027 Contact: Sašo Džeroski Areas: Data mining and machine learning, Decision support, Human language technologies
- H2020 RobaCOFI - Robust and adaptable comment filtering (Robustno in prilagodljivo filtriranje komentarjev)
Duration: 2022 - 2023 Contact: Senja Pollak, Matthew Richard John Purver Areas: Human language technologies
Decision Support (DS) aims to provide computational support to (groups of) people faced with difficult decisions. DS provides a rich collection of decision analysis, simulation, optimization and modeling techniques, including hierarchical multi-attribute models, decision trees, influence diagrams and belief networks. DS also involves software tools such as decision support systems, group decision support and mediation systems. We have developed a series of decision models and support systems, focusing on qualitative, multi-attribute decision making and models of uncertainty, necessary for capturing realistic aspects of complex decision problems. We continue to develop and expand our main software tool, DEXi.
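A qualitative, hierarchical multi-attribute model in the DEX style can be pictured as a tree of attributes with small discrete scales, where each aggregate attribute is defined by decision rules that map every combination of its children's values to one of its own values. The sketch below evaluates such a model bottom-up and checks that the rule tables are complete. The attribute names, scales and rules are hypothetical, and the code is not DEXi's model format or API.

```python
from itertools import product

# Hypothetical qualitative scales, ordered from worst to best
SCALES = {
    "PRICE": ["high", "medium", "low"],
    "SAFETY": ["poor", "acceptable", "good"],
    "COMFORT": ["low", "high"],
    "TECH": ["bad", "acc", "good"],
    "CAR": ["unacc", "acc", "good"],
}

# Aggregate attribute -> (child attributes, decision rules: child values -> value)
MODEL = {
    "TECH": (("SAFETY", "COMFORT"), {
        ("poor", "low"): "bad", ("poor", "high"): "bad",
        ("acceptable", "low"): "acc", ("acceptable", "high"): "good",
        ("good", "low"): "good", ("good", "high"): "good",
    }),
    "CAR": (("PRICE", "TECH"), {
        ("high", "bad"): "unacc", ("high", "acc"): "unacc", ("high", "good"): "acc",
        ("medium", "bad"): "unacc", ("medium", "acc"): "acc", ("medium", "good"): "good",
        ("low", "bad"): "unacc", ("low", "acc"): "acc", ("low", "good"): "good",
    }),
}

def evaluate(attribute, inputs):
    """Bottom-up evaluation: basic attributes come from inputs, aggregates from their rules."""
    if attribute not in MODEL:
        value = inputs[attribute]
        if value not in SCALES[attribute]:
            raise ValueError(f"{value!r} is not on the scale of {attribute}")
        return value
    children, rules = MODEL[attribute]
    return rules[tuple(evaluate(child, inputs) for child in children)]

def rules_complete():
    """Check that every aggregate defines a value for every combination of child values."""
    return all(set(product(*(SCALES[c] for c in children))) == set(rules)
               for children, rules in MODEL.values())
```

For example, a medium-priced option with good safety and low comfort evaluates to TECH = good and therefore CAR = good, while poor safety makes TECH bad and the whole option unacceptable. Keeping the rules as explicit tables makes every step of the evaluation easy to inspect and explain.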
Contact:
Marko Bohanec,
Martin Žnidaršič
Software:
- DEXi (DEX for Instruction)
An educational computer program for qualitative decision modelling (developed within the Slovenian Ro (Computer Literacy) programme, 1999-2000)
- proDEX
proDEX is a tool for qualitative multi-attribute modelling in basic and extended DEX methodology.
- GMOtrack
GMOtrack is a program that supports traceability of genetically modified organisms.
Given a table of GMOs (along with the probabilities of their presence and the
genetic elements present in their genome) GMOtrack computes the optimal set of
screening assays for a two-phase testing strategy.
- ECOGEN Soil Quality Index
ESQI is a qualitative multi-attribute model, developed within the ECOGEN project, that calculates an index of soil quality relative to a selected standard soil condition ("medium" value of attributes). The model is implemented in a server-side script, and accessed through an interactive Web page.
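For GMOtrack, the kind of optimisation involved can be illustrated with a small brute-force search. The sketch below is not GMOtrack's cost model or algorithm; it assumes that GMOs are present independently with the given probabilities, that screening assays are error-free (positive exactly when some present GMO carries the element), that phase two runs an identification assay for every GMO not ruled out in phase one, and that all screening and all identification assays have fixed unit costs. The element labels in the usage example are illustrative.

```python
from itertools import combinations, product

def expected_cost(selected, gmos, screen_cost, id_cost):
    """Expected cost of a two-phase strategy under the simplifying assumptions above.

    gmos maps a GMO name to (probability of presence, set of genetic elements).
    """
    names = list(gmos)
    screened = set(selected)
    expected_ids = 0.0
    # Enumerate every presence pattern of the GMOs together with its probability
    for presence in product((False, True), repeat=len(names)):
        prob = 1.0
        present_elements = set()
        for name, is_present in zip(names, presence):
            p, elements = gmos[name]
            prob *= p if is_present else 1 - p
            if is_present:
                present_elements |= elements
        positive = screened & present_elements
        # A GMO stays a candidate only if all of its screened elements tested positive
        candidates = sum(1 for name in names if (gmos[name][1] & screened) <= positive)
        expected_ids += prob * candidates
    return screen_cost * len(screened) + id_cost * expected_ids

def best_screening_set(gmos, screen_cost, id_cost):
    """Brute force over all subsets of genetic elements (feasible only for small tables)."""
    elements = sorted(set().union(*(elems for _, elems in gmos.values())))
    best = None
    for k in range(len(elements) + 1):
        for selected in combinations(elements, k):
            cost = expected_cost(selected, gmos, screen_cost, id_cost)
            if best is None or cost < best[1]:
                best = (selected, cost)
    return best

gmos = {"A": (0.1, {"p35S"}), "B": (0.1, {"p35S", "pFMV"})}
best = best_screening_set(gmos, screen_cost=1.0, id_cost=10.0)
```

In this toy table, screening only for `p35S` (expected cost 4.8) beats screening for both elements (4.9): the second assay costs more than the identification tests it is expected to save. The exhaustive search grows exponentially with the number of GMOs and elements, so realistic tables call for a more efficient optimisation.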