Our department members will participate in the LREC-COLING 2024 conference


We are excited to share that our department will be very well represented at LREC-COLING 2024, the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, which takes place next week in Italy. We will present 10 papers at the main conference and 4 papers at the workshops. Looking forward to it!

Papers at the main conference:

  • Špela Arhar Holdt, Tomaž Erjavec, Iztok Kosem and Elena Volodina. Towards an Ideal Tool for Learner Error Annotation.
  • Špela Arhar Holdt, Jaka Čibej, Kaja Dobrovoljc, Tomaž Erjavec, Polona Gantar, Simon Krek, Tina Munda, Nejc Robida, Luka Terčon and Slavko Žitnik. SUK 1.0: A New Training Corpus for Linguistic Annotation of Modern Standard Slovene.
  • Jaya Caporusso, Damar Hoogland, Mojca Brglez, Boshko Koloski, Matthew Purver and Senja Pollak. A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media.
  • Filip Dobranić, Bojan Evkoski and Nikola Ljubešić. A Lightweight Approach to a Giga-Corpus of Historical Periodicals: The Story of a Slovenian Historical Newspaper Collection.
  • Nikola Ljubešić and Taja Kuzman. CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation.
  • Michal Mochtak, Peter Rupnik and Nikola Ljubešić. The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings.
  • Rik van Noord, Taja Kuzman, Peter Rupnik, Nikola Ljubešić, Miquel Esplà-Gomis, Gema Ramírez-Sánchez and Antonio Toral. Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages.
  • Andraž Pelicon, Mladen Karan, Ravi Shekhar, Matthew Purver and Senja Pollak. Denoising Labeled Data for Comment Moderation Using Active Learning.
  • Marko Pranjić, Marko Robnik-Šikonja and Senja Pollak. LLMSegm: Surface-level Morphological Segmentation Using Large Language Model.
  • Darinka Verdonik, Kaja Dobrovoljc, Tomaž Erjavec and Nikola Ljubešić. Gos 2: A New Reference Corpus of Spoken Slovenian.


Papers at the LREC-COLING workshops:

  • Çağrı Çöltekin, Matyáš Kopp, Katja Meden, Vaidas Morkevicius, Nikola Ljubešić and Tomaž Erjavec. Multilingual Power and Ideology Identification in the Parliament: A Reference Dataset and Simple Baselines. At ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (LREC-COLING).
  • Nikola Ivačič, Matthew Purver, Fabienne Lind, Senja Pollak, Hajo Boomgaarden and Veronika Bajt. Comparing News Framing of Migration Crises using Zero-Shot Classification. At Reference, Framing, and Perspective 2024 Workshop (LREC-COLING).
  • Asher de Jong, Taja Kuzman, Maik Larooij and Maarten Marx. ParlaMint Ngram viewer: Multilingual Comparative Diachronic Search Across 26 Parliaments. At ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (LREC-COLING).
  • Nikola Ljubešić, Vít Suchomel, Peter Rupnik, Taja Kuzman and Rik van Noord. Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining. At the 3rd Annual Meeting of the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL 2024) (LREC-COLING).