We are excited to share that our department will be very well represented at the LREC-COLING 2024 conference – the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, taking place next week in Italy. We will be presenting 10 papers at the main conference and 4 papers at the workshops. Looking forward to it!
Papers at the main conference:
- Špela Arhar Holdt, Tomaž Erjavec, Iztok Kosem and Elena Volodina. Towards an Ideal Tool for Learner Error Annotation.
- Špela Arhar Holdt, Jaka Čibej, Kaja Dobrovoljc, Tomaž Erjavec, Polona Gantar, Simon Krek, Tina Munda, Nejc Robida, Luka Terčon and Slavko Žitnik. SUK 1.0: A New Training Corpus for Linguistic Annotation of Modern Standard Slovene.
- Jaya Caporusso, Damar Hoogland, Mojca Brglez, Boshko Koloski, Matthew Purver and Senja Pollak. A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media.
- Filip Dobranić, Bojan Evkoski and Nikola Ljubešić. A Lightweight Approach to a Giga-Corpus of Historical Periodicals: The Story of a Slovenian Historical Newspaper Collection.
- Nikola Ljubešić and Taja Kuzman. CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation.
- Michal Mochtak, Peter Rupnik and Nikola Ljubešić. The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings.
- Rik van Noord, Taja Kuzman, Peter Rupnik, Nikola Ljubešić, Miquel Esplà-Gomis, Gema Ramírez-Sánchez and Antonio Toral. Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages.
- Andraž Pelicon, Mladen Karan, Ravi Shekhar, Matthew Purver and Senja Pollak. Denoising Labeled Data for Comment Moderation Using Active Learning.
- Marko Pranjić, Marko Robnik-Šikonja and Senja Pollak. LLMSegm: Surface-level Morphological Segmentation Using Large Language Model.
- Darinka Verdonik, Kaja Dobrovoljc, Tomaž Erjavec and Nikola Ljubešić. Gos 2: A New Reference Corpus of Spoken Slovenian.
Papers at the LREC-COLING workshops:
- Çağrı Çöltekin, Matyáš Kopp, Katja Meden, Vaidas Morkevicius, Nikola Ljubešić and Tomaž Erjavec. Multilingual Power and Ideology identification in the Parliament: a reference dataset and simple baselines. At the ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (LREC-COLING).
- Nikola Ivačič, Matthew Purver, Fabienne Lind, Senja Pollak, Hajo Boomgaarden and Veronika Bajt. Comparing News Framing of Migration Crises using Zero-Shot Classification. At the Reference, Framing, and Perspective 2024 Workshop (LREC-COLING).
- Asher de Jong, Taja Kuzman, Maik Larooij and Maarten Marx. ParlaMint Ngram viewer: Multilingual Comparative Diachronic Search Across 26 Parliaments. At the ParlaCLARIN IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (LREC-COLING).
- Nikola Ljubešić, Vít Suchomel, Peter Rupnik, Taja Kuzman and Rik van Noord. Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining. At the 3rd Annual Meeting of the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL 2024) (LREC-COLING).