Senja Pollak

Senja Pollak, PhD.

J317, Jamova cesta 39


I am an assistant professor of Language Technologies and work as a researcher at the Department of Knowledge Technologies, Jožef Stefan Institute (JSI). My research interests include natural language processing, text mining, corpus linguistics, and computational creativity. I also teach language technologies and computational creativity at the Jožef Stefan International Postgraduate School. From 2018 to 2019, I was a research fellow at the Usher Institute of the University of Edinburgh.

I lead the national project CANDAS, co-lead the project RobaCOFI funded by AI4Media, and am the institutional lead of the national projects SOVRAG, KOBOS, and FORMICA2. I was the coordinator of the H2020 project EMBEDDIA (12 partners, budget of EUR 3 million, 2019-2022), led industrial projects with the companies Kliping and Iolar, and participated in many European and national projects, including SAAM, MUSE, WHIM, ConCreTe, TermFrame, and Janes.

I have served on conference organizing committees (co-chair of SLSP 2019, Hackashop on news media content analysis and automated report generation at EACL 2021, BSNLP at EACL 2021, Digital humanities and natural language processing at PROPOR 2020 …), on program committees (ICCC, JTDH, SYNASC 2021 …), and as a reviewer (LREC, ACL-IJCNLP 2021 …).

My work has been published at several conferences and in journals including Computational Linguistics, Computer Speech & Language, Natural Language Engineering, Terminology, Language Resources and Evaluation, and International Journal of Lexicography. For more details, see Google Scholar.

I am (co-)supervising 7 PhD students (1 completed, 6 ongoing) and 3 MSc students (1 ongoing, 2 completed).


2014-now: researcher at Jožef Stefan Institute

Domains of interest

natural language processing, language technologies, text mining, corpus linguistics, computational creativity, digital humanities

Open positions

Position for PhD in natural language processing (Slovenia-France). More info


Selected publications


Selected Journal Papers

  • MARTINC, Matej, POLLAK, Senja, ROBNIK ŠIKONJA, Marko (2021). Supervised and unsupervised neural approaches to text readability. Computational linguistics. Vol. 47, no. 1, pp. 141-179. ISSN 0891-2017. DOI: 10.1162/coli_a_00398
  • KOLOSKI, Boshko, STEPIŠNIK PERDIH, Timen, ROBNIK ŠIKONJA, Marko, POLLAK, Senja, ŠKRLJ, Blaž (2022). Knowledge graph informed fake news classification via heterogeneous representation ensembles. Neurocomputing. ISSN 0925-2312. DOI: 10.1016/j.neucom.2022.01.096.
  • MARTINC, Matej, ŠKRLJ, Blaž, POLLAK, Senja (2021). TNT-KID: transformer-based neural tagger for keyword identification. Natural language engineering. ISSN 1469-8110. DOI: 10.1017/S1351324921000127.
  • PELICON, Andraž, SHEKHAR, Ravi, ŠKRLJ, Blaž, PURVER, Matthew, POLLAK, Senja (2021). Investigating cross-lingual training for offensive language detection. PeerJ computer science. DOI: 10.7717/peerj-cs.559.
  • MARTINC, Matej, HAIDER, Fasih, POLLAK, Senja, LUZ, Saturnino (2021). Temporal integration of text transcripts and acoustic features for Alzheimer’s diagnosis based on spontaneous speech. Frontiers in aging neuroscience. DOI: 10.3389/fnagi.2021.642647.
  • PELICON, Andraž, PRANJIĆ, Marko, MILJKOVIĆ, Dragana, ŠKRLJ, Blaž, POLLAK, Senja (2020). Zero-shot learning for cross-lingual news sentiment classification. Applied sciences 10(17).
  • HAIDER, Fasih, POLLAK, Senja, ALBERT, Pierre, LUZ, Saturnino (2020). Emotion recognition in low-resource settings: an evaluation of automatic feature selection methods. Computer speech & language.
  • ŠKRLJ, Blaž, MARTINC, Matej, KRALJ, Jan, LAVRAČ, Nada, POLLAK, Senja (2020). tax2vec: constructing interpretable features from taxonomies for short text classification. Computer speech & language.
  • POLLAK, Senja, GANTAR, Polona, ARHAR HOLDT, Špela (2019). What’s new on the internetz? Extraction and lexical categorisation of collocations in computer-mediated Slovene. International journal of lexicography.
  • POLLAK, Senja, COESEMANS, Roel, DAELEMANS, Walter, LAVRAČ, Nada (2011). Detecting contrast patterns in newspaper articles by combining discourse analysis and text mining. Pragmatics, vol. 21, no. 4, pp. 647-683 (the paper received the Award of the Slovenian Research Agency for Contributions to Humanities).


Selected Conference Papers



Current projects:


Past EU projects:

  • EMBEDDIA: Cross-Lingual Embeddings for Less-Represented Languages in European News Media (EU H2020 RIA), project coordinator
  • SAAM: Supporting Active Ageing through Multimodal coaching (EU H2020 RIA), task leader
  • MUSE: Machine understanding for Interactive StorytElling, WP leader and co-PI at JSI
  • PROSECCO: Promoting the Scientific Exploration of Computational Creativity, co-PI at JSI
  • WHIM: The What-if Machine
  • ConCreTe: Concept Creation Technology


Past national projects (funded by the Slovenian Research Agency):

  • SDM-Open-SLO (Semantic Data Mining for linked open data), researcher
  • TERMFRAME: Terminology and Knowledge Frames across Languages (National research project), PI at JSI
  • FORMICA: Influence of formal and informal corporate communications on capital markets, co-PI at JSI
  • JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene, WP leader
  • HinLife: Analysis of heterogeneous information networks for knowledge discovery in lifesciences, researcher


Past industrial projects:

  • TermIOLAR 1 and TermIOLAR 2 (development of a prototype software solution to support semi-automatic terminology management and extraction in mono- and bilingual corpora; industrial projects for the Slovene language service provider Iolar), principal investigator



  • Program committee member of: ICCC 2016-2023, JTDH 2020-2022, REPROLANG 2020, ECIR 2019, SLATE 2016, 4REAL 2016-2018, ICCBR-ExpCrea 2015
  • Organizing committee member of: DHandNLP 2020 (co-chair), SLSP 2019 (co-chair), ConCreTe workshop on Metaphor 2016, ICCC 2014, LTSP 2014, ESSLLI 2011
  • Reviewer: LREC 2022, ACL-IJCNLP 2021, IJCAI-PRICAI 2020



2014: PhD in Translation Studies (spec. Language Technologies), Department of Translation, Faculty of Arts, University of Ljubljana, Slovenia
Title: Semi-automatic domain modeling from multilingual corpora
Advisor: Prof. Špela Vintar, co-advisor: Prof. Paola Velardi
2009: MSc in Computational Linguistics, University of Antwerp, Belgium
Title: Text classification of press articles on Kenyan elections
Advisor: Prof. Walter Daelemans
2007: BSc in French Language and Literature and BSc in Sociology of Culture, Faculty of Arts, University of Ljubljana, Slovenia
2004: BSc in Modern Languages – French Linguistics (maîtrise), University Paris 3 – Sorbonne Nouvelle, Paris, France




CC resources for download (Paper: SMAILOVIĆ, Jasmina, POLLAK, Senja (2011). Semi-automated construction of a topic ontology from research papers in the domain of language technologies. LTC’11, 5th Language & Technology Conference, Poznan, Poland, November 2011.)