Senja Pollak, PhD.
I am an assistant professor of Language Technologies and work as a researcher at the Department of Knowledge Technologies, Jožef Stefan Institute (JSI). My research interests include natural language processing, text mining, corpus linguistics, and computational creativity. I also teach language technologies and computational creativity at the Jožef Stefan International Postgraduate School. From 2018 to 2019, I was a research fellow at the Usher Institute of the University of Edinburgh.
I am the leader of the national project CANDAS, co-leader of the project RobaCOFI funded by AI4Media, and institutional lead of the national projects SOVRAG, KOBOS, and FORMICA2. I was the coordinator of the H2020 project EMBEDDIA (12 partners, budget of EUR 3 million, 2019-2022), led industrial projects with the companies Kliping and Iolar, and participated in many European and national projects, including SAAM, MUSE, WHIM, ConCreTe, TermFrame, and Janes.
I have served on conference organizing committees (co-chair of SLSP 2019, Hackashop on news media content analysis and automated report generation at EACL 2021, BSNLP at EACL 2021, Digital humanities and natural language processing at PROPOR 2020 …) and program committees (ICCC, JTDH, SYNASC2021 …), and as a reviewer (LREC, ACL-IJCNLP 2021 …).
My work has been published at several conferences and in journals including Computational Linguistics, Computer Speech & Language, Natural Language Engineering, Terminology, Language Resources and Evaluation, and International Journal of Lexicography. For more details, see Google Scholar.
I am (co-)supervising 7 PhD students (1 completed, 6 ongoing) and 3 MSc students (1 ongoing, 2 completed).
2014-present: researcher at the Jožef Stefan Institute
Domains of interest
natural language processing, language technologies, text mining, corpus linguistics, computational creativity, digital humanities
Selected publications
Selected Journal Papers
- MARTINC, Matej, POLLAK, Senja, ROBNIK ŠIKONJA, Marko (2021). Supervised and unsupervised neural approaches to text readability. Computational linguistics. Vol. 47, no. 1, pp. 141-179. ISSN 0891-2017. DOI: 10.1162/coli_a_00398
- KOLOSKI, Boshko, STEPIŠNIK PERDIH, Timen, ROBNIK ŠIKONJA, Marko, POLLAK, Senja, ŠKRLJ, Blaž (2022). Knowledge graph informed fake news classification via heterogeneous representation ensembles. Neurocomputing. ISSN 0925-2312. DOI: 10.1016/j.neucom.2022.01.096.
- MARTINC, Matej, ŠKRLJ, Blaž, POLLAK, Senja (2021). TNT-KID : transformer-based neural tagger for keyword identification. Natural language engineering. ISSN 1469-8110. DOI: 10.1017/S1351324921000127.
- PELICON, Andraž, SHEKHAR, Ravi, ŠKRLJ, Blaž, PURVER, Matthew, POLLAK, Senja (2021). Investigating cross-lingual training for offensive language detection. PeerJ computer science. DOI: 10.7717/peerj-cs.559.
- MARTINC, Matej, HAIDER, Fasih, POLLAK, Senja, LUZ, Saturnino (2021). Temporal integration of text transcripts and acoustic features for Alzheimer’s diagnosis based on spontaneous speech. Frontiers in aging neuroscience. DOI: 10.3389/fnagi.2021.642647.
- PELICON, Andraž, PRANJIĆ, Marko, MILJKOVIĆ, Dragana, ŠKRLJ, Blaž, POLLAK, Senja (2020). Zero-shot learning for cross-lingual news sentiment classification. Applied sciences 10(17). https://dx.doi.org/10.3390/app10175993
- HAIDER, Fasih, POLLAK, Senja, ALBERT, Pierre, LUZ, Saturnino (2020). Emotion recognition in low-resource settings : an evaluation of automatic feature selection methods. Computer speech & language. https://dx.doi.org/10.3390/app10175993
- ŠKRLJ, Blaž, MARTINC, Matej, KRALJ, Jan, LAVRAČ, Nada, POLLAK, Senja (2020). tax2vec : constructing interpretable features from taxonomies for short text classification. Computer speech & language. https://dx.doi.org/10.1016/j.csl.2020.101104
- POLLAK, Senja, GANTAR, Polona, ARHAR HOLDT, Špela (2019). What’s new on the internetz? Extraction and lexical categorisation of collocations in computer-mediated Slovene. International journal of lexicography. https://dx.doi.org/10.1093/ijl/ecy026
- POLLAK, Senja, COESEMANS, Roel, DAELEMANS, Walter, LAVRAČ, Nada (2011). Detecting contrast patterns in newspaper articles by combining discourse analysis and text mining. Pragmatics, vol. 21, no. 4, p. 647-683 (paper received the Award of the Slovenian Research Agency for Contributions to Humanities). https://www.jbe-platform.com/content/journals/10.1075/prag.21.4.07pol
Selected Conference Papers
- EVKOSKI, Bojan, POLLAK, Senja (2023). XAI in Computational Linguistics: Understanding Political Leanings in the Slovenian Parliament. In: Proceedings of 10th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, April 21-23, 2023, Poznań, Poland, pp. 56-61. (Best student paper award)
- KOLOSKI, Boshko, POLLAK, Senja, ŠKRLJ, Blaž, MARTINC, Matej (2022). Out of thin air : is zero-shot cross-lingual keyword detection better than unsupervised? Proc. of LREC 2022 Language Resources and Evaluation Conference. http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.42.pdf
- MARTINC, Matej, KRALJ NOVAK, Petra, POLLAK, Senja (2020). Leveraging contextual embeddings for detecting diachronic semantic shift. Proceedings of LREC 2020 Language Resources and Evaluation Conference. http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.592.pdf
- ARMENDARIZ, Carlos S., PURVER, Matthew, ULČAR, Matej, POLLAK, Senja, LJUBEŠIĆ, Nikola, ROBNIK ŠIKONJA, Marko, GRANROTH-WILDING, Mark, VAIK, Kristiina (2020). CoSimLex : a resource for evaluating graded word similarity in context. Proceedings of LREC 2020 Language Resources and Evaluation Conference. http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.720.pdf
- POLLAK, Senja, REPAR, Andraž, MARTINC, Matej, PODPEČAN, Vid (2019). Karst exploration : extracting terms and definitions from Karst domain corpus. Electronic lexicography in the 21st century : proceedings of eLex 2019. https://elex.link/elex2019/wp-content/uploads/2019/09/eLex_2019_53.pdf
- MARTINC, Matej, ŠKRJANEC, Iza, ZUPAN, Katja, POLLAK, Senja (2017). PAN 2017: author profiling – gender and language variety prediction : notebook for PAN at CLEF 2017. Conference and Labs of the Evaluation Forum. http://ceur-ws.org/Vol-1866/paper_78.pdf
- MARTINS, Pedro, URBANČIČ, Tanja, POLLAK, Senja, LAVRAČ, Nada, CARDOSO, Amilcar (2015). The good, the bad, and the AHA! Blends. Proceedings of the Sixth International Conference on Computational Creativity, ICCC 2015. http://axon.cs.byu.edu/ICCC2015proceedings/7.3Martins.pdf
- VERHOEVEN, Ben, ŠKRJANEC, Iza, POLLAK, Senja (2017). Gender profiling for Slovene Twitter communication : the influence of gender marking, content and style. Proceedings of The 6th Workshop on Balto-Slavic Natural Language Processing at ACL 2017. http://bsnlp-2017.cs.helsinki.fi/bsnlp2017-book.pdf
- FIŠER, Darja, POLLAK, Senja, VINTAR, Špela (2010). Learning to mine definitions from Slovene structured and unstructured knowledge-rich resources. Proceedings of LREC 2010. http://www.lrec-conf.org/proceedings/lrec2010/pdf/141_Paper.pdf
Current projects:
- FORMICA 2: Quantitative and qualitative analysis of the unregulated corporate financial reporting (Kvantitativna in kvalitativna analiza nereguliranih delov finančnega poročanja podjetij), PI at JSI
- Development of natural language processing prototype for sentiment analysis, keyword detection and clustering of news articles, project leader
- Formant combinatorics in Slovenian (Kombinatorika besedotvornih obrazil v slovenščini), PI at JSI
- RSDO: Development of Slovene in digital environment – language resources and technologies, task leader
- RobaCOFI: Robust and adaptable comment filtering, project co-leader
Past EU projects:
- EMBEDDIA: Cross-Lingual Embeddings for Less-Represented Languages in European News Media (EU H2020 RIA), project coordinator
- SAAM: Supporting Active Ageing through Multimodal coaching (EU H2020 RIA), task leader
- MUSE: Machine understanding for Interactive StorytElling, WP leader and co-PI at JSI
- PROSECCO: Promoting the Scientific Exploration of Computational Creativity, co-PI at JSI
- WHIM: The What-if Machine
- ConCreTe: Concept Creation Technology
Past national projects (funded by the Slovenian Research Agency):
- SDM-Open-SLO (Semantic Data Mining for linked open data), researcher
- TERMFRAME: Terminology and Knowledge Frames across Languages (National research project), PI at JSI
- FORMICA: Influence of formal and informal corporate communications on capital markets, co-PI at JSI
- JANES: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene, WP leader
- HinLife: Analysis of heterogeneous information networks for knowledge discovery in lifesciences, researcher
Past industrial projects:
- TermIOLAR 1 and TermoIOLAR 2 (Development of a prototype software solution to support semi-automatic terminology management and extraction in mono- and bilingual corpora; industrial projects for the Slovene language service provider Iolar), principal investigator
- Program committee member of: ICCC 2016-2023, JTDH2020-2022, REPROLANG 2020, ECIR 2019, SLATE 2016, 4REAL 2016-2018, ICCBR-ExpCrea 2015
- Organizing committee member of: DHandNLP 2020 (co-chair), SLSP 2019 (co-chair), ConCreTe workshop on Metaphor 2016, ICCC 2014, LTSP 2014, ESSLLI 2011
- Reviewer: LREC 2022, ACL-IJCNLP 2021, IJCAI-PRICAI 2020
2014: PhD in Translation Studies (spec. Language Technologies), Department of Translation, Faculty of Arts, University of Ljubljana, Slovenia
Title: Semi-automatic domain modeling from multilingual corpora
advisor: Prof. Špela Vintar, co-advisor: Prof. Paola Velardi
2009: MSc in Computational Linguistics, University of Antwerp, Belgium
Title: Text classification of press articles on Kenyan elections
advisor: Prof. Walter Daelemans
2007: BSc in French Language and Literature and BSc in Sociology of Culture, Faculty of Arts, University of Ljubljana, Slovenia
2004: BSc in Modern Languages – French Linguistics (maîtrise), University Paris 3 – Sorbonne Nouvelle, Paris, France
CC resources for download (Paper: SMAILOVIĆ, Jasmina, POLLAK, Senja (2011). Semi-automated construction of a topic ontology from research papers in the domain of language technologies. LTC’11, 5th Language & Technology Conference, Poznan, Poland, November 2011.)