J2-9180

JOS - Linguistic annotation of Slovene language: methods and resources

Jezikoslovno označevanje slovenskega jezika: metode in viri

https://www.sicris.si/public/jqm/prj.aspx?lang=eng&opdescr=search&opt=2&subopt=400&code1=cmn&code2=auto&psize=1&hits=1&page=1&count=&search_term=odsek%20za%20umetno%20inteligenco&id=5344&slng=&order_by=

No. of contract:

J2-9180

Type of project:

Basic ARIS Projects | National Projects

Duration:

from 01.01.2007 to 31.12.2009

Contact:

Tomaž Erjavec

Areas:

Language Technologies and Digital Humanities

The project will develop automatic inductive methods and tools for morphosyntactic, syntactic and semantic annotation, which will be used to build manually corrected, publicly accessible Slovene language resources, namely annotated corpora and lexicons. These results will provide urgently needed infrastructure for the further development of language technologies for Slovene. Because the resources will be accessible not only to project members but to any research team in Slovenia and abroad, they are expected to act as a catalyst for R&D in language technologies for Slovene, an area of vital importance for the effective use of Slovene in the Information Society.
The project comprises four work packages. The first, a horizontal work package, addresses the technical and legal aspects of resource accessibility, i.e. making resources available to developers as training and testing datasets, and to linguists for research on Slovene. The remaining three work packages cover three levels of linguistic analysis. The first is morphosyntactic tagging and the related lemmatization, the basic level of annotation indispensable to virtually every language-oriented computer program; the project will improve on existing methods and produce an annotated corpus, manually checked for errors. The second level, automatic syntactic analysis, is of key importance for in-depth text analysis, since it reveals the interdependence of syntactic units; the project will produce a syntactically annotated corpus and a valency lexicon, both hand-corrected, and a syntactic parser for Slovene. The last level deals with the lexical semantics of Slovene, needed e.g. in machine translation and information retrieval; the project will upgrade the existing semantic lexicon (ontology) for Slovene, annotate a corpus using concepts from this lexicon, and develop methods for automatic ontology building and for disambiguation of polysemous lexemes.
The project will draw on the project partners' extensive experience in developing Slovene language resources and in machine learning. The points of departure will be the morphosyntactically annotated reference corpus FidaPLUS, the syntactically annotated prototype corpus SDT and the prototype semantic lexicon sloWNet. Work in the project will be closely tied to concurrent Slovene and EU projects concerned with developing machine learning methods for machine translation and ontology building.
