MaCoCu

Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages


https://macocu.eu/

Contract no.:

2278341

Type of project:

CEF | EU Projects

Duration:

from 01.06.2021 to 30.09.2023

Contact:

Nikola Ljubešič

Areas:

Language Technologies and Digital Humanities

This Action aims to improve the quality of machine translation output by extending the available data sets and improving their quality, especially for specific under-resourced languages. The Action builds upon the previous CEF-funded Actions ParaCrawl and EuroPat, the H2020 project ‘GoURMET’ and the FP7 MSCA project ‘Abu-MaTran’.

Within the Action, new monolingual and parallel data will be acquired and enriched for the following under-resourced languages: Maltese, Slovenian, Croatian, Bulgarian, Turkish, Serbian, Montenegrin, Macedonian, Albanian and Icelandic. Text classification will be used to determine how relevant the parallel and monolingual data are to each of the ten DSI (Digital Service Infrastructure) categories for which the ELRC repository contains data: e-Health, e-Justice, Online Dispute Resolution, Europeana, Open Data Portal, Business Registers Interconnection System, e-Procurement, Safer Internet, Cybersecurity, and EESSI.
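
This page does not say which classifier the Action uses. As a minimal sketch of the general idea only, the example below assigns a text to a DSI category, or to a catch-all "Other" class for data that fits none of them, with a multinomial naive Bayes classifier using add-one smoothing. The training snippets, the "Other" label, the subset of three DSI categories and the `NaiveBayes` class are all hypothetical; a real setup would train on much larger labelled in-domain corpora.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training data: (label, text). Only three of the ten DSI
# categories are shown, plus an assumed "Other" class for out-of-domain text.
TRAIN = [
    ("e-Health", "patient hospital treatment doctor vaccine"),
    ("e-Health", "medical records health insurance clinic"),
    ("e-Justice", "court judgment lawyer appeal civil proceedings"),
    ("e-Justice", "legal aid court ruling criminal case"),
    ("Cybersecurity", "malware attack encryption network vulnerability"),
    ("Cybersecurity", "phishing password breach security incident"),
    ("Other", "football match weather forecast recipe"),
    ("Other", "holiday travel music concert film"),
]


def tokenize(text):
    """Lowercase and keep runs of Unicode letters (handles č, š, ž, ë, ð, ...)."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.doc_counts = Counter()               # documents per label
        self.word_counts = defaultdict(Counter)   # token counts per label
        self.vocab = set()
        for label, text in examples:
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = sum(self.doc_counts.values())
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(token | label) for each label."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        v = len(self.vocab)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / (total + v))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


clf = NaiveBayes().fit(TRAIN)
print(clf.predict("the court dismissed the appeal"))
```

Naive Bayes is chosen here only because it is short and dependency-free. With this toy data, a sentence mentioning a court and an appeal is assigned to e-Justice. A production pipeline would more plausibly use a stronger model and a confidence threshold to decide whether a document is relevant to any DSI category at all.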

As a result, the Action will extend the data in ELRC-SHARE, focusing on DSI-specific data that support the automated production and configuration of text translation engines tailored to the needs of online public services in specific domains. By enriching these data and contributing them to ELRC-SHARE, the Action will ultimately help improve the quality of the machine translation services offered by CEF AT.
