Analysis of Large Text Datasets, Project Page
This project aims to develop new computer methods, and improve existing ones, for the analysis of large text datasets. Special emphasis is placed on the analysis of Slovenian text. The developed methods will enable automatic categorization of documents written in Slovenian (and potentially in some other natural languages), adaptation of existing text-learning methods to Slovenian texts, analysis of text datasets based on a new, extended document representation, and better Web browsing through a personal browsing assistant built on the new text analysis methods. They will also enable the development of various applications, including automatic updating of some existing document categorizations that are currently updated manually.
This project is strongly related to the Yahoo Planet project, the Personal WebWatcher project, and the PhD thesis project "Machine Learning on non-homogeneous, distributed text data".