CC resources
ICCC corpus from years 2010 to 2015
Proceedings - one folder per year, one document per article;
references are delimited by the marker </references_biblio>.
Proceedings in named line-document format: .txt file - one line per document,
each line starting with the document ID, followed by the proceedings year and
the article text without its reference section.
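The line-document file can be read with a few lines of Python. The sketch below is only an illustration under stated assumptions: the fields on each line (ID, year, article text) are assumed to be separated by whitespace, the file is assumed to be UTF-8, and the names Article, parse_line and read_corpus are hypothetical helpers, not part of the distributed resources.

```python
from typing import List, NamedTuple


class Article(NamedTuple):
    doc_id: str  # document ID (first field on the line)
    year: int    # proceedings year (second field)
    text: str    # article text without the reference section


def parse_line(line: str) -> Article:
    """Split one line into ID, year and text.

    Assumption: fields are whitespace-separated. A line with fewer
    than three fields raises ValueError.
    """
    doc_id, year, text = line.strip().split(maxsplit=2)
    return Article(doc_id, int(year), text)


def read_corpus(path: str, encoding: str = "utf-8") -> List[Article]:
    """Read the whole file, skipping blank lines."""
    with open(path, encoding=encoding) as handle:
        return [parse_line(line) for line in handle if line.strip()]
```

For example, parse_line("doc01 2010 Some article text") yields an Article with doc_id "doc01", year 2010 and text "Some article text"; the ID shown here is made up for illustration.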
Terminology
CC terminology - top 1500 single and multiword terms
CC terminology - top 1500 multiword terms
Ontologies
Ontologies were constructed using the OntoGen tool by Fortuna et al.
Initial ontology (.rdf, .png): completely automated clustering of ICCC documents
Named clusters (.rdf, .png): initial ontology with manually named clusters
Improved (.rdf, .png): semi-automated ontology (automated construction combined
with manual improvement)
Years (.rdf, .png): characteristic and
distinctive keywords by year (category names: the first three distinctive
(SVM) keywords)