Data and Text Mining
Course
Scheduled lectures and course materials for the 2019/2020 school year
21 OCT 2019 | 17:00 – 19:00 | Lavrač |
23 OCT 2019 | 17:00 – 19:00 | Kralj Novak - Data mining |
4 NOV 2019 | 17:00 – 19:00 | Lavrač |
6 NOV 2019 | 17:00 – 19:00 | Kralj Novak |
11 NOV 2019 | 17:00 – 19:00 | Lavrač |
13 NOV 2019 | 17:00 – 19:00 | Cestnik |
20 NOV 2019 | 17:00 – 19:00 | Cestnik |
25 NOV 2019 | 17:00 – 19:00 | Kralj Novak |
27 NOV 2019 | 17:00 – 19:00 | Cestnik |
2 DEC 2019 | 17:00 – 19:00 | Kralj Novak |
16 DEC 2019 | 17:00 – 19:00 | Kralj Novak |
13 JAN 2020 | 17:00 – 19:00 | Kralj Novak |
11 FEB 2020 | 15:00 – 17:00 | Kralj Novak - Data mining (partial written exam) |
24 FEB 2020 | 17:00 – 19:00 | Lavrač - Data mining seminar presentations (partial exam) |
16 MAR 2020 | 15:00 – 19:00 | Mladenić |
23 MAR 2020 | 15:00 – 19:00 | Mladenić |
6 APR 2020 | 15:00 – 19:00 | Mladenić |
18 MAY 2020 | 15:00 – 19:00 | Mladenić |
The course is divided into three parts:
- data preprocessing (lectured by prof. dr. Bojan Cestnik),
- data mining (lectured by prof. dr. Nada Lavrač and doc. dr. Petra Kralj Novak),
- text mining (lectured by prof. dr. Dunja Mladenić).
The rest of this page focuses on the data mining part of the course.
Materials:
- Nada Lavrač:
- Petra Kralj Novak:
--- Nov. 6, 2019 ---
- Lecture notes from Nov. 6, 2019
- Decision trees (.pdf)
- Entropy and information gain (.pdf)
- Hands-on: Orange workflow on language bias of decision trees (.ows) and data (.csv)
- Homework assignments: on slides 13 and 20 of the lecture notes
--- Nov. 25, 2019 ---
- Lecture notes from Nov. 25, 2019
- Homework assignments
--- Dec. 2, 2019 ---
- Lecture notes from Dec. 2, 2019, updated on Dec. 19, 2019
- Naive Bayes
- Homework assignments
--- Dec. 16, 2019 ---
- Lecture notes from Dec. 16, 2019, updated on Dec. 19, 2019 - Clustering
- Homework assignments
--- Jan. 13, 2020 ---
- Lecture notes from Jan. 13, 2020 - Association rules
- Association rules
- Lecture notes from Jan. 13, 2020 - Artificial neural networks
Practice materials: dr. Petra Kralj Novak
Literature:
- Bramer, Max: Principles of Data Mining (2007)
- Liaw, Andy, and Wiener, Matthew: "Classification and regression by randomForest." R News 2.3 (2002): 18-22.
- Loh, Wei‐Yin: "Classification and regression trees." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1.1 (2011): 14-23.
- Fawcett, Tom: "An introduction to ROC analysis." Pattern Recognition Letters 27.8 (2006): 861-874.
Course requirements, data mining part:
- Attending lectures, practical exercises and theory exercises
- Written exam
- Seminar: half-page proposal, 4-page written report, oral presentation, all in English
- Written exam
- Data mining seminar:
  - Half-page seminar proposal, due on the written exam day
  - Data analysis of your own data (Orange or scikit-learn recommended)
  - A 4-page written report (printed and electronic copy) in the Information Society paper format, due on the seminar presentations day (use the paper template and guidelines)
  - Oral presentation of seminar results (10 minutes for the presentation + 5 minutes for discussion; use the slides template)
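For the data analysis step in scikit-learn, a starting point might look like the sketch below. It is only an illustration, not a required procedure: it assumes the built-in Iris data set as a stand-in for your own data, picks two classifiers covered in the lectures (a decision tree and naive Bayes), and compares them by 10-fold cross-validated accuracy. The function name compare_classifiers and the choice of 10 folds are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, folds=10):
    """Return the mean cross-validated accuracy of each classifier.

    compare_classifiers is a hypothetical helper for this sketch,
    not part of scikit-learn.
    """
    models = {
        "decision tree": DecisionTreeClassifier(random_state=0),
        "naive Bayes": GaussianNB(),
    }
    # For classifiers, an integer cv uses stratified k-fold splitting.
    return {
        name: cross_val_score(model, X, y, cv=folds).mean()
        for name, model in models.items()
    }


# Iris is a placeholder; replace it with your own feature matrix and labels.
X, y = load_iris(return_X_y=True)
results = compare_classifiers(X, y)
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.3f}")
```

With your own data, load a table (for example from a .csv file), separate the class column from the attributes, and pass both to the same function.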
Examples of data mining seminars:
- Janez Bucik (.pdf)
Microsoft stock quotes dependency analysis
- Valentin Koblar (in Slovene .pdf)
Napoved menjalnega tečaja ameriškega dolarja na podlagi menjalnih tečajev tujih valut (Forecasting the US dollar exchange rate based on foreign currency exchange rates)
Ideas for seminars
- Analyze some data where you are the domain expert, use at least two algorithms
- Find some interesting data to analyze, possible sources:
Templates
- presentation template (.pot)
- paper template (.doc)
- paper guidelines (.doc)
Useful links
Link to last year's web page - Data Mining and Knowledge Discovery 2018/2019
Last update: January 27, 2020