data-mining-course

An undergraduate course on data mining.

This project is maintained by chatox

List of theory topics

:construction: These materials should not be considered final until the end of the course. Materials from previous editions can be found in other branches of the repository for the course.

There are 11 theory sessions of 2 hours each. They will all take place face-to-face. Please bring your laptop.

Before each class, there are short videos you should watch. They are up to 20 minutes in total, and watching them requires some preparation/scheduling on your part. Please set aside time in your schedule to watch these videos before coming to class, ideally on the day before.

During class, I will present the contents using slides and we will do some exercises together using Nearpod or Google Spreadsheets. Please avoid distractions: place your phone in airplane mode, close all other windows on your computer, and try to stay focused. We will pause frequently during the session to help you regain focus. There will be a mid-term exam in one of the sessions and a final exam at the end of the course. The exam questions are based exclusively on the materials shown or discussed in the lectures during class.

After each session, there is some reading for you to do. These readings will be much easier after you have attended the corresponding lecture, will add depth to what you learn in class, and will help you remember the contents better. Think of these readings as a form of continuous studying that will save you time and effort when preparing for the exams.

Session 1: Introduction

Before class

During class

After class

Optional/additional material

Session 2: Data cleaning

Before class

During class

After class

Optional/additional material

Session 3: Near duplicates

Before class

During class

After class

Optional/additional material

Session 4: Itemsets

Before class

During class

After class

Session 5: Association rules mining

Before class

During class

After class

Optional/additional material

Session 6: Mid-term exam (Wednesday October 25th, 2023)

Before class

Study topics TT01-TT08 and TT11-TT13 on your own, and try to solve exams from past years. Ask your questions in the forum.

The exam will not include TT09, TT10 or TT14.

During class

We will have the mid-term exam. There will be no class after the exam.

Session 7: Recommender systems

Before class

During class

After class

Session 8: Recommender systems (cont.) + Outlier analysis

Before class

During class

After class

Optional/additional material

Session 9: Outlier analysis (cont.) + Data streams

Before class

During class

Note for 2022: outlier detection (isolation forest) must be finished in this class so that students have the necessary background for the practice sessions.

After class

Optional/additional material

Session 10: Streams (cont.) + Time series mining

Before class

During class

Note for 2022: probabilistic counting must be finished in this class so that students have the necessary background for the practice session on probabilistic counting.

After class

Optional/additional material

Session 11: Time series mining (cont.)

Before class

During class

Note for 2023: forecasting must be finished in this class so that students have the necessary background for the practice session on temperature prediction.

After class

Final exam (Wednesday, December 13th, 2023)

The final exam will include recommender systems, outlier analysis, data streams, and forecasting: topics TT16-TT25, TT27-TT29; it will not include topic TT26.

Notes

Note that session numbers are approximate and subject to change.

Slides are available under a Creative Commons license unless specified otherwise.

Main bibliography

:blue_book: Data Mining: The Textbook (2015) by Charu Aggarwal. ISBN 978-3-319-14142-8. Free Download

:ledger: Mining of Massive Datasets SECOND EDITION (2014) by Leskovec et al. ISBN 978-1107077232. Online materials: http://www.mmds.org/. Free Download

Additional bibliography

:orange_book: Introduction to Data Mining SECOND EDITION (2019) by Tan et al. ISBN 978-0-13-312890-1. Online materials: https://www-users.cs.umn.edu/~kumar001/dmbook/index.php

:blue_book: Data Mining and Machine Learning SECOND EDITION (2020) by Zaki and Meira. ISBN 978-1108473989.

:notebook: Data Mining Concepts and Techniques THIRD EDITION (2011) by Han et al. ISBN 978-0123814791.