data-mining-course

An undergraduate course on data mining.

This project is maintained by chatox

List of theory topics

:file_folder: This Google Drive Folder contains the slides for the 2024 lectures.

There are 11 theory sessions of 2 hours each. They will all take place face-to-face. Please bring your laptop.

Before each class, there are short videos you should watch. They take up to 20 minutes in total. Please set aside time in your schedule to watch them before coming to class, ideally the day before.

During class, I will present the contents using slides, and we will do some exercises together using Nearpod or Google Spreadsheets. Please avoid distractions: place your phone in airplane mode, close all other windows on your computer, and try to stay focused. We will pause frequently during the session to help you regain focus. There will be a mid-term exam in one of the sessions and a final exam at the end of the course. The exam questions are based exclusively on the materials shown or discussed in the lectures during class.

After each session, there is some reading for you to do. These readings will be much easier once you have attended the lecture; they add depth to what you learn in class and help you remember the material better. Think of these readings as a form of continuous studying that will save you time and effort when preparing for the exams.

Session 1: Introduction

Before class

During class

After class

Optional/additional material

Session 2: Data cleaning

Before class

During class

After class

Optional/additional material

Session 3: Near duplicates

Before class

During class

After class

Optional/additional material

Session 4: Itemsets

Before class

During class

After class

Session 5: Association rules mining

Before class

During class

After class

Optional/additional material

Session 6: Mid-term exam (Tue October 22nd, 2024 08:30-10:30)

Before class

Study topics TT01-TT09 and TT11-TT14 on your own, and try to solve exams from past years. Ask your questions in the forum.

The exam will not include TT10.

During class

We will have the mid-term exam. There will be no class after the exam.

Session 7: Recommender systems

Before class

During class

After class

Session 8: Recommender systems (cont.) + Outlier analysis

Before class

During class

After class

Optional/additional material

Session 9: Outlier analysis (cont.) + Data streams

Before class

During class

After class

Optional/additional material

Session 10: Streams (cont.) + Time series mining

Before class

During class

After class

Optional/additional material

Session 11: Time series mining (cont.)

Before class

During class

After class

Final exam (December 11th, 09:30-11:30)

The date of the final exam is fixed by the School of Engineering. Please check their webpage for potential changes.

The final exam will include recommender systems, outlier analysis, data streams, and forecasting: topics TT16-TT25, TT27-TT29; it will not include topic TT26.

Notes

:construction: Session numbers are approximate and subject to change. Materials should not be considered final until the end of the course.

Slides are available under a Creative Commons license unless specified otherwise.

Main bibliography

:blue_book: Data Mining: The Textbook (2015) by Charu Aggarwal. ISBN 978-3-319-14142-8. Free Download

:ledger: Mining of Massive Datasets SECOND EDITION (2014) by Leskovec et al. ISBN 978-1107077232. Online materials: http://www.mmds.org/. Free Download

Additional bibliography

:orange_book: Introduction to Data Mining SECOND EDITION (2019) by Tan et al. ISBN 978-0-13-312890-1. Online materials: https://www-users.cs.umn.edu/~kumar001/dmbook/index.php

:blue_book: Data Mining and Machine Learning SECOND EDITION (2020) by Zaki and Meira. ISBN 978-1108473989.

:notebook: Data Mining Concepts and Techniques THIRD EDITION (2011) by Han et al. ISBN 978-0123814791.