data-mining-course

An undergraduate course on data mining.

This project is maintained by chatox

Rules for course at UPF (2023)

1. Grades

Continuous evaluation will be based on the following elements:

To pass the course under continuous evaluation, all of the following must be true:

  1. Practice grade, A ≥ 5.0
  2. Theory grade, 0.4 B + 0.6 C ≥ 5.0
  3. Final grade, 0.5 A + 0.2 B + 0.3 C ≥ 5.0
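The three conditions can be checked together. The following is a minimal sketch, not an official grade calculator: the function names are hypothetical, and A, B, and C are assumed to be grades on a 0 to 10 scale.

```python
def theory_grade(b: float, c: float) -> float:
    """Theory grade: 0.4 B + 0.6 C."""
    return 0.4 * b + 0.6 * c


def final_grade(a: float, b: float, c: float) -> float:
    """Final grade: 0.5 A + 0.2 B + 0.3 C."""
    return 0.5 * a + 0.2 * b + 0.3 * c


def passes_continuous_evaluation(a: float, b: float, c: float) -> bool:
    """All three conditions must hold at the same time."""
    threshold = 5.0
    return (
        a >= threshold
        and theory_grade(b, c) >= threshold
        and final_grade(a, b, c) >= threshold
    )


# Hypothetical grades, for illustration only.
print(passes_continuous_evaluation(6.0, 4.0, 6.0))   # True
print(passes_continuous_evaluation(10.0, 2.0, 5.0))  # False: theory grade is below 5.0
```

Since 0.2 B + 0.3 C is half of the theory grade, the final grade is the average of the practice grade and the theory grade. In the second example the final grade is above 5.0, but the theory grade alone is not, so the course is not passed.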

If you fail to pass, you will have to take the resit exam. The resit exam replaces the theory grade (B and C in the list above).

1.1. Getting a grade in the practical sessions (PSxx, individual)

To obtain a grade in the practical session, you must:

Extra points might be added to your grade, allowing you to obtain up to 12 points (instead of 10) in some practice sessions; however, your total practice grade is capped at 10 points.

The lowest grade among your practice sessions is discarded automatically, which allows you to skip one session and still get the maximum grade.
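The effect of the cap and of discarding the lowest grade can be sketched as follows. This is only an illustration under an assumption that these rules do not state: that the remaining sessions are averaged with equal weight. The function name `practice_grade` is hypothetical.

```python
def practice_grade(session_grades: list[float]) -> float:
    """Sketch of a practice grade: drop the lowest session, average, cap at 10."""
    if len(session_grades) < 2:
        raise ValueError("at least two sessions are needed to discard one")
    # Discard the single lowest grade (a skipped session counts as 0).
    kept = sorted(session_grades)[1:]
    # ASSUMPTION: the remaining sessions are averaged with equal weight.
    average = sum(kept) / len(kept)
    # The total practice grade is capped at 10 points.
    return min(average, 10.0)


# Hypothetical grades, for illustration only.
print(practice_grade([0.0, 10.0, 10.0, 10.0]))  # 10.0: the skipped session is discarded
print(practice_grade([12.0, 12.0, 8.0, 0.0]))   # 10.0: extra points cannot exceed the cap
print(practice_grade([6.0, 8.0, 0.0]))          # 7.0
```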

Not coming to a practice session or not delivering your work means a zero grade in that session, unless you can justify your absence to the teaching assistant (profesor de prácticas).

:unamused: Do not work alone and isolated during the practice session. You can prevent simple mistakes by talking to someone else. You can work in pairs, but each of you must submit your own work individually, and the submissions must be different.

:rage: Do not work in groups of three or more during the practice session. Feel free to exchange ideas with other students, but do not copy from others.

:warning: Copying from external sources without acknowledging them in your work, and copying the work of another person or group in your class, are considered by the university as serious misconduct (“falta grave”). The instructor will make a case for the university to sanction this serious misconduct, as per the university regulations, with a suspension from the university for a minimum of six months and a maximum of four years.

1.2. Getting a grade in the in-class tests or exam

In-class tests and exams are individual work.

Not coming to an in-class test means a zero grade in that in-class test, unless you can justify your absence to the professor (profesor de teoría).

:warning: Copying during a test/exam, knowingly facilitating copying by others, and preparing, lending, or providing instruments for copying during an exam are considered by the university as serious misconduct (“falta grave”). The instructor will make a case for the university to sanction this serious misconduct, as per the university regulations, with a suspension from the university for a minimum of six months and a maximum of four years.

2. Guidelines for submitting your work

2.1. Exams

The exam will be taken with pen and paper.

2.2. Practices and assignments

Each practice session and assignment specifies what you should deliver.

2.2.1. Delivering your report

Identify the authorship and date of each report with a paragraph on the first page, including:

All of your reports should end with the following statement:

I hereby declare that, except for the code provided by the course instructors, all of my code, report, and figures were produced by myself.

or, in the case of work done in pairs/groups (if any):

We hereby declare that, except for the code provided by the course instructors, all of our code, report, and figures were produced by ourselves.

These are some of the most common mistakes in reports; they deduct points from your grade:

  1. failing to include your name on the first page
  2. exceeding the number of pages
  3. having the required number of pages but with text that has no substance and is just filling up space
  4. including plots without a scale or without a label
  5. including screenshots instead of exporting the images
  6. copy-pasting tables formatted in ASCII
  7. delivering in the wrong format such as .docx
  8. delivering a report that is not understandable or does not look professional at all

2.2.2. Delivering your code

Your code is delivered as a self-contained Python notebook. This notebook should be readable and understandable on its own by a person familiar with the course’s topic. Think of the notebook as a report in which you tell a story, and tell that story well and professionally.

Remember to identify the authorship and date of your code. Include as many markdown cells between code cells as needed to explain what you are doing and what we are looking at.

Follow good programming practices:

These are some of the most common mistakes in code; they deduct points from your grade:

  1. Delivering code that does not execute from beginning to end; to prevent this, make sure that Kernel > Restart and run all works in your notebook, because that is how practice instructors review your code
  2. Not including comments explaining how your code works; these are important for practice instructors to properly review your code
  3. Including unnecessary code
  4. Leaving cells that you should have removed when delivering
  5. Including code that does the same thing twice or more (use functions instead)
  6. Giving cryptic names to your variables or functions
  7. Using an inconsistent coding style
  8. Using code cells to write text intended to be read; use markdown cells for that instead. The only text in code cells should be brief comments that help the reader understand a piece of code
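As an illustration of two of the mistakes above (repeating the same code and using cryptic names), here is a small hypothetical example. The data and all names are invented for illustration and do not come from any practice session.

```python
# Before: the same computation written twice, with cryptic names.
# a = [(x - min(l1)) / (max(l1) - min(l1)) for x in l1]
# b = [(x - min(l2)) / (max(l2) - min(l2)) for x in l2]


def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are equal: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


# After: one function with a descriptive name, called once per column.
ages = [18, 30, 42]
incomes = [1000, 1500, 3000]
normalized_ages = min_max_normalize(ages)
normalized_incomes = min_max_normalize(incomes)
print(normalized_ages)     # [0.0, 0.5, 1.0]
print(normalized_incomes)  # [0.0, 0.25, 1.0]
```

The function also gives a single place to handle the edge case of a constant column, which the duplicated version would have to handle twice.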

We expect nothing less than top-quality work

Delivering consistently top-quality work takes time and effort, but it can be very rewarding both personally and professionally :sunglasses:

Check your answers, your code, and your reports as many times as needed to ensure they are correct.
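One way to check that a notebook executes from beginning to end, in addition to using Kernel > Restart and run all, is to run it in a fresh kernel from the command line with `jupyter nbconvert`. The sketch below assumes that Jupyter and nbconvert are installed and available on your PATH; the function names and the file name `PS01.ipynb` are hypothetical.

```python
import subprocess


def build_execute_command(notebook_path: str, output_name: str = "executed.ipynb") -> list[str]:
    """Build a `jupyter nbconvert` command that runs every cell in order."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", notebook_path,
        "--output", output_name,
    ]


def notebook_runs_cleanly(notebook_path: str) -> bool:
    """Return True if the notebook executes in a fresh kernel without errors."""
    # Raises FileNotFoundError if the `jupyter` command is not installed.
    result = subprocess.run(
        build_execute_command(notebook_path),
        capture_output=True,
        text=True,
    )
    # nbconvert exits with a non-zero code when a cell raises an error.
    return result.returncode == 0


# Example usage (hypothetical file name):
# if not notebook_runs_cleanly("PS01.ipynb"):
#     print("The notebook does not run from beginning to end.")
```

The executed copy is written to a separate file, so the notebook you deliver is not modified by the check.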

Be precise: use the section numbers (1.1, 1.2, …) of the notebook to present your results in the report, and refer to sections by their number. If you need to refer to your figures or tables, number them and refer to them by numbers. Do not include ambiguous statements or plots without a scale or a legend. Do not use colors if you do not explain what each color means. State clearly your assumptions and limitations.

Be careful with the presentation of your work. For instance, do not use low-quality screenshots, or poorly cropped screenshots showing toolbars and window borders. Instead, export and save high-quality images from each application. Do not copy-paste or screenshot tables into your report without converting them into actual tables.

As a data scientist, your reports and code should be (among other things) correct, understandable, pristine, clear, and pleasant to look at. Ensure you set aside enough time to review, improve, and polish your work. Get used to producing top-quality work and it will become a habit.

3. Asking questions

If you need help installing software or packages on your computer, please ask your classmates, for instance through the Aula Global forum. The teaching staff do not have the necessary bandwidth to debug your installation.

To ask for an appointment, send an e-mail to the course’s professor. No appointments will be given in the 72 hours before partial or final exams.