Data Mining

Data Mining is a multidisciplinary field touching on important topics across machine learning, natural language processing, information retrieval, and optimization. This module is taught in the second semester at the Department of Computer Science, The University of Liverpool, at master's level.

Days and Locations

Lecture schedule and slides

# Date Title Slides Videos
1. Jan 30 Introduction to Data Mining (Problem Set 0)
2. Feb 1 Data representation
3. Feb 2 Perceptron
4. Feb 13 Missing value handling, labelling and noisy data
5. Feb 15 k-NN classifier
6. Feb 16 Problem Set 1
7. Feb 16 Classifier evaluation
8. Feb 20 Naive Bayes classifier
9. Feb 22 Decision tree learner
10. Feb 23 Logistic regression
11. Feb 23 Problem Set 2
12. Feb 27 Text mining. Part 1
13. March 1 Text mining. Part 2 [above]
14. March 2 Text mining. Part 3 [above]
15. March 6 Support Vector Machines. Part 1
16. March 8 Support Vector Machines. Part 2 [above]
17. March 9 Support Vector Machines. Part 3 [above]
18. March 13 k-means clustering
19. March 15 Cluster evaluation measures [above]
20. April 10 Dimensionality reduction (SVD)
21. April 12 Dimensionality reduction (PCA) [above]
22. April 13 Problem Set 3
23. April 13 Information retrieval
24. April 17 Graph mining
25. April 19 Neural networks and deep learning. Part 1
26. April 20 Neural networks and deep learning. Part 2
27. April 24 Sequential data
28. April 26 Data visualization
29. April 27 Privacy and ethical issues
30. May 1 Word representations
31. May 3 Revision
32. May 4 Revision

Assignment 1 (12% of course marks)

Assignment 2 (13% of course marks)

Final Exam (75% of course marks)

Past Exams with Answers

Lab Sessions / Tutorials

The concepts covered in the lectures will be developed further through a series of programming tutorials. We will implement some of the algorithms from the course in Python, and we will also use some of the freely available machine learning and data mining tools. The two lab sessions are identical, so you only need to attend one session per week. If your student number is even, attend the Thursday session; otherwise, attend the Friday session. Attendance is not recorded for the lab sessions, which are optional.

Location: Thursdays 13:00-14:00 GHOLT-H105 (Lab 3)

Location: Fridays 09:00-10:00 GHOLT-H105 (Lab 3)
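The tutorials involve implementing course algorithms in Python. As an illustration only (this is not taken from the lab materials), a from-scratch k-NN classifier, one of the lecture topics, might be sketched with numpy as below. The function name `knn_predict`, the choice of Euclidean distance and the toy data are assumptions made for this example.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Hypothetical helper: predict a label for each row of X_test by a
    majority vote among its k nearest training points (Euclidean distance).
    Labels are assumed to be non-negative integers. Ties go to the smallest
    label, because np.argmax returns the first maximum."""
    predictions = []
    for x in X_test:
        # Euclidean distance from x to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Count votes per label among the neighbours and take the most common
        votes = np.bincount(y_train[nearest])
        predictions.append(np.argmax(votes))
    return np.array(predictions)

# Made-up toy data: two well-separated clusters labelled 0 and 1
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.5, 0.5], [5.5, 5.5]])

print(knn_predict(X_train, y_train, X_test, k=3))  # → [0 1]
```

The explicit loop keeps each step of the algorithm visible. Library implementations such as scikit-learn's use more efficient neighbour search, which matters for larger datasets.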

Lab Tasks

Problem Sets

The following problem sets are for evaluating your understanding of the various topics covered in the lectures. Try them by yourself first; we will discuss the solutions later during the lectures and lab sessions. You are not required to submit your solutions, and they will not be marked or count towards your final mark for the module. The problem sets are for self-assessment only.

MSc projects

The following summer MSc projects are available to CS students at UoL. If you are interested, please contact me.

References

There is no official textbook for this course. The following is a recommended list of textbooks, papers and websites for the various topics covered in this course.

  1. Pattern Recognition and Machine Learning by Chris Bishop. For machine learning topics.

  2. A Course in Machine Learning by Hal Daume III. Excellent introductory material on a range of machine learning topics.

  3. Data Mining: Practical Machine Learning Tools and Techniques by Ian Witten et al. For decision tree learners, association rule mining and data pre-processing topics.

  4. Foundations of Statistical Natural Language Processing by Christopher Manning and Hinrich Schütze. For text processing and text mining topics.

  5. Introduction to Linear Algebra by Gilbert Strang. A good reference for brushing up on linear algebra. MIT video lectures based on the book are also available.

  6. An excellent reference on the mathematics required for data mining and machine learning, by Hal Daume III.

  7. numpy (numerical processing in Python)

  8. scipy (MATLAB-like scientific functions for Python)

  9. LIBSVM (an SVM library written in C++, with bindings for numerous languages including Python)

  10. scikit-learn (machine learning in Python)