Data Mining

Data mining is a multidisciplinary field that touches on important topics across machine learning, natural language processing, information retrieval, and optimization. This course is delivered in the second semester at the Department of Computer Science, The University of Liverpool, as a master's-level module.

Days and Locations

Lecture schedule and slides

# Date Title slides
1. Jan 30 Introduction to Data Mining
2. Feb 1 Data representation
3. Feb 3 Missing value handling, labelling and noisy data
4. Feb 6 k-NN classifier
5. Feb 8 Perceptron
6. Feb 10 Classifier Evaluation
7. Feb 13 Decision Tree Learner
8. Feb 15 Naive Bayes classifier
9. Feb 17 Logistic regression. Part 1
10. Feb 20 Logistic regression. Part 2 [above]
11. Feb 22 Support Vector Machines. Part 1
12. Feb 24 Support Vector Machines. Part 2 [above]
13. Feb 27 Support Vector Machines. Part 3 [above]
14. March 1 k-means clustering
15. March 3 Cluster evaluation measures [above]
16. March 6 Text mining. Part 1
17. March 8 Text mining. Part 2 [above]
18. March 10 Information Retrieval
19. March 13 Graph mining. Part 1
20. March 15 Graph mining. Part 2 [code]
21. March 17 Dimensionality reduction (SVD)
22. March 20 Dimensionality reduction (PCA) [above]
23. March 22 Data visualization
24. March 24 Data visualization [above]
25. March 27 Neural networks and Deep Learning. Part 1
26. March 29 Neural networks and Deep Learning. Part 2
27. April 24 Sequential data
28. April 26 Word Representations
30. April 28 Revision
31. May 3 Revision

Assignment 1 (12% of course marks)

Assignment 2 (13% of course marks)

Final Exam (75% of course marks)

Resit Assignment 1 (12% of course marks)

Resit Assignment 2 (13% of course marks)

Final Exam (75% of course marks)

Lab Sessions / Tutorials

The concepts covered in the lectures will be further developed through a series of programming tutorials. We will implement some of the algorithms from the course in Python, and we will also use some of the freely available machine learning and data mining tools. Pavithra Rajendran and Xia Cui will be your TAs.

Location: Mondays 11:00-12:00 GHOLT-H105 (Lab 3)

Location: Wednesdays 10:00-11:00 GHOLT-H116 (Lab 2)
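
As a flavour of what implementing an algorithm from the course in Python can look like, here is a minimal sketch of a k-NN classifier (the topic of lecture 4) using numpy. It is an illustration only and is not taken from the lab materials: the function name knn_predict, the use of Euclidean distance, the default k=3, and the toy data are all assumptions made for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    # Euclidean distance from the query to every training instance.
    distances = np.linalg.norm(train_X - query, axis=1)
    # Indices of the k training instances with the smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration only: two well-separated groups.
train_X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9],
])
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, np.array([0.15, 0.10])))
print(knn_predict(train_X, train_y, np.array([0.95, 1.00])))
```

With two classes, an odd k avoids tied votes. In practice the scikit-learn library listed in the references provides a ready-made k-NN classifier, so a hand-written version like this is mainly useful for understanding the method.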

Lab Tasks

Problem Sets

The following problem sets are for evaluating your understanding of the topics covered in the lectures. Try them by yourselves first; we will discuss the solutions later during the lectures and lab sessions. You are not required to submit your solutions, and they will not be marked or count towards your final mark for the module. The problem sets are for self-assessment only.

MSc projects

The following summer MSc projects are available to CS students at UoL. If you are interested, please contact me.

References

There is no official textbook for this course. The following is a recommended list of textbooks, papers, and websites for the various topics covered in this course.

  1. Pattern Recognition and Machine Learning, by Chris Bishop. For machine learning related topics.

  2. A Course in Machine Learning, by Hal Daumé III. Excellent introductory material on various topics in machine learning.

  3. Data Mining: Practical Machine Learning Tools and Techniques, by Ian Witten. For decision tree learners, association rule mining, and data pre-processing related topics.

  4. Foundations of Statistical Natural Language Processing, by Christopher Manning. For text processing/mining related topics.

  5. numpy (Python numeric processing)

  6. scipy (Python MATLAB-like functions)

  7. LIBSVM (SVM library written in C++, with bindings for numerous languages including Python)

  8. scikit-learn (Machine Learning in Python)