UE 141: Discovery Seminar on Data Mining

Spring 2013

Basic Information
Overview

Data mining is the process of discovering new and insightful knowledge from large bodies of data. The amount of data in our world has been exploding, and nearly every industry is eager to infer actionable knowledge from it. As vast amounts of data are generated and collected every day, our daily lives are significantly influenced by data mining applications. Based on customer purchase records, retailers can tell which items should be promoted together to increase profit. From your purchase history and web click records, Amazon can recommend books, movies or products that you are likely to buy in the future. By analyzing the profiles of existing customers, many companies can predict the preferences of potential customers, and thus make focused and efficient use of their sales forces. Data mining is what makes each of these success stories possible.

In this seminar, we will review classical and state-of-the-art data mining techniques for association analysis, clustering, classification, feature selection and other tasks that transform data into useful knowledge. Students will also gain hands-on experience in using open source data mining software for effective data analysis. By the end of this seminar, you will understand what data mining is, how it works, and why it is important.
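To give a flavor of association analysis, the retail example above can be sketched in a few lines of Python. This is only an illustrative sketch, not course material: the purchase records are made up, and the function names (`support`, `confidence`, `frequent_pairs`) are hypothetical helpers written for this example. Support is the fraction of baskets that contain an itemset, and confidence is how often a rule's right-hand side appears in baskets that contain its left-hand side.

```python
from collections import Counter
from itertools import combinations

# Made-up purchase records: each set is one customer's basket (an assumption
# for illustration only, not data used in the seminar).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain `itemset`."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets containing `antecedent`, the fraction that also
    contain `consequent` (the rule antecedent -> consequent)."""
    both = set(antecedent) | set(consequent)
    return count(both, baskets) / count(antecedent, baskets)


def frequent_pairs(baskets, min_support):
    """All item pairs whose support is at least `min_support`."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}


if __name__ == "__main__":
    print("support({diapers, beer}) =", support({"diapers", "beer"}, baskets))
    print("confidence(diapers -> beer) =", confidence({"diapers"}, {"beer"}, baskets))
    for pair, s in sorted(frequent_pairs(baskets, 0.6).items()):
        print(pair, s)
```

A pair with high support and high confidence is a candidate for joint promotion. Counting every pair directly, as done here, is only practical for small data; the association analysis lectures cover more scalable approaches.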

This seminar is part of the UB Discovery Seminar Program, which gives first- and second-year undergraduate students at UB the opportunity to explore new ideas in a small-class environment. More details about this program can be found at: http://discoveryseminars.buffalo.edu/

Prerequisites

The course assumes high school math and basic computer skills (software installation and usage).

Course Structure

This is a one-credit, letter-grade course. The instructor will present the basics of data mining and various data mining approaches using real-world application examples. Students are expected to participate in class discussions, present ideas about formulating real-world tasks as data mining problems, and apply data mining tools to real data sets.

Grading Policy

Grades will be computed based on the following factors (subject to change):
  • Class Participation -- 10%
  • In-class Discussions -- 30%
  • Projects -- 60%

Course Schedule

The lecture slides were developed based on materials from several sources. Please see the copyright notice.

Date         Topic                          Readings
-----------  -----------------------------  ---------------------------------
January 16   Introduction to Data Mining
January 23   Recommendation Systems (1)     [KBV09]
January 30   Recommendation Systems (2)     [KBV09]
February 6   Data Preprocessing             [RaDo00]
February 13  Data Preprocessing             [Weka]
February 20  Association Analysis           Textbook Chapter 6
February 27  Association Analysis           Textbook Chapter 6
March 6      Classification                 Textbook Chapter 4
March 20     Project 1 Presentation
March 27     Classification                 Textbook Chapter 4
April 3      Ensemble Learning, Clustering  [Polikar06], Textbook Chapter 8
April 10     Project 2 Presentation
April 17     Clustering                     Textbook Chapter 8

Supplementary Materials

[KBV09] Yehuda Koren, Robert Bell and Chris Volinsky. Matrix Factorization Techniques for Recommender Systems. IEEE Computer, 42(8): 30-37, 2009. [Paper]
[RaDo00] Erhard Rahm and Hong Hai Do. Data Cleaning: Problems and Current Approaches. IEEE Data Engineering Bulletin, Volume 23, 2000. [Paper]
[Weka] Weka 3: Data Mining Software in Java. [Link]
[Polikar06] Robi Polikar. Ensemble Based Systems in Decision Making. IEEE Circuits and Systems Magazine, 6(3): 21-45, 2006. [Paper]

Projects

Project 1: Association Analysis: Due March 20. Supplementary material
Project 2: Classification: Due April 10. Supplementary material
Project 3: Clustering: Due May 1. Supplementary material