UE 141: Discovery Seminar on Data Mining
Spring 2013
Basic Information
- Instructor: Jing Gao (jing@buffalo.edu)
- Time: Wednesday 4-4:50pm
- Location: Capen 108
- Office Hours: Tuesday 1:30-3:30pm
- Office: 350 Davis Hall
Overview
Data Mining is the process of discovering new and insightful knowledge from large bodies of data. The amount of data in our world has been exploding, and nearly every industry is desperate to infer actionable knowledge from data.
As huge volumes of data are generated and collected every day, data mining applications significantly influence our daily lives. Based on customer purchase records, retailers can tell which items should be promoted together to increase profit. From your purchase history and web click records, Amazon can recommend books, movies or products that you are likely to buy in the future. By analyzing the profiles of existing customers, many companies can predict the preferences of potential customers, and thus make focused and efficient use of their sales forces. The magic behind these success stories is data mining.
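To make the retail example concrete, here is a minimal sketch of the idea behind association analysis: count how often pairs of items appear in the same purchase. The basket data and the function name `pair_support` are made up for illustration; real tools such as Weka use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def pair_support(baskets):
    """Return, for each pair of items, the fraction of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # Sort items so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(baskets)

# Show the three most frequently co-purchased pairs (ties broken alphabetically).
top_pairs = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
for pair, s in top_pairs:
    print(pair, s)
```

Pairs with high support are candidates for joint promotion. A full association analysis would also measure how strongly one item predicts another, not only how often they co-occur.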
In this seminar, we will review classical and state-of-the-art data mining techniques for association analysis, clustering, classification, feature selection and other tasks that transform data into useful knowledge. Students will also gain hands-on experience in using open source data mining software for effective data analysis. By the end of this seminar, you will have learned what data mining is, how it works, and why it is important.
This seminar is part of the UB Discovery Seminar Program, which gives first- and second-year undergraduate students at UB the opportunity to explore new ideas in a small-class environment. More details about this program can be found at: http://discoveryseminars.buffalo.edu/
Prerequisites
The course assumes high school math and basic computer skills (software installation and usage).
Course Structure
This is a one-credit, letter-grade course. The instructor will present the basics of data mining and various data mining approaches using real-world application examples. Students are expected to participate in class discussions, present ideas about formulating real-world tasks as data mining problems, and apply data mining tools to real data sets.
Grading Policy
Grades will be computed based on the following factors (subject to change):
- Class Participation -- 10%
- In-class Discussions -- 30%
- Projects -- 60%
Course Schedule
The lecture slides were developed based on materials from several sources. Please see copyright notice.
Supplementary Materials
[KBV09] Yehuda Koren, Robert Bell and Chris Volinsky. Matrix Factorization Techniques for Recommender Systems. IEEE Computer, 42(8): 30-37, 2009. [Paper]
[RaDo00] Erhard Rahm and Hong Hai Do. Data Cleaning: Problems and Current Approaches. IEEE Data Engineering Bulletin, Volume 23, 2000. [Paper]
[Weka] Weka 3: Data Mining Software in Java. [Link]
[Polikar06] Robi Polikar. Ensemble Based Systems in Decision Making. IEEE Circuits and Systems Magazine, 6(3): 21-45, 2006. [Paper]
Projects