Data-Intensive Computing Research and Education
Project partially funded by National Science Foundation Grant NSF-DUE-CCLI-0920335



Data-intensive computing has been receiving much attention as a collective solution to address the data deluge that has been brought about by tremendous advances in distributed systems and Internet-based computing. Innovative programming models such as MapReduce, and the peta-scale distributed file systems that support them, have fundamentally changed approaches to large-scale data storage and processing. These data-intensive computing approaches are expected to have a profound impact on any application domain that deals with large-scale data, from healthcare delivery to military intelligence. Given the omnipresent nature of large-scale data and the tremendous impact it has on a wide variety of application domains, it is imperative to ready our workforce to face the challenges in this area. This project aims to improve the big-data preparedness of a diverse STEM audience by defining a comprehensive framework for education and research in data-intensive computing.
Dr. Bina Ramamurthy is the director and principal investigator of this project.


CSE487/587 Data-Intensive Computing is a new course designed to address the big-data preparedness of our workforce.


CSE487/587 Course Description



Date
Topics
Lecture material
Demos/reading material
8/30
Introduction to Data-intensive computing
DataInt
Amazon AWS 
9/1
Fourth Paradigm: Ecological Sciences System
FREcoSys

9/6
Project 1 discussion
Prj1
WS demo
9/8
Project 1 Sample Demo


9/13
Introduction to MapReduce Programming Model
MR MR.pdf MR.ppt
Read Ch.1,2 of MR text
9/15
Project 1 Discussion: hints and links
Prj1Hints


MapReduce Demo: from Yahoo Tutorial
MR.Yahoo
VMware/MR.yahoo demo;
Amazon/MR.yahoo
9/20
Project due date revision; MR execution framework; MapReduce operations and math

Prj1NewDates
Queries
9/22
MR Execution Framework
MoreMR
Amazon EC2 Demo
9/27
Tom White's MR Example; classification Example for project 1; New Hadoop MR API
White'sMR
LinDryerMR
Ch.3
10/3
Text processing using MR;
An impactful application area for data-intensive methods
MR.II
HumanGenomeDI

10/6
Introduction to Google App Engine
GAEIntro
Google App Engine Demo

Inverted Index: design patterns for MR

Ch.4
10/10
Optimizations in MapReduce
MR.Opt
Ch.3: design patterns for MR
10/13
Co-occurrence matrix using MR

Ch.3: more design patterns for MR
10/18
Review for midterm exam; Graph algorithms
Review

10/25
Virtualization; some more review for exam
Virtualization
Demo on Virtual Box
10/27
Midterm Exam
107 Talbert

11/1
Project 2 discussion
MR.Project


AWS components
AWS

11/3
Non-functional attributes of Cloud Computing Model
NonFunc

11/8
Apache Pig
PIGLang

11/11
Exam 1 discussion; Prj1 Demo and discussion;
Hive
Hive
11/15
Oozie Presentation by Eric Nagler
OOZIE
11/17
HDFS/MR internals
MRI
11/30
Large-scale DB: Hbase
Hbase
My New office : 345 Davis

Final Review
FinalReview

12/1
Hbase (contd.)
Ch.1 from Hbase Definitive Guide
12/6
Hbase (contd.); Windows Azure;
"SPI" cloud models