Data-intensive Computing

CSE4/587 Spring 2017

"Hmm... What will I learn in this course?"

Introduction

Welcome to Spring 2017 and to the Data-intensive Computing course. This course covers topics relevant to the emerging areas of Data Science and Data-intensive Computing. Data Science deals with data acquisition, cleaning, exploratory data analysis, statistical modeling, algorithmic data processing, knowledge extraction, prediction, and prescriptive analytics. Data-intensive Computing deals with the computing aspects, such as the infrastructure, big-data architectures, data structures, and algorithms that enable Data Science. We will cover both aspects in this course.


The main textbook for the course is:
Doing Data Science: Straight Talk from the Frontline, 1st Edition. Authors: Cathy O'Neil and Rachel Schutt. ISBN: 978-1449358655. Publisher: O'Reilly Media.


We will also use many other references, online sources, and textbooks throughout the semester. Details will be provided in the References tab.

Tentative Curriculum

A broad overview of the topics to be covered is given below.


		 Introduction to Data.
		 Data Acquisition and Cleaning.
		 Exploratory Data Analysis (EDA) using the R language.
		 Data Visualization.
		 Statistical Modeling.
		 Algorithms for big-data processing.
		 Databases for small data and big data.
		 Infrastructures for big data (Hadoop ecosystem).
		 High-speed, scalable big-data processing (Spark ecosystem).
		 Computing on the cloud.
		 Research issues in Data-intensive Computing.


All concepts discussed in the lectures will be reinforced by six (yes, six) labs and one term project.