CSE4/587 Spring 2018

Welcome to Spring 2018 and the Data-intensive Computing course. This course covers topics relevant to the emerging areas of Data Science and Data-intensive Computing. Data Science deals with data acquisition, cleaning, exploratory data analysis, statistical modeling, algorithmic data processing, knowledge extraction, prediction, and prescriptive analytics. Data-intensive Computing deals with the computing aspects that enable Data Science, such as infrastructure, big-data architectures, data structures, and algorithms. We will cover both in this course.

There are three recommended texts. All are available for free online.

- Cathy O'Neil and Rachel Schutt. Doing Data Science: Straight Talk from the Frontline, 1st Edition, ISBN: 978-1449358655, O'Reilly Media.
- H. Wickham and G. Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, 1st Edition, ISBN: 978-1491910399, O'Reilly Media, 2017.
- Joel Grus. Data Science from Scratch: First Principles with Python, 1st Edition, ISBN: 978-1491901427, O'Reilly Media, 2015.

We will use many other references, online sources, and textbooks throughout the semester. Details will be provided in the References tab.

A broad overview of the topics to be covered is given below.

```
Introduction to Data.
Data acquisition and cleaning.
Exploratory Data Analysis (EDA) using the R language.
Data Visualization.
Statistical modeling.
Algorithms for big-data processing.
Databases for small data and big data.
Infrastructure for big data (Hadoop ecosystem).
High-speed, scalable big-data processing (Spark ecosystem).
Computing on the cloud: Amazon AWS and Google Cloud.
Research issues in Data-intensive computing.
```

All concepts discussed in the lectures will be reinforced by three (3) labs: the R language, data analytics and visualization, and big-data computing.