Data-intensive computing deals with diverse data formats, storage models, application architectures, programming models, algorithms, and tools for large-scale data analytics. In particular, we study approaches that address the challenges of managing and utilizing large-scale data, and methods for transforming voluminous datasets (big data) into discoveries and intelligence for human understanding and decision making. Topics include: intelligent representation of data, approaches for discovering intelligence in data, data-driven computing, storage requirements of big data, organization of big data repositories such as Hadoop, characteristics of Write-Once-Read-Many (WORM) data, data-intensive programming models such as MapReduce and Spark analytics, web-services-based cloud computing middleware, and scalable analytics and visualization.

This course has four major goals: (i) understand data-intensive computing, which the late Jim Gray described as the fourth paradigm of science; (ii) study, design, and develop solutions using data-intensive computing models; (iii) perform predictive analytics and visualization using packages such as R and Spark analytics; and (iv) focus on methods for scalability using cloud computing infrastructures such as Google Compute Engine and Amazon Web Services (AWS).

On completion of this course, students will be able to analyze, design, and implement effective solutions for data-intensive applications with very large data sets. More specifically, a student will be able to:
| Item | Details |
|---|---|
| Instructor | Bina Ramamurthy (email@example.com) |
| Office Hours | Tue/Thu 2:00-3:20 PM |
| Office Location | 345 Davis |
| Lecture Location | Knox 20 |
The overall grade for the course will be based on the student's performance in class attendance and participation (5%), 2 exams (50%), and 3 labs (45%).
Letter grades will be mapped from the overall percentage: 95% or above is an A, 90% or above is an A-, and so on. A curve will be applied at the end based on the relative performance of the students in the course. We will use separate curves for graduates and undergraduates.
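As a rough illustration of how the component weights combine into an overall percentage, here is a minimal Python sketch. It assumes each component (participation, exams, labs) has already been reduced to a single percentage, since how the two exams or three labs are combined within a component is not specified here. It encodes only the A and A- cutoffs stated above and ignores the end-of-course curve. The names `WEIGHTS`, `overall_percent`, and `letter_before_curve` are hypothetical.

```python
# Hypothetical helper: component weights as stated in the syllabus
# (participation 5%, exams 50%, labs 45%); they sum to 1.0.
WEIGHTS = {"participation": 0.05, "exams": 0.50, "labs": 0.45}


def overall_percent(scores: dict[str, float]) -> float:
    """Return the weighted overall percentage before any curve.

    `scores` maps each component name to that component's percentage (0-100).
    Assumption: the caller has already combined the two exams and the three
    labs into one percentage per component.
    """
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def letter_before_curve(percent: float) -> str:
    # Only the two cutoffs named in the syllabus are encoded; lower grades
    # continue the scale but are not spelled out there.
    if percent >= 95:
        return "A"
    if percent >= 90:
        return "A-"
    return "below A-"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = {"participation": 100, "exams": 90, "labs": 96}
    total = overall_percent(example)
    print(f"{total:.1f}% -> {letter_before_curve(total)}")
```

With these hypothetical scores, the weighted total is 5 + 45 + 43.2 = 93.2, which falls in the A- band before the curve is applied.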
One main textbook covers the major concepts in the course description (algorithms and statistical models for data-intensive computing). We will cover the rest of the topics (hands-on lab material, big-data infrastructure details, and cloud computing) using online reference material and open-source tools.
Three labs are planned, each lasting about four weeks. Each lab will cover one or more concepts and will involve hands-on implementation and testing. You will need a reasonably capable laptop for this. The problems solved in the labs may or may not be related. Each solution is expected to represent an entire pipeline or workflow, leveraging the expertise you have developed in various areas through the lab work. NO late labs or projects will be accepted.
| Week | Concepts | Tools | Labs, Exams, Term Project |
|---|---|---|---|
| 1/28 | Introduction | Data Science | First day handout |
Attendance Policy: You are responsible for the contents of all lectures and recitations (your assigned section). If you know that you are going to miss a lecture or a recitation, have a reliable friend take notes for you. Of course, there is no excuse for missing due dates or exam days. We do, however, reserve the right to take attendance in both lecture and recitation. We may use this information to resolve borderline grades at the end of the course, especially if we see a lack of attendance and participation during lecture sessions. During lectures, we will cover material from the textbook and work out several of the problems from the text. Lectures will also explore several real-world problems not covered in the book. You will be given a reading assignment at the end of each lecture for the next class.
Exams Policy: There will be a midterm (Exam 1) that will be administered and graded before the resign date. The midterm will cover all lectures and reading assignments before the exam, as well as concepts related to the lab assignments. The midterm is closed book, closed notes, and closed neighbor. The second exam (Exam 2) will cover all lecture material after Exam 1 and all the labs. No make-up exams will be given for any reason; if you miss an exam, you will receive a zero for that portion of the grade. Exam 2 will be held during the regular final exam week, on 5/16/2019, 3:30-6:30 PM.