Data-intensive computing deals with storage models, application architectures, middleware, and programming models and tools for large-scale data analytics. In particular, we study approaches that address challenges in managing and utilizing ultra-scale data, and methods for transforming voluminous datasets (big data) into discoveries and intelligence for human understanding and decision making. Topics include: intelligent representation of data, approaches to discovering intelligence in data, data-driven computing, storage requirements of big data, organization of big data repositories such as the Google File System (GFS), characteristics of Write-Once-Read-Many (WORM) data, data-intensive programming models such as MapReduce, fault tolerance, privacy, security and performance, services-based cloud computing middleware, and scalable analytics and visualization. This course has four major goals: (i) understand data-intensive computing, which has been defined as the fourth paradigm for the sciences by the late Jim Gray, (ii) study, design, and develop solutions using data-intensive computing models such as MapReduce, (iii) perform predictive analytics and visualization using packages such as R and Google Analytics, and (iv) focus on methods for scalability using cloud computing infrastructures such as Google App Engine (GAE) and Amazon Elastic Compute Cloud (EC2).
On completion of this course students will be able to analyze, design, and implement effective solutions for data-intensive applications with very large scale data sets. More specifically a student will be able to:
Website | http://www.cse.buffalo.edu/~bina/cse487/spring2016 |
Instructor | Bina Ramamurthy (bina@buffalo.edu) |
Office Hours | MWF: 11:00-11:50 AM |
Office Location | 345 Davis |
Lecture Time | MW: 5:00-6:20 PM |
Lecture Location | NSC 215 |
The overall grade for the course will be based on the student's performance in: class attendance and participation (10%), 2 exams (45%), and 3 projects (45%). Letter grades will be mapped from the overall percentage: 95% or above is an A, 90% is an A-, etc. A curve will be applied at the end based on the relative performance of the students in the course. We will use separate curves for graduates and undergraduates.
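As an illustration of the weighting above, the following is a minimal Python sketch of how an overall percentage could be computed before any curve is applied. The function and dictionary names are hypothetical, the per-category inputs are assumed to be percentages from 0 to 100, and only the two letter cutoffs stated above are encoded (treating 90 up to, but not including, 95 as an A- is an assumption).

```python
# Category weights as stated in the grading policy (10% / 45% / 45%).
WEIGHTS = {"participation": 0.10, "exams": 0.45, "projects": 0.45}


def overall_percent(scores):
    """Return the weighted overall percent from per-category percents (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_before_curve(percent):
    """Map a percent to a letter using only the cutoffs the syllabus states.

    Assumption: 90 up to (but not including) 95 is an A-. Lower cutoffs
    are not listed, so None is returned instead of a guessed letter.
    """
    if percent >= 95:
        return "A"
    if percent >= 90:
        return "A-"
    return None


if __name__ == "__main__":
    example = {"participation": 100, "exams": 90, "projects": 80}
    total = overall_percent(example)
    print(round(total, 1), letter_before_curve(total))
```

The curve and the separate graduate and undergraduate scales are applied by the instructor afterwards and are not modeled here.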
There are two textbooks that cover two of the major goals defined in the description (data-intensive computing: fourth paradigm, data-intensive computing models, cloud computing), respectively:
Other online material/publications
Each project will involve complete installation of all the necessary toolkits, software packages, and servers by each student (or group of students) in their workspace. Students will also write a detailed technical report on the project they implement. Students can work in groups of no more than 2 people. Choose group members with complementary expertise. Project 1 will be based on the statistical problem-solving approaches discussed in the data science textbook. We will use real data and the R software for statistical computing to perform data analytics. Project 2 will involve processing large-volume WORM (write-once-read-many) data using the MapReduce programming model and the Hadoop Distributed File System. Project 3 will involve real-time (streaming) analysis using components of Apache Spark. This idea is still tentative and we are in the process of exploring this direction. You may need accounts and resources on the Amazon and Google clouds.
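To make the MapReduce model mentioned for Project 2 more concrete, here is a minimal single-process Python sketch of its three stages (map, shuffle, reduce) applied to word counting. It does not use Hadoop or HDFS, the function names are hypothetical, and word count is chosen only as a familiar illustration, not as the project assignment.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["write once read many", "read many times"]
    print(word_count(lines))
```

In a real deployment the framework distributes the mappers and reducers across machines and reads the input from a distributed file system. Because WORM data is never modified after it is written, the input can be split and processed in parallel without coordinating updates.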
Attendance Policy: You are responsible for the contents of all lectures and recitations (your assigned section). If you know that you are going to miss a lecture or a recitation, have a reliable friend take notes for you. Of course, there is no excuse for missing due dates or exam days. We do, however, reserve the right to take attendance in both lecture and recitation. We may use this information to determine how to resolve borderline grades at the end of the course, especially if we see a lack of attendance and participation during lecture sessions. During lectures, we will cover material from the textbook and work out several of the problems from the text. Lectures will also explore several real-world problems not covered in the book. You will be given a reading assignment at the end of each lecture for the next class.
Incomplete Policy: We only grant incompletes in this course under the direst of circumstances. By definition, an incomplete is warranted if the student is capable of completing the course satisfactorily, but some traumatic event has interfered with their ability to finish within the timeframe of the semester. Incompletes are not designed as a stalling tactic to defer a poor performance in a class.
Academic Integrity Policy: UB's definition of Academic Integrity in part is, "Students are responsible for the honest completion and representation of their work". It is required as part of this course that you read and understand the departmental academic integrity policy located at the following URL:
http://www.cse.buffalo.edu/undergrad/policy_academic.php
There is a very fine line separating conversation pertaining to concepts and academic dishonesty. You are allowed to converse about general concepts, but in no way are you allowed to share code or have one person do the work for others. You must abide by the UB and Departmental Academic Integrity policy at all times. NOTE: Remember that items taken from the Internet are also covered by the academic integrity policy! If you are unsure whether a particular action violates the academic integrity policy, assume that it does until you receive clarification from the instructor. We reserve the right to check or question any portion of any work submitted at any time during the semester or afterwards. If you are caught violating the academic integrity policy, you will, at a minimum, receive a ZERO in the course.
Exams Policy: There will be a midterm (Exam 1) that will be administered and graded before the resign date. Midterm material will cover all lecture and reading assignments before the exam, as well as concepts from the project assignments. Midterms are closed book, closed notes, and closed neighbor. The second exam (Exam 2) will cover all lecture material after Exam 1 and all the projects. We do not give make-up exams for any reason. If you miss an exam, you will receive a zero for that portion of the grade. The second exam will be on the last day of classes.