Fall 2018

Lectures
MoWeFr 4:00pm-4:50pm, Capen 262
Please make sure to attend!

CSE 470 Recitations
No separate class; use office hours for any questions!

Office hours
Capen 212, open door policy!
Course Resources
Multiple course resources are available!
Course resources include lecture slides, course readings, and useful software tools. Note that you will have to authenticate to access this content. Ask your instructor if you have not yet received access instructions.
Course Overview
This course is intended for students interested in the efficient use of modern parallel systems, ranging from multicore and manycore processors to large-scale distributed memory clusters. The course puts equal emphasis on the theoretical foundations of parallel computing and the practical aspects of different parallel programming models. It begins with a survey of common parallel architectures and types of parallelism, followed by an overview of formal approaches to assessing the scalability and efficiency of parallel algorithms and their implementations. In the second part, the course covers the most common current parallel programming techniques and APIs, including those for shared address space systems, manycore accelerators, distributed memory clusters, and big data analytics platforms. Each component of the course involves solving practical computational and data-driven problems, ranging from basic algorithms like sorting or searching to graph and numerical data analysis.
Organization
The course consists of a series of lectures organized into five topical modules. Each lecture module is complemented with a programming assignment exposing practical aspects of the covered material. A tentative course outline is provided below:

Overview of the parallel processing landscape: why and how, types of parallelism, Flynn’s taxonomy and a brief overview of parallel architectures, Exascale computing vs. Exascale data, practical demonstration of CCR as an example HPC center. (3 lectures)
Basic concepts in parallel processing: formal definition of parallelism, concepts of work, speedup, efficiency, overhead, strong and weak scalability (Amdahl’s law, Gustafson’s law), practical considerations using parallel sum and parallel prefix. (4 lectures)

Multicore programming: shared memory and shared address space, data and task parallelism, Cilk+, OpenMP, Intel TBB data structures (time permitting), parallel merge sort, pointer jumping, parallel BFS. (9 lectures)

Distributed memory programming: Message Passing Interface (including one-sided communication, derived datatypes and MPI-IO), interconnect topologies, latency + bandwidth model, parallel matrix-vector product, parallel connected components, sample sort. (9 lectures)

Higher-level programming models: MapReduce, Apache Spark and Resilient Distributed Datasets, Bulk Synchronous Parallel model, Pregel and Apache GraphX, triangle counting, connected components, single-source shortest path. (9 lectures)

Manycore programming: SIMD parallelism, massively parallel GPGPU accelerators, data movement and organization, matrix-matrix product, connected components. (6 lectures)
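
As a small illustration of the speedup, efficiency, and scalability concepts listed in the basic concepts module, the sketch below evaluates Amdahl's law (strong scaling, fixed problem size) and Gustafson's law (weak scaling, problem size grows with the processor count). It is not course material: the function names and the 5% serial fraction are assumptions chosen only for the example.

```python
def amdahl_speedup(serial_fraction: float, p: int) -> float:
    """Amdahl's law: speedup on p processors for a fixed problem size.

    serial_fraction is the share of the sequential run time
    that cannot be parallelized (0 <= serial_fraction <= 1).
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def gustafson_speedup(serial_fraction: float, p: int) -> float:
    """Gustafson's law: scaled speedup when the problem grows with p.

    Here serial_fraction is the share of the *parallel* run time
    spent in serial code.
    """
    return p - serial_fraction * (p - 1)


def efficiency(speedup: float, p: int) -> float:
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup / p


if __name__ == "__main__":
    s = 0.05  # hypothetical serial fraction, for illustration only
    print(f"{'p':>5} {'Amdahl':>10} {'Gustafson':>10} {'Eff (Amdahl)':>13}")
    for p in (1, 2, 4, 8, 16, 64, 256):
        a = amdahl_speedup(s, p)
        g = gustafson_speedup(s, p)
        print(f"{p:>5} {a:>10.2f} {g:>10.2f} {efficiency(a, p):>13.2f}")
```

Under Amdahl's law the speedup is bounded by the reciprocal of the serial fraction no matter how many processors are added, while the scaled speedup of Gustafson's law keeps growing linearly in the processor count.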
Syllabus
You can download the full syllabus here.