Modern day computation has to deal with massive (on- or off-line) data sets. Unfortunately, conventional algorithms do not scale well for such data sets. For example, on the Internet a router cannot store all the packets that get routed through it or in database applications, the entire table cannot always be stored in the main memory. In other words, the implicit assumption in conventional algorithms that the *entire* input can be stored and (arbitrary locations can be) accessed quickly is no longer valid.

"Data stream" is a proposed model of computation that is more amenable to modern day requirements. In particular, in the data stream model, an algorithm is allowed a few "passes" on the input and has access to only a sub-linear (in the length of the input) amount of memory. This area was essentially kick-started by the seminal work of Alon-Matias-Szegedy (which received the Goedel prize in 2005) and has since then spread into various application domains.

In this seminar, we will start with the theoretical foundations of data stream algorithms and then focus on two application domains:

- Online network traffic monitoring and analysis. Basically, we want to "learn" something about network data streams. Data stream problems will be addressed from a machine learning point of view.
- Databases. Data stream problems will be addressed from a more algorithmic point of view.

Depending on class interest, we might also look at computational complexity issues related to data streams.

The first few lectures will be presented by the instructors followed by student presentations. Few of the meetings will also play host to some external speaker talks as part of the UB theory seminar which will get re-started from Spring 08.

Atri Rudra and Hung Ngo are the instructors in charge of the joint seminar. Roughly speaking, Atri Rudra will take the algorithm/database angle and Hung Ngo will take the machine learning/traffic analysis angle. The background materials on the basic data stream model and related algorithms shall be covered by both instructors at the beginning of the semester.

Students can register for the joint seminar either via Atri Rudra's CSE 728 or Hung Ngo's CSE 725. We expect about 5 registrations for EACH of CSE728 and CSE725. Hence, we may email you individually asking you to "switch" the registration from 728 to 725 and vice versa.

Near the beginning of the Spring semester, we will email registered students to arrange the meeting time and announce the meeting place. We shall need to set up the time which works for every one first in order to book a classroom. PLEASE REGISTER EARLY, because room booking isn't easy if we get too close to the Spring semester.

We will meet roughly 3.5 to 4 hours pear week. Ideally, there will be two weekly meetings of about 2 hours each.

Students will be evaluated on class participation, scribing notes, and paper presentation. We will provide a suggestive list of papers for students to choose from. However, students can choose their own data stream related papers to present with consent from one of the instructors.