CSE 331 Project
Details on the problems for the coding project. See the coding page for more details on the coding submissions, the reflections page for details on submitting the reflection problems and the survey page for details on submitting the survey.
Acknowledgment
The development of the project was supported by a Mozilla Responsible Computer Science award . The support is gratefully acknowledged.
Some Suggestions and Warnings
While this coding parts of the project is somewhat similar to Question 3s on the homework, there are some crucial differences and we wanted to highlight few things for y'all upfront:
Form groups of size EXACTLY $3$
This is a group project (unlike Q3s on the HWs that had to be done individually) and you can work in groups of size exactly 3. The submissions will be on Autolab and everyone in the group will get the same grade.
Groups should agree on one programming language
While the submission for the coding problems in this project can be done in any of C++,Java
and Python
like in Question 3s on the homework, we highly recommend that the group agrees on one programming language for the group. This will make it much easier for your group to make progress and collaborate.
The same group for ALL problems
You are expected to work in the same group for ALL problems in the coding project. Not doing so will be considered to be an academic integrity violation. We will use the groups y'all register on Autolab as part of your Autolab Project Group Registration submission for your project submissions.
There is ONE exception to the above rule: It is OK if the group becomes smaller (e.g. if a group member resigns) but in that case another student cannot be added to group. If you need clarification on this, please get in touch with Atri.
There are no optimal algorithms known!
Other than the first problem, we do not know of optimal algorithms to solve the rest of the problems and we suspect that doing so is not possible (definitely not within a semester). Note that this is unlike the HW Q3s where your code is supposed to always output the optimal/correct solution: i.e. you will have to think of algorithms where you might not be able to prove any guarantee on how good your output is.
Start early!
There is a reason we are giving y'all about two months to do this project. If your group starts even the week before the project is due, your group will be in trouble especially beyond the first problem. Also for most problems you will have to think of algorithms beyond what we cover in class, so the more time your spend on this, the better.
What algorithms techniques will be relevant?
Our solutions only use algorithmic techniques (plus some heuristics) that we will cover in class. However, it is entirely possible that other algorithmic techniques might be useful. One such technique that we will not cover in class is that of network flows (Chapter 7 in the textbook). However, again to emphasize since we have not used any concepts from that chapter in our solutions, we cannot speak to the efficacy of such techniques for this project. In other words, if your group decides to use network flow in your submission and it does not work well, then that is on your group.
Optimizing the runtime of your code is NOT the main objective
Unlike the HW Q3s, there will be no timeouts for individual inputs (though there will be one overall per submission). This is because unlike the HW Q3s, the main goal in this coding project submissions will be to generate as much "revenue" as possible (read on for more details on this)-- optimizing the runtime would/should be a secondary object but again there is an overall timeout so really inefficient implementations (and definitely not polynomial runtime algorithms) will not make the cut.
Use the Notation Table
The project description is a bit notation heavy. If you start losing track, here is the notation page (opens in a new window) that might be useful.
Coding, Reflection and Survey are on different pages
The details on coding, reflection problems and the survey are on different pages.
Background
Today reliable internet service is not a luxury, it is a necessity . In February 2017, a lawsuit was filed by New York states Attorney General Eric Schneiderman against Spectrum
, which is a telecommunications company and also the largest provider of residential internet services in New York state. See e.g. the following from the AG's summons and complaint page 6 and point 2:
....Spectrum-TWC is the largest provider of residential Internet services in New York State. It provides Internet service to approximately 2.5 million New York households and earns well over a billion dollars in revenue annually from selling Internet services in New York.
Spectrum
was accused of fraudulently misleading its Internet services customers by promising internet speeds that it knew it could not provide. Spectrum
provided its customers with old generation modems and routers that it knew were incapable of achieving the promised internet speeds. Moreover, they are also accused of capping the bandwidth of routers in order to cut down on expenses and maximize their profit. There were more allegations against Spectrum
. Here is the official lawsuit complaint against Spectrum with all the allegations.
Click here to see a summary of the allegations
- Promised internet speeds differed from actual internet speeds,
Spectrum
made false advertisements and repeatedly assured its customers that they could achieve the same results with wireless as with a wired connection (even though wireless connection has other real world limitations). - Leased out older models of routers that were incapable of achieving the promised speeds (D1 and D2 as opposed to D3) due to “budget constraints” (maximizing profit).
- Charged customers based on the prices advertised for the higher speed plans (the motive for said advertisement was to prevent subscribers from switching ISPs).
- Promised to upgrade the routers without additional charges when tech improved but didn’t despite complaints from customers about the slow internet speed.
- Even the current generation routers did not deliver the speed they had promised, since too many subscribers were grouped into the same service group (less bandwidth available per subscriber) and the number of channels did not increase (which could have increased the bandwidth). Only made modifications when the situation was dire. This led to an abysmal performance during peak hours.
- Deceived the FCC by benchmarking speed during non-peak hours (and hence overestimating performance).
- Changed modems for the FCC panelists doing the benchmarking but not for customers. The FCC results, as a result, painted them in a better light.
- Did not invest in additional ports (cost per Gbps) to content providers which could have decreased latency/buffering/downtimes despite promising the opposite to consumers. In fact, leveraged their access to subscribers by extracting fees from content providers and delivering nothing in return, throttled the rate of content delivery in the absence of payments.
- Started charging a monthly leasing fee for the same modems for which there had previously been no charge.
- Customers were allowed to upgrade their modems to D3 but were subjected to long call hold times, the notice that was sent to customers conveniently left out the fact that their current modems were a bottleneck, there was a fee to install the modem, they had to return the D2 modem or pay an unreturned equipment fee in order to upgrade to D3 modems.
- Programmed D2 routers to cap speeds at 20 Mbps but did not switch the pricing tier to a lower one.
- Rejected recommendation to deploy newer 802.11ac routers for higher speed tiers to save costs.
- Manipulated FCC results, temporarily give more ports to Cogent so as to improve their score.
- League of Legends by Riot games requires low latency and packet loss, both were high (and higher compared to other NY ISPs) before Riot agreed to pay Spectrum.
- Cherry picked other online content besides League to slow down more than others.
The Basic Problem
The Problem
Essentially, Spectrum
was alleged to have used unethical and fraudulent ways to make profits. Now imagine that you just graduated from UB with a bachelor's degree in Computer Engineering/Computer Science and got hired by ForProfitOnly
Internet provider. You're recruited into ForProfitOnly
as a junior software engineer. It's your first day at work and your first assignment/task is to come up with routing algorithms to generate paths that will be used to route packets in ForProfitOnly's
network topology. Since it's your first day at work, you're very eager to please your superiors by delivering as much revenue as possible to ForProfitOnly
. Below is a detailed description of your task and the various problems that you need to solve.
Simplifications galore!
To make the problem more manageable, we have made many simplifying assumptions that at the same time hopefully capture the "essence" of the related fact in real-life. Wherever possible, we will provide more details that we are sweeping under the rug and if necessary provide a reference for further reading.
The Graph
You are given an undirected communication graph $G = (V,E)$ of the ForProfitOnly
network, where a designated node $i \in V$ is the content provider (think of a service like Netflix). Every other node $u \in V$ is a router. $C$ is the set of clients and $C \subseteq V \setminus \{i\}$.
Clients vs. Routers
For this project, clients can still route packets but routers cannot be clients.
Every edge $e \in E$ has a delay $d_e$ associated with it, which is the amount of time it takes for a packet of data to traverse through an edge $e \in E$.
$d_e=1$ for all edges $e$
For all the problems, you can assume that for every $e \in E$, $d_e = 1$.
This is a simplification as in real life networks different edges can have different delays. We make this simplification to make the problems in this project easier.
Every client node in $C$ requests a packet of data from the content provider node $i \in V$. The packet addressed to a client $c \in C$ is denoted by $\text{pkt}_c$.
Packets are different
The packets $\text{pkt}_c$ for different clients $c$ are different. I.e. you cannot combine packets for different clients together.
One packet per client
We made the simplifying assumption that there is only one packet per client $c$. Unlike some of the other simplifications we make in this project, this is not such a big simplification.
Routing is centralized
In this project, there is a central algorithm that chooses an $i-c$ path for each client $c$. This is not how routing happens on the Internet. In particular, the TCP/IP protocol is a distributed protocol. However, distributed algorithms are out of the scope of this course.
Let $P$ be any path from the content provider node $i \in V$ to some client node $c \in C$. We define $d(P)$ (which we call path delay) to be the sum of the delay of all edges in path $P$. That is, $d(P) = \sum_{e \in P} d_e$. For every client $c\in C$, let $d(c)$ be the minimum of $d(P_c)$ among all $i-c$ paths $P_c$.
Let $P'_c$ be the actual $i-c$ path determined by your algorithm for client $c$. We use $D(P'_c)$ to denote the routing delay, which is the delay incurred by the routing algorithm (see below for more details on routing) when it routes all packes along $P'_c$ for all clients $c\in C$. Note that this is not necessarily the same as $d(P'_c)$.
Every node $u\in V$ has a bandwidth $b_u$, a bound on the number of packets it can send in each time step.
No bound on incoming traffic
Note that we are not putting any restriction on how many packets a router can receive in each time step. We make this simplification so that the problems are a bit simpler.
Walkthrough video
Here is a walkthrough video that summarizes the above material as well as gives a quick survey of the rest of the project (thanks to Snigdha Motadaka one of the Fall 21 TAs):
Routing of packets
Every client $c \in C$ has a priority, which is given by $\text{priority(c)}$.
Larger value is higher priority
$\text{priority(c)}$ is a positive integer and a larger value implies a higher priority.
The content provider node $i$ will send out a packet to all clients simultaneously. This packet will be routed through the path $P'_c$ for each client $c$ that your algorithm outputs.
All packets are sent out of $i$ at the same time
We make the simplifying assumption that packets $\text{pkt}_c$ for all clients $c$ are ready to be routed out of $i$ at the same time. This in real-life is not necessary since different clients might request their packets at different times. Again, this simplification makes the problems easier.
Walkthrough video
Here is a walkthrough video that gives the algorithm idea of the routing algorithm (thanks to Aman Timalsina one of the Fall 21 TAs):
If you want to go into routing algorithm details, this is the walkthrough video for you (thanks to Aman Timalsina one of the Fall 21 TAs):
Algorithm Idea
The routing simulator will proceed iteratively, with each iteration mapping directly to a time ‘tick’. Inside each time tick, all the clients are processed and time is advanced only once this processing is finished.
The list of clients, initially maintained as an array (since linked list sorting is slow), will first be sorted by priority, which ensures only the highest priority packets get forwarded if there are bandwidth restrictions. The list is then converted to a Linked List so that as soon as a packet has reached, the client can be removed from the list.
The simulation iterates until all the packets have reached the end of their path. This condition is true when the list of clients is empty.
At each time tick, check if a packet has reached its destination. If it has, remove the client from the list of clients. The client’s delay is set to either the packet delay or infinity depending on whether the packet reached its intended client or not (respectively). Finally, skip to processing the next client.
If the packet is still in transit, its delay is incremented. It is then “forwarded” if the node it is currently at has enough bandwidth and after checking that the corresponding edge is valid. Forwarding comprises incrementing its location by 1 and decrementing the current node’s available bandwidth.
After each time tick, the bandwidth is reset to the router’s $b_u$ value for all routers that participated in the forwarding.
Once we are done processing the entire list, each client’s delay can be retrieved using the corresponding member variable.
Algorithm Details
- Sort list_clients, key = client.priority and in descending order
- Convert list_clients to a Linked List
- active = set()
- While list_clients is not empty:
- For client in list_clients:
- current_node = node that the packet is currently at
- If the client's packet has reached the end of it's path:
- Remove client from list_clients
- Client's delay is set to the packet delay if the current_node is the packet's intended client, otherwise it is set to ∞
- Continue to next iteration
- Increment packet delay
- If current_node's bandwidth > 0:
- If the current node and the next in the packet's path are not neighbors:
- Remove client from list_clients
- Client's delay = ∞
- Continue to next iteration
- Decrement current_node's bandwidth
- Increment packet's current node to the next node in the packet's path
- Add current_node to active
- If the current node and the next in the packet's path are not neighbors:
-
For node in active:
- Set node's bandwidth to node's original bandwidth
- Clear all elements from active
- For client in list_clients:
Payments and Revenue
Every client $c \in C$ makes a payment $\text{pmt}_c$ to ForProfitOnly
and also has a tolerance threshold $\alpha_c$, which represents $c$'s tolerance to slow internet. Please see Problem 1 for a detailed description.
Clients payments can vary
In this project we assume that the payment $\text{pmt}_c$ for different $c$ could be all different (unlike in real-life where typically there are "tiers" of payment).
Example 0
Let's look at a small example graph (please see the figure below) with $11$ nodes. Each node in this graph has a node id which is an integer. In this graph nodes $1,3,6,7,8$ are clients, node $0$ is the ISP node and the remaining nodes $2,4,5,9,10$ are routers. Also, every node's bandwidth is denoted as $b_{\text{<node id>}}$.
Let's say that the clients in this graph have the following payments: $\text{pmt}_1 = \$30, \text{pmt}_3 = \$40, \text{pmt}_6 = \$35, \text{pmt}_7 = \$40, \text{pmt}_8 = \$45$.


$d(c)$ in the above graph $G$
For the graph above, we have $d(1)=2, d(3)=2, d(6)=3, d(7)=4, d(8)=3$.
Below are five problems for you to solve. Problems 1-5 are cumulative. That is they build upon the previous problem and get progressively harder to solve.