Homework 5
Due by 8:00pm, Tuesday, April 1, 2025.
Make sure you follow all the homework policies.
All submissions should be done via Autolab.
Care package on topological ordering could be useful for Question 2.
-
This cost can e.g. correspond the amount of energy that has to be expended to transmit a bit over
e
-
Such an algorithm is also called a protocol. In particular, each node in $G$ runs a "local" algorithm that on receiving a bit (from one of its neighbors) does some local computation. Then depending on the outcome of the computation the node can decide to send a bit to one of its neighbors.
Question 1 (Computing Set Intersection on a Network) [50 points]
The Problem
In this problem, we will take a break from trying to minimize the runtime of the algorithm and focus on an important resource in distributed computing: the total number of bits communicated over a network by the algorithm.
Given a graph $G=(V,E)$, which is the underlying network topology, we want to compute the intersection of $n=|V|$ sets over the network $G$. More precisely, every node $u\in V$, gets a set $S_u\subseteq [M]$ for some integer $M\ge 1$. (Note that $M$ has nothing to do with the number of edges in $G$.) Further we are given a special node $t\in V$. The goal of this problem is to design an algorithm such that when the algorithm terminates, the node $t$ knows the intersection of all sets: \[\cap_{u\in V} S_u.\] Moreover, we want to design such an algorithm that minimizes the total communication over $G$.
Example
Consider the following example for $n=3$ and $M=5$. Consider the "triangle" graph $G=(\{u,w,t\},\{(u,w),(u,t),(w,t)\})$ and the three nodes have the following sets \[S_u=\{1,2,5\},\] \[S_w=\{2,4\},\] \[S_t=\{2,3,5\}.\]
Then at the end of the protocol, the node $t$ should have the set \[\{1,2,5\} \cap \{2,4\} \cap \{2,3,5\} =\{2\}.\]
Next, we give more details on the problem above. First, we will assume that any two nodes $u$ and $w$ can communicate directly if and only if $(u,w)\in E$. Further, sending one bit over an edge $e\in E$ incurs a cost of $c_e\ge 0$. 1 An algorithm given both the graph $G$ and sets $S_u$ for every $u\in V$ has to decide on a series of computations at the nodes and communication over the edges $e\in E$. 2 At the start of algorithm, every node $u\in V$ has an array $A_u$ of length $M$ such that \[A_u[i]= \begin{cases} 1 &\text{ if } i\in S_u\\ 0&\text{ otherwise} \end{cases} \] for every $i\in [M]$. When the algorithm terminates node $t$ has an array $B$ of length $M$ such that \[B[i]= \begin{cases} 1&\text{ if } A_u[i]=1 \text{ for every } u\in V\\ 0&\text{otherwise}, \end{cases} \] for every $i\in [M]$.
Example (Continued)
For the example input and output from earlier, the corresponding arrays will be (where the arrays are indexed from $1$ to $5$): \[A_u =[1,1,0,0,1],\] \[A_w =[0,1,0,1,0],\] \[A_t =[0,1,1,0,1],\] \[B~~ =[0,1,0,0,0].\]
The communication cost
of the algorithm is defined as follows. Let $b_e$ bits be the total number of bits sent by the algorithm on edge $e$ in (for a given input). Then the communication cost (for the input) is $\sum_{e\in E} b_e\cdot c_e$.
The goal of this problem is to solve the set intersection problem with small communication cost. Unlike previous homeworks, we will start with part (b) first:
- Part (b):
The problem you need to solve
Design an algorithm whose worst-case communication cost over all inputs over the universe $[M]$ is $O(MST(G)\cdot M)$, where $MST(G)$ is the cost of an MST on $G$ (i.e. it is the sum of costs of all edges in the MST). Note that the algorithm is only provided four things: (1) $G$, (2) $t \in V$, (3) cost of each edge $e$ in $G$, and (4) the arrays $A_u$ for every $u\in V$ as input. You should also assume that $G$ is connected.
Some clarifications
Some clarifications on what the algorithm can and cannot do:
- Your algorithm can use the knowledge of $G$ to build a protocol: i.e. the sequence of bits that should be exchanged over any edge (and when to send those bits).
- Once the protocol is built, the rest of the computation happens in a distributed manner. In other words any point of time and any node $u$, what bit(s) $u$ sends over an edge $(u,v)\in E$ depends only on $A_u$ and all the bits that $u$ has received till this point of time. (Again what computation to apply to compute these bits to be sent over is determined by the protocol.)
- At the end of the run of the protocol, the node $t$ has to output the final answer $B$ only based on $A_t$ and all the bits it has received from its neighbors over the entire run of the protocol.
- Part (a): You will argue part (b) for two special cases of $G$:
- Show that when $G$ is a complete graph on $n$ vertices (i.e. $G=([n],E)$ where $E=\{(i,j)|i<j\in [n]\}$) and $c_e=1$ for every $e\in E$, there is a protocol whose worst-case communication cost over all inputs over the universe $[M]$ is $O(n\cdot M)$.
Example (Continued)
Continuing our specific example consider the case when $c_{(u,w)}=c_{(u,t)}=c_{w,t)}=1$. In this case note that $MST(G)=2$. Here is a protocol that works for this special case. Node $u$ sends $A_u=[1,1,0,0,1]$ to $t$ over the edge $(u,t)$ and node $w$ sends $A_w =[0,1,0,1,0]$ to $t$ over the edge $(w,t)$. Note that in this case the algorithm sends $b_{(u,t)}=5$ bits over the edge $(u,t)$ and the algorithm sends $b_{(w,t)}=5$ bits over the edge $(w,t)$ for a total of $2\cdot 5=10$ bits. Now note that node $t$ has all of $A_u,A_w$ and $A_t$ and hence can compute $B$.
Note 1
The algorithm above is not a true distributed algorithm: this is because the algorithm can use the knowledge of the entire $G$. In a true distributed algorithm, the algorithm can only use ``local" information about $G$ (i.e. a node $u$ only knows about its neighbors) but we'll go with (easier) centralized algorithms in this problem.)
In the protocol above, we completely ignored the time needed by the nodes to do their computation and as the problem asks, we only concentrated on the communication cost of the algorithm.
- Show that when $G$ is a star graph on $n$ vertices with $t$ as its center (i.e. $G=([n],E)$ where $E=\{(t,j)|j\neq t\in [n]\}$) and $c_e=1$ for every $e\in E$, there is a protocol whose worst-case communication cost over all inputs over the universe $[M]$ is $O(n\cdot M)$.
Note
We would like to point out two things:
- Since for both the complete graph and star graph above, we have $MST(G)=n-1$, the bound of $O(n\cdot M)$ in both sub-parts is indeed $O(MST(G)\cdot M)$ for the special case of $G$.
- Note that your protocols for part (a) above can use the special structure of the complete graph while your algorithm for part (b) should work for any network topology $G$.
Note 2
The bound of $O(MST(G)\cdot M)$ is the best possible worst-case communication cost of any algorithm. However, proving this fact is a bit non-trivial.
Hint
As pointed out above, the only thing we care about in the above question of the total number of bits communicated over each and we completely ignore any computation time. The latter includes any computation any node does in any run of the protocol as well as the time the algorithm spends on computing whatever it wants on $G$. In other words, for this problem the only thing that matters is the communication cost (and just ignores any computation time). Note that this implies that as long as the total communication is $O(MST(G)\cdot M)$, you can use whatever algorithm you want for computations.
Common Mistake
Some students tend to mistake elements of the set $S_u$ (or the entries in the corresponding $A_u$) with nodes or edges in $G$. Note that the sets $S_u$ (or their corresponding arrays $A_u$) are completely independent of the underlying network topology $G$. In other words, your algorithm should work for any graph $G=(V,E)$ and sets $S_u$ for every $u\in V$
Submission
Submit part (a) and (b) separately
You need to submit two (2) PDF files to Autolab: one for part (a) and one for part (b). While you can assume part (a) as a given for part (b), to get credit for part (a) you have to submit your solution for part (a) separately from part (b).
Make sure you submit the correct PDF to the correct submission link on Autolab. If you do not (e.g. if you submit Q1(a) PDF to Q2(a) or even Q1(b)), then you will lose ALL points.
We recommend that you typeset your solution but we will accept scans of handwritten solution-- you have to make sure that the scan is legible.
PDF only please
If Autolab cannot display your file, (irrespective of the reason) then you will get a zero (0) on the entire question.
Autolab might not be able to display files in formats other than PDF (e.g. Word cannot be displayed). Note that Autolab will "accept" your submission even if you submit non-PDF file, so it is YOUR responsibility to make sure you submit in the correct format.
Also the file size has to be at most 3MB.
Grading Guidelines
We will follow the usual grading guidelines for non-programming questions. Here is a high level grading rubric specific to part (a) of this problem:
- Algorithm Idea for part 1:
7
points. - Algorithm Idea for part 2:
3
points.
- Algorithm idea:
8
points. - Algorithm details:
8
points. - Proof idea (for correctness):
8
points. - Proof details:
8
points. - Bounding communication cost:
8
point. Note:
If you skip part (a) and directly do part (b) then you will get up to5
points each based on your level in Algorithm idea and Proof idea for part (b).
Note
If you do not have labeled and separated out algorithm ideas for both part 1 and part 2 for part (a) and proof idea, algorithm idea, proof details, algorithm details and bounding the communication cost for part (b), you will get a zero(0) in the corresponding parts irrespective of the technical correctness of your solution.
Note
You must explicitly list your sources and collaborators when you upload your submission to Autolab. Note that there are only five allowed sources. If you have used a source that is not allowed, please do not submit your homework. If you did not consult any source or did not collaborate with anyone just say
None
.Question 2 (Shortest path in DAGs) [25 points]
The Problem
Consider the following problem. Given a Directed Acyclic Graph (or DAG) $G$ with arbitrary weights on the edges (i.e. they can be positive or negative) with $n$ vertices and $m$ edges, and given two vertices $s$ and $t$, compute the shortest path from $s$ to $t$ in $G$. (You can assume that $G$ is given to you in the adjacency list format.)
Note
In case you missed it, note that the weights can be negative and the run time of the algorithm needs to be a linear in the input size.
Here are the two parts of the problem:
- Part (a): Fix any topological ordering of $G$. Then consider the new graph $G'$ that one obtains by removing all vertices from $G$ that come before $s$ in the chosen topological ordering. (Note that when you remove a vertex, you also remove all of its incident edges.) Then argue that any $s-t$ path in $G$ is also an $s-t$ path in $G'$.
- Part (b): Present an $O(m+n)$ algorithm to compute the shortest path from $s$ to $t$ in $G$.
Submission
Submit part (a) and (b) separately
You need to submit two (2) PDF files to Autolab: one for part (a) and one for part (b). While you can assume part (a) as a given for part (b), to get credit for part (a) you have to submit your solution for part (a) separately from part (b).
Make sure you submit the correct PDF to the correct submission link on Autolab. If you do not (e.g. if you submit Q1(a) PDF to Q2(a) or even Q1(b)), then you will lose ALL points.
We recommend that you typeset your solution but we will accept scans of handwritten solution-- you have to make sure that the scan is legible.
PDF only please
If Autolab cannot display your file, (irrespective of the reason) then you will get a zero (0) on the entire question.
Autolab might not be able to display files in formats other than PDF (e.g. Word cannot be displayed). Note that Autolab will "accept" your submission even if you submit non-PDF file, so it is YOUR responsibility to make sure you submit in the correct format.
Also the file size has to be at most 3MB.
Grading Guidelines
We will follow the usual grading guidelines for non-programming questions. Here is a high level grading rubric specific to part (a) of this problem:
- Proof idea:
10
points.
- Algorithm idea:
3
points. - Algorithm details:
3
points. - Proof of correctness idea:
3
points. - Proof details:
3
points. - Runtime analysis:
3
point.
Note
If you do not have labeled and separated out proof idea, algorithm idea, proof details, algorithm details and runtime analysis for part (b), you will get a zero(0) irrespective of the technical correctness of your solution.
Note
You must explicitly list your sources and collaborators when you upload your submission to Autolab. Note that there are only five allowed sources. If you have used a source that is not allowed, please do not submit your homework. If you did not consult any source or did not collaborate with anyone just say
None
.Question 3 (Programming Assignment) [25 points]
Note
This assignment can be solved in either Java, Python or C++ (you should pick the language you are most comfortable with). Please make sure to look at the supporting documentation and files for the language of your choosing.
The Problem
In this problem, we will explore node-weighted graphs.
We are given a starting node $s$ and an ending node $e$, for some undirected graph $G$ with $n$ nodes. Further, each node $u$ has its own weight, $w_u$
(0 <= wu <= 50)
. The graph is formatted as an adjacency list, with the leading number being the node weight, meaning that for each node, $u$, we can access all of $u$'s neighbors, as well as $u$'s weight. The goal is to output the array of nodes in a shortest path from $s$ to $e$. (The weight of a path $P=(u_1,\dots,u_{\ell})$ is $\sum_{i=1}^{\ell} w_{u_i}$ and the shortest path is the one with the minimum weight.)Input
The input file is given with the first two lines being the start node and the end node, and the remainder of the file is an adjacency list with node weights for graph $G$ (we assume that the set of vertices is $\{0,1,\dots,n-1\}$). The adjacency list assumes that the current node is the index of the line under consideration. For instance the third line of the input file has the list of all nodes adjacent to node 0 (the first two being the start and end nodes) and so on.
s <- Starting node (some node between u0 and un-1). e <- Ending node (some node between u0 and un-1). w0 u1 u4 u6 <- All nodes that share edges with u0 w1 u3 uu <- All nodes that share edges with u1 w2 u0 <- All nodes that share edges with u2 . . . wn-1 u0 u4 u2 u7 <- All nodes that share edges with un-1
Output
The output should be a list/ArrayList/vector (depending on your language of choice) where every 2 adjacent nodes have an edge between them. The 1st node in the output should be startNode and the last node in the output should be endNode.
For example, if startNode was 2 and endNode was 7, and the minimum path between them is 2-9-8-4-7 then you should output [2, 9, 8, 4, 7]. If there is no path between the start and end nodes, output an empty output.
[s, n0 n1 n2... nm, e] <- Each entry is a node on the shortest path between 's' and 'e'
Note
There is no guarantee that the graph is completely connected. In the situation where a node can not be reached from the starting node, you should return an empty array.
Note
For this problem there can be multiple shortest paths so it is possible that your solution (assuming it is correct) might not match the corresponding sample output.
Hint
The best possible algorithm for this problem that we are aware of runs in time $O((m+n)\log{n})$ where $m$ in the total number of edges and $n$ is the number of vertices.
Note
Both the input and output parsers in each of the three languages are already written for you. Note that you have to work with the input data structures provided (which will come pre-loaded with the data from an input file).
Addition is the only change you should make
Irrespective of what language you use, you will have to submit just one file. That file will come pre-populated with some stuff in it. You should not change any of those things because if you do you might break what the grader expects and end up with a zero on the question. You should of course add stuff to it (including helper functions and data structures as you see fit).
Download Java Skeleton CodeDirectory Structure
├── src │ └── ub │ └── cse │ └── algo │ ├── Graph.java │ ├── Driver.java │ └── Solution.java ├── testcases/ │ ├── input1.txt │ ├── input2.txt │ └── input5.txt ├── outputs/ │ ├── output1.txt │ ├── output2.txt │ └── output5.txt
You are given 3 coding files:
Driver.java
,Graph.java
andSolution.java
.Driver.java
takes the input file, parses it and creates an instance of the class and prints result in output file.Graph.java
reads the input file in a parsable format. You only need to update theSolution.java
file. You may write your own helper methods and data structures in it.The testcases folder has 3 input files and the outputs folder contains their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.
Hint
For this problem you might find the Java PriorityQueue or the TreeSet data structure useful. Note that not all operations in
PriorityQueue
have the time bound that you would expect from theory. See the "Implementation note" from the docs page for more.Method you need to write:
/** * This method must be filled in by you. You may add other methods and subclasses as you see fit, * but they must remain within the Solution_Solution class. */ public ArrayList
outputPath(){ /* * Find the smallest weighted path between _startNode and _endNode * The first number of graph's adjacency list is the weight of it's node */ return new ArrayList (); } The
Solution
class has 3 instance variables:_startNode
which is of type int and stores the starting node id._endNode
which is of type int and stores the ending node id.-
graph
which is of typeHashMap<Integer, ArrayList<Integer>>
where the key is the node id and the value is an arraylist of adjacent nodes. The 0th index of each arraylist is the weight of the key node
ArrayList
where every 2 adjacent nodes have an edge between them. Also, the 1st node in theArrayList
should be_startNode
and the last node in theArrayList
should be_endNode
. For example, if startNode was 2 and endNode was 7, and the shortest path between them is 2-9-8-4-7 then you should output [2, 9, 8, 4, 7]. If there is no path between the start and end nodes, output an emptyArrayList
.Compiling and executing from command line:
Assuming you're in the Template directory. Run javac src/ub/cse/algo/*.java to compile.
To execute your code on
input1.txt
, run java -cp "src" ub.cse.algo.Driver testcases/input1.txt. The output will be printed tostdout
.Submission
You only need to submit
Solution.java
to Autolab.Download Python Skeleton CodeDirectory Structure
├── Driver.py ├── Solution.py ├── testcases/ │ ├── input1.txt │ ├── input2.txt │ └── input5.txt ├── outputs/ │ ├── output1.txt │ ├── output2.txt │ └── output5.txt
You are given two coding files:
Driver.py
andSolution.py
.Driver.py
takes the input file, parses it and calls yourSolution.py
method on data structures it creates. It then prints the computed shortest path. You only need to update theSolution.py
file. You may write your own helper methods and data structures in it.The testcases folder has 3 input files and the output folder has their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.
Hint
For this problem you might find the Python heapq module useful.
Method you need to write:
def outputPath(self): ################# YOUR CODE GOES HERE ################## return [] #Return empty
The
Solution
method is called with 3 arguments.start_node
which is the unique id of the start node.end_node
which is the unique id of the end node.graph
which is a dictionary where the key is a unique node id and the value is a list of nodes that share an edge with the key node. The 0th index of each list is the weight of the key node
The output should be a list where every 2 adjacent nodes have an edge between them. Also, the 1st node in the list should be
start_node
and the last node in the list should beend_node
. For example, ifstart_node
was 2 andend_node
was 7, and the minimum path between them is 2-9-8-4-7 then you should output [2, 9, 8, 4, 7]. If there is no path between the start and end nodes, output an empty list.Executing from command line:
Assuming you're in the same directory level as
Driver.py
and you want to run your code on theinput1.txt
. Run python Driver.py testcases/input1.txt.Submission
You only need to submit
Solution.py
to Autolab.Download C++ Skeleton CodeDirectory Structure
├── Driver.cpp ├── Solution.cpp ├── Graph.h ├── testcases/ │ ├── input1.txt │ ├── input2.txt │ └── input5.txt ├── outputs/ │ ├── output1.txt │ ├── output2.txt │ └── output5.txt
You are given 3 source files:
Driver.cpp
,Graph.h
, andSolution.cpp
.Driver.cpp
takes the input file, parses it usingGraph.h
, and creates an instance of the classSolution
and calls theoutputPath()
method on it. It then prints the computed shortest path. You only need to update theSolution.cpp
file. You may write your own helper methods and data structures in it.The testcases folder has 3 input files and the output folder has their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.
Hint
For this problem you might find the STL priority_queue or the set container useful.
Method you need to write:
vector<int> Solution::outputPath() { /** * Implement the solution in this function * Return the variable "m_output_path" after you fill it with the appropriate * path from the start node to the end node. * * Don't change any files but this one! */ return m_outputPath; }
Note: The output for this function must output a
vector
ofint
The
Solution
class has 4 member variables.m_graph
which is a map where each key is a unique node id as an int and the value is a corresponding list of adjacent nodes for that key as a vector. The first value (0th index) is the weight associated with the key node.m_start_node
which is an integer representing the node id of the start nodem_end_node
which is an integer representing the node id of the end nodem_output_path
The output should be a vector where every 2 adjacent nodes have an edge between them. Also, the 1st node in the vector should bem_start_node
and the last node in the vector should bem_end_node
. For example, ifm_start_node
was 2 andm_end_node
was 7, and the minimum path between them is 2-9-8-4-7 then you should output [2, 9, 8, 4, 7]. If there is no path between the start and end nodes, output an empty vector.
Compiling and executing from command line:
Assuming you're in the same directory level as
Driver.cpp
. Run g++ -std=c++11 Driver.cpp to compile.To execute your code on
input1.txt
, run ./a.out testcases/input1.txt.Submission
You only need to submit
Solution.cpp
to Autolab.Grading Guidelines
We will follow the usual grading guidelines for programming questions.
For those of you who are feeling a little ambitious
For the top 3 submissions in the scoreboard in Java, the top 2 submissions in the scoreboard in Python, and the top 1 submission in the scoreboard in C++, we are offering 2.5 bonus points. But be warned! You should not be spending too much time on this. We rather you work on Questions 1 and 2 above.
- Show that when $G$ is a complete graph on $n$ vertices (i.e. $G=([n],E)$ where $E=\{(i,j)|i<j\in [n]\}$) and $c_e=1$ for every $e\in E$, there is a protocol whose worst-case communication cost over all inputs over the universe $[M]$ is $O(n\cdot M)$.