Homework 2
Due by 8:00pm, Tuesday, February 18, 2025.
Make sure you follow all the homework policies.
All submissions should be done via Autolab.
Sample Problem
The Problem
This problem is just to get you thinking about asymptotic analysis and input sizes.
An integer $n\ge 2$ is a prime, if the only divisors it has is $1$ and $n$. Consider the following algorithm to check if the given number $n$ is prime or not:
What is the function $f(n)$ such that the algorithm above has running time $\Theta(f(n))$? Is this a polynomial running time-- justify your answer. (A tangential question: Why is the algorithm correct?)
Proof Idea
We will show that $f(n)=\sqrt{n}$ by noting that the loop runs for at most $\sqrt{n}$ times and each loop takes $O(1)$ time. This shows that the algorithm runs in time $O(\sqrt{n})$. To prove a run time of $\Omega(\sqrt{n})$ we consider the case when $n$ is a prime. However, this is not a polynomial runtime since the input size is $\Theta(\log{n})$ (and not $n$).
Proof Details
We will show that $f(n)=\sqrt{n}$. For the upper bound, note that the algorithm performs at most $\sqrt{n}$ iterations. Further, for each such iteration, we spend $O\left(1\right)$ time. (This is because we have assumed that a division takes $O(1)$ time.) Thus, the algorithm has a running time of $O(\sqrt{n})$. For the lower bound, consider the case when $n$ is a prime. Note that in this case, the algorithm will iterate $i$ through to the largest integer smaller than $\sqrt{n}$, i.e. there will be $\Omega(\sqrt{n})$ iterations. Each iteration involves at least one computation, i.e. $\Omega(1)$ time. Thus, overall the algorithm has a running time of $\Omega(\sqrt{n})$ (and hence, $\Theta(\sqrt{n})$).
However, the algorithm does not have a polynomial running time. This is because the running time is measured in terms of the input size. Note that we need $\lceil \log{n}\rceil$ many bits to represent $n$, i.e. the input size is $\Theta(\log{n})$. In other words, the running time of $\Theta(\sqrt{n})$ is exponential (in terms of the input size). (If you did notice this, then great. If not, do not feel bad-- I got fooled the first time this question was put to me in a class. We tend to assume that $n$ is the input size of a problem, as most of the time that is indeed the case. However, this problem was meant to point out that you should always question you assumptions!)
This algorithm is correct because, by the definition of a prime number, a prime $n$ must have no other divisor besides $1$ and $n$, and this algorithm tests directly whether any whole numbers $<n$ divide $n$. We need to check only up to $\sqrt{n}$ because any divisor between $\sqrt{n}$ and $n$ would divide $n$ with a quotient between $2$ and $\sqrt{n}$. The quotient would also divide $n$ and would already be in the set of whole numbers between $2$ and $\sqrt{n}$. So we must only check one of each pair of complementary divisors.
Submission
You will NOT submit this question. This is for you to get into thinking more about asymptotic analysis.
Question 1 (Reformer) [50 points]
The Problem
The stable marriage problem does not handle divorces. This is because we assume everyone is interested in everyone else of the opposite sex and we assume that the preferences do not change.
In this problem, we will see the effect of changes in preferences in the outcome of the Gale-Shapley algorithm. In this problem, assume that men propose to women (as done in the book and unlike what we did in the lecture) and consider the following scenario.
Given an instance of the stable marriage problem (i.e. set of women $W$ and the
set of men $M$ along with their preference lists: $L_w$ and $L_m$ for every
$w\in W$ and $m\in M$ respectively), call a woman $w\in W$ a reformer
if the following property holds. There exists an $L'_w$ such that
if $w$ changes her preference list to $L'_w$ (from $L_w$) then the Gale-Shapley
algorithm matches everyone to someone else. In other words, let $S_{\textrm{orig}}$
be the stable marriage output by the Gale-Shapley algorithm for the
original input and $S_{\mathrm{new}}$ be the stable marriage output by
the Gale-Shapley algorithm for the new instance of the problem where $w$'s
preference list is replaced by $L'_w$ (but everyone else has the same
preference list as before). Then
\[S_{\mathrm{orig}}\cap S_{\mathrm{new}}=\emptyset.\]
Here are the two parts:
- Part (a): You will first solve a simpler version of the problem. For every integer $n\ge 3$ argue the following: There exists an instance of the stable marriage problem with $n$ men and $n$ women such that there is a woman $w$ such that if she changes her preference list from $L_w$ to $L'_w$ then we have \[S_{\mathrm{orig}}\neq S_{\mathrm{new}}.\]
- Part (b): For every integer $n\ge 3$ argue the following: There exists an instance of the
stable marriage problem with $n$ men and $n$ women such that there is a woman who
is a
reformer
.Common Mistake
A common mistake for this problem is to present an instance that solves part (a) but not part (b).
Note
To get full credit, you should be able to present an instance for every $n\ge 3$ in full detail. In other words, you have to present a "family" of examples (i.e. for each $n\ge 1$, you have to present the original $2n$ preference lists).
Hint
Note that you have the freedom to decide how the Gale-Shapley algorithm chooses a free man at any iteration of the while loop in case there is more than one choice.
Submission
Submit part (a) and (b) separately
You need to submit two (2) PDF files to Autolab: one for part (a) and one for part (b). While you can assume part (a) as a given for part (b), to get credit for part (a) you have to submit your solution for part (a) separately from part (b).
Make sure you submit the correct PDF to the correct submission link on Autolab. If you do not (e.g. if you submit Q1(a) PDF to Q1(b) or even Q2(a) or Q2(b)), then you will lose ALL points.
We recommend that you typeset your solution but we will accept scans of handwritten solution-- you have to make sure that the scan is legible.
PDF only please
If Autolab cannot display your file, (irrespective of the reason) then you will get a zero (0) on the entire question.
Autolab might not be able to display files in formats other than PDF (e.g. Word cannot be displayed). Note that Autolab will "accept" your submission even if you submit non-PDF file, so it is YOUR responsibility to make sure you submit in the correct format.
Also the file size has to be at most 3MB.
Grading Guidelines
We will follow the usual grading guidelines for non-programming questions. Here is a high level grading rubric specific to part (a) of this problem:
- Proof idea:
10
points for outlining your instances for every $n\ge 3$ as well as specifying what $S_{\mathrm{orig}}$ and $S_{\mathrm{new}}$ are for each instance.
- Proof idea:
20
points for outlining your instances for every $n\ge 3$. - Proof details:
20
points for a detailed description of your instance for each $n\ge 3$. (This is worth $15$ points.) You also have to argue why your family of instance proves what is needed to be proven-- in other words, you need to argue why the Gale Shapley algorithm outputs your claimed $S_{\mathrm{old}}$ and $S_{\mathrm{new}}$ (note that this implies you also need to specify $S_{\mathrm{orig}}$ and $S_{\mathrm{new}}$). This can be at the level of proof idea: proof details for this part are not needed. (This part is worth $5$ points.) However, note that if the grader cannot understand why your construction works immediately, then you might (and most probably will) lose points. So it is in your best interest to make sure that enough intuition is given on why your construction works.
Note
If you do not separate out proof idea and proof details for part (b), you will get a zero(0) irrespective of the technical correctness of your solution..
Note
Your must explicitly list your sources and collaborators when you upload your submission to Autolab. Note that there are only five allowed sources. If you have used a source that is not allowed, please do not submit your homework. If you did not consult any source or did not collaborate with anyone just say None
.
Question 2 (Big G is in town) [25 points]
The Problem
TheBig G
company in the bay area decides it has not been doing enough to hire CSE grads from UB so it decides to do an exclusive recruitment drive for UB students. The Big G
decides to fly over $n$ CSE majors from UB to the bay area during December for on-site interview on a single day. The company sets up $m$ slots in the day and arranges for $n$ Big G
engineers to interview the $n$ UB CSE majors. (You can and should assume that $m>n$.) The fabulous scheduling algorithms at Big G
's offices draw up a schedule for each of the $n$ majors so that the following conditions are satisfied:
- Each CSE major talks with every
Big G
engineer exactly once; - No two CSE majors meet the same
Big G
engineer in the same time slot; and - No two
Big G
engineers meet the same CSE major in the same time slot.
In between the schedule being fixed and the CSE majors being flown over, the Big G
engineers were very impressed with the CVs of the CSE majors (including, ahem, their performance in CSE 331) and decide that Big G
should hire all of the $n$ UB CSE majors. They decide as a group that it would make sense to assign each CSE major $S$ to a Big G
engineer $E$ in such a way that after $S$ meets $E$ during their scheduled slot, all of $S$'s and $E$'s subsequent meetings are canceled. Given that this is December, the Big G
engineers figure that taking the CSE majors out to the nice farmer market at the ferry building in San Francisco during a sunny December day would be a good way to entice the CSE majors to the bay area.
In other words, the goal for each engineer $E$ and the major $S$ who gets assigned to them, is to truncate both of their schedules after their meeting and cancel all subsequent meeting, so that no major gets stood-up. A major $S$ is stood-up if when $S$ arrives to meet with $E$ on their truncated schedule and $E$ has already left for the day with some other major $S'$.
Your goal in this problem is to design an algorithm that always finds a valid truncation of the original schedules so that no CSE major gets stood-up.
To help you get a grasp of the problem, consider the following example for $n=2$ and $m=4$. Let the majors be $S_1$ and $S_2$ and the Big G
engineers be $E_1$ and $E_2$. Suppose $S_1$ and $S_2$'s original schedules are as follows:
CSE Major | Slot 1 | Slot 2 | Slot 3 | Slot 4 |
$S_1$ | $E_1$ | free | $E_2$ | free |
$S_2$ | free | $E_1$ | free | $E_2$ |
In this case the (only) valid truncation is for $S_1$ to get assigned to $E_2$ in the third slot and for $S_2$ to get assigned to $E_1$ in the second slot. So the truncated schedule will look like this:
CSE Major | Slot 1 | Slot 2 | Slot 3 | Slot 4 |
$S_1$ | $E_1$ | free | $E_2$ (truncate here) | |
$S_2$ | free | $E_1$ (truncate here) |
Here are the two parts:
- Part (a): In this part, we will see two algorithms that do not work. Your job will be, for each algorithm, to give an example input where the algorithm does not work (and briefly explain why that is the case).
Engineer-centric Algorithm
: the first algorithm considers all engineers $E$ and say $E$'s last meeting is with CSE major $S$, then both $E$ and $S$'s schedules are truncated after their meeting.CSE major-centric Algorithm
: the second algorithm considers all CSE majors $S'$ and say $S'$s last meeting is with engineer $E'$, then both $E'$ and $S'$'s schedules are truncated after their meeting.
- Part (b): Design an efficient algorithm that always finds a valid truncation of the original schedules so that no CSE major gets stood-up.
Hint
In real life, you will almost never come across a problem whose description will match exactly with one you will see in this course. More often, you will come across problems that you have seen earlier but are stated in a way that don't look like the version you have seen earlier. One way to solve this problem would be to to simulate that situation. In algorithms-speak, you can to reduce the problem here to one that you have seen already.
Common Mistake
A common mistake for this part is to present an algorithm that does not take into account both the engineers' and CSE majors' schedules together.
Submission
Submit part (a) and (b) separately
You need to submit two (2) PDF files to Autolab: one for part (a) and one for part (b). While you can assume part (a) as a given for part (b), to get credit for part (a) you have to submit you solution for part (a) separately from part (b).
We recommend that you typeset your solution but we will accept scans of handwritten solution-- you have to make sure that the scan is legible.
PDF only please
Autolab might not be able to display files in formats other than PDF (e.g. Word cannot be displayed). If Autolab cannot display your file, then you will get a zero (0) on the entire question. Note that Autolab will "accept" your submission even if you submit non-PDF file, so it is YOUR responsibility to make sure you submit in the correct format. However, after submission, Autolab will try and display your non-PDF submission and it should give an error message then. Also the file size has to be at most 3MB.
Grading Guidelines
We will follow the usual grading guidelines for non-programming questions. Here is a high level grading rubric specific to part (a) of this problem:
- For both algorithms, Proof idea:
5
points each. In both cases you need to supply a counter-example and argue why your example shows that the corresponding algorithm is not correct.
- Algorithm idea:
4
points for explaining the idea behind your algorithm or a reduction to a familiar problem. - Algorithm details:
3
points for providing the specific details of the algorithm or the reduction. - Proof idea:
4
points for a proof idea that you can always achieve a valid truncation. - Proof details:
4
points for providing the proof details.
Note
If you do not have separated out algorithm idea, algorithms details, proof idea, proof details for part (b), you will get a zero(0) irrespective of the technical correctness of your solution..
Note
Your must explicitly list your sources and collaborators when you upload your submission to Autolab. Note that there are only five allowed sources. If you have used a source that is not allowed, please do not submit your homework. If you did not consult any source or did not collaborate with anyone just say None
.
Question 3 (Programming Assignment) [25 points]
Note
This assignment can be solved in either Java, Python or C++ (you should pick the language you are most comfortable with). Please make sure to look at the supporting documentation and files for the language of your choosing.
The Problem
In this problem we will consider the extension of the stable matching problem that more closely resembles the resident matching program that NRMP administers.
We are given $m$ hospitals and $n$ medical students. Each hospital has a ranking of all the students in order of preference, and each student has a ranking of all the hospitals in order of preference. Unlike the stable matching problem, a hospital can have more than one open slots (but a student can be assigned at most one hospital). We will assume that there are more students than there are slots available in all the $m$ hospitals put together (they can be equal). The goal is to output a stable assignment of students to hospitals. Note that since there are more students than hospitals, some students may be not assigned to any hospital. Also no hospital is assigned more students than the number of openings it has.
An assignment of students to hospital is stable if neither of the following situation arises:
- First type of instability: There are students $s$ and $s'$, and a hospital $h$, so that:
- $s$ is assigned to $h$, and
- $s'$ is assigned to no hospital, and
- $h$ prefers $s'$ to $s$.
- Second type of instability: There are students $s$ and $s'$, and hospitals $h$ and $h'$, so that:
- $s$ is assigned to $h$, and
- $s'$ is assigned to $h'$, and
- $h$ prefers $s'$ to $s$, and
- $s'$ prefers $h$ to $h'$.
Input
The input is an instance of the national resident problem in a text file of the following format:
m <- Number of hospitals n <- Number of students s01 s11 s21 s31 ... sn1 <- Preference of the 1st hospital (most preferred first, s01 is number of available slots) s02 s12 s22 s32 ... sn2 <- Preference of the 2nd hospital (most preferred first, s02 is number of available slots) s03 s13 s23 s33 ... sn3 <- Preference of the 3rd hospital (most preferred first, s03 is number of available slots) s04 s14 s24 s34 ... sn4 <- Preference of the 4th hospital (most preferred first, s04 is number of available slots) . . . s0m s1m s2m s3m ... snm <- Preference of the mth hospital (most preferred first, s0m is number of available slots) h11 h21 h31 ... hm1 <- Preference of the 1st student (most preferred first) h12 h22 h32 ... hm2 <- Preference of the 2nd student (most preferred first) h13 h23 h33 ... hm3 <- Preference of the 3rd student (most preferred first) h14 h24 h34 ... hm4 <- Preference of the 4th student (most preferred first) . . . h1n h2n h3n ... hmn <- Preference of the nth student (most preferred first)
For example
3 <- Number of hospitals 5 <- Number of students 1 2 3 5 1 4 <- Preference of the 1st hospital (most preferred first with first index is # available slot-- 1) 1 5 1 2 4 3 <- Preference of the 2nd hospital (most preferred first with first index is # available slot-- 1) 2 5 2 3 1 4 <- Preference of the 3rd hospital (most preferred first with first index is # available slot-- 2) 2 1 3 <- Preference of the 1st student (most preferred first) 3 2 1 <- Preference of the 2nd student (most preferred first) 3 1 2 <- Preference of the 3rd student (most preferred first) 1 2 3 <- Preference of the 4th student (most preferred first) 1 2 3 <- Preference of the 5th student (most preferred first)
Note
A hospital can have more than one available slot which is stored in the first index of its preference list.
Output
The output is an instance of stable matchings for the input in a text file of the following format:
(h1, s1) <- Pairing of the form (h,s) (h2, s2) <- Pairing of the form (h,s) (h3, s3) <- Pairing of the form (h,s) . . . (hn, sn) <- Pairing of the form (h,s)
For example:
(1, 5) <- Pairing of the form (h,s) (2, 1) (3, 2) (3, 3)
Note
We note that we will assume that a student that is not assigned to any hospital is not part of the output. More importantly, please note that there is more than one possible stable assignments possible for any given input. I.e. even if your algorithm is correct, its output might not match the given sample output. However, as long as your output is a stable assignment, you should be fine.
Hint
The best possible algorithm for this problem that we are aware of runs in time $O(m\cdot n)$.
Note
Both the input and output parsers in each of the three languages are already written for you. Note that you have to work with the input data structures provided (which will come pre-loaded with the data from an input file).
Addition is the only change you should make
Irrespective of what language you use, you will have to submit just one file. That file will come pre-populated with some stuff in it. You should not change any of those things because if you do you might break what the grader expects and end up with a zero on the question. You should of course add stuff to it (including helper functions and data structures as you see fit).
Directory Structure
├── src │ ├── ub │ ├── cse │ ├── algo │ ├── Driver.java │ ├── HW2Utility.java │ ├── Match.java │ ├── PreferenceLists.java │ ├── Solution.java ├── testcases/ │ ├── input1.txt │ ├── input2.txt │ ├── input5.txt ├── outputs/ │ ├── output1.txt │ ├── output2.txt │ └── output5.txt
You are given five coding files: Driver.java
, HW2Utility.java
, Match.java
, PreferenceLists.java
and Solution.java
. Driver.java
takes the input file, parses it and creates an instance of the class and prints result in output file. You only need to update the Solution.java
file. You may write your own helper methods and data structures in it.
The testcases folder has 3 input files and their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.
Method you need to write:
/** * This method must be filled in by you. You may add other methods and subclasses as you see fit, * but they must remain within the HW2_Student_Solution class. * @return Your set of stable matches. Order does not matter. */ public ArrayList<Match>> getMatches() { return new ArrayList<Match>>(); }
Note: The output for this function must output an Arraylist
of Match
Do not include students that did not get matched with a hospital in the output.
The Solution
class has 4 instance variables.
_nHospital
which is of type int and stores number of hospitals._nStudent
which is of type int and stores number of students._hospitalList
which is of typeHashMap<Integer, ArrayList<Integer>>
and stores the preference lists of hospitals. Please note that the front of theArrayList<Integer>
(index 0) denotes the number of available slots._studentList
which is of typeHashMap<Integer, ArrayList<Integer>>
and stores the preference lists of students. Please note that the front of theArrayList<Integer>
(index 0) denotes the most preferred hospital.
The Other files
Match.java
defines the Match
class. This should be fairly intuitive. Below is the entire content of the class:
public class Match implements Comparable{ public Integer student; public Integer hospital; Match(Integer student, Integer hospital) { this.student = student; this.hospital = hospital; } @Override public boolean equals(Object obj) { Match compare = (Match) obj; return (student.equals(compare.student)) && (hospital.equals(compare.hospital)); } @Override public String toString() { return "(" + student + ", " + hospital + ")"; } @Override public int compareTo(Match other) { return this.hospital.compareTo(other.hospital); } }
The file HW2Utility.java
handles some of the background stuff: e.g. readFile
reads in the file passed as a command line argument to Driver.java
and populates the preference lists within the HW2Utility
class.
Compiling and executing from command line:
Assuming you're in the same directory level as src
. Run javac src/ub/cse/algo/*.java to compile.
To execute your code on input1.txt
, run java -cp "src" ub.cse.algo.Driver testcases/input1.txt.
Submission
You only need to submit Solution.java
to Autolab.
Directory Structure
├── Driver.py ├── Utility.py ├── Match.py ├── PreferenceLists.py ├── Solution.py ├── testcases/ │ ├── input1.txt │ ├── input2.txt │ ├── input5.txt ├── outputs/ │ ├── output1.txt │ ├── output2.txt │ └── output5.txt
You are given five coding files: Driver.py
, Utility.py
, Match.py
, PreferenceLists.py
and Solution.py
. Driver.py
takes the input file, parses it and creates an instance of the class Utility.py
and calls the get_Matches()
method on it. You only need to update the Solution.py
file. You may write your own helper methods and data structures in it.
The testcases folder has 3 input files and their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.
Method you need to write:
class Solution: def __init__(self, m, n, hospital_list, student_list, hosp_open_slots): """ The constructor exists only to initialize variables. You do not need to change it. :param m: The number of hospitals :param n: The number of students :param hospital_list: The preference list of hospitals, as a dictionary. :param student_list: The preference list of the students, as a dictionary. :param hosp_open_slots: Open slots of each hospital """ self.m = m self.n = n self.hospital_list = hospital_list self.student_list = student_list self.hosp_open_slots = hosp_open_slots def get_matches(self): return []
The Solution
method is called with 5 variables.
m
which is the number of hospitals the input has.n
which is the number of students the input has.hospital_list
which is adictionary
that maps from a hospital's ID to its preference list. Please note that the front of the list (index 0)hospital_list[hospital]
(for everyhospital
) denotes the most preferred student.student_list
which is adictionary
that maps from a student's ID to its preference list. Please note that the front of the list (index 0)student_list[student]
(for everystudent
) denotes the most preferred hospital.hosp_open_slots
which is adictionary
that maps from a hospital's ID to its number of open spots.
Your method is expected to return a list
of Match
objects. The Match
objects do NOT have to be in any specific order.
Do not include students that did not get matched with a hospital in the output.
Executing from command line:
Assuming you're in the same directory level as Driver.py
and you want to run your code on the input1.txt
. Run python Driver.py testcases/input1.txt.
Submission
You only need to submit Solution.py
to Autolab.
Directory Structure
├── Driver.cpp ├── HW2Utility.h ├── Utility.h ├── PreferenceLists.h ├── Solution.cpp ├── testcases/ │ ├── input1.txt │ ├── input2.txt │ ├── input5.txt ├── outputs/ │ ├── output1.txt │ ├── output2.txt │ └── output5.txt
You are given five coding files: Driver.cpp
, HW2Utility.cpp
, Utility.h
, PreferenceLists.h
, and Solution.cpp
. Driver.cpp
takes the input file, parses it and creates an instance of the class Solution
and calls the get_matches()
method on it. You only need to update the Solution.cpp
file. You may write your own helper methods and data structures in it.
The testcases folder has 3 input files and their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.
Method you need to write:
Solution(int _m, int _n, map<int, vector<int>> _students, map<int, vector<int>> _hospitals) { m = _m; n = _n; students = _students; hospitals = _hospitals; } vector<pair<int,int>> getMatches() { stableMatchings.clear(); return stableMatchings; }
Note: The output for this function must output a vector
of pair<int,int>
There may be more than 1 pair for a hospital because hospitals may have multiple open positions.
Do not include students that did not get matched with a hospital in the output.
The Solution
class has 4 member variables.
n
an integer containing the number of studentsm
integer containing the number of hospitalsstudents
which is a map containing the preference list for each student.hospitals
which is a map containing the preference list for each hospital.
Compiling and executing from command line:
Assuming you're in the same directory level as Driver.cpp
. Run g++ -std=c++11 Driver.cpp to compile.
To execute your code on input1.txt
, run ./a.out testcases/input1.txt.
Submission
You only need to submit Solution.cpp
to Autolab.
Grading Guidelines
We will follow the usual grading guidelines for programming questions.