Assignment 2, Due at 11:59pm, Sunday Sep 14


Overview of what to do

You are to write a C++ program that does roughly the following. The program reads a file -- in a specific format to be described below -- that stores the set of edges of a graph. The file might (or might not) store some edges more than once. The task of the program is to count the number of distinct edges. There are many ways to do this task. You will implement two different algorithms for performing the task. (Can you guess which algorithm is faster for large input graphs?)

Details on what to do

You are to write a C++ program that does the following.
  1. It keeps reading the user's input, line by line. Each input line the user types is supposed to be in one of the following three forms:
    exit
    vba filename
    sba filename
    • exit tells your program to quit
    • vba and sba do the task described above using the aforementioned algorithms vba and sba.
    • filename is the name of a file that stores edges of a graph in Stanford SNAP file format. The file looks something like this:
      # Graph : p2p-Gnutella04.txt 
      # Directed Gnutella P2P network from August 4 2002
      # Nodes: 6 Edges: 11
      # FromNodeId	ToNodeId
      1	3
      1	4
      2	4
      3	1
      6	2
      3	5
      3	6
      4	2
      1	3
      4	4
      6	2
      The lines that start with # are comment lines; ignore them. Each edge is stored in the format a b, where a and b are two integers (two vertices) separated by a tab character. Note that the same edge might occur several times, in the form a b or b a. In the example file above, the output is 7: even though 11 edges are stored in the file, the edge (1,3) is stored three times, the edge (2,4) is stored twice, and the edge (2,6) is stored twice.
  2. The code base: to spare you the burden of parsing command lines, I have written a skeleton of the above program, leaving exactly the two functions vba and sba empty. Download the code base, then unpack it by typing:
    tar -xvf A2.tar
    cd A2
    Please read all the code in the code base, but you may modify only one file, algos.cpp, to implement the two functions that were left empty there. You can compile the program by typing make. The Makefile is already written for you.
  3. The test data: you can test with real graphs from the Stanford SNAP data set. For example, here are some of the smaller data sets for you to test your implementation on:
    gunzip p2p-Gnutella04.txt.gz
    gunzip wiki-Vote.txt.gz
    gunzip email-EuAll.txt.gz
    Please feel free to explore other data sets from SNAP. Some of them are very large, which makes it fun to run your program on them and see how long it takes.
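To make the file format in item 1 concrete, here is a minimal sketch of handling one input line: skip comment lines, read the two integers, and store the smaller vertex first so that a b and b a become the same pair. The helper name parseEdgeLine is hypothetical and is not part of the provided code base; how the skeleton actually hands lines to vba and sba may differ.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper, not part of the provided code base.
// Tries to parse one line of a SNAP edge list.
// Returns false for comment lines (starting with '#'), empty lines,
// and lines that do not begin with two integers.
// On success, stores the edge in `edge` with the smaller vertex first,
// so that "a b" and "b a" produce the same pair.
bool parseEdgeLine(const std::string& line, std::pair<int, int>& edge)
{
    if (line.empty() || line[0] == '#') {
        return false;
    }
    std::istringstream iss(line);
    int a, b;
    if (!(iss >> a >> b)) { // >> skips tabs and spaces between the numbers
        return false;
    }
    edge = (a <= b) ? std::make_pair(a, b) : std::make_pair(b, a);
    return true;
}
```

With this sketch, the lines "6	2" and "2	6" both yield the pair (2, 6), so duplicates in either orientation compare equal later on.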

    My implementation

    • I have written a program called edgecount following the above specification and compiled it on timberlake. You can download and run it (on timberlake) to see how it works.
      If needed, change its permission so that it's executable:
      chmod 700 edgecount

    How to submit

    Submit only the algos.cpp file. We will put your submission into a directory that has all the other files in the code base and compile using make.
    submit_cse250 algos.cpp
    Note again that the submission only works if you are logged in to your CSE account and the cpp file is there. All previous steps can be done at home, as long as you remember to upload the final file to your CSE account and run the submit script from there.


    • You'll get 0 points if the program doesn't compile using /usr/bin/g++ on timberlake. We grade mostly with an automatic script, and due to an extreme lack of personnel we don't have the resources to read partial solutions.
    • 10 points if the exit command works. (It already works in the code base I provided, so these 10 points are free. If you do nothing and just submit the algos.cpp file as is, you'll get 10 points.)
    • 45 points if the vba command works and runs the vba algorithm as described above.
    • 45 points if the sba command works and runs the sba algorithm as described above.

    Supporting materials

    • Converting string to int: to convert a string to an integer in C++ (before C++11), there are two typical ways:
      #include <iostream>
      #include <sstream>
      #include <string>
      #include <cstdlib> // for atoi()
      int main()
      {
          std::string s = "1234";
          std::string t = "4567";
          // way 1: atoi() on the underlying C string
          int i = atoi(s.c_str());
          std::cout << "i = " << i << std::endl;
          // way 2: extract the integer from a string stream
          std::istringstream iss(s);
          int j;
          iss >> j;
          std::cout << "j = " << j << std::endl;
          iss.clear(); // clear the stream's state flags (eof was set above)
          iss.str(t);  // set t to be characters in the new stream
          int k;
          iss >> k;
          std::cout << "k = " << k << std::endl;
          return 0;
      }
    • Edges as pairs of integers: the best way to store edges of a graph is to treat each edge as a pair of integers. C++ has a pair type that you can use. (Here are some examples of generic usage of pair.)
      // from this example, you can see that pairs are compared lexicographically
      #include <iostream>
      #include <utility> // for pair and make_pair()
      int main()
      {
          std::pair<int, int> p1; // p1 is a pair of ints
          std::pair<int, int> p2; // p2 is also a pair of ints
          std::pair<int, int> p3; // p3 is also a pair of ints
          p1 = std::make_pair(1, 5);
          p2 = std::make_pair(5, 1);
          p3 = std::make_pair(1, 5);
          std::cout << "p1 " << (p1 == p2? "=" : "not =") << " p2" << std::endl;
          std::cout << "p1 " << (p1 < p2? "<" : "not <") << " p2" << std::endl;
          std::cout << "p1 " << (p1 == p3? "=" : "not =") << " p3" << std::endl;
          std::cout << "p1 " << (p1 < p3? "<" : "not <") << " p3" << std::endl;
          return 0;
      }
    • Inserting elements into a set. set is one of the most straightforward data structures to use.
      #include <iostream>
      #include <set>
      #include <utility> // for pair and make_pair()
      int main()
      {
          std::pair<int, int> p1;
          std::pair<int, int> p2;
          std::pair<int, int> p3;
          std::pair<int, int> p4;
          p1 = std::make_pair(1, 5);
          p2 = std::make_pair(5, 1);
          p3 = std::make_pair(1, 5);
          p4 = std::make_pair(2, 3);
          std::set<std::pair<int, int> > edgeSet;
          edgeSet.insert(p1);
          edgeSet.insert(p2);
          edgeSet.insert(p3); // equal to p1, so the set does not grow
          edgeSet.insert(p4);
          // the following prints "# of inserted edges = 3", do you see why?
          std::cout << "# of inserted edges = " << edgeSet.size() << std::endl;
          return 0;
      }
    • Sort, traverse a vector, and the erase function:
      #include <iostream>
      #include <vector>
      #include <utility>   // for pair and make_pair()
      #include <algorithm> // for sort()
      using namespace std; // I'm lazy now, so let's get rid of all the std::
      void printVector(vector<pair<int, int> > & myVec)
      {
          // traverse the vector
          vector<pair<int, int> >::iterator i;
          for (i = myVec.begin(); i != myVec.end(); ++i) {
              cout << "(" << i->first << ", " <<  i->second << ") ";
          }
          cout << endl;
      }
      int main()
      {
          pair<int, int> p1;
          pair<int, int> p2;
          pair<int, int> p3;
          pair<int, int> p4;
          p1 = make_pair(2, 4);
          p2 = make_pair(5, 1);
          p3 = make_pair(2, 4);
          p4 = make_pair(4, 2);
          vector<pair<int, int> > pairVector;
          pairVector.push_back(p1);
          pairVector.push_back(p2);
          pairVector.push_back(p3);
          pairVector.push_back(p4);
          cout << "# of inserted pairs = " << pairVector.size() << endl;
          printVector(pairVector);
          sort(pairVector.begin(), pairVector.end());
          printVector(pairVector);
          // finally, remove duplicate pairs; after sorting, equal pairs are adjacent
          vector<pair<int, int> >::iterator i = pairVector.begin();
          while (i != pairVector.end()) {
              vector<pair<int, int> >::iterator j = i+1;
              if (j != pairVector.end() && *j == *i) {
                  // remove the pair pointed to by i as a duplicate was found;
                  // erase() returns an iterator to the element after the erased one
                  i = pairVector.erase(i);
              } else {
                  ++i; // no duplicate right after i, move on
              }
          }
          printVector(pairVector);
          return 0;
      }
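The erase loop above removes one element at a time, and each erase on a vector shifts every later element. As an alternative sketch (not necessarily what the assignment intends for vba or sba), the standard library's std::unique can move the distinct elements of a sorted range to the front in one pass, after which a single erase drops the leftover tail. The function name removeDuplicates is hypothetical.

```cpp
#include <algorithm> // for sort() and unique()
#include <utility>
#include <vector>

// Hypothetical helper: sorts the pairs, then removes adjacent duplicates
// so that each distinct pair remains exactly once.
void removeDuplicates(std::vector<std::pair<int, int> >& v)
{
    std::sort(v.begin(), v.end());
    // unique() returns an iterator to the new logical end of the range;
    // erase() then discards everything from there to the real end.
    v.erase(std::unique(v.begin(), v.end()), v.end());
}
```

Note that (2, 4) and (4, 2) are still different pairs here; treating them as the same edge requires storing each pair in a consistent order before sorting.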