CSE396          Algorithm for FA to Regular Expression               Fall 2003


Given a DFA/NFA/GNFA N, and any states p,q of N, here is an algorithm to
find a regular expression for L_{p,q} = {x: N can process x from p to q}.
Recalling that L(N) = the union of L_{s,q} over all q \in F, you only need
to "+ together" the regexps for each L_{s,q} to get one for L(N).
(Alternately, you can add to N a new final state "f" and add arcs 
(q,\epsilon,f) for all "old" final states q, and then just do the algorithm
once to get L_{s,f}.  Sometimes this is quicker, sometimes not.)


1. Number p = 1, q = 2 (unless q = p), and the remaining states up to n.

2. Define an nxn matrix T of type "REGULAR EXPRESSION OVER \Sigma"
   such that each entry T(i,j) has the regular expression on the arc from
   state i to state j, or T(i,j) = \emptyset if there is no such arc.
   If there are multiple arcs from i to j, "+ together" their regexps.

3. Now execute the following routine:

    for (int k = n; k >= q+1; k--) {       // bypass state k 
	for (int i = 1; i < k; i++) {
	    if (T(i,k) != \emptyset) {     // bypass edge i

/*--------------------------------------------------------------------
To bypass the edge from i to k, take every other state j from 1 to k-1,
and update the label T(i,j) so that it accounts for all substrings that
could cause N to go from i to k, "twiddle around" in state k, and come
out of k at j. It is important that the possibility j = i is included.
---------------------------------------------------------------------*/

	       for (int j = 1; j < k; j++) { // update T(i,j), even if emptyset

		  T(i,j) = T(i,j) + T(i,k).[T(k,k)]^*.T(k,j);
               }
            }
/*----------------------------------------------------------------------
/* Assertion: Now any path reading a substring w in which N goes from
   i to k and leaves state k at state j is mapped on a direct line
   from i to j.  So we don't need the edge from i into k, and can
   mentally remove it (the code doesn't use it anymore, anyway).
------------------------------------------------------------------------*/

	} // all edges into k now bypassed.  Assertion: Now we don't need any
          // edges going into state k at all, so we can "kill" it.
          // The way this is coded, decrementing k automatically kills it.
    }

4. If q = p, then only state 1 is left, and L(N) = T(1,1)^*.
   If q = 2, then use the "basic two-state formula"

   L(N) = [T(1,1) + T(1,2).[T(2,2)]^*.T(2,1)]^* . T(1,2).T(2,2)^*