CSE396 Algorithm for FA to Regular Expression Fall 2003 Given a DFA/NFA/GNFA N, and any states p,q of N, here is an algorithm to find a regular expression for L_{p,q} = {x: N can process x from p to q}. Recalling that L(N) = the union of L_{s,q} over all q \in F, you only need to "+ together" the regexps for each L_{s,q} to get one for L(N). (Alternately, you can add to N a new final state "f" and add arcs (q,\epsilon,f) for all "old" final states q, and then just do the algorithm once to get L_{s,f}. Sometimes this is quicker, sometimes not.) 1. Number p = 1, q = 2 (unless q = p), and the remaining states up to n. 2. Define an nxn matrix T of type "REGULAR EXPRESSION OVER \Sigma" such that each entry T(i,j) has the regular expression on the arc from state i to state j, or T(i,j) = \emptyset if there is no such arc. If there are multiple arcs from i to j, "+ together" their regexps. 3. Now execute the following routine: for (int k = n; k >= q+1; k--) { // bypass state k for (int i = 1; i < k; i++) { if (T(i,k) != \emptyset) { // bypass edge i /*-------------------------------------------------------------------- To bypass the edge from i to k, take every other state j from 1 to k-1, and update the label T(i,j) so that it accounts for all substrings that could cause N to go from i to k, "twiddle around" in state k, and come out of k at j. It is important that the possibility j = i is included. ---------------------------------------------------------------------*/ for (int j = 1; j < k; j++) { // update T(i,j), even if emptyset T(i,j) = T(i,j) + T(i,k).[T(k,k)]^*.T(k,j); } } /*---------------------------------------------------------------------- /* Assertion: Now any path reading a substring w in which N goes from i to k and leaves state k at state j is mapped on a direct line from i to j. So we don't need the edge from i into k, and can mentally remove it (the code doesn't use it anymore, anyway). ------------------------------------------------------------------------*/ } // all edges into k now bypassed. Assertion: Now we don't need any // edges going into state k at all, so we can "kill" it. // The way this is coded, decrementing k automatically kills it. } 4. If q = p, then only state 1 is left, and L(N) = T(1,1)^*. If q = 2, then use the "basic two-state formula" L(N) = [T(1,1) + T(1,2).[T(2,2)]^*.T(2,1)]^* . T(1,2).T(2,2)^*