CSE396 Algorithm for FA to Regular Expression Fall 2003
Given a DFA/NFA/GNFA N, and any states p,q of N, here is an algorithm to
find a regular expression for L_{p,q} = {x: N can process x from p to q}.
Recalling that L(N) = the union of L_{s,q} over all q \in F, you only need
to "+ together" the regexps for each L_{s,q} to get one for L(N).
(Alternately, you can add to N a new final state "f" and add arcs
(q,\epsilon,f) for all "old" final states q, and then just do the algorithm
once to get L_{s,f}. Sometimes this is quicker, sometimes not.)
1. Number p = 1, q = 2 (unless q = p), and the remaining states up to n.
2. Define an nxn matrix T of type "REGULAR EXPRESSION OVER \Sigma"
such that each entry T(i,j) has the regular expression on the arc from
state i to state j, or T(i,j) = \emptyset if there is no such arc.
If there are multiple arcs from i to j, "+ together" their regexps.
3. Now execute the following routine:
for (int k = n; k >= q+1; k--) { // bypass state k
for (int i = 1; i < k; i++) {
if (T(i,k) != \emptyset) { // bypass edge i
/*--------------------------------------------------------------------
To bypass the edge from i to k, take every other state j from 1 to k-1,
and update the label T(i,j) so that it accounts for all substrings that
could cause N to go from i to k, "twiddle around" in state k, and come
out of k at j. It is important that the possibility j = i is included.
---------------------------------------------------------------------*/
for (int j = 1; j < k; j++) { // update T(i,j), even if emptyset
T(i,j) = T(i,j) + T(i,k).[T(k,k)]^*.T(k,j);
}
}
/*----------------------------------------------------------------------
/* Assertion: Now any path reading a substring w in which N goes from
i to k and leaves state k at state j is mapped on a direct line
from i to j. So we don't need the edge from i into k, and can
mentally remove it (the code doesn't use it anymore, anyway).
------------------------------------------------------------------------*/
} // all edges into k now bypassed. Assertion: Now we don't need any
// edges going into state k at all, so we can "kill" it.
// The way this is coded, decrementing k automatically kills it.
}
4. If q = p, then only state 1 is left, and L(N) = T(1,1)^*.
If q = 2, then use the "basic two-state formula"
L(N) = [T(1,1) + T(1,2).[T(2,2)]^*.T(2,1)]^* . T(1,2).T(2,2)^*