{\huge Permutation Problems With Strings}
From ``Parsons Puzzles'' to Babai's breakthrough and more?
Dale Parsons and Patricia Haden wrote a seminal paper on a genre of programming puzzles that Barbara Ericson at Georgia Tech is furthering in her doctoral work. The puzzles give lines of code in a scrambled order, sometimes with incorrect lines thrown in. The goal is to find the correct order of the correct lines.
Today Ken and I want to discuss various string rearrangement problems in relation to the Graph Isomorphism problem.
The Parsons method was pointed out to me by Ericson, who also serves as our Director of Computing Outreach. Her thesis abstract says:
Parson's programming puzzles are a family of code construction assignments where lines of code are given, and the task is to form the solution by sorting and possibly selecting the correct code lines. We introduce a novel family of Parson's puzzles where the lines of code need to be sorted in two dimensions. The vertical dimension is used to order the lines, whereas the horizontal dimension is used to change control flow and code blocks based on indentation as in Python.
This really intrigued me and I looked into what exactly is the Parsons method. One of her major references is a paper by Petri Ihantola and Ville Karavirta of Aalto University in Finland. Here is one of their examples:
\includegraphics[width=4in]{puzzle.png}
One can make a similar but harder puzzle out of Ken's example a year ago in our memorial post for Susan Horwitz, by scrambling his six lines of code that swap two nodes in a circularly linked list. In a class teaching proofs, I wonder how good an exercise it would be to scramble lines of a formal derivation. All this has us thinking further about problems involving rearranging strings. We can think of each given line as a character. Some Parsons puzzles have multiple occurrences of the same line which is like having the same character multiple times in a string.
Viewed this way, we can even call ``Parson's puzzle'' a Parsons puzzle: the apostrophe should if anywhere be after the final 's'. There's nothing wrong with ``Parsons puzzle'' (reading the name as an adjective not possessive), which is also a legal Parsons solution since the apostrophe ``line of code'' is skipped. There are isomorphic answers since we could swap either 's' or 'z'. Now we are getting into the domain of László Babai's new results about strings, to which we turn.
Graph Isomorphism With Subgroups
To understand László Babai's usage of ``string isomorphism,'' let's first consider a natural, known add-on to the graph isomorphism (GI) problem: Given two labeled graphs $latex {X,Y}&fg=000000$ on nodes $latex {1,\dots,n}&fg=000000$ plus a group $latex {G}&fg=000000$ presented by permutations $latex {\pi_1,\dots,\pi_k}&fg=000000$ of $latex {1,\dots,n}&fg=000000$, is there an isomorphism $latex {\sigma}&fg=000000$ from $latex {X}&fg=000000$ to $latex {Y}&fg=000000$ that belongs to $latex {G}&fg=000000$?
For example, consider $latex {n = 6}&fg=000000$ and the subgroup $latex {G}&fg=000000$ generated by $latex {\pi_1 = (1~2~3~6)(4~5)}&fg=000000$ and $latex {\pi_2 = (2~4)(5~6)}&fg=000000$. These are both even permutations since $latex {\pi_1 = (3~6)(2~3)(1~2)(4~5)}&fg=000000$ so we immediately know $latex {G}&fg=000000$ is not all of the symmetric group $latex {S_6}&fg=000000$. Now consider these two labeled graphs $latex {X}&fg=000000$ and $latex {Y}&fg=000000$:
\includegraphics[width=3.5in]{PrismIso.png}
$latex {X}&fg=000000$ and $latex {Y}&fg=000000$ are isomorphic by the interchange $latex {(3~5)}&fg=000000$, but being odd, this does not belong to $latex {G}&fg=000000$. Does that mean the answer to the problem for $latex {(X,Y,\pi_1,\pi_2)}&fg=000000$ is ``no''? Well, the graph $latex {X}&fg=000000$ has odd automorphisms such as its mirror flip---perhaps one of them composed with $latex {(3~5)}&fg=000000$ belongs to $latex {G}&fg=000000$? It's a tricky question. We could force a ``no'' answer by adding a ``dongle'' to nodes 3 and 5 in both graphs so they can only be swapped with each other and different ``dongles'' to other nodes that prevent any other possible isomorphism. The dongles could be a single node connected to 3 and 5 and $latex {i}&fg=000000$ nodes connected only to node $latex {i}&fg=000000$ for $latex {i = 1,2,4,6}&fg=000000$ in each graph.
GI is then the special case where we are given generators of $latex {S_n}&fg=000000$. Whether the general case reduces back to GI is a major open problem. Babai's FOCS 2008 paper with Paolo Codenotti analyzes this problem for hypergraphs, of which graphs are a special case.
String Isomorphism
The simplest form of string isomorphism is, given strings $latex {x,y}&fg=000000$ of length $latex {N}&fg=000000$, is there a permutation of $latex {1,\dots,N}&fg=000000$ that carries $latex {x}&fg=000000$ to $latex {y}&fg=000000$? This has a trivial answer: yes if and only if for every character $latex {c}&fg=000000$ in the alphabet, the numbers of occurrences of $latex {c}&fg=000000$ in $latex {x}&fg=000000$ and in $latex {y}&fg=000000$ are the same. This is easily in polynomial time, as are some other problems by the name ``string isomorphism'' on the Net. But here is the analogous subgroup problem (stated for a binary alphabet):
Given strings $latex {x,y \in \{0,1\}^N}&fg=000000$ and generators $latex {\pi_1,\dots,\pi_k}&fg=000000$ of a subgroup $latex {G}&fg=000000$ of $latex {S_N}&fg=000000$, does permuting the entries of $latex {x}&fg=000000$ by some member of $latex {G}&fg=000000$ give $latex {y}&fg=000000$?
This is not trivial. It is as hard as the graph subgroup version. Let us first reduce GI to it. Having $latex {G}&fg=000000$ enforces that only permutations derived from rearrangements of the $latex {n}&fg=000000$ node indices are allowed on the $latex {N = \binom{n}{2}}&fg=000000$ string indices. Specifying some way to ``unroll'' a graph's adjacency matrix $latex {A}&fg=000000$ into a string yields generators for the right $latex {G}&fg=000000$.
Since our graphs are undirected we need only unroll the upper triangle. Let's do it by first going down the largest diagonal
$latex \displaystyle A[1,2].A[2,3],\dots,A[n-1,n], &fg=000000$
then the next-largest, and so on in ``bands'' ending at the upper-right corner $latex {A[1,n]}&fg=000000$. For $latex {n = 4}&fg=000000$ we get $latex {N = 6}&fg=000000$ entries in the following order:
\includegraphics[width=1.5in]{K4.png}
Here the six ``edges'' are really the slots in the string for possible edges. A pair of generators for $latex {S_4}&fg=000000$ is $latex {(A~B~C~D)}&fg=000000$ and $latex {(A~B)}&fg=000000$. The corresponding permutations of the edge slots are $latex {\pi_1 = (1~2~3~6)(4~5)}&fg=000000$ and $latex {\pi_2 = (2~4)(5~6)}&fg=000000$. By ``design of accident'' these are the same as in the example above though there is no relation---this just expedites our seeing that the subgroup $latex {G_N}&fg=000000$ generated by $latex {\pi_1,\pi_2}&fg=000000$ is not all of $latex {S_6}&fg=000000$. The idea extends for any $latex {n > 4}&fg=000000$ in canonical fashion: from the cycle and transposition generators of $latex {S_n}&fg=000000$ we get two generators $latex {\pi_1,\pi_2}&fg=000000$ for $latex {G_N}&fg=000000$. Babai calls the group $latex {G_N}&fg=000000$ in such cases $latex {S_n^{(2)}}&fg=000000$ to emphasize the action being on pairs.
Now given any labeled graphs $latex {X}&fg=000000$ and $latex {Y}&fg=000000$ on $latex {n}&fg=000000$ nodes, we can read off the corresponding strings $latex {x}&fg=000000$ and $latex {y}&fg=000000$, and then $latex {X}&fg=000000$ is isomorphic to $latex {Y}&fg=000000$ if and only if $latex {(x,y,\pi_1,\pi_2)}&fg=000000$ is a yes-case of the string problem. This completes the reduction from GI.
To reduce the subgroup version of GI we map the given generators of $latex {G}&fg=000000$ to their images in $latex {S_n^{(2)}}&fg=000000$ under the above embedding. This reduction can be inverted: Given the string $latex {x}&fg=000000$, define the graph $latex {X}&fg=000000$ to be a disjoint union of isolated pairs of nodes $latex {i_0,i_1}&fg=000000$ connected by an edge for each $latex {i}&fg=000000$ such that $latex {x_i = 1}&fg=000000$, and isolated pairs not connected by an edge for each $latex {i}&fg=000000$ such that $latex {x_i = 0}&fg=000000$. Likewise map $latex {y}&fg=000000$ to a graph $latex {Y}&fg=000000$. Finally map each generator $latex {\pi_{\ell}}&fg=000000$ by replacing each move $latex {i \rightarrow j}&fg=000000$ in $latex {\pi_{\ell}}&fg=000000$ by $latex {i_0 \rightarrow j_0}&fg=000000$ and $latex {i_1 \rightarrow j_1}&fg=000000$.
Graph and String Automorphisms Too
An automorphism is a rearrangement that leaves a structure $latex {X}&fg=000000$ the same. Such permutations form a group called the automorphism group $latex {\mathsf{Aut}(X)}&fg=000000$. GI has long been known to be polynomial-time equivalent to the problem of computing a complete set of generators for $latex {\mathsf{Aut}(X)}&fg=000000$ when $latex {X}&fg=000000$ is a labeled graph. Assuming $latex {X}&fg=000000$ and $latex {Y}&fg=000000$ are connected graphs (else use their complements), the forward reduction takes $latex {X' = X \cup Y}&fg=000000$ and notes that every generating set contains a member that maps $latex {V(X)}&fg=000000$ onto $latex {V(Y)}&fg=000000$ if and only if $latex {X}&fg=000000$ is isomorphic to $latex {Y}&fg=000000$.
The reverse direction is trickier and requires both decomposing the group and progressively adding ``dongles'' to pairs of graphs and asking whether they are isomorphic. The decision version, ``does $latex {X}&fg=000000$ have a nontrivial automorphism?'' (GA), reduces to GI but is not known to be equivalent.
This extends to an equivalence of the subgroup-$latex {G}&fg=000000$ version of GI and finding generators for all automorphisms in a subgroup. We almost get a reduction to the subgroup version of GA which is a decision problem: Make two copies $latex {[n_0],[n_1]}&fg=000000$ of $latex {[n]}&fg=000000$ and convert each given generator $latex {\pi}&fg=000000$ of $latex {G}&fg=000000$ into a $latex {\pi'}&fg=000000$ that maps $latex {i_0}&fg=000000$ to $latex {(\pi i)_1}&fg=000000$ and $latex {i_1}&fg=000000$ to $latex {(\pi^{-1} i)_0}&fg=000000$. Also convert the identity which yields the swap $latex {\sigma: i_0 \leftrightarrow i_1}&fg=000000$ and let these generate $latex {G'}&fg=000000$. Then $latex {X}&fg=000000$ and $latex {Y}&fg=000000$ are isomorphic by a member of $latex {G}&fg=000000$ if and only if $latex {X \cup Y}&fg=000000$ has an automorphism that is an odd-length word over these generators. We still get a reduction to finding all generators of $latex {G' \cap \mathsf{Aut}(X \cup Y)}&fg=000000$. The reverse reduction is still tricky, but no more so than the original since it was decomposing groups anyway.
For strings $latex {x}&fg=000000$, we again have a situation where the basic problem is trivial: the automorphisms permute the indices for each individual character among themselves, so we need only write the usual generators for a product of symmetric groups. However, the subgroup version is again nontrivial:
Given a string $latex {x \in \{0,1\}^N}&fg=000000$ and generators for a subgroup $latex {G}&fg=000000$ of $latex {S_N}&fg=000000$, find generators for $latex {G \cap (S_0 \times S_1)}&fg=000000$ where $latex {S_0}&fg=000000$ permutes the indices $latex {i}&fg=000000$ where $latex {x_i = 0}&fg=000000$ and $latex {S_1}&fg=000000$ permutes $latex {j}&fg=000000$ with $latex {x_j = 1}&fg=000000$.
Note how this involves finding generators for the intersection of two subgroups. By ideas like those just above this is equivalent to the subgroup version of string isomorphism, hence in turn to the subgroup version of GI. The method of the last section can also reduce the functional graph-automorphism problem directly to this string version, and conversely.
The extra generality of the string problems is that the subgroup $latex {G'}&fg=000000$ involved need not be $latex {S_n^{(2)}}&fg=000000$ as in the reductions from GI, indeed need not arise from subgroups $latex {G}&fg=000000$ acting on graphs at all. This feature is exploited in the recursions that Babai's proof sets up. His algorithm classifies all of these problems into quasipolynomial time. Thus, defining the problems with subgroups makes them nicer.
What to Call These Problems?
We hasten to add that these problems are not new. In various contexts they appear in Eugene Luks's seminal 1980-82 paper putting the bounded-degree case of GI into $latex {\mathsf{P}}&fg=000000$, which has subgroup-intersection and much other content highlighted in Babai's 1994 Handbook of Combinatorics survey, and are foreshadowed in the treatment of coloring-preserving auto/iso-morphisms in Babai's own 1979 paper which introduced the term ``Las Vegas algorithm.''
In highlighting them all together we'd like to find a common and consistent naming scheme. Adapting names from these papers and echoing Luks's 1993 survey might suggest names like STRING-ISO$latex {_G}&fg=000000$ or GRAPH-AUTO$latex {_G}&fg=000000$ or GRAPH-AUTOGEN$latex {_G}&fg=000000$, the last distinguishing the problem of giving a full set of generators for $latex {G \cap \mathsf{Aut}(X)}&fg=000000$ from the decision problem of whether $latex {G \cap \mathsf{Aut}(X)}&fg=000000$ is nontrivial. We've had a suggestion to put the 'G' out front and the class of relational structures in parentheses, such as G-ISO$latex {(\mathsf{Strings})}&fg=000000$ or $latex {G}&fg=000000$-AUTO$latex {(\mathsf{Graphs})}&fg=000000$, with the latter's math-italicized $latex {G}&fg=000000$ indicating a particular subgroup $latex {G}&fg=000000$.
For abbreviations we like appending the 'G' to turn GI into GIG and similarly SIG for the string version. The rub is that for the automorphism problems this could lead to names like GAG and SAG, even GAGG and SAGG for the generator-set versions, which don't sound so nice. Writing $latex {\equiv}&fg=000000$ for polynomial-time or some tighter equivalence, and using less-gaggy notation, the current state is:
We don't know about the $latex {G}&fg=000000$-AUTO decision versions for graphs and strings. All of these in turn reduce to various forms of the canonization problem, which features in this 1983 paper by Babai and Luks, Luks's 1993 survey, this 1997 note by Yuri Gurevich, and this recent paper by Lance Fortnow and Joshua Grochow.
More Isomorphism and Permutation Problems
As a hint that much more can be said, consider this 1998 dissertation by David Christie titled ``Genome Rearrangement Problems.'' Then consider that the word ``genome'' stops appearing after page 15 but there are 143 pages of neat material on permutations and sorting and string edit-distance type problems and cases that are $latex {\mathsf{NP}}&fg=000000$-hard, or not. Evidently there are ``Parsons puzzles'' for genetic code. We're not saying that Babai's algorithm has any relevance to these---Laci is the first to disclaim any competitiveness with existing heuristic GI solvers---but something in the new structural and algorithmic ideas might click.
Open Problems
The most particular problem is whether GIG---that is, G-ISO($latex {\mathsf{Graphs}}&fg=000000$)---reduces back to GI. If not, then does it sit properly between GI and the canonization problems? What to call these problems?
We end with a simple Parsons problem, with one line to omit: \noindent and we await further news expectantly.
talk on Tuesday
to the University of Chicago,
We were glad to hear that Laci's part-III
the proof is by double recursion
was not disturbed by the previous day's hoax threat