{\huge The Power Of Guessing}

Some results and ideas on guessing and nondeterminism \vspace{.5in}

David Doty and Michael Wehar are theorists who work in different areas of computational complexity, and are at different stages of their careers. Doty is at the California Institute of Technology as a fellow in the DNA and Natural Algorithms Group, while Wehar just finished his BS and MS from Carnegie Mellon University at age nineteen. Doty is one of the experts on self-assembly systems, while Wehar is focused---at least for now---on complexity theory: his honors thesis is titled ``Intersection Emptiness for Finite Automata.''

Today I wish to discuss the power of guessing, the power of nondeterminism, and what we might learn about the question $latex {\mathsf{P} =? \mathsf{NP}}&fg=000000$ from simulations and from contexts in which $latex {\mathsf{NP}}&fg=000000$ can be replaced by $latex {\mathsf{P}}&fg=000000$ or at least sub-exponential time.

While DNA self-assembly systems and complexity theory seem far apart, they are tied by a common thread: what makes a system universal? This thread began for Doty in structural complexity theory and Jack Lutz's program of resource-bounded theories of measure and dimensionality, and as with Lutz himself moved into biological computing models. This involved it in questions dear to me---self-assembly is one area covered by a conference that I help start, with Eric Baum, almost twenty years ago. The 19th DNA conference is scheduled for next fall---see here for details. It is exciting to see that while the conference has ``mutated'' some, it still is going strong.

Besides proving nice results we will mention shortly and working with top people at CMU including Klaus Sutner, Manuel Blum, and Richard Statman, Wehar is noted for work on Rubik's Cube. We quote him: Back in 2009 when I was 16 years old, I solved the 20x20x20 cube in 20 hours 37 minutes and 7.42 seconds. It was quite an accomplishment even though it was a rather slow solve. I think I could solve it in somewhere between 8 and 14 hours if I were to give a second stab at it. I just wanted to share this with all of you. The ``7.42 seconds'' is the part of this statement that I like---as complexity theorists we might have been satisfied to say ``order-of 20 hours'' or even ``polynomial in 20 hours.''

\includegraphics[width=3in]{BigCube.png}

Efficiency and Guessing

This degree of time-precision brings to mind the main point of a paper at the just-held FOCS conference co-authored by Doty with Lutz and Matthew Patitz, Robert Schweller, Scott Summers, and Damien Woods. This is on a tight notion of universality for self-assembling tile systems, analogous to a kind of linear-time universality in cellular automata. The tile assembly process itself is nondeterministic and asynchronous.

When a cellular automaton model $latex {{\cal C}}&fg=000000$ is said to be universal, for itself or for some other model $latex {{\cal M}}&fg=000000$ such as Turing machines, it can mean several things. It can mean there is an easily computed mapping $latex {f}&fg=000000$ from computations in $latex {{\cal M}}&fg=000000$ to computations in $latex {{\cal C}}&fg=000000$ such that the results by a machine $latex {M \in {\cal M}}&fg=000000$ on an input $latex {x}&fg=000000$ can be quickly inferred from notionally-equivalent results of the automaton $latex {C = f(M)}&fg=000000$ running on the same (or lightly encoded) $latex {x}&fg=000000$. Or, stronger, it can mean $latex {f}&fg=000000$ is a mapping from configurations $latex {I,J}&fg=000000$ of an $latex {M}&fg=000000$ to corresponding configurations $latex {I',J'}&fg=000000$ of a $latex {C}&fg=000000$, such that $latex {I}&fg=000000$ goes to $latex {J}&fg=000000$ in one step if and only if $latex {I'}&fg=000000$ goes to $latex {J'}&fg=000000$ in some number of steps. The latter number of steps and sizes of $latex {I',J'}&fg=000000$ could still be huge. This was so with Matthew Cook's seminal exponential-time simulation of arbitrary Turing machines by the cellular automaton Rule 110. Woods and his student Turlough Neary got this down to a polynomial overhead in their 2006 ICALP paper.

The new result is that a standard tile-assembly model is universal for itself, under an even more special mapping $latex {f}&fg=000000$. Here, an individual tile $latex {T}&fg=000000$ of a system $latex {M}&fg=000000$ being simulated is represented as a ``super-tile'' $latex {T'}&fg=000000$ of the fixed universal system $latex {U}&fg=000000$. In fact, due to the nondeterministic and asynchronous nature of the simulator, there may be many super-tiles that encode an individual tile $latex {T}&fg=000000$, so $latex {f}&fg=000000$ is a many-one function from super-tiles of $latex {U}&fg=000000$ to tiles of $latex {M}&fg=000000$. The super-tile is a legal combination of $latex {m^2}&fg=000000$ tiles from the finite set that constitutes the simulator $latex {U}&fg=000000$, where the number $latex {m}&fg=000000$ depends only on the simulated system $latex {M}&fg=000000$. Assemblies of tiles $latex {T}&fg=000000$ belonging to $latex {M}&fg=000000$ are directly mimicked by assemblies of the super-tiles $latex {T'}&fg=000000$, as are the dynamics of the assembly process.

Note that there is an $latex {m^2}&fg=000000$ blowup from $latex {T}&fg=000000$ to $latex {T'}&fg=000000$, but this re-scaling depends only on the fixed $latex {M}&fg=000000$. So this is a linear-scaled simulation with a known, small constant scale factor, and gives rise to a notion called intrinsic universality (IU). One point of IU is that the requirements on $latex {f}&fg=000000$ and efficiency are so tight that communication complexity and Kolmogorov complexity techniques can be used to prove that certain other kinds of celullar automata are not IU. In other words, such weak celullar automata are provably not universal under this strict notion of linear-time simulation. As usual see the full paper for details.

$latex {§}&fg=000000$

On the power of guessing in general, I wish there were some great breakthrough to report, but there are a couple of results and an idea that I would like to share with you. All are related in one way or another to the power of guessing. Perhaps the key question in all of complexity theory is, what is the power of guessing, of nondeterminism? Of course that is the core of the $latex {\mathsf{P} = \mathsf{NP}}&fg=000000$ question, but it extends to many other parts of theory.

The three questions we will discuss are:

Let's look at each in turn.

An Oracle Result

I believe that while $latex {\mathsf{P}\neq\mathsf{NP}}&fg=000000$ may be true, nondeterministic machines can still be simulated better than by brute-force. Color me a strong disbeliever in the Exponential Time Hypothesis. Ken believes ETH is off only by a log factor in the exponent, in the sense of a $latex {2^{O(n/\log n)}}&fg=000000$ upper bound. We will see. In an paper a while ago with Subrahmanyam Kalyanasundaram, Ken, and Farbod Shokrieh we worked on a modest improvement to the best known deterministic simulation of a Nondeterministic Turing Machine (NTM). Here is the abstract from our paper---I quote it to save me some writing. The standard simulation of a nondeterministic Turing machine (NTM) by a deterministic one essentially searches a large bounded-degree graph whose size is exponential in the running time of the NTM. The graph is the natural one defined by the configurations of the NTM. All methods in the literature have required time linear in the size $latex {S}&fg=000000$ of this graph. This paper presents a new simulation method that runs in time $latex {O(\sqrt S)}&fg=000000$. The search savings exploit the one- dimensional nature of Turing machine tapes. In addition, we remove most of the time-dependence on nondeterministic choice of states and tape head movements.

Doty proved a pretty oracle result that shows that any great improvement to what we did would have to be non-relativizing. Again I will quote from his paper: Hartmanis used Kolmogorov complexity to provide an alternate proof of the classical result of Baker, Gill, and Solovay that there is an oracle relative to which P is not NP. We refine the technique to strengthen the result, constructing an oracle relative to which a conjecture of Lipton is false. Is it cool to be mentioned in an abstract even if it shows that you were wrong? I guess it's okay, since the proof is only for oracles. I still stand by the conjecture that $latex {\mathsf{NTIME}(n)}&fg=000000$ is contain in $latex {\mathsf{DTIME}(c^{n})}&fg=000000$ for some constant $latex {c<2}&fg=000000$. But the result of Doty does show that any proof will have to be non-relativizing.

An Automata Result

One of the great mysteries to me is how can the ``obvious'' algorithm for the intersection of finite automata be the best possible? The usual intersection algorithm forms the direct product of the two machines and then uses the emptiness test on the product machine. For $latex {k}&fg=000000$ machines of $latex {n}&fg=000000$ states each, this leads to an $latex {n^{k}}&fg=000000$ order algorithm. This beautiful result---due to Michael Rabin and Dana Scott---cannot be the best. Or can it? We have found better methods for so many other problems that I find it hard to believe that by using randomness, or some new data structure, or some other trick, we cannot reduce the running time below $latex {n^{k}}&fg=000000$ But it seems that doing much better is unlikely because a strong improvement will lead to very surprising results in complexity theory.

Years ago with George Karakostas and Anastasios Viglas we wrote a paper ``On the complexity of intersecting finite state automata'' where we proved many consequences of the assumption that the product emptiness problem could be solved in $latex {n^{o(k)}}&fg=000000$ time for fixed $latex {k}&fg=000000$. As we discussed in greater detail here, one was that

$latex \displaystyle \mathsf{NL} \neq \mathsf{NP}. &fg=000000$

Wehar in his thesis has proved much more. In particular, he shows that the same assumption yields that if the intersection problem is in $latex {\mathsf{DTIME}(n^{o(k)})}&fg=000000$, then $latex {\mathsf{NL} \neq \mathsf{P}}&fg=000000$. This of course greatly improves our result.

A New Idea?

Define $latex {\mathsf{NTIME}(T(n),G(n))}&fg=000000$ to be those languages accepted by a nondeterministic Turing Machine (NTM) that runs in time $latex {T(n)}&fg=000000$ and uses at most $latex {G(n)}&fg=000000$ guesses. Note, if $latex {G(n) = T(n)}&fg=000000$, then this is just the usual definition of $latex {\mathsf{NTIME}(T(n))}&fg=000000$.

Here is the idea that I have been playing with recently. Suppose that we consider a set $latex {S}&fg=000000$ in $latex {\mathsf{NTIME}({\text{poly}(n)},\delta n)}&fg=000000$ where $latex {\delta < 1}&fg=000000$. Then, there is a polynomial size circuit that solves membership for $latex {S}&fg=000000$ correctly for at least $latex {2^{(1-\delta)n}}&fg=000000$ inputs of length $latex {n}&fg=000000$. The proof is trivial: each input of length $latex {n}&fg=000000$ uses some guessing string of length $latex {\delta n}&fg=000000$, so by the pigeonhole principle there is a guess that works for the required number of inputs.

I find this simple observation to be quite intriguing. For example, suppose that $latex {\mathsf{SAT}}&fg=000000$ could be solved by a NTM that runs in polynomial time and makes only $latex {\delta n}&fg=000000$ guesses. Then again there would be an exponentially large fraction of $latex {\mathsf{SAT}}&fg=000000$ problems that would be easy for some circuit.

Two remarks about this idea. Since $latex {\mathsf{SAT}}&fg=000000$ can be checked we should be able to arrange that when the algorithm says ``yes'' it is always correct. Also we can avoid a trivialization of this observation that relates to the encoding of the problems. In the ``usual'' encoding many strings will not correspond to a $latex {\mathsf{SAT}}&fg=000000$ problem. We can fix this by using a more clever encoding.

I will stop now and come back to this idea later with more details.

Open Problems

The gap between what we believe about the power of guessing and what we can actually prove remains huge. As we have discussed many times before, it remains open even to prove that $latex {\mathsf{NTIME}(n)}&fg=000000$ is not contained in $latex {\mathsf{DTIME}(n\log n)}&fg=000000$.

Also if you are interested in the last idea, we can try and work out the details together.