A Tighter Grip on Circuit Depth

The polynomial hierarchy is infinite for a random oracle


Benjamin Rossman, Rocco Servedio, and Li-Yang Tan have made a breakthrough in proving lower bounds on constant-depth circuits. It came from a bi-coastal collaboration, with Rossman visiting the Berkeley Simons Institute from Japan and Tan visiting Servedio at Columbia University in New York from Berkeley. Their new paper solves several 20- and 30-year-old open problems.

Today we congratulate them on their achievement and describe part of how their new result works.

What exactly did they prove? As some have already remarked, how one answers this says something about communications in our field. Exactly what they proved is:

Theorem 1 For some explicit constant $latex {c > 0}&fg=000000$ and all depths $latex {d \geq 2}&fg=000000$ up to $latex {c\sqrt{\log n}/\log\log n}&fg=000000$, there is an $latex {n}&fg=000000$-ary monotone Boolean function $latex {f}&fg=000000$ that has a simple $latex {O(n)}&fg=000000$-size circuit $latex {S_d}&fg=000000$ (indeed, a formula) of depth $latex {d}&fg=000000$ with unbounded fan-in $latex {\mathsf{AND}}&fg=000000$ and $latex {\mathsf{OR}}&fg=000000$ gates, but such that every circuit $latex {C}&fg=000000$ of depth $latex {d}&fg=000000$ with the opposite kind of gate at its output from $latex {S_d}&fg=000000$, or with bottom fan-in at most $latex {\frac{\log n}{10(d-1)}}&fg=000000$ at the inputs, either has size above $latex {2^{n^{1/6(d-1)}}}&fg=000000$ or else agrees with $latex {S_d}&fg=000000$ on at most a $latex {(\frac{1}{2} + n^{-\Omega(1/d)})}&fg=000000$ proportion of the inputs.

What Did They Prove, Again?

That is quite a mouthful. We can at least simplify it---as they do up front in their paper---by noting that every circuit of depth $latex {d-1}&fg=000000$ trivially obeys both of the stated restrictions on circuits of depth $latex {d}&fg=000000$.

Theorem 2 Every Boolean circuit $latex {C}&fg=000000$ of depth $latex {d-1}&fg=000000$ and size at most $latex {2^{n^{1/6(d-1)}}}&fg=000000$ gets at least a $latex {(\frac{1}{2} - n^{-\Omega(1/d)})}&fg=000000$ fraction of the inputs wrong with regard to computing $latex {S_d}&fg=000000$.

Johan H\aa{}stad's famous 1986 PhD thesis had proved a similar lower bound only in the worst case. If this still seems hard to parse, however, here is a consequence they proved. The influence of a Boolean function $latex {f}&fg=000000$ is the sum from $latex {i=1}&fg=000000$ to $latex {n}&fg=000000$ of the proportion of $latex {w \in \{0,1\}^n}&fg=000000$ for which flipping the $latex {i}&fg=000000$-th bit of $latex {w}&fg=000000$ changes the value $latex {f(w)}&fg=000000$.
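To make the definition concrete, here is a brute-force computation of this total influence (our own illustration; it enumerates all $latex {2^n}&fg=000000$ inputs, so it is only for toy sizes):

```python
from itertools import product

def total_influence(f, n):
    """Sum over coordinates i of the fraction of inputs w for which
    flipping the i-th bit of w changes f(w), by brute force."""
    total = 0.0
    for i in range(n):
        flips = sum(
            f(w) != f(w[:i] + (1 - w[i],) + w[i+1:])  # w with bit i flipped
            for w in product((0, 1), repeat=n)
        )
        total += flips / 2**n
    return total

# OR on 3 bits: flipping bit i matters only when the other two bits are 0,
# so each coordinate has influence 1/4 and the total is 3/4.
print(total_influence(lambda w: int(any(w)), 3))  # 0.75
```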

Theorem 3 For some non-constant depths $latex {d = d(n)}&fg=000000$ and size functions $latex {s(n)}&fg=000000$ greater than quasi-polynomial in $latex {n}&fg=000000$, there are monotone Boolean functions $latex {f_n}&fg=000000$ whose influence is only $latex {O(\log n)}&fg=000000$, but that still cannot be approximated better than a $latex {\frac{1}{2} + o(1)}&fg=000000$ fraction by circuits of depth $latex {d(n)}&fg=000000$ and size at most $latex {s(n)}&fg=000000$.

This gives a big ``No'' to a question raised by our friend Gil Kalai in two posts, on his blog in 2010 and on StackExchange in 2012. It rules out any kind of strong converse to a famous 1993 theorem of Nathan Linial, Yishay Mansour, and Noam Nisan, later improved by Ravi Boppana, showing that small constant-depth circuits compute functions of low total influence (average sensitivity). That theorem has many applications, so this strong refutation of a converse is significant news, but perhaps the above statement still does not come trippingly off the tongue. Well, here's something else they proved:

Theorem 4 Relative to a random oracle $latex {A}&fg=000000$, the polynomial hierarchy $latex {\mathsf{PH}^A}&fg=000000$ is infinite.

Now that's a nice short statement that can grab people---at least people like us who did complexity in the 1980s and before. We know this as an open problem going back even before Charlie Bennett and John Gill proved that $latex {\mathsf{NP}^A \neq \mathsf{P}^A}&fg=000000$ for a random $latex {A}&fg=000000$ in 1981.

However, there is not much surprise and not much mileage in that statement. It was believed even before Ron Book observed in 1994 that its negation collapses the hierarchy without an oracle: Given a relativizable class $latex {{\cal C}}&fg=000000$, define $latex {\mathsf{Almost}\text{-}{\cal C}}&fg=000000$ to be the set of languages $latex {B}&fg=000000$ such that the set of oracles $latex {A}&fg=000000$ with $latex {B \notin {\cal C}^A}&fg=000000$ has measure zero. The measure is properly on the space of infinite 0-1 sequences, but it is OK to think of the usual Lebesgue measure on $latex {[0,1]}&fg=000000$, where e.g.

$latex \displaystyle 0.001101010001010001\dots &fg=000000$

denotes the set of primes, ignoring the clash whereby a finite set like $latex {\{2,3\}}&fg=000000$ and a co-finite counterpart like $latex {\mathbb{N} \setminus \{0,1,3\}}&fg=000000$ map to the same dyadic rational number.

For classes $latex {{\cal C}}&fg=000000$ enjoying certain closure properties, $latex {\mathsf{Almost}\text{-}{\cal C}}&fg=000000$ equals $latex {\mathsf{BP}\cdot {\cal C}}&fg=000000$, where the dot means we have $latex {\mathsf{BPP}}&fg=000000$-machines each of whose random branches ends with one query to the given language in $latex {{\cal C}}&fg=000000$, whose answer becomes the result of that branch. For instance, $latex {\mathsf{Almost}\text{-}\mathsf{NP}}&fg=000000$ equals the Arthur-Merlin class $latex {\mathsf{AM} = \mathsf{BP}\cdot\mathsf{NP}}&fg=000000$.

Now if $latex {\Pi_k^p \not\subseteq \mathsf{Almost}\text{-}\Sigma_k^p}&fg=000000$ for every $latex {k}&fg=000000$, then by standard 0-1 laws the measure of $latex {A}&fg=000000$ putting $latex {\left(\Pi_k^p\right)^A \subseteq \left(\Sigma_k^p\right)^A}&fg=000000$ is zero for each $latex {k}&fg=000000$. Then by the fact that a countable union of sets of measure zero has measure zero, the hierarchy is infinite for a random oracle. Hence its collapse for a random oracle implies $latex {\Pi_k^p \subseteq \mathsf{BP}\cdot\Sigma_k^p}&fg=000000$ for some $latex {k}&fg=000000$. This in turn would collapse the hierarchy to $latex {\Sigma_{k+1}^p \cap \Pi_{k+1}^p}&fg=000000$ without an oracle, much as $latex {\mathsf{co}\text{-}\mathsf{NP} \subseteq \mathsf{AM}}&fg=000000$ collapses it to $latex {\Sigma_2^p \cap \Pi_2^p}&fg=000000$.
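Schematically, Book's observation runs as the one-line chain (our paraphrase of the argument just given):

$latex \displaystyle \mathsf{PH}^A \text{ collapses for a random } A \;\Longrightarrow\; \exists k:\; \Pi_k^p \subseteq \mathsf{BP}\cdot\Sigma_k^p \;\Longrightarrow\; \mathsf{PH} = \Sigma_{k+1}^p \cap \Pi_{k+1}^p. &fg=000000$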

The point in $latex {\mathsf{BP\cdot NP} = \mathsf{Almost}\text{-}\mathsf{NP}}&fg=000000$ is that random oracles furnish random bits for the computations. The random oracles could be richer, in that exponentially many poly-length computations could use exponentially many random oracle bits in toto. But the aforementioned closure properties together with probability amplification bridge this difference to give equality when $latex {{\cal C}}&fg=000000$ is a hierarchy level. The point in the new lower bound is that the random oracle bits connect instead to distributions over the input space in an average-case argument. This connection is expressed well in their paper.

The New Idea

Sometimes a new idea comes from a new mathematical object, but other times it comes from a new way of applying and controlling a known object. Leslie Valiant in the late 1970s introduced the kind of projection that can do the following to each variable $latex {x}&fg=000000$ in a Boolean or numerical formula:

  1. Substitute $latex {x}&fg=000000$ by a constant value;
  2. Leave $latex {x}&fg=000000$ alone; or
  3. Substitute $latex {x}&fg=000000$ by a variable $latex {y}&fg=000000$ already occurring in the formula.

An equivalent rule to the last is that you can rename two variables $latex {x,y}&fg=000000$ to be the same variable $latex {z}&fg=000000$. This applies simultaneously to every occurrence of the variable. By undoing this rule one can convert every formula $latex {f}&fg=000000$ with $latex {m}&fg=000000$ occurrences of $latex {n}&fg=000000$ variables into a formula $latex {f_1}&fg=000000$ with $latex {m}&fg=000000$ different variables each occurring once. This $latex {f_1}&fg=000000$ is called a read-once formula and is unique up to permuting or renaming variables, so every $latex {f}&fg=000000$ is a projection of a read-once formula. The formulas $latex {S_d}&fg=000000$ targeted in their proof are already read-once, so the game becomes how to analyze the formulas $latex {f'}&fg=000000$ that arise as projections of $latex {S_d}&fg=000000$.
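As a minimal illustration (our own, using a hypothetical nested-tuple representation of formulas), here are all three rules applied at once:

```python
def project(formula, sigma):
    """Apply a Valiant-style projection: sigma maps each variable to a
    constant (rule 1), to itself or nothing (rule 2), or to another
    variable (rule 3), simultaneously at every occurrence."""
    if isinstance(formula, int):      # an already-substituted constant
        return formula
    if isinstance(formula, str):      # a leaf variable
        return sigma.get(formula, formula)
    op, *args = formula
    return (op, *(project(a, sigma) for a in args))

# f1 is read-once: each of x1..x4 occurs exactly once.
f1 = ('OR', ('AND', 'x1', 'x2'), ('AND', 'x3', 'x4'))
# Rule 3 identifies x3 with x1 (destroying read-onceness); rule 1 fixes x4 := 1.
print(project(f1, {'x3': 'x1', 'x4': 1}))
# ('OR', ('AND', 'x1', 'x2'), ('AND', 'x1', 1))
```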

When only the first two rules are applied, $latex {f'}&fg=000000$ is a restriction of $latex {f_1}&fg=000000$. H\aa{}stad's 1986 proof technique analyzed random restrictions obtained by independently leaving each variable $latex {x_i}&fg=000000$ alone with some probability $latex {p}&fg=000000$ and then assigning a random 0-or-1 value to each variable not left alone. Restrictions applied to a read-once formula not only preserve its structure but also preserve read-onceness, which keeps the behavior of different parts of the formula independent. This independence is sacrificed by using projections, which can ``entangle'' different parts of the formula and thereby worsen bias.
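In code, a random restriction is just a projection that uses only the first two rules; here is a minimal sampler in the same toy representation (its output can be fed straight to `project` above):

```python
import random

def random_restriction(variables, p):
    """Hastad-style R_p: leave each variable alone with probability p,
    otherwise fix it to a uniformly random bit."""
    return {x: (x if random.random() < p else random.randint(0, 1))
            for x in variables}

random.seed(1)
print(random_restriction(['x1', 'x2', 'x3', 'x4'], p=0.5))
```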

The proof technique both then and now works by reductio ad absurdum on the depth $latex {d}&fg=000000$. $latex {S_d}&fg=000000$ is always a monotone alternating $latex {\mathsf{AND}}&fg=000000$-$latex {\mathsf{OR}}&fg=000000$ formula of a kind introduced by Mike Sipser with $latex {\mathsf{AND}}&fg=000000$ at the inputs, so the output gate is $latex {\mathsf{OR}}&fg=000000$ if $latex {d}&fg=000000$ is even, $latex {\mathsf{AND}}&fg=000000$ if odd. It is ``tuned'' by giving the fan-in $latex {w_j}&fg=000000$ at each level $latex {j}&fg=000000$, $latex {1 \leq j \leq d}&fg=000000$, so $latex {n = n_d = \prod_{j=1}^d w_j}&fg=000000$. With the right combination of $latex {w_j}&fg=000000$ and $latex {p}&fg=000000$, H\aa{}stad showed that a random restriction makes $latex {f'}&fg=000000$ equivalent to a formula of depth $latex {d-1}&fg=000000$ with a large enough expected number $latex {n' \geq n_{d-1}}&fg=000000$ of variables that embeds the relevant structure of $latex {S_{d-1}}&fg=000000$. A circuit $latex {C_d}&fg=000000$ of small enough size $latex {s_d}&fg=000000$ and depth $latex {d-1}&fg=000000$ (or depth $latex {d}&fg=000000$ with the opposite kind of gate at the output or small fan-in at the input level) also gets beaten down by the restrictions into an especially simple form, with sufficient probability.
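Here is a sketch of building such a formula in the same toy representation (the fan-ins in the example are ours for illustration, not the paper's actual tuning):

```python
def sipser(d, widths):
    """Depth-d read-once alternating formula with AND at the bottom level d
    and fan-in widths[j-1] at level j (level 1 = output gate), so the
    number of variables is the product of the widths."""
    counter = [0]
    def build(level):
        if level > d:                 # a fresh leaf variable
            counter[0] += 1
            return f'x{counter[0]}'
        # level d is AND; gates alternate going up, so the output gate
        # is OR when d is even and AND when d is odd
        gate = 'AND' if (d - level) % 2 == 0 else 'OR'
        return (gate, *(build(level + 1) for _ in range(widths[level - 1])))
    return build(1)

print(sipser(2, [2, 3]))
# ('OR', ('AND', 'x1', 'x2', 'x3'), ('AND', 'x4', 'x5', 'x6'))
```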

The ``reductio'' can then continue with $latex {d' = d-1}&fg=000000$, until an obvious contradiction is reached on going from $latex {d = 2}&fg=000000$ to $latex {d' = 1}&fg=000000$ with the circuit $latex {C_1}&fg=000000$ that results. One can with more care actually structure the proof as a standard induction going from lower bounds for $latex {d-1}&fg=000000$ to the desired ones for $latex {d}&fg=000000$, but the mechanism is the same either way.

For an average-case argument one also needs to preserve the balance between arguments $latex {x}&fg=000000$ giving $latex {f'(x) = 0}&fg=000000$ and those giving $latex {f'(x) = 1}&fg=000000$. Otherwise the trivial circuit that always says ``no,'' or its partner that always says ``yes,'' already has more advantage than the desired conclusion allows. This is where one might first think of the third projection rule as counter-productive. By identifying $latex {x_i}&fg=000000$ and $latex {x_j}&fg=000000$, any ``bias'' in $latex {x_i}&fg=000000$ might be propagated to more parts of the formula. However, this also intuitively creates greater sensitivity in terms of influence as defined above. If $latex {(w,i)}&fg=000000$ is sensitive, then letting $latex {w'}&fg=000000$ be $latex {w}&fg=000000$ with bit $latex {i}&fg=000000$ flipped, we get $latex {f(w') \neq f(w)}&fg=000000$, so one of $latex {w,w'}&fg=000000$ is a 0-input and the other a 1-input, and they balance each other.

Rossman, Servedio, and Tan craft the argument, using a slightly different ``tuning'' of $latex {S_d}&fg=000000$ from H\aa{}stad's, so that the benefits carry the day: $latex {S'_d}&fg=000000$ retains its structure and balance and hardness, but the prospective smaller-depth circuit $latex {C}&fg=000000$ ``collapses.'' As they note, random projections had been used in a 2001 paper by Russell Impagliazzo and Nathan Segerlind for a proof complexity lower bound, but apparently had not been applied so bluntly to circuits. Their projections first determine blocks of variables $latex {x_{i,a}}&fg=000000$ that get mapped to the same new variable $latex {y_a}&fg=000000$ for different $latex {i}&fg=000000$. Then they can be specified by a distribution over restrictions, which simplifies the randomness analysis. If the circuit $latex {C}&fg=000000$ has a gate that accesses some $latex {x_{i,a}}&fg=000000$ and some $latex {\bar{x}_{j,a}}&fg=000000$, then after the projection the gate accesses both $latex {y_a}&fg=000000$ and $latex {\bar{y}_a}&fg=000000$ and hence can be collapsed to $latex {0}&fg=000000$ (if the gate is AND) or $latex {1}&fg=000000$ (if OR). Another difference from H\aa{}stad's technique is that they need to adjust the projection probability adaptively depending on outcomes at previous depths. This is the point where we refer you to their 59-page paper for details; sections 4 and 7 have excellent further explanations.
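Here is a toy rendering of just the gate-collapse effect (our own sketch; the paper's actual projection distributions are far more delicate):

```python
def block_project(literal):
    """Send the literal (i, a, sign) on x_{i,a} to (a, sign) on y_a:
    every variable in block a becomes the single variable y_a."""
    i, a, sign = literal
    return (a, sign)

def collapse_gate(gate, literals):
    """After projection, a gate seeing both y_a and its negation is
    forced: an AND collapses to 0, an OR to 1."""
    lits = {block_project(l) for l in literals}
    for (a, sign) in lits:
        if (a, not sign) in lits:
            return 0 if gate == 'AND' else 1
    return (gate, sorted(lits))

# An AND reading x_{1,a} and the negation of x_{2,a} sees y_a and its
# negation after projection, so it collapses to the constant 0.
print(collapse_gate('AND', [(1, 'a', True), (2, 'a', False)]))  # 0
```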

Why Excited

Dick and I are excited because we have thought about inductive lower bound arguments whose base cases involve read-once formulas and whose inductive steps are projections. In a post with Dick three years ago I described one case that may retain interest even though the statement it was trying to prove turned out to be false.

My planned attack began with a ``niceness'' theorem for functions $latex {f}&fg=000000$ having a funny kind of read-once arithmetical formula that has powering as a primitive operation---for instance, $latex {(x+2y-z)^3}&fg=000000$ counts as one, not three, occurrences of the variables, in contrast to $latex {(x+2y-z)*(x+2y-z)*(x+2y-z)}&fg=000000$---but does not have an additive constant in any subterm. I proved that the first partial derivatives $latex {\frac{\partial f}{\partial x_i}}&fg=000000$ form a Gröbner basis under any admissible monomial ordering. If one allows additive constants, then the partials still bound such a Gröbner basis. The argument is less simple than one might expect and does not obviously extend to higher derivatives---at least not obviously to me, as I've sat on revisions of a paper for over a decade while trying to prove that. But the theorem is still good enough to use as a base case and ask:

How fast can various complexity measures associated to Gröbner bases grow under successive applications of (random) projections?
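A small runnable experiment in this spirit, using SymPy (an assumption on our part; the post does not say what tools the original runs used):

```python
from sympy import symbols, diff, groebner

x, y, z, w = symbols('x y z w')
f = (x + 2*y - z)**3 * w   # read-once with powering, no additive constants

# Groebner basis of the first partials under a fixed admissible order.
partials = [diff(f, v) for v in (x, y, z, w)]
print(groebner(partials, x, y, z, w, order='grevlex'))

# A rule-3 projection identifying z with y, then the recomputed basis;
# comparing the two shows how the basis can change under projections.
g = f.subs(z, y)
print(groebner([diff(g, v) for v in (x, y, w)], x, y, w, order='grevlex'))
```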

My computer runs back then were sobering: even after just a few projections in some targeted cases the answer was, ``mighty fast.'' And as I related in the post, the desired assertion about a certain ``monomial complexity'' measure was refuted. But there are other formulations and conjectures and measures to try, and this new result in the Boolean case may give encouragement to try them. There is also the difference that in my case the read-once formula $latex {f_1}&fg=000000$ was ``easy'' and was derived from a given function $latex {f}&fg=000000$ we are trying to prove ``hard'' by undoing projections, whereas in the new case the $latex {S_d}&fg=000000$ functions are read-once but are the ones we are showing hard for lower-depth circuits. So perhaps this all runs in the opposite direction---but still the prospect of new ways to control projections is pretty cool.

Open Problems

What new results may come out of this breakthrough lower bound technique?