What's Blue and White and Red All Over?

Election and debate special from GLL
Aram Harrow is both a computer scientist and a physicist---a rare combination. Few people can explain both what a probabilistic Turing machine is and what complementarity is. He has done some beautiful work in pure and applied mathematics, work that draws on his dual abilities.
Today Dick and I wish to describe and interpret an intricate recent theorem of his in joint work with Alexandra Kolla and Leonard Schulman.
Aram is no stranger to these pages. Of course he has also been our honored participant in our year-long debate on the feasibility of building quantum computers, all during his current appointment in the Computer Science and Engineering department of the University of Washington in Seattle. He is also one of three mainstays on the prominent quantum computing blog created by Dave Bacon, and during our debate still managed to get a little bit of other work done. In January he will join the MIT Department of Physics as an assistant professor, and we congratulate him on that.
As related in his post, the paper grew out of work on the unique games conjecture, seeking to apply quantum methods even though the end result was ``classical.'' The impetus included models of physical noise, such as those featured in Aram's debate with Gil Kalai. To bring this full circle, Gil himself is the first person credited for discussions that spurred the paper, but these discussions were in 2010, long before the debate idea arose, and they were not with Aram.
The Proof
Usually we discuss the paper first, or at least state its main results, but exceptional times demand exceptions. Their ``Proof overview'' subsection begins by defining a ``senate operator,'' based on the idea that in the U.S. Senate, states have equal weight regardless of population. Relaxing equal weighting will lead to our main open problem. The Section 3 title summarizes their pivotal Proposition 2 as ``senates dominate dictatorships.'' A main ingredient of their proof is the family of Krawtchouk polynomials, whose Internet page talks about ``the destiny of the scientist during the Great Terror.''
Our interdisciplinary push seems to have entered Politics, but the real point is: the proof is hard. So hard that the authors tell me they have no idea of the value of the constant $latex {a}&fg=000000$ whose existence constitutes their main theorem. We will still try to give some idea of its significance, why the result is hard, and how you can approach the issues without having to delve into the whole party manifesto. But you will not be able to make like President Gerald Ford in the famous parody of his first 1976 debate with Jimmy Carter:
``It was my understanding that there would be no math in this debate.''
The Paper
The paper's title, ``Dimension-free L2 maximal inequality for spherical means in the hypercube,'' may not generate any 15-second sound bites. Nor may it help predict the outcome of the U.S. Presidential election, but it does contain an important new theorem.
Let's first say why this theorem is significant and hard. The term ``hypercube'' is just the set $latex {\{0,1\}^n}&fg=000000$ of binary strings of length $latex {n}&fg=000000$---one of the favorite objects for computer scientists. In this world, the sphere of radius $latex {k}&fg=000000$ around a point $latex {x}&fg=000000$ is the set of strings that are Hamming distance exactly $latex {k}&fg=000000$ from $latex {x}&fg=000000$---by a slight abuse of terminology we also call these spheres ``balls'' and write them $latex {B_k(x)}&fg=000000$. Thus, if $latex {x=000}&fg=000000$ and $latex {k=2}&fg=000000$, the points in the sphere are:
$latex \displaystyle 110, 101, 011. &fg=000000$
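In code this is just a matter of choosing which $latex {k}&fg=000000$ coordinates to flip. Here is a minimal Python sketch (our own illustration, not from the paper):

```python
from itertools import combinations

def sphere(x, k):
    """All strings at Hamming distance exactly k from the bit-tuple x."""
    pts = []
    for flips in combinations(range(len(x)), k):
        y = list(x)
        for i in flips:
            y[i] ^= 1  # flip coordinate i
        pts.append(tuple(y))
    return pts

print(sphere((0, 0, 0), 2))  # -> [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
```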
Now imagine we select a small fraction of the points from the hypercube, say $latex {\epsilon 2^{n}}&fg=000000$ of them. Can one find a single point $latex {x}&fg=000000$ that is special, in that every sphere around $latex {x}&fg=000000$, of any radius $latex {k}&fg=000000$, has less than half of its points in the selected set? This seems quite reasonable, provided $latex {\epsilon}&fg=000000$ is small enough.
This is a case where the intuition is correct, but proving it is hard. A standard way of proving that special points exist is to pick $latex {x}&fg=000000$ at random, essentially using the probabilistic method. Alas, this works for spheres of one radius, or a few radii, but it does not seem powerful enough to show that one point works for all radii simultaneously.
The probabilistic method is extremely powerful, but when it fails, often the requirements for a proof become very difficult. This is the case with their new result. Actually in our opinion this is one of the most important aspects of the new paper. Since the probabilistic method does not always work, any new ideas that can solve problems where it fails are automatically of great importance. Here one idea that Aram tells me was important is a different kind of ``complementarity,'' namely that $latex {B_k(x) = B_{n-k}(x')}&fg=000000$ where $latex {x'}&fg=000000$ is the vertex opposite $latex {x}&fg=000000$ in the hypercube.
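This identity is easy to check by brute force for small $latex {n}&fg=000000$. Here is a quick sketch (ours, not from the paper):

```python
from itertools import combinations, product

def sphere(x, k):
    # Set of strings at Hamming distance exactly k from x.
    return {tuple(b ^ (i in flips) for i, b in enumerate(x))
            for flips in combinations(range(len(x)), k)}

n = 5
for x in product((0, 1), repeat=n):
    anti = tuple(1 - b for b in x)  # x', the vertex opposite x
    for k in range(n + 1):
        assert sphere(x, k) == sphere(anti, n - k)  # B_k(x) = B_{n-k}(x')
print("complementarity checked for all vertices, n =", n)
```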
Let's turn now to state the theorem in a mathematically precise manner and probe its reach.
The Result
The new inequality involves the standard $latex {L_2}&fg=000000$ norm of a function $latex {f: \{0,1\}^n \longrightarrow \mathbb{R}}&fg=000000$, which is defined by
$latex \displaystyle ||f||_2 = \text{ the square root of } \sum_x f(x)^2.&fg=000000$
The other functions involved maximize the average of $latex {f}&fg=000000$ over the balls:

$latex \displaystyle M_f(x) = \max_k \frac{1}{|B_k(x)|}\sum_{y \in B_k(x)}f(y).&fg=000000$
The key clause in the paper's title is that the bound given by their inequality does not depend on $latex {n}&fg=000000$. Here is their new result:
Theorem 1 There exists $latex {a > 0}&fg=000000$ such that for all $latex {n}&fg=000000$ and $latex {f}&fg=000000$,

$latex \displaystyle ||M_f||_2 \leq a||f||_2.&fg=000000$
Pretty neat, no? Well, it's neat, but it could use some interpretation.
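First, a quick empirical look. The following brute-force sketch (all code and names are ours, and it is feasible only for small $latex {n}&fg=000000$) computes $latex {M_f}&fg=000000$ directly and the ratio $latex {||M_f||_2/||f||_2}&fg=000000$:

```python
import random
from itertools import combinations, product

def max_avg_operator(f, n):
    """Brute-force M_f(x): the largest average of f over a sphere B_k(x)."""
    def sphere(x, k):
        return [tuple(b ^ (i in fl) for i, b in enumerate(x))
                for fl in combinations(range(n), k)]
    return {x: max(sum(f[y] for y in S) / len(S)
                   for k in range(n + 1) for S in [sphere(x, k)])
            for x in f}

n = 8
verts = list(product((0, 1), repeat=n))
f = {x: random.random() for x in verts}  # a random nonnegative f
M = max_avg_operator(f, n)
l2 = lambda g: sum(v * v for v in g.values()) ** 0.5
print(l2(M) / l2(f))  # Theorem 1 says this is at most a, uniformly in n
```

Running it for several values of $latex {n}&fg=000000$ gives a feel for how the ratio behaves as the dimension grows, though of course no experiment pins down the constant $latex {a}&fg=000000$.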
Speaking mathematically, call $latex {x}&fg=000000$ ``light'' if $latex {|f(x)|^2}&fg=000000$ is below the average of the squares, i.e., $latex {|f(x)|^2 < \frac{1}{2^n}||f||_2^2}&fg=000000$. We might hope to ``bulk $latex {x}&fg=000000$ up'' by finding an average over one of the spheres around $latex {x}&fg=000000$ that is greater. For many $latex {f}&fg=000000$'s there will be some $latex {x}&fg=000000$'s for which this makes a big difference. However, the theorem shows that under the $latex {L_2}&fg=000000$ norm the cumulative effect of this extra allowance is never more than a fixed constant factor---independent of the dimension. That's fine, but can we give a more helpfully vivid interpretation?
An Election Interpretation
It is election season in the United States again. One hears often about how demographic and personal characteristics channel one's political preferences, and that there are ``typical Democratic voters'' and ``typical Republican voters.'' The categories are often binary or close to it: male/female, married/single, urban/suburban-or-rural, over/under age 50, and so on.
We can define them by positions on single issues: pro-life/pro-choice, gun-rights/gun-control, for/against gay marriage, for/against immigration amnesty, anything like that. We can also add categories that seem irrelevant: righthanded/lefthanded, dark/light eyes or hair, tall/short, nearsighted/not, Coke/Pepsi. We can even include divisions that are not roughly balanced in the population, such as vegetarian/carnivore, and we can break multi-way categories into binary pairs, such as X/not-X for every ethnic group X. Then the possible combinations of values of $latex {n}&fg=000000$ characteristics correspond to the vertices of the $latex {n}&fg=000000$-dimensional hypercube.
What justifies saying that a combination $latex {x}&fg=000000$ of characteristics is ``typical'' for a party is that if you vary some of them---any number $latex {k}&fg=000000$ of them---you still find mostly people who vote for that party. Importantly this should be true for changing any set of $latex {k}&fg=000000$ characteristics, not just specific ones. Given the strength of this requirement, do typical voters exist? The fact of single-issue voting may even make this seem unlikely. However, the following broad statement holds in enough cases to give a way to approach the issues in their paper:
Theorem 2 If a party wins by a large enough landslide, then it has typical voters.
That is, there are voters $latex {x}&fg=000000$ such that not only did $latex {x}&fg=000000$ vote for the winner, but for any $latex {k > 0}&fg=000000$, over all voters $latex {y}&fg=000000$ who differ from $latex {x}&fg=000000$ in $latex {k}&fg=000000$ characteristics, the winner got a majority of votes from those $latex {y}&fg=000000$ as well. Well, this comes with some caveats, so we need to look closer.
The One-Vertex, One-Vote Case
Let us first suppose that every combination has exactly one voter. Then we define an indicator function $latex {f}&fg=000000$ by:
$latex \displaystyle f(x) = 1 \text{~if~} x \text{~voted for the loser, and~} f(x) = 0 \text{~otherwise}.&fg=000000$
Now if the loser won an $latex {\epsilon}&fg=000000$ portion of the vote, then since $latex {f(x)^2 = f(x)}&fg=000000$, we have:
$latex \displaystyle ||f||_2^2 = \sum_x f(x)^2 =\sum_x f(x) = \epsilon 2^n.&fg=000000$
By the theorem applied to $latex {f}&fg=000000$, squaring both sides,
$latex \displaystyle \sum_x M_f(x)^2 \leq a^2\epsilon\cdot 2^n.&fg=000000$
Dividing by $latex {2^n}&fg=000000$ makes the left side an average $latex {m}&fg=000000$ such that $latex {m \leq a^2\epsilon}&fg=000000$. There must exist some $latex {x}&fg=000000$ giving at most the average value, i.e. such that $latex {M_f(x)^2 \leq m}&fg=000000$, which implies that for all $latex {k}&fg=000000$,
$latex \displaystyle \frac{1}{|B_k(x)|}\sum_{y \in B_k(x)} f(y) \leq a\sqrt{\epsilon}.&fg=000000$
Taking $latex {\epsilon}&fg=000000$ small enough that $latex {a^2 \epsilon < 1/4}&fg=000000$ makes $latex {\frac{1}{|B_k(x)|}\sum_{y \in B_k(x)} f(y) < \frac{1}{2}}&fg=000000$ for all $latex {k}&fg=000000$. Thus the loser achieves no more than a minority over all of the balls centered on $latex {x}&fg=000000$.
This makes $latex {x}&fg=000000$ a typical voter. By extending the averaging argument, one can show there are many typical voters. Thus when the election map is close to all-red or all-blue, there are many places where the same color prevails in all concentric spheres. However, things become trickier when we vary the requirements on the space.
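To see Theorem 2 in action, here is a toy one-vertex, one-vote simulation (our own construction, not the authors'). It marks an $latex {\epsilon}&fg=000000$-fraction of vertices as the loser's voters and searches for typical voters by brute force:

```python
import random
from math import comb
from itertools import combinations, product

n, eps = 8, 0.02
verts = list(product((0, 1), repeat=n))
losers = set(random.sample(verts, int(eps * 2 ** n)))  # loser's eps-fraction

def loser_share(x, k):
    """Fraction of the radius-k sphere around x voting for the loser."""
    hits = sum(tuple(b ^ (i in fl) for i, b in enumerate(x)) in losers
               for fl in combinations(range(n), k))
    return hits / comb(n, k)

# k = 0 forces x itself to vote for the winner; larger k checks every sphere.
typical = [x for x in verts
           if all(loser_share(x, k) < 0.5 for k in range(n + 1))]
print(f"{len(typical)} of {2 ** n} vertices are typical")
```

Consistent with the extended averaging argument above, for small $latex {\epsilon}&fg=000000$ the search should turn up not just one but many typical vertices.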
Weighted Measures
Now suppose we have a weighted measure $latex {\mu(x)}&fg=000000$ on the hypercube. With regard to this measure,
$latex \displaystyle ||f||_2^2 = \sum_x f(x)^2 \mu(x).&fg=000000$
It also now makes sense to define weighted versions of the averages over the Hamming balls, leading to the operator

$latex \displaystyle M^*_f(x) = \max_k\frac{\sum_{y \in B_k(x)}f(y)\mu(y)}{\sum_{y \in B_k(x)}\mu(y)}.&fg=000000$
Does their theorem now hold in the form that there exists $latex {a_*> 0}&fg=000000$ such that for all $latex {f}&fg=000000$, $latex {||M^*_f||_2 \leq a_*||f||_2}&fg=000000$? Expanded, this now means:

Theorem? There exists $latex {a_* > 0}&fg=000000$, possibly depending on $latex {\sum_x \mu(x)/2^n}&fg=000000$ or on $latex {\sum_x \mu(x)^2/2^n}&fg=000000$, such that for all $latex {n}&fg=000000$, all counting measures $latex {\mu}&fg=000000$ on $latex {\{0,1\}^n}&fg=000000$, and all $latex {f:\{0,1\}^n \longrightarrow \mathbb{R}}&fg=000000$,
$latex \displaystyle \sum_x M^*_f(x)^2 \mu(x) \leq a_*\sum_x f(x)^2 \mu(x).&fg=000000$
The paper does not address this directly, and we understand from private communication with the authors that they have not considered it yet. If it is true, then we can derive the existence of ``typical voters'' in the most general setting.
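In the meantime we can at least experiment. Below is a brute-force sketch of the weighted operator (the code, the names, and the choice to skip zero-weight spheres are all ours). Trying many measures $latex {\mu}&fg=000000$ and functions $latex {f}&fg=000000$ gives empirical lower estimates for a candidate $latex {a_*}&fg=000000$:

```python
from itertools import combinations, product

def weighted_max_avg(f, mu, n):
    """Brute-force M*_f(x): max over k of the mu-weighted average on B_k(x)."""
    def sphere(x, k):
        return [tuple(b ^ (i in fl) for i, b in enumerate(x))
                for fl in combinations(range(n), k)]
    out = {}
    for x in f:
        best = 0.0
        for k in range(n + 1):
            S = sphere(x, k)
            w = sum(mu[y] for y in S)
            if w > 0:  # skip zero-weight ("white space") spheres
                best = max(best, sum(f[y] * mu[y] for y in S) / w)
        out[x] = best
    return out

n = 6
verts = list(product((0, 1), repeat=n))
mu = {x: sum(x) + 1 for x in verts}          # an arbitrary positive measure
f = {x: float(sum(x) == 0) for x in verts}   # indicator of one vertex
M = weighted_max_avg(f, mu, n)
norm = lambda g: sum(g[x] ** 2 * mu[x] for x in verts) ** 0.5
print(norm(M) / norm(f))  # a lower estimate for a candidate a_*
```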
We assume that the $latex {\mu(x)}&fg=000000$-many voters who share the same characteristics $latex {x}&fg=000000$ all vote the same way. (If this is not true, we can create some new characteristics that separate them, and we are free to add these as co-ordinates to every vector because the count $latex {\mu(x)}&fg=000000$ is allowed to be anything, including zero.) Let $latex {D}&fg=000000$ be the set of $latex {x}&fg=000000$ that vote for the loser. Then $latex {wt(D) = \sum_{x \in D}\mu(x)}&fg=000000$ is the total number of votes for the loser, out of a total electorate of $latex {E = \sum_x\mu(x)}&fg=000000$. Define $latex {f}&fg=000000$ to be the 0-1 valued indicator function of $latex {D}&fg=000000$ as before. Then
$latex \displaystyle ||f||_2^2 = \sum_{x \in D} \mu(x) = wt(D),&fg=000000$
while

$latex \displaystyle ||M^*_f||_2^2 = \sum_x\max_k \left[\frac{wt(B_k(x) \cap D)}{wt(B_k(x))}\right]^2\mu(x).&fg=000000$
Applying the ``Theorem?'' to this $latex {f}&fg=000000$ yields

$latex \displaystyle \sum_x\max_k \left[\frac{wt(B_k(x) \cap D)}{wt(B_k(x))}\right]^2\mu(x) \leq a_*^2\,wt(D).&fg=000000$

Dividing both sides by $latex {E = \sum_x \mu(x)}&fg=000000$, an averaging argument with respect to the probability measure $latex {\mu/E}&fg=000000$ yields the existence of $latex {x}&fg=000000$ such that for all $latex {k}&fg=000000$,

$latex \displaystyle \frac{wt(B_k(x) \cap D)}{wt(B_k(x))} \leq a_*\sqrt{\frac{wt(D)}{E}}.&fg=000000$
Here the right-hand side is $latex {a_*\sqrt{\ell}}&fg=000000$ where $latex {\ell = wt(D)/E}&fg=000000$ is the loser's fraction of the total vote, and a large enough victory makes $latex {a_*\sqrt{\ell} < 1/2}&fg=000000$. This implies that the loser still has a minority in every Hamming ball centered on the voter $latex {x}&fg=000000$.
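If the ``Theorem?'' holds, the brute-force search for typical voters carries over verbatim to the weighted setting. A sketch (ours, reusing the conventions of the earlier snippets, with $latex {\mu(x) = 0}&fg=000000$ allowed so that ``white space'' can appear):

```python
import random
from itertools import combinations, product

n = 8
verts = list(product((0, 1), repeat=n))
mu = {x: random.randint(0, 3) for x in verts}  # 0 creates "white space"
D = set(random.sample(verts, 2 ** n // 20))    # the loser's characteristic sets

def loser_weight_share(x, k):
    """wt(B_k(x) intersect D) / wt(B_k(x)), or 0.0 if the sphere is empty."""
    S = [tuple(b ^ (i in fl) for i, b in enumerate(x))
         for fl in combinations(range(n), k)]
    w = sum(mu[y] for y in S)
    return sum(mu[y] for y in S if y in D) / w if w else 0.0

typical = [x for x in verts if mu[x] > 0 and x not in D
           and all(loser_weight_share(x, k) < 0.5 for k in range(1, n + 1))]
print(f"{len(typical)} weighted-typical vertices out of {2 ** n}")
```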
Can We Use the Inequality They Proved?
We can try to work instead with the function
$latex \displaystyle f_*(x) = f(x)\sqrt{\mu(x)}.&fg=000000$
Then we obtain

$latex \displaystyle ||f_*||_2^2 = \sum_x f(x)^2 \mu(x),&fg=000000$
while (using the original maximizing operator $latex {M}&fg=000000$ again)

$latex \displaystyle ||M_{f_*}||_2^2 = \sum_x \max_k \left[\frac{1}{|B_k(x)|}\sum_{y \in B_k(x)} f(y)\sqrt{\mu(y)}\right]^2.&fg=000000$
In general this does not look very tractable, nor does the proved theorem $latex {||M_{f_*}||_2 \leq a||f_*||_2}&fg=000000$ seem either to imply or follow from the ``Theorem?'' above. In our specific election case, we might try

$latex \displaystyle f_*(x) = \sqrt{\mu(x)} \text{~if~} x \in D, \text{~and~} f_*(x) = 0 \text{~otherwise}.&fg=000000$
Then $latex {||f_*||_2^2 = wt(D)}&fg=000000$, and the theorem they proved gives

$latex \displaystyle \sum_x \max_k \left[\frac{1}{|B_k(x)|}\sum_{y \in B_k(x) \cap D}\sqrt{\mu(y)}\right]^2 \leq a^2\,wt(D).&fg=000000$
The averaging argument, however, is still in terms of $latex {2^n}&fg=000000$, not $latex {E}&fg=000000$ as just above, so we get the existence of an $latex {x}&fg=000000$ such that for all $latex {k}&fg=000000$,

$latex \displaystyle \frac{\sum_{y \in B_k(x) \cap D} \sqrt{\mu(y)}}{\binom{n}{k}} \leq a\sqrt{\frac{wt(D)}{2^n}}.&fg=000000$
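To spell out one obstacle (this elaboration is ours, not the paper's): write $latex {S = B_k(x) \cap D}&fg=000000$. By Cauchy-Schwarz,

$latex \displaystyle \Big(\sum_{y \in S}\sqrt{\mu(y)}\Big)^2 \leq |S|\sum_{y \in S}\mu(y) = |S|\cdot wt(B_k(x) \cap D),&fg=000000$

so an upper bound on the numerator above caps only a lower bound on $latex {wt(B_k(x) \cap D)}&fg=000000$, whereas concluding that the loser is in the minority requires an upper bound on it.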
Neither side of the inequality yields the desired interpretation: the Cauchy-Schwarz inequality relates the numerator to $latex {wt(B_k(x) \cap D)}&fg=000000$, but as just shown, it goes in the wrong direction for our purpose. This hints at how tricky the considerations in the paper are.

The problem appears to be mainly the ``white space'' created on the map when $latex {\mu(x) = 0}&fg=000000$. We can argue that this should still behave well as a limiting case of $latex {\mu(x)}&fg=000000$ being always positive but fractional---where we can also stipulate that $latex {\sum_x \mu(x) = 1}&fg=000000$---and letting it tend to zero for some $latex {x}&fg=000000$. Resolving this, however, may entail solving some easier problems that Aram and the others have attempted, such as for spaces formed by Cartesian products of other graphs (besides the $latex {n}&fg=000000$-fold product of $latex {2}&fg=000000$-cliques giving the hypercube). Moreover, their type of inequality is open even for $latex {\{0,1,2\}^n}&fg=000000$, and for other norms, including the $latex {L_1}&fg=000000$-norm. An $latex {L_1}&fg=000000$-norm inequality would ostensibly help our desired election conclusions, but as they remark, one of their key propositions does not extend to it, at least not conveniently. As always, we refer interested readers to the paper for full details and further questions.
Open Problems
For what other norms and spaces does their inequality hold? Does their inequality hold under weighted counting measures $latex {\mu}&fg=000000$ on $latex {\{0,1\}^n}&fg=000000$? Can we obtain the special case of ``typical voters'' from the inequality they have without the one-vertex, one-vote restriction, or under milder added assumptions? Can we prove it if $latex {\mu(x) \geq 1}&fg=000000$ and $latex {\sum_x \mu(x) = O(2^n)}&fg=000000$?
Will math matter to the upcoming Presidential debates?