How Joe Traub Beat the Street
An insight into the computation of financial information
Joseph Traub passed away just a week ago, on August 24th. He is best known for his computer science leadership positions at CMU, Columbia, CSTB, the Journal of Complexity---they all start with ``C.'' CSTB is the Computer Science and Telecommunications Board of the National Academies of Science, Engineering, and Medicine. At each of these he was the head, and at all except Carnegie Mellon he was the first head---the founder.
Today Ken and I wish to highlight one technical result of Traub that you may not know about.
I knew Joe for many years, and he will be missed. He had a sharp mind and a correspondingly sharp wit. There was a slight twinkle in his eye that would have called to mind a leprechaun had he been Irish rather than German---he came from a family that left Germany after the Nazis seized his father's bank in 1938, when he was six. He was a delight to interact with on almost any topic.
Joe created an entire area of theory: information-based complexity theory. To quote its website:
Information-based complexity (IBC) is the branch of computational complexity that studies problems for which the information is partial, contaminated, and priced. [Functions over the unit cube] must be replaced by ... finite sets (by, for example, evaluating the functions at a finite number of points). Therefore, we have only partial information about the functions. Furthermore, the function values may be contaminated by round-off error. Finally, evaluating the functions can be expensive, and so computing these values has a price.
Traub was helped greatly by a brilliant colleague, Henryk Wo\'zniakowski, who developed key ideas alongside him in the 1970s that were distilled in their 1980 monograph, A General Theory of Optimal Algorithms. A shorter source is Joe's seminal paper, ``An Introduction to Information-Based Complexity.'' A statement on Wo\'zniakowski's Columbia webpage shows his sustaining interest in IBC:
I am mostly interested in computational complexity of continuous problems. For most continuous problems, we have only partial information. For problems defined on spaces of functions, this partial information is usually given by a finite number of function values at some points which we can choose. One of the central issues is to determine how many pieces of information, or function values, are needed to solve the computational problem to within a prescribed accuracy.
IBC Under Attack
I must admit that years ago I opened my Bulletin of the AMS, in 1992 to be exact, and was shocked to see an article titled ``Some Basic Information On Information-Based Complexity Theory,'' written by the eminent numerical analyst Beresford Parlett. The abstract got right to the point:
...Why then do most numerical analysts turn a cold shoulder to IBCT? Close analysis of two representative papers reveals a mixture of nice new observations, error bounds repackaged in new language, misdirected examples, and misleading theorems.
What is interesting about this paper is that its attack on IBCT (Parlett's name for IBC) targets the idiosyncrasies of the framework. The claim is not that there are errors---technical errors---in the IBC papers, but rather that the model, the framework, is uninteresting and misleading. You should take a look at Parlett's very readable paper and decide for yourself.
Traub of course disagreed and had a chance to fire back an answer. I believe that Parlett's attack actually had little effect on IBC, except to make it better known in the mathematical community.
What was the tiff about? We can try to boil it down even more than Parlett's article does.
Integrals and Algorithms
Integrals are fundamental to most of mathematics. And in many practical cases the integrals cannot be exactly computed. This leads to a huge area of research on how to approximate an integral within some $latex {\epsilon>0}&fg=000000$ error. Even simple ones like
$latex \displaystyle \int_{a}^{b} f(x) dx &fg=000000$
can be challenging for nasty functions $latex {f(x)}&fg=000000$. Even worse are multi-dimensional integrals:
$latex \displaystyle \int \int \cdots \int_{S} f(x_{1},\dots,x_{d}) dx_{1}dx_{2}\dots dx_{d}, &fg=000000$
where $latex {S}&fg=000000$ is a $latex {d}&fg=000000$-dimensional region. Speaking very roughly, let's say $latex {S = [0,1]^d}&fg=000000$ and what matters to $latex {f(x)}&fg=000000$ is whether the components $latex {x_i}&fg=000000$ are ``low'' (toward $latex {0}&fg=000000$) or ``high'' (toward $latex {1}&fg=000000$) in each dimension. If the gradient of $latex {f}&fg=000000$ in each dimension $latex {i}&fg=000000$ behaves differently for each setting of the other components being ``low'' or ``high,'' then we might need $latex {2^d}&fg=000000$ samples just to get a basic outline of the function's behavior. The IBC website fingers this exponential issue as the ``curse of dimensionality.''

Now as ordinary complexity theorists, our first instinct would be to define properties intrinsic to the function $latex {f}&fg=000000$ that cause high complexity for any algorithm. Making a continuous analogy to concepts in discrete Boolean complexity, we would try to formulate an effective measure of ``sensitivity.'' We would talk about functions $latex {f}&fg=000000$ that resemble the $latex {n}&fg=000000$-ary parity function with respect to sensitivity but do not have a simple known integral. Notions of $latex {f}&fg=000000$ being ``isotropic'' could cut both ways---they could spread the sensitivity out but could also enable a good global estimate of the integral.
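To put a rough number on the curse (a back-of-the-envelope illustration of ours, not a calculation from the IBC sources): a product rule that places $latex {m}&fg=000000$ sample points along each axis needs $latex {m^{d}}&fg=000000$ evaluations in total. With a modest $latex {m = 10^{2}}&fg=000000$ points per axis and $latex {d = 360}&fg=000000$, the dimension that will matter below, that is already
$latex \displaystyle \left(10^{2}\right)^{360} = 10^{720} &fg=000000$
evaluations, the kind of figure that makes worst-case guarantees look hopeless.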
IBC, however, focuses on properties of algorithms and restrictions on the kind of inputs they are given. Parlett's general objection is that doing so begs the question of a proper complexity theory and reverts to the standard---and hallowed enough---domain of ordinary numerical analysis of algorithms.
Ken and I find ourselves in the middle. Pure complexity theory as described by Parlett has run into barriers even more since his article. We wrote a pair of posts on bounds against algorithms that are ``progressive.'' The IBC website talks about settings for algorithms. This seems to us a fair compromise notion.
One classic setting for approximating these high-dimensional integrals is the Monte-Carlo method. The common feature of this setting is that one samples random points $latex {x=x_{1}\dots x_{d}}&fg=000000$ from $latex {S}&fg=000000$ and takes the average of the values of $latex {f(x)}&fg=000000$. This method yields a fair approximation, but the convergence is slow: the expected error shrinks only like the inverse square root of the number of samples. For example, take the dimension $latex {d}&fg=000000$ to be $latex {360 = 30 \times 12}&fg=000000$: the number of months in a 30-year collateralized mortgage obligation (CMO). A worst-case deterministic guarantee of error $latex {\epsilon=10^{-2}}&fg=000000$ requires on the order of $latex {10^{720}}&fg=000000$ evaluations of the function---clearly a completely hopeless task. Monte Carlo escapes that bound on average, yet its slow convergence still made the accuracy practitioners wanted expensive to reach. It certainly felt hopeless to financial firms on Wall Street that needed to estimate fair market values for these CMOs.
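To make the Monte-Carlo setting concrete, here is a minimal sketch of our own, not code from Traub's group; the toy integrand, chosen so that its exact integral over the unit cube is $latex {1}&fg=000000$, is an arbitrary illustration, and only NumPy is assumed.

import numpy as np

def mc_integrate(f, d, n, seed=0):
    """Plain Monte Carlo: average f over n uniform random points in [0,1]^d."""
    rng = np.random.default_rng(seed)
    x = rng.random((n, d))               # n random points in the d-dimensional unit cube
    return f(x).mean()

d = 360                                  # months in a 30-year CMO
weights = 1.0 / np.arange(1, d + 1)      # later coordinates matter less, echoing discounting
f = lambda x: np.prod(1.0 + weights * (x - 0.5), axis=1)   # exact integral over [0,1]^d is 1

for n in (10**2, 10**3, 10**4):
    print(n, abs(mc_integrate(f, d, n) - 1.0))   # error shrinks roughly like 1/sqrt(n)

The decaying weights only give the integrand the non-isotropic flavor discussed later in the post; the estimator itself is nothing but the plain sample average.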
IBC Strikes Back
Enter Traub with some of his Columbia IBC research group in 1994. They discovered that a non-random method would beat the Monte-Carlo method by many orders of magnitude. This was based on an old idea that uses low-discrepancy sequences. These are sequences $latex {X}&fg=000000$ of points inside a region $latex {S}&fg=000000$ that have good coverage of the space. For the unit cube $latex {S}&fg=000000$ as above, one may specify a class $latex {{\cal B}}&fg=000000$ of natural measurable subsets of $latex {S}&fg=000000$; then low discrepancy means that for every $latex {B \in {\cal B}}&fg=000000$ the proportion of points of $latex {X}&fg=000000$ that belong to $latex {B}&fg=000000$ is close to the measure of $latex {B}&fg=000000$---and moreover this happens for every long enough initial segment of $latex {X}&fg=000000$.
In the worst case the resulting Quasi-Monte Carlo (QMC) methods would perform much worse than Monte-Carlo methods. However, the surprise was that for actual financial calculations they worked well---really well.
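For a feel of why practitioners were won over, here is one classical low-discrepancy construction, the Halton sequence, continuing the sketch above. The choice of Halton is ours, made purely for simplicity; Traub's group worked with other, more refined low-discrepancy constructions. The integrand $latex {f}&fg=000000$ and dimension $latex {d}&fg=000000$ are as in the Monte-Carlo sketch.

import numpy as np   # f and d are as defined in the Monte-Carlo sketch above

def radical_inverse(i, base):
    """Van der Corput radical inverse: mirror the base-`base` digits of i about the radix point."""
    inv, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        inv += digit / denom
    return inv

def first_primes(k):
    """The first k primes, by simple trial division (k is small here)."""
    ps, cand = [], 2
    while len(ps) < k:
        if all(cand % p for p in ps):
            ps.append(cand)
        cand += 1
    return ps

def halton(n, d):
    """First n points of the d-dimensional Halton sequence in [0,1)^d."""
    bases = first_primes(d)
    return np.array([[radical_inverse(i, b) for b in bases]
                     for i in range(1, n + 1)])

x = halton(10**4, d)            # deterministic, well-spread points (pure Python, takes a few seconds)
print(abs(f(x).mean() - 1.0))   # for this integrand, typically far below the Monte-Carlo errors above

Because the integrand's dependence on coordinate $latex {i}&fg=000000$ decays like $latex {1/i}&fg=000000$, the early, well-behaved Halton dimensions carry almost all of the weight; that is precisely the kind of non-isotropy the discussion below turns to.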
As with any change in basic methodology, the work and claims of Traub's group were met with great initial skepticism by the financial firms. However, the performance was and remains so good that many financial calculations still use QMC. This raised a further beautiful problem that loops back to the initial mission of IBC:
Why do the discrepancy-based methods perform so well?
That is, what properties of the particular high-dimensional integrals that occurred with CMOs and other applications made non-random methods work so well? There are some ideas about why this happens in the actual financial calculations, but the basic question remains open. Wikipedia's IBC article picks up the story in a way that connects to Parlett's point, though neither he nor his article is mentioned:
These results are empirical; where does computational complexity come in? QMC is not a panacea for all high-dimensional integrals. What is special about financial derivatives?
It continues:
Here's a possible explanation. The 360 dimensions in the CMO represent monthly future times. Due to the discounted value of money, variables representing times far in the future are less important than the variables representing nearby times. Thus the integrals are non-isotropic.
Working with Ian Sloan, Wo\'zniakowski introduced a notion of weighted spaces that leverages the above observation. Their 1998 paper laid down conditions on the integrals and showed that they are rendered tractable in the QMC setting, often with worst-case guarantees where Monte Carlo gives only average-case ones. How far this positive result can be extended---for which other classes of integrals does QMC beat MC---is a concrete form of the wonderful problem.
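To convey the flavor of their condition (our paraphrase; consult the paper for the precise statement): attach a weight $latex {\gamma_{j} > 0}&fg=000000$ to the $latex {j}&fg=000000$-th coordinate, measuring how strongly the integrand is allowed to depend on it. Sloan and Wo\'zniakowski showed, roughly, that QMC integration over the resulting weighted space remains tractable independently of the dimension when
$latex \displaystyle \sum_{j=1}^{\infty} \gamma_{j} < \infty. &fg=000000$
Geometrically decaying discount factors, say $latex {\gamma_{j} \approx (1+r)^{-j}}&fg=000000$ for a monthly rate $latex {r}&fg=000000$, satisfy this comfortably, which is one way to formalize the ``nearby months matter more'' explanation above.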
Limits and the Integral Approach
In the 1990s Joe expanded his involvement with the Santa Fe Institute. As the institute's memorial describes it,
In the 1990s he organized a series of SFI workshops on the Limits to Scientific Knowledge with the goal of enriching science in the same way that the work of Gödel and Turing on the limits of mathematics enriched that field. The workshops explored limits in various disciplines, including physics, economics, and geophysics.
Joe wrote a paper with Piet Hut and David Ruelle titled, ``Varieties of Limits to Scientific Knowledge.'' The abstract is crisp:
Any area of knowledge is structured by an intricate interplay of limits, and joining. An apparently restrictive limit may actually reveal itself as liberating. We discuss the role that limits play in the real world, in our mathematical idealizations, and in the mappings between them. We propose a classification of limits, and suggest how and when limits to knowledge appear as challenges that can advance knowledge.
The memorial quotes our friend Joshua Grochow on the significance of the integrated approach to science that his work represented:
``Joe was an incredibly original thinker. He crossed over entrenched mathematical borders, as though they didn't exist, to bring a new formalism of complexity to bear on the computational complexity of continuous algorithms. I was in the middle of an extended series of conversations with him spanning four or five different fields, and this seemed typical for him; I, and many others, will miss him dearly.''
Open Problems
Our condolences to the Traub family and to our colleagues at Columbia and Santa Fe.