Projects
Breaking the Curse of Dimensionality with the Kolmogorov Superposition Theorem
Problem

The so-called "curse of dimensionality" says that the computational work explodes exponentially as the dimensionality of the inputs grows. This has stymied the simulation of high-dimensional physical systems, such as those arising in plasma physics, for decades. Recently, very high-dimensional classification problems have been solved using neural networks, a particular form of nonlinear function representation. However, neural networks provide little insight into the solution itself.
Approach

The Kolmogorov Superposition Theorem (KST) allows us to represent a continuous multi-dimensional function \(f\) as a composition of one-dimensional functions of the following form:
\(f(x_1,\ldots,x_n) = \sum^{2n+1}_{q=1} g_q\left(\sum^n_{p=1} \lambda_p \phi(x_p + q \epsilon) \right)\)

where the function \(\phi\) is universal in that it does not depend on \(f\). This form has not been used successfully for computation due to the lack of smoothness in the chosen \(\phi\). We have developed an optimal-complexity, Lipschitz continuous representation for \(\phi\), and are currently working to develop a computable representation for \(g_q\).
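To make the computational structure concrete, here is a minimal Python sketch of the superposition above. The choices of \(\phi\), \(g_q\), \(\lambda_p\), and \(\epsilon\) below are arbitrary placeholders (the true universal \(\phi\) is far more delicate); the code only illustrates how evaluation reduces to one-dimensional function calls inside a double sum.

```python
import numpy as np

def kst_eval(x, g, phi, lam, eps):
    """Evaluate the KST form
        f(x_1,...,x_n) = sum_{q=1}^{2n+1} g_q( sum_p lam_p * phi(x_p + q*eps) )
    for given 1-D outer functions g[q-1], inner function phi, and weights lam."""
    n = len(x)
    total = 0.0
    for q in range(1, 2 * n + 2):
        inner = sum(lam[p] * phi(x[p] + q * eps) for p in range(n))
        total += g[q - 1](inner)
    return total

# Placeholder ingredients: these are NOT the true universal phi or the g_q
# associated with any particular f; they only exercise the structure above.
n = 3
phi = np.tanh                            # stand-in for the universal inner function
lam = [0.5 ** (p + 1) for p in range(n)]  # stand-in weights
g = [np.sin for _ in range(2 * n + 1)]    # stand-ins for the 2n+1 outer functions
eps = 0.1

print(kst_eval([0.2, 0.4, 0.6], g, phi, lam, eps))
```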
Impact

A computational understanding of the structure of continuous functions will give us a powerful tool to understand the operation of neural networks and other complex nonlinear classifiers. Our work should enable the compression of the deep neural networks currently in use, and the selection of optimal activation functions. This would have broad impact across the entire field of machine learning and data assimilation. Moreover, it may become possible to solve high-dimensional physical systems using one-dimensional tools.
Plan

We have successfully tackled the initial question of a definition for \(\phi\), based on a reparameterization of the Hölder continuous form of Köppen. We are currently experimenting with both finite element and Chebyshev polynomial representations for \(g_q\). Since the KST form was shown by Hecht-Nielsen to be a three-layer, feed-forward neural network, we should be able to reformulate problems for which neural networks are quite effective, such as classification, in terms of the KST representation.
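As a small illustration of the Chebyshev side of that experiment, the sketch below fits a hypothetical one-dimensional stand-in for \(g_q\) with numpy's Chebyshev class and reports the maximum error as the degree grows; the actual outer functions under study would of course come from the KST construction itself.

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Hypothetical smooth 1-D function standing in for an outer function g_q.
def g_sample(y):
    return np.exp(-y) * np.sin(5 * y)

y = np.linspace(0.0, 1.0, 2000)
for deg in (4, 8, 16, 32):
    # Least-squares Chebyshev fit on [0, 1]; degree plays the role of
    # representation size for g_q.
    c = Chebyshev.fit(y, g_sample(y), deg, domain=[0.0, 1.0])
    err = np.max(np.abs(c(y) - g_sample(y)))
    print(f"degree {deg:3d}: max error {err:.2e}")
```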
What is the complexity of applying the Laplacian operator?
Clearly we must envision it as an operator between two function spaces. Thus we must know the complexity of representing functions in that space (\(W^1_p\), Besov, etc.) to tolerance \(\epsilon\). Ask Ridg about collecting these results, and look at the DeVore papers. I still do not see how to get a lower bound even knowing that. Maybe we need to look at the identity operator first: by the above definition its complexity is zero, yet in a Galerkin method we want to know the complexity of applying and inverting the mass matrix. To be specific, we want to know the complexity of just generating the representation, with no connection to Galerkin, and maybe that is a separate question.
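For reference, here is a minimal sketch of the mass-matrix point in the simplest setting I can think of: the 1-D piecewise-linear (P1) mass matrix on a uniform mesh, assembled with scipy.sparse. The mesh, element type, and constants are assumptions for illustration; the point is that the matrix has about three nonzeros per row independent of \(N\), so application is \(O(N)\), while inversion is a separate sparse-solve question.

```python
import numpy as np
import scipy.sparse as sp

def p1_mass_matrix(N, h=None):
    """Assemble the 1-D P1 mass matrix on a uniform mesh with N interior nodes
    (tridiagonal, with interior rows h/6 * [1, 4, 1])."""
    if h is None:
        h = 1.0 / (N + 1)
    main = 4.0 * h / 6.0 * np.ones(N)
    off = h / 6.0 * np.ones(N - 1)
    return sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")

M = p1_mass_matrix(1000)
print("nonzeros per row:", M.nnz / M.shape[0])  # ~3, independent of N
# Applying M costs O(nnz) = O(N); inverting it requires a sparse solve.
```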
For the second question, we can view the Laplacian as a matrix approximation to the operator, whose size is determined by the minimum representation size needed to reach the required accuracy. I think we should be able to bound the number of nonzeros per row by looking at the convergence rate of the method, which is governed by the overlap of the basis functions. This relationship can then be inverted to obtain a complexity.
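A toy version of that inversion, assuming a second-order 1-D finite difference Laplacian with constants dropped: accuracy \(\epsilon\) forces \(h \sim \sqrt{\epsilon}\), hence \(N \sim \epsilon^{-1/2}\) unknowns with three nonzeros per row, so one application costs \(O(\epsilon^{-1/2})\). The hand-picked stencil and rate here are only stand-ins for the overlap argument above.

```python
import numpy as np
import scipy.sparse as sp

def laplacian_1d(N):
    """Second-order 1-D finite difference Laplacian on a uniform mesh with
    N interior nodes (tridiagonal stencil [-1, 2, -1] / h^2)."""
    h = 1.0 / (N + 1)
    main = 2.0 / h**2 * np.ones(N)
    off = -1.0 / h**2 * np.ones(N - 1)
    return sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")

# Invert the O(h^2) convergence rate: tolerance eps -> h ~ sqrt(eps),
# so N ~ eps**-0.5 unknowns and O(N) work per application (~3 nnz/row).
for eps in (1e-2, 1e-4, 1e-6):
    N = int(np.ceil(eps ** -0.5))
    A = laplacian_1d(N)
    print(f"eps={eps:.0e}: N={N:7d}, nnz/row={A.nnz / N:.2f}")
```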