The Wonderful People Who Pay My Students

2021-2024, $327,000, DOE Applied Math Research, U.S. DOE Contract DE-AC02-06CH11357
Extending PETSc's Composable Solvers
We will conduct research on two critical topics: data aggregation and solver adaptivity:
- Data Aggregation The ability to solve multiple instances independently is becoming increasingly important, partly because of the increasing volume of data that applications are tasked to model, and partly in response to long-term hardware trends. For instance, machine learning tools have been used as solvers (linear, nonlinear, etc.). Such ML-based solvers are often batched, however, making them brittle to changes in architecture or problem regime. We propose to extend this simple form of data aggregation and instead batch across multiple algorithmic levels to entire simulations based on solvers. Thus an effective research agenda must include tools for determining what parts of the solution spaces are not being handled well by the various solvers. This work will include developing a smart composition scheme inside PETSc/DMNetwork and applying it to large applications. It will also involve developing a data aggregation approach for the Landau operators.
- Solver Adaptivity We will focus on solver adaptivity for high modes, multigrid activity, and optimization of discretization for improved algebraic solvers. This work will help overcome two ubiquitous difficulties that arise in the design and effective use of composable solvers: (1) users can be overwhelmed quickly in nontrivial problems by the complexity of multilevel schemes, often not knowing what algorithms are used and how they are composed; and (2) when something goes wrong, it is difficult to determine the source of the obstacle and how to overcome it. We will develop optimal polynomial smoothers and provide interfaces to foster research on control systems that adaptively recalibrate smoothers. We will construct diagnostic operations that can be activated through command line options. Moreover, we will conduct remedial operations: with a good approximation of the lowest generalized eigenmodes, we can adapt the prolongation operators to accurately interpolate them, thus allowing the capture of these modes in our coarse problems. Using data-aggregated optimization solvers, we also will extend the local co-derivative methods to adaptively refined grids and will improve the robustness of the discretizations.
This work will enable breakthroughs in numerical algorithms to overcome difficulties in convergence, while supporting the transition to data-driven predictive science for network and multiphysics PDE-based simulation. These advances are needed for many computational domains that use PETSc, including primary motivators in fusion energy, power and water networks, geosciences, and subsurface flow. Moreover, by focusing on how solvers are formulated and used, this work addresses the need for the synthesis—at scale—of traditional high-performance simulations with data-driven computing, including machine learning, artificial intelligence, statistical analysis, data analytics, and uncertainty quantification.
2020-2025, $8,530,200, DOE PSAAPIII: FOA00020680005

Center for Exacale Simulation of Hybrid Rocket Motors
2020-2022, $112,333, DOE ECP WDMApp: S017663

Development of scalable algebraic solvers for the ECP WDMApp project.
2020, $30,000, UB Blue Sky Fund
Explaining Mass Loss in the Antarctic Ice Sheet through Integrated Systems Modeling

Large-scale ice sheets like the Antarctic Ice Sheet play an important role in governing the Earth’s water budget. Satellite data indicate that the surface height of the XX region of the Antarctic ice sheet is rapidly decreasing (XX m/yr). The decrease in surface height cannot be explained by simple models of water loss along the base of the ice sheet nor by sublimation along the ice sheet surface. Furthermore, the trend of surface height reduction extends over 1000 km inland from the coast, suggesting the surface height reduction cannot be due to warm water currents beneath the ice. We propose an integrated systems modeling approach to develop preliminary models that test the hypotheses that regional tectonics and geology as complex water networks within the ice can solve the mass balance. To this end, we will employ a combination of high-resolution satellite data, analytic mass balance modeling, and numerical simulations of three-dimensional non-linear ice flow and water networks. Spatial analysis and ice sheet simulations will be analyzed with an interactive immersive virtual reality system, building on the hardware in the Geodynamics Research and Visualization lab in the Department of Geology. In addition, this Blue Sky proposal will lay the ground work for future proposals that investigate the integrated crustal contribution to the ice sheet behavior through the acquisition of a seismometer in collaboration with the Department of Geology.
2020-2021, $80,000, DOE Basic Energy Sciences, U.S. DOE Contract DE-AC02-0000011838
Development of Terrestrial Dynamical Cores for the ACME to Simulate Water Cycle

Developing a predictive understanding of the terrestrial water cycle at local to global scale is essential for accurate assessment of water resources, agricultural production, and energy generation given current climate variability. Higher spatial resolution in the land and river components of the Accelerated Climate Modeling for Energy (ACME) project alone is insufficient to meet the U.S. Department of Energy's (DOE's) 10-year vision for the Earth System Modeling program. Next generation hyperresolution terrestrial models need to move beyond one-dimensional systems by including scale appropriate two- and three-dimensional physics formulations. The need for high-fidelity terrestrial models has been identified in past DOE's workshops on water-energy nexus, U.S. energy sector vulnerabilities to climate change and extreme weather, and terrestrial-aquatic interface. This project is aimed at developing rigorously verified, spatially adaptive, scalable, multi-physics dynamical cores for global-scale modeling of three-dimensional processes in the land and two-dimensional processes in the river component of ACME. The dynamical cores will use the Portable, Extensible Toolkit for Scientific Computation library to provide numerical solution of discretized equations. Phase-1 of the project will focus on developing standalone versions of the dynamical cores. A validation and verification framework will be developed to build trustworthiness of the high-fidelity, high-resolution dynamical cores. An automated testing framework will be implemented to increase reliability and productivity in the model development process. By the end of Phase-1, the newly developed dynamical cores will be applied to a wide range of problems including lateral water and energy transport in high-latitude systems, transport of water and energy in the soil-plant continuum, urban flooding, and surface water dynamics at regional scales. Phase-2 of the project will focus on integration of the dynamical cores within ACME and application of the new model to study changes in terrestrial water cycle at global scales. These enhanced modeling capabilities will improve ACME's capability to understand how the hydrological cycle evolves and impacts a wide range of water resource issues, which are critical to DOE's missions.
2020-2025, $504,431, NSF CSSI: 1931524

Physical systems in which fluid flows interact with immersed structures are found in a wide range of areas of science and engineering. Such fluid-structure interactions are ubiquitous in biological systems, including blood flow in the heart, the ingestion of food, and mucus transport in the lung. Fluid-structure interaction is also a crucial aspect of new approaches to energy harvesting, such as wave-energy converters that extract energy from the motion of sea or ocean waves, and in advanced approaches to manufacturing, such as 3D printing. This award supports the development of an advanced computer simulation infrastructure for modeling this full range of application areas. Computer models advanced by this project could ultimately lead to improved diagnostics and treatments for human disease, optimized designs of novel approaches to renewable energy, and reduced manufacturing costs through improved production times in 3D printing.

This project aims to enhance the IBAMR computer modeling and simulation infrastructure that provides advanced implementations of the immersed boundary (IB) method and its extensions with support for adaptive mesh refinement (AMR). IBAMR is designed to simulate large-scale fluid-structure interaction models on distributed memory-parallel systems. Most current IBAMR models assume that the properties of the fluid are uniform, but many physical systems involve multiphase fluid models with inhomogeneous properties, such as air-water interfaces or the complex fluid environments of biological systems. This project aims to extend recently developed support in IBAMR for treating multiphase flows by improving the accuracy and efficiency of IBAMR's treatment of multiphase Newtonian flows, and also by extending this multiphase flow modeling capability to treat multiphase complex (polymeric) fluid flows, which are commonly encountered in biological systems, and to treat reacting flows with complex chemistry, which are relevant to models of combustion, astrophysics, and additive manufacturing using stereolithography (3D printing). This project also aims to re-engineer IBAMR for massive parallelism, so that it may effectively use very large computational resources in service of applications that require very high fidelity. The project will also develop modules that will facilitate the use of image-derived geometries, and it will develop novel fluid-structure coupling schemes that will facilitate the use of independent fluid and solid solvers. These capabilities are motivated within this project by models of cardiac, gastrointestinal, and lung physiology; renewable energy; and advanced manufacturing. This software will be used in courses developed by the members of the project team. The project also aims to grow the community of IBAMR users by enhancing project documentation and training materials, hosting user group meetings, and offering short courses.

This award by the NSF Office of Advanced Cyberinfrastructure is co funded by the Division of Civil, Mechanical, and Manufacturing Innovation to provide enabling tools to advance potentially transformative fundamental research, particularly in biomechanics and mechanobiology.
2016-2021, $585,000, Subcontract, NSF EAR-0949446
The Computational Infrastructure for Geodynamics

CIG is fully described at http://www.geodynamics.org.
2018-2021, $270,000, DOE Applied Math Research, U.S. DOE Contract DE-AC02-06CH11357
Extending PETSc's Composable Hierarchical Nested Solvers
This proposal comprises complementary subprojects that all embrace this paradigm - and thus have far more potential than work only on algebraic solvers could achieve.
- DMNetwork: composable multiphysics PDE-based network simulations (Section 3.1): robust, scalable simulation framework for large-scale multiphysics network applications, such as power-grid and water networks;
- SITAR: solver-integrated tree-based adaptive refinement (Section 3.2): high-efficiency, integrated, meta-multigrid solvers with adaptivity, motivated by modeling in fusion, geodynamics, and nuclear reactors; and
- Outer-loop computations: integrated solver and data management capabilities needed for work toward predictive science (Section 3.3): including construction of auxiliary heterogeneous subsystems, as driven by the needs of numerical optimization and synthesis of data-intensive and simulation-intensive components.
Work on this synergistic combination of topics will push fundamental applied mathematics research past conventional boundaries toward a holistic perspective on composable solvers, as needed for expanded coupling in forward simulations, along with outer-loop optimization and analysis.
2018-2020, $160,000, DOE Basic Energy Sciences, U.S. DOE Contract DE-AC02-0000011838
Development of Terrestrial Dynamical Cores for the ACME to Simulate Water Cycle

Developing a predictive understanding of the terrestrial water cycle at local to global scale is essential for accurate assessment of water resources, agricultural production, and energy generation given current climate variability. Higher spatial resolution in the land and river components of the Accelerated Climate Modeling for Energy (ACME) project alone is insufficient to meet the U.S. Department of Energy's (DOE's) 10-year vision for the Earth System Modeling program. Next generation hyperresolution terrestrial models need to move beyond one-dimensional systems by including scale appropriate two- and three-dimensional physics formulations. The need for high-fidelity terrestrial models has been identified in past DOE's workshops on water-energy nexus, U.S. energy sector vulnerabilities to climate change and extreme weather, and terrestrial-aquatic interface. This project is aimed at developing rigorously verified, spatially adaptive, scalable, multi-physics dynamical cores for global-scale modeling of three-dimensional processes in the land and two-dimensional processes in the river component of ACME. The dynamical cores will use the Portable, Extensible Toolkit for Scientific Computation library to provide numerical solution of discretized equations. Phase-1 of the project will focus on developing standalone versions of the dynamical cores. A validation and verification framework will be developed to build trustworthiness of the high-fidelity, high-resolution dynamical cores. An automated testing framework will be implemented to increase reliability and productivity in the model development process. By the end of Phase-1, the newly developed dynamical cores will be applied to a wide range of problems including lateral water and energy transport in high-latitude systems, transport of water and energy in the soil-plant continuum, urban flooding, and surface water dynamics at regional scales. Phase-2 of the project will focus on integration of the dynamical cores within ACME and application of the new model to study changes in terrestrial water cycle at global scales. These enhanced modeling capabilities will improve ACME's capability to understand how the hydrological cycle evolves and impacts a wide range of water resource issues, which are critical to DOE's missions.
2017-2020, $386,000, Platform for Advanced Scientific Computing (PASC), ETHZ

This is a PASC Software Development Proposal for the development of Salvus, a spectral-element software package built from the ground up with state-of-the-art techniques and tools. By leveraging the PETSc library (https://www.mcs.anl.gov/petsc/) and its DMPlex finite-element API (Lange et al., 2015), Salvus has been capable of running in parallel from the start. At its core, PETSc provides a distributed vector and matrix API, with specific routines added to support finite-element methods, allowing us to write a fully functioning, parallelised wave propagation package for acoustic and elastic domains in less than 10'000 lines of code, with performance comparable to SPECFEM3D.

Here we propose to position Salvus as a top wavefield modelling and inversion package in the exascale era. To ensure performance of Salvus on today's and tomorrow's supercomputing platforms, our work will focus on cross-architecture developments, code and I/O optimisation, and systematic testing and validation. This will be complemented by actions to increase and broaden the usability and impact of Salvus. They include workflow developments, the implementation of frequency-domain solvers, and extensions of the physics that can be modelled.
2017-2018, $100,000, Partnership for Edge Physics Simulation, LBNL Contract 18083698
Solver Development for SCREAM and EPSI
Prof. Knepley will aid Dr. Mark Adams in developing advanced nonlinear solver capability for plasma physics simulations. In addition, he will provide mesh management and discretization support in PETSc for LBNL's solver development for the SCREAM and EPSI projects.
2015-2018, $240,000, DOE Applied Math Research, U.S. DOE Contract DE-AC02-06CH11357
Extending PETSc's Composable Hierarchically Nested Linear Solvers

The changing landscape of both scientific application needs (more complex physical models and coupling of diverse models) and high-performance computing systems (many-core node architectures, hybrid combinations of GPU and conventional processors, and high node counts) requires continued innovation in mathematical algorithms for solving sparse linear systems and high-quality, scalable software implementations of these algorithms. To this end, we propose to study composable, hierarchical, nested solvers—that is, solvers composed of multiple levels of nested algorithms and data models to exploit architectural features and/or problem-specific structure.

We focus on five complementary subprojects: (1) algorithmic and software infrastructure for nested block methods; (2) scalable solvers for tree-based structured grid adaptive mesh refinement; (3) advanced multigrid methods for multiphysics applications; (4) eigenanalysis of composite linear solvers, allowing one to understand what portions of the solution space are being handled well by the solver and which require additional preconditioning; and (5) software for understanding and analyzing composable, hierarchical, nested solvers in relation to their constituent parts, as well as user composition of such solvers. An essential crosscutting aspect of this work is the development of flexible and extensible software infrastructure to support these hierarchical algorithms.

The robust and scalable linear solvers will benefit numerous PDE-based applications, including gyrokinetics, geodynamics, neutron transport, subsurface flow, ice sheet modeling, solid mechanics, and earthquake simulation. Thus, this work will have an immediate and powerful impact on a broad range of important Office of Science and other DOE work.
2015-2020, $262,655, NSF SI2-SSI: 1450339
Scalable Infrastructure for Enabling Multiscale and Multiphysics Applications in Fluid Dynamics, Solid Mechanics, and Fluid-Structure Interaction

Many biological and biomedical systems involve the interaction of a flexible structure and a fluid. These systems range from the writhing and coiling of DNA, to the beating and pumping of cilia and flagella, to the flow of blood in the body, to the locomotion of fish, insects, and birds. This project aims to develop advanced software infrastructure for performing dynamic computer simulations of such biological and biomedical systems. To facilitate the deployment of this software in a range of scientific and engineering applications, this project will develop new software capabilities in concert with new computer models that use the software. Specific application domains to be advanced in this project include models of aquatic locomotion that can be used to understand the neural control of movement and ultimately to develop new treatments for neurological pathologies such as spinal cord injuries, and models that simulate the interaction between the electrophysiology of the heart and the contractions of the heart that pump blood throughout the body, which could lead to improved approaches to treating heart disease. The software to be developed within the project is freely available online and is used by a number of independent research groups in a variety of scientific and engineering domains. It is being actively used in projects that model different aspects of cardiovascular dynamics, such as platelet aggregation and the dynamics of natural and prosthetic heart valves, and in projects that study other biological problems, including cancer dynamics, insect flight, aquatic locomotion, and the dynamics of phytoplankton. The software is also being applied to non-biological problems, including nanoscale models of colloidal suspensions and models of active particles. The improved methods and software to be developed in this project will thereby have a broad and sustained impact on a large number of ongoing research efforts in the biological and biomedical sciences and other scientific and engineering disciplines.

The immersed boundary (IB) method is a broadly applicable framework for modeling and simulating fluid-structure interaction (FSI). The IB method was introduced to model the fluid dynamics of heart valves, and subsequent development initially focused on simulating cardiac fluid dynamics. This methodology is broadly useful, however, and has been applied to a variety of problems in which a fluid flow interacts with immersed structures, including elastic bodies, bodies with known or prescribed deformational kinematics, and rigid bodies. Extensions of the IB method have also been developed to model electrophysiological systems and systems with chemically active structures. To improve the efficiency of the IB method, the PI has developed adaptive versions of the IB method that employ structured adaptive mesh refinement (AMR) to deploy high spatial resolution only where needed. These methods have been implemented within the IBAMR software framework, which provides parallel implementations of the IB method and its extensions that leverage high-quality computational libraries including SAMRAI, PETSc, and libMesh. This project will further extend the IBAMR software by implementing modeling and discretization technologies required by the research applications of current and prospective users of the software, by developing improved solver infrastructure facilitated by the implementation of native support for structured AMR discretizations in the PETSc library, and by integrating with existing high-quality software tools for model development, deployment, and analysis. IBAMR is freely distributed online and is used within a number of independent research groups both to the further development of the IB method and also to its application to simulate diverse problems in fluid dynamics and FSI. By enhancing IBAMR, this project will also enhance the ability of these and other researchers to construct detailed models without requiring those researchers to develop the significant software infrastructure needed to perform such simulations. This project will also develop general-purpose support for AMR discretizations in PETSc, a software library with thousands of active users, ~400 downloads per month, and numerous applications. The work of this project will help to grow the IBAMR user community of students and researchers by developing UI tools for building models, running simulations, and analyzing results. Students will be actively engaged in all aspects of the project, including code, method, and model development.
2015-2017, $400,000, Intel Parallel Computing Center
Extending PETSc with Adaptive Mesh Refinement and Optimal Solvers, Applied to PFLOTRAN, and Optimized for Modern Intel Processors

We will extend and optimize PETSc for structured adaptive mesh refinement (SAMR) discretizations with initial deployment to the premier open source, state-of-the-art massively parallel subsurface flow and reactive transport code – PFLOTRAN. This work is significant in that it radically changes algorithms, data access, and low level computational organization in order to maximize performance and scalability on modern Intel architectures, and encapsulates this knowledge in the PETSc libraries for the broadest possible impact.

PETSc will provide a new solver interface to SAMR, enabling the efficient representation of multi-scale phenomenon, while maintaining the simplicity of structured grid kernel computations. These kernel computations are amenable to the high number of vector lanes characteristic of current and future Intel architectures, and we will aggressively vectorize the existing code. Moreover, we propose to use the most asymptotically efficient solvers for equations of interests to PFLOTRAN such as Richards’ equations: matrix-free full approximation scheme (FAS) nonlinear full multigrid (FMG) methods. FMG has the potential to solve nonlinear systems of equations to truncation error accuracy with a few (5-20) work units or residual calculations. These formulations have a unique opportunity to leverage modern architectures to deliver fast, accurate, versatile solvers for subsurface flows and other PETSc applications.
2012-2015, $117,710, NSF SI2-SSE: 1147680
SPIKE --- An Implementation of a Recursive Divide-and-Conquer Parallel Strategy for Solving Large Systems of Linear Equations

Drs. Negrut, Sameh, and Knepley will investigate, produce, and maintain a methodology and its software implementation that leverage emerging heterogeneous hardware architectures to solve billion-unknowns linear systems in a robust, scalable, and efficient fashion. The two classes of problems targeted under this project are banded dense and sparse general linear systems.

This project is motivated by the observation that the task of solving a linear system is one of the most ubiquitous ingredients in the numerical solution of Applied Mathematics problems. It is relied upon for the implicit integration of Ordinary Differential Equation (ODE) and Differential Algebraic Equation (DAE) problems, in the numerical solution of Partial Differential Equation (PDE) problems, in interior point optimization methods, in least squares approximations, in solving eigenvalue problems, and in data analysis. In fact, the vast majority of nonlinear problems in Scientific Computing are solved iteratively by drawing on local linearizations of nonlinear operators and the solution of linear systems. Recent advances in (a) hardware architecture; i.e., the emergence of General Purpose Graphics Processing Unit (GP-GPU) cards, and (b) scalable solution algorithms, provide an opportunity to develop a new class of parallel algorithms, called SPIKE, which can robustly and efficiently solve very large linear systems of equations.

Drawing on its divide-and-conquer paradigm, SPIKE builds on several algorithmic primitives: matrix reordering strategies, dense linear algebra operations, sparse direct solvers, and Krylov subspace methods. It provides a scalable solution that can be deployed in a heterogeneous hardware ecosystem and has the potential to solve billion-unknown linear systems in the cloud or on tomorrow?s exascale supercomputers. Its high degree of scalability and improved efficiency stem from (i) optimized memory access pattern owing to an aggressive pre-processing stage that reduces a generic sparse matrix to a banded one through a novel reordering strategy; (ii) good exposure of coarse and fine grain parallelism owing to a recursive, divide-and-conquer solution strategy; (iii) efficient vectorization in evaluating the coupling terms in the divide-and-conquer stage owing to a CPU+GPU heterogeneous computing approach; and (iv) algorithmic polymorphism, given that SPIKE can serve both as a direct solver or an effective preconditioner in an iterative Krylov-type method.

In Engineering, SPIKE will provide the Computer Aided Engineering (CAE) community with a key component; i.e., fast solution of linear systems, required by the analysis of complex problems through computer simulation. Examples of applications that would benefit from this technology are Structural Mechanics problems (Finite Element Analysis in car crash simulation), Computational Fluid Dynamics problems (solving Navier-Stokes equations in the simulation of turbulent flow around a wing profile), and Computational Multibody Dynamics problems (solving Newton-Euler equations in large granular dynamics problems).

SPIKE will also be interfaced to the Portable, Extensible Toolkit for Scientific Computation (PETSc), a two decades old flexible and scalable framework for solving Science and Engineering problems on supercomputers. Through PETSc, SPIKE will be made available to a High Performance Computing user community with more than 20,000 members worldwide. PETSc users will be able to run SPIKE without any modifications on vastly different supercomputer architectures such as the IBM BlueGene/P and BlueGene/Q, or the Cray XT5. SPIKE will thus run scalably on the largest machines in the world and will be tuned for very different network and hardware topologies while maintaining a simple code base.

The experience collected and lessons learned in this project will augment a graduate level class, ?High Performance Computing for Engineering Applications? taught at the University of Wisconsin-Madison. A SPIKE tutorial and research outcomes will be presented each year at the International Conference for High Performance Computing, Networking, Storage and Analysis. A one day High Performance Computing Boot Camp will be organized each year in conjunction with the American Society of Mechanical Engineers (ASME) conference and used to disseminate the software outcomes of this effort. Finally, this project will shape the research agendas of two graduate students working on advanced degrees in Computational Science.
2012-2015, $240,000, DOE Applied Math Research, U.S. DOE Contract DE-AC02-06CH11357
Extending PETSc's Composable Hierarchically Nested Linear Solvers
We propose to focus on the following four complementary subprojects, which have the common theme of addressing issues in scalability and robustness through hierarchical algorithms:
- Scalable preconditioners for Stokes and Karush-Kuhn-Tucker (KKT) systems;
- Unstructured geometric-algebraic multigrid methods;
- Hierarchical Krylov methods designed to scale Krylov solvers up to million-core simulations; and
- Interactive eigenanalysis of composite linear solvers, allowing one to understand what portions of the solution space are being handled well by the solver and which require additional preconditioning procedures.
2010-2015, $650,000, Subcontract, NSF EAR-0949446
The Computational Infrastructure for Geodynamics

CIG is fully described at http://www.geodynamics.org.
2009-2014, $550,000, DOE Math-CS Institute, U.S. DOE Contract DE-AC02-06CH11357
Nonlinear Algorithms to Circumvent the Memory Bandwidth Limitations of Implicit PDE Simulations
Sparse matrix computations, which form the computational kernel of many DOE simulation codes, are severely memory bandwidth limited. This bottleneck holds true for single core systems, multicore systems, the new general-purpose graphics processing systems (GP GPUs), and the slightly more exotic IBM Roadrunner Cell processor. Under the best of circumstances one cannot achieve more than 15 to 20 percent of the processors' performance in sparse matrix computations because the processors are waiting on memory loads up to 85 percent of the time. With current architectural trends this situation is unlikely to change; if anything, it will get worse.

Sparse matrix computations arise because the paradigm for solving nonlinear partial differential equations (PDEs) is to iteratively perform a global linearization and then solve a sparse matrix linear problem (that is, Newton's method). The linearization involves computing a sparse matrix approximation to a nonlinear problem, which is (because of its large size) stored in main memory. The linear solution process requires constantly moving the sparse matrix entries from main memory into the CPU, where a small number of computations are performed using those values, before more values need to be loaded, thus allowing no register or cache reuse of the sparse matrix entries.

To better utilize computer systems, we need to eliminate the "global linearize, then linear solve" paradigm and replace it with a "partially linearize, then solve" paradigm that acknowledges the memory hierarchy and moves the best possible CPU utilization, from 15% to over 75%, resulting in run times that are one-fifth as long. To achieve this transformation will require the joint efforts of mathematicians and computer scientists. The full approximation scheme (FAS), also called nonlinear multigrid, provides an algorithmic framework for the "partially linearize, then solve" approach and will form the starting point for our work. Loosely speaking, we propose a transition from Newton-multigrid algorithms to multigrid-Newton.

This project will develop the mathematical algorithms, analysis, and computer science techniques (including parsing, code generation, code optimization, and automatic differentiation) to move the "partially linearize, then solve" paradigm from concept to practice. This research will result in PDE-based simulations running up to five times faster than traditional linear algebra approaches on the next generation of computer systems. Preliminary work indicates the "partially linearize, then solve" approach has great potential. The two key research questions we will answer under this work are as follows:
- How will we maintain roughly the same convergence rates with the nonlinear schemes as with Newton-multigrid? Obviously if the convergence rates decay so badly that we end up doing many more floating-point operations, then the higher utilization is not a good measure of performance.
- How will we provide a software system that end users will accept and use? This likely means not requiring a radical shifting of the way application scientists develop code. But it may involve a radical shift in the way we develop ``libraries'' for implicit methods for PDEs.
2010-2012, $45,000, Subcontract, Army Research Office W911NF-09-1-0488
Classical Density Functional Theory of Fluids: Ions at a Dielectric Interface
The overall (three-year) goal of the proposed work is to develop a classical density functional theory (DFT) of ions and charged polymers near dielectric interfaces in a three-dimensional system. Over the 3 years of the grant, we have had great success and some setbacks, as might be expected for any research project. These include:
- The biggest success was publication of the paper describing the general theory in Journal of Physical Chemistry Letters. This is the big theory paper to come out of this project and represents the most important goal of the proposed work (i.e., constructing the functional itself). In the year since publication, it has already been cited 3 times, indicating an impact of the work.
- Numerical implementation of an efficient algorithm to solve the dielectric DFT equations is still being implemented and work on this front will continue beyond the completion of the grant. This is the biggest setback of the project. The finding of an efficient algorithm to numerically solve the dielectric DFT equations was much more difficult than aniticipated. The root of the problem is that the reaction field is not a radial interaction between ions. Because of this, fast Fourier transforms (FFTs) can no longer be used and one must be clever in making the program scale with $N \log N$ rather than $N^2$ where $N$ is the number of grid points to be solved.
- On the other hand, one great numerical success was in implementing fast computational methods for the reaction in 3D. Matt Knepley at the Computational Institute of the University of Chicago, through a subcontract of this grant, has implemented a Fast Multipole Method (FMM) and published a paper on this topic.
- An exciting offshoot of the DFT work was a collaboration with Dezso Boda in Hungary that resulted an alternative method (not using DFT at all) of ions at dielectric interfaces that we developed. The specific goal of the proposed work is to develop a DFT of ions at a dielectric interface and this has been done (as just mentioned). The broader goal of the proposed to work is develop methods that any scientist can use when working with ions at a dielectric interface, especially in three dimensional geometries. We have started to develop another method with Dr. Boda, a long-time collaborator, called the Local Equilibrium Grand Canonical Monte Carlo (LE-GCMC). Because we have already developed a fast method to include dielectrics in Monte Carlo, this method will allow for computing ion currents (currently at the drift-diffusion level, but not limited to that) in complex 3D geometries that will be very challenging with DFT. The grant sponsored Dr. Bodaís one month visit to the PIís institution to work on this. It is important to note that this work is in addition to the DFT development; LE-GCMC came out of discussions that Dr. Boda and the PI had in discussing the dielectric DFT. Moreover, Dr. Boda's visit helped greatly in moving toward an efficient algorithm to numerically solve the dielectric DFT equations.
- Another offshoot of this work is the hope of coupling the new dielectric DFT functional to experiments. This work, which will continue beyond the completion of the grant, is with experimentalists working on nanoáuidic devices. This resulted in 2 papers (a third is in preparation), including one in the prestigeous Nano Letters.
The final report is available here.
2001-2011, $1,200,000, DOE SciDAC ISIC
Towards Optimal Petascale Simulations

TOPS is fully described at http://www.scidac.gov/math/TOPS.html.
2009-2011, $90,000, NSF Grant OCI-0850680
Mechanical Transformation of Knowledge to Libraries

The proposed project approaches the problem of library development by a fundamentally different approach -- one in which the knowledge abstractions at an appropriate level are represented and then this knowledge is converted into implementations optimized for different target platforms and circumstances. The primary intellectual merit here is the automation of high quality numerical code generation.

The broader impact is that abstraction driven automation for library developers can significantly cut down the "software gap", delay between development of new architectures and new software that maximizes its use. The impact of this will be broad across all disciplines that use HPC.
2005-2010, $650,000, Subcontract, NSF EAR-0426271
The Computational Infrastructure for Geodynamics

CIG is fully described at http://www.geodynamics.org.
2005-2007, $210,000, Subcontract, DOE Reactor Core Modeling

Matt Knepley's HomePage

The Wonderful People Who Pay My Students