Enabling Collaborative Science Through Grid Technology
Russ Miller
Director, Center for Computational Research
UB Distinguished Professor, Computer Science & Engineering
Senior Research Scientist, Hauptman-Woodward Medical Research Institute

Outline
Bioinformatics in Buffalo
Supercomputing in Buffalo
Grid Computing
Grid Computing in Buffalo
Shake-and-Bake: Computational Crystallography
ECCE: Computational Chemistry

Biomedical Advances
PSA Test (screen for Prostate Cancer)
Avonex: Interferon Treatment for Multiple Sclerosis
Artificial Blood
Nicorette Gum
Fetal Viability Test
Implantable Pacemaker
Edible Vaccine for Hepatitis C
Timed-Release Insulin Therapy
Anti-Arrhythmia Therapy
Tarantula venom
Direct Methods Structure Determination
Listed on “Top Ten Algorithms of the 20th Century”
Vancomycin
Gramicidin A
High-Throughput Crystallization Method: Patented
NIH National Genomics Center: Northeast Consortium
Howard Hughes Medical Institute: Center for Genomics & Proteomics

Bioinformatics in Buffalo
A $290M Initiative
UB Center for Advanced Bioengineering & Biomedical Technologies
$1M/yr NYS
Medical Technology for Product Development & Commercialization
Center for Disease Modeling & Therapy Discovery
UB, HWI, RPCI, Kaleida
$15.3M NYS
Software, device development, and drug therapies
Buffalo Center of Excellence in Bioinformatics
UB, HWI, RPCI
$61M NYS
$10M Federal Government
$151M Corporate Funding
UB Faculty Funding: $64M

Partnerships
Lead Partners: SUNY-Buffalo, Hauptman-Woodward Medical Research Institute, Roswell Park Cancer Institute

Experimental Facilities I
Molecular Targeting Laboratory
Screen 30-50K compounds every 3 months
Apply compound to cell (different genes treated with fluorescent markers)
Rapidly identify effect on specific gene expression pathways
Gene Expression Laboratory
High-throughput microarray and gene chip
Discover new genes, their functions, and pathways
Proteomics and Molecular Kinetics Lab
Identify molecular targets found in Gene Expression Lab
Disease Modeling Laboratory
In vivo testing (flies, mice, baboons,…)
Gene targeting and genetic mapping facilities

Experimental Facilities II
Bioengineering Support Laboratory
Capabilities in photonics and nano-tech research
E.g., handheld devices to test for diseases
Protein Scale-Up and Purification
High-Throughput Robotic Combinatorial Chemistry/Parallel Synthetic Chemistry Capabilities
Drugs created robotically; Tested for interaction with target protein
Rapid identification of a large number of potential drugs
Public Health and Molecular Pathology
Tissue repositories; disease gene maps; medical informatics
High-Throughput Search Process for Structural Biology
Tests 1536 “chemical cocktails” to determine effective parameters for crystallization

SUNY-B 2002-03 Snapshot
Personnel
Hired Jeff Skolnick as Director (7/02)
Brought 13 additional staff to Buffalo
Authorized to hire 10 additional research groups
Hired Norma Nowak as co-Director (4/03)
Authorized to hire 10 additional research groups
Additional members TBD
External Funding ($0)
Applications submitted
Deliverables
Six (6) scientific papers
Resources
Building
6TF → 10TF Compute Cluster

Center for Computational Research
High-Performance Computing and High-End Visualization
110 Research Groups in 27 Depts
25 Companies and Institutions
Sample Areas
Urban Visualization and Simulation
Computational Chemistry
Ground Water Modeling
Geophysical Mass Flows
Networked Multimedia
Medical Imaging
Training
Workshops; Courses
Degree Programs

CCR 1999-2003 Snapshot
Personnel
18 State-Supported Staff
2 Grant-Supported Staff
External Funding
$111M External Funding
$13.5M as lead
$97.5M in support
$41.8M Vendor Donations
Deliverables
350+ Publications
Software, Media, Algorithms, Consulting, Training, CPU Cycles, etc.

Computational Resources (9TF)
Dell Linux Cluster - #22 on top500
600 P4 Processors  (2.4 GHz)
600 GB RAM; 40 TB Disk; Myrinet
Dell Linux Cluster - #187 on top500
4036 Processors (PIII 1.2 GHz)
2TB RAM; 160TB Disk; 16TB SAN

Sample Computational Research
Computational Chemistry (King, Kofke, Coppens, Furlani, Tilson, Lund, Swihart, Ruckenstein, Garvey)
Algorithm development & simulations
Groundwater Flow Modeling (Rabideau, Jankovic, Becker, Flewelling)
Predict contaminant flow in groundwater & possible migration into streams and lakes
Geophysical Mass Flows (Patra, Sheridan, Pitman, Bursik, Jones, Winer)
Study of geophysical mass flows for risk assessment of lava flows and mudslides
Bioinformatics (Zhou, Miller, Hu, Szyperski – NIH Consortium, HWI)
Protein Folding: computer simulations to understand the 3D structure of proteins
Structural Biology; Pharmacology
Computational Fluid Dynamics (Madnia, DesJardin, Lordi, Taulbee)
Modeling turbulent flows and combustion to improve design of chemical reactors, turbine engines, and airplanes
Physics (Jones, Sen)
Many-body phenomena in condensed matter physics
Chemical Reactions (Mountziaris)
Molecular Simulation (Errington)

Visualization Resources
Fakespace ImmersaDesk R2
Portable 3D Device
Tiled-Display Wall
20 NEC projectors: 15.7M pixels
Screen is 11’ × 7’
Dell PCs with Myrinet2000
Access Grid Node
Group-to-Group Communication
Commodity components
SGI Reality Center 3300W
Dual Barco’s on 8’ × 4’ screen
VREX VR-4200 Stereo Imaging Projector
Portable projector works with PC
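
The 15.7M-pixel figure quoted for the tiled-display wall can be checked with simple arithmetic. The sketch below assumes each of the 20 projectors runs at XGA resolution (1024 × 768); the slide gives only the projector count and the total, so the per-projector resolution is an assumption.

```python
# Assumed (not stated on the slide): each projector runs at XGA, 1024 x 768.
PROJECTORS = 20
WIDTH_PX, HEIGHT_PX = 1024, 768

pixels_per_projector = WIDTH_PX * HEIGHT_PX
total_pixels = PROJECTORS * pixels_per_projector

print(f"{total_pixels:,} pixels ~ {total_pixels / 1e6:.1f}M")
# prints "15,728,640 pixels ~ 15.7M"
```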

Sample Visualization Areas
Computational Science (Patra, Sheridan, Becker, Flewelling, Baker, Miller, Pitman)
Simulation and modeling
Urban Visualization and Simulation (CCR)
Public projects involving urban planning
Medical Imaging (Hoffmann, Bakshi, Glick, Miletich, Baker)
Tools for pre-operative planning; predictive disease analysis
Geographic Information Systems (CCR, Bisantz, Llinas, Kesavadas, Green)
Parallel data sourcing software
Historical Reenactments (Paley, Kesavadas, More)
Faithful representations of previously existing scenarios
Multimedia Presentations (Anstey, Pape)
Networked, interactive, 3D activities

3D Medical Visualization App
Collaboration with Children’s Hospital
Leading miniature access surgery center
Application reads data output from a CT Scan
Visualize multiple surfaces and volumes
Export images, movies or CAD representation of model

Multiple Sclerosis Project
Collaboration with Buffalo Neuroimaging Analysis Center (BNAC)
Developers of Avonex, drug of choice for treatment of MS
MS Project examines patients and compares scans to healthy volunteers

Multiple Sclerosis Project
Compare caudate nuclei between MS patients and healthy controls
Looking for size as well as structure changes
Localized deformities
Spacing between halves
Able to see correlation between disease progression and physical structure changes

Grid Computing 2003

Grid Computing Overview
Coordinate Computing Resources, People, Instruments in Dynamic Geographically-Distributed Multi-Institutional Environment
Treat Computing Resources like Commodities
Compute cycles, data storage, instruments
Human communication environments
No Central Control; No Trust

Computational Grids & Electric Power Grids
Similarities/Goals of CG and EPG
Ubiquitous
Consumer is comfortable with lack of knowledge of details
Differences Between CG and EPG
Wider spectrum of performance & services
Access governed by more complicated issues
Security
Performance
Socio-political factors

Growth of Data and Load vs. Moore’s Law

A Short History of the Grid
Grand Challenge Problems (1980s)
NSF and DOE initiatives
“Science is a team sport”
Initiate multi-resource projects involving computation, instruments, visualization, data
Evolution of Related Communities
Parallel computation
Address resource limitations
Networking
Gigabit testbed program
Investigate potential testbed network architectures
Explore usefulness for end-users

The Globus Project
(Ian Foster and Carl Kesselman)
Globus model focuses on providing key Grid services
Resource access and management
GridFTP
Information Service
Security services
Authentication
Authorization
Policy
Delegation
Network reservation, monitoring, control

Extensible TeraGrid Facility (ETF)

Enabling the Grid
Internet is Infrastructure
Increased network bandwidth and advanced services
Advances in Storage Capacity
Terabyte costs less than $5,000
Internet-Aware Instruments
Increased Availability of Compute Resources
Clusters, supercomputers, storage, visualization devices
Advances in Application Concepts
Computational science: simulation and modeling
Collaborative environments → large and varied teams
Grids Today
Moving towards production; Focus on middleware

X-Ray Crystallography
Objective: Provide a 3-D mapping of the atoms in a crystal.
Procedure:
Isolate a single crystal.
Perform the X-Ray diffraction experiment.
Determine molecular structure that agrees with diffraction data.
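
One common way to quantify whether a trial structure "agrees with" the diffraction data is the crystallographic residual, R = Σ| |Fo| − |Fc| | / Σ|Fo|, which compares observed and calculated structure-factor magnitudes. The following is a minimal sketch on a one-dimensional toy crystal, not the SnB procedure itself; the atom lists, scattering factors, and reflection range are hypothetical values chosen only for illustration.

```python
import cmath

def structure_factor(atoms, h):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a 1-D toy crystal."""
    return sum(f * cmath.exp(2j * cmath.pi * h * x) for f, x in atoms)

def amplitudes(atoms, h_values):
    """The experiment yields magnitudes |F(h)| only; phases are not recorded."""
    return [abs(structure_factor(atoms, h)) for h in h_values]

def r_factor(observed, calculated):
    """R = sum | |Fo| - |Fc| | / sum |Fo|; 0 means perfect agreement."""
    return sum(abs(o - c) for o, c in zip(observed, calculated)) / sum(observed)

# Hypothetical data: (scattering factor, fractional coordinate) pairs.
true_atoms = [(6.0, 0.10), (8.0, 0.35), (7.0, 0.72)]
trial_atoms = [(6.0, 0.12), (8.0, 0.30), (7.0, 0.75)]
h_values = range(1, 11)

observed = amplitudes(true_atoms, h_values)
print(f"R(true model)  = {r_factor(observed, amplitudes(true_atoms, h_values)):.3f}")
print(f"R(trial model) = {r_factor(observed, amplitudes(trial_atoms, h_values)):.3f}")
```

The correct model reproduces the observed magnitudes exactly, so its residual is zero; the perturbed trial model gives a positive residual.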

X-Ray Data & Corresponding Molecular Structure

Shake-and-Bake Method: Dual-Space Refinement

Grid-Based SnB
Objectives
Install Grid-Enabled Version of SnB
Job Submission and Monitoring over Internet
SnB Output Stored in Database
SnB Output Mined through Internet-Based Integrated Querying Tool
Serve as Template for Chem-Grid & Bio-Grid
Experience with Globus and Related Tools

Proof of Concept
Combine CCR’s Heterogeneous Compute Platforms into a Grid
Client/Server Configurations
Rapid Prototype 4Q02 (not Globus)
Develop a user interface to monitor system
Dynamic HTML Grid Interface
Key Features for Proof of Concept
Load Balancing
Fault Tolerance
Result and Grid Statistics
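
Load balancing and fault tolerance in a client/server setup of this kind can be sketched with a shared work queue: each platform pulls one trial at a time, so faster platforms naturally take more work, and a platform that fails hands its trial back for another to finish. This is an illustrative sketch using Python threads, not the CCR prototype; the platform names, the run_trial workload, and the fails_on parameter are all hypothetical.

```python
import queue
import threading
from collections import Counter

def run_trial(trial_id):
    """Stand-in for one independent trial (hypothetical workload)."""
    return trial_id * trial_id

def platform(name, tasks, results, lock, fails_on=None):
    """Pull trials one at a time; a platform that fails hands its trial back."""
    while True:
        trial_id = tasks.get()
        if trial_id is None:              # sentinel: no more work
            tasks.task_done()
            return
        if trial_id == fails_on:
            tasks.put(trial_id)           # fault tolerance: re-queue the trial
            tasks.task_done()
            return                        # this platform is now offline
        value = run_trial(trial_id)
        with lock:
            results[trial_id] = (name, value)
        tasks.task_done()

def run_grid(num_trials, platforms):
    """Run all trials; assumes at least one platform never fails."""
    tasks, results, lock = queue.Queue(), {}, threading.Lock()
    for trial_id in range(num_trials):
        tasks.put(trial_id)
    threads = [
        threading.Thread(target=platform, args=(name, tasks, results, lock, fails_on))
        for name, fails_on in platforms
    ]
    for t in threads:
        t.start()
    tasks.join()                          # every trial has finished somewhere
    for _ in threads:
        tasks.put(None)                   # tell surviving platforms to stop
    for t in threads:
        t.join()
    return results

# "cluster-b" goes offline if it is handed trial 7; the others are healthy.
results = run_grid(20, [("cluster-a", None), ("cluster-b", 7), ("smp-c", None)])
print(Counter(name for name, _ in results.values()))
```

The per-platform counts vary from run to run, but all 20 trials complete, and trial 7 is always finished by a platform other than "cluster-b".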

Client/Server Configuration

Internet Grid Console
Dynamic HTML Grid Status
Grid Server Information
Date/Completion Time
Parallel Run Time/Serial Run Time/Speedup
Trial Result Rate (Trial/Minute)
Shows Configured Platform Information Dynamically
Platform – Type/Name/Picture
Status – Idle/Working/Offline
Resources – Nodes/Total Process/Available Process/Running Process
Shows Job Status Dynamically
Trials – Total Number/Amount Processed
Platform Server State – Block Queue/Float/Race
Result Figure of Merit Histogram
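
The statistics listed for the console can be derived from a few raw measurements. The sketch below shows one plausible way to compute speedup, trial rate, and a figure-of-merit histogram; the function names, bin width, and sample numbers are hypothetical and are not taken from the console's implementation.

```python
from collections import Counter

def speedup(serial_seconds, parallel_seconds):
    """Speedup = serial run time / parallel run time."""
    return serial_seconds / parallel_seconds

def trial_rate(trials_done, elapsed_seconds):
    """Trials completed per minute of wall-clock time."""
    return trials_done / (elapsed_seconds / 60.0)

def fom_histogram(values, bin_width):
    """Count figure-of-merit values per bin; keys are bin indices (value // bin_width)."""
    bins = Counter(int(v // bin_width) for v in values)
    return {b: bins[b] for b in sorted(bins)}

# Hypothetical measurements for a single job.
print(speedup(3600.0, 450.0))             # 8.0
print(round(trial_rate(1000, 450.0), 1))  # 133.3
print(fom_histogram([0.412, 0.418, 0.421, 0.437, 0.298, 0.431], 0.05))
# {5: 1, 8: 5}
```

With a bin width of 0.05, bin index 5 covers values in [0.25, 0.30) and bin index 8 covers [0.40, 0.45).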

Grid Server Console (Vancomycin)

Status Report
Grid Portal
Access control lists, security groups
User attributes, history, proxies
Managed through MySQL database
Distributed data grid
Globus
Version 2.2.4 installed and in production
Metacomputing Directory Services (MDS) stored in MySQL
Eliminates need for LDAP
Condor and Condor-G
Used for resource management and grid job submissions

ECCE “Grid” at CCR
Import Scientific Information
Application independent input
ECCE automatically formats for target application (Gaussian98, NWChem)
Computing at CCR
881 available CPUs (>2.5TFlops)
(Xeon, P3, Power3, R12K)
Uniform access to all platforms via ECCE “job launcher”
Chemical Analysis
Full complement of visual tools for understanding data/publication quality graphics
Computational Chemistry
Relativistic effects/Heavy elements
Algorithm development
Theoretical physical chemistry
Structural/Systems Biology
Protein structure
Enzyme catalysis
Chemical Engineering
Condensed phases/Mixed phase predictions
Catalysis
Geology, Pharmacology, Medical School

BioGrids
EUROGRID BioGRID
Asia Pacific BioGRID
NC BioGrid
Bioinformatics Research Network
Osaka University Biogrid
Indiana University BioArchive BioGrid

Contact Information
miller@buffalo.edu
www.ccr.buffalo.edu

Acknowledgments
Mark Green
Steve Gallo
Jason Rappleye
Jeff Tilson
Martins Innus
Betty Capaldi
Bruce Holm
Janet Penksa
George DeTitta
Herb Hauptman
Charles Weeks
Steve Potter
Rohit Bakshi
Philip Glick