Projects

SMARTEn

CNS Core: Small: Rethinking the Software Architecture for Mobile DNA Analysis

To learn more, please visit the official project web page: https://cse.buffalo.edu/~jzola/smarten/.

CAREER

CAREER: Scalable Software and Algorithmic Infrastructure for Probabilistic Graphical Modeling

Probabilistic Graphical Models (PGMs) remain a key machine learning technique and are especially popular in biomedical domains. This project responds to the recognized and growing demand for scalable PGM learning methods that can capitalize on parallel architectures such as large clusters of multi-core processors. The research focuses on exact structure learning of Bayesian networks and Markov random fields in the context of biomedical data analytics.

The project is built on two main components: a new high-performance abstraction for managing data in machine learning applications, including memory-efficient strategies for answering counting queries on multi-core processors (see SABNAtk), and a new programming model for distributed memory systems that facilitates efficient exploration of large-scale combinatorial search spaces (see project SCoOL). These abstractions are used to realize a set of new parallel, exact algorithms for structure search and related problems.

The research activities are tightly coupled with multiple educational efforts, spanning development of an interdisciplinary course for medical professionals to train them in the use of advanced cyberinfrastructure, engagement of undergraduate students and underrepresented minorities in research, and outreach to middle and high school students to attract them to STEM.

Deliverables

SABNA: Scalable Accelerated Bayesian Network Analytics. This actively developed software toolkit provides a set of sequential and parallel tools for structure learning of Bayesian networks. It is designed to provide exact solutions on large-scale data within acceptable time limits.
SABNAtk: Toolkit for fast counting over categorical data. This toolkit can power any application that involves counting data configurations, e.g., to estimate likelihoods or evaluate model scoring functions. Currently, SABNAtk is the workhorse supporting SABNA.
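
To make the notion of a counting query concrete, the following is a minimal Python sketch, not the SABNAtk API. The function names count_configurations and log_likelihood and the toy data set are hypothetical and serve only as illustration. The sketch counts how often each joint configuration of selected categorical variables occurs, and then uses those counts to evaluate the maximum-likelihood log-likelihood of one variable given a candidate parent set, which is the kind of quantity a Bayesian network scoring function builds on.

```python
import math
from collections import Counter


def count_configurations(rows, variables):
    """Count occurrences of each joint configuration of the given variables.

    rows      -- iterable of tuples, one tuple of categorical values per observation
    variables -- tuple of column indices defining the counting query
    """
    return Counter(tuple(row[v] for v in variables) for row in rows)


def log_likelihood(rows, child, parents):
    """Maximum-likelihood log-likelihood of `child` given `parents`.

    Computes sum over configurations of N(parents, child) * log(N(parents, child) / N(parents)).
    Hypothetical helper for illustration only.
    """
    joint = count_configurations(rows, parents + (child,))
    marginal = count_configurations(rows, parents)
    return sum(n * math.log(n / marginal[cfg[:-1]]) for cfg, n in joint.items())


# Toy data set (an assumption): four observations of three categorical variables.
data = [
    (0, 1, 0),
    (0, 1, 1),
    (1, 0, 0),
    (0, 1, 0),
]

counts = count_configurations(data, variables=(0, 1))
ll = log_likelihood(data, child=2, parents=(0, 1))
```

Here counts maps each observed configuration of variables 0 and 1 to its frequency. A structure learning search issues a very large number of such queries, which is why a production toolkit would rely on far more memory-efficient strategies than this dictionary-based sketch.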

MEADS

OAC Core: Small: Scalable Non-linear Dimensionality Reduction Methods to Accelerate Scientific Discovery

MEADS - Manifolds for Extreme-scale Applied Data Science - is a joint project with the University at Buffalo Data Science (UBDS) group of Dr. Varun Chandola (lead PI) and the research group of Dr. Olga Wodo.

This multidisciplinary research project aims at developing scalable end-to-end non-linear dimensionality reduction solutions to accurately learn the dynamic behavior of complex systems (e.g., described by PDEs). The project is centered around the following topics:

New realizations of non-linear spectral dimensionality reduction methods to learn manifolds in distributed memory environments such as MPI and Map/Reduce clusters of multi-core processors.
Design of new algorithmic strategies to manage data influx while maintaining crucial properties of the discovered sub-manifolds.
Development of end-to-end solutions for cutting-edge applications in advanced manufacturing.

To learn more, please visit the official project web page: https://ubdsgroup.github.io/meads/.

Deliverables

APSPark is an efficient and scalable All-Pairs Shortest-Path solver for Apache Spark. On a modest Spark cluster (e.g., 1024 Intel Xeon cores), the solver can handle arbitrary undirected graphs with over 200,000 vertices.
IsomapSpark is a tool to efficiently learn manifolds from large-scale high-dimensional data. The method is based on Isomap spectral dimensionality reduction and is implemented entirely in Apache Spark.
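
For readers unfamiliar with the all-pairs shortest-path problem, the following is a minimal sequential sketch in Python using the textbook Floyd-Warshall algorithm. It is not the algorithm or API of APSPark or IsomapSpark; the function name floyd_warshall and the toy graph are assumptions chosen for illustration. The same computation underlies Isomap, which approximates geodesic distances on a manifold by shortest-path lengths in a neighborhood graph built over the data points.

```python
import math


def floyd_warshall(n, edges):
    """Return the n x n matrix of shortest-path lengths for an undirected graph.

    n     -- number of vertices, labeled 0..n-1
    edges -- iterable of (u, v, weight) tuples with non-negative weights
    """
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0.0
    for u, v, w in edges:
        # Undirected graph: keep the lightest edge in both directions.
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Allow paths to pass through intermediate vertices 0..k, one k at a time.
    for k in range(n):
        for i in range(n):
            dik = dist[i][k]
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    return dist


# Toy undirected graph (an assumption): four vertices, four weighted edges.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0), (2, 3, 1.0)]
dist = floyd_warshall(4, edges)
```

The triple loop performs on the order of n^3 operations and stores an n x n distance matrix, so at 200,000 vertices a single-machine version like this one becomes impractical, which is the motivation for a distributed solver.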

MiDAS

Collaborative Research: QRM: Microstructure Manifold Analysis Using Hierarchical Set of Morphological, Topological, and Process Descriptors