Visit our GitLab repository to see our latest software development activities!
- SABNAtk: Toolkit for fast counting over categorical data. It can power any application that involves counting data configurations, e.g., to estimate likelihoods or evaluate model scoring functions. Currently, SABNAtk is the workhorse supporting SABNA.
- SABNA: Scalable Accelerated Bayesian Network Analytics. This actively developed software toolkit provides a set of sequential and parallel tools for structure learning of Bayesian networks. It is designed to provide exact solutions on large-scale data within acceptable time limits.
- APSPark: Efficient and scalable All-Pairs Shortest-Path solver for Apache Spark. On a modest Spark cluster (e.g., 1024 Intel Xeon cores), the solver can handle arbitrary undirected graphs with over 200,000 vertices. This work has been extended by Mohammad Javanmard, Zafar Ahmad, and colleagues into DPSPark to cover a broader spectrum of dynamic programming algorithms.
- IsomapSpark is a tool to efficiently learn manifolds from large-scale high-dimensional data. The method is based on Isomap spectral dimensionality reduction and is implemented entirely in Apache Spark.
- ELaSTIC is a software suite for rapid identification and clustering of similar sequences from large-scale biological sequence collections. At its core is an efficient MinHash-based strategy that detects similar sequence pairs without aligning all sequences against each other. It is designed to work with data sets consisting of millions of DNA/RNA or amino acid strings, using various alignment criteria.
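
To give a feel for the MinHash idea mentioned in the ELaSTIC entry, here is a minimal Python sketch. It is not ELaSTIC's implementation: the function names, the k-mer length `k=4`, the number of hash functions, and the use of seeded BLAKE2b hashes are all assumptions chosen for illustration. Each sequence is reduced to a short signature, and the fraction of matching signature positions estimates the Jaccard similarity of the two k-mer sets, so candidate pairs can be found without aligning every pair.

```python
import hashlib


def kmers(seq, k=4):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer; the seed selects the hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, k=4, num_hashes=64):
    """Build a signature: the minimum hash value under each seeded hash function."""
    shingles = kmers(seq, k)
    if not shingles:
        raise ValueError("sequence is shorter than k")
    return [min(seeded_hash(s, seed) for s in shingles)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree (estimates Jaccard similarity)."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any real data set.
    a = minhash_signature("ACGTACGTTAGCCGATAGCT")
    b = minhash_signature("ACGTACGTTAGCCGATAGGT")
    print(f"estimated similarity: {estimate_jaccard(a, b):.2f}")
```

In a pipeline of this kind, only pairs whose estimated similarity exceeds a chosen threshold would be passed on to a full alignment step. The threshold and signature length trade accuracy against cost and would need tuning for real data.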