Visit our GitLab repository to see our latest software development activity!
- SABNAtk: Toolkit for fast counting over categorical data. It can power any application that involves counting data configurations, e.g., estimating likelihoods or evaluating model scoring functions. Currently, SABNAtk is the workhorse supporting SABNA.
- SABNA: Scalable Accelerated Bayesian Network Analytics. This actively developed toolkit provides a set of sequential and parallel tools for structure learning of Bayesian networks. It is designed to provide exact solutions on large-scale data within acceptable time limits.
- APSPark: Efficient and scalable All-Pairs Shortest-Path solver for Apache Spark. On a modest Spark cluster (e.g., 1024 Intel Xeon cores), the solver can handle arbitrary undirected graphs with over 200,000 vertices.
- ELaSTIC: Software suite for rapid identification and clustering of similar sequences in large-scale biological sequence collections. At its core is an efficient MinHash-based strategy that detects similar sequence pairs without aligning all sequences against each other. It is designed to work with data sets consisting of millions of DNA/RNA or amino acid strings, using various alignment criteria.
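
To give a feel for the MinHash idea mentioned for ELaSTIC, here is a minimal Python sketch. It is not ELaSTIC's implementation or API: the function names (`kmers`, `minhash_signature`, `estimate_jaccard`), the k-mer length, the number of hash functions, and the use of salted BLAKE2b as the hash family are all assumptions made for illustration. Each sequence is reduced to a short signature, and the fraction of matching signature positions estimates the Jaccard similarity of the two k-mer sets, so candidate pairs can be found without aligning them.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, k=4, num_hashes=64):
    """Build a MinHash signature: one minimum hash value per hash function.

    The hash family is simulated by salting BLAKE2b with the function index
    (an illustrative choice, not taken from ELaSTIC).
    """
    shingles = kmers(seq, k)
    if not shingles:
        raise ValueError("sequence is shorter than k")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in shingles
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    # Hypothetical toy sequences; real inputs would be much longer.
    a = minhash_signature("ACGTACGTGACCTGA")
    b = minhash_signature("ACGTACGTGACCTGT")
    print(estimate_jaccard(a, b))
```

Only pairs whose estimated similarity exceeds a chosen threshold would then be passed on to a full alignment step. Comparing fixed-length signatures is far cheaper than aligning every pair, which is what makes this kind of filtering attractive for collections with millions of sequences.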