Software

Visit our Codeberg.org repository to see the latest software development activity!

SMARTEn: Framework for mobile DNA analytics. This software suite aims to provide new ways to analyze metagenomic DNA using fully mobile setups. It is the foundation for the Coriolis metagenomic classifier.

SCoOL: A simple programming model designed to facilitate and accelerate the search-space exploration phase of optimization processes.

SABNAtk: Toolkit for fast counting over categorical data. It can power any application that involves counting data configurations, e.g., estimating likelihoods or evaluating model scoring functions. Currently, SABNAtk is the workhorse behind SABNA.
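To make "counting data configurations" concrete, here is a minimal sketch of how configuration counts feed a likelihood computation. It uses a naive dictionary-based counter from the Python standard library; it is not SABNAtk's API or its internal data structures, and the function names `count_configurations` and `log_likelihood` are hypothetical. The score shown is the maximum-likelihood log-likelihood of a variable given its parents, i.e., the sum of N_ijk * log(N_ijk / N_ij) over parent configurations j and child values k.

```python
from collections import Counter
from math import log


def count_configurations(data, variables):
    """Count how often each joint configuration of `variables` occurs.

    `data` is a list of rows; each row is a tuple of categorical values,
    and `variables` is a list of column indices (hypothetical layout).
    """
    return Counter(tuple(row[v] for v in variables) for row in data)


def log_likelihood(data, child, parents):
    """Maximum-likelihood log-likelihood of `child` given `parents`.

    Computed purely from counts: sum of N_ijk * log(N_ijk / N_ij), where
    N_ijk counts (parent configuration j, child value k) and N_ij counts
    parent configuration j alone.
    """
    joint = count_configurations(data, parents + [child])
    parent_counts = count_configurations(data, parents)
    total = 0.0
    for config, n_ijk in joint.items():
        n_ij = parent_counts[config[:-1]]  # drop the child value
        total += n_ijk * log(n_ijk / n_ij)
    return total


# Four observations of two binary variables (columns 0 and 1).
data = [(0, 1), (0, 1), (1, 0), (1, 1)]
print(count_configurations(data, [0, 1]))
print(log_likelihood(data, child=1, parents=[0]))
```

Every score evaluation reduces to the same pair of count queries, which is why a fast counting engine pays off when a structure-learning search evaluates many candidate parent sets.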

SABNA: Scalable Accelerated Bayesian Network Analytics. This actively developed toolkit provides sequential and parallel tools for Bayesian network structure learning. It is designed to find exact solutions on large-scale data within acceptable time.

APSPark: Efficient and scalable All-Pairs Shortest-Path solver for Apache Spark. On a modest Spark cluster (e.g., 1024 Intel Xeon cores), the solver can handle arbitrary undirected graphs with over 200,000 vertices. Mohammad Javanmard, Zafar Ahmad and colleagues have extended this work into DPSPark, which covers a broader spectrum of dynamic programming algorithms.
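A common way to make all-pairs shortest paths amenable to distributed execution is the blocked (tiled) Floyd-Warshall algorithm: the distance matrix is split into b x b tiles, and each round processes the pivot tile, then the tiles in its row and column, and finally all remaining tiles, which are independent and can run in parallel. The sketch below is a sequential, pure-Python illustration of that technique under this assumption; it does not use Spark, is not APSPark's code, and the names `distance_matrix` and `blocked_floyd_warshall` are hypothetical. It assumes non-negative edge weights.

```python
INF = float("inf")


def distance_matrix(n, edges):
    """Build an n x n distance matrix from undirected weighted edges."""
    D = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        D[u][v] = min(D[u][v], w)
        D[v][u] = min(D[v][u], w)  # undirected graph: symmetric entries
    return D


def _update(D, rows, cols, pivots):
    """Min-plus relaxation of tile (rows, cols) through the pivot vertices."""
    for k in pivots:
        row_k = D[k]
        for i in rows:
            d_ik = D[i][k]
            if d_ik == INF:
                continue
            row_i = D[i]
            for j in cols:
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate


def blocked_floyd_warshall(D, b):
    """In-place blocked Floyd-Warshall with tile size b (non-negative weights)."""
    n = len(D)
    blocks = [range(s, min(s + b, n)) for s in range(0, n, b)]
    for p, K in enumerate(blocks):
        # Phase 1: the pivot (diagonal) tile depends only on itself.
        _update(D, K, K, K)
        # Phase 2: tiles in the pivot row and pivot column.
        for q, B in enumerate(blocks):
            if q != p:
                _update(D, K, B, K)
                _update(D, B, K, K)
        # Phase 3: all other tiles; mutually independent, hence parallelizable.
        for r, I in enumerate(blocks):
            if r == p:
                continue
            for c, J in enumerate(blocks):
                if c != p:
                    _update(D, I, J, K)
    return D


edges = [(0, 1, 4), (1, 2, 1), (0, 2, 7), (2, 3, 2), (3, 4, 3)]
D = blocked_floyd_warshall(distance_matrix(5, edges), b=2)
for row in D:
    print(row)
```

Choosing b equal to the number of vertices collapses the algorithm to the classic single-tile Floyd-Warshall, which is a convenient correctness check for the tiled version.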

IsomapSpark is a tool to efficiently learn manifolds from large-scale high-dimensional data. The method is based on Isomap spectral dimensionality reduction and is implemented entirely in Apache Spark.

ELaSTIC is a software suite for rapid identification and clustering of similar sequences in large-scale biological sequence collections. At its core is an efficient MinHash-based strategy that detects similar sequence pairs without aligning all sequences against each other. It is designed for data sets of millions of DNA/RNA or amino acid sequences and supports various alignment criteria.
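The idea behind MinHash-based candidate detection can be illustrated with a short sketch: each sequence is reduced to its set of k-mers, a fixed-length signature keeps the minimum value of each of several hash functions over that set, and the fraction of matching signature entries estimates the Jaccard similarity of the k-mer sets. To avoid comparing every pair of signatures, the sketch additionally assumes locality-sensitive banding, where only sequences whose signatures agree on an entire band become candidates for alignment. All function names, the hashing scheme (BLAKE2b followed by random affine maps modulo a Mersenne prime), and the parameters k, number of hashes, and number of bands are illustrative assumptions, not ELaSTIC's implementation.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

PRIME = (1 << 61) - 1  # Mersenne prime used for the affine hash family


def kmers(seq, k):
    """Set of all length-k substrings (works for DNA/RNA or amino acids)."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def base_hash(kmer):
    """Deterministic 64-bit hash of a k-mer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def make_hash_params(num_hashes, seed=42):
    """Random (a, b) pairs defining h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(seq, k, params):
    """One minimum per hash function over the sequence's k-mer set."""
    values = [base_hash(s) for s in kmers(seq, k)]
    return [min((a * x + b) % PRIME for x in values) for a, b in params]


def estimated_jaccard(sig1, sig2):
    """Fraction of agreeing signature entries estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def candidate_pairs(signatures, bands):
    """Pairs of names whose signatures agree on at least one full band."""
    rows = len(next(iter(signatures.values()))) // bands
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(name)
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return candidates


params = make_hash_params(64)
seqs = {"s1": "ACGTACGTTGCAACGTAGCT",
        "s2": "ACGTACGTTGCAACGTAGCA",
        "s3": "TTTTTTTTTTTTTTTTTTTT"}
sigs = {name: minhash_signature(s, 5, params) for name, s in seqs.items()}
print(candidate_pairs(sigs, bands=16))
```

Only the candidate pairs would then be passed to an actual alignment step; the number of bands trades sensitivity (more bands catch weaker similarities) against the number of alignments performed.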