Tutorials, Sep. 20, 2014:

Integrated Analysis of Next-gen Sequencing Data Using variant tools

Bo Peng

Assistant Professor, Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center

Suzanne Leal

Professor, Department of Molecular and Human Genetics, Baylor College of Medicine.

Tutorial Website: http://varianttools.sourceforge.net/Tutorial/ACM-BCB2014

Abstract: Calling, annotating, filtering, and analyzing millions of genetic variants from whole-genome and whole-exome studies can be difficult due to the wide array of data formats, tools, and annotation sources, as well as the sheer size of the data files. A big chunk of a bioinformatician's time can be wasted on writing and maintaining scripts to convert data between different file formats, handle annotations from different sources, and connect inputs and outputs of various tools to create data processing pipelines. variant tools is a flexible data analysis toolset that provides a powerful command line interface to import and manipulate genetic variants and genotypes, to annotate variants using a large number of annotation databases, and to locate disease predisposing variants using more than 20 rare-variant association tests implemented in an association analysis framework called variant association tools (VAT). Using a sample whole exome sequencing project, this tutorial demonstrates how to perform quality control, annotation, variant selection, and association analysis for next-gen sequencing studies using variant tools and VAT.

Intended Audience:

Researchers who have used or plan to use Illumina or other next-gen sequencing platforms to identify genetic variants associated with a phenotype of interest (e.g. cancer) would be interested in learning how to use variant tools to analyze their data. Researchers who do not analyze such data personally might also be interested in learning the pipelines used to analyze such data (e.g. steps for quality control and association analysis).

Brief biography of the presenters:

Dr. Bo Peng is an assistant professor at the Department of Bioinformatics and Computational Biology, the University of Texas MD Anderson Cancer Center. With a background in computer science and biostatistics, he has worked on various topics in genetic epidemiological studies of complex diseases using genome-wide association, and currently whole-genome and whole-exome sequencing approaches. Dr. Peng is the author of simuPOP, one of the top population genetics simulation programs, and the lead author of a related book "Forward-time population genetics simulations, methods, implementation, and applications" (Wiley-Blackwell, ASIN: B0072LWP12). He organized two workshops on the applications of simuPOP, one at the University of Alabama at Birmingham and another at Rice University, Houston. Dr. Peng is the core developer of variant tools and has extensive experience in analyzing whole genome and whole exome sequencing projects using this tool.

Dr. Suzanne Leal's research interests focus on the mapping of complex and Mendelian traits and understanding the interactions between genes and between genes and the environment. Her recent work on methods to analyze rare variants has led to the development of the Combined Multivariate and Collapsing (CMC) and the Kernel Based Adaptive Cluster (KBAC) methods to test for rare variant associations with complex traits. Dr. Leal's group implemented these and other commonly used rare variant association tests in Variant Association Tools, which is an extension to Variant Tools. Dr. Leal organizes and teaches annual gene mapping courses at the Rockefeller University (New York, NY, USA), Max Delbrück Center (Berlin, Germany) and Helmholtz Institute (Munich, Germany). Recently, she has taught statistical genetics courses at Beijing University (Beijing, People's Republic of China), European School of Genetic Medicine (Bologna, Italy), University of Helsinki (Helsinki, Finland) and University of Oslo (Oslo, Norway).

Informatics approaches to Evidence-Based Medicine, with emphasis on Systematic Reviews


Aaron M. Cohen, MD, MS, Oregon Health & Science University
Neil R. Smalheiser, MD, PhD, University of Illinois at Chicago

Tutorial Website:


National policy, insurance reimbursements, and funding agencies increasingly focus on the need for evidence-based recommendations to support clinical guidelines and therapies. Systematic reviews and meta-analyses play a central role in informing evidence-based medicine practice and policy. Standardized methods for comprehensive collection of relevant evidence and careful filtering for quality are applied by experts to assess the overall state of knowledge about a medical question. These questions may address, for example, confidence in the relative and absolute efficacy and safety of treatments for a specific disease. This process requires a large investment of time and manual team effort. Dr. Cohen will focus in detail on the process of assembling evidence and writing and updating systematic reviews. He will identify steps that are particularly cumbersome and inefficient and are therefore good candidates for assistance from automated informatics models and tools. He will discuss recent approaches to improving the systematic review process, ranging from machine learning tools to streamline effort, to alternative "rapid reviews" strategies, to data mining of community medical data. Dr. Smalheiser will describe a pipeline of three machine learning-based tools designed to re-engineer the literature retrieval and triage steps: a metasearch engine for collecting relevant articles with high recall, a publication type tagger to identify randomized controlled trial articles with high accuracy, and a tool to identify distinct articles based on the same underlying trial.

Intended Audience: The tutorial is directed primarily at computer scientists who have little background in medical informatics but who are interested in identifying research opportunities in evidence-based medicine, and particularly in the field of systematic reviews. We expect that few, if any, attendees will be experts or active practitioners in evidence-based medicine. We believe that 2 hours is sufficient to give an in-depth overview, to summarize some current informatics projects in this area, and to allow some time for discussion and questions.

Brief biography of the presenters

Aaron M. Cohen is Associate Professor in OHSU's Department of Medical Informatics and Clinical Epidemiology. His research interests focus on the development and application of text-mining techniques and tools for biomedical researchers. He applies information retrieval and machine learning techniques to scientific literature and databases to help researchers to more effectively use and explore the ever-expanding biomedical literature. Aaron received an M.D. from the University of Michigan, and holds a master's degree in biomedical informatics from OHSU. Current projects include the use of automated classification systems in the process of creating systematic drug reviews, and the development and evaluation of computer assisted biomedical question answering systems. cohenaa@ohsu.edu, 503-494-0046.

Neil R. Smalheiser is Associate Professor in Psychiatry at UIC. He has almost 30 years of experience pursuing basic wet-lab research in neuroscience, most recently studying synaptic plasticity and the genomics of small RNAs. He has also directed multi-disciplinary, multi-institutional consortia dedicated to text mining and bioinformatics research, which have created new theoretical models, databases, open source software, and web-based services. Regardless of the subject matter, one common thread in his research is to link and synthesize different datasets, approaches and apparently disparate scientific problems to form new concepts and paradigms. Dr. Smalheiser has organized panels and symposia at numerous international conferences and has participated in earlier ACM health informatics conferences. neils@uic.edu, 312-413-4581.

Robot Motion Planning Methods for Modeling Structures and Motions of Biomolecules


Amarda Shehu, Assistant Professor in Department of Computer Science, Affiliated Appointments in Department of Bioengineering and School of Systems Biology, George Mason University
Nurit Haspel, Assistant Professor in Department of Computer Science, University of Massachusetts at Boston

Tutorial Website: http://www.cs.gmu.edu/~ashehu/Tutorials/Shehu-Haspel-BCB14-Tutorial/


In the last two decades, great progress has been made in molecular modeling through robotics-inspired computational treatments of biological molecules. Deep mechanistic analogies between articulated robots and biomolecules have allowed robotics researchers to apply methods originally developed for the robot motion planning problem to elucidate the relationship between macromolecular structure, dynamics, and function in computational structural biology. Tight coupling of approaches based on robot motion planning with computational physics and statistical mechanics has resulted in powerful methods capable of elucidating protein-ligand binding, order of secondary structure formation in protein folding, kinetic and thermodynamic properties of folding and equilibrium fluctuations in proteins and RNA, loop motions in proteins, small-scale and large-scale motions in multimodal proteins transitioning between different stable structures, and more. The objective of this tutorial is to introduce the broad community of researchers and students at ACM-BCB to robotics-inspired treatments and methodologies for modeling structures and motions in biomolecules. A comprehensive review of the current state of the art, ranging from the probabilistic roadmap approach to tree-based approaches, will be accompanied by specific detailed highlights and software demonstrations of powerful and recent representative robotics-inspired methods for peptides, proteins, and RNA.

Intended Audience: The objective of this tutorial is to introduce the broad bioinformatics and computational biology community of students and researchers that attend ACM-BCB to robotics-inspired treatments and methodologies put forth by robotics researchers for the purpose of understanding and elucidating the role of structure and dynamics in the biological function of key macromolecules, such as proteins and RNA.

Brief biography of the presenters

Shehu has unique expertise in tight coupling of robotics-inspired probabilistic search and optimization with computational protein biophysics, and has made significant contributions to modeling native structures, equilibrium fluctuations and conformational ensembles, loop motions, large-scale motions connecting diverse functional states, and assembly of proteins and peptides since 2005. Her work has resulted in 56 peer-reviewed publications (26 to journals, 28 to conferences and workshops, and 2 book chapters) from 2006 to 2014. In particular, the subject material of the proposed tutorial will draw from a recent chapter written by Shehu on "Probabilistic Search and Optimization for Protein Energy Landscapes" for CRC Press, which summarizes robotics-inspired and evolutionary-inspired approaches for capturing structure and dynamics in proteins. amarda@gmu.edu, 703-993-4135.

Haspel has substantial expertise in robotics-inspired methods for modeling domain motions and self-assembly in proteins, as well as characterizing intermediate states in protein conformational spaces. She has made significant contributions to modeling protein-protein interactions using a combination of algorithmic techniques and biophysics-based atomic level simulations. Her work has resulted in 41 peer-reviewed publications (24 to journals, 12 to conferences and workshops, and 5 book chapters) from 2003 to 2013. Both presenters have extensive experience in teaching and in presenting seminars, invited talks, and conference papers. In particular, Shehu and Haspel are assistant professors who regularly teach robotics, bioinformatics, and computational biology courses in their departments. nurit.haspel@umb.edu, 617-287-6414

Network approaches in aging research with focus on biological network alignment


Dr. Tijana Milenkovic, Assistant Professor, Department of Computer Science and Engineering, University of Notre Dame
Fazle Elahi Faisal, Ph.D. Student, Department of Computer Science and Engineering, University of Notre Dame

Tutorial Website: http://www.cse.nd.edu/~cone/acmbcb2014/nrabna.html


Genes (proteins) interact with each other to keep us alive. And this is exactly what biological networks (BNs) model. Therefore, BN research promises to revolutionize our biological understanding. Because susceptibility to diseases increases with age, studying human aging is important. But studying human aging experimentally is hard. Hence, aging-related knowledge needs to be transferred from model species. This transfer has traditionally been carried out by genomic sequence alignment. But because sequence data and BN data can give complementary insights, sequence alignment alone can limit the knowledge transfer. Thus, BN alignment can be used to transfer aging-related knowledge between topologically and functionally conserved network regions of different species. Gene expression research has also been indispensable for investigating aging, but it typically ignores genes' interconnectivities. Thus, analyzing genes' topologies in BNs could contribute to our understanding of aging. However, current methods for analyzing systems-level BNs deal with their static representations, although cells are dynamic. Because of this, and because different data can give complementary biological insights, current static BNs can be integrated with aging-related expression data to form dynamic, age-specific BNs. Then, cellular changes with age can be studied from such BNs. This tutorial will review state-of-the-art BN research of aging.
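
To make the idea of age-specific BNs concrete, here is a minimal Python sketch of one plausible way to combine a static network with expression data. It is an illustration only, not the presenters' method: it assumes that an interaction is kept at a given age only when both of its genes are expressed at that age. The function name, the toy gene labels, and the toy ages are all hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def age_specific_networks(
    static_edges: Set[Edge],
    expressed_by_age: Dict[int, Set[str]],
) -> Dict[int, Set[Edge]]:
    """Build one network per age from a static network.

    Assumption (hypothetical rule for illustration): an edge is present
    at a given age only if both of its endpoints are expressed then.
    """
    networks: Dict[int, Set[Edge]] = {}
    for age, active in expressed_by_age.items():
        # Keep the subgraph induced by the genes active at this age.
        networks[age] = {
            (u, v) for (u, v) in static_edges if u in active and v in active
        }
    return networks


# Toy data: a static path network A-B-C-D and two made-up ages.
static_edges = {("A", "B"), ("B", "C"), ("C", "D")}
expressed_by_age = {20: {"A", "B", "C"}, 80: {"B", "C", "D"}}

nets = age_specific_networks(static_edges, expressed_by_age)
for age in sorted(nets):
    print(age, sorted(nets[age]))
```

Comparing the per-age edge sets (for example, which interactions appear or disappear between ages) is one simple way to study how a gene's network position changes with age under this assumed rule.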

Intended Audience: The tutorial will be designed at the introductory level, giving enough background on the proposed topics. It is intended to bring together scientists at all stages of their career with interests or expertise in computational or mathematical analyses of BNs, or in applications of the methods to practical problems in biology, such as aging, evolution, disease, or therapeutics.

Brief biography of the presenters

Tijana Milenkovic and Fazle Faisal work in Complex Networks and Computational Biology on developing mathematical and computational methods for efficient extraction of functional information from the topology of large, noisy, heterogeneous, and dynamic real-world networks, focusing mostly on molecular, physiological, and social networks. Milenkovic's research has been funded by NSF and NIH. Milenkovic and Faisal have published several articles on topics relevant to the tutorial in key journals, e.g., Science, PNAS, Bioinformatics, Journal of the Royal Society Interface, BMC Bioinformatics, or BMC Systems Biology, as well as conferences, e.g., European Conference on Computational Biology (ECCB) and the ACM Conference on Bioinformatics, Computational Biology, and Biomedicine (ACM BCB).

Computational Prediction of Protein-Protein Interfaces with Emphasis on Partner-Specific Protein-Protein Interactions


Vasant G Honavar, Professor and Edward Frymoyer Chair of Information Sciences and Technology, Professor of Bioinformatics and Genomics, Professor of Neuroscience, Pennsylvania State University

Li Xue, Postdoctoral Research Associate, The Huck Institutes of the Life Sciences, Pennsylvania State University.

Tutorial Website: TBA


Protein-protein interactions play a central role in the formation of complexes and pathways that carry out virtually all major cellular processes. Both the distortion of protein interfaces in obligate complexes and aberrant recognition in transient complexes can lead to disease. Because of the difficulties and cost associated with experimental determination of protein complexes, there is an urgent need for reliable computational methods for predicting protein-protein interfaces from the sequence and/or structure of a protein and, when available, its putative binding partner. Although most protein-protein interactions, in particular transient interactions, are partner-specific, most existing protein interface predictors are not. Our group's recent work on partner-specific protein-protein interface prediction has shown that, in the case of complexes resulting from transient protein-protein interactions, interfaces are highly conserved across homologous complexes, and we have exploited this finding to design reliable partner-specific protein-protein interface predictors. We will briefly review the current state of the art in computational methods for protein-protein interface prediction. We will introduce both sequence homology based and machine learning based partner-specific protein-protein interface predictors. We will also discuss the challenges of evaluating the performance of such predictors using commonly used metrics such as AUC and offer some alternatives.

Intended Audience: This tutorial is broadly targeted to Computer Scientists, Computational Biologists and Bioinformaticists interested in developing computational methods for analysis and prediction of protein interactions and interfaces, and to Structural Biologists, Immunologists, etc. interested in using such methods to understand sequence and structural correlates of protein-protein interactions, uncover the molecular basis of diseases, and design novel therapies and drugs.

Brief biography of the presenters

Dr. Honavar is a professor and the Edward Frymoyer Chair of Information Sciences and Technology, Professor of Genomics and Bioinformatics and Professor of Neuroscience at Pennsylvania State University. Dr. Vasant Honavar received his Ph.D. in Computer Science and Cognitive Science in 1990 from the University of Wisconsin-Madison, specializing in Artificial Intelligence. During 1990-2013, Dr. Honavar was at Iowa State University, where he was on the faculty of Computer Science and of Bioinformatics and Computational Biology (which he co-founded in 1999 and led as chair during 2003-2005). During 2010-2013, Honavar served as a Program Director in the Division of Information and Intelligent Systems at the National Science Foundation where, among other things, he led the Big Data Program. Honavar's current research and teaching interests include Artificial Intelligence, Machine Learning, Bioinformatics, Big Data Analytics, Computational Molecular Biology, Data Mining, Discovery Informatics, Information Integration, Knowledge Representation and Inference, Semantic Technologies, and Health Informatics.
Honavar has led several NSF, NIH, and USDA funded research projects that have resulted in foundational research contributions (documented in over 250 peer-reviewed publications with over 8000 citations) in: Scalable approaches to building predictive models from large, distributed, semantically disparate data (big data); Constructing predictive models from sequence, image, text, multi-relational, graph-structured data; Eliciting causal information from multiple sources of observational and experimental data; Selective sharing of knowledge across disparate knowledge bases; Representing and reasoning about preferences; Composing complex services from components; and Applications in bioinformatics and computational biology (especially analysis and prediction of protein-protein, protein-DNA, and protein-RNA interactions and interfaces, B-cell and T-cell epitopes, post-translational modifications), Social Network Informatics, Health Informatics, Energy Informatics, Security Informatics, and related areas. Honavar has developed and taught courses in Artificial Intelligence, Machine Learning, Bioinformatics, Computational Structural Biology, Computational Functional Genomics and Systems Biology. Honavar has supervised 30 Ph.D. students, 6 postdoctoral fellows, 25 Masters students and 16 undergraduate research students, many of whom have started professorships. He is an Associate Editor of IEEE/ACM Transactions on Computational Biology and Bioinformatics, and a member of the editorial board of several journals including the Springer Open Journal of Big Data.

Dr. Xue is a postdoctoral research fellow in the College of Information Sciences and Technology and the Huck Institutes of the Life Sciences at Pennsylvania State University. She obtained her Ph.D. degree in Bioinformatics and Computational Biology with a minor in Statistics at Iowa State University, and her master's degree in Machine Learning and Artificial Intelligence at Shanghai Jiaotong University. Dr. Xue and her collaborators have pioneered the partner-specific method to analyze the conservation of locations of the interacting sites in protein-protein complexes, and have successfully applied partner-specific conserved interfacial residues to reliably single out near-native protein-protein interaction models generated by docking programs. She has published several research articles on these topics.