CSE 577: Processing of Strings and Sequences

Course Information

Instructor

Dr. Jaroslaw Zola

Department of Computer Science and Engineering
Department of Biomedical Informatics

Email: jzola@buffalo.edu
Web: http://www.jzola.org/

For all email communication, please add the prefix [PSS] to the subject line.

Course Description

This course is intended for students interested in learning efficient techniques for processing and analyzing large text collections, such as large-scale system logs, massive text corpora, medical records, or databases of DNA and protein sequences. The main focus is on fast algorithms and data structures for strings and sequences, including pattern matching, pairwise comparison, indexing and searching, as well as probabilistic methods such as fingerprinting and hashing. The theoretical component is complemented by practical considerations regarding efficient implementations of the discussed algorithms and their applications in real-world systems. Example applications include tools like UNIX grep, frameworks for plagiarism detection, and tools driving computational biology (e.g., BLAST, read mappers, DNA assemblers). The course also has a programming component, in which students implement small but fully functional text processing applications in their language of choice.

This course is the Software and Information Systems focus area course at CSE.

Course Organization

The course consists of a series of lectures covering multiple algorithms on strings and sequences, including their design, analysis and real-world applications. Lectures are complemented with programming assignments exposing practical aspects of the covered material. Below is a list of topics usually covered in the course:

  1. Exact pattern matching: Knuth-Morris-Pratt, Boyer-Moore and Aho-Corasick algorithms
  2. Suffix Trees: construction, querying, applications
  3. Suffix Arrays and LCP arrays: construction, querying, applications
  4. BWT and FM-Index: construction, querying, applications
  5. Succinct data structures: bit vectors, wavelet trees
  6. Winnowing, fingerprinting and locality-sensitive hashing for text processing
  7. Inexact matching and pairwise sequence comparison: Smith-Waterman and Needleman-Wunsch algorithms

Course Prerequisites

The course has no specific prerequisites for CSE graduate students. However, it requires solid experience in synthesis and analysis of algorithms, at least at the level of “CSE 250: Data Structures and Their Algorithms.” The course has a programming component, hence the ability to write working and correct code is a must. The course is programming-language agnostic.

Program Outcomes

Upon completion of this course you will:

Course Requirements

The course has three requirements:

  1. Midterm exam testing your understanding of the most basic string algorithms and the ability to reason about their performance and applicability.
  2. Programming assignments exposing you to the practical aspects of the covered material. Each assignment will be a mini-project implementing a small text processing application (e.g., grep).
  3. Final exam testing your overall understanding of the material.

Grading Policy

The final grade will be a weighted average: 20% midterm exam, 30% final exam, 50% programming assignments. The number-to-letter grade mapping will be done as indicated in the table below.

Score     Grade   Points
95-100    A       4.0
90-94     A-      3.67
80-89     B+      3.33
70-79     B       3.0
60-69     B-      2.67
55-59     C+      2.33
50-54     C       2.0
45-49     C-      1.67
40-44     D       1.0
0-39      F       0.0
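
As an illustration of how the weights and the table combine, here is a minimal Python sketch. It is not an official grade calculator. The function names are hypothetical, each component score is assumed to be on a 0-100 scale, and fractional scores are assumed to fall into the band whose lower bound they reach (the table lists only integer ranges).

```python
# Percentage weights from the grading policy.
WEIGHTS = {"midterm": 20, "final": 30, "assignments": 50}

# (lower bound, letter, points) rows from the table, highest band first.
# Assumption: a score belongs to the first band whose lower bound it reaches.
SCALE = [
    (95, "A", 4.0), (90, "A-", 3.67), (80, "B+", 3.33), (70, "B", 3.0),
    (60, "B-", 2.67), (55, "C+", 2.33), (50, "C", 2.0), (45, "C-", 1.67),
    (40, "D", 1.0), (0, "F", 0.0),
]


def final_score(scores):
    """Weighted average of component scores, each assumed to be in 0-100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a 0-100 score to (letter, points) using the table above."""
    for lower, letter, points in SCALE:
        if score >= lower:
            return letter, points
    raise ValueError("score must be non-negative")


if __name__ == "__main__":
    total = final_score({"midterm": 80, "final": 90, "assignments": 100})
    print(total, letter_grade(total))
```

For example, scores of 80 on the midterm, 90 on the final, and 100 on the assignments give 0.2 × 80 + 0.3 × 90 + 0.5 × 100 = 93, which falls in the 90-94 band.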

In general, no incomplete grades (“IU” or “I”) will be given. However, in special circumstances that are truly beyond your control and justify an incomplete grade, we will follow the university policy on incomplete grades.

Course Materials

This course does not have a required textbook. However, the following book is highly recommended, as the course is roughly based on its content:

  1. D. Gusfield, “Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology,” Cambridge University Press, 1997.

Additional readings (e.g., papers, tutorials) will be referenced throughout the course as needed.

Academic Integrity

Academic integrity is a fundamental university value. Through the honest completion of academic work, students sustain the integrity of the university and of themselves while facilitating the university’s imperative for the transmission of knowledge and culture based upon the generation of new and innovative ideas.

You must be familiar with the university and departmental policies on academic integrity!!! The university policies for graduate students are available here. The CSE policies are available from the CSE web page.

Any violation of these policies, including but not limited to cheating on any course deliverable (e.g., homework project, exam), will result in automatic failure of the course. There will be no leniency! If you decide to use code from an external source, e.g., an open source project, you must include a proper and clearly visible attribution in your product (it is a good idea to contact your instructor to check if the code you plan to use is admissible).

Accessibility Resources

If you have any disability which requires reasonable accommodations to enable you to participate in this course, please contact the Office of Accessibility Resources in 60 Capen Hall, 716-645-2608 and also the instructor of this course during the first week of class. The office will provide you with information and review appropriate arrangements for reasonable accommodations, which can be found on the web at: http://www.buffalo.edu/studentlife/who-we-are/departments/accessibility.html.

University Support Services

As a student you may experience a range of issues that can cause barriers to learning or reduce your ability to participate in daily activities. These might include strained relationships, anxiety, high levels of stress, alcohol/drug problems, feeling down, health concerns, or unwanted sexual experiences. Counseling, Health Services and Health Promotion are here to help with these or other issues you may experience. You can learn more about these programs and services by contacting:

Counseling Services
120 Richmond Quad (North Campus), 716-645-2720
202 Michael Hall (South Campus), 716-829-5800
https://www.buffalo.edu/studentlife/who-we-are/departments/counseling.html

Health Services
Michael Hall (South Campus), 716-829-3316
https://www.buffalo.edu/studentlife/who-we-are/departments/health.html

Office of Health Promotion
114 Student Union (North Campus), 716-645-2837
https://www.buffalo.edu/studentlife/who-we-are/departments/health-promotion.html

Sexual Violence

UB is committed to providing a safe learning environment free of all forms of discrimination and sexual harassment, including sexual assault, domestic and dating violence and stalking. If you have experienced gender-based violence (intimate partner violence, attempted or completed sexual assault, harassment, coercion, stalking, etc.), UB has resources to help. This includes academic accommodations, health and counseling services, housing accommodations, helping with legal protective orders, and assistance with reporting the incident to police or other UB officials if you so choose. Please contact UB’s Title IX Coordinator at 716-645-2266 for more information. For confidential assistance, you may also contact a Crisis Services Campus Advocate at 716-796-4399.

Please be aware UB faculty are mandated to report violence or harassment on the basis of sex or gender. This means that if you tell me about a situation, I will need to report it to the Office of Equity, Diversity and Inclusion. You will still have options about how the situation will be handled, including whether or not you wish to pursue a formal complaint. Please know that if you do not wish to have UB proceed with an investigation, your request will be honored unless UB’s failure to act does not adequately mitigate the risk of harm to you or other members of the university community. You also have the option of speaking with trained counselors who can maintain complete confidentiality. UB’s Options for Confidentially Disclosing Sexual Violence provides a full explanation of the resources available, as well as contact information. You may call UB’s Office of Equity, Diversity and Inclusion at 716-645-2266 for more information, and you have the option of calling that office anonymously if you would prefer not to disclose your identity.

Copyright 2018-2021 Jaroslaw Zola jzola@buffalo.edu