Instructor:     Jason Corso (UBIT: jcorso)
 
Course Webpage:     http://www.cse.buffalo.edu/~jcorso/t/CSE555
(http://www.cse.buffalo.edu/~jcorso/t/CSE455 is just a link to the same page).
 
Syllabus:     http://www.cse.buffalo.edu/~jcorso/t/CSE555/files/syllabus.pdf.
 
 
 Meeting Times:     TR 11:00-12:20
 
 
 Location:     Knox 4
 
 
 Recitation Times:     W 9-10 (Clemens 107), M 10-11 (Baldy 115)
 
 
Teaching Assistants:     
 
 Shujie Liu (ubit: sl252) and Suxin Guo (ubit: suxinguo)
 
Office Hours:
      
      - Instructor: R 12:20-2:30 (Davis 332)
      
 
      - TA: Shujie: W 4-5 and F 9-11 in Davis 302 area.
 
      - TA: Suxin: M 1-3 and R 3:30-5:30 in Davis 302 area.
 
    
 Final Exam:     Thursday 3 May 2012, 11:45–2:45 in Knox 04.
 
A Note On Contacting The Instructor:     You are encouraged to contact the instructor or TA via the newsgroup rather than email. If you choose
 email, then you must 1) send the email from a buffalo.edu address and 2) include [CSE555] at the beginning of the subject line (even if you
 are in CSE455). Email that does not follow these conventions will not be read.
 
 
 
 
 
 
 
 
   -  May 1 -- Homework 3 Problem 1 Solutions (from TA).
   
 -  Apr. 25 -- New solution files available.  See below for 
   complete list.
   
 -  Apr. 24 -- Solution Files Available
        
   
 -  Apr. 24 -- Homework 3 deadline is extended until April 30 
   Midnight.
   
 -  Apr. 13 -- Fixed a small problem with prpy/lindisc.py that many 
   of you have noticed.  You can get the fix from the same location.
   
 -  Apr. 10 -- Homework 3 Posted
   
 -  Apr 10 -- Changed schedule for last couple of weeks and added 
   the readings for the second half of the semester that were missing 
   from this page.
   
 -  Mar. 20 -- Homework 2 Posted [src/dat zip]
   
 -  Mar. 7 -- Midterm Exam is Thursday during class time.  Exam is 
   closed book.  Simple calculators are permitted at the exam (no 
   cell-phones are permitted as calculators).
   
 -  Mar. 1 -- Midterms from 2009, 2010 and 2011 are available for your 
   reference.  However, please note three important differences:  (1) 455 
   was not offered in 2009 or 2010, (2) the order of topics may have 
   been different (you are responsible for everything we've covered in 
   the class and assigned readings), and (3) the midterm will have less 
   mathematical derivation and more working of actual problems to 
   ensure you've absorbed the material.
 
   
 -  Feb 28 -- Homework 1 is now due.  Homework 2 will be posted 
   this week.  Midterm is next Thursday.
   
 -  Feb 17 -- The example code is now supported 
     as a bzr repository on the cse server.
   
 -  Feb 17 -- Information on how to submit your 
     homework assignments is posted.
   
 -  Feb 14 -- Scanned notes for Linear Discriminants are also 
     posted.
   
 -  Feb 14 -- Updates to source code available, 
   including more datatools and linear discriminants.  Data to work with 
   the example code is also available at http://www.cse.buffalo.edu/~jcorso/t/555code/data
 
   
 -  Jan 31 -- Homework 1 Posted (due 
     Feb. 27)
   
 -  Jan 26 -- Initial source code available.
   
 -  Jan 17 -- First Class.
 
 
 
 
 
 
 
 
 
 Slides are linked off of the week number on the left column.  These
 will be updated as the semester proceeds. 
 
 
 
   
 
 
 
 
 
 
 Code: Background
 
 Students are required to learn and use Python (i.e., SciPy, NumPy) in the course. All
 programming materials given in lecture and all programming aspects of the homeworks will be given in Python. A brief introduction to scientific
 Python will be given in the course, but it is the students’ responsibility to get up to speed. Additional Python resources will be maintained at
 http://www.cse.buffalo.edu/~jcorso/t/CSE555/python_resources.html.
                                                                                                      
 
No work in Matlab, Java, C/C++, OCaml or other programming environment is allowed in this course.
 
To allow for a common Python environment, the course will officially rely on the Enthought Python Distribution (EPD)
 http://www.enthought.com/products/epd.php, which is easy to get, free, and includes the packages needed for our material. The
 course will use EPD version 7.2. Students are encouraged to install it on their own computers, and it is also installed on the CSE network
 (see https://wiki.cse.buffalo.edu/services/content/enthought-python-distribution for more
 information).
 
 
 
Code: Access
 
 The professor will make all of the source code discussed in class 
 available to the students.  In addition, some pieces of source code 
 will be provided as part of the homework assignments.
 
 The source code discussed in class and the core package are 
 accessible to the students in three ways:
 
   -  Via the web: http://www.cse.buffalo.edu/~jcorso/t/555code/
 
   
 -  On the departmental (student) Unix network: 
   /home/csefaculty/jcorso/555code.  You can copy the whole 
   directory with rsync:
     rsync -Cavuz USERNAME@nickelback.cse.buffalo.edu:/home/csefaculty/jcorso/555code .
 
 
 -  The directory is actually a bzr repository to which you 
   should have read access.  So, you can just pull a copy of the 
   repository (you will not have privileges to commit) with:
     bzr checkout bzr+ssh://USERNAME@nickelback.cse.buffalo.edu//home/csefaculty/jcorso/555code/code
   This option is particularly of interest because the code will be 
   periodically updated throughout the semester and you will want to have 
   the most recent version.
 
 
 
 
 Note that the source code will be updated periodically throughout the 
 semester, and you need to get the latest versions.
 
 Also, note that toy data is included at the above location as well.  
 It is not in the repository, however.
 
 
 
 
 We are trying to be as paperless as possible.  So, you will need to 
 submit your homeworks in electronic form using the CSE department's 
 submit scripts.
 
 For each homework assignment, you need to submit the writeup and the 
 source code (Python) including some README file with the code.  Note 
 that the programming questions have all been set up to use Dr. Corso's 
 skeleton and we will hence directly execute the code for grading.
 
 To submit, follow these steps, replacing homework### with 
 the specific homework name, such as homework1, homework2, etc.
 
 
   -  Login to a department student Unix machine, hadar, metallica, 
   nickelback, pollux, styx, timberlake (the code MUST work on the CSE 
   machines).
   
 -  Use "tar -cvf homework###.tar list-of-files-or-directories".
   
 -  Then, type "submit_cse555 homework###.tar" if you are in 
 CSE555 or type "submit_cse455 homework###.tar" if you are in CSE455.
 
 
 
 If you run into any problems, you should tell us immediately so that we 
 can rectify them.  The timestamp on the submitted file is used as the 
 submission time, and late submissions are not accepted.
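
 The packaging step above can also be scripted from Python. The following is a minimal sketch using the standard-library tarfile module; it writes an uncompressed archive, roughly what "tar -cvf" produces. The helper name make_submission and the file names in the usage note are hypothetical and are not part of the course code. The submit_cse555 or submit_cse455 script still has to be run by hand on a CSE machine.

```python
import os
import tarfile


def make_submission(archive_name, paths):
    """Pack files and directories into an uncompressed tar archive.

    Roughly equivalent to: tar -cvf archive_name path1 path2 ...
    (make_submission is a hypothetical helper, not part of the course code.)
    """
    # Check everything first so a typo does not leave a partial archive behind.
    missing = [p for p in paths if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(", ".join(missing))

    with tarfile.open(archive_name, "w") as tar:  # "w" means plain tar, no compression
        for path in paths:
            tar.add(path)  # directories are added recursively

    # Return the member names so the contents can be inspected before submitting.
    with tarfile.open(archive_name, "r") as tar:
        return tar.getnames()
```

 For example, make_submission("homework1.tar", ["README", "writeup.pdf", "src"]) would bundle a README, a writeup, and a source directory, assuming those names exist in the current directory.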
 
 
 
 
 
 
 
 
Main Course Material
 Course Catalog Description:     Foundations of pattern recognition algorithms and machines, including statistical and structural methods. Data
 structures for pattern representation, feature discovery and selection, classification vs. description, parametric and non-parametric classification,
 supervised and unsupervised learning, use of contextual evidence, clustering, recognition with strings, and small sample-size
 problems.
 
Prerequisites:     It is assumed the students have a working knowledge of calculus, linear algebra, and probability theory. It is also assumed the
 students have some experience programming in a scientific computing environment.
                                                                                                        
                                                                                                        
 
Course Goals:     After taking the course, the student should have a clear understanding of 1) the design and construction of a pattern recognition
 system and 2) the major approaches in statistical and syntactic pattern recognition. The student should also have some exposure to the theoretical
 issues involved in pattern recognition system design such as the curse of dimensionality. Finally, the student will have a clear working knowledge
 of implementing pattern recognition techniques and the scientific Python computing environment. These goals are evaluated through the course
 project, homeworks, and exams.
 
Textbooks:     The main (required) textbook for the course is
      
      - Duda, R.O., Hart, P.E., and Stork, D.G. Pattern Classification. Wiley-Interscience. 2nd Edition. 2001.
      
 
 The textbook has a website: http://www.rii.ricoh.com/~stork/DHS.html.
 
Recommended supplemental textbooks are
      
      - Bishop, C. M. Pattern Recognition and Machine Learning. Springer. 2007.
      
 
      - Marsland, S. Machine Learning: An Algorithmic Perspective. CRC Press. 2009. (Also uses Python.)
      
 
      - Theodoridis, S. and Koutroumbas, K. Pattern Recognition. Edition 4. Academic Press, 2008.
      
 
      - Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence. 2003.
      
 
      - Bishop, C. M. Neural Networks for Pattern Recognition. Oxford University Press. 1995.
      
 
      - Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning. Springer. 2001.
      
 
      - Koller, D. and Friedman, N. Probabilistic Graphical Models. MIT Press. 2009.
 
 
 
Course Work
 The course work for this offering differs from that of the past few offerings by Prof. Corso. This year, there will be no project; instead, there will be four
 homeworks that each involve both theoretical and practical aspects of the material. Rather than cramming a project into the end of the term, the same
 amount of work will be distributed throughout the term, allowing for more comprehensive coverage of the material in both theory and
 practice.
 
Homeworks:     There will be four homeworks, equally weighted. They will cover both theoretical and practical (implementation) aspects of the
 material. Students may collectively discuss the homework problems, but they must write them independently.
 
No sharing of any source code or written/typed materials is permitted. No stealing of any source code or written/typed materials from the internet
 is permitted. No utilization of any third-party libraries, other than those explicitly mentioned in the assignment description, is permitted. Refer to
 the Academic Integrity statement at the end of the syllabus for more information; a zero tolerance policy on cheating will be adopted in this course.
 This means simply if you cheat once you will get an F.
 
Course Project:     There is no course project. See above.
 
Programming Environment: Scientific Python:     
 
 
   
Turning in Assignments: Paperless:     Students are required to use the departmental submit scripts to turn in assignments. No hardcopy
 assignments will be accepted for either the theoretical problems (scan if needed) or the implementations. More information will be given in the
 specific homework assignments.
 
Code will be run by the TAs on all programming aspects of the work.
 
 
Course Evaluation
 The following is a description of how students will be evaluated in the course. The instructor reserves the right to make minor adjustments as
 necessary.
 
A final percentage score will be calculated as a weighted average of the course work according to the following table:
      
      - Mid-Term Exam (30%)
      
 
      - Final Exam (30%)
      
 
      - Homeworks (10% each for 40%)
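
 As a worked illustration of the weighted average above, here is a minimal sketch. The weights come from the table; the function name final_percentage and the assumption that every component is scored on a 0-100 scale are ours, not the syllabus's.

```python
# Weights in percent, taken from the table above (four homeworks at 10% each).
WEIGHTS = {"midterm": 30, "final": 30, "homework": 10}


def final_percentage(midterm, final, homeworks):
    """Weighted final score; all inputs are assumed to be on a 0-100 scale."""
    if len(homeworks) != 4:
        raise ValueError("expected four homework scores")
    total = (WEIGHTS["midterm"] * midterm
             + WEIGHTS["final"] * final
             + WEIGHTS["homework"] * sum(homeworks))
    return total / 100
```

 For example, a midterm of 80, a final of 90, and homework scores of 100, 70, 85 and 95 give (30*80 + 30*90 + 10*350) / 100 = 86.0.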
 
 Letter grades will be given in the range of F to A (with minuses and pluses). Mapping of raw percentage scores to letter grades will be based on the
 following rubric: Letter grade A is given for raw percentage scores of 87.5 and higher for 555 and 85 and higher for 455. The remaining letter grades
 are assigned by clustering the students' scores, with each cluster mean mapped to a letter grade in decreasing order (essentially, this means
 grading on a curve); this is based on overall class performance.
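
 The syllabus does not say which clustering method is used for the curve. Purely to illustrate the idea, the sketch below gives an A by the fixed cutoff and then runs a simple one-dimensional k-means over the remaining scores, handing out letters in decreasing order of cluster mean. The choice of k-means, the function names, and the letter list (which omits pluses and minuses) are all assumptions and should not be read as the actual grading procedure.

```python
def kmeans_1d(values, k, iters=100):
    """Simple 1-D k-means. Returns (labels, centers)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly across the score range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each score to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members (empty clusters stay put).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def assign_letters(scores, a_cutoff=87.5, letters=("B", "C", "D", "F")):
    """Map {student: raw percentage} to letters; a_cutoff would be 85 for 455."""
    grades = {name: "A" for name, s in scores.items() if s >= a_cutoff}
    rest = {name: s for name, s in scores.items() if s < a_cutoff}
    if rest:
        k = min(len(letters), len(set(rest.values())))
        labels, centers = kmeans_1d(list(rest.values()), k)
        # Rank the non-empty clusters by mean, highest first.
        used = sorted(set(labels), key=lambda c: centers[c], reverse=True)
        rank = {c: i for i, c in enumerate(used)}
        for name, lab in zip(rest, labels):
            grades[name] = letters[rank[lab]]
    return grades
```

 Under this sketch, a student with a higher raw score never receives a lower letter than a student with a lower raw score, which is the property a curve of this kind is meant to preserve.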
 
Computer code will be executed during the grading on all homeworks using provided driver scripts on novel but similar data. If a program does
 not execute without an error (this is Python, so no compilation is required), then no points will be awarded, not even partial credit.
 Otherwise, half of the points are awarded for correctness of the output and the other half are awarded based on correctness of the
 code.
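
 The rule above can be written out as a small function. This is a sketch under stated assumptions: the function name programming_points is hypothetical, and scoring each half as a fraction between 0 and 1 is our reading of how the two halves are judged; the syllabus only states the half-and-half split and the zero for code that fails to run.

```python
def programming_points(max_points, ran_without_error, output_fraction, code_fraction):
    """Points earned on one programming problem.

    output_fraction and code_fraction are the grader's judgments in [0, 1]
    (an assumption; the syllabus only specifies the 50/50 split).
    """
    if not ran_without_error:
        return 0.0  # code that fails to run earns nothing, not even partial credit
    for fraction in (output_fraction, code_fraction):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    half = max_points / 2
    return half * output_fraction + half * code_fraction
```

 For example, on a 20-point problem, code that runs, produces fully correct output, and is judged half correct earns 10 + 5 = 15 points, while code that crashes earns 0 regardless of its quality.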
 
Distinctions between 455 and 555 grading:     455 and 555 will be graded on separate curves, and the mapping to grade A is different.
 In addition, 455 students will be required to solve fewer problems on the exams; the specific number will be determined per
 exam.
 
Late Work and Missed Exam Policy:     No late work will be accepted. Ample time will be given to complete the homeworks;
 use it wisely. Similarly, the dates of the exams will be known far in advance. Do not miss the exams. No make-up exams will be given other
 than for University-approved reasons. This is a firm policy. Do not expect special treatment.
 
Regrading:     If you have a question about the grading of any piece of work, first consult with the teaching assistant who graded your work. If you
 cannot resolve your questions with the teaching assistant, you should consult with the instructor of the course.
 
Any questions about the grading of a piece of work must be raised within one week of the date that the work was returned by the teaching
 assistant or the instructor. In other words, if you do not pick up your work in a timely fashion, you may forfeit your right to question the grading
 of your work.
 
Incomplete (“I”) Grades:     Generally, incomplete (“I”) grades are not given. However, very rarely, circumstances truly beyond a student’s
 control prevent him or her from completing work in the course. In such cases the instructor can give a grade of “I.” The student will be given
 instructions and a deadline for completing the work, usually no more than 30 days past the end of the semester. University and department policy
 dictate that “I” grades can be given only if the following conditions are met:
                                                                                                        
                                                                                                        
      
      - An Incomplete will only be given for missing a small part of the course.
      
 
      - An Incomplete will only be given when the student misses work due to circumstances beyond his/her control.
      
 
      - An Incomplete will only be given when the student is passing the course except for the missed material.
      
 
      - An Incomplete is to be made up with the original course instructor within the time specified by the appropriate University regulation
      (see appropriate document above), and usually within the following semester.
      
 
      - An Incomplete will not be given to allow the student to informally retake the entire course, and have that grade count as the grade
      of the original course.
 
 Incompletes cannot be given as a shelter from poor grades. It is your responsibility to make a timely resignation from the course if you are doing
 poorly for any reason. The last day to resign the course is Friday, March 30, 2012.
 
 
Course Outline
 The following is the list of topics we will cover this semester. The selection of topics has been made to provide the student with both a fair
 sampling and an in-depth, useful know-how of the big field of pattern recognition. This has required that we drop some topics completely (e.g.,
 Neural Networks) to allow for more in-depth discussion of other topics (e.g., Dimension Reduction). As many topics as possible will be grounded
 in real-world problems and data, and they will be presented both in terms of the mathematical theory and in terms of the algorithmic and
 programming aspects.
 
A calendar will be maintained on the course website and updated as the semester proceeds. This outline may change to adapt to
 interest and progress (or lack thereof). The flow of topics is also different this term than in previous offerings by Prof. Corso; the
 changes are based on feedback received from students and are in the interest of optimizing the effectiveness and interest of the
 course.
 
      
      - Introduction to Pattern Recognition
      
 
      - Tree Classifiers (Getting our feet wet with real classifiers)
          
          - Decision Trees: CART, C4.5, ID3.
          
 
          - Random Forests
 
       
      - Bayesian Decision Theory (Grounding our inquiry)
      
 
      - Linear Discriminants (Discriminative Classifiers: the Decision Boundary)
      
          
          - Separability
          
 
          - Perceptrons
                                                                                                        
                                                                                                        
          
 
          - Support Vector Machines
 
       
      - Parametric Techniques (Generative Methods grounded in Bayesian Decision Theory)
      
          
          - Maximum Likelihood Estimation
          
 
          - Bayesian Parameter Estimation
          
 
          - Sufficient Statistics
 
       
      - Non-Parametric Techniques
      
          
          - Kernel Density Estimators
          
 
          - Parzen Window
          
 
          - Nearest Neighbor Methods
 
       
      - Unsupervised Methods (Exploring the Data for Latent Structure)
      
          
          - Component Analysis and Dimension Reduction
              
              - The Curse of Dimensionality
              
 
              - Principal Component Analysis
              
 
              - Fisher Linear Discriminant
              
 
              - Locally Linear Embedding
 
           
          - Clustering
          
              
              - K-Means
              
 
              - Expectation Maximization
              
 
              - Mean Shift
 
           
       
      - Classifier Ensembles
          
          - Bagging
          
 
          - Boosting / AdaBoost
 
                                                                                                        
                                                                                                        
       
      - Graphical Models (The Modern Language of Pattern Recognition and Machine Learning)
      
          
          - Introductory ideas and relation back to earlier topics
          
 
          - Bayesian Networks
          
 
          - Sequential Models
          
              
              - State-Space Models
              
 
              - Hidden Markov Models
              
 
              - Dynamic Bayesian Networks
 
           
       
      - Algorithm Independent Topics (Theoretical Treatments in the Context of Learned Tools)
      
          
          - No Free Lunch Theorem
          
 
          - Ugly Duckling Theorem
          
 
          - Bias-Variance Dilemma
          
 
          - Jackknife and Bootstrap Methods
 
       
      - Other Items Time Permitting
          
          - Syntactic Methods
          
 
          - Neural Networks
 
       
 
 
Additional Information
 Newsgroup:     There is a newsgroup, sunyab.cse.555, for this course. You must learn how to read news and subscribe to this newsgroup. You are
 expected to read the newsgroup on a daily basis. There will often be important material posted there, such as supplementary course notes,
 homework and sample exam questions, and occasionally late breaking news. You may post general course related articles to the newsgroup.
 Use discretion in posting articles related to homework assignments: when in doubt, e-mail the TA or instructor
 first.
 
All 455 students should use the 555 newsgroup as well.
 
The news (nntp) server you need to connect to is news.buffalo.edu. Note that you must authenticate using your UBIT name and password to use
 this news server, and you must be connecting from a UB IP address (i.e. if you are not using a university machine, you need to use VPN). For
 further information on accessing the newsgroup, refer to http://ubit.buffalo.edu/newsgroups/index.php.
 
Similar Courses at This and Other Institutions:     (incomplete and in no important order)
                                                                                                        
                                                                                                        
      
 
 
General Notes
 If you don’t understand something covered in class, ask about it right away. The only silly question is the one which is not asked. If
 you get a poor mark on an assignment or exam, find out why right away. Don’t wait a month before asking. The instructor and
 teaching assistant are available to answer your questions. Don’t be afraid to ask questions, or to approach the instructor or TA
 in class, during office hours, through the newsgroup or through e-mail. This course is intended to be hard work, but it is also
 intended to be interesting and fun. We think pattern recognition is interesting and exciting, and we want to convince you of
 this.
 
 
Disabilities
 If you have a diagnosed disability (physical, learning, or psychological) that will make it difficult for you to carry out the course work as
 outlined, or that requires accommodations such as recruiting note-takers, readers, or extended time on exams or assignments,
 you must consult with the Office of Disability Services (25 Capen Hall, Tel: 645-2608, TTY: 645-2616, Fax: 645-3116,
 http://www.student-affairs.buffalo.edu/ods/). You must advise your instructor during the first two weeks of the course so that
 we may review possible arrangements for reasonable accommodations.
 
 
Counseling Center
 Your attention is called to the Counseling Center (645-2720), 120 Richmond Quad. The Counseling Center staff are trained to help you deal with a
 wide range of issues, including how to study effectively and how to deal with exam-related stress. Services are free and confidential. Their web site
 is http://www.student-affairs.buffalo.edu/shs/ccenter/.
 
 
Distractions In The Classroom - Behavioral Expectations
 The following is the text of a policy adopted by the Faculty Senate on 5/2/2000. You are expected to know and adhere to this
 policy.
 
      
                                                                                                        
                                                                                                        
      
 
 
OBSTRUCTION OR DISRUPTION IN THE CLASSROOM - POLICIES
 UNIVERSITY AT BUFFALO
 
      To prevent and respond to distracting behavior faculty should clarify standards for the conduct of class, either in the syllabus, or by
      referencing the expectations cited in the Student Conduct Regulations. Classroom “etiquette” expectations should
      include:
     
     - Attending classes and paying attention. Do not ask an instructor in class to go over material you missed by skipping a
     class or not concentrating.
     
 
     - Not coming to class late or leaving early. If you must enter a class late, do so quietly and do not disrupt the class by
     walking between the class and the instructor. Do not leave class unless it is an absolute necessity.
     
 
     - Not talking with other classmates while the instructor or another student is speaking. If you have a question or a
     comment, please raise your hand, rather than starting a conversation about it with your neighbor.
     
 
     - Showing respect and concern for others by not monopolizing class discussion. Allow others time to give their input
     and ask questions. Do not stray from the topic of class discussion.
     
 
     - Not eating and drinking during class time.
     
 
     - Turning off the electronics: cell phones, pagers, and beeper watches.
     
 
     - Avoiding audible and visible signs of restlessness. These are both rude and disruptive to the rest of the class.
     
 
     - Focusing on class material during class time. Sleeping, talking to others, doing work for another class, reading the
     newspaper, checking email, and exploring the internet are unacceptable and can be disruptive.
     
 
     - Not packing bookbags or backpacks to leave until the instructor has dismissed class.
     
 
       
 
 
Academic Integrity
 A zero-tolerance policy on cheating will be adopted in this course. The following is the formal statement of academic integrity. Source:
 http://www.cse.buffalo.edu/graduate/policies_acad_integrity.php
 
The academic degrees and the research findings produced by our Department are worth no more than the integrity of the process by which they are
 gained. If we do not maintain reliably high standards of ethics and integrity in our work and our relationships, we have nothing
 of value to offer one another or to offer the larger community outside this Department, whether potential employers or fellow
 scholars.
 
For this reason, the principles of Academic Integrity have priority over every other consideration in every aspect of our departmental life, and we
 will defend these principles vigorously. It is essential that every student be fully aware of these principles, what the procedures are by which
 possible violations are investigated and adjudicated, and what the punishments for these violations are. Wherever they are suspected,
 potential violations will be investigated and determinations of fact sought. In short, breaches of Academic Integrity will not be
 tolerated.
                                                                                                        
                                                                                                        
 
 
University Statements on Academic Integrity
 The University at Buffalo Department of Computer Science and Engineering endorses and adheres to the University policy on Academic Integrity.
 Students should be familiar with that policy, as expressed in the following documents:
      
 
 
Departmental Statement on Academic Integrity in Coding Assignments and Projects
 The following statement further describes the specific application of these general principles to a common context in the CSE Department
 environment, the production of source code for project and homework assignments. It should be thoroughly understood before undertaking any
 cooperative activities or using any other sources in such contexts.
 
All academic work must be your own. Plagiarism, defined as copying or receiving materials from a source or sources and submitting this material
 as one’s own without acknowledging the particular debts to the source (quotations, paraphrases, basic ideas), or otherwise representing the work of
 another as one’s own, is never allowed. Collaboration, usually evidenced by unjustifiable similarity, is never permitted in individual assignments.
 Any submitted academic work may be subject to screening by software programs designed to detect evidence of plagiarism or
 collaboration.
 
It is your responsibility to maintain the security of your computer accounts and your written work. Do not share passwords with anyone, nor write
 your password down where it may be seen by others. Do not change permissions to allow others to read your course directories and files. Do not
 walk away from a workstation without logging out. These are your responsibilities. In groups that collaborate inappropriately, it may be
 impossible to determine who has offered work to others in the group, who has received work, and who may have inadvertently
 made their work available to the others by failure to maintain adequate personal security. In such cases, all will be held equally
 liable.
 
These policies and interpretations may be augmented by individual instructors for their courses. Always check the handouts and web pages of your
 course and section for additional guidelines.
 
 
Departmental Policy on Violations of Academic Integrity
 Any student accused of a violation of academic integrity will be so notified by the course director. An informal review will be conducted, including
 a meeting between these parties. After this review and upon determination that a violation has occurred, the following sanctions will be imposed. It
 is the policy of this department that, in general, any violation of academic integrity will result in an F for the course, that all
 departmental financial support including teaching assistantship, research assistantship or scholarships be terminated, that
 notification of this action be placed in the student’s confidential departmental record, and that the student be permanently
 ineligible for future departmental financial support. A second violation of academic integrity will cause the department to
 seek permanent dismissal from the major and bar from enrollment in any departmental courses. Especially flagrant violations
 will be considered under formal review proceedings, which may in addition to the above sanctions result in expulsion from the
 University.