The Department of Computer Science & Engineering
 STUART C. SHAPIRO: CSE 116 B

# Performance, Searching & Sorting

Riley, Chapter 3

Time vs. Space
Two major measures of program performance are space and time. They are often inversely related: the Space-Time Tradeoff.
For example, it may take a long time for your browser to download some web page, but if you're willing to spend the space to store it on your hard disk (cache it), it will be faster to view it subsequent times.
Since computer memory is relatively cheap, we will be more concerned with time performance of programs.

Basic time efficiency
Never do something multiple times, when you can do it only once.
For example, store the value of a computation in a variable, instead of computing it several times.
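For example, in Java (a hypothetical sketch using `Math.sqrt`, not code from the text):

```java
public class Reuse {
    // Wasteful: recomputes Math.sqrt(x) on every iteration of the loop.
    public static double slowSum(double x, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += Math.sqrt(x);
        }
        return sum;
    }

    // Better: compute Math.sqrt(x) once, store it, and reuse it.
    public static double fastSum(double x, int n) {
        double root = Math.sqrt(x);   // computed exactly once
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += root;
        }
        return sum;
    }
}
```

Both methods return the same value; the second just avoids repeating the same computation n times.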

Don't sweat the small stuff
In most programs, most time is spent in (iterative or recursive) loops.
We will focus on them.

Don't sweat the small stuff, part 2.
For small data sets, even the most inefficient program may run fast enough.
We will be concerned with how the time increases when the size of the data set increases.

The meaning of "real-time"
A program runs in real-time if it runs fast enough for the humans interacting with it.
Example: A program to predict the next day's weather should take less than a day to run.
Example: A tournament-level computer chess player must play within the same time constraints as human players.

Moral: For realistic data sets, when the program runs quickly enough, that's fine, but we'll still be concerned with how the time increases when the size of the data set increases.

Example
If it takes 1 hour to look up 100 cities, and record the state each is in, how long will it take to do that for 200 cities?
If it takes 1 hour to create a table of distances between 100 cities, how long will it take to do that for 200 cities?
This illustrates the difference between linear growth (grows like n), and quadratic growth (grows like n²).
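Doubling the number of cities doubles the linear task (2 hours) but quadruples the quadratic one (4 hours). A sketch with hypothetical operation-counting methods makes this concrete:

```java
public class Growth {
    // Recording each city's state touches each city once: n operations.
    public static int stateLookups(int n) {
        int ops = 0;
        for (int i = 0; i < n; i++) {
            ops++;                    // one lookup per city
        }
        return ops;
    }

    // A distance table pairs every city with every city: n * n entries.
    public static int distanceEntries(int n) {
        int ops = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                ops++;                // one entry per pair of cities
            }
        }
        return ops;
    }
}
```

Going from 100 to 200 cities, `stateLookups` grows from 100 to 200, while `distanceEntries` grows from 10,000 to 40,000.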

Big-Oh Performance
We say that the run-time performance of the city-state recording procedure is O(n), and of the distance-table creation procedure is O(n²). Big-oh notation ignores constant multipliers, and lower-order terms. See the text for a more detailed discussion.

Some frequently met performance (complexity) categories, in order: O(1) (constant); O(log n) (logarithmic); O(n) (linear); O(n log n); O(n²) (quadratic); O(n³) (cubic); O(nᵃ) (polynomial); O(aⁿ) (exponential).

Searching and Sorting
We will often use searching and sorting algorithms to compare the performance of different collection classes.

Searching
The basic search problem is: given a collection of elements, decide if some element is in it. If so, return it, or its position, or just some indication of that fact. If not, return some indication that it's not.
Examples: `contains(Object o)`; `String.indexOf(char ch)`.
Example tasks: Look up someone in the telephone directory; find a file with a given name on your disk.

Sorting
The basic sorting problem is: given a collection of elements, organize the elements so that the collection may be searched more efficiently, or printed, or operated on, or transferred to some other medium.
Examples: The telephone company must sort its customer database in order to print a telephone book; your mail tool keeps its address book sorted.

Modelling Searching and Sorting
We will often model general searching and sorting tasks with a collection of integers.

Search Problem:
Given an array of ints, a, and some int, x, if x is in a, return its index, otherwise return -1.
See four search programs: Linear, Ordered, and two versions of Binary search. Two are O(n). Two are O(log n).
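A linear search (one of the O(n) versions, sketched here rather than copied from the course code) simply scans the array left to right:

```java
public class LinearSearch {
    // Return the index of x in a, or -1 if x is not present.
    // Examines up to a.length elements, so it is O(n).
    public static int search(int[] a, int x) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == x) {
                return i;
            }
        }
        return -1;
    }
}
```

Note that linear search works on any array; it does not require the elements to be sorted.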

See the binary search simulator.

Analysis of binary search: probe once with an array of size n; probe once with size n/2; once with size n/4; ...; once with size 1. After k probes the remaining portion has size n/2ᵏ, which reaches 1 when k ≈ log₂ n. So we probe about log n times.
So binary search is O(log n).
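One of the O(log n) versions can be sketched iteratively as follows (an illustrative version, not necessarily the one in the course code). It requires the array to be sorted in non-decreasing order:

```java
public class BinarySearch {
    // Return the index of x in sorted array a, or -1 if x is not present.
    // Each probe halves the remaining range, so it is O(log n).
    public static int search(int[] a, int x) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;      // probe the middle element
            if (a[mid] == x) {
                return mid;
            } else if (a[mid] < x) {
                lo = mid + 1;             // x can only be in the upper half
            } else {
                hi = mid - 1;             // x can only be in the lower half
            }
        }
        return -1;
    }
}
```

Each iteration discards half of the remaining elements, which is exactly the n, n/2, n/4, ..., 1 pattern in the analysis above.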

Sorting Problem:
Given an array of ints, a, sort them into non-decreasing order.

See: an insertion sort program, and its simulator; a Selection sort program, and its simulator; and a Quicksort program, and its simulator.
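As one example, insertion sort can be sketched as follows (an illustrative version, not necessarily the program referred to above). It repeatedly inserts the next element into the already-sorted portion on its left:

```java
public class InsertionSort {
    // Sort a into non-decreasing order, in place.
    // Worst case: the two nested loops each run up to n times, so O(n²).
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];               // next element to insert
            int j = i - 1;
            // Shift elements larger than key one slot to the right.
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;               // drop key into the opened slot
        }
    }
}
```

Selection sort is also O(n²); Quicksort is O(n log n) on average.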