The Department of Computer Science & Engineering | STUART C. SHAPIRO: CSE 305
According to a theorem proved by Corrado Böhm and Giuseppe Jacopini (CACM, 1966), any procedure can be written with the following control structures:
- sequence
- selection
- loop
Using only these may require some code to be repeated. If a break (or exit) statement is added, code repetition is not needed.
Note that in the Preliminaries notes, I said that among the defining criteria of a programming language were facilities for sequence, selection, and loop.
A standard part of any imperative program is a sequence of statements, to be executed in sequential order. Functional programming languages get sequential execution from the left-to-right order of evaluating the arguments to a function. Logic programming languages also often have a left-to-right evaluation order.
Fortran indicates statement sequence by line sequence, but also allows a sequence of statements on one line separated by semi-colons (;).
The semi-colon (;) started out as a statement separator, but it turned out to be cleaner syntax to make it a statement terminator, as it is in most current languages.
A compound statement is a sequence of statements treated syntactically as one statement. The statements in a compound statement are usually surrounded by braces ({ ... }), but some languages use begin ... end or other bracketing keywords.
A block is a compound statement in which variables may be declared with scope limited to that compound statement.
The goto statement has the form goto <label>, where <label> is a symbolic or numeric label of some statement. (We will assume symbolic labels.) Generally, a statement is labeled using the syntax <label>: <statement>.
Some languages allow a goto target that is the result of a run-time computation.
There are two major objections to the use of goto:
Compare
x = 3.0; y = z / x;
and
x = 3.0; here: y = z / x;
The first obviously does not involve a divide by zero, but consider what you have to do to check that the second does not: every goto here in the program must be found, and the value of x at each one checked.
The use of the goto was challenged by Edsger Dijkstra in his famous letter to the editor, "Go To Statement Considered Harmful" (CACM, 1968). This launched the structured programming movement. Since the Böhm/Jacopini theorem proved that the goto is not needed, several subsequent languages downplayed, or even eliminated, the goto statement. Java, for example, does not have it.
An exit statement is an executable statement whose effect is to immediately continue execution at the statement immediately following the lexically innermost containing control structure. What control structures may be exited from is language-dependent.
The exit statement can eliminate the need for repeated code when using only the sequence, selection, and loop control structures. For example, consider the following pseudocode for reading and processing some input file, first using a goto:
loop: input := read(file);
      if (input = eof) goto out;
      process(input);
      goto loop;
out:  ...
Using a while, this may be rewritten as:
input := read(file);
while (input != eof) {
  process(input);
  input := read(file);
}
...
Note the repetition of the statement input := read(file). However, using exit, this may be rewritten as:
while (true) {
input := read(file);
if (input == eof) exit;
process(input);
}
...
(Of course, if we can combine reading and testing, this use of exit can be avoided:
while ((input := read(file)) != eof) {
process(input);
}
or in C,
while (scanf(format, input) != EOF) {
process(input);
}
Nevertheless, sometimes exit is the only way to avoid repeated code.)
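As a minimal sketch of the read-test-process pattern above, here is the same loop in Python, where break plays the role of exit. The names process_all and seen are illustrative assumptions, and an in-memory io.StringIO stands in for the input file.

```python
import io

def process_all(stream, process):
    """Read lines until end of input, calling process on each one."""
    while True:
        line = stream.readline()   # the read appears exactly once
        if line == "":             # readline() returns "" at end of file
            break                  # single-level exit from the while loop
        process(line.rstrip("\n"))

seen = []
process_all(io.StringIO("alpha\nbeta\ngamma\n"), seen.append)
print(seen)  # ['alpha', 'beta', 'gamma']
```

The read is written once and the test sits in the middle of the loop body, which is why this shape is sometimes called a loop-and-a-half.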
A single-level exit always exits from the lexically innermost control structure. A multi-level exit may exit from a more distant lexically containing control structure. In some languages, the multi-level exit takes a numerical parameter, i, and exits from the ith containing structure. In others, the multi-level exit takes a symbolic parameter, label, and exits from the control structure labeled label. This use of a label differs from, and is not as dangerous as, the statement label used as the target of a goto. Multi-level exit statements can be used to avoid code repetition that is needed if only the single-level exit is available.
In C-based languages, break is used as the exit statement. In C, C++, and Python break is a single-level exit. In Java it is a multi-level exit, taking an optional symbolic label.
Perl uses last as a multi-level exit with an optional symbolic label.
In Common Lisp (return-from <label> [<expression>]) is a multi-level exit. (return [<expression>]) is equivalent to (return-from nil [<expression>]), and serves as a single-level exit.
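Since Python's break is single-level, the following sketch contrasts two ways of leaving a pair of nested loops. The first needs a flag and a repeated test; the second wraps the loops in a function so that return can act like a multi-level exit. The function names and the grid data are assumptions made up for this illustration.

```python
def find_flagged(grid, target):
    """Single-level break only: a flag is tested again in the outer loop."""
    found = None
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                found = (r, c)
                break          # leaves only the inner loop
        if found is not None:  # repeated test needed to leave the outer loop
            break
    return found

def find_returning(grid, target):
    """Wrapping the loops in a function lets return leave both at once."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                return (r, c)
    return None

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(find_flagged(grid, 6), find_returning(grid, 6))  # (1, 2) (1, 2)
```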
- Selection
-
- The if Statements
-
- Single-Branch Selection
- The single-branch selection statement is usually of the form if (<expression>) <statement>. If the <expression> evaluates to a true value, the <statement> is executed. Otherwise it isn't. In either case, the next statement done is the statement following the if statement.
Perl has both an if (<expression>) <compound-statement> and an unless (<expression>) <compound-statement>. The latter executes the <compound-statement> if and only if the <expression> is false. In Perl, a <compound-statement>, surrounded by braces, is required, even if there is only one statement in it.
Common Lisp, like Perl, has both (when <expression> <form>) and (unless <expression> <form>) single-branch selection expressions.
- Double-Branch Selection
- The double-branch selection statement is usually of the form if (<expression>) <then-statement> else <else-statement> and is often referred to as the if-then-else statement. If the <expression> evaluates to a true value, <then-statement> is executed. Otherwise <else-statement> is. In either case, only one of the two statements is executed, and execution continues with the statement following the if statement. This is the selection referred to in the Böhm/Jacopini theorem. Of course the then statement could always be empty, reducing to a single-branch selection statement.
The if-then-else statement gave rise to a famous case of syntactic
ambiguity. When is the else statement executed in a case like
if (test1)
if (test2)
statement1
else
statement2
The two possibilities are: if test1 is true and test2 is false; or if test1 is false. In current languages, this is generally solved by a rule that matches the else with the nearest unmatched if. Thus, in the example above, statement2 would be done in the case that test1 is true and test2 is false. If the other case were wanted, brackets or a special keyword would need to be used. In Java, brackets would be used to turn the inner if statement into a one-statement compound statement:
if (test1) {
  if (test2)
    statement1
}
else
  statement2
Other languages end every if with a keyword such as end if, making the two cases
if (test1)
  if (test2)
    statement1
  else
    statement2
  end if
end if
and
if (test1)
  if (test2)
    statement1
  end if
else
  statement2
end if
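Python settles the same question through indentation rather than brackets or an end keyword. The sketch below writes out both readings of the ambiguous example; the function names and the returned strings are placeholders chosen for this illustration.

```python
def nearest_if(test1, test2):
    """The else pairs with the inner if (the nearest-unmatched-if reading)."""
    result = "nothing"
    if test1:
        if test2:
            result = "statement1"
        else:
            result = "statement2"
    return result

def outer_if(test1, test2):
    """The else pairs with the outer if."""
    result = "nothing"
    if test1:
        if test2:
            result = "statement1"
    else:
        result = "statement2"
    return result

for t1 in (True, False):
    for t2 in (True, False):
        print(t1, t2, nearest_if(t1, t2), outer_if(t1, t2))
```

The two functions differ exactly where the text says they should: when test1 is true and test2 is false, and when test1 is false.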
Common Lisp's double-branch selection expression is
(if test
then-expression
else-expression)
- Multi-Branch Selection
- The multi-branch selection that is the most straightforward
extension of the double-branch selection chooses one of multiple
statements to execute based on the first of multiple tests to evaluate
to true. In the C-based languages, it can be expressed as a nested set
of (right-associative) if-then-elses:
if (test1) statement1
else if (test2) statement2
else if (test3) statement3
...
else if (testn) statementn
else default-statement
Notice that if we used indentation and brackets to indicate statement
nesting, this would be
if (test1) {statement1}
else {if (test2) {statement2}
      else {if (test3) {statement3}
            ...
            else {if (testn) {statementn}
                  else {default-statement}}...}}
To flatten this, even syntactically, some languages combine the else with the subsequent if, giving an elseif (or elif) keyword. In Python, the above would look like
if test1:
    statement1
elif test2:
    statement2
elif test3:
    statement3
...
elif testn:
    statementn
else:
    default-statement
and does not involve nested selection statements.
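A small runnable instance of this form may help. The function letter_grade and its cutoffs are invented for this illustration; the point is only that the first test to evaluate to true selects the branch.

```python
def letter_grade(score):
    """Return the branch chosen by the first test that evaluates to true."""
    if score >= 90:
        return "A"
    elif score >= 80:   # reached only if score < 90
        return "B"
    elif score >= 70:   # reached only if score < 80
        return "C"
    else:               # default branch
        return "F"

print([letter_grade(s) for s in (95, 85, 72, 10)])  # ['A', 'B', 'C', 'F']
```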
Common Lisp's multiple-branch selection expression looks like this
latter version:
(cond (test1 expression11 expression12 ...)
(test2 expression21 expression22 ...)
(test3 expression31 expression32 ...)
...
(t default-expression1 default-expression2 ...))
- The case Statement
- The case statement, introduced by ALGOL W (1966), is a special
purpose multiple-branch selection statement for use when all the tests
test the same expression for equality to various values.
The general form of the case statement is captured in the version
in Common Lisp:
(case keyform
(keylist1 expression11 expression12 ...)
(keylist2 expression21 expression22 ...)
(keylist3 expression31 expression32 ...)
...
(t default-expression1 default-expression2 ...))
Each keylist must be a list of literal values (they are not evaluated). To evaluate the case expression, the keyform is evaluated. If its value is listed in one of the keylists, the expressions of that keylist are evaluated in order, and the value of the case expression is the value of the last such expression. If the value of the keyform is not listed in one of the keylists, and the optional t case is present, the default-expressions are evaluated, and the value of the last one of those is the value of the case expression. No key may appear in more than one keylist.
Several languages limit the case keys to be ordinal values. That
way, the case statement may be compiled into a table of instructions
indexed by the key.
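The idea of a table indexed by the key can be sketched in Python with a list of functions. This is only an analogy for what a compiler might generate, not a description of any particular compiler; all the names below are assumptions.

```python
def handle_zero():
    return "zero"

def handle_one():
    return "one"

def handle_two():
    return "two"

def handle_default():
    return "other"

# The "table of instructions": position i holds the code for key i.
JUMP_TABLE = [handle_zero, handle_one, handle_two]

def dispatch(key):
    """Select a branch by indexing instead of testing each key in turn."""
    if isinstance(key, int) and 0 <= key < len(JUMP_TABLE):
        return JUMP_TABLE[key]()
    return handle_default()

print(dispatch(1), dispatch(7))  # one other
```

Selecting a branch costs one bounds check and one index operation, however many keys there are, which is the attraction of restricting keys to ordinal values.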
The case statement of the C-based languages is
switch (keyform) {
case key1: statement11 statement12 ...
case key2: statement21 statement22 ...
case key3: statement31 statement32 ...
...
[default: default-statement1 default-statement2 ...]
}
These keys are limited to integers, and there can be only one key per case. To handle the situation of allowing the same set of statements to be executed for several different keys, the C-based languages specify that control flows from the last statement of the chosen case directly through to the first statement of the next listed case. If the programmer does not want this to happen, a break statement must be used. For example,
switch (keyform) {
case 1:
case 3:
case 5:
case 7:
case 9: statement-odd;
break;
case 2:
case 4:
case 6:
case 8: statement-even;
break;
default: statement-too-big;
}
Often, there is only one key in each case, and the break can easily be forgotten.
- Loop
-
- Iterative Loops
-
- Logically Controlled Loops
-
- Pretest Loops
- The pretest logically controlled loop, referred to as the while
loop, is the loop considered in the proof of the Böhm/Jacopini
theorem. The version used in the C-based languages is a typical
example:
while (test) statement
The semantics of this (in HOSL) is:
loop: if not test goto out
statement
goto loop
out: ...
Notice that test is evaluated each time around the loop, and that statement might never be executed.
- Posttest Controlled Loops
- The C-based languages also have a posttest logically controlled
loop, called the do-while:
do statement while (test);
Its semantics is:
loop: statement
if test goto loop
Notice that statement is always executed at least once, and, again, test is evaluated each time around the loop.
A variant that several languages have is called the repeat-until
loop and looks like:
repeat statement until test;
Its semantics is:
loop: statement
if not test goto loop
The repeat-until is very similar to the do-while, but often easier to
think about because of the opposite sense of its test.
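Python has a pretest while loop but no posttest loop, so the sketch below emulates do-while with a loop that tests at the bottom. The function names are assumptions for this illustration; each function counts how many times its body runs.

```python
def pretest(n):
    """while loop: the test runs before every pass, so the body may run zero times."""
    passes = 0
    while n > 0:
        passes += 1
        n -= 1
    return passes

def posttest(n):
    """Emulated do-while: the body runs first, then the test decides whether to repeat."""
    passes = 0
    while True:
        passes += 1
        n -= 1
        if not (n > 0):   # as a repeat-until, this would read: until n <= 0
            break
    return passes

print(pretest(0), posttest(0))  # 0 1
print(pretest(3), posttest(3))  # 3 3
```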
- Loop Forever
- The most flexible iterative loop is the one that loops forever,
until an exit statement is executed within its body. The Common Lisp
version is
(loop {expression})
In the C-based languages, it may be simulated by
while (true) statement
or
for (;;) statement
Recall the example of the exit statement above:
while (true) {
input := read(file);
if (input == eof) exit;
process(input);
}
- Counter-Controlled Loops
- The oldest counter-controlled loop, Fortran's DO loop, illustrates all the issues:
DO label variable = initial-expression, terminal-expression [, stepsize-expression]
statements
label last-statement
next-statement
The semantics of this are [Sebesta, p. 331-332]:
init-value := initial-expression
terminal-value := terminal-expression
step-value := stepsize-expression
variable := init-value
iteration-count := max(int((terminal-value - init-value + step-value) / step-value), 0)
loop: if iteration-count <= 0 goto out
      statements
label: last-statement
      variable := variable + step-value
      iteration-count := iteration-count - 1
      goto loop
out: next-statement
Note:
- The loop parameter expressions are evaluated only once, so changing variables that are part of them doesn't affect the number of times the loop is executed.
- The number of times the loop is executed is controlled by the iteration-count, not the value of the variable, so assigning to the loop variable inside the loop doesn't affect the number of times the loop is executed.
- The loop label is available to be the target of gotos inside the loop body. The effect would be to skip the rest of the loop body (except the last-statement, which is why a CONTINUE statement is used there) and continue with the next iteration.
- The scope of the loop variable is not limited to the loop body, and it retains its last value when the loop is terminated.
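The iteration-count formula above can be checked with a short Python simulation. This is a sketch of the semantics as given in these notes, not of any particular Fortran compiler; the generator name fortran_do is an assumption, and the step defaults to 1 here for convenience.

```python
def fortran_do(initial, terminal, step=1):
    """Yield the loop variable's values using a precomputed iteration count."""
    # int() truncates toward zero, matching int(...) in the formula above.
    count = max(int((terminal - initial + step) / step), 0)
    variable = initial
    while count > 0:          # controlled by the count, not by the variable
        yield variable
        variable += step
        count -= 1

print(list(fortran_do(1, 10, 3)))   # [1, 4, 7, 10]
print(list(fortran_do(10, 1, -4)))  # [10, 6, 2]
print(list(fortran_do(5, 1)))       # []
```

Because the count is fixed before the first pass, nothing that happens afterwards can change how many values are produced.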
The counter-controlled loop statement of the C-based languages is
different in several respects from Fortran's DO loop. Their format
is:
for (init-expr1, ..., init-exprk;
terminal-expr1, ..., terminal-exprn;
step-expr1, ..., step-exprm)
statement
The semantics is
{ init-expr1
  ...
  init-exprk
loop:
  terminal-expr1
  ...
  if not terminal-exprn goto out
  statement
bottom:
  step-expr1
  ...
  step-exprm
  goto loop
}
out:
Note:
- The loop parameter expressions are arbitrary expressions, except that, in Java, the terminal expressions must be Boolean expressions.
- The terminal and step expressions are evaluated each time through the loop (the init expressions only once), so changing variables that are part of them does affect the number of times the loop is executed.
- The number of times the loop is executed is controlled by the last terminal expression, which is evaluated each time through the loop, so assigning to any variables it uses inside the loop does affect the number of times the loop is executed.
- If a continue statement is executed inside the loop, control transfers to the bottom label. The effect is to skip the rest of the loop body and continue with the next iteration. (continue may also be used inside while and do-while loops.)
- The scope of any variable declared inside the init expressions is limited to the loop parameter expressions and the loop body, and so their last values are not available when the loop is terminated.
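To illustrate the point that the terminal expression is re-evaluated on every pass, here is a Python while loop laid out like the C-style semantics above. The function c_style_for and the choice to shrink the limit at i == 1 are assumptions made up for this sketch.

```python
def c_style_for(limit):
    """Emulate for (i = 0; i < limit; i++) with a body that changes limit."""
    visited = []
    i = 0                      # init expression, run once
    while i < limit:           # terminal expression, re-evaluated every pass
        visited.append(i)      # loop body
        if i == 1:
            limit = 3          # changing the bound changes the iteration count
        i += 1                 # step expression
    return visited

print(c_style_for(10))  # [0, 1, 2]
```

One caveat about this emulation: a Python continue inside the while body would skip the step expression as well, whereas C's continue transfers to the bottom label and still runs it.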
Just about every current language has a counter-controlled loop.
- Loops Based on Data Types or Data Structures
- Python doesn't have a counter-controlled loop. Its equivalent is
[Lutz, Python Pocket Reference, p. 30]
for target in sequence:
    suite
[else:
    suite]
The loop variable(s) range(s) over all the members of the given sequence. The effect of the counter-controlled loop is achieved by the range function (in Python 3, range produces its members lazily, so list is used here to display them):
>>> list(range(5))
[0, 1, 2, 3, 4]
>>> list(range(2,6))
[2, 3, 4, 5]
>>> for i in range(2,6):
...     print(i)
...
2
3
4
5
The else suite is executed if the loop terminates normally (other than by executing break).
Python allows sequence to be any object for which an Iterator object is defined.
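The else suite is easiest to understand in a search loop. In the sketch below, the function first_negative and its result strings are invented for illustration; the else suite runs only when the loop finishes without executing break.

```python
def first_negative(numbers):
    """Report the first negative number, or that none exists."""
    for n in numbers:
        if n < 0:
            result = f"found {n}"
            break              # skips the else suite
    else:
        result = "none found"  # runs only when the loop was not exited by break
    return result

print(first_negative([3, -1, 4]))  # found -1
print(first_negative([3, 1, 4]))   # none found
```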
Several other languages have loops that range over all the elements
of some data structure. We saw an example from Perl in my notes on Data
Types.
Here's a simple example from Common Lisp:
cl-user(4): (loop
for d in '(Sunday Monday Tuesday Wednesday Thursday Friday Saturday)
do (print d))
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
nil
Actually, Common Lisp's loop is extremely flexible. See the CSE202 course notes on iteration and the Common Lisp HyperSpec Sect 6.
As we saw, Java 1.5 also has a for-each loop:
import java.util.*;
public class DataLoop {
public enum Month {January, February, March, April, May, June,
July, August, September, October, November, December}
public static void main(String[] args) {
HashSet<DataLoop.Month> thirtyDayMonths = new HashSet<DataLoop.Month>();
thirtyDayMonths.add(Month.September);
thirtyDayMonths.add(Month.April);
thirtyDayMonths.add(Month.June);
thirtyDayMonths.add(Month.November);
String output = "Thirty days hath: ";
for (Month m : thirtyDayMonths) {
output += m + " ";
}
System.out.println(output);
int[] eightPrimes = {2,3,5,7,11,13,17,19};
System.out.println();
System.out.println("Eight primes:");
for (int p : eightPrimes) {
System.out.println(p);
}
} // end of main()
} // DataLoop
---------------------------------------------------------
<wasat:Programs:2:262> /util/java/jdk1.5.0/bin/javac DataLoop.java
<wasat:Programs:2:263>/util/java/jdk1.5.0/bin/java DataLoop
Thirty days hath: September June April November
Eight primes:
2
3
5
7
11
13
17
19
- Generators
- A generator is a function that, each time it is called, returns
another member of some data structure (collection). There are
generally three parts to a generator function:
- A function that takes the collection as argument, and sets up
the generator. It may also return the generator function as its
value.
- The generator function itself, that returns another element of the
collection each time it is called.
- A way to tell that the collection has been exhausted. Either the
generator returns some special value (such as nil), or throws an
exception, or there is a special function for the purpose.
Generators in Java are classes that implement the Iterator interface. Their three parts are the three methods
iterator() (called on the collection, which implements the Iterable interface)
next()
hasNext()
Java iterators are used "under the covers" in the for-each loop.
Python's for loop also uses iterators "under the covers". Its two methods are [Python Library Reference Sect. 2.3.5]:
__iter__(): Returns an iterator object.
next() (spelled __next__() in Python 3): Returns the next item, or raises a StopIteration exception.
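The three parts of a generator can be seen in a small Python 3 iterator class. Countdown is a hypothetical class written for this sketch; the while loop after it spells out roughly what a for loop does with the protocol.

```python
class Countdown:
    """Iterator that produces n, n-1, ..., 1."""

    def __init__(self, n):
        self.remaining = n

    def __iter__(self):
        return self              # an iterator returns itself

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration  # signals that the collection is exhausted
        value = self.remaining
        self.remaining -= 1
        return value

# Roughly what a for loop does "under the covers":
it = iter(Countdown(3))          # set up the generator
manual = []
while True:
    try:
        manual.append(next(it))  # ask for another element
    except StopIteration:        # exhaustion is reported by an exception
        break

print(manual)              # [3, 2, 1]
print(list(Countdown(3)))  # [3, 2, 1]
```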
- Recursive Loops
- Loops are one of the three classes of control structures of the Böhm/Jacopini theorem. The loops we have been considering have been iterative loops. Recursive loops are an alternative. For example, an iterative Lisp function to call the function visit on every member of a list is
(defun visitAll (list)
(loop for x in list
do (visit x)))
whereas a recursive version to do the same thing is
(defun visitAll (list)
(unless (endp list)
(visit (first list))
(visitAll (rest list))))
Every iterative loop may be rewritten as a recursive loop, but some
recursive loops may be rewritten as iterative loops only with the aid
of an explicit stack.
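A tree walk is a common case where the iterative version needs an explicit stack. The sketch below assumes a made-up tree representation, a nested list of the form [value, child, child, ...], and visits nodes in preorder both ways.

```python
def visit_tree_recursive(tree, visit):
    """Preorder walk of a nested-list tree: [value, child, child, ...]."""
    visit(tree[0])
    for child in tree[1:]:
        visit_tree_recursive(child, visit)

def visit_tree_iterative(tree, visit):
    """Same walk, with an explicit stack replacing the call stack."""
    stack = [tree]
    while stack:
        node = stack.pop()
        visit(node[0])
        # Push children in reverse so the leftmost child is popped first.
        stack.extend(reversed(node[1:]))

tree = ["a", ["b", ["d"], ["e"]], ["c", ["f"]]]
rec, itr = [], []
visit_tree_recursive(tree, rec.append)
visit_tree_iterative(tree, itr.append)
print(rec)  # ['a', 'b', 'd', 'e', 'c', 'f']
print(itr)  # ['a', 'b', 'd', 'e', 'c', 'f']
```

The stack holds the subtrees still waiting to be visited, which is the information the recursive version keeps implicitly in its pending calls.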
- Backtrack Control Structures
- Based on success/fail tests.
test1, test2, ..., testi, testi+1, ... testn
If testi succeeds, try testi+1
If testi fails, back up to testi-1, and try to succeed another way
If testn succeeds, done
If test1 fails ultimately, entire set fails.
Consider matching the pattern [abc]*abcbe against the string abcaabcbabccdabcbe. See XEmacs 21.5 HTML Manuals Sect. 12.4.
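The same pattern and string can be tried with Python's re module, which also uses a backtracking matcher. This sketch only shows the outcomes; the comments describing the intermediate steps are an informal account, not a trace produced by the library.

```python
import re

pattern = re.compile(r"[abc]*abcbe")
text = "abcaabcbabccdabcbe"

# Anchored at position 0, [abc]* can consume at most "abcaabcbabcc".
# Every shorter choice is then retried, and none is followed by "abcbe".
print(pattern.match(text))   # None

# An unanchored search also retries later start positions. After the "d",
# [abc]* first consumes "abcb", then gives it all back so "abcbe" can match.
m = pattern.search(text)
print(m.span(), m.group())   # (13, 18) abcbe
```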
Prolog uses backtracking as its main control structure:
birthday(arthur, "Dec 3, 1980").
birthday(bea, "March 15, 1985").
birthday(chuck, "Dec 3, 1980").
birthday(dave, "March 15, 1985").
birthday(ethel, "June 17, 1975").
birthday(fran, "March 15, 1985").
findSame :- birthday(X,D),
format("1: ~a's birthday is ~s.~n", [X,D]),
birthday(Y,D),
format("2: ~a's birthday is ~s.~n", [Y,D]),
Y \== X,
format(" ~a and ~a have the same birthday.~n", [X,Y]),
birthday(Z,D),
format("3: ~a's birthday is ~s.~n", [Z,D]),
Z \== Y, Z \== X,
format(" ~a, ~a, and ~a have the same birthday.~n", [X,Y,Z]).
-----------------------------------------------------------------------------
<wasat:Programs:2:273> sicstus -l birthdays.prolog --goal "findSame,halt."
...
1: arthur's birthday is Dec 3, 1980.
2: arthur's birthday is Dec 3, 1980.
2: chuck's birthday is Dec 3, 1980.
arthur and chuck have the same birthday.
3: arthur's birthday is Dec 3, 1980.
3: chuck's birthday is Dec 3, 1980.
1: bea's birthday is March 15, 1985.
2: bea's birthday is March 15, 1985.
2: dave's birthday is March 15, 1985.
bea and dave have the same birthday.
3: bea's birthday is March 15, 1985.
3: dave's birthday is March 15, 1985.
3: fran's birthday is March 15, 1985.
bea, dave, and fran have the same birthday.
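For comparison, the search that Prolog performs here can be imitated in Python with nested loops, where each loop acts as a choice point that is resumed when a later test fails. This is a loose analogy rather than a model of Prolog's execution; the names BIRTHDAYS, born_on, and find_same are assumptions, and the tracing output is omitted.

```python
BIRTHDAYS = [
    ("arthur", "Dec 3, 1980"), ("bea", "March 15, 1985"),
    ("chuck", "Dec 3, 1980"), ("dave", "March 15, 1985"),
    ("ethel", "June 17, 1975"), ("fran", "March 15, 1985"),
]

def born_on(date):
    """Generate, one at a time, each person with the given birthday."""
    for name, d in BIRTHDAYS:
        if d == date:
            yield name

def find_same():
    """Return the first three distinct people sharing a birthday, or None."""
    for x, date in BIRTHDAYS:            # choice point for X and D
        for y in born_on(date):          # choice point for Y
            if y == x:
                continue                 # test failed: try the next Y
            for z in born_on(date):      # choice point for Z
                if z != y and z != x:
                    return (x, y, z)
            # Every Z failed: fall back to the next Y, and then the next X.
    return None

print(find_same())  # ('bea', 'dave', 'fran')
```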
- Guarded Commands
- Dijkstra's guarded if:
if <test1> -> <statement1>
[] <test2> -> <statement2>
[] ...
[] <testn> -> <statementn>
fi
Evaluate the tests, and, nondeterministically, execute the statement
of one of the tests that evaluates to true. If none of the tests
evaluates to true, it is a run-time error.
Dijkstra's guarded loop:
do <test1> -> <statement1>
[] <test2> -> <statement2>
[] ...
[] <testn> -> <statementn>
od
Evaluate the tests, and, if any evaluate to true, nondeterministically
execute the statement of one of the tests that evaluates to true, and
then, do it all again. When none of the tests evaluates to true, the
loop terminates.
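Both constructs can be simulated in Python by collecting the commands whose guards are true and picking one at random. This is a sketch under stated assumptions: guarded_if, guarded_do, and the subtraction-based gcd example are illustrations added here, and random.choice stands in for the nondeterministic choice.

```python
import random

def guarded_if(commands):
    """Run one enabled command chosen at random; fail if none is enabled."""
    enabled = [action for guard, action in commands if guard()]
    if not enabled:
        raise RuntimeError("no guard is true")
    return random.choice(enabled)()

def guarded_do(commands):
    """Repeat: run one enabled command at random, until none is enabled."""
    while True:
        enabled = [action for guard, action in commands if guard()]
        if not enabled:
            return
        random.choice(enabled)()

def gcd(x, y):
    """Greatest common divisor of two positive integers by repeated subtraction."""
    state = {"x": x, "y": y}

    def shrink_x():
        state["x"] -= state["y"]

    def shrink_y():
        state["y"] -= state["x"]

    guarded_do([
        (lambda: state["x"] > state["y"], shrink_x),
        (lambda: state["y"] > state["x"], shrink_y),
    ])
    return state["x"]

print(gcd(12, 18))  # 6
print(guarded_if([(lambda: 3 >= 2, lambda: 3), (lambda: 2 >= 3, lambda: 2)]))  # 3
```

In the gcd example at most one guard is true at a time, so the result does not depend on the random choice; the loop stops when x equals y and neither guard holds.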