UNIVERSITY AT BUFFALO, THE STATE UNIVERSITY OF NEW YORK
The Department of Computer Science & Engineering

CSE 305
Programming Languages
Lecture Notes
Stuart C. Shapiro
Fall, 2003

Subprograms

Definition

A subprogram is a piece of program with a well-defined operational semantics that can be executed (called) from various places in a program. It is useful for two reasons:

It eliminates the need to replicate its code everywhere that code needs to be executed.
It serves as a process abstraction: the programmer can make use of its operation without being further concerned with how that operation is implemented.

Major Types

There are three types of subprograms:

Procedures: executed for their side effects
Functions: executed for their values
Methods: attached to objects

However, they have more in common than distinguishes them, so most of our discussion applies to all three.

General Characteristics

These characteristics [text, pp. 354-355] all have exceptions, but we'll ignore those in the general discussion.

A subprogram has a single entry point.
Only one subprogram is executing at a time; others may be active, but suspended.
When a subprogram terminates, execution returns to its caller (the program unit that called the subprogram).

Calling Methods

A function call, consisting of the name of the function and a list of actual parameters, is syntactically an expression, and can be situated wherever an expression of the function's return type can be.

Fortran has a special statement for calling a procedure, CALL Sub [(<argument_list>)]. Other languages allow a procedure call to be a statement by itself. C-based languages allow a function call to be a statement---its value is discarded.

If a procedure subprogram has no parameters, Fortran allows just the name to appear in the CALL statement. Many other languages, including the C-based ones, require the parentheses, even with nothing between them.

Parts of a Subprogram Definition

Header: Specifies the name, formal parameters (formal arguments), return type, other attributes. Each may be optional.
Body: Specifies the code to be executed when the subprogram is called.

The header can sometimes appear without the body if the compiler needs the header information to compile calls, but the programmer does not want to provide the body details yet.

Subprogram Bodies

In most current languages, a subprogram body is a block that provides a local binding environment for the subprogram's formal parameters, local variables, and, in some languages, locally scoped subprograms.

In some old programming languages, the body did not provide a separate binding environment, just an entry point and a way to specify return of execution to the caller.

SNOBOL had a dynamically-defined body. There was an entry point, and a local binding environment, but termination was determined by the dynamically next return goto to be executed.

Termination Specification Methods

The issue is how to specify that the subprogram is finished, and execution is to return to the caller, and, for function subprograms, what value is to be returned.

A procedure may return when execution falls through to the bottom. That is, the body is syntactically one statement, and the procedure returns when that statement has been executed.

A return statement may be used. It is an executable statement whose effect is to terminate the procedure and return execution to the calling unit. return is a variety of exit that exits a procedure.

Fortran (77 & earlier) allows the name of a function subprogram to be used as a local variable within its body. The name's value at the time of termination is the value returned by the function.

Pascal also used this technique. However, since Pascal allowed functions to be recursive, using the function name where it would be evaluated for an r-value was treated as a recursive call, and caused an error: "attempt to call a function with incorrect number of arguments".

Fortran 90 (and later) allows recursive functions, but only if declared so. In that case, the variable whose value is to be used as the result must also be declared:

      Program factTest
      Integer i, fact
      Do i = 1, 4
         Print *, i, fact(i)
      EndDo
      End

      Recursive Function fact(n) Result(result)
      Integer result, n
      If (n .eq. 0) Then
         result = 1
      Else
         result = n * fact(n-1)
      Endif
      End
--------------------------------------------------
<cirrus:Programs:1:128> f90 -o factTest.fout factTest.f

<cirrus:Programs:1:129> factTest.fout
 1 1
 2 2
 3 6
 4 24

Also notice the EndDo in Fortran 90.

Many current languages allow return <expression>. This causes the function to terminate, and the value of <expression> becomes the value of the function.

Subprogram Headers

Name

Most subprograms have names, but in Common Lisp it is possible to create a nameless function. Here's an example of applying a nameless function to a pair of arguments:

cl-user(1): ((lambda (x y)
	       (if (>= x y) x y))
	     3 6)
6

The subprogram name follows the syntactic rules for an identifier, and, in statically scoped languages, has a scope, which is the enclosing body.

Type

Function subprograms have a type, which is the type of the returned value.

Procedures don't return a value, so they usually don't have a type.

In C-based languages, a procedure is a function whose type is void.

Formal Parameters

Matching Actual to Formal Parameters

Positional Parameters

This is the usual method for matching actual to formal parameters: the nth actual parameter is matched with the nth formal parameter.

Optional Parameters

In some languages, in some circumstances, it is possible for the number of actual parameters to be less than the number of formal parameters. The extra formal parameters are treated like local variables initialized to the default value, if any.

In SNOBOL there could be more actual parameters than formal parameters. The extra actual parameters were matched to the local variables in order as though they were additional formal parameters.

In Common Lisp optional formal parameters are declared as such:

cl-user(6): (defun reverse (list &optional end)
	      (if (endp list) end 
		(reverse (rest list) (cons (first list) end))))
reverse

cl-user(7): (reverse '(a b c d e))
(e d c b a)

Common Lisp also allows for a sequence of optional arguments to be gathered into a list:

cl-user(12): (defun union* (set1 set2 &rest sets)
	      (let ((result (union set1 set2)))
		(dolist (set sets result)
		  (setf result (union result set)))))
union*

cl-user(13): (union* '(a b c d) '(c e f g) '(a g h i) '(f j k))
(i h g a d b c e f j k)

Perl uses this technique as its only method of parameter matching:

#! /util/bin/perl

sub test {
  print $_[0], $_[1], $_[2], "\n";
}

test("a", "b", "c");

-------------------------------------
<cirrus:Programs:1:107> perl parmMatch.perl
abc

Keyword Parameters

The name of the formal parameter is given alongside the actual parameter in the call, to indicate which formal parameter the actual parameter should be matched with.

cl-user(14): (defun testKey (&key first second third)
	      (list first second third))
testKey

cl-user(15): (testKey :second 2 :third 'c :first 'I)
(I 2 c)

Also see the Common Lisp function member, which takes the keyword parameters :test and :key:

cl-user(16): (member 'c '(a b c d e))
(c d e)

cl-user(17): (member '(3 c) '((1 a) (2 b) (3 c) (4 d) (5 e)))
nil

cl-user(18): (member '(3 c) '((1 a) (2 b) (3 c) (4 d) (5 e))
		     :test #'equal)
((3 c) (4 d) (5 e))

cl-user(19): (member 'March '((January 31) (February 28) (March 31) (April 30))
		     :key #'first)
((March 31) (April 30))

Default Values

Where optional or keyword formal parameters are allowed, the programmer may generally specify a default value to be used if the subprogram is called without a matching actual parameter.

Parameter Passing

A motivational example from Fortran 95:

      Program cbr
      Integer i, j
      i = 3
      j = 5
      Print *, "Before call, i = ", i, " j = ", j
      Call Swap(i, j)
      Print *, "After call, i = ", i, " j = ", j
      End

      Subroutine Swap(x, y)
      Integer x, y, temp
      temp = x
      x = y
      y = temp
      End

----------------------------------------------------
<cirrus:Programs:1:112> f95 -o cbr.fout cbr.f

<cirrus:Programs:1:113> cbr.fout
 Before call, i =  3  j =  5
 After call, i =  5  j =  3

By interchanging the values of the formal parameters, subroutine Swap interchanged the values of the actual parameters.

Semantic Parameter Passing Modes

In Mode: The value of the actual parameter is made available to the subprogram via the formal parameter, but the value of the actual parameter cannot be changed by the subprogram. This is what you're used to from Java.
Out Mode: The value of the actual parameter is not made available to the subprogram, but the value of the actual parameter can be changed/set by the subprogram.
Inout Mode: The value of the actual parameter is made available to the subprogram via the formal parameter, and the value of the actual parameter can be changed by the subprogram. This is what was demonstrated in the Fortran 95 program above.

Implementation Models of Parameter Passing

Pass-by-Name: An inout mode technique where the actual parameter expression textually replaces the formal parameter in the code of the subprogram. Identifier names in the actual parameter expression might accidentally be the same as those of other identifiers in the subprogram, in which case they would be treated as the same variables. This technique is not used in any major current language.
Pass-by-Value: An in mode technique. The formal parameter is bound to its own memory cell at subroutine call, and the value of the actual parameter is copied into it. The formal parameter then acts like a local variable within the subprogram. The formal parameter is simply deallocated at subprogram termination time.
Pass-by-Result: An out mode technique. The formal parameter acts like an uninitialized local variable in the subprogram, but at subprogram termination time, its value is copied into the l-value of the actual parameter.
Pass-by-Value-Result: An inout mode technique. Acts the same as pass-by-value at subprogram call, and the same as pass-by-result at subprogram termination time. During subprogram execution time, the formal parameter is bound to its own memory cell, distinct from that of the actual parameter.
Pass-by-Reference: An inout mode technique. At subroutine call, the formal parameter is bound to the same memory cell as the actual parameter. Thus, during subprogram execution, the formal parameter is an alias of the actual parameter. This is the technique traditionally used by Fortran. See the example above.
In Fortran IV, call-by-reference could be used to change the values of literal constants! Later versions of Fortran prevent this.

Subtleties

All parameter passing in Java is pass-by-value. However, many variables hold references to objects, so, although the parameter is passed by value, the object is essentially passed by reference:

public class Name {
    public String first, last;

    public Name (String f, String l){
	first = f;
	last = l;
    }

    public String toString() {
	return first + " " + last;
    }

    public static void ReName(Name n, String f, String l) {
	n.first = f;
	n.last = l;
    }

    public static void main (String[] args) {
	
	Name tom = new Name("Tom", "Thumb");
	System.out.println("Tom's name is " + tom);
	ReName(tom, "Betty", "Boop");
	System.out.println("Tom's name is " + tom);
    } // end of main ()
}// Name

----------------------------------------------------
<cirrus:Programs:1:122> javac Name.java

<cirrus:Programs:1:123> java Name
Tom's name is Tom Thumb
Tom's name is Betty Boop

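It's sometimes said that C uses pass-by-value except for arrays, which are passed by reference. But that's not true. C always uses pass-by-value. However, when an array is used as an actual parameter, it is converted to a pointer to the array's first element, and that pointer is what gets passed by value. So the situation is like the situation in Java when passing reference variables: the array is in effect passed by reference precisely because a pointer to it is passed by value. (The program below prints the pointers with %d to show that a and b hold the same address, which is why gcc issues warnings. Portable code would use %p with a cast to void *.)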

#include <stdio.h>

void swap(int b[]) {
  int temp;
  printf("b = %d\n", b);
  temp = b[0];
  b[0] = b[1];
  b[1] = temp;
}

int main() {
  int a[2] = {3, 5};
  printf("a = %d\n", a);
  printf("Before call, a = [%d, %d]\n", a[0], a[1]);
  swap(a);
  printf("After call, a = [%d, %d]\n", a[0], a[1]);
  return 0;
}

-----------------------------------------------------
<cirrus:Programs:1:132> gcc -Wall cbv.c -o cbv.out
cbv.c: In function `swap':
cbv.c:5: warning: int format, pointer arg (arg 2)
cbv.c: In function `main':
cbv.c:13: warning: int format, pointer arg (arg 2)

<cirrus:Programs:1:133> cbv.out
a = -4264512
Before call, a = [3, 5]
b = -4264512
After call, a = [5, 3]

In C++, an object-valued variable is bound to enough memory on the stack to hold the entire object. When such a variable is passed by value, the actual object cannot be changed by the subprogram:

#include <iostream>
#include <string>
using namespace std;

class Name {
public: string first, last;

  Name (string f, string l){
    first = f;
    last = l;
  }

  string toString() {
    return first + " " + last;
  }
};

void ReName(Name n, string f, string l) {
  n.first = f;
  n.last = l;
}

int main () {
  Name tom("Tom", "Thumb");
  cout << "Tom's name is " << tom.toString() << endl;
  ReName(tom, "Betty", "Boop");
  cout << "Tom's name is " << tom.toString() << endl;
  return 0;
}

------------------------------------------------------------
<cirrus:Programs:1:154> g++ Name.cc -o Name.ccout -R /util/gnu/lib

<cirrus:Programs:1:155> Name.ccout
Tom's name is Tom Thumb
Tom's name is Tom Thumb

Alternatively, the C++ object can be allocated on the heap, and its address can be assigned to a pointer variable. Passing this variable by value has the same effect as passing a reference variable by value in Java:

#include <iostream>
#include <string>
using namespace std;

class Name {
public: string first, last;

  Name (string f, string l){
    first = f;
    last = l;
  }

  string toString() {
    return first + " " + last;
  }
};

void ReName(Name* n, string f, string l) {
  n->first = f;
  n->last = l;
}

int main () {
  Name* tom = new Name("Tom", "Thumb");
  cout << "Tom's name is " << tom->toString() << endl;
  ReName(tom, "Betty", "Boop");
  cout << "Tom's name is " << tom->toString() << endl;
  delete tom;
  return 0;
}

----------------------------------------------------------
<cirrus:Programs:1:157> g++ cbp.cc -o cbp.ccout -R /util/gnu/lib

<cirrus:Programs:1:158> cbp.ccout
Tom's name is Tom Thumb
Tom's name is Betty Boop

C++ has reference variables and reference types. A C++ reference variable is a constant pointer variable that is implicitly dereferenced. If a formal parameter is a reference variable, it is bound to the address of the actual parameter, giving the same effect as Fortran's call-by-reference:

#include <iostream>
using namespace std;

void swap(int& x, int& y) {
  int temp;
  temp = x;
  x = y;
  y = temp;
}

int main() {
  int i, j;
  i = 3;
  j = 5;
  cout << "Before call, i = " << i << " j = " << j << endl;
  swap(i, j);
  cout << "After call, i = " << i << " j = " << j << endl;
}

--------------------------------------------
<cirrus:Programs:1:160> g++ cbr.cc -o cbr.ccout -R /util/gnu/lib

<cirrus:Programs:1:161> cbr.ccout
Before call, i = 3 j = 5
After call, i = 5 j = 3

Function Parameters

One strength of subprograms is that they can operate on different data during different calls---the subprogram is parameterized. However, usually, if a subprogram calls another subprogram, which subprogram it calls is fixed by its code. This, too, could vary from call to call if the subprogram could be passed a subprogram as a parameter.

We already saw one example: the Common Lisp function member can be passed different test and key functions to use.

Common Lisp and other functional languages include function as one of their data types. Therefore functions can be passed as parameters as easily as other data types. Common Lisp also has functions that apply functions to their arguments:

cl-user(32): (type-of #'+)
compiled-function

cl-user(33): (funcall #'+ 1 2 3 4)
10

cl-user(34): (apply #'+ '(1 2 3 4))
10

cl-user(35): (apply #'+ 1 2 '(3 4))
10

cl-user(43): (type-of #'(lambda (x) (* x x)))
function

cl-user(44): (funcall #'(lambda (x) (* x x)) 3)
9

Here's a function that prints a table of a list of numbers and the results of applying some function to each of those numbers:

cl-user(45): (defun double (x)
	       (+ x x))
double

cl-user(46): (defun sqr (x)
	       (* x x))
sqr

cl-user(47): (defun printTable (f numbers)
	       (dolist (x numbers)
		 (format t "~d  ~d~%" x (funcall f x))))
printTable

cl-user(48): (printTable #'double '(1 2 3 4 5))
1  2
2  4
3  6
4  8
5  10
nil

cl-user(49): (printTable #'sqr '(1 2 3 4 5))
1  1
2  4
3  9
4  16
5  25
nil

In C, an actual parameter may be the name of a function; the matching formal parameter must be a pointer to a function returning the correct type:

#include <stdio.h>

int doubleit(int x) {
  return x + x;
}

int sqr(int x) {
  return x * x;
}

void printTable(int (*f)(int), int a[], int length) {
  int i;
  for (i = 0; i < length; i++) {
    printf("%d  %d\n", a[i], (*f)(a[i]));
  }
}

int main() {
  int a[5] = {1, 2, 3, 4, 5};
  printTable(doubleit, a, 5);
  printf("\n");
  printTable(sqr, a, 5);
  return 0;
}

-------------------------------------------------------
<cirrus:Programs:1:203> gcc -Wall funcall.c -o funcall.out

<cirrus:Programs:1:204> funcall.out
1  2
2  4
3  6
4  8
5  10

1  1
2  4
3  9
4  16
5  25

Overloaded Subprograms, Generic Subprograms, User-Defined Overloaded Operators, & Coroutines

An overloaded subprogram has several different procedures, one for each protocol (signature: the number, order, and types of its parameters, plus the return type if it is a function). This is commonly done in Java, especially for constructors.

A generic subprogram uses basically the same procedure on different types of actual parameters, possibly with some minor changes. For example a sort procedure that can sort a collection of any type of element as long as that type has a "before" method. The details of how before works may differ from type to type, but the sort procedure is the same.

Some languages, e.g. C++, allow the programmer to provide new procedures for the language's operators (+, *, etc.), thus further overloading them.

A coroutine is a subprogram that is an exception to the characteristic that a subprogram is called by a caller, executes until it terminates, and then returns control to its caller. A coroutine may temporarily give control back to its caller, or to another coroutine, and later be resumed from where it left off. A set of coroutines may pass control among themselves, each resuming each time from where it left off.

Read the text.

Stuart C. Shapiro <shapiro@cse.buffalo.edu>