The Department of Computer Science & Engineering | STUART C. SHAPIRO: CSE 305
A variable is a bundle of six attributes: name, scope, address, lifetime, type, and value.
An attribute may be bound to a variable (or other program entity) at various times. Sebesta mentions: language design time; language implementation time; compile time; link time; load time; and run time.
We will just be concerned with a two-way distinction:

    bsh % string = "This is a string.";
    bsh % set = new HashSet();
    bsh % set.add(string);
    bsh % print(string);
    This is a string.
    bsh % print(set);
    [This is a string.]

There are two named variables, string and set. The variable string is bound to a word of memory that contains a reference to a string object. This reference is string's value. The variable set is bound, as its value, to a reference to an instance of the HashSet class, and that instance includes a word of memory which contains a copy of the string reference bound, as a value, to string. That element of the HashSet is as much a variable as string is, but it doesn't have a name.
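The same point can be sketched in Python 3, as a rough analogue of the BeanShell interaction rather than a translation of it. The names `text`, `container`, and `element` are made up for this illustration.

```python
# Python 3 sketch. `text` and `container` are hypothetical names chosen for
# this illustration; they do not come from the notes.
text = "This is a string."   # a named variable holding a reference to a str object
container = {text}           # the set stores its own copy of that reference

# The set's single element has no name of its own; it can only be reached
# through the set.
element = next(iter(container))

# Both references lead to the very same string object.
same_object = element is text
print(same_object)
```

The slot inside the set behaves like a variable (it holds a value and occupies memory), but no identifier in the program is bound to it.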
On the other hand, more than one variable might have the same name. Consider the Java code,

    for (int i = 0; i < 10; i++) {
        a[i] = i;
    }
    for (int i = 0; i < 100; i++) {
        squares[i] = i*i;
    }

The two for loops each have a variable named i, yet they are different variables.
Also, when a subroutine that has a local variable x calls itself recursively, each instance of the subroutine will have a separate variable named x. This issue of subroutine management will be discussed in more detail later.
The two variables named i in the for loops are statically bound to their names at compile time. The variables named x, in all recursive activations except the top one, are bound to their names dynamically, during run time.
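A small Python 3 sketch of the recursive case follows. The function `descend` and the list `trace` are hypothetical, introduced only to make the separate variables visible.

```python
def descend(x, trace):
    """Record x on the way down the recursion and again on the way back up."""
    trace.append(("enter", x))
    if x > 0:
        descend(x - 1, trace)    # the callee gets a new variable named x
    trace.append(("exit", x))    # this activation's x is unchanged by the callee
    return trace

trace = descend(2, [])
print(trace)
```

Each activation records the same value on exit that it recorded on entry, which shows that the inner activations did not overwrite the outer activation's x.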
When using an interpreted language (such as bash, Erlang, Haskell, the Java BeanShell, Lisp, Python, or Ruby), variables such as string and set, above, may be created and bound to their names dynamically, at run time.
To discuss scope, we need another 3-way distinction
This discussion of scope also applies to names other than variable names, for example, names of subprograms.
Python displays all the traditional characteristics of static scoping:

    <timberlake:CSE305:1:29> cat scope.py
    #! /util/bin/python

    # Illustration of Nested Scopes
    # Stuart C. Shapiro
    # February 4, 2005

    def A():
        print x            # global x

    def B():
        def C():
            x = "C's x"
            print x        # local x
            D()
        def D():
            print x        # nonlocal x
        x = "B's x"
        print x            # local x
        A()
        C()

    x = "Global x"
    B()
    <timberlake:CSE305:1:30> scope.py
    B's x
    Global x
    C's x
    B's x

Static and dynamic scope may be clearly compared in Common Lisp and Emacs-Lisp.
Here's an interaction with Common Lisp:

    cl-user(1): (setf x 1)
    1
    cl-user(2): (defun outer (x) (inner))
    outer
    cl-user(3): (defun inner () x)
    inner
    cl-user(4): (outer 2)
    1
    cl-user(5): (inner)
    1

Common Lisp's variables are statically scoped (although variables may be declared to be dynamically scoped). Since x is not local to the function inner, it refers to the variable with the same name that has most recently been declared looking up the static spatial area of the program. There, the most recent declaration of x is the global one implicitly declared in the setf expression. So the x of inner is in the scope of the global x, and they refer to the same variable. However, the x of the function outer is a formal parameter, and so is in a different scope, and so refers to a different variable.
Now here's the apparently same interaction with Emacs-Lisp:

    (setf x 1)
    1
    (defun outer (x) (inner))
    outer
    (defun inner () x)
    inner
    (outer 2)
    2
    (inner)
    1

Emacs-Lisp is dynamically scoped (like pre-Common Lisp Lisps). Since x is not local to the function inner, it refers to the variable with the same name that has most recently been declared looking up the dynamic chain of function calls. When inner was called from outer, that would be outer's x, but when inner was called from the top-level, that would be the top-level, global x, the one assigned by the setf.
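The contrast between the two Lisp interactions can be imitated in Python 3. Python itself is statically scoped, so the dynamic half below is only a simulation that keeps an explicit stack of bindings; `dynamic_env`, `lookup`, `dyn_inner`, and `dyn_outer` are hypothetical helpers written for this sketch.

```python
# Static scoping: Python's own rule.
x = 1

def inner():
    return x           # resolves to the global x, whoever the caller is

def outer(x):
    return inner()     # outer's parameter x is invisible to inner

# Dynamic scoping, simulated with an explicit stack of binding frames.
dynamic_env = [{"x": 1}]           # the bottom frame plays the role of the top level

def lookup(name):
    """Search the call chain, most recent frame first."""
    for frame in reversed(dynamic_env):
        if name in frame:
            return frame[name]
    raise NameError(name)

def dyn_inner():
    return lookup("x")

def dyn_outer(x):
    dynamic_env.append({"x": x})   # bind x for the duration of this call
    try:
        return dyn_inner()
    finally:
        dynamic_env.pop()          # the binding disappears when the call ends

static_results = (outer(2), inner())
dynamic_results = (dyn_outer(2), dyn_inner())
print(static_results, dynamic_results)
```

The static pair mirrors the Common Lisp session (both calls see the global x), while the simulated dynamic pair mirrors the Emacs-Lisp session (the call made through the outer function sees that function's x).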
The programming languages that descend from Algol 60 allow blocks where variables may be declared, giving them smaller scopes than subprograms (methods). This Java for loop

    for (int i = 0; i < 10; i++) {
        a[i] = i;
    }

is an excellent example of this.
The scope of Prolog variables is limited to a single "clause". There is no lexical nesting. So the compiler issues a warning when it compiles inner, and the attempt to execute inner causes a run-time error:

    <timberlake:CSE305:1:53> cat scope.pro
    :- X is 1, format("Global(?) X is ~d~n", [X]).
    outer(X) :- format("Outer X is ~d~n", [X]), inner.
    inner :- format("~Inner X is d~n", [X]).   % error
    :- outer(2).
    :- halt.
    <timberlake:CSE305:1:54> prolog -l scope.pro
    % compiling /projects/shapiro/CSE305/scope.pro...
    Global(?) X is 1
    * [X] - singleton variables
    * Approximate lines: 5-6, file: '/projects/shapiro/CSE305/scope.pro'
    Outer X is 2
    ! Consistency error: [126,73,110,110,101,114,32,88,32,105|...] and user:[_4193] are inconsistent
    ! format_arguments
    ! goal: format([126,73,110,110,101,114,32,88,32|...],user:[_4193])
    ! Approximate lines: 6-8, file: '/projects/shapiro/CSE305/scope.pro'
    % compiled /projects/shapiro/CSE305/scope.pro in module user, 0 msec 2304 bytes

Summary
The static scope of a variable is the block in which it is declared,
plus all spatially (lexically) enclosed blocks, except those where it
is shadowed by a declaration of another variable with the same name.
Some languages, in some circumstances, include the area of the block
in which it is declared that occurs before the declaration; others
don't. Java doesn't allow declaration of a new variable inside the
static scope of another variable of the same name.
The dynamic scope of a variable is the block in which it is declared, plus all dynamically enclosed blocks, i.e., blocks that are executed while the block in which the variable is declared is still executing.
Dynamic scope is very difficult to understand and to check for program correctness, since it is extremely hard to tell, by looking at a program, where any given variable has gotten its value. Most programming languages use static scoping, although Perl, as well as Common Lisp, allows variables to be declared to use dynamic scoping.
In general, you should make the scope of any variable be the smallest that is needed. In particular: declare the for loop index in the for loop itself; and avoid global variables unless absolutely necessary.
See Sebesta for more examples of static and dynamic scoping in block-structured languages.
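Python 3 is one language whose static scope covers the part of the block before the declaration: an assignment anywhere in a function body makes the name local throughout that body. The sketch below illustrates this; the function names are hypothetical.

```python
x = "global x"

def reads_global():
    # No assignment to x in this body, so x refers to the global variable.
    return x

def shadows_everywhere():
    # The assignment further down makes x local to the WHOLE body,
    # including the lines that come before the assignment.
    try:
        seen = x
    except UnboundLocalError:
        seen = "local x not yet bound"
    x = "local x"
    return seen, x

print(reads_global())
print(shadows_everywhere())
```

The second function never sees the global x, even on the line that runs before its own x receives a value.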
x = y, where the l-value is the address of x, the r-value is the value of y, and the r-value is to be stored into the address at the l-value. Note that the computation of the l-value might be as complicated as the computation of the r-value, as in a[<expression>] = <expression>;
Fortran77 has several ways to create aliases. One is by the Equivalence statement:

          Program Alias
    C     Test program for aliases
          Integer i,j
          Equivalence (i, j)
          i = 1
     10   Print *, '10: i = ', i, ', j = ', j
          j = 2
     20   Print *, '20: i = ', i, ', j = ', j
          End
    -------------------------------------------------------
    <cirrus:Programs:1:124> f77 -o alias.out alias.f
    NOTICE: Invoking /opt/SUNWspro/bin/f90 -f77 -ftrap=%none -o alias.out alias.f
    alias.f:
     MAIN alias:
    <cirrus:Programs:1:125> alias.out
    10: i = 1, j = 1
    20: i = 2, j = 2

The Equivalence statement is deprecated in Fortran90.
Deprecated: "A deprecated element or attribute is one that has been outdated by newer constructs... Deprecated elements may become obsolete in future versions" [http://www.w3.org/TR/REC-html40/conform.html]
C also lets you do this, if you know where to look:

    /*
     * C Alias Program
     *
     */
    #include <stdio.h>

    int main() {
      int i, a[3] = {1,2,3}, j;
      i = 0;
      j = 6;
      printf("a = %d, %d, %d, %d, %d, \n", a[-2], a[-1], a[0], a[1], a[2]);
      return 0;
    }
    -------------------------------------------------------
    <cirrus:Programs:1:137> gcc -Wall alias.c -o alias.out
    <cirrus:Programs:1:138> ./alias.out
    a = 0, 6, 1, 2, 3,

This is not a "feature" of C, but results from it not doing range checking on arrays. Reading a[-2] and a[-1] is undefined behavior, so the output shown depends on how this compiler happened to lay out i, a, and j in memory.
There are other ways to create aliases. We will discuss them in later sections of the course.
Clearly, aliasing can lead to programs that are hard to understand and to debug.
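Aliasing also arises in languages with reference semantics, without any out-of-bounds access. A minimal Python 3 sketch follows, with hypothetical variable names.

```python
first = [1, 2, 3]
second = first            # second is an alias: both names refer to one list

second[0] = 99            # a change made through one name...
seen_through_first = first[0]   # ...is visible through the other

independent = list(first) # an explicit copy is a different object
independent[1] = -1       # so this change does not affect first

print(first, second, independent)
```

Nothing in the line that modifies `second` mentions `first`, which is exactly what makes aliased programs hard to follow.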
n of the following C recursive function are stored on the stack.

    int factorial(int n) {
      if (n <= 1)
        return 1;
      else
        return n * factorial(n-1);
    }

HashSet discussed above and its unnamed variables are stored on the heap.
Relevant terms:
Variable categories by lifetime and memory location:
Fortran77 and earlier versions of Fortran use only static variables. One implication is that recursion is not possible. This can be demonstrated by a subroutine that keeps count of the number of times it has been called:

          Program Count
    C     Demonstration of static variables in Fortran.
          print *, 'Starting Test Program'
          Call CountingRoutine()
          Call CountingRoutine()
          Call CountingRoutine()
          Call CountingRoutine()
          Call CountingRoutine()
          End

          Subroutine CountingRoutine ()
    C     Keeps track of the number of times it has been called
    C     and prints that count each time.
          Integer count
          Data count/0/
          count = count + 1
          Print *, 'count = ', count
          Return
          End
    -------------------------------------------------------
    <cirrus:Programs:1:152> f77 -o count.fout count.f
    NOTICE: Invoking /opt/SUNWspro/bin/f90 -f77 -ftrap=%none -o count.fout count.f
    count.f:
     MAIN count:
     countingroutine:
    <cirrus:Programs:1:153> count.fout
    Starting Test Program
    count = 1
    count = 2
    count = 3
    count = 4
    count = 5

Notice that, even though count is a local variable of CountingRoutine, its lifetime exceeds the running time of CountingRoutine.
C can achieve this effect by declaring a variable to be static:

    /*
     * Count
     * Stuart C. Shapiro
     *
     * This program demonstrates static variables
     * with a function that counts the number of times that it is called.
     *
     */
    #include <stdio.h>

    void counting_function() {
      /* Prints the number of times that it has been called. */
      static int count = 0;
      printf("count = %d\n", ++count);
    }

    int main() {
      /* Demonstrates counting_function by calling it 5 times. */
      counting_function();
      counting_function();
      counting_function();
      counting_function();
      counting_function();
      return 0;
    }
    -------------------------------------------------------
    <cirrus:Programs:1:140> gcc -Wall count.c
    <cirrus:Programs:1:141> ./a.out
    count = 1
    count = 2
    count = 3
    count = 4
    count = 5

This can also be done in Java:

    /**
     * Counter.java
     *
     *
     * Created: Mon Sep 15 16:47:41 2003
     *
     * @author Stuart C. Shapiro
     */
    public class Counter {
        public static int count;

        public Counter (){
        }

        /* Prints the number of times that it has been called. */
        public static void counting_function() {
            System.out.println("count = " + ++count);
        }

        /* Demonstrates counting_function by calling it 5 times. */
        public static void main (String[] args) {
            counting_function();
            counting_function();
            counting_function();
            counting_function();
            counting_function();
        } // end of main ()
    }// Counter
    -------------------------------------------------------
    <cirrus:Programs:1:142> javac Counter.java
    <cirrus:Programs:1:143> java Counter
    count = 1
    count = 2
    count = 3
    count = 4
    count = 5

Note that Sebesta says,
"when the static modifier appears in the declaration of a variable in a class definition in C++, Java, and C#, it has only an indirect connection to the concept of the lifetime of the variable. In this context, it means the variable is a class variable, rather than an instance variable. Class variables are created some time before the class is first instantiated." [Sebesta, p. 222]
The Java Standard says,
"Preparation involves creating the static fields (class variables and constants) for a class or interface and initializing such fields to the default values (§4.5.5). This does not require the execution of any source code" [Java Language Specification, Section 12.3.2]
Notice that the class variable count was available for use without constructing an instance of the class Counter. CLOS also has class variables, called "shared slots", but they cannot be accessed except via an instance of the class.
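For comparison, here is a hedged Python 3 sketch of the same counter using a class variable. It is an analogue of the Java Counter class, not a translation; the method returns the count instead of printing it so that the result is easy to inspect.

```python
class Counter:
    count = 0                  # class variable: one cell shared by the whole class

    @classmethod
    def counting_function(cls):
        """Return the number of times this method has been called."""
        cls.count += 1
        return cls.count

# No instance of Counter is ever constructed.
results = [Counter.counting_function() for _ in range(5)]
print(results)
```

As in the Java version, the variable outlives every call of the method and is usable without creating an instance.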
In most current programming languages, the formal parameters and local variables of subroutines (functions, methods) are stack-dynamic variables. Memory cells are allocated for them when the subroutine begins execution, and are deallocated when the subroutine ends execution.
At any time during the run of the program, the stack contains the memory cells for all the subroutines currently executing, including all the invocations of recursive subroutines currently executing.
If subroutine A calls subroutine B, then B terminates, and A then calls C, the stack memory used by the formal parameters and local variables of C will be some or all the memory cells just used by B, and may be more:

    /*
     * C Program testing stack reuse
     *
     */
    #include <stdio.h>

    void a(){
      int i = 743;
      printf("In a, i = %d\n", i);
    }

    void b() {
      int j;
      printf("In b, j = %d\n", j);
    }

    int main() {
      a();
      b();
      return 0;
    }
    -----------------------------------
    <wasat:Programs:2:127> gcc -Wall stackReuse.c -o stackReuse.out
    <wasat:Programs:2:128> ./stackReuse.out
    In a, i = 743
    In b, j = 743

The address of a local variable of a subroutine will always be the same address relative to the beginning of the area of the stack that subroutine uses. That is how the compiler can compile code for the subroutine, even though it is not known until run-time what the actual addresses of the local variables will be.
new operator. For example, this Java BeanShell statement

    bsh % set = new HashSet();

allocates memory cells on the heap to hold the instance variables of a HashSet, binds the nameless variables to those cells, and returns a reference (pointer) to them to be stored in the stack-dynamic variable set.
In C, the allocation operator is the function malloc(size), which takes an argument specifying the amount of memory required, and returns a pointer to that area of heap memory.
Heap memory must be used for any dynamically allocated object or data structure that can be allocated in a subroutine and then have a pointer (reference) to it assigned to a variable which is outside the dynamic scope of the subroutine, so that its lifetime must extend to the time after the subroutine terminates and deallocates its stack memory. Consider the Java program,

    import java.util.*;

    public class HeapDemo {
        public static HashSet singleton(Object obj) {
            HashSet set;
            set = new HashSet();
            set.add(obj);
            return set;
        }

        public static void main (String[] args) {
            HashSet myset;
            myset = singleton("element");
            System.out.println(myset);
            myset = singleton("another");
            System.out.println(myset);
        }
    }// HeapDemo
    -------------------------------------------------------
    <cirrus:Programs:2:101> javac HeapDemo.java
    <cirrus:Programs:2:102> java HeapDemo
    [element]
    [another]

Although the memory for the HashSet is allocated in the singleton method, it cannot be allocated on the stack, because it must survive the termination of singleton.
Heap memory should be returned to the heap when it is no longer needed (about to be no longer reachable from any named variable), as the HashSet [element] was no longer needed (reachable) after myset was reassigned above. Otherwise, a program that runs long enough might use up the heap and abnormally terminate. (The process of heap memory becoming increasingly unreachable but unavailable for reallocation is called "memory leakage".) In C and C++, heap memory must be explicitly deallocated with the function free(p) or the operator delete p, respectively, where p is a pointer to the object or data structure whose memory is no longer needed.
Requiring the programmer to explicitly deallocate heap memory allows for these programmer mistakes:
A more reliable idea is for the programming system itself to deallocate unreachable heap memory, a process called "automatic garbage collection", which will be discussed again in Chapter 6. Lisp was the first programming language to perform automatic garbage collection. Some other languages that do automatic garbage collection are Erlang, Haskell, Java, Perl, Python, and Ruby.
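Automatic reclamation can be observed from inside a program with a weak reference, which refers to an object without keeping it reachable. The Python 3 sketch below assumes the CPython implementation, where an object is normally reclaimed as soon as it becomes unreachable; `Node` and `make_node` are hypothetical names, and the call to `gc.collect()` is only there to request a collection explicitly.

```python
import gc
import weakref

class Node:
    """A trivial heap-allocated object, used only for this illustration."""
    def __init__(self, label):
        self.label = label

def make_node(label):
    node = Node(label)   # allocated on the heap
    return node          # survives the return because the caller keeps a reference

held = make_node("element")
probe = weakref.ref(held)           # a weak reference does not keep the object alive
alive_before = probe() is held

held = make_node("another")         # the first Node is now unreachable
gc.collect()                        # explicitly request a collection
alive_after = probe() is not None

print(alive_before, alive_after)
```

The programmer never frees the first object; it becomes collectable simply because the only strong reference to it was overwritten.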
list = [10.2, 3.5], where the variable storing the two-element array (not the variable list) is an implicit heap-dynamic variable. I am a little skeptical about this category.

    cl-user(5): (let ((count 0))
                  (defun countingFunction ()
                    (incf count)))
    countingFunction
    cl-user(6): (countingFunction)
    1
    cl-user(7): (countingFunction)
    2
    cl-user(8): (countingFunction)
    3
    cl-user(9): (countingFunction)
    4
    cl-user(10): count
    Error: Attempt to take the value of the unbound variable `count'.
      [condition type: unbound-variable]
    ...
    [1] cl-user(11): :res
    cl-user(12): (countingFunction)
    5

Notice that count must be a heap-dynamic variable, because its lifetime exceeds the execution time of the block in which it is declared.
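The same closure pattern can be sketched in Python 3. The names `make_counter` and `counting_function` are hypothetical; the `nonlocal` statement is what lets the inner function rebind the enclosing function's variable.

```python
def make_counter():
    count = 0                      # local to make_counter...

    def counting_function():
        nonlocal count             # ...but captured by the inner function
        count += 1
        return count

    return counting_function       # count must outlive this call

counting_function = make_counter()
values = [counting_function() for _ in range(4)]

# The name count is not in scope out here, just as in the Lisp session.
try:
    count
    count_visible = True
except NameError:
    count_visible = False

fifth = counting_function()        # the hidden variable is still alive
print(values, count_visible, fifth)
```

Because the variable must survive after the function that declared it has returned, it cannot live in that function's stack frame.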
+, with the variable as operand.
+ in the statement x = y + z. Once compiled, the information used by the compiler for typing needn't be retained. Often, the names of variables are not retained after compilation, so that symbolic debugging of a running program cannot be done.
a name starting with | @ | is an array
a name starting with | $ | is a scalar (a number or string)
a name starting with | % | is a hash

a name starting with | I, i, J, j, K, k, L, l, M, m, N, n | is an integer
a name starting with | anything else | is a real
In Fortran, an explicit type declaration can override the implicit naming convention, which could lead to confusion.

    <timberlake:CSE305:2:109> sml
    Standard ML of New Jersey v110.69 [built: Thu May 28 09:54:29 2009]
    - fun reciprocal(x) = 1.0 / x;
    val reciprocal = fn : real -> real
    - fun bad(y) = reciprocal(y) + 2 * y;
    stdIn:2.14-2.35 Error: operator and operand don't agree [literal]
      operator domain: int * int
      operand: int * real
      in expression:
        2 * y

The explanation of this error message is
2 is of type int.
Since * requires both its operands to be of the same type and 2 is an int, y should be an int.
But y is the argument of the function reciprocal.
When reciprocal was defined, its formal parameter was used in an expression that requires it to be a real.
Therefore the actual argument of reciprocal, namely y, must be real.
That is a type conflict, and a compiler error.
Haskell also uses static typing and does type inference.
However, it seems better to think of such languages as having typed values, rather than typed variables. For example, the Python Reference Manual says "Objects are Python's abstraction for data...Every object has an identity, a type and a value" [Sect. 3.1], and the ANSI Common Lisp standard says that "Objects, not variables, have types" [ANSI Common Lisp, Section 4.1]. Consider this example:

    cl-user(1): (setf x 33.72 y 7.9)
    7.9
    cl-user(2): (print (gcd x y))
    Error: `33.72' is not of the expected type `integer'
      [condition type: type-error]

In this example, gcd requires its arguments to be integers, and a type error results. Yet the type error is about 33.72, not about x.
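A comparable experiment in Python 3 follows, as a sketch using the standard library's `math.gcd`, which accepts only integers. The variable names are hypothetical.

```python
import math

x = 33.72
type_before = type(x)          # the type belongs to the value currently in x

try:
    math.gcd(x, 7)
    gcd_error = None
except TypeError as err:
    gcd_error = err            # the complaint is about the float value, not about x

x = "now a string"             # the same variable may later hold a value of another type
type_after = type(x)

print(type_before, type_after, gcd_error)
```

The variable x has no type of its own here; asking for its type reports the type of whatever value it holds at that moment.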
Although Java uses static typing, it also has types associated with values:

    bsh % list = new LinkedList();
    bsh % list.add("A string");
    bsh % list.add(new HashSet());
    bsh % print(list.getFirst().getClass());
    class java.lang.String
    bsh % print(list.get(1).getClass());
    class java.util.HashSet

Notice that in the Java expression referenceVariable.method(), the static type of referenceVariable must have the method() defined for it, or a compile-time error will be issued. However, the actual class of the dynamic value of referenceVariable will be used to choose the particular details of the method(), as long as that class is a subclass of the static class of referenceVariable.
For example, let ClassA be a class in which methodA() is defined, let ClassA1, ClassA2, and ClassA3 extend ClassA and specialize methodA(), and let varA be a variable declared to be of type ClassA.
[UML diagram by Dan Schlegel]
The expression varA.methodA() is syntactically legal regardless of whether the current value of varA is an object of type ClassA, ClassA1, ClassA2, or ClassA3, but the value of varA.getClass() will determine the specific version of methodA() used.
On the other hand, if varB is declared to be of some superclass of ClassA for which methodA() is not defined, varB.methodA() will produce a compiler error, unless a cast is used, such as ((ClassA2)varB).methodA().
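The run-time half of this behavior, choosing the method from the class of the current value, can be sketched in Python 3 using the class names from the example. Python performs no compile-time check of the static type, so this sketch does not show the compiler error described for varB; the method bodies are invented for illustration.

```python
class ClassA:
    def methodA(self):
        return "ClassA version"

class ClassA1(ClassA):
    def methodA(self):             # specializes methodA
        return "ClassA1 version"

class ClassA2(ClassA):
    def methodA(self):
        return "ClassA2 version"

class ClassA3(ClassA):
    def methodA(self):
        return "ClassA3 version"

# The same expression, varA.methodA(), is evaluated for four different values.
chosen = []
for varA in (ClassA(), ClassA1(), ClassA2(), ClassA3()):
    chosen.append((type(varA).__name__, varA.methodA()))

print(chosen)
```

The call site is identical in every iteration; only the class of the value bound to varA changes, and that class selects the version of methodA that runs.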
Languages with typed values include Common Lisp, Haskell, Java, JavaScript, Ruby and Python.
Usually, in languages with typed values, the programmer may write code to test the types of values and give reasonable error messages if they are not what was expected, but, without doing this, a type error might only be caught many levels of function calls below where the error actually occurred. Static type-checking generally makes program debugging easier.
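A minimal Python 3 sketch of such a programmer-written type test follows. The function `reciprocal` echoes the name used in the ML example, but this version and its error message are hypothetical.

```python
def reciprocal(x):
    """Return 1/x, rejecting non-numeric arguments at the point of the call."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        raise TypeError(
            f"reciprocal expects a number, got {type(x).__name__}: {x!r}"
        )
    return 1.0 / x

good = reciprocal(4)

try:
    reciprocal("4")
    message = None
except TypeError as err:
    message = str(err)

print(good, message)
```

Without the explicit test, the string argument would still fail, but only inside the division, with a message that says nothing about reciprocal or its caller.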
This might lead to confusion with pointer or reference variables. For example, after the assignment set = new HashSet(); above, should we say that the value of set is a HashSet or a reference to a HashSet? The latter is the more careful way to speak; the former is more informal. We will discuss this more when we discuss pointer types in Chapter 6.
Now we will discuss when a variable is first bound to a value, and whether its value binding is allowed to change.
When a variable is bound to an address (memory cell), its value might be whatever bit settings were left in that cell, interpreted according to the variable's type, for example this C program:

    #include <stdio.h>

    int main() {
      int x;
      double y;
      printf("x = %d y = %e\n", x, y);
      return 0;
    }
    -------------------------------------------------------
    <cirrus:Programs:1:103> gcc -Wall leftover.c -o leftover.out
    <cirrus:Programs:1:104> ./leftover.out
    x = -4264396 y = 8.485876e-314

or it might be initialized, either to a default value, or to a value explicitly specified in a declaration.
"FORTRAN also provides a nice feature ... of initially defining the values of a number of variables in a compact manner, using a DATA statement. The DATA statement is of the form:

    DATA listofvars1/listofconsts1/[[,]listv2/listc2/]...

... The DATA statement puts the constant values into the variables on the list at compile time" [S. L. Edgar, FORTRAN For The '90's (New York: Computer Science Press) 1992, p. 199-200. italics in the original]
If variable initialization is done at run-time, the initialization expression will usually be any expression that could be on the right-hand side of an assignment statement.
We already saw that Java initializes variables to their default values during compile-time. It can also initialize variables during run-time to the value of any expression. See the Standard Sect. 14.4.

    bsh % final double pi = 3.14159;
    bsh % pi = 3;
    // Error: Typed variable: pi: Final variable, can't assign : at Line: 3 : in file:: pi = 3

Common Lisp:

    cl-user(1): pi
    3.141592653589793d0
    cl-user(2): (defconstant mypi 3.14159)
    mypi
    cl-user(3): (setf mypi 3)
    Error: Cannot change the value of mypi -- it is a constant.
      [condition type: program-error]

A named constant that gets its value binding at compile time is called a manifest constant.

    <timberlake:Test:1:33> python
    Python 2.6.4 (r264:75706, Dec 21 2009) on linux2
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print True, False
    True False
    >>> True, False = "newTrue", "newFalse"
    >>> print True, False
    newTrue newFalse

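The session above is from Python 2.6. In Python 3, True, False, and None are keywords, so an attempt to rebind them is rejected when the code is compiled, before anything runs. The sketch below checks this with the built-in `compile` function; the helper `can_assign_to` and the sample name `total` are hypothetical.

```python
def can_assign_to(name):
    """Return True if the statement `<name> = 0` compiles in this Python."""
    try:
        compile(f"{name} = 0", "<example>", "exec")
        return True
    except SyntaxError:
        return False

results = {name: can_assign_to(name)
           for name in ("True", "False", "None", "total")}
print(results)
```

Only the ordinary identifier is accepted as an assignment target; the three keyword constants cannot be given new values.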
1. Fortran allocates a cell in RAM, stores the integer 1 in that cell, and makes every occurrence of 1 in the program a variable whose address is bound to that cell, thus saving memory. Old versions of Fortran could even change the value of such a variable dynamically. We will discuss how in Chapter 9.
For example, C's #define <identifier> <string> declares a macro expanded by a preprocessor that runs before the compiler.