The Department of Computer Science & Engineering | STUART C. SHAPIRO: CSE 305
A variable is a bundle of six attributes: name, address, value, type, scope, and lifetime.
An attribute may be bound to a variable (or other program entity) at various times. Sebesta mentions: language design time; language implementation time; compile time; link time; load time; and run time.
We will just be concerned with a two-way distinction:
    bsh % string = "This is a string.";
    bsh % set = new HashSet();
    bsh % set.add(string);
    bsh % print(string);
    This is a string.
    bsh % print(set);
    [This is a string.]

There are two named variables, string and set. The variable string is bound to a word of memory that contains a reference to a string object. This reference is string's value. The variable set is bound, as its value, to a reference to an instance of the HashSet class, and that instance includes a word of memory which contains a copy of the string reference bound, as a value, to string. That element of the HashSet is as much a variable as string is, but it doesn't have a name.
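The same point can be sketched in Python. This is only an illustration: it uses a list rather than a HashSet so that the unnamed slot can be inspected by index, and the names container and same_object are made up for the example.

```python
# A named variable and an unnamed one (a list slot) holding the same reference.
string = "This is a string."
container = [string]   # slot 0 of the list is a variable with no name of its own

# The name `string` and the unnamed slot refer to the very same object.
same_object = container[0] is string
print(same_object)

# Rebinding the name does not disturb the unnamed slot.
string = "Another string."
print(container[0])
```

The slot inside the list keeps its own value binding, which is why rebinding string leaves it unchanged.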
On the other hand, more than one variable might have the same name. Consider the Java code,
    for (int i = 0; i < 10; i++) {
        a[i] = i;
    }
    for (int i = 0; i < 100; i++) {
        squares[i] = i*i;
    }

The two for loops each have a variable named i, yet they are different variables.
Also, when a subroutine that has a local variable x calls itself recursively, each instance of the subroutine will have a separate variable named x. This issue of subroutine management will be discussed in more detail later.
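A small Python sketch of this: each activation of the recursive function below gets its own x, and the value survives the inner calls. The function name countdown and its trace parameter are hypothetical, chosen only for this illustration.

```python
def countdown(n, trace):
    x = n * 10                 # a local variable: a fresh x for every activation
    if n > 0:
        countdown(n - 1, trace)
    # After the inner activations return, this activation's x is unchanged.
    trace.append((n, x))
    return trace

print(countdown(2, []))        # [(0, 0), (1, 10), (2, 20)]
```

The innermost activation records its x first, and each outer activation still sees its own value afterwards.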
The two variables named i in the for loops are statically bound to their names at compile time. The variables named x, in all recursive activations except the top one, are bound to their names dynamically, during run time.
When using an interpreted language, such as bash, Lisp, BeanShell Java, or Python, variables such as string and set, above, may be created and bound to their names dynamically, at run time.
To discuss scope, we need another 3-way distinction
This discussion of scope applies to names other than variable names, for example, names of subprograms.
Python displays all the traditional characteristics of static scoping:
    <yeager:2005:1:52> cat scope.py
    #! /util/bin/python

    # Illustration of Nested Scopes
    # Stuart C. Shapiro
    # February 4, 2005

    def A():
        print x # global x

    def B():
        def C():
            x = "C's x"
            print x # local x
            D()
        def D():
            print x # nonlocal x
        x = "B's x"
        print x # local x
        A()
        C()

    x = "Global x"
    B()

    <yeager:2005:1:53> scope.py
    B's x
    Global x
    C's x
    B's x
Static and dynamic scope may be clearly compared in Common Lisp and Emacs-Lisp.
Here's an interaction with Common Lisp:
    cl-user(1): (setf x 1)
    1
    cl-user(2): (defun outer (x) (inner))
    outer
    cl-user(3): (defun inner () x)
    inner
    cl-user(4): (outer 2)
    1
    cl-user(5): (inner)
    1

Common Lisp's variables are statically scoped (although variables may be declared to be dynamically scoped). Since x is not local to the function inner, it refers to the variable with the same name that has most recently been declared looking up the static spatial area of the program. There, the most recent declaration of x is the global one implicitly declared in the setf expression. So the x of inner is in the scope of the global x, and they refer to the same variable. However, the x of the function outer is a formal parameter, and so is in a different scope, and so refers to a different variable.
Now here's apparently the same interaction with Emacs-Lisp:

    (setf x 1)
    1
    (defun outer (x) (inner))
    outer
    (defun inner () x)
    inner
    (outer 2)
    2
    (inner)
    1

Emacs-Lisp is dynamically scoped (like pre-Common Lisp Lisps). Since x is not local to the function inner, it refers to the variable with the same name that has most recently been declared looking up the dynamic chain of function calls. When inner was called from outer, that would be outer's x, but when inner was called from the top-level, that would be the top-level, global x, the one assigned by the setf.
The programming languages that descend from Algol 60 allow blocks where variables may be declared, giving them smaller scopes than subprograms (methods). The Java for loops shown above are excellent examples of this.
The scope of Prolog variables is limited to a single "clause". There is no lexical nesting:
    <yeager:2005:1:54> cat scope.prolog
    first :- X is 5, format("~d~n", [X]), second.
    second :- format("~d~n", [X]).    % error

    <yeager:2005:1:55> sicstus -l scope.prolog --goal first.
    % compiling /web/faculty/shapiro/Courses/CSE305/2005/scope.prolog...
    * [X] - singleton variables in user:second/0
    * Approximate lines: 5-6, file: '/web/faculty/shapiro/Courses/CSE305/2005/scope.prolog'
    % compiled /web/faculty/shapiro/Courses/CSE305/2005/scope.prolog in module user, 0 msec 820 bytes
    SICStus 3.9.1 (sparc-solaris-5.7): Thu Jun 27 22:58:18 MET DST 2002
    Licensed to cse.buffalo.edu
    5
    ! Consistency error: [126,100,126,110] is inconsistent with user:[_4629]
    ! format_arguments
    ! goal:  format([126,100,126,110],user:[_4629])
    | ?-
Summary
The static scope of a variable is the block in which it is declared,
plus all spatially (lexically) enclosed blocks, except those where it
is shadowed by a declaration of another variable with the same name.
Some languages, in some circumstances, include the area of the block
in which it is declared that occurs before the declaration; others
don't. Java doesn't allow declaration of a new local variable inside the
static scope of another local variable of the same name.
The dynamic scope of a variable is the block in which it is declared, plus all dynamically enclosed blocks, i.e., blocks that are executed while the block in which the variable is declared is still executing.
Dynamic scope is very difficult to understand and to check for program correctness, since it is extremely hard to tell, by looking at a program, where any given variable has gotten its value. Most programming languages use static scoping, although Perl, as well as Common Lisp, allows variables to be declared to use dynamic scoping.
In general, you should make the scope of any variable be the smallest that is needed. In particular: declare the for loop index in the for loop itself; and avoid global variables unless absolutely necessary.
See Sebesta for more examples of static and dynamic scoping in block-structured languages.
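The contrast between the two scoping rules can also be sketched in Python, which is statically scoped. In this sketch dynamic scoping is only emulated, with an explicit stack of bindings; dynamic_bindings, lookup, dyn_inner, and dyn_outer are hypothetical helpers invented for the illustration, not features of Python.

```python
# Static (lexical) scoping: Python's own rule.
x = 1

def inner():
    return x               # always the global x, whoever the caller is

def outer(x):
    return inner()         # outer's parameter x is not visible inside inner

# Dynamic scoping, emulated with an explicit stack of bindings.
dynamic_bindings = [{"x": 1}]          # the top-level binding of x

def lookup(name):
    # Search the most recently established binding first.
    for frame in reversed(dynamic_bindings):
        if name in frame:
            return frame[name]
    raise NameError(name)

def dyn_inner():
    return lookup("x")

def dyn_outer(x):
    dynamic_bindings.append({"x": x})  # bind x for the duration of this call
    try:
        return dyn_inner()
    finally:
        dynamic_bindings.pop()         # the binding ends when the call ends

print(outer(2), inner())               # 1 1
print(dyn_outer(2), dyn_inner())       # 2 1
```

The first line of output matches the Common Lisp interaction and the second matches the Emacs-Lisp one.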
x = y, where the l-value is the address of x, the r-value is the value of y and the r-value is to be stored into the address at the l-value. Note that the computation of the l-value might be as complicated as the computation of the r-value, as in a[<expression>] = <expression>;
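A short Python sketch shows that both sides of such an assignment involve computation. The functions index and value and the list log are hypothetical names used only to record what gets evaluated. Note that Python evaluates the right-hand side before the subscript of the target; other languages may order these steps differently.

```python
log = []

def index():
    log.append("index computed")   # part of computing the l-value
    return 1

def value():
    log.append("value computed")   # computing the r-value
    return 99

a = [0, 0, 0]
a[index()] = value()

print(a)     # [0, 99, 0]
print(log)   # ['value computed', 'index computed']
```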
Fortran77 has several ways to create aliases. One is by the Equivalence statement:

          Program Alias
    C     Test program for aliases
          Integer i,j
          Equivalence (i, j)
          i = 1
     10   Print *, '10: i = ', i, ', j = ', j
          j = 2
     20   Print *, '20: i = ', i, ', j = ', j
          End
    -------------------------------------------------------
    <cirrus:Programs:1:124> f77 -o alias.out alias.f
    NOTICE: Invoking /opt/SUNWspro/bin/f90 -f77 -ftrap=%none -o alias.out alias.f
    alias.f:
     MAIN alias:
    <cirrus:Programs:1:125> alias.out
    10: i = 1, j = 1
    20: i = 2, j = 2

The Equivalence statement is deprecated in Fortran90.
Deprecated
"A deprecated element or attribute is one that has been outdated by newer constructs... Deprecated elements may become obsolete in future versions" [http://www.w3.org/TR/REC-html40/conform.html]
C also lets you do this, if you know where to look:
    /*
     * C Alias Program
     *
     */
    #include <stdio.h>

    int main() {
      int i, a[3] = {1,2,3}, j;
      i = 0;
      j = 6;
      printf("a = %d, %d, %d, %d, %d, \n", a[-2], a[-1], a[0], a[1], a[2]);
      return 0;
    }
    -------------------------------------------------------
    <cirrus:Programs:1:137> gcc -Wall alias.c -o alias.out
    <cirrus:Programs:1:138> alias.out
    a = 0, 6, 1, 2, 3,

This is not a "feature" of C, but results from it not doing range checking on arrays. Indexing outside an array's bounds is undefined behavior in C, so the output depends on how the compiler happens to lay out i, j, and a in memory.
There are other ways to create aliases. We will discuss them in later sections of the course.
Clearly, aliasing can lead to programs that are hard to understand and to debug.
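One of those other ways, aliasing through references, can be sketched in Python. The names i, j, and k are arbitrary, and the one-element lists stand in for mutable cells.

```python
i = [1]        # a one-element list plays the role of a mutable cell
j = i          # j is an alias: both names are bound to the same list object
j[0] = 2       # a change made through j ...
print(i[0], j[0])   # 2 2   ... is visible through i

k = list(i)    # an independent copy, not an alias
k[0] = 3
print(i[0], k[0])   # 2 3
```

Nothing in the line that modifies j mentions i, which is exactly what makes aliased code hard to follow.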
x of the recursive subroutine mentioned above are stored on the stack; the HashSet and its unnamed variables discussed above are stored on the heap.
Relevant terms:
Variable categories by lifetime and memory location:
Fortran77 and earlier versions of Fortran use only static variables. One implication is that recursion is not possible. This can be demonstrated by a subroutine that keeps count of the number of times it has been called:
          Program Count
    C     Demonstration of static variables in Fortran.
          print *, 'Starting Test Program'
          Call CountingRoutine()
          Call CountingRoutine()
          Call CountingRoutine()
          Call CountingRoutine()
          Call CountingRoutine()
          End

          Subroutine CountingRoutine ()
    C     Keeps track of the number of times it has been called
    C     and prints that count each time.
          Integer count
          Data count/0/
          count = count + 1
          Print *, 'count = ', count
          Return
          End
    -------------------------------------------------------
    <cirrus:Programs:1:152> f77 -o count.fout count.f
    NOTICE: Invoking /opt/SUNWspro/bin/f90 -f77 -ftrap=%none -o count.fout count.f
    count.f:
     MAIN count:
            countingroutine:
    <cirrus:Programs:1:153> count.fout
     Starting Test Program
     count = 1
     count = 2
     count = 3
     count = 4
     count = 5

Notice that, even though count is a local variable of CountingRoutine, its lifetime exceeds the running time of CountingRoutine.
C can achieve this effect by declaring a variable to be static:

    /*
     * Count
     * Stuart C. Shapiro
     *
     * This program demonstrates static variables
     * with a function that counts the number of times that it is called.
     *
     */
    #include <stdio.h>

    void counting_function() {
      /* Prints the number of times that it has been called. */
      static int count = 0;
      printf("count = %d\n", ++count);
    }

    int main() {
      /* Demonstrates counting_function by calling it 5 times. */
      counting_function();
      counting_function();
      counting_function();
      counting_function();
      counting_function();
      return 0;
    }
    -------------------------------------------------------
    <cirrus:Programs:1:140> gcc -Wall count.c
    <cirrus:Programs:1:141> a.out
    count = 1
    count = 2
    count = 3
    count = 4
    count = 5
This can also be done in Java:
    /**
     * Counter.java
     *
     *
     * Created: Mon Sep 15 16:47:41 2003
     *
     * @author Stuart C. Shapiro
     */
    public class Counter {
        public static int count;

        public Counter (){
        }

        /* Prints the number of times that it has been called. */
        public static void counting_function() {
            System.out.println("count = " + ++count);
        }

        /* Demonstrates counting_function by calling it 5 times. */
        public static void main (String[] args) {
            counting_function();
            counting_function();
            counting_function();
            counting_function();
            counting_function();
        } // end of main ()
    }// Counter
    -------------------------------------------------------
    <cirrus:Programs:1:142> javac Counter.java
    <cirrus:Programs:1:143> java Counter
    count = 1
    count = 2
    count = 3
    count = 4
    count = 5
Is it true that "when the static modifier appears in the declaration of a variable in a class definition in C++, Java, and C#, it has nothing to do with the lifetime of the variable. In this context, it means the variable is a class variable, rather than an instance variable." [Sebesta, p. 203]?

The Java Standard says, "Preparation involves creating the static fields (class variables and constants) for a class or interface and initializing such fields to the default values (§4.5.5). This does not require the execution of any source code" [Java Language Specification, Section 12.3.2].

Notice that the class variable count was available for use without constructing an instance of the class Counter. CLOS also has class variables, called "shared slots", but they cannot be accessed except via an instance of the class.
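Python class attributes behave much like the Java class variable here. The following is a rough sketch, not a translation of the Java program: the counting function returns the count instead of printing it, so that the loop can do the printing.

```python
class Counter:
    count = 0                     # class attribute: one cell shared by the class

    @staticmethod
    def counting_function():
        # Updates the class attribute; no instance of Counter is ever created.
        Counter.count += 1
        return Counter.count

for _ in range(5):
    print("count =", Counter.counting_function())
```

As in the Java version, count is usable without constructing a Counter, and its lifetime spans all five calls.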
In most current programming languages, the formal parameters and local variables of subroutines (functions, methods) are stack-dynamic variables. Memory cells are allocated for them when the subroutine begins execution, and are deallocated when the subroutine ends execution.
At any time during the run of the program, the stack contains the memory cells for all the subroutines currently executing, including all the invocations of recursive subroutines currently executing.
If subroutine A calls subroutine B, then B terminates, and A then calls C, the stack memory used by the formal parameters and local variables of C will be some or all the memory cells just used by B, and may be more:
    /*
     * C Program testing stack reuse
     *
     */
    #include <stdio.h>

    void a(){
      int i = 743;
      printf("In a, i = %d\n", i);
    }

    void b() {
      int j;
      printf("In b, j = %d\n", j);
    }

    int main() {
      a();
      b();
      return 0;
    }
    -----------------------------------
    <wasat:Programs:2:127> gcc -Wall stackReuse.c -o stackReuse.out
    <wasat:Programs:2:128> stackReuse.out
    In a, i = 743
    In b, j = 743
The address of a local variable of a subroutine will always be the same address relative to the beginning of the area of the stack that subroutine uses. That is how the compiler can compile code for the subroutine, even though it is not known until run-time what the actual addresses of the local variables will be.
new operator. For example, in the Java BeanShell interaction shown above, the statement set = new HashSet(); allocated memory cells on the heap to hold the instance variables of a HashSet, bound the nameless variables to those cells, and returned a reference (pointer) to them to be stored in the stack-dynamic variable set.

In C, the allocation operator is the function malloc(size), which takes an argument specifying the amount of memory required, and returns a pointer to that area of heap memory.
Heap memory must be used for any dynamically allocated object or data structure that can be allocated in a subroutine and then have a pointer (reference) to it assigned to a variable which is outside the dynamic scope of the subroutine, so that its lifetime must extend to the time after the subroutine terminates and deallocates its stack memory. Consider the Java program,
    import java.util.*;

    public class HeapDemo {
        public static HashSet singleton(Object obj) {
            HashSet set;
            set = new HashSet();
            set.add(obj);
            return set;
        }

        public static void main (String[] args) {
            HashSet myset;
            myset = singleton("element");
            System.out.println(myset);
            myset = singleton("another");
            System.out.println(myset);
        }
    }// HeapDemo
    -------------------------------------------------------
    <cirrus:Programs:2:101> javac HeapDemo.java
    <cirrus:Programs:2:102> java HeapDemo
    [element]
    [another]

Although the memory for the HashSet is allocated in the singleton method, it cannot be allocated on the stack, because it must survive the termination of singleton.
Heap memory should be returned to the heap when it is no longer needed, as the HashSet [element] was no longer needed after myset was reassigned above. Otherwise, a program that runs long enough might use up the heap and abnormally terminate. In C and C++, heap memory must be explicitly deallocated with the function free(p) or the operator delete p, respectively, where p is a pointer to the object or data structure whose memory is no longer needed.
Requiring the programmer to explicitly deallocate heap memory allows for mistakes of failing to deallocate storage that is no longer needed, resulting in programs that eventually use up their heaps, or of attempting to use pointers to memory that has already been deallocated (the "dangling pointer" problem, which is discussed again in Chapter 6).
A more reliable idea is for the programming system itself to deallocate unusable heap memory, a process called "automatic garbage collection", which will be discussed again in Chapter 6. Lisp was the first programming language to perform automatic garbage collection. Java and Python also do it.
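Automatic reclamation can be observed from inside a Python program with a weak reference, which does not keep its target alive. This is a minimal sketch: Node and make_and_drop are hypothetical names, and exactly when memory is reclaimed depends on the Python implementation, which is why the collector is invoked explicitly.

```python
import gc
import weakref

class Node:
    """A trivial heap-allocated object, used only for this illustration."""

def make_and_drop():
    node = Node()
    watcher = weakref.ref(node)          # observes node without keeping it alive
    alive_before = watcher() is not None
    del node                             # drop the only strong reference
    gc.collect()                         # ask the collector to run now
    alive_after = watcher() is not None
    return alive_before, alive_after

print(make_and_drop())
```

The program never frees the Node itself; once no strong reference remains, the system is free to reclaim it.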
list = [10.2, 3.5], where the variable storing the two-element array (not the variable list) is an implicit heap-dynamic variable. I am a little skeptical about this category.
+, with the variable as operand.
+ in the statement x = y + z. Once compiled, the information used by the compiler for typing needn't be retained. Often, the names of variables are not retained after compilation, so that symbolic debugging of a running program cannot be done.
a name starting with | @ | is an array
a name starting with | $ | is a scalar (a number or string)
a name starting with | % | is a hash

a name starting with | I, i, J, j, K, k, L, l, M, m, N, n | is an integer
a name starting with | anything else | is a real
In Fortran, an explicit type declaration can override the implicit naming convention, which could lead to confusion.
    <cirrus:Programs:2:109> sml97
    Standard ML of New Jersey, Version 110.0.3, January 30, 1998
    val use = fn : string -> unit
    - fun reciprocal(x) = 1.0 / x;
    val reciprocal = fn : real -> real
    - fun bad(y) = reciprocal(y) + 2 * y;
    stdIn:7.14-7.28 Error: operator and operand don't agree [literal]
      operator domain: int * int
      operand:         int * real
      in expression:
        2 * y

The explanation of this error message is:
2 is of type int.
Since * requires both its operands to be of the same type, y should be an int.
But y is the argument of the function reciprocal.
When reciprocal was defined, its formal parameter was used in an expression that requires it to be a real.
Therefore the actual argument of reciprocal, namely y, must be real.
That is a type conflict, and a compiler error.
However, it seems better to think of such languages as having typed values, rather than typed variables. For example, the Python Reference Manual says "Objects are Python's abstraction for data...Every object has an identity, a type and a value" [Sect. 3.1], and the ANSI Common Lisp standard says that "Objects, not variables, have types" [ANSI Common Lisp, Section 4.1]. Consider this example:

    cl-user(1): (setf x 33.72 y 7.9)
    7.9
    cl-user(2): (print (gcd x y))
    Error: `33.72' is not of the expected type `integer'
      [condition type: type-error]

In this example, gcd requires its arguments to be integers, and a type error results. Yet the type error is about 33.72, not about x.
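The same behavior can be sketched in Python, where the type belongs to the object currently bound to a name. The helper try_gcd is a hypothetical wrapper that turns the error into a string so that the script runs to completion.

```python
import math

x = 33.72
print(type(x).__name__)    # float
x = "now a string"         # the same variable, now bound to a value of another type
print(type(x).__name__)    # str

def try_gcd(a, b):
    # math.gcd accepts only integers; report the error instead of stopping.
    try:
        return math.gcd(a, b)
    except TypeError as err:
        return "TypeError: " + str(err)

print(try_gcd(12, 18))     # 6
print(try_gcd(33.72, 7.9))
```

As in the Lisp interaction, the complaint concerns the float value that was passed, not the variable that held it.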
Although Java uses static typing, it also has types associated with values:
    bsh % list = new LinkedList();
    bsh % list.add("A string");
    bsh % list.add(new HashSet());
    bsh % print(list.getFirst().getClass());
    class java.lang.String
    bsh % print(list.get(1).getClass());
    class java.util.HashSet
Notice that in the Java expression referenceVariable.method(), the static type of referenceVariable must have the method() defined for it, or a compile-time error will be issued. However, the actual class of the dynamic value of referenceVariable will be used to choose the particular details of the method(), if that class is a subclass of the static class of referenceVariable.

For example, let ClassA be a class in which methodA() is defined, let ClassA1, ClassA2, and ClassA3 extend ClassA and specialize methodA(), and let varA be a variable declared to be of type ClassA. The expression varA.methodA() is syntactically legal regardless of whether the current value of varA is an object of type ClassA, ClassA1, ClassA2, or ClassA3, but the value of varA.getClass() will determine the specific version of methodA() used. On the other hand, if varB is declared to be of some superclass of ClassA for which methodA() is not defined, varB.methodA() will produce a compiler error, unless a cast is used, such as ((ClassA2)varB).methodA().
In at least several languages with typed values, including Common Lisp, Java, JavaScript, and Python, the programmer may write code to test the types of values and give reasonable error messages if they are not what was expected. Without doing this, a type error might only be caught many levels of function calls below where the error actually occurred. Static type-checking generally makes program debugging easier.
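Such a test might look like the following Python sketch. The function reciprocal and the wording of its message are made up for the illustration; the check rejects bool explicitly because Python treats bool as a subclass of int.

```python
def reciprocal(x):
    # Test the type of the value at the point of entry, so the error message
    # names this function rather than some operation many calls deeper.
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        raise TypeError(
            f"reciprocal expects a number, got {type(x).__name__}: {x!r}"
        )
    return 1.0 / x

print(reciprocal(4))       # 0.25
try:
    reciprocal("4")
except TypeError as err:
    print(err)             # reciprocal expects a number, got str: '4'
```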
This might lead to confusion with pointer or reference variables. For example, after the assignment set = new HashSet();, above, should we say that the value of set is a HashSet or a reference to a HashSet? The latter is the more careful way to speak; the former is more informal. We will discuss this more when we discuss pointer types in Chapter 6.
Now we will discuss when a variable is first bound to a value, and whether its value binding is allowed to change.
When a variable is bound to an address (memory cell), its value might be whatever bit settings were left in that cell, interpreted according to the variable's type, as in this C program:

    #include <stdio.h>

    int main() {
      int x;
      double y;
      printf("x = %d y = %e\n", x, y);
      return 0;
    }
    -------------------------------------------------------
    <cirrus:Programs:1:103> gcc -Wall leftover.c -o leftover.out
    <cirrus:Programs:1:104> leftover.out
    x = -4264396 y = 8.485876e-314

Or it might be initialized, either to a default value, or to a value explicitly specified in a declaration.
"FORTRAN also provides a nice feature ... of initially defining the values of a number of variables in a compact manner, using a DATA statement. The DATA statement is of the form:DATA listofvars1/listofconsts1/[[,]listv2/listc2/]... ... The DATA statement puts the constant values into the variables on the list at compile time" [S. L. Edgar, FORTRAN For The '90's (New York: Computer Science Press) 1992, p. 199-200. italics in the original]
If variable initialization is done at run-time, the initialization expression will usually be any expression that could be on the right-hand side of an assignment statement. We already saw that Java initializes class variables to their default values when the class is prepared, before any source code is executed. It can also initialize variables during run-time to the value of any expression. See the Standard, Sect. 14.4.
    bsh % final double pi = 3.14159;
    bsh % pi = 3;
    // Error: Typed variable: pi: Final variable, can't assign : at Line: 3 : in file:  : pi = 3

Common Lisp:

    cl-user(1): pi
    3.141592653589793d0
    cl-user(2): (defconstant mypi 3.14159)
    mypi
    cl-user(3): (setf mypi 3)
    Error: Cannot change the value of mypi -- it is a constant.
      [condition type: program-error]
A named constant that gets its value binding at compile time is called a manifest constant.
    <yeager:2005:1:56> python
    Python 2.2.1 (#2, Jul 19 2002, 09:50:59) [C] on sunos5
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print True, False, None
    1 0 None
    >>> True, False, None = "newTrue", "newFalse", "newNone"
    >>> print True, False, None
    newTrue newFalse newNone
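The interaction above is from Python 2.2.1. In Python 3, True, False, and None are keywords, and an assignment to any of them is rejected when the code is compiled. The sketch below checks this without stopping the script; can_assign is a hypothetical helper, and spam is just an ordinary name used for comparison.

```python
def can_assign(name):
    # Compile (without running) an assignment to `name` and report whether
    # the Python 3 compiler accepts it.
    try:
        compile(name + " = 0", "<example>", "exec")
        return True
    except SyntaxError:
        return False

for name in ("True", "False", "None", "spam"):
    print(name, can_assign(name))
```

Only the assignment to spam compiles; the three constants can no longer be rebound.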
1. Fortran allocates a cell in RAM, stores the integer 1 in that cell, and makes every occurrence of 1 in the program a variable whose address is bound to that cell, thus saving memory. Old versions of Fortran could even change the value of such a variable dynamically. We will discuss how in Chapter 9.
For example, C's #define <identifier> <string> declares a macro expanded by a preprocessor that runs before the compiler.