UNIVERSITY AT BUFFALO, THE STATE UNIVERSITY OF NEW YORK
The Department of Computer Science & Engineering

STUART C. SHAPIRO: CSE 305

CSE 305
Programming Languages
Lecture Notes
Stuart C. Shapiro
Fall, 2003

Data Types

The standard definition of data type is (see text, p. 234):

A collection of data objects;
and a set of operations on those objects.

The major steps in the evolution of data types were:

A few basic built-in types, such as integers, reals, and homogeneous arrays.
Fixed size, heterogeneous aggregates (records, structures).
User-defined data types.
Abstract Data Types (ADTs). ("Standard definition" established.)
Objects (in the OO sense).

The rest of this chapter is a survey of data types and their design issues.

Primitive Data Types

Primitive data types are data types not defined in terms of other data types.

Numbers

Integers

Often there is an unsigned type for binary data, and several types of signed integers, differing by length (number of bytes used).

Various coding schemes are possible. Most languages now use binary numbers for nonnegative integers and two's complement for negative integers.

Fixed-Point

Fixed number of digits with a fixed decimal point position. Used for business applications, including currency.

Typically represented in binary coded decimal (BCD), in which each decimal digit is stored as its 4-bit binary equivalent. For example, 35 in BCD is 0011 0101.

Floating-Point

Called "floating-point" because the decimal point "floats": the number is written in normalized scientific notation, with exactly one nonzero digit before the point:
[+|-] (1|2|3|4|5|6|7|8|9) . {digit} E [+|-] digit {digit}

Usually represented using the IEEE 754 Floating-Point Standard: a sign bit, an exponent, and a fraction ("mantissa"). For more details of number representations, see my CSE115 notes on Java arithmetic.

There are usually several floating-point types, differing in precision (the number of bits used for the fraction).

Operations on numbers will be discussed in Chapter 7.

Booleans

The data type for conditional expressions. There are only two values, true and false.

Only some programming languages have an actual Boolean type with two special values, true and false. C uses the int 0 for false, and any other int for true. Lisp uses nil for false and any other value for true.

Characters

Many languages have a data type for single characters.

Often represented in ASCII, a 7-bit code (usually stored in an 8-bit byte), which can code 128 different characters.

There is a move, led by Java, to use Unicode instead. Java's char type uses 16 bits, enough to represent characters from most of the languages in the world.

Strings

The use of strings as a data type grew from the need to output strings. However, it's one thing to be able to write strings; it's another to be able to store them in variables and operate on them.

Many languages have a data type named something like string; others use arrays of characters. Either way, strings are usually implemented as arrays of characters.

The length of a string may be stored with the value or with the variable, or may be indicated by a sentinel. For example, C (and C-style strings in C++) terminate strings with the null character, '\0'.

String concatenation is such a common operation that several languages include an operator for it, such as Java's overloaded +, which Java programs use to construct output lines. Other languages, such as C, build output with format strings containing embedded formatting directives (for example, %d in printf).

Some other common operations are: string length; substring extraction; character at position; string comparison; and substring search.

A major issue is whether string operations are destructive (change the argument string) or non-destructive (return a string like the argument string, except...). In Java, Strings are immutable (have no destructive operations), whereas StringBuffers are like Strings, but are mutable:

bsh % str1 = "This is a string.";

bsh % str2 = str1.replace('i', 'y');

bsh % print(str1);
This is a string.

bsh % print(str2);
Thys ys a stryng.

bsh % str3 = new StringBuffer("This is a string.");

bsh % print(str3);
This is a string.

bsh % str4 = str3.replace(8,9,"another");

bsh % print(str3);
This is another string.

bsh % print(str4);
This is another string.

Common Lisp has one kind of string, but both destructive and non-destructive operations:

cl-user(1): (setf str1 "This is a string.")
"This is a string."

cl-user(2): (setf str2 (substitute #\y #\i str1))
"Thys ys a stryng."

cl-user(3): str1
"This is a string."

cl-user(4): str2
"Thys ys a stryng."

cl-user(5): (setf str2 (nsubstitute #\y #\i str1))
"Thys ys a stryng."

cl-user(6): str1
"Thys ys a stryng."

cl-user(7): str2
"Thys ys a stryng."

A string's length may be static, as with Java's String; dynamic, as with Java's StringBuffer; or limited dynamic, as the text says C's strings are. However, the program

#include <stdio.h>
#include <string.h>

#define true 1

int main() {
  char str[10];
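  int i = 0;
  /* Deliberately keeps writing past the end of str, which is undefined
     behavior: C does no range checking on arrays. */
  while (true) {
    str[i++] = 'a';
    str[i] = '\0';
    printf("str = %s; Its length is %lu\n", str, (unsigned long) strlen(str));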
  }
  return 0;
}

----------------------------------------------
<cirrus:Programs:1:102> gcc -Wall dstrlen.c -o dstrlen.out

<cirrus:Programs:1:103> dstrlen.out
str = a; Its length is 1
str = aa; Its length is 2
str = aaa; Its length is 3
str = aaaa; Its length is 4
str = aaaaa; Its length is 5
str = aaaaaa; Its length is 6
str = aaaaaaa; Its length is 7
str = aaaaaaaa; Its length is 8
str = aaaaaaaaa; Its length is 9
str = aaaaaaaaaa; Its length is 10
str = aaaaaaaaaaa; Its length is 11
str = aaaaaaaaaaaa; Its length is 12

was an infinite loop. When I killed it, str had a length of 1,598. Once again, C does no range checking on arrays; the program simply kept writing past the end of str.

Pattern matching is a common operation on strings, and a very involved subject. A large part of Perl is devoted to pattern matching. Java has an extensive pattern matching capability in the package java.util.regex. C++ also has a pattern matching library. (X)Emacs supports regular expression pattern matching for searching and replacing strings. For example, the regular expression <[^>]*> will match HTML tags.

User-Defined Types

A user-defined type is a data type with a user-declared name. For example, in C:

#include <stdio.h>

#define KperM 1.60935   /* kilometers per mile */
#define MperK 0.62137   /* miles per kilometer */

typedef float kilometer;
typedef float mile;

kilometer MtoK(mile x) {
  return x * KperM;
}

mile KtoM(kilometer x) {
  return x * MperK;
}

int main() {
  mile m = 100;
  kilometer k = 100;
  printf("%3.0f miles = %5.2f kilometers.\n", m, MtoK(m));
  printf("%3.0f kph = %5.2f mph.\n", k, KtoM(k));
  return 0;
}

----------------------------------------------------------------
<cirrus:Programs:1:114> gcc -Wall conversion.c -o conversion.out

<cirrus:Programs:1:115> conversion.out
100 miles = 160.93 kilometers.
100 kph = 62.14 mph.

In C, a typedef identifier is a synonym for its parent type. However, that is not true in all languages with user-defined types. If the new type identifier is not a synonym, the question is whether name type compatibility or structure type compatibility is used.

Under name type compatibility, whether two expressions have compatible types depends on their type identifiers, even if the parent types are the same. Under structure type compatibility, it depends on the parent types. For example, in the Ada-like type declarations

type array1type is array(1..10) of Integer;
type array2type is array(11..20) of Integer;

A: array1type;
B: array2type;

A and B do not have compatible types under name type compatibility, but do under structure type compatibility.

Some languages use name type compatibility, some use structure type compatibility, and some have facilities for both.

If a variable is declared with a type expression, such as

A: array(1..10) of Integer;

the variable is considered to have an anonymous type.

Ordinal Types

An ordinal type is one whose values can be mapped to the natural numbers, such as char. The integer types are also considered ordinal types, although the signed integers also have negatives. The important thing is that, except for the minimal value, every value of an ordinal type is the successor of a value of its type, and, except for the maximal value, every value of an ordinal type is the predecessor of a value of its type. So one should be able to use any ordinal type as an array subscript, or as a for loop index.

Enumeration Types

An enumeration type is an ordinal type whose values are identifiers chosen by the programmer. For example, in C

#include <stdio.h>

enum months {Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec};

int monLength[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

char* monName[12] = {"January", "February", "March", "April",
		      "May", "June", "July", "August",
		      "September", "October", "November", "December"};

int main() {
  enum months m;
  for (m = Jan; m <= Dec; m++) {
    printf("%s has %d days.\n", monName[m], monLength[m]);
  }
  return 0;
}

---------------------------------------------------------------
<cirrus:Programs:2:106> gcc -Wall enumtest.c -o enumtest.out

<cirrus:Programs:2:107> enumtest.out
January has 31 days.
February has 28 days.
March has 31 days.
April has 30 days.
May has 31 days.
June has 30 days.
July has 31 days.
August has 31 days.
September has 30 days.
October has 31 days.
November has 30 days.
December has 31 days.

As is usual for C, the enumeration type is treated just like int and its values are treated like int values. C++, though, is more careful:

#include <stdio.h>

enum months {Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec};

enum days {Sun, Mon, Tue, Wed, Thur, Fri, Sat};

int main() {
  enum months m;
  enum days d = Thur;
  m = d;
  printf("It ran.\n");
  return 0;
}

----------------------------------------------------------------
<cirrus:Programs:2:111> g++ -Wall enumtest.cc -o enumtest.out+
enumtest.cc: In function `int main()':
enumtest.cc:18: cannot convert `days' to `months' in assignment

Compare enumeration types to Java's named constants, such as java.awt.Color.blue. Java's named constants achieve much of the effect of enumeration types, but it takes more code to create them:

public class Month {
    private String name;
    private int length;
    private int ordinal;

    private static Month[] calendar = new Month[12];
    public static final Month Jan = new Month(1, "January", 31);
    public static final Month Feb = new Month(2, "February", 28);
    public static final Month Mar = new Month(3, "March", 31);
    public static final Month Apr = new Month(4, "April", 30);
    public static final Month May = new Month(5, "May", 31);
    public static final Month Jun = new Month(6, "June", 30);
    public static final Month Jul = new Month(7, "July", 31);
    public static final Month Aug = new Month(8, "August", 31);
    public static final Month Sep = new Month(9, "September", 30);
    public static final Month Oct = new Month(10, "October", 31);
    public static final Month Nov = new Month(11, "November", 30);
    public static final Month Dec = new Month(12, "December", 31);

    // private: the twelve constants above are the only Months
    private Month (int ord, String n, int ln){
	ordinal = ord;
	name = n;
	length = ln;
	calendar[ordinal-1] = this;
    }
    
    public String toString() {
	return name;
    }

    public int getLength() {
	return length;
    }

    public int getOrdinal() {
	return ordinal;
    }

    // Returns the following month, or null after December.
    public Month getNext() {
	return ordinal < calendar.length ? calendar[ordinal] : null;
    }

    public boolean leq(Month m) {
	return getOrdinal() <= m.getOrdinal();
    }

    public static void main (String[] args) {
	for (Month m = Jan; m != null && m.leq(Dec); m = m.getNext()) {
	    System.out.println(m + " has " + m.getLength() + " days.");
	}
    } // end of main ()
    
}// Month

-------------------------------------------------
<cirrus:Programs:1:142> javac Month.java

<cirrus:Programs:1:143> java Month
January has 31 days.
February has 28 days.
March has 31 days.
April has 30 days.
May has 31 days.
June has 30 days.
July has 31 days.
August has 31 days.
September has 30 days.
October has 31 days.
November has 30 days.
December has 31 days.

Subrange Types

A subrange type is a consecutive set of values of some ordinal type. Subranges can be taken of enumeration types as well as of integers. Here's an Ada example similar to the text's:

type Days is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);

subtype WeekDays is Days range Mon..Fri;
subtype WeekendDays is Days range Sat..Sun;

Day1: Days;
Day2: WeekDays;
Day3: WeekendDays;

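Day1 := Day2 and Day1 := Day3 are always legal.
Day2 := Day3 and Day3 := Day2 can never succeed, since the two subranges do not overlap; they raise Constraint_Error at run-time.
Day2 := Day1 and Day3 := Day1 succeed only if Day1's value at run-time lies within the target subrange; otherwise they raise Constraint_Error.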

Subrange types are particularly useful for the indexes of arrays, such as

subtype arrayIndex is Integer range 1..100;
squares: array(arrayIndex) of Integer;

and for the indexes of for loops, such as

for i in arrayIndex loop
  squares(i) := i*i;
end loop;

Symbols

Symbol is a data type in Common Lisp whose values are identifiers. It is not an ordinal type, as no symbol has any natural relation (such as successor or predecessor) to any other.

cl-user(1): (type-of 3)
fixnum

cl-user(2): (type-of 3.7)
single-float

cl-user(3): (type-of 'January)
symbol

cl-user(4): (setf monLength
	      '((January 31) (February 28) (March 31) (April 30)
		(May 31) (June 30) (July 31) (August 31)
		(September 30) (October 31) (November 30) (December 31)))
((January 31) (February 28) (March 31) (April 30) (May 31) (June 30) (July 31)
 (August 31) (September 30) (October 31) ...)

cl-user(9): (let (m)
	      (loop
		(format t "Enter a month or `bye': ")
		(setf m (read))
		(if (eql m 'bye)
		    (return 'Goodbye))
		(format t "~A has ~D days.~%" m (second (assoc m monLength)))))
Enter a month or `bye': March
March has 31 days.
Enter a month or `bye': June
June has 30 days.
Enter a month or `bye': bye
Goodbye

A symbol is like an OO object; among other instance variable-like components are its name, value, and function:

cl-user(27): (setf Fibonacci 11235)
11235

cl-user(28): (defun Fibonacci (n)
	       (if (< n 3)
		   1
		 (+ (Fibonacci (- n 1))
		    (Fibonacci (- n 2)))))
Fibonacci

cl-user(29): (symbol-name 'Fibonacci)
"Fibonacci"

cl-user(30): (symbol-value 'Fibonacci)
11235

cl-user(31): Fibonacci
11235

cl-user(32): (symbol-function 'Fibonacci)
#<Interpreted Function Fibonacci>

cl-user(33): (type-of (symbol-function 'Fibonacci))
function

cl-user(34): (Fibonacci 10)
55

Few other programming languages have a symbol data type.

Array Types

An array is an aggregate of data values, called elements of the array, with the following properties:

It is a homogeneous aggregate. That is, all the data values are of the same type.
The data values can be randomly accessed. That is, access to any element is just as fast as to any other.
The elements are accessed via a sequence of one or more indexes (subscripts), which are values of some ordinal type.
The values of the subscripts may be computed at run-time.

The ability to compute subscripts makes a subscripted array like a variable name that can be computed. More precisely, a subscripted array reference is an expression that can be evaluated for its l-value. Compare these two subroutines for the Fibonacci sequence:

#include <stdio.h>

int fibonacci(int n) {
  if (n<3) return 1;
  int previous = 1,
    current = 1,
    next, i;
  for (i=3; i<=n; i++) {
    next = previous + current;
    previous = current;
    current = next;
  }
  return current;
}

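/* Same computation, but the three most recent values live in a three-element
   array used as a circular buffer; the computed subscript chooses which
   "variable" to assign. */
int Fibonacci (int n) {
  if (n<3) return 1;
  int num[3] = {1,1},
    current = 1,
    i;
  for (i=3; i<=n; i++) {
    current = (current + 1) % 3;
    num[current] = num[(current + 1) % 3] + num[(current + 2) % 3];
  }
  return num[current];
}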

int main() {
  int i;
  for (i=1; i<8; i++)
    printf("fibonacci(%d) = %d\n", i, fibonacci(i));
  printf("\n");
  for (i=1; i<8; i++)
    printf("Fibonacci(%d) = %d\n", i, Fibonacci(i));
  return 0;
}
------------------------------------------------
<cirrus:Programs:1:140> gcc -Wall indexdemo.c -o indexdemo.out

<cirrus:Programs:1:141> indexdemo.out
fibonacci(1) = 1
fibonacci(2) = 1
fibonacci(3) = 2
fibonacci(4) = 3
fibonacci(5) = 5
fibonacci(6) = 8
fibonacci(7) = 13

Fibonacci(1) = 1
Fibonacci(2) = 1
Fibonacci(3) = 2
Fibonacci(4) = 3
Fibonacci(5) = 5
Fibonacci(6) = 8
Fibonacci(7) = 13

An array can be thought of as a mapping, or even as a function. For example, the C array monLength, above, maps a month's ordinal, 0..11, to that month's length. The Common Lisp monLength, an association list from month names to lengths, represents the mapping more directly. Viewing an array as a function from a month's ordinal to its length may be clearer if you compare the use of monLength in the C program with the use of getLength() in the Java Month class.

Most current programming languages use parentheses around the arguments of a function, e.g. f(x), and brackets around the subscripts of an array, e.g. a[i], but Fortran and Ada use parentheses for array subscripts as well. Thinking of an array as a function justifies this, but it can be confusing, since a reader cannot tell from a(i) alone whether a is an array or a function.

Common Lisp, as usual, uses a more functional notation:

cl-user(33): (setf a (make-array 10))
#(nil nil nil nil nil nil nil nil nil nil)

cl-user(34): (setf days #(Sun Mon Tue Wed Thu Fri Sat))
#(Sun Mon Tue Wed Thu Fri Sat)

cl-user(35): (aref days 3)
Wed

cl-user(36): (setf (aref a 2) 5)
5

cl-user(37): (aref a 2)
5

Some programming languages, including Java and Common Lisp, do range checking. That is, they give a run-time error if the program tries to use an out-of-range subscript. Others, including C, Perl, and Fortran, do not. Range checking makes a language more reliable, at the cost of a test on every subscripted access.

Some programming languages have a fixed lowest subscript: in C-based languages, it is 0; in Fortran, it is 1. Others allow the programmer to choose the lowest subscript.

The array subscript range might be statically bound (during compile-time); dynamically bound (during run-time), but then fixed; or fully dynamic (might change during run-time).

Array storage binding might be static, dynamic on the stack, or dynamic on the heap.

Some languages, such as the C-based languages, provide a convenient way to initialize arrays. For example, in Java:

int[] squares = {0, 1, 4, 9, 16, 25};

However, one must distinguish whether the {...} notation is a general array-valued constructor, allowed on the rhs of assignment statements, or only a special syntax for declaration statements.

Some languages provide array operations, i.e., operations on whole arrays. For example, in Fortran 90 (the f77 command below actually invokes the f90 compiler):

      Program arrayop

      Integer A1(5), A2(5), A3(5), A4(5)
      Data A1 /1, 2, 3, 4, 5/ A2 /6, 7, 8, 9, 10/
      A3 = A1 + A2
      A4 = A1 * A2

      Print *, A1
      Print *, A2
      Print *, A3
      Print *, A4
      End

------------------------------------
<cirrus:Programs:2:109> f77 -o arrayop.fout arrayop.f
NOTICE: Invoking /opt/SUNWspro/bin/f90 -f77 -ftrap=%none -o arrayop.fout arrayop.f
arrayop.f:
 MAIN arrayop:

<cirrus:Programs:2:110> arrayop.fout
   1  2  3  4  5
   6  7  8  9  10
   7  9  11  13  15
   6  14  24  36  50

APL (short for "A Programming Language") was specially designed to operate on arrays.

Two-dimensional arrays may be thought of as solid rectangles (rectangular arrays), or as arrays of arrays (jagged arrays). Some languages insist that the programmer think of arrays one way, some the other, and some support both. In a language that supports both, such as C#:
Rectangular arrays are indexed with one pair of brackets, such as a[i, j].
Jagged arrays are indexed with two pairs of brackets, such as a[i][j].

Java supports only jagged arrays:

bsh % int[][] a = new int[3][4];

bsh % print(a.length);
3

bsh % print(a[1].length);
4

Note that a is a 3-element array of 4-element arrays. It is usual to also think of this as 3 rows of 4 columns each:

bsh % for (int i=0; i<3; i++) for (int j=0; j<4; j++) a[i][j] = 10*i+j;

bsh % for (int i=0; i<3; i++) {
	for (int j=0; j<4; j++) {System.out.print(a[i][j] + " ");}
	System.out.println();}
0 1 2 3 
10 11 12 13 
20 21 22 23

An array stored so that all the elements of the first row come before all the elements of the second row, and so on, is said to be stored in row major order.
We can see this clearly in C:

#include <stdio.h>

int a[3][4];

int main() {
  int i,j;

  for (i=0; i<3; i++) {
    for (j=0; j<4; j++) {
      a[i][j] = 10*i + j;
    }
  }

  /* Walk through memory from the first element, ignoring row boundaries. */
  for (i=0; i<12; i++) {
    printf("%3d", *(a[0] + i));
  }

  printf("\n");
  return 0;
}

--------------------------------------
<cirrus:Programs:2:125> gcc -Wall arrayorder.c -o arrayorder.out

<cirrus:Programs:2:126> arrayorder.out
  0  1  2  3 10 11 12 13 20 21 22 23

This shows that C stores arrays in row major order.

Let's try Fortran:

      Program arrayorder

      Integer A(3,4)

      Do 50 i = 1, 3
         Do 50 j = 1, 4
            A(i,j) = 10*i + j
 50   Continue

      Print *, A
      End

-----------------------------------------
<cirrus:Programs:2:127> f77 -o arrayorder.fout arrayorder.f
NOTICE: Invoking /opt/SUNWspro/bin/f90 -f77 -ftrap=%none -o arrayorder.fout arrayorder.f
arrayorder.f:
 MAIN arrayorder:

<cirrus:Programs:2:128> arrayorder.fout
   11  21  31  12  22  32  13  23  33  14  24  34

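Fortran stores arrays in column major order. Since Fortran and C programs can easily call each other, this difference matters whenever a two-dimensional array is passed between them: an array declared A(3,4) in Fortran has the layout of a C array declared a[4][3], with A(i,j) corresponding to a[j-1][i-1].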

The rows of a jagged array need not all have the same number of columns.

Fortran 95 and Ada allow references to a slice of an array: a more or less regular piece of an array.

The entire discussion of two-dimensional arrays extends to multi-dimensional arrays.

Associative Arrays

Associative arrays, also called maps in C++, hash tables in Common Lisp, Maps in Java, and hashes in Perl, are generalizations of arrays for which the "index" can be any type. The "index" is called a key, and the element stored with the key is called the value. Here is a use of Perl's hashes to print the length of all the months:

#! /util/bin/perl
use strict;
use warnings;

my @months = ("January", "February", "March", "April",
	      "May", "June", "July", "August",
	      "September", "October", "November", "December");

my %monLength = ("January" => 31, "February" => 28, "March" => 31, "April" => 30,
		 "May" => 31, "June" => 30, "July" => 31, "August" => 31,
		 "September" => 30, "October" => 31, "November" => 30,
		 "December" => 31);

foreach my $month (@months) {
  print "$month has $monLength{$month} days.\n";
}

-----------------------------------------------------
<cirrus:Programs:1:176> perl months.perl
January has 31 days.
February has 28 days.
March has 31 days.
April has 30 days.
May has 31 days.
June has 30 days.
July has 31 days.
August has 31 days.
September has 30 days.
October has 31 days.
November has 30 days.
December has 31 days.

Record Types

Records, first introduced in COBOL, may be thought of as primitive object classes:

They have instance variables.
They have set and get methods.
They have no other methods.
They do not support inheritance.

C and C++ call them structs; Common Lisp calls them structures. Note that C++ and Common Lisp have true, modern objects as well. See the text for more details.

Union Types

Unions are a semi-organized way to allow a variable to hold values of different types at different times, even in a statically typed language. They are not very safe. See the text.

Pointer and Reference Types

The set of data objects in the pointer type is the set of memory addresses plus nil, which is an explicitly invalid address. That is, the value bound to a variable whose type is a pointer type is either a memory address or nil.

It is most common for a pointer variable to hold the address of a memory cell in the heap, but C and C++ also allow addresses of static variables and of variables on the stack.

Fortran 77 (and earlier) does not have pointer types, but they can be simulated by using one array for data and a separate array of indices into the first array as the pointers.

How can a pointer variable contain the address of a static or stack variable? Such variables are allocated when they are declared, not by an allocation operator. If ptr is a pointer variable, we want: ptr := <expression>, but <expression> would be evaluated for its r-value. So we need something that says "evaluate this expression for its l-value." In C and C++ that operator is &, and its operand must be an expression that could appear on the left-hand side of an assignment statement.

In statically typed languages, the declaration of a pointer variable must include the type of the variable it points to.

If x is a variable and ptr is a pointer variable, what is the meaning of x := ptr?

If x is also a pointer variable, it's a simple assignment statement.
If x is not a pointer variable, it's either an error or the compiler must know that ptr is to be dereferenced. C and C++ use * as an explicit dereferencing operator. Fortran 95 does implicit dereferencing.

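Here's a C program using a pointer whose value is an address on the stack. Note that sub3 dereferences ptr after sub1 has returned, when x no longer exists. This is undefined behavior; in this run, ptr apparently pointed to the stack cell that sub2 had reused for z: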

#include <stdio.h>
int* ptr;

void sub1() {
  int x, y;
  x = 3;
  ptr = &x;
  y = *ptr;
  printf("x = %2d; y = %2d.\n", x, y);
}

void sub2() {
  int z = 5;
  printf("z = %2d.\n", z);
}

void sub3() {
  printf("*ptr = %2d.\n", *ptr);
}

int main() {
  sub1();
  sub2();
  sub3();
  return 0;
}

--------------------------------------------
<cirrus:Programs:1:193> gcc -Wall pointerTest.c -o pointerTest.out

<cirrus:Programs:1:194> pointerTest.out
x =  3; y =  3.
z =  5.
*ptr =  5.

Here is an example in Fortran 95, showing implicit dereferencing:

      Program pointerTest

      Integer, Pointer :: ptr
      Integer, Target :: x
      Integer :: y

      x = 3
      ptr => x
      y = ptr
      Print *, "x = ", x, "y = ", y
      End

-----------------------------------------------------
<cirrus:Programs:1:237> f95 -o pointerTest.fout pointerTest.f

<cirrus:Programs:1:238> pointerTest.fout
 x =  3 y =  3

Pointer arithmetic is allowed in C and C++. If ptr is of type typ* (pointer to typ), and i is of type int, the expression ptr + i evaluates to the address i*sizeof(typ) bytes beyond ptr.

In C and C++, an array name used in an expression acts as a constant pointer to the first element of the array, so subscripting is done by pointer arithmetic (a[i] means *(a + i)), and pointer expressions may replace subscripted arrays.

Anonymous variables on the heap are manipulated via pointers. The allocation operator new, in Java and C++, and the library function malloc(size), in C, return pointers to the newly allocated heap memory.

Many novice C programmers find pointers to be confusing, but "if everything is a pointer, you don't have to think about pointers," and that is the approach taken by Lisp and Java. In those languages, you can think you are storing an object (or, at worst, a reference to an object) in a variable. You just have to remember that a change made via one reference variable may be seen via another reference variable.

The dangling pointer problem is the problem of a pointer variable pointing to a memory cell that was already deallocated via another pointer variable (and possibly even reused).

This C program shows that a pointer may be mistakenly used, even though the space it points to has been deallocated:

#include <stdio.h>
#include <stdlib.h>

int* ptr;

int main() {
  ptr = malloc(sizeof(int));
  *ptr = 3;
  free(ptr);
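  /* Use after free: undefined behavior. This run happened to print 3,
     but the program could print garbage or crash. */
  printf("*ptr = %2d\n", *ptr);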
  return 0;
}

---------------------------------------------------
<cirrus:Programs:1:253> gcc -Wall danglingTest.c -o danglingTest.out

<cirrus:Programs:1:254> danglingTest.out
*ptr =  3

The dangling pointer problem is commonly solved by taking explicit deallocation away from the programmer.

The problem of lost heap-dynamic variables (garbage) is the problem of memory cells allocated on the heap becoming unreachable when the pointer variables referring to them end their lifetime or get reassigned to other heap memory. This problem is solved in Lisp and Java by automatic garbage collection.

Stuart C. Shapiro <shapiro@cse.buffalo.edu>
