The Department of Computer Science & Engineering
cse@buffalo

CSE 305
Programming Languages
Lecture Notes
Stuart C. Shapiro
Fall, 2003


Names

Names, or identifiers, are used for variables, subprograms (or methods), types, classes, etc.

Before discussing what a name looks like, we need to discuss what aren't names:

Comments
Fortran77, which is line oriented, uses a C or * in column 1 to indicate that the line is a comment.

Other languages typically have one comment symbol to indicate that the rest of the line is a comment, and a pair of brackets to indicate that the enclosed material is a comment. For example,
Language      Rest of Line Comment   Open Comment   Close Comment
C             //                     /*             */
C++           //                     /*             */
Fortran90     !
Java          //                     /*             */
Common Lisp   ;                      #|             |#
Perl          #                      =              =cut
Prolog        %                      /*             */
(Perl's comment brackets must be at the beginning of a line where a statement would be legal.)
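
As a rough illustration of the rest-of-line column of the table, the following Python sketch strips such a comment from a single line of source text. The dictionary LINE_COMMENT and the functions strip_line_comment and is_fortran77_comment are names invented for this example. The sketch is deliberately naive: it does not notice when a comment symbol appears inside a string literal.

```python
# Rest-of-line comment symbols, copied from the table above.
LINE_COMMENT = {
    "C": "//",
    "C++": "//",
    "Fortran90": "!",
    "Java": "//",
    "Common Lisp": ";",
    "Perl": "#",
    "Prolog": "%",
}


def strip_line_comment(line, language):
    """Return the line without its rest-of-line comment.

    Naive on purpose: a comment symbol inside a string literal
    is treated as the start of a comment.
    """
    symbol = LINE_COMMENT[language]
    index = line.find(symbol)
    if index == -1:
        return line
    return line[:index].rstrip()


def is_fortran77_comment(line):
    """Fortran77 is line oriented: C or * in column 1 marks a comment line."""
    return line[:1] in ("C", "*")
```

For instance, strip_line_comment("x = 1; // set x", "Java") returns "x = 1;".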

There are also common conventions to which IDEs and tools are sometimes sensitive. For example, in Java, the open comment bracket /** begins a JavaDoc comment. And in Lisp

Whitespace
Whitespace includes spaces and other characters that act as separators, such as newlines and tabs.

Fortran, however, treats the newline as the end of a statement, unless the next line is marked as a continuation line by any character other than a blank or 0 in column 6. Fortran also ignores spaces (blanks). For example, in the Do statement

Do 50 n = 1, 9999
if the comma is omitted, the statement will be interpreted as the assignment statement
Do50n = 19999
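
The effect of ignoring blanks can be sketched in Python. The two regular expressions below are simplified patterns written for this example, not a real Fortran grammar, and classify is a hypothetical helper name. The sketch removes all blanks first, as Fortran does, and only then decides what kind of statement it is looking at.

```python
import re

# Simplified shapes of two Fortran statements, matched after blanks are removed.
DO_LOOP = re.compile(r"^DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)$")
ASSIGNMENT = re.compile(r"^([A-Z][A-Z0-9]*)=(.+)$")


def classify(statement):
    """Classify a statement the way a blank-ignoring reader would see it."""
    squeezed = statement.replace(" ", "").upper()
    match = DO_LOOP.match(squeezed)
    if match:
        label, variable, start, limit = match.groups()
        return ("do loop", label, variable, start, limit)
    match = ASSIGNMENT.match(squeezed)
    if match:
        name, value = match.groups()
        return ("assignment", name, value)
    return ("unknown", squeezed)
```

With the comma, classify("Do 50 n = 1, 9999") reports a do loop over N with label 50. Without it, classify("Do 50 n = 1 9999") reports an assignment of 19999 to the variable DO50N.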

Tokens
A token is "a sequence of characters with a unit of meaning." [Wall, Christiansen & Orwant, Programming Perl, p. 49]

Punctuation
Punctuation marks, sometimes called separators, are non-whitespace characters that separate other tokens. They may include parentheses, brackets, and semicolons. For example, in the expression a[i], the brackets separate the tokens a and i and prevent the expression from looking like the identifier ai.

Operators
Operators include the numeric, relational, boolean, and other operators of the language. For example, the 37 operators of Java are
=	>	<	!	~	?	:
==	<=	>=	!=	&&	||	++	--
+	-	*	/	&	|	^	%
<<	>>	>>>
+=	-=	*=	/=	&=	|=	^=	%=
<<=	>>=	>>>=
Operators usually separate other tokens. For example, in Java, x+y is the same as x + y. In Lisp, however, most of these symbols are ordinary characters, so that while (+ x y) is an expression that evaluates to the sum of x and y, (+xy) is a call, with no arguments, to the function whose name is +xy.
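
How punctuation and operators separate tokens can be sketched with a small regular-expression tokenizer in Python. This is only an illustration under simplifying assumptions: names are restricted to ASCII letters, digits, _ and $, numbers are unsigned decimal integers, and the function name tokenize and the token category labels are made up here. The operator list is the Java list given above, tried longest first so that >>>= is read as one token rather than as >, >, > and =.

```python
import re

# The 37 Java operators listed above.
JAVA_OPERATORS = [
    "=", ">", "<", "!", "~", "?", ":",
    "==", "<=", ">=", "!=", "&&", "||", "++", "--",
    "+", "-", "*", "/", "&", "|", "^", "%",
    "<<", ">>", ">>>",
    "+=", "-=", "*=", "/=", "&=", "|=", "^=", "%=",
    "<<=", ">>=", ">>>=",
]

# Try longer operators first so the longest possible operator wins.
_operator_pattern = "|".join(
    re.escape(op) for op in sorted(JAVA_OPERATORS, key=len, reverse=True)
)

TOKEN_RE = re.compile(
    r"\s*(?:(?P<name>[A-Za-z_$][A-Za-z0-9_$]*)"
    r"|(?P<number>\d+)"
    r"|(?P<operator>" + _operator_pattern + r")"
    r"|(?P<punct>[()\[\]{};,.]))"
)


def tokenize(text):
    """Split text into (category, spelling) pairs, skipping whitespace."""
    text = text.rstrip()
    tokens = []
    position = 0
    while position < len(text):
        match = TOKEN_RE.match(text, position)
        if match is None:
            raise ValueError("unexpected character at position %d" % position)
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        position = match.end()
    return tokens
```

Under these assumptions, tokenize("x+y") and tokenize("x + y") produce the same three tokens, and tokenize("a[i]") yields the name a, the punctuation [, the name i, and the punctuation ].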

Numbers and other literals
Literals are tokens that the compiler recognizes as particular data values. They include numbers such as 5 and 78.34, but many languages have literals of other types, such as the Java boolean literals true and false, and Java's null.

There is generally an involved syntax for numeric literals, including optional signs, decimal points, exponentiation marks, and radix indicators. For example, in C++ and Java 0x57 is a hexadecimal integer equal to the decimal integer 87, and in Lisp, -3745e-2 is a floating point number equal to -37.45. Lisp also has literals of a ratio type, such as 3/5.
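
The arithmetic in these examples can be checked with a few lines of Python. Python's own literal syntax is not the same as Java's or Lisp's, so this is only an analogy: Python happens to share the 0x radix prefix and the e exponent mark, and its fractions.Fraction type stands in here for Lisp's ratio type. The variable names are chosen for this example.

```python
from fractions import Fraction

# Radix indicator: 0x marks a hexadecimal integer; 5*16 + 7 = 87.
hex_value = int("0x57", 16)

# Sign, digits, and exponent mark: -3745 * 10**-2.
float_value = float("-3745e-2")

# An exact ratio of two integers, playing the role of the Lisp literal 3/5.
ratio_value = Fraction("3/5")

print(hex_value, float_value, ratio_value)
```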

Names (Identifiers)
Names, or identifiers, are used for variables, subprograms (or methods), types, classes, etc. Different languages have different rules for the formation of identifiers. In Java,
"An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. An identifier cannot have the same spelling (Unicode character sequence) as a keyword, boolean literal, or the null literal." [The Java Standard, Section 3.8.]
In Fortran77, a name may only be 1-6 letters and/or digits, the first of which must be a letter. Fortran90 allows names up to 31 characters long, and allows them to include the _ character. Common Lisp allows names to be of arbitrary length, and treats as a name any token that cannot be interpreted as a number. So Lisp names include
1+     /5     ^/-     734ff     89..93 
Also, the symbols that are operators in other languages, such as + and >, are names in Common Lisp. In fact, Common Lisp treats any character preceded by the escape character \ as an alphabetic character. So the following are also Lisp names
ab\(c     quo\"te
and even several\ words\ strung\ together, which includes internal spaces. Even the newline character may be included in a Lisp name if preceded by an escape character.
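
The more restrictive Java and Fortran rules described above can be sketched as Python checks. The function names are invented for this example. The Java check is an ASCII-only approximation: real Java letters and digits cover much of Unicode, and the set of keywords is left as a parameter supplied by the caller rather than listed here.

```python
import re

# Fortran77: 1-6 letters and/or digits, the first a letter.
FORTRAN77_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{0,5}")
# Fortran90: up to 31 characters, underscores also allowed.
FORTRAN90_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,30}")
# ASCII-only approximation of "Java letter" followed by Java letters and digits.
JAVA_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Spellings excluded by the quoted rule: the boolean literals and null.
JAVA_LITERAL_WORDS = {"true", "false", "null"}


def is_fortran77_name(text):
    return FORTRAN77_NAME.fullmatch(text) is not None


def is_fortran90_name(text):
    return FORTRAN90_NAME.fullmatch(text) is not None


def is_java_identifier(text, keywords=()):
    """Approximate check; keywords must be supplied by the caller."""
    return (
        JAVA_NAME.fullmatch(text) is not None
        and text not in JAVA_LITERAL_WORDS
        and text not in keywords
    )
```

For example, loop_counter passes the Fortran90 check but fails the Fortran77 check, both because of its length and because of the underscore.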

Common Lisp also includes escape brackets:
|several words strung together|
is, provided the reader does not change the case of unescaped letters, the same name as
several\ words\ strung\ together
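
The two escape mechanisms can be imitated in a few lines of Python. This is a sketch of the idea only, not the Common Lisp reader: lisp_symbol_name is a hypothetical helper, and the upcase_unescaped flag is an assumption added to show what happens when the reader upper-cases unescaped letters.

```python
def lisp_symbol_name(token, upcase_unescaped=False):
    """Return the name denoted by a token that may use \\ and |...| escapes."""
    name = []
    inside_bars = False
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "\\":
            # Single escape: the next character is taken literally.
            i += 1
            if i >= len(token):
                raise ValueError("escape character at end of token")
            name.append(token[i])
        elif ch == "|":
            # Escape brackets: toggle literal mode.
            inside_bars = not inside_bars
        elif inside_bars:
            name.append(ch)
        else:
            name.append(ch.upper() if upcase_unescaped else ch)
        i += 1
    if inside_bars:
        raise ValueError("unterminated | escape bracket")
    return "".join(name)
```

With the default setting the two spellings above denote the same name. With upcase_unescaped=True they differ, because only the unescaped letters of the backslash form are upper-cased.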

Moreover, Common Lisp puts the attributes of characters in the control of the programmer. For example, the programmer could make ( and ) be considered simple alphabetic characters, and make [ and ] serve the role ( and ) normally do.

Languages also differ about the significance of upper- and lower-case letters. Most modern languages distinguish between them. So HashTable is a different name from Hashtable.

Prolog considers a name that starts with an upper-case letter (or an underscore) to be a variable, while one that begins with a lower-case letter is considered to be a literal symbol (an atom).

In Perl, every variable name must start with a "funny character". The name of a scalar variable, such as one that stores a number or string, must start with a $, such as $x. The name of a variable whose value is an array must start with an @, such as @monthTable. The name of a variable whose value is a hash table, called simply a "hash", must start with a %, such as %addressBook.
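
Because the first character carries the information, the kind of a Perl variable can be read off its name alone. The Python sketch below shows this for the three cases just described. The table SIGILS and the function perl_variable_kind are names made up for this example.

```python
# First character of a Perl variable name -> kind of value it holds.
SIGILS = {"$": "scalar", "@": "array", "%": "hash"}


def perl_variable_kind(name):
    """Return 'scalar', 'array', or 'hash', or None if there is no such prefix."""
    if len(name) < 2:
        return None
    return SIGILS.get(name[0])
```

So perl_variable_kind("@monthTable") returns "array", while a bare name such as x yields None.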

Fortran allows lower-case letters in names, but considers them equivalent to the upper-case version.

Versions of Common Lisp before Allegro Common Lisp (ACL) 6 differentiated upper-case from lower-case letters, but automatically upper-cased non-escaped lower-case letters.

Although Emacs-Lisp is not a version of Common Lisp, like ACL 6, it differentiates upper- from lower-case letters, and does not change either to the other.
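
One way to picture these differences is that each language maps the spelling in the source text to a canonical name before comparing. The Python sketch below does this for the case-sensitive languages and for Fortran. The function names and the language labels are chosen for this example only.

```python
def canonical_name(name, language):
    """Map a spelling to the name the language actually uses."""
    if language == "Fortran":
        # Lower-case letters are equivalent to their upper-case versions.
        return name.upper()
    if language in ("Java", "Emacs Lisp"):
        # Case is significant and is left unchanged.
        return name
    raise ValueError("no rule recorded for " + language)


def same_name(first, second, language):
    return canonical_name(first, language) == canonical_name(second, language)
```

Under these rules HashTable and Hashtable are different names in Java but would be the same name in Fortran.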

Keywords and Reserved Words
Keywords and reserved words are tokens that look like identifiers, but whose use is restricted. The text distinguishes them by saying that a keyword is restricted in only certain contexts, whereas a reserved word may never be used as an identifier. However, what Java calls keywords would be reserved words by this definition. When starting with a new programming language, finding the list of keywords and reserved words and what their restrictions are is as important as finding out what the comment symbols are.
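
Some languages make the list easy to find. As an illustration from a language not otherwise discussed in these notes, Python publishes its reserved words in the standard keyword module, so the restriction can be checked directly. The helper usable_as_name and the variable reserved_words are names invented for this example.

```python
import keyword

# The words that Python never allows as identifiers.
reserved_words = list(keyword.kwlist)


def usable_as_name(word):
    """True if word has the form of an identifier and is not reserved."""
    return word.isidentifier() and not keyword.iskeyword(word)
```

Here usable_as_name("while") is False, but usable_as_name("While") is True, since Python distinguishes upper- from lower-case letters.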


Copyright © 2003 by Stuart C. Shapiro. All rights reserved.

Stuart C. Shapiro <shapiro@cse.buffalo.edu>