The Department of Computer Science & Engineering | STUART C. SHAPIRO: CSE 305
Before discussing what a name looks like, we need to discuss what is not a name.
Fortran77 uses C or * in column 1 to indicate that the line is a comment.
Other languages typically have one comment symbol to indicate that the rest of the line is a comment, and a pair of brackets to indicate that the enclosed material is a comment. For example,
Language | Rest of Line Comment | Open Comment | Close Comment |
---|---|---|---|
bash | # | none | none |
C | // | /* | */ |
C++ | // | /* | */ |
Fortran90,95 | ! | none | none |
Java | // | /* | */ |
Common Lisp | ; | #\| | \|# |
Perl | # | = | =cut |
Prolog | % | /* | */ |
Python | # | none | none |
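As a rough illustration of the "Rest of Line Comment" column, the following Python sketch removes a rest-of-line comment from a single source line. The names LINE_COMMENT and strip_line_comment are hypothetical, introduced only for this example, and the sketch deliberately ignores string literals, so a comment symbol inside a quoted string would be cut off as well.

```python
# Rest-of-line comment symbols, copied from the table above.
LINE_COMMENT = {
    "bash": "#",
    "C": "//",
    "C++": "//",
    "Fortran90": "!",
    "Java": "//",
    "Common Lisp": ";",
    "Perl": "#",
    "Prolog": "%",
    "Python": "#",
}


def strip_line_comment(line: str, language: str) -> str:
    """Drop everything from the comment symbol to the end of the line.

    Naive on purpose: it does not notice comment symbols inside strings.
    """
    marker = LINE_COMMENT[language]
    code, _sep, _comment = line.partition(marker)
    return code.rstrip()


print(strip_line_comment("x = 3  # set x", "Python"))     # x = 3
print(strip_line_comment("int x = 3; // set x", "Java"))  # int x = 3;
```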
There are also common practices, which IDEs and tools are sometimes sensitive to. For example, in Java, the open comment bracket /** begins a JavaDoc comment. In Lisp,
;;; is used at the beginning of a line, for comments that are outside any function definition;
;; is used at the beginning of an indented line (indented like other lines of code), for comments within a function definition;
; is used after, but on the same line as, normal code, to comment on that line of code.
Bash, Fortran, and Python will normally consider the end of a line to be a statement terminator. They each have a way to explicitly indicate continuation onto the next line.
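For Python, a minimal sketch of the two usual ways to continue a statement onto the next line. The variable names are made up for the example.

```python
# A backslash at the very end of a line continues the statement.
total = 1 + 2 + \
        3 + 4

# Inside parentheses, brackets, or braces, line ends are ignored.
also_total = (1 + 2 +
              3 + 4)

print(total, also_total)  # 10 10
```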
Fortran ignores spaces (blanks). For example, in the Do statement
    Do 50 n = 1, 9999
if the comma is omitted, the statement will be interpreted as the assignment statement
    Do50n = 19999
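A small Python sketch of why the missing comma matters: once every blank is removed, nothing separates 1 from 9999. The function squeeze_blanks is a hypothetical stand-in for what a blank-ignoring Fortran scanner does, and it makes no attempt to protect blanks inside character constants.

```python
def squeeze_blanks(statement: str) -> str:
    """Remove every blank from a statement (simplified model)."""
    return statement.replace(" ", "")


with_comma = squeeze_blanks("Do 50 n = 1, 9999")
without_comma = squeeze_blanks("Do 50 n = 1 9999")

print(with_comma)     # Do50n=1,9999
print(without_comma)  # Do50n=19999
```

With the comma, the text can only be a Do statement; without it, the text has the shape of an assignment to a variable named Do50n.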
Bash uses spaces as separators, especially between a command and its arguments:
    bash-2.02$ x=3
    bash-2.02$ echo $x
    3
    bash-2.02$ x = 3
    bash: x: command not found
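Python's standard shlex module splits a string into words using shell-like rules. It is only an approximation of bash, but it shows the difference between the two command lines in the session above: x=3 is a single word, while x = 3 is three words, the first of which bash takes to be a command name.

```python
import shlex

print(shlex.split("x=3"))    # ['x=3']
print(shlex.split("x = 3"))  # ['x', '=', '3']
```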
Python uses indentation at the beginning of the line to indicate a block.
    if expr:
        print("Block line 1.")
        print("Block line 2.")
    else:
        print("In else block.")
    print("Out of Block")
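One way to see that indentation is part of Python's syntax is to ask the standard tokenize module for the tokens of a small program. In this sketch (the source text and variable names are invented for illustration), the tokenizer reports INDENT and DEDENT tokens where other languages would use brackets.

```python
import io
import tokenize

source = (
    "if expr:\n"
    "    a = 1\n"
    "    b = 2\n"
    "else:\n"
    "    c = 3\n"
    "d = 4\n"
)

tokens = tokenize.generate_tokens(io.StringIO(source).readline)
kinds = [tokenize.tok_name[tok.type] for tok in tokens]

# One INDENT/DEDENT pair for the if block, one for the else block.
print(kinds.count("INDENT"), kinds.count("DEDENT"))  # 2 2
```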
Better: A token is a terminal symbol of the programming language, that the reader (parser) passes to the compiler or interpreter.
In a[i], the brackets separate the tokens a and i and prevent the expression from looking like the identifier ai.
In Python, the commas in [1,2,3] separate the elements of the list.
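The same point can be checked with Python's own tokenizer. This is a sketch for Python only; token_strings is a hypothetical helper that keeps the visible token strings and drops the newline and end markers.

```python
import io
import tokenize


def token_strings(source: str) -> list[str]:
    """Return the non-blank token strings that Python's tokenizer finds."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.string.strip()
    ]


print(token_strings("a[i]\n"))     # ['a', '[', 'i', ']']
print(token_strings("ai\n"))       # ['ai']
print(token_strings("[1,2,3]\n"))  # ['[', '1', ',', '2', ',', '3', ']']
```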
In bash, punctuation marks are called metacharacters:
metacharacter: A character that, when unquoted, separates words. One of the following:
| & ; ( ) < > space tab
[bash man page]
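Python's shlex module can approximate this behavior. With punctuation_chars=True it treats the characters ( ) ; < > | & as word separators even when no whitespace surrounds them. This is a sketch of the idea, not a faithful bash tokenizer, and the command line is an invented example.

```python
import shlex

lexer = shlex.shlex("ls -l|wc>out", posix=True, punctuation_chars=True)
lexer.whitespace_split = True
words = list(lexer)

print(words)  # ['ls', '-l', '|', 'wc', '>', 'out']
```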
Operators usually separate other tokens. Java's operators, for example, include
    = > < ! ~ ? : == <= >= != && || ++ -- + - * / & | ^ % << >> >>> += -= *= /= &= |= ^= %= <<= >>= >>>=
so in Java x+y is the same as x + y. In Lisp, however, most of these symbols are ordinary characters, so that while (+ x y) is an expression that evaluates to the sum of x and y, (+xy) is a call to the function of no arguments whose name is +xy, and 3+5 is a variable.
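Python behaves like Java here, which can be checked by comparing parse trees with the standard ast module. This sketch says nothing about Java or Lisp themselves; it only shows that, in Python, the spaces around + do not change the parse, while removing the operator does.

```python
import ast

spaced = ast.dump(ast.parse("x + y"))
unspaced = ast.dump(ast.parse("x+y"))
one_name = ast.dump(ast.parse("xy"))

print(spaced == unspaced)  # True: the operator separates the tokens
print(spaced == one_name)  # False: xy is a single identifier
```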
The most familiar literals are numbers, such as 5 and 78.34, but many languages have literals of other types, such as the Java boolean literals true and false, and Java's null.
There is generally an involved syntax for numeric literals, including optional signs, decimal points, exponentiation marks, and radix indicators. For example, in C++ and Java 0x57 is a hexadecimal integer equal to the decimal integer 87, and in Lisp, -3745e-2 is a floating point number equal to -37.45. Lisp also has literals of a ratio type, such as 3/5.
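Python happens to accept the same hexadecimal and exponent notations, so the two equalities can be checked directly. Python has no ratio literal; fractions.Fraction is used below only as a stand-in for Lisp's 3/5.

```python
from fractions import Fraction

print(0x57 == 87)          # True: 0x marks a hexadecimal integer
print(-3745e-2 == -37.45)  # True: e-2 scales by ten to the power -2
print(Fraction(3, 5))      # 3/5, standing in for a Lisp ratio
```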
"An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. An identifier cannot have the same spelling (Unicode character sequence) as a keyword, boolean literal, or the null literal." [The Java Standard, Section 3.8.]
In Fortran77, a name may only be 1-6 letters and/or digits, the first of which must be a letter. Fortran90 allows names up to 31 characters long, and allows them to include the _ character.
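The two Fortran rules are simple enough to write as regular expressions. The following Python sketch uses hypothetical names (F77_NAME, F90_NAME, is_f77_name, is_f90_name) and assumes that a Fortran90 name must still begin with a letter.

```python
import re

# Fortran77: 1-6 letters and/or digits, the first of which is a letter.
F77_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{0,5}")
# Fortran90: up to 31 characters, underscore allowed (first still a letter).
F90_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,30}")


def is_f77_name(text: str) -> bool:
    return F77_NAME.fullmatch(text) is not None


def is_f90_name(text: str) -> bool:
    return F90_NAME.fullmatch(text) is not None


print(is_f77_name("Do50n"))        # True
print(is_f77_name("counter"))      # False: seven characters
print(is_f90_name("row_counter"))  # True
```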
Common Lisp allows names (symbols) to be of arbitrary length, and treats as a name any token that cannot be interpreted as a number. (See the Lisp Hyperspec Section 2.2 Reader Algorithm and Section 2.3.4 Symbols as Tokens) So Lisp names include
    1+ /5 ^/- 734ff 89..93
Also, the symbols that are operators in other languages, such as + and >, are names in Common Lisp.
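The rule "a token is a name unless it can be read as a number" can be sketched in Python as follows. The regular expression is a simplified, base-10 approximation of Common Lisp's number syntax written for this example, not the reader algorithm from the Hyperspec, and classify is a hypothetical helper.

```python
import re

# Simplified, base-10 approximation of Common Lisp number syntax.
NUMBER = re.compile(
    r"""[+-]?(?:
        \d+\.?                          # integer, optional trailing dot
      | \d+/\d+                         # ratio
      | \d*\.\d+(?:[esfdl][+-]?\d+)?    # float with a decimal point
      | \d+(?:\.\d*)?[esfdl][+-]?\d+    # float with an exponent
    )""",
    re.VERBOSE | re.IGNORECASE,
)


def classify(token: str) -> str:
    """Call a token a number if it fits the pattern, otherwise a symbol."""
    return "number" if NUMBER.fullmatch(token) else "symbol"


for token in ["5", "3/5", "-3745e-2", "1+", "/5", "^/-", "734ff", "89..93", "+", ">"]:
    print(token, classify(token))
```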
In fact, Common Lisp treats any character preceded by the escape character \ as an alphabetic character. So the following are also Lisp names
    ab\(c quo\"te
and even several\ words\ strung\ together, which includes internal spaces. Even the newline character may be included in a Lisp name if preceded by an escape character.
Common Lisp also includes escape brackets: |several words strung together| is the same name as several\ words\ strung\ together.
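A toy Python model of the two escape mechanisms shows why the two spellings name the same symbol. The function symbol_name is hypothetical and handles only the escapes; it ignores everything else the Lisp reader does, including any case conversion of unescaped letters.

```python
def symbol_name(token: str) -> str:
    """Return the name spelled by a token, honoring \\ and |...| escapes."""
    name = []
    in_bars = False
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "\\":
            # Single escape: take the next character literally.
            if i + 1 >= len(token):
                raise ValueError("dangling escape character")
            name.append(token[i + 1])
            i += 2
        elif ch == "|":
            # Escape bracket: toggles the bracketed region, adds no character.
            in_bars = not in_bars
            i += 1
        else:
            name.append(ch)
            i += 1
    if in_bars:
        raise ValueError("unterminated | escape")
    return "".join(name)


print(symbol_name(r"several\ words\ strung\ together"))
print(symbol_name("|several words strung together|"))
```

Both calls return the same string, with the spaces as ordinary characters of the name.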
Common Lisp macro characters, when encountered by the reader, cause the reader to call a function that recursively reads the input file, and returns an object as if the reader read that in the first place.
Moreover, Common Lisp puts the attributes of characters in the control of the programmer. For example, the programmer could make ( and ) be considered simple alphabetic characters, and make [ and ] serve the role ( and ) normally do.
Languages also differ about the significance of upper- and lower-case letters. Most modern languages distinguish between them. So HashTable is a different name from Hashtable.
Prolog considers a name that starts with an upper-case letter to be a variable, while one that begins with a lower-case letter is considered to be a literal symbol.
In Perl, every variable name must start with a "funny character". The name of a scalar variable, such as one that stores a number or string, must start with a $, such as $x. The name of a variable whose value is an array must start with an @, such as @monthTable. The name of a variable whose value is a hash table, called simply a "hash", must start with a %, such as %addressBook.
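The Perl convention amounts to a lookup on the first character of the name. A minimal Python sketch, with the hypothetical names SIGILS and perl_variable_kind, covering only the three cases described above:

```python
SIGILS = {"$": "scalar", "@": "array", "%": "hash"}


def perl_variable_kind(name: str) -> str:
    """Classify a Perl variable name by its leading "funny character"."""
    if not name or name[0] not in SIGILS:
        raise ValueError(f"not a Perl variable name: {name!r}")
    return SIGILS[name[0]]


print(perl_variable_kind("$x"))            # scalar
print(perl_variable_kind("@monthTable"))   # array
print(perl_variable_kind("%addressBook"))  # hash
```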
Fortran allows lower-case letters in names, but considers them equivalent to the upper-case version.
Versions of Common Lisp before ACL version 6 differentiated upper-case from lower-case letters, but automatically upper-cased non-escaped lower-case letters.
Although Emacs-Lisp is not a version of Common Lisp, like current ACL, it differentiates upper- from lower-case letters, and does not change either to the other.
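The difference between case-sensitive and case-insensitive languages can be modeled as a choice of comparison. In this Python sketch the function same_name is hypothetical, and upper-casing is used as the model of a language that treats lower-case letters as equivalent to their upper-case versions.

```python
def same_name(a: str, b: str, case_sensitive: bool = True) -> bool:
    """Decide whether two spellings denote the same name."""
    if case_sensitive:
        return a == b
    return a.upper() == b.upper()


print(same_name("HashTable", "Hashtable"))                        # False
print(same_name("HashTable", "Hashtable", case_sensitive=False))  # True
```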
The bash man page says,
Reserved words are words that have a special meaning to the shell. The following words are recognized as reserved when unquoted and either the first word of a simple command ... or the third word of a case or for command:
! case do done elif else esac fi for function if in select then until while { } time [[ ]]
...
Note that unlike the metacharacters ( and ), { and } are reserved words ... Since they do not cause a word break, they must be separated from [other words] by whitespace.