The Department of Computer Science & Engineering | STUART C. SHAPIRO: CSE 305
40 character BCD:
&-0123456789ABCDEFGHIJKLMNOPQR/STUVWXYZ
FORTRAN II 48 character set:
0123456789 ABCDEFGHIJKLMNOPQRSTUVWXYZ =+-*/().,$'
blank
The 128-character ASCII (American Standard Code for Information Interchange) character set has 32 non-printing control characters (codes 0 through 31) and 96 others: the blank, 94 visible characters, and DEL (which is itself non-printing):
blank
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
DEL
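The counts can be checked mechanically. The following is a minimal Python sketch, not part of the original notes; the variable names are made up for illustration.

```python
import string

# Codes 0-31 are the control characters.
control_codes = [c for c in range(128) if c < 32]
# Codes 32-127 run from the blank through DEL.
other_codes = [c for c in range(128) if c >= 32]

# "Visible" here means everything strictly between the blank (32) and DEL (127).
visible = "".join(chr(c) for c in range(33, 127))

print(len(control_codes), len(other_codes), len(visible))  # 32 96 94
```

The 94 visible characters are exactly the digits, the upper- and lower-case letters, and the 32 ASCII punctuation marks.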
Unicode character set contains characters for most languages.
<digit> -> 0|1|2|3|4|5|6|7|8|9
<lcletter> -> a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
<ucletter> -> A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
<underscore> -> _
Before discussing what a name looks like, we need to discuss what aren't names:
Fortran uses C or * in column 1 to indicate that the line is a comment.
Other languages typically have one comment symbol to indicate that the rest of the line is a comment, and a pair of brackets to indicate that the enclosing material is a comment. For example,
Language | Rest of Line Comment | Open Comment | Close Comment |
---|---|---|---|
bash | # | none | none |
C | // | /* | */ |
C++ | // | /* | */ |
C# | // | /* | */ |
Common Lisp | ; | #\| | \|# |
Erlang | % | none | none |
Fortran90,95 | ! | none | none |
Haskell | -- | {- | -} |
Java | // | /* | */ |
Perl | # | = | =cut |
Prolog | % | /* | */ |
Python | # | none | none |
Ruby | # | =begin | =end |
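As an illustration of how a tool might use the rest-of-line column, here is a minimal Python sketch. The dictionary and function names are hypothetical, and the sketch makes a loud simplifying assumption: it does not notice comment symbols that occur inside string literals.

```python
# Rest-of-line comment symbols, copied from the table above.
REST_OF_LINE = {
    "bash": "#", "C": "//", "C++": "//", "C#": "//", "Common Lisp": ";",
    "Erlang": "%", "Fortran90": "!", "Haskell": "--", "Java": "//",
    "Perl": "#", "Prolog": "%", "Python": "#", "Ruby": "#",
}

def strip_rest_of_line_comment(line, language):
    """Drop everything from the first comment symbol to the end of the line.

    Simplification: a symbol inside a string literal is (wrongly) treated
    as the start of a comment.
    """
    code, _, _ = line.partition(REST_OF_LINE[language])
    return code.rstrip()

print(strip_rest_of_line_comment("(+ x y) ; sum of x and y", "Common Lisp"))  # (+ x y)
```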
There are also common practices, which IDEs and tools are sometimes sensitive to. For example, in Java, the open comment bracket /** begins a JavaDoc comment. In Lisp, ;;; is used at the beginning of a line for comments that are outside any function definition; ;; is used at the beginning of an indented line (indented like other lines of code) for comments within a function definition; and ; is used after, but on the same line as, normal code to comment on that line of code.
Bash, Fortran, Haskell, Python, and Ruby consider the end of a line to be a statement terminator. Some of these have a way to explicitly indicate continuation onto the next line, and a way to indicate that several statements occur on one line.
Fortran ignores spaces (blanks). For example, in the Do statement
Do 50 n = 1, 9999
if the comma is omitted, the statement will be interpreted as the assignment statement
Do50n = 19999
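The effect of ignoring blanks can be imitated with a rough Python sketch. This is not how a Fortran compiler is written; the function name and the two regular expressions are assumptions made only to show why the comma is the deciding character.

```python
import re

def classify_fortran(statement):
    """Classify a statement the way a blank-ignoring reader would (very roughly)."""
    squeezed = statement.replace(" ", "").upper()   # blanks carry no meaning
    # DO <label> <variable> = <start> , <rest>   -- the comma is required
    if re.fullmatch(r"DO\d+[A-Z][A-Z0-9]*=[^,]+,.+", squeezed):
        return "do loop"
    # <variable> = <expression>
    if re.fullmatch(r"[A-Z][A-Z0-9]*=.+", squeezed):
        return "assignment"
    return "other"

print(classify_fortran("Do 50 n = 1, 9999"))  # do loop
print(classify_fortran("Do 50 n = 1 9999"))   # assignment
```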
Bash uses spaces as separators, especially between a command and its arguments:
bash-2.02$ x=3
bash-2.02$ echo $x
3
bash-2.02$ x = 3
bash: x: command not found
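Python's standard shlex module splits a line into words in a shell-like way, so it can serve as a small sketch of why the two lines differ. It only approximates bash's rules, and the helper name first_word is made up here.

```python
import shlex

def first_word(line):
    """Return the first blank-separated word, which bash would treat as the command."""
    return shlex.split(line)[0]

print(shlex.split("x=3"))    # ['x=3']            one word: an assignment
print(shlex.split("x = 3"))  # ['x', '=', '3']    three words: command x with two arguments
print(first_word("x = 3"))   # x
```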
Haskell and Python use indentation at the beginning of the line to indicate a block. This is an example of Python:
if expr:
    print("Block line 1.")
    print("Block line 2.")
else:
    print("In else block.")
print("Out of Block")
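Python's own tokenizer makes the role of indentation visible: it turns changes of indentation into INDENT and DEDENT tokens. The sketch below uses the standard tokenize module on a copy of the example written with print(...) calls; the function name layout_tokens is made up for illustration.

```python
import io
import tokenize

SOURCE = '''\
if expr:
    print("Block line 1.")
    print("Block line 2.")
else:
    print("In else block.")
print("Out of Block")
'''

def layout_tokens(source):
    """Return the names of the INDENT/DEDENT tokens, in the order they occur."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tokenize.tok_name[t.type] for t in tokens
            if t.type in (tokenize.INDENT, tokenize.DEDENT)]

print(layout_tokens(SOURCE))  # ['INDENT', 'DEDENT', 'INDENT', 'DEDENT']
```

Each block opens with one INDENT and closes with one DEDENT, playing the role that a pair of brackets plays in other languages.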
A token is a terminal symbol of the programming language that the reader (parser) passes to the compiler or interpreter.
In a[i], the brackets separate the tokens a and i and prevent the expression from looking like the identifier ai.
In Python, the commas in [1,2,3] separate the elements of the list.
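These two cases can be observed directly with Python's standard tokenize module. This is a small sketch; token_strings is a hypothetical helper that keeps only names, numbers, strings, and punctuation or operator tokens.

```python
import io
import tokenize

SIGNIFICANT = (tokenize.NAME, tokenize.NUMBER, tokenize.STRING, tokenize.OP)

def token_strings(source):
    """Return the text of each significant token in a line of Python."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in tokens if t.type in SIGNIFICANT]

print(token_strings("a[i]"))     # ['a', '[', 'i', ']']
print(token_strings("ai"))       # ['ai']
print(token_strings("[1,2,3]"))  # ['[', '1', ',', '2', ',', '3', ']']
```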
In bash, punctuation marks are called metacharacters:
metacharacter: A character that, when unquoted, separates words. One of the following:
| & ; ( ) < > space tab
[bash man page]
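A rough Python sketch of word separation by metacharacters follows. It is an approximation only: it assumes nothing is quoted, and it treats every metacharacter as a token by itself, so a two-character operator such as && comes out as two tokens. The names are made up for illustration.

```python
import re

# | & ; ( ) < > each stand alone; runs of other non-blank characters form words.
WORD_OR_META = re.compile(r"[|&;()<>]|[^\s|&;()<>]+")

def split_on_metacharacters(line):
    """Split an unquoted command line into words and metacharacters."""
    return WORD_OR_META.findall(line)

print(split_on_metacharacters("ls|wc -l"))  # ['ls', '|', 'wc', '-l']
print(split_on_metacharacters("x=3"))       # ['x=3']
```

Note that = is not a metacharacter, so x=3 remains a single word.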
The Java operators:
= > < ! ~ ? : == <= >= != && || ++ -- + - * / & | ^ % << >> >>> += -= *= /= &= |= ^= %= <<= >>= >>>=
Operators usually separate other tokens. For example, in Java, x+y is the same as x + y. In Lisp, however, most of these symbols are ordinary characters, so that while (+ x y) is an expression that evaluates to the sum of x and y, (+xy) is a call to the function of no arguments whose name is +xy, and 3+5 is a variable.
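The contrast can be sketched in Python. Python's tokenizer treats + the way Java's does, so it stands in for Java here; lisp_like_tokens is a hypothetical and deliberately crude splitter that breaks only at blanks and parentheses, which is roughly how the Lisp reader finds token boundaries.

```python
import io
import re
import tokenize

def python_tokens(source):
    """Tokens as Python (like Java) sees them: operators separate names."""
    keep = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in tokens if t.type in keep]

def lisp_like_tokens(source):
    """Crude Lisp-style splitting: only blanks and parentheses end a token."""
    return re.findall(r"[()]|[^\s()]+", source)

print(python_tokens("x+y"), python_tokens("x + y"))  # ['x', '+', 'y'] ['x', '+', 'y']
print(lisp_like_tokens("(+ x y)"))                   # ['(', '+', 'x', 'y', ')']
print(lisp_like_tokens("(+xy)"))                     # ['(', '+xy', ')']
```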
The most familiar literals are numbers, such as 5 and 78.34, but many languages have literals of other types, such as the Java boolean literals true and false, and Java's null.
There is generally an involved syntax for numeric literals, including optional signs, decimal points, exponentiation marks, and radix indicators. For example, in C++ and Java 0x57 is a hexadecimal integer equal to the decimal integer 87, and in Lisp, -3745e-2 is a floating point number equal to -37.45. Lisp also has literals of a ratio type, such as 3/5.
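The three values can be checked with Python stand-ins. This is only a sketch: Python's int, float, and Fraction are used here in place of the C++, Java, and Lisp readers, whose exact number types differ.

```python
from fractions import Fraction

hex_value = int("0x57", 16)       # hexadecimal literal with a radix indicator
float_value = float("-3745e-2")   # sign, digits, and an exponent mark
ratio_value = Fraction("3/5")     # stand-in for a Lisp ratio

print(hex_value, float_value, ratio_value)  # 87 -37.45 3/5
```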
"An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. An identifier cannot have the same spelling (Unicode character sequence) as a keyword, boolean literal, or the null literal." [The Java Standard, Section 3.8.]So for Java,
<javaletter> -> <lcletter> | <ucletter> | <underscore>
<identifier> -> <javaletter> {<javaletter> | <digit>}
In Fortran77, a name may only be 1-6 letters and/or digits, the first of which must be a letter. Fortran90 allows names up to 31 characters long, and allows them to include the _ character. "Some processors will allow lower-case as well as upper-case alphabetic characters in names and programs; in such cases, FORTRAN considers the lower-case letter equivalent to its upper-case correspondent" [S. L. Edgar, Fortran for the '90s, W.H. Freeman & Company, 1992, p. 51].
So for Fortran77, where ^5 means at most five repetitions,
<name> -> <ucletter> {<ucletter> | <digit>}^5
The default type of a variable beginning with I, J, K, L, M, or N is integer; otherwise, it is real. (The default may be overridden by an explicit type declaration.)
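The grammars for Java identifiers and Fortran77 names translate directly into regular expressions. The following Python sketch follows the simplified grammars given in these notes (letters, digits, and underscore only); the function names are made up, and the set of reserved spellings is a small, deliberately incomplete sample rather than Java's full list.

```python
import re

# <identifier> -> <javaletter> {<javaletter> | <digit>}
JAVA_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
# <name> -> <ucletter> {<ucletter> | <digit>}, at most five repetitions
FORTRAN77_NAME = re.compile(r"[A-Z][A-Z0-9]{0,5}")

# Assumption: an illustrative sample only, not the complete list of reserved spellings.
JAVA_RESERVED_SAMPLE = {"class", "if", "while", "true", "false", "null"}

def is_java_identifier(text):
    return bool(JAVA_IDENTIFIER.fullmatch(text)) and text not in JAVA_RESERVED_SAMPLE

def is_fortran77_name(text):
    return bool(FORTRAN77_NAME.fullmatch(text))

def fortran_default_type(name):
    """Default (implicit) type: names starting with I through N are integer."""
    return "integer" if name[0].upper() in "IJKLMN" else "real"

print(is_java_identifier("hashTable1"), is_java_identifier("1abc"))  # True False
print(is_fortran77_name("DO50N"), is_fortran77_name("TOOLONG1"))     # True False
print(fortran_default_type("N"), fortran_default_type("X"))          # integer real
```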
Common Lisp allows names (symbols) to be of arbitrary length, and treats as a name any token that cannot be interpreted as a number. (See the Lisp Hyperspec Section 2.2 Reader Algorithm and Section 2.3.4 Symbols as Tokens.) So Lisp names include
1+ /5 ^/- 734ff 89..93
Also, the symbols that are operators in other languages, such as + and >, are names in Common Lisp.
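The rule "a name is any token that is not a number" can be sketched in Python. The regular expression below is a simplified approximation of decimal integers, ratios, and floats; it is an assumption made for illustration and does not reproduce the full number syntax of the Lisp reader (for example, it ignores other radixes).

```python
import re

NUMBER = re.compile(
    r"""[+-]?(
        \d+\.?                        # integer, optionally with a trailing dot
      | \d+/\d+                       # ratio
      | \d*\.\d+([esfdl][+-]?\d+)?    # float with a decimal point
      | \d+(\.\d*)?[esfdl][+-]?\d+    # float with an exponent marker
    )""",
    re.VERBOSE | re.IGNORECASE,
)

def is_lisp_name(token):
    """A token is a name exactly when it cannot be read as a number."""
    return NUMBER.fullmatch(token) is None

for token in ["1+", "/5", "^/-", "734ff", "89..93", "+", ">"]:
    print(token, is_lisp_name(token))   # each prints True
print(is_lisp_name("-3745e-2"), is_lisp_name("3/5"))  # False False
```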
In fact, Common Lisp treats any character preceded by the escape character \ as an alphabetic character. So the following are also Lisp names
ab\(c quo\"te
and even several\ words\ strung\ together, which includes internal spaces. Even the newline character may be included in a Lisp name if preceded by an escape character.
Common Lisp also includes escape brackets: |several words strung together| is the same name as several\ words\ strung\ together
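A minimal Python sketch of how the two kinds of escape yield a name follows. The function symbol_name is hypothetical, and it makes one loud simplification: it ignores any case conversion the reader may apply to unescaped letters.

```python
def symbol_name(token):
    """Return the name denoted by a token written with \\ and | escapes.

    Simplification: no case conversion is applied to unescaped letters.
    """
    name = []
    chars = iter(token)
    for ch in chars:
        if ch == "\\":
            escaped = next(chars, None)   # the next character is taken literally
            if escaped is None:
                raise ValueError("escape character at end of token")
            name.append(escaped)
        elif ch == "|":
            continue                      # escape brackets delimit, but are not part of the name
        else:
            name.append(ch)
    return "".join(name)

print(symbol_name(r"several\ words\ strung\ together"))  # several words strung together
print(symbol_name("|several words strung together|"))    # several words strung together
```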
Common Lisp macro characters, when encountered by the reader, cause the reader to call a function that recursively reads from the input and returns an object, as if the reader had read that object in the first place.
Moreover, Common Lisp puts the attributes of characters in the control of the programmer. For example, the programmer could make ( and ) be considered simple alphabetic characters, and make [ and ] serve the role ( and ) normally do.
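The idea of programmer-controlled character attributes can be imitated in a toy Python splitter whose bracket characters are parameters. This is only a sketch of the idea, not of Common Lisp's readtable mechanism; the function name and its parameters are made up.

```python
def split_tokens(text, open_ch="(", close_ch=")"):
    """Split text into tokens; only open_ch, close_ch, and blanks end a word."""
    tokens, word = [], []
    for ch in text:
        if ch in (open_ch, close_ch) or ch.isspace():
            if word:                      # finish the word in progress
                tokens.append("".join(word))
                word = []
            if not ch.isspace():          # brackets are tokens themselves
                tokens.append(ch)
        else:
            word.append(ch)               # every other character is "alphabetic"
    if word:
        tokens.append("".join(word))
    return tokens

print(split_tokens("(+ x y)"))              # ['(', '+', 'x', 'y', ')']
print(split_tokens("[+ (x) y]", "[", "]"))  # ['[', '+', '(x)', 'y', ']']
```

In the second call, ( and ) have become ordinary characters, so (x) is a single name.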
Languages also differ about the significance of upper- and lower-case letters. Most modern languages distinguish between them, so HashTable is a different name from Hashtable.
Erlang and Prolog consider a name that starts with an upper-case letter to be a variable, while one that begins with a lower-case letter is considered to be a literal symbol.
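The Erlang and Prolog convention amounts to a test on the first character. A minimal Python sketch follows; the function name is made up, and it covers only the two cases stated above, reporting anything else as "other".

```python
def name_kind(name):
    """Classify a name by the case of its first character (Erlang/Prolog style)."""
    first = name[0]
    if first.isupper():
        return "variable"
    if first.islower():
        return "literal symbol"
    return "other"

print(name_kind("Count"), name_kind("count"))  # variable literal symbol
print("HashTable" == "Hashtable")              # False: case is significant
```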
In Haskell, a variable must start with a lower-case letter, and then can have mixed lower-case letters, upper-case letters, digits, and single quote marks. An underscore is considered a lower-case letter.
<digit> -> 0|1|2|3|4|5|6|7|8|9
<small> -> _|a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
<large> -> A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
<varid> -> (<small> {<small> | <large> | <digit> | ' }) except <reservedid>
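The <varid> rule can be written as a regular expression plus a set difference. The Python sketch below is an illustration only; the list of reserved identifiers is not given in these notes and is taken from the Haskell 2010 Report, so treat it as an outside assumption.

```python
import re

# (<small> {<small> | <large> | <digit> | ' })
VARID_SHAPE = re.compile(r"[_a-z][_a-zA-Z0-9']*")

# Assumption: reserved identifiers as listed in the Haskell 2010 Report.
RESERVED_IDS = {
    "case", "class", "data", "default", "deriving", "do", "else", "foreign",
    "if", "import", "in", "infix", "infixl", "infixr", "instance", "let",
    "module", "newtype", "of", "then", "type", "where", "_",
}

def is_varid(text):
    """True if text has the shape of a variable name and is not reserved."""
    return bool(VARID_SHAPE.fullmatch(text)) and text not in RESERVED_IDS

print(is_varid("x'"), is_varid("_tmp"), is_varid("HashTable"), is_varid("where"))
# True True False False
```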
In Perl, every variable name must start with a "funny character". The name of a scalar variable, such as one that stores a number or string, must start with a $, such as $x. The name of a variable whose value is an array must start with an @, such as @monthTable. The name of a variable whose value is a hash table, called simply a "hash", must start with a %, such as %addressBook.
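The three funny characters can be summarized as a lookup table. This is a small Python sketch with made-up names, covering only the three cases described above.

```python
# First character of a Perl variable name -> kind of value it holds.
SIGILS = {"$": "scalar", "@": "array", "%": "hash"}

def perl_variable_kind(name):
    """Report the kind of a Perl variable from its first character."""
    return SIGILS.get(name[:1], "not a variable name")

print(perl_variable_kind("$x"))            # scalar
print(perl_variable_kind("@monthTable"))   # array
print(perl_variable_kind("%addressBook"))  # hash
```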
The first character of a Ruby variable indicates its scope.
ANSI Common Lisp and versions of ACL before version 6 differentiate upper-case from lower-case letters, but automatically convert non-escaped lower-case letters to upper-case.
Although Emacs-Lisp is not a version of Common Lisp, like current ACL, it differentiates upper- from lower-case letters, and does not change either to the other.
The bash man page says,
Reserved words are words that have a special meaning to the shell. The following words are recognized as reserved when unquoted and either the first word of a simple command ... or the third word of a case or for command:
! case do done elif else esac fi for function if in select then until while { } time [[ ]]
...
Note that unlike the metacharacters ( and ), { and } are reserved words ... Since they do not cause a word break, they must be separated from [other words] by whitespace.
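The remark about { and } can be illustrated with a rough Python sketch. It checks only the "first word of a command" case, assumes nothing is quoted, and splits on blanks alone; the function name is made up.

```python
# The reserved words quoted from the bash man page above.
RESERVED = set(
    "! case do done elif else esac fi for function if in select "
    "then until while { } time [[ ]]".split()
)

def starts_with_reserved_word(command):
    """True if the first blank-separated word of the command is reserved."""
    words = command.split()
    return bool(words) and words[0] in RESERVED

print(starts_with_reserved_word("{ echo hi; }"))  # True
print(starts_with_reserved_word("{echo hi; }"))   # False: {echo is one ordinary word
```

Because { does not cause a word break, {echo is a single word and is not recognized as the reserved word {.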