Java is designed to be a simple and pure object-oriented language. It represents a programming style that has also found favor among C++ programmers. Although Java claims to have no pointers, a good way to understand both Java's behavior and its relation to C++ is: Java is what you get if you make everything (other than numbers and characters) a pointer! Put another way:
There is a fairly straightforward way of translating raw Java code into C++ code that behaves the same way (or should behave...)
Calls to standard Java library functions of course cannot be translated directly, but even here we shall see that the new ANSI C++ Standard (1998) and Java converged on a mostly common look for strings and vectors/arrays. The C++ code you will get by this translation is perfectly readable and well-motivated. It misses many opportunities for optimization that the C++ language provides with its many features---but part of the rationale for keeping Java simple is that those optimizations matter less on today's machines compared to yesterday's. Once you are comfortable with the C++ language, then we will examine the special C++ ways of doing things. Here are the guts of the translation:
For class data members, just follow the declaration rules above, so that they are all either primitive types or pointers to class objects. C++ offers the ability to have objects of one class A be members of other classes B "by value" rather than "by reference/pointer"---indeed, this uses the same *-less syntax as Java. The upside for "by-value" is to save one extra step of de-referencing a pointer, but the downside is that the compiler needs to know the compiled size of A objects in order to compile B! When you have a million lines of code, re-compilation time tends to be measured in hours and days rather than minutes, and too many by-value members start to become a pain in the compilation-dependence chart. Moreover, compilers are getting smarter about saving that step, and processors know that since a pointer de-reference is not a branch, it's hunky-dory for speculative lookahead execution anyway! [Later we will show and do by-value members, but this goes to say that our baby-first-steps approach to classes with everything as pointers is not bad, which is why Java got away with making it the only option to begin with!]
class Foo {
    Bar myField;
    static final boolean MUTATIONS_ALLOWED = true;

    public Foo(Bar incoming) { myField = incoming; }
    public Bar getMyField() { return myField; }
    void alterMyField(Bar packageEyesOnly) { myField = packageEyesOnly; }
}
becomes the C++ code
#include "Bar.h"   ///to make class Bar visible

class Foo {
    Bar* myField;
    static const bool MUTATIONS_ALLOWED;   ///must be defined outside class!
 public:
    explicit Foo(Bar* incoming)
        : myField(incoming)   ///what problem do we have?...
    {}
    virtual ~Foo() { delete(myField); }   ///...see file for the fix!
    virtual Bar* getMyField() const;   //code can be elsewhere, in "Foo.cc"
 private:   //C++ does not have "package scope" :-(
    virtual void alterMyField(Bar* thisClassAndFriendsOnly);
        //NOT "const" since it alters the invoking object!
};   //don't forget the semicolon!
It is typical to "inline" constructor code as in Java, but the bodies of member functions are often not inlined within the class---see under "Files" next. Whereas Java's strict filename and directory conditions tell it where many files are, in C++ you must give an #include statement for classes in other files that you wish to use---as also described next.
C++ files come in two flavors: so-called header files with the universal extension .h, and code files with extensions variously given as .cc or .C or .cpp etc. We prefer the first. The above class code goes in a file that should be called Foo.h. (It need not be---the name is not mandatory the way Foo.java would be in Java---but it should be called that.) Then the bodies of methods that were declared but not defined inside the class go in an associated file Foo.cc. Whereas all code for methods of a Java class must go between the class' braces, long C++ methods are supposed to be defined outside the class, in a separate file. Here is our Foo.cc:
#include <iostream>   ///For ANSI libraries drop the ".h"
#include "Bar.h"      ///redundant as Foo.h has it, but useful.
#include "Foo.h"      ///need to include class' own header file!
using namespace std;  ///lets us write cerr rather than std::cerr

const bool Foo::MUTATIONS_ALLOWED = true;   ///don't say "static" here!

Bar* Foo::getMyField() const {   ///don't say "virtual" here!
    return myField;
}

void Foo::alterMyField(Bar* thisClassAndFriendsOnly) {
    if (MUTATIONS_ALLOWED) {
        myField = thisClassAndFriendsOnly;
    } else {
        cerr << "Attempt to modify a class that doesn't want it\n";
    }   ///cerr is the name of the standard error-message stream.
}
Note that the method bodies are not between any braces at all, but "float freely in the global file space". The class they belong to is indicated by prefixing the classname followed by ::, instead of Java's sticking to . for class members. (Discussion of << and C++ streams will come later.)
Unlike Java, C++ allows one to have "global" data and functions, meaning ones not defined inside a class and not local to another function. Of course C must have these functions, and C++ is compatible with C. However, we will follow Java usage and not allow you to have any global functions---other than main and certain "friends" of classes that are defined at file-scope "for symmetry".
In addition, one C++ .cc file generally has no .h counterpart---the one with the "main program" in it. Whereas in Java execution begins with the method
public static void main (String[] args){...}
in the so-called "main class", C++ execution begins by calling the function
int main (int argc, char** argv){...}
just like in C. You can call the file containing "main" anything you like, but if your Java main class would be called (say) MyProg, then the corresponding C++ file should be called myprog.cc (all-lowercase). Then "g++ myprog.cc" would be similar to "javac MyProg.java", except that whereas Java preserves your name as MyProg.class, C++ would give your executable the vanilla name a.out. To name your executable myprog, the command is "g++ -o myprog myprog.cc". There are a zillion options on most C++ compilers, most of them variations that Java disallows for standardness and sanity, and you might quickly find this way of doing things tiresome. Fortunately, you can batch all your compile and linking commands in a handy "make file" in the same directory as your myprog.cc that automates this process for you---see below. (Footnote on "int main()" and "argc,argv".)
Unlike Java, the C++ system enforces no correspondence between class name and filename/location. The C++ compiler needs to be told where things live, and this is done in two ways:
(i) #include statements at the top of files. Technically these are directives to the C++ pre-processor to do a literal substitution of the text of the referenced file at this spot, but you can think of it as working like import in Java. Currently, an ANSI Standard library header looks like
#include <string>
without the ".h", because
#include <string.h>
would refer to the older Kernighan-Ritchie C "string.h" library, which C++ retains for compatibility with past C++ programs but which we do not want you to use. Files residing in user paths rather than system paths should use "..." not <...>. If Foo.h is in a subdirectory Bar of the file containing main, then you write
#include "Bar/Foo.h"
on a UNIX system such as ours. C++ does not have a standard CLASSPATH variable to make file searching easier as Java does, but most C++ environments provide the equivalent or automate the organization and linking of class files for you.
A riddle: if #include is literal text inclusion, why don't you get errors from multiple declarations of the same items? The reason is that every .h file uses a standard-but-not-automatic protocol. It defines a "system constant" that is often the name of the file in ALLCAPS followed by __H. (Some texts recommend putting some underscores before the name too, but nowadays this usage is reserved for system and commercial libraries.) For example, Foo.h should have the structure:
/** File Foo.h, by KWR.
    OO simulation of the WWII "Foo Fighters" as UFOs.
*/
#ifndef FOO__H
#define FOO__H

<body of file>

#endif   ///this would be the last line of the file.
This says that if the environment constant FOO__H is not already defined, then define it now and include the text of the file, but if so, do nothing. Since this structure is simple and expected, one does not need to indent the code between #ifndef and #endif as one does in an if-statement. This ensures that the body of every .h file is compiled at most once per source file, no matter how many times it is #included, thus freeing the user to declare the needs of each file without worrying about the inclusion order. (Footnote on const versus #define and on C++ namespaces.)
(ii) The make utility. Code in .cc files, however, is usually not #included. Doing so would (re-)compile it in one big gulp and lose the benefits of separate compilation. However, the separately compiled code then needs to be linked. Java hides the linking step from users because its strict filename requirements automatically tell it where referenced files live and how to link them.
The details of make syntax can be imitated without needing to understand them in full---we will provide examples to build on and they will be covered in recitations. If your main file is myproj.cc, then put a file called myproj.make in the same directory, and enter the compilation and linking commands as needed. Then you can (re-)compile your project by entering the command "make -f myproj.make". The system will take over from there. To remove all your binaries, enter "make -f myproj.make clean". (On a home PC rather than our Unix systems, you will probably have a different way to automate project management.)
A Java interface is really a kind of abstract Java class, so let us discuss abstract classes first. When you declare an abstract class in Java, you are supposed to (but not required to) mark at least one method in the class abstract. C++ does not have the keyword abstract, so what it tells you to do is mark the method virtual (as you would do anyway under our translation) and "initialize" it to the null pointer! Elsewhere the usual C++ style is to call the null pointer by the macro name NULL (echoing Java null), but for this usage the syntax calls for a literal 0, which also catches the eye. Thus the Java code
abstract class ClosedCurve extends Shape { public abstract double area(); }
becomes in C++,
class ClosedCurve: public Shape {
 public:
    virtual double area() = 0;   ///this doesn't say the area is always zero :-)
};
Having one such "null virtual function" automatically makes the whole class "abstract"---meaning as in Java that you can't construct objects of the class directly, and any "concrete" subclass of ClosedCurve must override and define the methods set to 0. For example,
class Rectangle: public ClosedCurve {
    double length, width;
 public:
    virtual double area() { return length*width; }
};
Abstract classes are good for enforcing conditions on their subclasses, thus making sure a large object hierarchy maintains essential common features. An abstract base class such as "Shape" that defines a null virtual function "show()" for displaying a shape is also handy for writing code such as
vector<Shape*> myDrawing;   ///ANSI C++ vectors are re-sizable arrays!
...
void paint() {
    for (int i = 0; i < myDrawing.size(); i++) {
        myDrawing.at(i)->show();
    }
}   ///one line of code paints all the shapes!
Most of all, abstract classes permit the modeling of important relationships among objects that are not definite enough to be coded or constructed.
Now for a gritty technical matter of the kind C++ throws at you all the time. Per the discussion of "const" methods above, since the body of Rectangle::area() does not change the invoking object, the method should be marked const. However, because const was left out of the prototype in the ClosedCurve class, the C++ system will not consider a const version to be an override of it: the const version is treated as a new, unrelated method, and Rectangle stays abstract. This is because "const" is part of the basic type signature of the function, wherever "const" appears! (Footnote on C++ types.) The writer of the abstract class you are extending could (and should!?!) have used "const" to force all implementations to be "const"! The file "shapetest.cc" in our cse250/Java2C++ directory illustrates this.
Interfaces. The point of the separate keyword interface in Java is that interfaces may be multiply inherited, but classes may not be. The reason Java did this is to avoid a famous problem of ambiguity in multiple inheritance when two inherited classes use the same name for a field...especially when this field comes from a common ancestor class! Java interfaces have no data fields other than fixed constants, thus (largely) avoiding the problem. C++ chose programmer freedom over safety here. Thus Java
class Foo extends Bar implements Int1,Int2,Int3 { ... }
simply becomes C++
class Foo: public Bar, public Int1, public Int2, public Int3 { ... };
For a C++ class that translates a Java interface, it is a good idea to include a comment calling it an "interface". BTW, C++ and Java have the same comment syntax, except that C++ did not provide (does now?) a standard means of generating documentation like Java "/**" comments serve to do.
Nesting classes is a good idea that may be unfamiliar to a lot of you. The original 1995 version of Java did not allow nesting one class inside another, and what happened was that users quickly found themselves with zillions of tiny files, often including interfaces all called (typically) Sortable.java written by different programmers that were hard to keep track of even with Java's built-in package and directory organization. Say you have a binary search-tree class BST that needs to sort objects stored in its nodes, and maybe wants to have both a "<" and a "<=" kind of comparison for efficiency, and/or one that returns values -1,0,+1 with 0 for equal. Since this sorting interface will be particular to this BST class, let us nest it inside!
public class BST {
    public interface Sortable {   ///name outside is BST.Sortable
        boolean lessThan(Sortable rhs);
        boolean lteq(Sortable rhs);
        int compareTo(Sortable rhs);   //returns -1 for <, 0 for ==, +1 for >
    }
    ...
}
A nested class is not a field, and Java emphasizes that a top-level nested class can be accessed freely outside its host. The only change is that its name outside the host class has the host's name prefixed to it, so client files call the interface type "BST.Sortable". See files NestTest.java and NestMain.java for more.
Now in C++ we can do essentially the same thing (see NestMain.cc):
class BST {
 public:
    class Sortable {
     public:
        virtual bool lessThan(Sortable* rhs) const = 0;
        virtual bool lteq(Sortable* rhs) const = 0;
        virtual int compare(Sortable* rhs) const = 0;
            // returns -1 for <, 0 for ==, +1 for >
    };
};
...
class MySortableItem : public BST::Sortable {
    string key;
 public:
    virtual bool lessThan(BST::Sortable* rhs) const {
        return key < ((MySortableItem*)rhs)->key;
    }
    ...
};
Notice the places where Java's . becomes ::, places where it becomes ->, and remember that we're making every class variable have a pointer type, and this will be fairly automatic!
A corollary of the independence of nested classes from the host class is that it is OK for a class to refer to an interface nested in a class derived from it. (Referring to a field or method defined only in the derived class, however, is considered a basic O-O no-no.) We will use this in the design of an "Implementation Sandwich" into which much of the data-structure code in projects will go. The system will model "information storage and retrieval" of the simple "card-box" or "flat-file" kind---as opposed to a true relational database of the kind taught in CSE462. Public users will not see our implementation layer---they will see only a public class sitting above it (i.e., inheriting from it) that defines a type called "Cardbox". Users will only be able to use the constructors and methods specified in "Cardbox", such as insert, remove, find,... (The first is usually called add in Java; the name delete would be good for the second but it's a keyword in C++.) An abstract base class called "AbsISR" will make those methods null-virtual so that everyone in our company will use the required names for those methods in their implementation classes. We will start off with a "quick & dirty" implementation of the cardbox and its methods via a simple sorted array or linked list, but later we shall rotate in better implementations learned in this course. Finally, an interface class called Sortable nested inside Cardbox will lay out the sorting functions that users of Cardbox are required to define for their own records, so that our routines can manipulate them. Our base class and implementation classes can refer to the type Cardbox::Sortable and the methods listed in there. As a classic example of "Separating Interface from Implementation", to change our data structure we shall have to change only one line of code---in Cardbox.h---and users will not have to change anything in their code and shouldn't even have to re-compile it!
Java methods that are marked final cannot be overridden in a subclass. C++ has no such idea (except by a touchy mechanism called "private inheritance" that we'll ignore), but we can translate such methods by C++ methods without the keyword virtual. Leaving out virtual makes C++ use static binding for the method.
#include <iostream>
using namespace std;

class Base {
 public:
    virtual void dynamicBind() { cout << "Base dynamicBind()" << endl; }
    void staticBind() { cout << "Base staticBind()" << endl; }
};

class Derived: public Base {
 public:
    virtual void dynamicBind() { cout << "Derived dynamicBind()" << endl; }
    void staticBind() { cout << "Derived staticBind()" << endl; }   ///dubious?
};

int main() {
    Base* bp;
    Derived* dp = new Derived();
    bp = dp;   ///now bp points to a Derived object tho its own type is Base*.
    bp->staticBind();    ///which class' version gets called?
    bp->dynamicBind();   ///and here?
}

The output:

Base staticBind()
Derived dynamicBind()
The two methods differ only in that one has "virtual" and behaves the way all Java methods do, while the other is statically bound. The dynamic one looks up the type of the object currently pointed to, whereas the static one looks up the type of the variable. To get dynamic binding, the method must be marked virtual and the variable invoking it must have type either Base* or Base& (see footnote for the latter). This is another reason why we begin the transition from Java to C++ using pointers for all class variables.
Static data is translated by static just as in Java. It used to be a "C++ gotcha!" that static members had to be defined outside the class, i.e. in the .cc not the .h file. The standard now allows a static const member of an integer type (int, bool, char, ...) to be given its value inside the class as in Java, and both g++ and Sun's CC accept this. Static members of other kinds must still be defined outside the class.
Static methods are supposed to be referenced only by prefixing the class name in Java, as in Foo.method(); the C++ counterpart is Foo::method(). C++ also lets you invoke a static method through an object variable or pointer---using a dummy if need be.
This completes most of what you need to know to do the basic Java-to-C++ translation. To come are sections on doing things "the C++ way" to begin with: when to declare and hold objects by-value rather than by-pointer, when to use static binding for methods, when and how to use "overloaded operators", and most of all, using templates rather than "interfaces" to establish better-controlled relationships between a host class and an argument class. (It looks like Java itself has decided to offer the last one---an RFP for adding a template mechanism using the already-reserved keyword generic was approved last May.)