Java2C++

From Java to C++

Java is designed to be a simple and pure object-oriented language. It represents a programming style that has also found favor among C++ programmers. Although Java claims to have no pointers, a good way to understand both Java's behavior and its relation to C++ is: Java is what you get if you make everything (other than numbers and characters) a pointer! Put another way:

There is a fairly straightforward way of translating raw Java code into C++ code that behaves the same way (or should behave...)

Calls to standard Java library functions of course cannot be translated directly, but even here we shall see that the new ANSI/ISO C++ Standard (1998) and Java converged on a mostly common look for strings and vectors/arrays. The C++ code you will get by this translation is perfectly readable and well-motivated. It misses many opportunities for optimization that the C++ language provides with its many features, but part of the rationale for keeping Java simple is that those optimizations matter less on today's machines than on yesterday's. Once you are comfortable with the C++ language, we will examine the special C++ ways of doing things. Here are the guts of the translation:

Basic Syntax:

Java's primitive types (int, long, short, float, double, char, boolean) map directly to C++ counterparts called (int, long int, short int, float, double, char, bool) respectively. (The C++ standard fixes only minimum sizes for these types, so for example a C++ long int may be 32 bits where a Java long is always 64.) ANSI style dictates the use of bool in C++ for Boolean conditions, instead of the older C/C++ habit of using int, much as Java requires the use of boolean. Technically Java chars are 16-bit "wide chars" and correspond more closely to the C++ type wchar_t than to the 8-bit C++ char, but we will not need to care about this.
Basic expression syntax is essentially the same! C++ also lets you give the existing operator symbols new meanings for your own classes ("operator overloading"), which Java forbids; more on this later.
Basic statement syntax is also much the same: if-else, for, while, do-while, switch, break, continue. C++ has goto, but since you live without it in Java, don't use it here.
All other types in Java are class types (even arrays!). All other types in your C++ programs should be class types too---there is no need to use struct if you know about that from C. (Actually, in C++ a struct is identical to a class except that the top-of-class region is public rather than private.)
Variable and method names are unchanged. The common Java style conventions of avoiding underscores by using middleCapitals and using StartingCapitals only for class names (and ALLCAPS only for constants) are fine in C++, and preferred by us. (Footnote on style for braces {...}.)
If you declare Foo x; in Java, then the corresponding declaration in C++ is Foo* x; (and we prefer to put the "*" next to "Foo" in order to recognize that "Foo*" is a type unto itself in C++). Pointer de-reference will not be needed!
The Java . becomes the C++ pointer arrow ->. For instance, x.bar() in Java becomes x->bar() in C++. (C++ uses the . too when x is a class variable of non-reference type, but since all Java class variables are references, the translation will not give us any such x!)
Footnote on raw C++ arrays and why not to use them.
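
The last few rules can be sketched in a few lines. This is only an illustration: the class Foo, its method bar(), and the function arrowDemo() are made-up names, not part of the course files.

```cpp
#include <cassert>

// Hypothetical class used only for this illustration.
class Foo {
   int calls;
public:
   Foo() : calls(0) {}
   int bar() { return ++calls; }   // reports how many times bar() has run
};

// Java original:  Foo x = new Foo();  Foo y = x;  x.bar();  return y.bar();
int arrowDemo() {
   Foo* x = new Foo();      // Java "Foo x" becomes C++ "Foo* x"
   Foo* y = x;              // copies the pointer, not the object, as in Java
   assert(x == y);          // both names refer to one and the same object
   x->bar();                // Java "x.bar()" becomes "x->bar()"
   int result = y->bar();   // second call on that same object
   delete x;                // C++ has no garbage collector
   return result;
}
```

Calling arrowDemo() from main returns 2, because x and y share one object just as two Java references would.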

Class Syntax:

It is considered good style in Java to group all private class data together, all public member functions together, and all private/"default" member functions together. (We will not get into protected functions; if you've used them a lot, you probably know what to do already.) This gives you 2 or 3 "regions", depending on whether you follow the style of putting class data at the top or bottom of classes. (Either is OK with us, but please be consistent.) Well, in C++, the keywords public: and private: can only be used to define regions of the class. They do not go with individual members, but should each sit on a line by themselves, and by convention that line is not indented relative to the word class. You can alternate between them as much as you like, but too much is poor style. The top region of a class is automatically private, so you don't need the keyword there.
The Java syntax "public class Foo extends Bar {...}" is translated as "class Foo: public Bar {...};" (note the trailing semicolon, a prime C++ "gotcha!"). C++ also offers "private inheritance", but we will not need or want to use it. Recall that the first class in a Java file is expected to be public---well, in C++ we will at first try to have just one class per file anyway. C++ unfortunately lacks the layer of package management that Java has.

For class data members, just follow the declaration rules above, so that they are all either primitive types or pointers to class objects. C++ offers the ability to have objects of one class A be members of other classes B "by value" rather than "by reference/pointer"---indeed, this uses the same *-less syntax as Java. The upside for "by-value" is to save one extra step of de-referencing a pointer, but the downside is that the compiler needs to know the compiled size of A objects in order to compile B! When you have a million lines of code, re-compilation time tends to be measured in hours and days rather than minutes, and too many by-value members start to become a pain in the compilation-dependence chart. Moreover, compilers are getting smarter about saving that step, and processors know that since a pointer de-reference is not a branch, it's hunky-dory for speculative lookahead execution anyway! [Later we will show and do by-value members, but this goes to say that our baby-first-steps approach to classes with everything as pointers is not bad, which is why Java got away with making it the only option to begin with!]

Member functions of classes use the same syntax as Java, except that (1) the keyword virtual must come before the return type, and (2) if the function is an accessor, meaning one that cannot possibly alter any data fields of the invoking object and isn't supposed to, then the keyword const should (in this course, must) come at the end. The compiler enforces this promise: a const method may not modify the object's fields, and only const methods may be called on an object reached through a const reference or pointer. Good use of const is looked on in companies as a major facet of C++ literacy. (Sometimes one gets into a pretzel where one has to "cast away const", but this is usually a telltale sign of a design flaw in the software system to begin with.)
Constructors in C++ can use the same syntax as in Java, with all code being between the {...}, but C++ offers a better option for the part of the code that merely sets data fields, so-called initializer syntax. This is marked by a colon : after the header, followed by a comma-separated list in which each field name is written as if it were a function called with one argument, namely the value to give that field. ("Header" means the return type, function name, and the parentheses with arguments; when followed by ; to mean that you've only declared the function, it is called a "prototype". A constructor's header has no return type.) If all the work of the constructor is done this way, you can finish it off with "{}".
Objects in C++ can be allocated both on the heap using pointer syntax and on the stack without them. Java offers only the first option, and this is what we will use at first. Say Bar is a class and there is a Bar constructor with two int arguments. The recommended C++ syntax for heap allocation of a Bar object referenced through a pointer called bp is "Bar* bp = new Bar(3,17);" Erase the * and you see where Java got its construction syntax from! To construct a Bar object b directly on the stack (meaning it will be automatically destroyed when the current procedure exits, as with a local variable), C++ has you write "Bar b(3,17);" with the constructor arguments affixed to the variable name! [Note that when you declare "Bar* bp...", the pointer bp itself is a stack item that points to an anonymous heap item holding the object's fields. Exactly the same thing happens in Java, only you don't realize that two items not one get allocated---until you try to figure out what happens when bp becomes an argument to a method. The theory of C/C++ pointers actually helps you better understand Java, since C++ pointers are really there "under the hood".]
C++ does not provide automatic garbage collection of heap objects the way Java does. Consider an object of class Foo that has a field Bar* bp; as above. When the Foo object is destroyed, the pointer bp itself will be destroyed, but the Bar object it pointed to will persist. To avoid a "memory leak", we need to write code that applies the C++ operator delete to the pointer in order to destroy the Bar object that the Foo object created. This is done by a destructor, whose name must be a tilde ~ followed by the name of the class. In this case, the code could be virtual ~Foo(){ delete(bp); } (the parentheses are optional, and many programmers write delete bp; instead). (The reasons the destructor should usually be virtual are technical.) Under our Java-to-C++ translation, all class fields of object type will be pointers, and the destructor will need to call delete on every one of them. Data structure classes tend to have lots of pointers, for linked lists and trees and the like, and then the destructor code must carefully "deconstruct" the whole thing!
As in Java it is important to know the difference between merely copying a reference/pointer to an object and making a copy of the object itself ("cloning"). C++ has no standard "clone()" method, but every class Bar has automatically a so-called "default copy constructor" Bar(const Bar& x). The default one merely copies each field---since that often won't be good enough, we may wish to write it ourselves! [You can ignore the "&" for now, but note that since it is not a "*", if we want to copy the object pointed at by our Bar* pointer bp, we have to invoke the copy constructor as Bar(*bp) using the pointer-dereference operator * (formally called operator*).] Assuming class Bar has a correct copy constructor, try applying these ideas to understand and fix the problem noted in the code sample below, before looking at the answer in the file Foo.h in KWR's directory ~regan/cse250/Java2C++/ on the CSE undergrad machines.
And then we probably also want to overhaul the default manner of assigning Bar objects by writing a method operator= that replaces the default field-by-field manner of doing assignment. Destructors, copy constructors, and user-defined assignment are covered in detail as "The Big Three" in the excerpts from Mark Weiss' text given out in class, and will be covered in detail in recitations.
There is a famous C++ "gotcha!" that bites when constructors have exactly one argument: they act as an implicit conversion from the argument type Bar to your class Foo. Suppose you write "Foo x1,x2; Bar y;" and then accidentally write "x1 = y;" in your code instead of "x1 = x2;" as you intended. The Foo constructor will be called on y even though you didn't ask for it, and there may be disastrous side-effects (especially when Bar objects are raw C++ arrays, where you could lose their data). To prevent this from happening, and to make sure the constructor is called only when it is explicitly used to construct a new Foo object, ANSI C++ introduced a new keyword explicit that should go at the beginning of the constructor declaration. This keyword can be used with any constructor, but is important only when the constructor can be called with a single argument.
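
Here is a small sketch of the explicit "gotcha". The classes Loose and Strict and the function explicitDemo() are invented for this illustration; they differ only in whether the one-argument constructor is marked explicit.

```cpp
#include <cassert>

class Bar {                       // hypothetical argument class
public:
   int value;
   Bar(int v) : value(v) {}
};

class Loose {                     // one-argument constructor WITHOUT explicit
public:
   int copied;
   Loose(const Bar& b) : copied(b.value) {}
};

class Strict {                    // the same constructor, marked explicit
public:
   int copied;
   explicit Strict(const Bar& b) : copied(b.value) {}
};

int explicitDemo() {
   Bar first(1);
   Bar y(7);
   Loose x1(first);
   x1 = y;              // compiles: y is silently converted by Loose(const Bar&)
   Strict s1(first);
   // s1 = y;           // would NOT compile: no implicit Bar-to-Strict conversion
   s1 = Strict(y);      // the conversion has to be spelled out
   assert(s1.copied == 7);
   return x1.copied;    // 7: the accidental assignment overwrote x1
}
```

Un-commenting the line "s1 = y;" should produce a compile-time error, which is exactly the protection that explicit is meant to give.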
To illustrate all this, the Java code

class Foo{
   Bar myField;
   static final boolean MUTATIONS_ALLOWED = true;
   public Foo(Bar incoming){
      myField = incoming;
   }
   public Bar getMyField(){
      return myField;
   }
   void alterMyField(Bar packageEyesOnly){
      myField = packageEyesOnly;
   }
}

becomes the C++ code

#include "Bar.h" ///to make class Bar visible
class Foo{
   Bar* myField;
   static const bool MUTATIONS_ALLOWED; ///must be defined outside class!
public:
   explicit Foo(Bar* incoming)
   : myField(incoming)                  ///what problem do we have?...
   {}
   virtual ~Foo() { delete(myField); }  ///...see file for the fix!

   virtual Bar* getMyField() const;
          //code can be elsewhere, in "Foo.cc"

private:  //C++ does not have "package scope" :-( 
   virtual void alterMyField(Bar* thisClassAndFriendsOnly);
          //NOT "const" since it alters the invoking object! 
}; //don't forget the semicolon!

It is typical to "inline" constructor code as in Java, but the bodies of member functions are often not inlined within the class---see under "Files" next. Whereas Java's strict filename and directory conditions tell it where many files are, in C++ you must give an #include statement for classes in other files that you wish to use---as also described next.
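
To see initializer syntax, heap versus stack allocation, and delete working together, here is a minimal sketch. The class Bar below, its counter alive, and the function lifetimeDemo() are assumptions made up for the illustration; they are not the Bar class from the course directory.

```cpp
#include <cassert>

// Hypothetical class that counts how many Bar objects currently exist.
class Bar {
   int lo, hi;
public:
   static int alive;
   Bar(int a, int b) : lo(a), hi(b) { ++alive; }        // initializer syntax
   Bar(const Bar& other) : lo(other.lo), hi(other.hi) { ++alive; }
   virtual ~Bar() { --alive; }
   virtual int sum() const { return lo + hi; }          // accessor, hence const
};
int Bar::alive = 0;   // static data is defined once, outside the class

int lifetimeDemo() {
   Bar* bp = new Bar(3, 17);   // heap object, reached through a pointer
   {
      Bar b(3, 17);            // stack object, destroyed at the closing brace
      assert(Bar::alive == 2);
      assert(b.sum() == bp->sum());
   }
   assert(Bar::alive == 1);    // only the heap object is left
   int total = bp->sum();
   delete bp;                  // without this line the heap object would leak
   return total;
}
```

After lifetimeDemo() returns, Bar::alive is back to 0; removing the delete line would leave it at 1, which is the "memory leak" described above.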

Files:

C++ files come in two flavors: so-called header files with the universal extension .h, and code files with extensions variously given as .cc or .C or .cpp etc. We prefer the first. The above class code goes in a file that should be called Foo.h. (It need not be---the name is not mandatory the way Foo.java would be in Java---but it should be called that.) Then the bodies of methods that were declared but not defined inside the class go in an associated file Foo.cc. Whereas all code for methods of a Java class must go between the class' braces, long C++ methods are supposed to be defined outside the class, in a separate file. Here is our Foo.cc:

#include <iostream>  ///For ANSI libraries drop the ".h"
#include "Bar.h"     ///redundant as Foo.h has it, but useful.
#include "Foo.h"     ///need to include class' own header file!

const bool Foo::MUTATIONS_ALLOWED = true; ///don't say "static" here!

Bar* Foo::getMyField() const {            ///don't say "virtual" here!
   return myField;
}

void Foo::alterMyField(Bar* thisClassAndFriendsOnly) {
   if (MUTATIONS_ALLOWED) {
      myField = thisClassAndFriendsOnly;
   } else {
      std::cerr << "Attempt to modify a class that doesn't want it\n";
   }  ///std::cerr is the standard error-message stream, declared in <iostream>.
}

Note that the method bodies are not between any braces at all, but "float freely in the global file space". The class they belong to is indicated by prefixing the classname followed by ::, instead of Java's sticking to . for class members. (Discussion of << and C++ streams will come later.)

Unlike Java, C++ allows one to have "global" data and functions, meaning ones not defined inside a class and not local to another function. Of course C must have these functions, and C++ is compatible with C. However, we will follow Java usage and not allow you to have any global functions---other than main and certain "friends" of classes that are defined at file-scope "for symmetry".

In addition, one C++ .cc file generally has no .h counterpart---the one with the "main program" in it. Whereas in Java execution begins with the method

public static void main (String[] args){...}

in the so-called "main class", C++ execution begins by calling the function

int main (int argc, char** argv){...}

just like in C. You can call the file containing "main" anything you like, but if your Java main class would be called (say) MyProg, then the corresponding C++ file should be called myprog.cc (all-lowercase). Then "g++ myprog.cc" would be similar to "javac MyProg.java", except that whereas Java preserves your name as MyProg.class, C++ would give your executable the vanilla name a.out. To name your executable myprog, the command is "g++ -o myprog myprog.cc". There are a zillion options on most C++ compilers, most of them variations that Java disallows for standardness and sanity, and you might quickly find this way of doing things tiresome. Fortunately, you can batch all your compile and linking commands in a handy "make file" in the same directory as your myprog.cc that automates this process for you (see below). (Footnote on "int main()" and "argc,argv".)

Unlike Java, the C++ system enforces no correspondence between class name and filename/location. The C++ compiler needs to be told where things live, and this is done in two ways:

(i) #include statements at the top of files. Technically these are directives to the C++ pre-processor to do a literal substitution of the text of the referenced file at this spot, but you can think of it as working like import in Java. Currently, an ANSI Standard library header looks like

#include <string>

without the ".h", because

#include <string.h>

would refer to the older Kernighan-Ritchie C "string.h" library, which C++ retains for compatibility with existing C programs (and older C++ programs built on them) but which we do not want you to use. Files residing in user paths rather than system paths should use "..." not <...>. If Foo.h is in a subdirectory Bar of the directory holding the file that contains main, then you write

#include "Bar/Foo.h"

on a UNIX system such as ours. C++ does not have a standard CLASSPATH variable to make file searching easier as Java does, but most C++ environments provide the equivalent or automate the organization and linking of class files for you.

A riddle: if #include is literal text inclusion, why don't you get errors from multiple declarations of the same items? The reason is that every .h file uses a standard-but-not-automatic protocol. It defines a preprocessor constant that is often the name of the file in ALLCAPS followed by _H. (Some texts recommend putting underscores before the name or doubling them, but names that begin with an underscore and a capital letter, or that contain a double underscore anywhere, are reserved for the compiler and system libraries.) For example, Foo.h should have the structure:

/** File Foo.h, by KWR. OO simulation of the WWII "Foo Fighters" as UFOs. */

#ifndef FOO_H
#define FOO_H

<body of file>

#endif ///this would be the last line of the file.

This says that if the preprocessor constant FOO_H is not already defined, then define it now and include the text of the file, but if it is already defined, skip the file's text. Since this structure is simple and expected, one does not need to indent the code between #ifndef and #endif as one does in an if-statement. This ensures that the body of every .h file is included at most once in each compiled .cc file, thus freeing the user to declare the needs of each file without worrying about the inclusion order. (Footnote on const versus #define and on C++ namespaces.)
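
The effect of the guard can be imitated inside a single file by pasting the text of a header twice, which is all that two #include lines would do. The header name Point.h, its guard constant POINT_H, the class Point, and the function guardDemo() are hypothetical, chosen only for this sketch.

```cpp
#include <cassert>

// First copy of the text of a hypothetical header "Point.h".
#ifndef POINT_H
#define POINT_H
class Point {
public:
   int x, y;
   Point(int a, int b) : x(a), y(b) {}
};
#endif

// Second copy, as if another header had also #included "Point.h".
#ifndef POINT_H      // POINT_H is defined by now, so everything up to
#define POINT_H      // the matching #endif is skipped by the preprocessor
class Point {
public:
   int x, y;
   Point(int a, int b) : x(a), y(b) {}
};
#endif

int guardDemo() {
   Point p(2, 3);    // compiles: the compiler saw class Point only once
   assert(p.x == 2);
   return p.x + p.y;
}
```

If the two #ifndef/#define/#endif wrappers are deleted, the compiler should reject the file with a "redefinition of class Point" error.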

(ii) The make utility. Code in .cc files, however, is usually not #included. Doing so would (re-)compile it in one big gulp and lose the benefits of separate compilation. However, the separately compiled code then needs to be linked. Java hides the linking step from users because its strict filename requirements automatically tell it where referenced files live and how to link them.

The details of make syntax can be imitated without needing to understand them in full---we will provide examples to build on and they will be covered in recitations. If your main file is myproj.cc, then put a file called myproj.make in the same directory, and enter the compilation and linking commands as needed. Then you can (re-)compile your project by entering the command "make -f myproj.make". The system will take over from there. To remove all your binaries, enter "make -f myproj.make clean". (On a home PC rather than our Unix systems, you will probably have a different way to automate project management.)

Advanced Object-Oriented Features:

A Java interface is really a kind of abstract Java class, so let us discuss abstract classes first. When you declare an abstract class in Java, you are supposed to (but not required to) mark at least one method in the class abstract. C++ does not have the keyword abstract, so what it tells you to do is mark the method virtual (as you would do anyway under our translation) and write = 0 after its header, as if initializing the function to the null pointer! Elsewhere in C++ code the null pointer is usually written with the macro NULL (echoing Java null), but in this spot the syntax requires the literal 0. Thus the Java code

abstract class ClosedCurve extends Shape {
   public abstract double area();  
}

becomes in C++,

class ClosedCurve: public Shape {
public:
   virtual double area() = 0;  ///this doesn't say the area is always zero :-)
};  ///the semicolon again!

Having one such "null virtual function" (officially called a "pure virtual function") automatically makes the whole class "abstract", meaning as in Java that you can't construct objects of the class directly, and any "concrete" subclass of ClosedCurve must override and define the methods set to 0. For example,

class Rectangle: public ClosedCurve {
   double length, width;
public:
   virtual double area() { return length*width; }
};

Abstract classes are good for enforcing conditions on their subclasses, thus making sure a large object hierarchy maintains essential common features. An abstract base class such as "Shape" that defines a null virtual function "show()" for displaying a shape is also handy for writing code such as

vector<Shape*> myDrawing; ///ANSI C++ vectors are re-sizable arrays!
...
void paint() {
   for (unsigned int i = 0; i < myDrawing.size(); i++) {  ///vectors use size(), not length()
      myDrawing.at(i)->show();
   }
}  ///one line of code paints all the shapes!

Most of all, abstract classes permit the modeling of important relationships among objects that are not definite enough to be coded or constructed.
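
A self-contained version of this idea might look as follows. To make the result checkable, the sketch uses a null-virtual area() in place of show(), and it derives the concrete classes directly from Shape; the class Square and the functions totalArea() and shapesDemo() are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Shape {                       // abstract base class
public:
   virtual ~Shape() {}
   virtual double area() const = 0; // "null virtual": no body here
};

class Rectangle : public Shape {
   double length, width;
public:
   Rectangle(double l, double w) : length(l), width(w) {}
   virtual double area() const { return length * width; }
};

class Square : public Shape {
   double side;
public:
   explicit Square(double s) : side(s) {}
   virtual double area() const { return side * side; }
};

// Works for any mix of shapes, including classes written later.
double totalArea(const std::vector<Shape*>& drawing) {
   double total = 0.0;
   for (std::size_t i = 0; i < drawing.size(); i++) {
      total += drawing.at(i)->area();   // dynamic binding picks the right area()
   }
   return total;
}

double shapesDemo() {
   std::vector<Shape*> myDrawing;
   myDrawing.push_back(new Rectangle(2.0, 3.0));
   myDrawing.push_back(new Square(4.0));
   assert(myDrawing.size() == 2);
   double total = totalArea(myDrawing);
   for (std::size_t i = 0; i < myDrawing.size(); i++) {
      delete myDrawing.at(i);           // the vector does not free what it points to
   }
   return total;
}
```

Note that totalArea() never mentions Rectangle or Square, so adding a new concrete shape requires no change to it.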

Now for a gritty technical matter of the kind C++ throws at you all the time. Per the discussion of "const" methods above, since the body of Rectangle::area() does not change the invoking object, the method should be marked const. However, because const was left out of the prototype in the ClosedCurve class, adding it in Rectangle alone would not give a legal override: C++ would treat "area() const" as a brand-new function, and Rectangle would remain abstract. This is because "const" is part of the type signature of the function, wherever "const" appears in it! (Footnote on C++ types.) The writer of the abstract class you are extending could (and should!?!) have used "const" to force all implementations to be "const"! The file "shapetest.cc" in our cse250/Java2C++ directory illustrates this.
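
The following sketch shows the mismatch in action. It is not the course file shapetest.cc. As an assumption made so that the mistaken subclass can still be constructed, ClosedCurve::area() is given a placeholder body returning -1.0 instead of being set to 0; the function names areaOf() and constDemo() are invented.

```cpp
#include <cassert>

class ClosedCurve {
public:
   virtual ~ClosedCurve() {}
   virtual double area() { return -1.0; }   // no const, as in the text above
};

class Rectangle : public ClosedCurve {
   double length, width;
public:
   Rectangle(double l, double w) : length(l), width(w) {}
   // const added here only: this is a NEW function, not an override.
   virtual double area() const { return length * width; }
};

// Calls area() through a base-class pointer, the way a Shape list would.
double areaOf(ClosedCurve* c) { return c->area(); }

bool constDemo() {
   Rectangle r(2.0, 3.0);
   assert(r.area() == 6.0);        // called directly, Rectangle's version runs
   return areaOf(&r) == -1.0;      // through the base pointer, it is never reached
}
```

Here constDemo() returns true: the base-class pointer still reaches the base version. Many compilers can be asked to warn about this kind of accidental hiding.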

Interfaces. The point of the separate keyword interface in Java is that interfaces may be multiply inherited, but classes may not be. The reason Java did this is to avoid a famous problem of ambiguity in multiple inheritance when two inherited classes use the same name for a field...especially when this field comes from a common ancestor class! Java interfaces have no data fields other than fixed constants, thus (largely) avoiding the problem. C++ chose programmer freedom over safety here. Thus Java

class Foo extends Bar implements Int1,Int2,Int3 { ... }

simply becomes C++

class Foo: public Bar, public Int1, public Int2, public Int3 { ... };

For a C++ class that translates a Java interface, it is a good idea to include a comment calling it an "interface". BTW, C++ and Java have the same comment syntax, except that C++ did not provide (does now?) a standard means of generating documentation like Java "/**" comments serve to do.
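
As a sketch of this translation, here is one ordinary base class plus two "interface" classes. All names (Printable, Measurable, describe(), measure(), interfaceDemo(), and the bodies of Bar and Foo) are hypothetical. Note that public is repeated before each base class, since a base listed without it is inherited privately.

```cpp
#include <cassert>
#include <string>

class Bar {                          // ordinary base class
public:
   virtual ~Bar() {}
   virtual int id() const { return 250; }
};

class Printable {                    /// "interface"
public:
   virtual ~Printable() {}
   virtual std::string label() const = 0;
};

class Measurable {                   /// "interface"
public:
   virtual ~Measurable() {}
   virtual int size() const = 0;
};

// Java: class Foo extends Bar implements Printable, Measurable { ... }
class Foo : public Bar, public Printable, public Measurable {
   std::string name;
public:
   explicit Foo(const std::string& n) : name(n) {}
   virtual std::string label() const { return "Foo:" + name; }
   virtual int size() const { return static_cast<int>(name.size()); }
};

std::string describe(const Printable* p) { return p->label(); }
int measure(const Measurable* m) { return m->size(); }

int interfaceDemo() {
   Foo* f = new Foo("abc");
   assert(describe(f) == "Foo:abc");   // a Foo* converts to Printable* ...
   int n = measure(f) + f->id();       // ... and to Measurable*, and is still a Bar
   delete f;
   return n;
}
```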

Nesting classes is a good idea that may be unfamiliar to a lot of you. The original 1995 version of Java did not allow nesting one class inside another, and what happened was that users quickly found themselves with zillions of tiny files, often including interfaces all called (typically) Sortable.java written by different programmers that were hard to keep track of even with Java's built-in package and directory organization. Say you have a binary search-tree class BST that needs to sort objects stored in its nodes, and maybe wants to have both a "<" and a "<=" kind of comparison for efficiency, and/or one that returns values -1,0,+1 with 0 for equal. Since this sorting interface will be particular to this BST class, let us nest it inside!

public class BST {
   public interface Sortable {  ///name outside is BST.Sortable
      boolean lessThan(Sortable rhs);
      boolean lteq(Sortable rhs);
      int compareTo(Sortable rhs); //returns -1 for <, 0 for ==, +1 for >
   }
   ...
}

A nested class is not a field, and Java emphasizes that a top-level nested class can be accessed freely outside its host. The only change is that its name outside the host class has the host's name prefixed to it, so client files call the interface type "BST.Sortable". See files NestTest.java and NestMain.java for more.

Now in C++ we can do essentially the same thing (see NestMain.cc):

class BST {
public:
   class Sortable {
   public:
      virtual bool lessThan(Sortable* rhs) const = 0;
      virtual bool lteq(Sortable* rhs) const = 0;
      virtual int compareTo(Sortable* rhs) const = 0;
      // returns -1 for <, 0 for ==, +1 for >
   };
};
...
class MySortableItem : public BST::Sortable {
   string key;
public:
   virtual bool lessThan(BST::Sortable* rhs) const {
      return key < ((MySortableItem*)rhs)->key;
   }
   ...
};

Notice the places where Java's . becomes ::, and the places where it becomes ->. Remember that we're making every class variable have a pointer type, and this will be fairly automatic!
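
A cut-down version that compiles on its own might look like this. It is a sketch and not the course file NestMain.cc: only lessThan is kept, and the helper BST::smaller(), the accessor getKey(), and the function nestedDemo() are made up for the illustration.

```cpp
#include <cassert>
#include <string>

class BST {
public:
   class Sortable {                       /// "interface" nested in BST
   public:
      virtual ~Sortable() {}
      virtual bool lessThan(Sortable* rhs) const = 0;
   };
   // Uses only the nested interface, never the concrete item class.
   static Sortable* smaller(Sortable* a, Sortable* b) {
      return b->lessThan(a) ? b : a;
   }
};

class MySortableItem : public BST::Sortable {
   std::string key;
public:
   explicit MySortableItem(const std::string& k) : key(k) {}
   std::string getKey() const { return key; }
   virtual bool lessThan(BST::Sortable* rhs) const {
      // Assumes rhs really points to a MySortableItem.
      return key < static_cast<MySortableItem*>(rhs)->key;
   }
};

std::string nestedDemo() {
   MySortableItem pear("pear"), apple("apple");
   BST::Sortable* winner = BST::smaller(&pear, &apple);
   assert(winner == &apple);
   return static_cast<MySortableItem*>(winner)->getKey();
}
```

The cast inside lessThan is the weak spot of this design: nothing stops a caller from passing some other kind of Sortable, which is one motivation for the template approach mentioned at the end of these notes.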

A corollary of the independence of nested classes from the host class is that it is OK for a class to refer to an interface nested in a class derived from it. (Referring to a field or method defined only in the derived class, however, is considered a basic O-O no-no.) We will use this in the design of an "Implementation Sandwich" into which much of the data-structure code in projects will go. The system will model "information storage and retrieval" of the simple "card-box" or "flat-file" kind, as opposed to a true relational database of the kind taught in CSE462.

Public users will not see our implementation layer. They will see only a public class sitting above it (i.e., inheriting from it) that defines a type called "Cardbox". Users will only be able to use the constructors and methods specified in "Cardbox", such as insert, remove, find,... (The first is usually called add in Java; the name delete would be good for the second but it's a keyword in C++.) An abstract base class called "AbsISR" will make those methods null-virtual so that everyone in our company will use the required names for those methods in their implementation classes. We will start off with a "quick & dirty" implementation of the cardbox and its methods via a simple sorted array or linked list, but later we shall rotate in better implementations learned in this course.

Finally, an interface class called Sortable nested inside Cardbox will lay out the sorting functions that users of Cardbox are required to define for their own records, so that our routines can manipulate them. Our base class and implementation classes can refer to the type Cardbox::Sortable and the methods listed in there. As a classic example of "Separating Interface from Implementation", to change our data structure we shall have to change only one line of code (in Cardbox.h), and users will not have to change anything in their code and shouldn't even have to re-compile it!

Other Elements:

Java methods that are marked final cannot be overridden in a subclass. C++ has no such idea (except by a touchy mechanism called "private inheritance" that we'll ignore), but we can translate such methods by C++ methods without the keyword virtual. Leaving out virtual makes C++ use static binding for the method.

Static binding has nothing to do with the keyword "static". Suppose we have the following:

class Base {
public:
   virtual void dynamicBind() {cout << "Base dynamicBind()" << endl;}
   void staticBind() {cout << "Base staticBind()" << endl;}
};

class Derived: public Base {
public:
   virtual void dynamicBind() {cout << "Derived dynamicBind()" << endl;}
   void staticBind(){cout << "Derived staticBind()" << endl;} ///dubious?
};

int main(){
   Base* bp;
   Derived* dp = new Derived();
   bp = dp; ///now bp points to a Derived object tho its own type is Base*.
   bp->staticBind();  ///which class' version gets called?
   bp->dynamicBind(); ///and here?
}

The output:
Base staticBind()
Derived dynamicBind()

The two methods differ only in that one has "virtual" and behaves the way all Java methods do, while the other is statically bound. The dynamic one looks up the type of the object currently pointed to, whereas the static one looks up the type of the variable. To get dynamic binding, the method must be marked virtual and the variable invoking it must have type either Base* or Base& (see footnote for the latter). This is another reason why we begin the transition from Java to C++ using pointers for all class variables.
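
The role of the variable's type can be sketched by passing the same Derived object three ways. This variant has the methods return strings instead of printing so that the outcome can be checked; the functions viaPointer(), viaReference(), viaValue(), staticViaPointer(), and bindingDemo() are invented for the illustration.

```cpp
#include <cassert>
#include <string>

class Base {
public:
   virtual ~Base() {}
   virtual std::string dynamicBind() const { return "Base"; }
   std::string staticBind() const { return "Base"; }
};

class Derived : public Base {
public:
   virtual std::string dynamicBind() const { return "Derived"; }
   std::string staticBind() const { return "Derived"; }
};

std::string viaPointer(const Base* bp)       { return bp->dynamicBind(); }
std::string viaReference(const Base& br)     { return br.dynamicBind(); }
// By value: b is a fresh Base copied from the Base part of the argument.
std::string viaValue(Base b)                 { return b.dynamicBind(); }
std::string staticViaPointer(const Base* bp) { return bp->staticBind(); }

bool bindingDemo() {
   Derived d;
   assert(viaPointer(&d) == "Derived");      // Base*: dynamic binding
   assert(viaReference(d) == "Derived");     // Base&: dynamic binding
   assert(staticViaPointer(&d) == "Base");   // non-virtual: type of the variable
   return viaValue(d) == "Base";             // by value: the Derived part is gone
}
```

The by-value case has no Java counterpart, since Java never copies an object when passing it to a method.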

Static data is translated by static just as in Java. It is a "C++ gotcha!" that static members must in general be defined outside the class, i.e. in the .cc not the .h file. The standard makes one exception: a static const member of integer type (int, char, bool and the like) may be given its value inside the class as in Java, and both g++ and Sun's CC accept this.

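Static methods are supposed to be referenced only by prefixing the class name in Java. The direct C++ counterpart uses :: in place of the dot, as in Foo::myStaticMethod() (a hypothetical name). C++ also allows invoking a static method through an object variable or pointer, using a dummy if need be, but the class-name form makes it clear that no particular object is involved.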
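
Both kinds of static member can be seen in one short sketch. The class Ticket, its members, and the function staticDemo() are hypothetical names used only here.

```cpp
#include <cassert>

class Ticket {
   static int issued;                  // declaration only
public:
   static const int LIMIT = 3;         // const integer static: value allowed in-class
   Ticket() { ++issued; }
   ~Ticket() { --issued; }
   static int outstanding() { return issued; }   // static method: no invoking object
};

int Ticket::issued = 0;     // definition, normally in the .cc file (no "static" here)
const int Ticket::LIMIT;    // definition for LIMIT; its value was given in the class

int staticDemo() {
   Ticket a, b;
   assert(Ticket::outstanding() == 2);                // class name, then ::
   assert(a.outstanding() == Ticket::outstanding());  // also legal through an object
   return Ticket::LIMIT - Ticket::outstanding();
}
```

The non-const member issued must be defined outside the class, whereas LIMIT can carry its value in the class body because it is a const integer.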

This completes most of what you need to know to do the basic Java-to-C++ translation. To come are sections on doing things "the C++ way" to begin with: when to declare and hold objects by-value rather than by-pointer, when to use static binding for methods, when and how to use "overloaded operators", and most of all, using templates rather than "interfaces" to establish better-controlled relationships between a host class and an argument class. (It looks like Java itself has decided to offer the last one---an RFP for adding a template mechanism using the already-reserved keyword generic was approved last May.)