
Ray tracer language comparison

We have ported four progressively optimised versions of our cut-down ray tracer to C++, Java, OCaml and SML. Joe Marshall, Nathan Baum, Juho Snellman, Jeffrey Siskind and others have kindly ported the programs to Common Lisp and Scheme. This page describes the verbosity vs speed trade-offs of these languages as measured by the ray tracer implementations.

Optimisation                                 C++      Java      OCaml   SML (for MLton)  Stalin (by Jef Siskind)  Lisp (by Juho Snellman)
Minimal                                      ray.cpp  ray.java  ray.ml  ray.sml          ray.sc                   ray.lisp
Tighter bounding volumes                     ray.cpp  ray.java  ray.ml  ray.sml          ray.sc                   ray.lisp
Partially specialised shadow intersection    ray.cpp  ray.java  ray.ml  ray.sml          ray.sc                   ray.lisp
Fully specialised shadow intersection        ray.cpp  ray.java  ray.ml  ray.sml          ray.sc                   ray.lisp
Low-level                                    ray.cpp  -         ray.ml  ray.sml          -                        ray.lisp

The following compiler versions and command-line arguments were used to generate the executables or heap images (32-bit then 64-bit):

The verbosity and speed of these implementations can be measured and visualised. The following results were measured for 512x512 resolution, 16x oversampling and 9 levels of spheres (87,381 spheres):


Figure: Ray traced scene.


Figure: Speed on AMD64 vs verbosity (in lines of code), in 32-bit (red) and 64-bit (white).


Figure: Speed on AMD64 vs verbosity (in non-whitespace bytes), in 32-bit (red) and 64-bit (white).

Verbosity

The OCaml and SML implementations of the ray tracer are by far the most succinct. The shortest OCaml implementation of the ray tracer fits in only 55 lines. Even the longer OCaml and SML implementations are significantly shorter, and faster, than the shortest implementations in any of the other languages. For example, the fastest OCaml program contains fewer lines of code and fewer non-whitespace bytes than the shortest Lisp but is 19x faster. The brevity derives mainly from ML's variant types and pattern matching.

After OCaml and SML, C++ is the next most concise language in this comparison. Both the C++ and Java implementations use object orientation to achieve what variant types provide in the MLs. However, the object-oriented equivalent (which is also possible in OCaml) is substantially more verbose. Here is a side-by-side comparison of the code required to implement the scene tree in C++/Java (an inheritance hierarchy) and in SML/OCaml (a variant type):

C++:
struct Scene {
  virtual ~Scene() {};
  ...
};

struct Sphere : public Scene {
  Vec center;
  double radius;
  Sphere(Vec c, double r) : center(c), radius(r) {}
  ~Sphere() {}
  ...
};

typedef list<Scene *> Scenes;
struct Group : public Scene {
  Sphere bound;
  Scenes child;
  Group(Sphere b, Scenes c) : bound(b), child(c) {}
  ~Group() {
    for (Scenes::const_iterator it=child.begin();
         it!=child.end(); ++it)
      delete *it;
  }
  ...
};
OCaml:
type scene =
| Sphere of vec * float 
| Group of vec * float * scene list

The Java, Lisp and Scheme implementations are the most verbose. The fourth Java implementation is 1.9x longer than the fourth OCaml implementation, for example. In Java, the verbosity can be attributed to the repetition of keywords such as "public", "new", "class" and "return" as well as the verbosity of common constructs such as loops and the inheritance hierarchy used to represent the scene tree. In Lisp and Scheme, the verbosity in terms of lines of code is most easily attributed to the homogeneous syntax requiring simple expressions to be split over many lines in order to maintain clarity. For example, Jef Siskind's Stalin implementation spreads the dot product function over four lines whereas the OCaml, SML, C++ and Java implementations use only a single line:

Stalin:
(define (dot a b)
  (+ (* (point-x a) (point-x b))
     (* (point-y a) (point-y b))
     (* (point-z a) (point-z b))))
OCaml:
let dot a b = a.x *. b.x +. a.y *. b.y +. a.z *. b.z

Pattern matching not only leads to more concise code when used explicitly via "match" and "function" in OCaml or via function definitions and "case" in SML; it is also used to deconstruct tuples in function definitions. For example, the Lisp implementation of the "intersect" function defines the aliases "center" and "radius" for the center and radius of the sphere, extracted from the sphere type by the functions "sphere-center" and "sphere-radius", respectively. This is much more verbose than the equivalent ML code, which uses the pattern "(center, radius, scene)" to deconstruct the scene in the function definition:

Lisp:
(defun intersect (orig dir scene)
  (labels ((aux (lam normal scene)
             (let* ((center (sphere-center scene))
                    (radius (sphere-radius scene))
                    (lamt (ray-sphere orig dir center radius)))
               (if (>= lamt lam)
                   (values lam normal)
                   (etypecase scene
                     (group
                      (dolist (kid (group-children scene))
                        (setf (values lam normal)
                              (aux lam normal kid)))
                      (values lam normal))
                     (sphere
                      (values lamt (unitise
                                    (-v (+v orig (*v lamt dir)) center)))))))))
    (aux infinity zero scene)))
OCaml:
let rec intersect orig dir (lambda, _ as hit) (center, radius, scene) =
  let lambda' = ray_sphere orig dir center radius in
  if lambda' >= lambda then hit else match scene with
  | Sphere -> lambda', unitise (orig +| lambda' *| dir -| center)
  | Group scenes -> List.fold_left (intersect orig dir) hit scenes

Overall, OCaml and SML are the most concise languages and Lisp, Java and C++ are the most verbose.

Speed

This comparison is as much about simplicity as it is about performance. All of these programs are largely unoptimised. Slight changes to the compile-line arguments or source code can substantially alter run-time performance. Consequently, our performance measurements are only a rough guide to the performance of the languages.

The fastest languages are C++ and OCaml in 64-bit, and C++ and Stalin-compiled Scheme in 32-bit. The SBCL-compiled Lisp is very slow without low-level optimisations, but with them it is possible to approach the performance of C++ and OCaml. Specifically, the fastest Lisp implementation is 45% slower than the fastest OCaml but requires 65% more non-whitespace characters of source code to achieve this.

In addition to examining the fastest implementations, it is interesting to look at the performance of simple implementations.

In 64-bit, the first OCaml implementation is both the shortest and the fastest of all the implementations. C++ is only slightly slower but has 47% more non-whitespace bytes. The Java is just as verbose as the C++ but 19% slower. Lisp is slightly more concise but (without type declarations and inlining) is 4.4x slower than Java.

In 32-bit, Stalin-compiled Scheme is fastest, followed closely by the more concise MLton-compiled SML program. OCaml, SML/NJ-compiled SML and g++-compiled C++ all give similar performance. The Lisp is not only 30% more verbose than the OCaml but is also 4.9x slower.

Given their performance, it is remarkable that the Stalin and MLton implementations are devoid of low-level optimisations, and that only minor design decisions in the C++ and OCaml implementations were made on the basis of performance (the use of pass-by-reference in C++ and the representation of vectors as records rather than tuples in OCaml).
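To illustrate that OCaml design decision, here is a minimal sketch (written for this page, not taken from the benchmark source; the names vec' and dot' are invented for the comparison) contrasting the record and tuple representations of a vector. OCaml stores all-float records flat, with unboxed float fields, whereas a tuple of floats boxes each component:

OCaml:
(* An all-float record: stored flat with unboxed float fields,
   and accessed by name. *)
type vec = { x : float; y : float; z : float }
let dot a b = a.x *. b.x +. a.y *. b.y +. a.z *. b.z

(* The tuple alternative: each float is boxed individually and every
   use must deconstruct the tuple positionally. *)
type vec' = float * float * float
let dot' (ax, ay, az) (bx, by, bz) = ax *. bx +. ay *. by +. az *. bz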

Not coincidentally, two of the best optimising compilers, Stalin and MLton, are both whole-program optimising compilers, meaning they can only compile whole (self-contained) programs. The C++ and OCaml compilers allow partial recompilation, compile this whole program much more quickly and still achieve very competitive performance.

The Lisp language provides great generality by default but this comes at a grave cost in terms of performance. In order to get within an order of magnitude of the performance of C++, OCaml or SML, it is necessary to litter the source code with type declarations. In contrast, this process is automated by Stalin, ocamlopt, MLton and SML/NJ.

Although Java is known to compete with C++ in some benchmarks, it is clearly 2-3x slower than C++ for this benchmark.

Ease of use

We can also make some statements about the ease of use of the languages and compilers.

When developing, the time taken to recompile is important. The compile times were, in increasing order:

Language: Compiler    Compile time
SML:      SML/NJ      0.70s
OCaml:    ocamlopt    1.25s
C++:      g++         1.5s
Java:     JDK 1.5     2.2s
Lisp:     SBCL        3.9s
SML:      MLton       8.6s
Scheme:   Stalin      680s

Note that OCaml and SML/NJ compile faster than all of the other languages and Stalin compiles the Scheme implementation three orders of magnitude slower than SML/NJ, taking over 10 minutes to compile this tiny program. In practice, compile times <2s are not significant. MLton's compile time is noticeable and Stalin's compile time is a serious impediment to development.

OCaml provides two compilers, one byte-code (ocamlc) and one native-code (ocamlopt). The ocamlc compiler is 7x faster at compiling these ray tracers than the ocamlopt compiler but native-code executables produced by ocamlopt run much faster. The ocamlopt compiler combines one of the fastest compilation times with one of the fastest running times. OCaml also provides an interactive top-level (ocaml) that is useful for testing program fragments.
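For example, a small definition can be typed into the top-level and applied immediately (a hypothetical session written for this page, not part of the benchmark):

$ ocaml
# type vec = { x : float; y : float; z : float };;
type vec = { x : float; y : float; z : float; }
# let dot a b = a.x *. b.x +. a.y *. b.y +. a.z *. b.z;;
val dot : vec -> vec -> float = <fun>
# dot { x = 1.; y = 0.; z = 0. } { x = 0.5; y = 2.; z = 3. };;
- : float = 0.5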

SML can be compiled by a variety of compilers. We have chosen to use the two most popular SML compilers, MLton and SML/NJ. These two compilers fill somewhat similar roles to ocamlc and ocamlopt for OCaml, in that SML/NJ is fast at compiling and MLton produces fast executables. However, ocamlopt compiles almost as quickly as SML/NJ whilst simultaneously producing executables that are roughly as fast as MLton's. SML/NJ also provides an interactive top-level with the same functionality as OCaml's but with better performance, as SML/NJ compiles to native code whereas OCaml's top-level compiles to interpreted byte-code.

Unfortunately, despite the existence of a "standard", the MLton and SML/NJ compilers could not compile the same source code for this benchmark; we found several incompatibilities between the code accepted by the two compilers.

Thus, we resorted to combining code fragments using a bash script in order to factor out the SML code common to the SML/NJ and MLton implementations. Although this was a significant source of irritation when developing the SML implementations, build tools are available to smooth out the differences between compilers for more significant projects and we believe that our tiny program happened to hit upon unusual discrepancies.

Both OCaml and SML go to great lengths to verify programs during compilation. This design is intended to put them in a unique position among the languages compared here: they catch many more bugs at compile time than the other languages and compilers. We have indeed found this to be the case. The OCaml and SML implementations were about 10x faster to develop than the other languages because many errors were caught immediately.

The g++ compiler, used to compile the C++ implementations, is the next fastest at compiling. However, C++ is the least safe language of those tested here. Minor mistakes when writing C++ programs can lead to segmentation faults or, worse, errors silently creeping into the results of computations. This is true even when using the STL and avoiding all pointer arithmetic. As program robustness is always more important than performance, we do not hesitate to recommend any of the other languages over C++.

The Java compiler (Sun's JDK 1.5) is unique among these languages because it defers some compilation to run time. We have been careful to check that Java's unusually slow startup time (~1s) does not significantly affect the results. Potentially, Java's just-in-time (JIT) compiler could use information that is only available at run time to further optimise the running program. However, Java's poor performance clearly indicates that Sun's current compiler does not succeed in this respect on this particular benchmark.

SBCL Lisp also provides an interactive top-level, like OCaml and SML/NJ. However, the SBCL compiler does virtually nothing to check the Lisp code that it is given at compile time. Instead, testing is deferred until run time. This is a considerable hindrance to development as the program will run for an arbitrarily long time and the program state must then be examined by hand just to find simple type errors. During the development of this ray tracer, we often encountered type errors due to simple typographical errors in the source code (such as missing or extra pairs of parentheses) only after several minutes of execution. This hindrance is typical of dynamically typed languages.

As with Lisp, the generality of the Scheme language puts it at a great disadvantage to the other languages in terms of performance. In order to achieve competitive run-time performance, the Stalin Scheme compiler clearly does an incredible job of optimising unspecialised code and can even beat all of the other implementations in this benchmark with only a few low-level optimisations. However, Stalin's unparalleled ability to optimise comes at a grave cost in compile time, which is three orders of magnitude longer than that of the fastest compiler (SML/NJ). By requesting fewer optimisations on the command line, the compile time can be reduced to around 30s, but that is still an irritatingly long time and the resulting executables are considerably slower.

Conclusions

Fast development time, succinct code, fast compile times and fast run times make both OCaml and SML very desirable languages for serious software development. Their efficiency makes them suitable for performance-critical work. Rapid development and brevity make them ideal for prototyping. Static type checking makes them ideal for intrinsically complicated programs.

In order to choose between OCaml and SML, it is instructive to study their differences in more detail:

OCaml | MLton/SML
Two almost-entirely compatible compilers (ocamlc and ocamlopt) | Many somewhat-incompatible compilers (MLton, SML/NJ, ML-kit, PolyML, MoscowML, ...)
Recent innovations, e.g. polymorphic variants and objects | Few additions to the 1997 standard
Fast 0.36s compile time | Slow 9.4s compile time with MLton or slow run time with any other compiler
Supports partial recompilation | MLton must recompile the whole program
Concise grammar | Slightly more verbose grammar ("end", "val", curried anonymous functions)
x86, AMD64, IA64, PowerPC, Alpha, MIPS, StrongARM, Sparc and HPPA | Only x86, PowerPC and Sparc
For and while loops | Only while loops (Read more...)
Labelled and optional function arguments (see the sketch after this table) | Records can be (ab)used to get the same effect
Functional record update ("with") (see the sketch after this table) | No functional record update (Read more...)
Optional guards on pattern matches ("when") | No guards on pattern matches
Polymorphic variants | No polymorphic variants
Object orientation | Records can be (ab)used to get some of the effects (Read more...)
Dynamic loading of byte-code | No dynamic loading of code with MLton
Stacks, queues, sets, maps, hash tables, 1D, 2D and 3D numerical arrays (big arrays) and many other data structures | Vectors, slices, 2D polymorphic arrays and many other data structures
Maximum array length can be as small as 2,097,151 elements | Maximum array length of at least 1,073,741,823 elements
32-bit floats only in "big arrays" | 32-bit floats anywhere
Files are implicitly modules | Files are independent of the module system
No immutable arrays or strings | Both mutable and immutable arrays and strings
Polymorphic comparison and hashing can produce run-time type errors | Well-defined and total polymorphic equality that is statically type checked but no polymorphic inequalities or hashing
Define symbolic infix operators with implicit precedences and associativities anywhere | Define any infix operators with any given precedences and associativities but only for the local scope (Read more...)
Type-unsafe marshalling | No marshalling
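To make a few of the rows above concrete, here is a minimal OCaml sketch (written for this comparison, not taken from the ray tracer; the functions project and stretch are invented for illustration) showing functional record update, labelled and optional arguments, and a polymorphic variant:

OCaml:
(* A record of floats, as used for vectors elsewhere on this page. *)
type vec = { x : float; y : float; z : float }

(* Functional record update: "{ v with z = 0. }" builds a new record
   sharing x and y with v but replacing z. *)
let project v = { v with z = 0. }

(* Labelled argument ~axis (a polymorphic variant) and optional argument
   ?scale, which defaults to 1.0 when omitted. *)
let stretch ?(scale = 1.0) ~axis v =
  match axis with
  | `X -> { v with x = scale *. v.x }
  | `Y -> { v with y = scale *. v.y }
  | `Z -> { v with z = scale *. v.z }

let () =
  let v = { x = 1.; y = 2.; z = 3. } in
  let v' = stretch ~axis:`Y v in                (* scale defaults to 1.0 *)
  let v'' = stretch ~scale:2.0 ~axis:`Z v in
  Printf.printf "%g %g %g\n" (project v).z v'.y v''.z   (* prints: 0 2 6 *)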

Unlike SML, the OCaml language is not standardised and continues to evolve. The evolution of OCaml has allowed it to adopt many new programming constructs not found in SML, including polymorphic variants and objects. However, the evolution of the OCaml language sometimes renders old code uncompilable or, worse, different in terms of performance or even correctness.

We felt that the OCaml compilers were more helpful than MLton and SML/NJ when reporting errors. This is not a technical discrepancy but, rather, is due to the fact that the OCaml compilers report errors in plain English with minimal use of symbols, minimal assumed knowledge and no references to the internals of the compilers.

The evolution of functional programming languages is easily seen by comparing the functionality provided by Common Lisp (1958-1984) with SML (1990-1997) and OCaml (1996-present). In the context of this benchmark, both OCaml and SML clearly improve upon SBCL-compiled Lisp in terms of code size, compile-time performance and run-time performance.

However, the designers of the ML family of languages (including OCaml and SML) deliberately avoided some of the functionality provided by Lisp in order to facilitate static type checking and improve performance. Specifically, Lisp provides macros to customise syntax, allows them to be entwined with ordinary code, and provides run-time code manipulation. SML provides neither macros nor run-time code generation. OCaml provides camlp4 macros, a more limited form that is separate from the language proper, and the derived language MetaOCaml also provides run-time code generation.

We found Stalin-compiled Scheme much easier to develop than SBCL-compiled Lisp thanks to Stalin's additional compile-time checking. Thus, we recommend Stalin and Scheme over SBCL and Lisp to people interested in learning about this approach to syntax extension and run-time code generation. Interestingly, Mathematica also provides the functionality of run-time code generation and extends macros to include typesetting extensions.

Many programmers are currently migrating from C++ to Java because of the increase in program stability offered by Java (e.g. no pointer arithmetic, garbage collection). For precisely those reasons, we certainly concur that Java is preferable to C++ for serious programming. However, given our results, we believe that these programmers would do well to learn either SML or OCaml as well. These languages are smaller, simpler and more expressive; they are faster and easier to develop in; and they produce faster executables. Above all, they are more fun!

Related Links

The following webpages are derived from this work:

This work was inspired by the excellent book "An Introduction to Ray Tracing" by Andrew S. Glassner.

Thomas Fischbacher has written a spoof language comparison based upon this one as he believes that my experimental methodology is flawed and can be abused to prove anything.

Further work

This study has raised some interesting questions which can be addressed by future work.

Disclaimer

The author of this work (Jon Harrop) has considerably more experience in C++ and OCaml than in Java, SML, Lisp and Scheme. Although every effort has been made to keep this study impartial, it cannot be expected to be completely unbiased.

The notion of verbosity is inherently subjective. Our use of lines of code (LOC) and bytes is clearly imperfect. However, we have tried to adhere to reasonable (but succinct) coding styles in order to make the results meaningful.


The graphs on this page were generated by Mathematica.
