Ray tracer language comparison

Performance

Overview

There are many problems with quantifying the relative performance of programming languages:

  • Benchmarks are inherently limited in scope.
  • Performance is very task-dependent (e.g. C++ is fast for array-based floating point computations and MLton-compiled Standard ML is fast for symbolic manipulation).
  • Development time is the single most important factor when optimizing programs, because many optimizations simply cannot be implemented in a practicable amount of time. Development time is also very difficult to measure representatively.
  • Changes in architecture, compiler and compilation options can drastically affect performance.

However, any information on the performance of programs written in different languages is of such great interest that it is well worth the effort to quantify the performance of C++, Java, OCaml, Standard ML, Lisp and Scheme on this benchmark. For our conclusions to be worthwhile, we must address the problems listed above.

Although this ray tracer benchmark might appear to be very limited in scope, it actually stresses several different and important aspects of languages and compilers:

  • Floating-point performance is stressed by the ray-sphere intersection routines.
  • Allocation and deallocation are stressed by the rapid recycling of vector values.
  • Traversal of tree data structures is stressed by the scene intersection routine.
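To make the first of these points concrete, the following is a minimal sketch of a ray-sphere intersection test in C++. It is not taken from the benchmark sources: the names Vec, Ray, Sphere and ray_sphere are hypothetical, and the ray direction is assumed to be unit length. It shows why this routine is dominated by floating-point arithmetic (dot products, a discriminant and a square root).

```cpp
#include <cmath>
#include <limits>

// Hypothetical types for illustration only; not the benchmark's own code.
struct Vec { double x, y, z; };

inline Vec operator-(const Vec& a, const Vec& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

inline double dot(const Vec& a, const Vec& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

struct Ray { Vec orig, dir; };            // assumption: dir has unit length
struct Sphere { Vec center; double radius; };

// Returns the distance along the ray to the nearest intersection in front
// of the ray origin, or +infinity if the ray misses the sphere.
inline double ray_sphere(const Ray& ray, const Sphere& s) {
    const double inf = std::numeric_limits<double>::infinity();
    const Vec v = s.center - ray.orig;    // origin -> sphere center
    const double b = dot(v, ray.dir);     // projection onto the ray
    const double disc = b * b - dot(v, v) + s.radius * s.radius;
    if (disc < 0.0) return inf;           // ray line misses the sphere
    const double d = std::sqrt(disc);
    const double t2 = b + d;              // far intersection
    if (t2 < 0.0) return inf;             // sphere is entirely behind the ray
    const double t1 = b - d;              // near intersection
    return t1 > 0.0 ? t1 : t2;            // origin inside sphere: use far hit
}
```

Under these assumptions, a ray from the origin along +z meets a unit sphere centered at (0, 0, 5) at distance 4, and misses a unit sphere centered at (0, 5, 0).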

Consequently, implementations in different languages are limited by different bottlenecks. Several interesting conclusions can be drawn immediately from this observation:

  • Garbage collection is not slow. Both the Scheme and the OCaml implementations are garbage collected and achieve competitive performance. Specifically, the fastest C++ is only 9% faster than the fastest OCaml.
  • Static type information greatly facilitates optimization. The OCaml, C++ and Standard ML languages all expose static type information to the compiler, resulting in much faster programs. The Stalin Scheme compiler goes to great lengths to infer static type information, resulting in faster programs at the cost of vastly slower compile times.

The fastest languages are C++ and OCaml in 64-bit and C++ and Stalin-compiled Scheme in 32-bit. The SBCL-compiled Lisp is very slow without low-level optimizations:

  • Unsafe compilation
  • Manual inlining
  • Imperative style
  • Type declarations
  • Semi-automated boxing

With these optimizations, however, it is possible to approach the performance of C++ and OCaml. Specifically, the fastest Lisp implementation is only 75% slower than the fastest C++.

Simple implementations

In addition to examining the fastest implementations, it is interesting to look at the performance of the simplest implementations in each language.

The OCaml language allows the ray tracer to be described very concisely. However, the most concise OCaml implementation is also one of the slowest implementations. The only results that are slower than the most concise (OCaml) implementation are written in Lisp.

The second most concise OCaml implementation is more verbose but gives much more competitive performance. The second OCaml implementation is faster than 11 of the 50 results.

Given their performance, it is remarkable that the Stalin and MLton implementations are devoid of low-level optimizations and the C++ and OCaml implementations only had minor design decisions made based upon performance (the use of pass-by-reference in C++ and the representation of vectors as records rather than tuples in OCaml).
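As an illustration of the kind of minor design decision meant here, the sketch below contrasts pass-by-value with pass-by-reference for a small vector type in C++. The names Vec, dot_by_value and dot_by_ref are hypothetical and not taken from the benchmark sources. Whether the difference is measurable depends on the compiler and on inlining, so this should be read as a sketch of the idea rather than a statement about the benchmark's actual code.

```cpp
// Hypothetical vector type for illustration only.
struct Vec { double x, y, z; };

// Pass-by-value: each argument is, in principle, copied at the call site
// (three doubles per argument), although an optimizer may remove the copy.
inline double dot_by_value(Vec a, Vec b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Pass-by-const-reference: the callee reads the caller's objects directly,
// so no copy is required even when the call is not inlined.
inline double dot_by_ref(const Vec& a, const Vec& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Both functions compute the same result; only the calling convention differs.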

Although the simplest C++ implementation gives competitive performance, the C++ language is unable to compete in terms of verbosity. Consequently, two of the OCaml implementations are both shorter and faster than this C++ implementation.

The Lisp language provides great generality by default, but this comes at a grave cost in terms of performance. In order to get within an order of magnitude of the performance of C++, OCaml or SML, it is necessary to litter the source code with type declarations. In contrast, this process is automated by Stalin, ocamlopt, MLton and SML/NJ. The shortest Lisp implementation demonstrates the abysmal performance offered by the Lisp language, which is also coupled with considerable verbosity: four OCaml implementations and one Standard ML implementation are both shorter and much faster than the shortest Lisp.

Fastest implementations

C++ is the fastest language in both 32-bit and 64-bit. However, the higher-level Scheme and OCaml languages are not much slower. Stalin-compiled Scheme is only 30% slower than C++ in 32-bit. The fastest OCaml is only 9% slower than C++ in 64-bit.

Not coincidentally, two of the best optimizing compilers, Stalin and MLton, are both whole-program optimizing compilers, meaning they can only compile whole (self-contained) programs. The C++ and OCaml compilers allow partial recompilation, compile this whole program much more quickly and still achieve very competitive performance. Moreover, neither Stalin nor MLton supports 64-bit.

Although Java is known to compete with C++ in some benchmarks, it is clearly 2-3x slower than C++ for this benchmark.
