High-Level Virtual Machine (HLVM)

Welcome to the home page of the HLVM project!

Introduction

HLVM is a cross-platform open-source high-level virtual machine with the following design goals:

  • Ideal for high-performance interactive technical computing.
  • Safety.
  • Garbage collection.
  • High-level target language derived from OCaml.
  • Performant language implementation, e.g. unboxed tuples.
  • Easy high-performance parallelism for multicores using Cilk-style task queues.
  • Easy interoperability with C.
  • Commerce friendly: easily and robustly deployable closed source libraries.
  • Efficient generics using type specialization.

The virtual machine is written in OCaml and uses the excellent LLVM library for efficient high-performance native code generation.

Simplicity has been an important driving factor in the design and implementation of HLVM. Every development is an attainable, incremental step towards feasible goals. Only the most mundane and predictable novel features are introduced. HLVM is not a platform for computer science research: it is intended to be a tool for users in science, engineering and finance.

If you would like to keep up to date with respect to HLVM development, please subscribe to The OCaml Journal.

Motivation

Microsoft's Common Language Runtime (CLR) was a fantastic idea. The ability to interoperate safely and at a high level between different languages, from managed C++ to F#, has greatly accelerated development on the Microsoft platform. The resulting libraries, like Windows Presentation Foundation, are already a generation ahead of anything available on any other platform.

Linux and Mac OS X do not currently have the luxury of a solid foundation like the CLR. Consequently, they are composed entirely of non-interoperable components written in independent languages, from unmanaged custom C++ dialects to Objective-C and Python. Some developers choose to restrict themselves to the lowest common denominator, C (e.g. writing GTK in C), which aids interoperability but only at a grave cost in productivity and reliability because C is a low-level and unsafe language. Other developers gravitate to huge libraries written in custom dialects of comparatively non-interoperable languages (e.g. Qt in a proprietary C++ dialect). Both approaches have a bleak future.

The situation is compounded by the fact that Linux has a far richer variety of programming languages than Windows, because Linux is the platform of choice for academics such as programming language researchers, who develop and maintain a variety of state-of-the-art programming languages, libraries and tools on it. However, despite any benefits of languages like OCaml, Erlang, Haskell, Lisp, Scheme, ATS, Pure and others, these languages are almost entirely non-interoperable because they do not have a shared run-time and many do not even have easy foreign function interfaces (FFIs) to access existing unmanaged libraries. Lack of mundane but practically essential features is an unfortunate side-effect of research-driven languages. Attempts to seek industrial funding for the implementation of such features in academic languages (e.g. the CAML Consortium and Industrial Haskell Group) have failed.

HLVM is solving this problem by providing a safe high-level common-language run-time suitable for a wide variety of languages and for demanding applications including high-performance computing. This will make it possible to build a better future for software development on these platforms. The impedance mismatch between different languages (including C) will be a lot smaller, and the ability to write and consume libraries from other languages will greatly improve productivity. The time is ripe for a new open source common-language run-time because most of today's open source high-level language implementations impede shared-memory parallelism, which is now of critical importance for performance.

HLVM already sports an impressive list of features:

  • Unit, bool, int and double-precision float primitive types.
  • Tuples as unboxed C-compatible structs.
  • Polymorphic homogeneous arrays.
  • Union types.
  • Function pointers.
  • Tail call elimination of all tail calls.
  • Generic printing.
  • Foreign function interface to call C directly.
  • POSIX threads.
  • Mark and sweep garbage collector that allows threads to run in parallel.
  • Standalone compilation to high-performance native executables.
  • Incremental JIT compilation to high-performance native code, ideal for REPLs.
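
The "tuples as unboxed C-compatible structs" and foreign function interface items can be illustrated from the C side. The following is only a sketch under stated assumptions: it supposes that an HLVM tuple of an int and a float has the same layout as a plain C struct. The struct name, the field names, the 64-bit integer width and the function sum_values are all hypothetical and are not taken from HLVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed C view of an unboxed (int, float) tuple: both fields are stored
   inline, with no header word and no separate heap allocation. */
typedef struct int_float_pair {
    int64_t count;   /* the integer width is an assumption */
    double  value;   /* floats are double precision */
} int_float_pair;

/* Hypothetical C library function: sums the float fields of n pairs that
   are stored contiguously, as the elements of an array of tuples would be
   under the layout assumed above. */
static double sum_values(const int_float_pair *pairs, size_t n) {
    assert(pairs != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += pairs[i].value;
    return total;
}
```

Because the tuple is an ordinary struct in this sketch, C code can read an array of such tuples in place, with no unboxing or copying step between the two languages.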

In particular, HLVM recently reached a major milestone with the ability to run threads in parallel efficiently with unreachable values being recycled by a stop-the-world mark-sweep garbage collector.

Novel features

Although HLVM is not a platform for research, it includes some unusual features:

  • Fat references: whereas conventional language implementations place header information in the heap object, HLVM pulls this metadata into the reference itself in order to provide direct interoperability with C. Consequently, references are 128 bits wide in HLVM (on 32-bit machines).
  • Incrementally generated garbage collector: HLVM relies upon a "visit" function, reachable via the run-time type in every fat reference, to traverse the references within a heap-allocated value. These "visit" functions are generated for each type as it is defined and are type specialized to improve performance.
  • Brevity: the VM is incredibly concise, weighing in at only 1,890 lines of OCaml code at the time of writing.
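
The first two items can be sketched in C. This is a minimal illustration, not HLVM's actual layout: the text above only says that metadata lives in the reference, that a reference is 128 bits on a 32-bit machine (four 32-bit words), and that a "visit" function is reachable via the run-time type. The four fields chosen here, and every name in the code (fat_ref, rtti, visit_child, count_children and so on), are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

struct rtti;  /* run-time type information, defined below */

/* Hypothetical fat reference: four machine words, so 128 bits on a
   32-bit target. The heap block behind `data` carries no header. */
typedef struct fat_ref {
    const struct rtti *type;  /* run-time type; gives access to "visit" */
    size_t tag_or_length;     /* assumed: e.g. an array length */
    void *mark;               /* assumed: per-value collector state */
    void *data;               /* plain pointer that C code can use directly */
} fat_ref;

/* Callback applied to each reference found inside a value. */
typedef void (*visit_child)(fat_ref child, void *ctx);

struct rtti {
    const char *name;
    /* Type-specialized traversal: calls f on every reference held by self. */
    void (*visit)(fat_ref self, visit_child f, void *ctx);
};

/* Specialized for arrays of ints: they contain no references. */
static void visit_int_array(fat_ref self, visit_child f, void *ctx) {
    (void)self; (void)f; (void)ctx;
}

/* Specialized for arrays of references: every element is a child. */
static void visit_ref_array(fat_ref self, visit_child f, void *ctx) {
    fat_ref *elems = self.data;
    for (size_t i = 0; i < self.tag_or_length; ++i)
        f(elems[i], ctx);
}

static const struct rtti int_array_type = { "int array", visit_int_array };
static const struct rtti ref_array_type = { "ref array", visit_ref_array };

/* Example visitor: counts the children it is shown. */
static void count_child(fat_ref child, void *ctx) {
    (void)child;
    ++*(size_t *)ctx;
}

/* Number of references directly reachable from r, found via its type. */
static size_t count_children(fat_ref r) {
    size_t n = 0;
    assert(r.type != NULL);
    r.type->visit(r, count_child, &n);
    return n;
}
```

A collector's mark phase could use the same entry point as count_children, passing a callback that marks each child and recurses instead of one that counts. Because each visit function knows its type statically, the int array case does no work at all, which is the kind of saving type specialization is meant to provide.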

These novel features are interesting but quite mundane and predictable. Moreover, writing the garbage collector in HLVM's own intermediate language actually simplified the overall design considerably.

Future developments

The following work will be done in the future:

  • Closures for functional programming.
  • Task queues for easy high-performance parallel programming.
  • Generics for high-level programming.
  • Comprehensive open-source standard library including data structures, algorithms, IO and graphics.
  • Testing on other platforms (x64 Linux and Mac OS X).
  • Commercial software including high-quality OpenGL-based visualization.
  • Free documentation of the language, tools and standard library.
  • Commercial literature (e.g. printed books) explaining how best to use HLVM for high-performance interactive technical computing.

Once completed, we believe HLVM will become the de facto standard VM for high-performance interactive technical computing on the Linux and Mac OS X platforms.

Download

HLVM is open source software, available for free under a commerce-friendly BSD license. To obtain HLVM, check out the latest revision from the Subversion repository with:

svn checkout svn://svn.forge.ocamlcore.org/var/lib/gforge/chroot/scmrepos/svn/hlvm

and follow the instructions therein.

Funding

HLVM is an industrial project and, to date, has been funded entirely by Flying Frog Consultancy. If you would like to join a consortium of industrial users to fund specific developments on HLVM, please contact us.

Further reading