OCaml Journal: Introducing OCaml

Introducing OCaml

This is the freely-available first article from The OCaml Journal . To read dozens of other articles like this, subscribe today:

Subscription

All existing articles

The OCaml programming language is developed at INRIA in France and was first released in 1996. Since then, OCaml has excelled in a variety of application domains and has become one of the world's most popular functional programming languages. The most notable developments include:

  • Jane St. Capital have over 30 full time OCaml developers working on the in-house trading software that they use to invest their own funds.
  • Citrix employ around 20 OCaml developers to work on the management tool stack of their XenServer product line.
  • Cilk Arts exploy full-time OCaml programmers to work on the transformation code that underpins the Windows version of their commercial product.
  • The Fast Fourier Transform routines in Matlab are written primarily in OCaml.
  • Microsoft develop their driver verification software in OCaml.
  • IBM sell a Migration Toolkit written in OCaml. This is designed to help with the static analysis and conversion of databases.
  • Dassault Systemes sell a computer-aided manufacturing environment called Delmia that contains a domain-specific language implemented in OCaml called CellControl for programming assembly-line automata and robots.
  • Intel use high-level modelling and verification tools called reFLect written in OCaml for the hardware verification of their integrated circuits, part of their Forte environment.
  • LexiFi use OCaml extensively for the modeling of financial products (swaps, options etc). Their solution includes software for modeling the life cycle of a product, as well as pricing them. It is closely related to the solution described in the paper by Jean-Marc Eber and Simon Peyton-Jones.
  • Astree is a static analyzer to remove large classes of important bugs from mission critical code used by Airbus.
  • Coherent Graphics Ltd. distribute libraries, tools and services for manipulating PDF documents, written in OCaml
  • Wink Technologies provide a people search engine written primarily in OCaml.
  • SkyDeck create products for the mobile phone industry using software written in OCaml.
  • AT&T are developing Galax in OCaml, to provide the W3C XML Query Language.
  • Wolfram Research are developing an experimental Mathematica implementation in OCaml.
  • Many small businesses are exploiting the speed of OCaml development to gain an advantage over their competitors.
  • Open source developers are using OCaml to create widely-used tools such as FFTW , MLDonkey , unison , Hevea , LEdit , Planets, FreeTennis, Polygen, MTASC, ADVI, Haxe, Orpie, CDuce, bibtex2html, Hlins, Coq, Coq IDE, Ara and Headache..
  • Scientists, particularly bioinformaticians, and engineers are adopting OCaml for its ability to simplify the implementation and use of sophisticated data structures and algorithms.
  • Microsoft's next-generation .NET language, F# , is derived from the CAML family of languages.

OCaml's rapid adoption can be attributed to its unique combination of many of the most powerful features and programming paradigms found in other languages:

  • Garbage collection
  • Functional programming
  • Type inference
  • Pattern matching
  • Object-oriented programming
  • Interactivity and scripting
  • Native-code performance
  • Packages for all major Linux distros, Mac OS X and Windows

This combination makes OCaml a concise, expressive and efficient language for a wide variety of applications, ranging from numerical algorithms to GUI programming. Moreover, the OCaml programming language benefits from high-quality bindings to many existing libraries under Linux and OS X, including GTK, LAPACK, FFTW and even Apache and Mathematica.

All of these features and libraries will be covered in detail in future OCaml Journal articles. This article provides a kick-start for anyone wanting to start using the OCaml programming language to solve their own problems.

Installation

The OCaml language and standard library are easy to install on Debian or Ubuntu Linux systems or Mac OS X systems using Fink or DarwinPorts :

# apt-get install ocaml

OCaml is also available for many other platforms and architectures (including IA32, PowerPC, AMD64, Alpha, Sparc, Mips, IA64, HPPA and StrongArm) from the official site .

Compilers and Interactivity

OCaml provides three different ways to evaluate OCaml code:

  • Interactive top-level ocaml : code can be evaluated a definition at a time.
  • Byte-code compiler ocamlc : produces platform-independent intermediate bytecode.
  • Native-code compiler ocamlopt : produces high-performance code specific to a platform (such as x86, x64 or PPC).

A simple hello world program in the file hellow.ml :

print_endline "Hello world!"

can be compiled to byte-code and executed from the bash $ prompt using:

$ ocamlc hellow.ml -o hellow
$ ./hellow
Hello world!
$

or compiled to native-code using:

$ ocamlopt hellow.ml -o hellow
$ ./hellow
Hello world!
$

Running the ocaml program provides a # prompt that can be used to define variables and functions and evaluate expressions:

$ ocaml
        Objective Caml version 3.10.0
#

OCaml code is evaluated when ;; is reached. So you can evaluate code by adding ;; to the end and pressing return.

When a new definition is created, the interactive top-level prints the variable name, its type and its value. For example, defining a variable called x of the type int to have the value 3 gives:

# let x = 3;;
val x : int = 3

Note that the type of x was inferred to be int , we did not specify it. This is an important advantage of OCaml and we exploit this in our books and journal articles by providing programs in bite-sized snippets along with the top-level's response.

Expressions can also be evaluated without binding the result to a variable. For example, the OCaml top-level may be used as a calculator:

# 2 * 3 + 4;;
- : int = 10

The use of a suitable development environment is a great help when developing larger projects, particularly as a way to see the types that are being inferred inside OCaml code.

Development Environment

The Emacs editor and its Tuareg mode provide many useful features for editing OCaml source code:

  • Color syntax highlighting
  • Throwback of inferred types
  • Automatic indenting

The relevant Debian packages may be installed from a root shell prompt with:

# apt-get install emacs21 tuareg-mode

Type throwback requires the installation of the caml-types Emacs extension from the OCaml distribution, which is most easily installed with:

# apt-get source ocaml
# cd ocaml-3.10.0/emacs
# make install

OCaml source code files should now be color syntax highlighted correctly:

As OCaml code is entered, it will be indented. Blocks of unindented or incorrectly indented code may be automatically indented with ALT+Q. The type of the expression under the cursor, or of a selection, may be obtained with CTRL+C CTRL+T after the file has been compiled with the -dtypes option.

In Emacs, files are saved with CTRL-X CTRL-S and Emacs is quit with CTRL+X CTRL-C. Selections can be made by holding down the shift key whilst moving the cursor with the arrow keys. Selections are cut with CTRL-W and pasted with CTRL-Y.

The remainder of this introductory article describes the basic syntax of the OCaml programming language and should be enough to get you started developing your own applications in OCaml. For more articles on OCaml, subscribe to The OCaml Journal .

Basic syntax

Comments

Comments are enclosed between (* and *) and may be nested.

OCaml provides a documentation generator called ocamldoc that can generate HTML and PDF documentation from source code comments that are of the form:

(** An important variable. *)
let everything = 42

When developing large projects or libraries, this style of commenting can be used to make the API easier to navigate and explore.

Primitive types

The OCaml language provides a variety of primitive types, including unit , bool , char , int and float , as well as several compound types such as string .

The unit type serves the purpose of the void type in C, C++, Java and C#. This type indicates a value that conveys no information. The only value of type unit is denoted () .

The following example expressions demonstrate the most important primitive types:

# ();;
- : unit = ()
# true;;
- : bool = true
# 'a';;
- : char = 'a'
# 3;;
- : int = 3
# 2.4;;
- : float = 2.4
# "foobar";;
- : string = "foobar"

The OCaml language provides an unusual feature called type inference . This allows the compiler to work out the type of an expresson without requiring explicit annotation. Note that none of the previous examples required any type annotations. This is typical of OCaml code.

Arithmetic

The numeric types, such as int and float , provide various arithmetic operations:

# 1 + 2 * 3;;
- : int = 7
# 3.4 +. 2.3 *. 7.7;;
- : float = 21.11

Note that the arithmetic operators for the float type all have the suffix . , e.g. +. , -. and *. .

Some other types also provide operators. For example, the ^ operator can be used to concatenate strings:

# "foo" ^ "bar";;
- : string = "foobar"

The syntax described so far allows an OCaml interactive session to be used as a calculator. The language becomes much more useful when functions and compound types, such as tuples, are used to compose programs.

Tuples

The simplest set of inferred types in OCaml are the tuples. A tuple is simply a compound type composed of a fixed number of other types. Tuples are written as comma-separated values. For example, the pair (2, 3) :

# 2, 3;;
- : int * int = (2, 3)

Note that the type has been inferred as int * int , denoting a 2-tuple of int values.

As we shall see, tuples are ideal for returning multiple values from a function.

Records

User-defined data structures include records and variants. Both are defined with the type declaration. For example, the following declares a record type representing rational numbers:

# type rational = {num: int; denom: int};;
type rational = { num : int; denom : int; }

The fields of a record r are accessed using the syntax r.num . A function to add two of these rational numbers may be written:

# let add p q =
    {num = p.num * q.denom + q.num * p.denom;
     denom = p.denom * q.denom};;
val add_ratio : ratio -> ratio -> ratio = <fun>

For example, 1/3 + 2/5 = 11/15 :

# add_ratio {num=1; denum=3} {num=2; denum=5};;
- : ratio = {num = 11; denum = 15}

Tuples and records are both product types. This makes them equivalent to C structs. Records are preferable to tuples in more complicated programs because they present more explicit information to the compiler and help to prevent bugs.

Variants

Variants are sum types that allow a value to take on one of several different types. Each alternative is identified by a name, called a constructor , that serves both for constructing values of the variant type and inspecting them using pattern-matching. Constructor names are capitalized to distinguish them from variable names (which start with a lowercase letter).

The simplest example of a sum type provides two alternatives and is equivalent to a boolean:

# type state = On | Off
type state = On | Off

The type state has two values, referred to as On and Off :

# On;;
- : state = On
# Off;;
- : state = Off

A slightly more advanced example is a sum type int_option that represents an optional integer, i.e. either nothing or a value of the type int :

# type int_option = Nothing | AnInteger of int
type int_option = Nothing | AnInteger of int

This type allows a wider variety of values:

# Nothing;;
- : int_option = Nothing
# AnInteger 3;;
- : int_option = AnInteger 3
# AnInteger 7;;
- : int_option = AnInteger 7

Sum types may refer to themselves (recursion) which is useful when defining trees, such as a binary tree:

# type bin_tree = Empty | Node of bin_tree * bin_tree;;
type bin_tree = Empty | Node of bin_tree * bin_tree;;
# Node(Node(Empty, Empty), Empty);;
- : bin_tree = Node (Node (Empty, Empty), Empty)

For example, the following variant type represents a symbolic expression that may contain integers, variables, sums and products:

# type expr =
    | Int of int
    | Var of string
    | Add of expr * expr
    | Mul of expr * expr;;
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr;;

Values of this type may be constructed using the constructors Int , Var , Add and Mul . For example, the symbolic expression 1 + 2 &times; y may be expressed as:

# Plus(Int 1, Times(Int 2, Var "y"));;
- : expr = Plus(Int 1, Times(Int 2, Var "y"))

Variant types are very useful and are used to declare a variety of concrete data structures such as binary trees.

Compared to object-oriented programming, a single variant type is equivalent to a class hierarchy composed of an abstract base class or interface representing the type and derived classes representing each of the variant type constructors. The OCaml approach is clearly much simpler and more elegant. However, the ability to manipulate data using pattern matching is an even more important advantage.

Pattern matching

The utility of variant types lies in the ability to pattern match over them.

The OCaml language provides two basic syntaxes for pattern matching. The match ... with construct and the function ... construct. The latter is a special form of function, discussed in the next section.

The match ... with construct evaluates the given expression and then compares the result with a sequence of pattern matches:

match expression with
| pattern -> ...
| pattern -> ...
| pattern -> ...

The expression corresponding to the first pattern that matches is then evaluated and returned.

Patterns may contain primitive values, tuples, records, variant type constructors, variable names and a catchall pattern denoted _ that matches any value.

Subpatterns may contain alternatives, denoted pat1 | pat2 . For example, the pattern 1 | 2 matches both 1 and 2.

For example, a function to invert a given state may be written:

# let invert state =
    match state with
    | On -> Off
    | Off -> On;;
val invert : state -> state = <fun>

Note that the function calls its argument state even though there is a type called state . This is perfectly legitimate and quite common in OCaml, and it replaces some of the most common forms of repetition found in object-oriented languages:

DocumentBuilder documentBuilder = factory.newDocumentBuilder();

In the context of the variant type representing a symbolic expression, the following pattern match tests whether the expression e is an atom or a sum/product:

match e with
| Int _ | Var _ -> true
| Add _ | Mul _ -> false

For example, Int 7 is an atomic expression:

# match Int 7 with
  | Int _ | Var _ -> true
  | Add _ | Mul _ -> false;;
- : bool = true

Pattern matching is a vital aspect of the OCaml language that underpins its expressive power. This is so important that we devote the whole of the first journal article to pattern matching.

Before concluding this article, we shall define some functions that use pattern matching to manipulate data.

Functions and Variables

As OCaml is a functional programming language, functions are just like any other type. A function that accepts an int and returns a float is said to have the type int -> float .

Functions are declared using one of three different syntaxes. The simplest syntax fun x y -> ... where x and y are the arguments denotes an anonymous function. For example, a function that doubles its argument:

# fun n -> 2 * n;;
- : int -> int = <fun>

This function may be applied to an argument by bracketing the function definition and using the syntax f x to apply x to f . For example, doubling 3 gives 6:

# (fun n -> 2 * n) 3;;
- : int = 6

As we have seen, a variable name x is bound to a value 3 using the syntax let x = 3 . As a function is treated like any other value in OCaml, a function definition simply binds a function value to a variable name. For example, giving the name double to the previous example function:

# let double = fun n -> 2 * n;;
val double : int -> int

Function definitions like this are so common that there is a shorthand notation:

# let double n = 2 * n;;
val double : int -> int

As values in OCaml programs are often immutable, a variable definition often refers to the previous definition of a variable with the same name. For example, the following computes 2 * 3 - 1 one step at a time, superceding the variable n at each step:

# let x = 2;;
val x : int = 2
# let x = 3 * x;;
val x : int = 6
# let x = x - 1;;
val x : int = 5

However, function definitions often need to refer to themselves ( recursion ). A let binding can be made to refer to itself by adding the rec keyword before the variable name.

The following defines a recursive factorial function:

# let rec factorial n =
    if n=0 then 1 else n * factorial(n - 1);;
val factorial : int -> int = <fun>

For example, 5! = 120 :

# factorial 5;;
- : int = 120

The function syntax mentioned earlier is shorthand for a function that pattern matches over its argument. The factorial example may be written more clearly as:

# let rec factorial = function
    | 0 -> 1
    | n -> n * factorial(n - 1);;
val factorial : int -> int = <fun>

Even with the minimal subset of the OCaml language covered so far, it is already possible to write some interesting example programs.

Example: Symbolic Derivative

We can already present an example that can be written simply in OCaml using the minimal functionality described so far but which is much more difficult to write in most other languages.

The derivative of a symbolic expression of the type expr can be found by applying simple rules, each of which can be represented elegantly using pattern matching:

# let rec d(f, x) =
    match f with
    | Var y when x=y -> Int 1
    | Int _ | Var _ -> Int 0
    | Add(f, g) -> Add(d(f, x), d(g, x))
    | Mul(f, g) -> Add(Mul(f, d(g, x)), Mul(g, d(f, x)));;
val d : expr * string -> expr = <fun>

For example, the derivative of ax 2 +bx+c is:

# let a, b, c, x = Var "a", Var "b", Var "c", Var "x";;
val a : expr = Var "a"
val b : expr = Var "b"
val c : expr = Var "c"
val x : expr = Var "x"
# let f = Add(Add(Mul(Mul(a, x), x), Mul(b, x)), c);;
val f : expr =
  Add (Add (Mul (Mul (Var "a", Var "x"), Var "x"), Mul (Var "b", Var "x")),
   Var "c")
# d(f, "x");;
- : expr =
  Add
   (Add
     (Add (Mul (Mul (Var "a", Var "x"), Int 1),
       Mul (Var "x", Add (Mul (Var "a", Int 1), Mul (Var "x", Int 0)))),
     Add (Mul (Var "b", Int 1), Mul (Var "x", Int 0))),
   Int 0)

The ability to manipulate symbolic expressions this effectively is a widely-known benefit of OCaml-like languages. However, OCaml's capabilities excel in many other areas as well, including graphics.

To read more articles about the OCaml programming language, please subscribe to the OCaml Journal .