OCaml for Scientists 
Objective Caml for ScientistsThis is a webfriendly version of the first chapter of the book OCaml for Scientists.
Chapter 1: IntroductionA Brief History of OCamlThe MetaLanguage (ML) was originally developed at Edinburgh University in the 1970's as a language designed to efficiently represent other languages. The language was pioneered by Robin Milner for the Logic of Computable Functions (LCF) theorem prover. The original ML, and its derivatives, were designed to stretch theoretical computer science to the limit, yielding remarkably robust and concise programming languages which can also be very efficient.
The Categorical Abstract Machine Language (CAML) was the acronym originally used to describe what is now known as the Caml family of languages. G�rard Huet designed and implemented Caml at Institut National de Recherche en Informatique et en Automatique (INRIA) in France, until 1994. Since then, development has continued as part of projet Cristal, now led by Xavier Leroy.
Objective Caml (OCaml^{}) is the current flagship language of projet Cristal. The Cristal group have produced freely available tools for this language. Most notably, an interpreter which runs OCaml code in a virtual machine (VM) and two compilers, one which compiles OCaml to a machine independent bytecode which can then be executed by a bytecode interpreter and another which compiles OCaml directly to native code. At the time of writing, the nativecode compiler is capable of producing code for Alpha, Sparc, x86, MIPS, HPPA, PowerPC, ARM, ia64 and x8664 CPUs and the associated runtime environment has been ported to the Linux, Windows, MacOS X, BSD, Solaris, HPUX, IRIX and Tru64 operating systems.
Benefits of OCamlBefore delving into the syntax of the language itself, we shall list the main, advantageous features offered by the OCaml language:
Running OCaml programsOCaml provides three different ways to execute code. We shall now examine each of these three approaches, explaining how code can be executed using them and noting their relative advantages and disadvantages.
Toplevel
The OCaml toplevel interactively interprets OCaml code and
is started by running the program
$ ocaml Objective Caml version 3.08.0 #
OCaml code may then be entered at this
# 1 + 3;;  : int = 4 #
The toplevel will also print the type of the result as well as its
value (when the result has a value). For example, the following defines
a variable called
# let sqr x = x *. x;; val sqr : float > float = <fun> #
This response indicates that a function called
 : type = value
or consisting of one or more descriptions of the form:
val name : type = value
where
Programs entered into the toplevel execute almost as quickly as bytecode compiled programs (which is often quite a bit slower than nativecode compiled programs). However, the interactivity of the toplevel makes testing the validity of code segments much easier.
In the remainder of this book, we shall write numerous code snippets in this style, as if they had been entered into the toplevel.
Bytecode compilation
When stored in a plain text file with the suffix “.ml”, an OCaml
program can be compiled to a machine independent bytecode using the
let _ = print_endline "Hello world!"
This file may be compiled at the Unix shell
$ ocamlc test.ml o test
and then executed:
$ ./test Hello world!
In this case, the result was to print the string “Hello world!” onto the screen. Bytecode compilation is an adequate way to execute OCaml programs which do not perform intensive computations. If the time taken to execute a program needs to be reduced then nativecode compilation can be used instead.
Nativecode compilationThe “test.ml” program could equivalently have been compiled to native code, creating a standalone, nativecode executable called “test”, using:
$ ocamlopt test.ml o test
The resulting executable runs in exactly the same way:
$ ./test Hello world!
Programs compiled to native code, particularly in the context of numerically intensive programs, can be considerably faster to execute.
OCaml syntaxBefore we consider the features offered by OCaml, a brief overview of the syntax of the language is instructive, so that we can provide actual code examples later. Other books give more systematic, thorough and formal introductions to the whole of the OCaml language.
Language overviewIn this section we shall evolve the notions of values, types, variables, functions, simple containers (lists and arrays) and program flow control. These notions will then be used to introduce more advanced features in the later sections of this chapter.
When presented with a block of code, even the most seasoned and fluent
programmer will not be able to infer the purpose of the code. Consequently,
programs should contain additional descriptions written in plain English,
known as comments. In OCaml, comments are enclosed between
Just as numbers can be defined to be members of sets such as integer ( in Z), real ( in R), complex ( in C) and so on, so values in programs are also defined to be members of sets. These sets are known as types.
Types
Fundamentally, languages provide basic types and, often, allow more
sophisticated types to be defined in terms of the basic types. OCaml
provides a number of builtin types:
Only one value is of type
Integers are written
# 3;;  : int = 3 # 5.;;  : float = 5.
Arithmetic can be performed using the conventional
# 1 * 2 + 2 * 3;;  : int = 8
The floatingpoint infix functions have slightly different names,
suffixed by a fullstop:
# 1. *. 2. +. 2. *. 3.;;  : float = 8.
The distinct names of the operators for different types arises as
the most elegant solution to allowing the unambiguous inference of
types in the presence of different forms of arithmetic. The definition
of new operators is discussed later, in section .
In order to perform arithmetic using mixed types, functions such as
Unlike other languages, OCaml is phenomenally pedantic about types.
For example, the following fails because
# 2 * 2.;; This expression has type float but is here used with type int
Note that the OCaml toplevel underlines the erroneous portion of the code.
Explicitly converting the value of type
# 2 * (int_of_float 2.);;  : int = 4
In general, arithmetic is typically performed using a single number
representation (e.g. either
Single characters (of type
Strings are written in double quotes, e.g.
# "Hello world!".[4];;  : char = 'o'
The character at index
A pair of strings may be concatenated using the
# "Hello " ^ "world!";;  : string = "Hello world!"
Booleans are either
# 1 < 3 && 2.5 < 2.7;;  : bool = true
Values may be assigned, or bound, to names. As OCaml is a functional language, these values may be expressions mapping values to values — functions. We shall now examine the binding of values and expressions to variable and function names.
Variables and functions
Variables and functions are both defined using the
# let a = 2;; val a : int = 2
Note that the language automatically infers types. In this case,
Definitions using
let name = expr1 in expr2
This evaluates expr1 and binds the result to the variable name before evaluating expr2. For example, the following evaluates a^{2} in the context a=3, giving 9:
# let a = 3 in a * a;;  : int = 9
Note that the value
# a;;  : int = 2
More recent definitions shadow previous definitions. For example, the following supersedes the definition a=2 with a=a� a in order to calculate 2�2�2�2=16:
# let a = 2 in let a = a * a in a * a;;  : int = 16
As OCaml is a functional language, values can be functions and variables
can be bound to them in exactly the same way as we have just seen.
Specifically, function definitions append a list of arguments between
the name of the function and the
# let sqr x = x * x;; val sqr : int > int = <fun>
In this case, the use of the integer multiply
The function
# sqr 5;;  : int = 25
Typically, more sophisticated computations require the use of more complicated types. We shall now examine the three simplest ways by which more complicated types may be constructed.
Tuples, records and variants
Tuples are the simplest form of compound types, containing a fixed
number of values which may be of different types. The type of a tuple
is written analogously to conventional settheoretic style, using
# (1, 2, 3);;  : int * int * int = (1, 2, 3)
A tuple containing n values is described as an ntuple, e.g. the
tuple
Records are essentially tuples with named components, known as fields.
Records and, in particular, the names of their fields must be defined
using a
# type vec2 = { x:float; y:float };; type vec2 = { x:float; y:float }
A value of this type representing the zero vector can then be defined using:
# let zero = { x=0.; y=0. };; val zero : vec2 = {x = 0.; y = 0.}
Note that the use of a record with fields
Whereas the tuples are orderdependent, i.e. (1,2)≠(2,1), the
named fields of a record may appear in any order, i.e. { x=1,y=2}≡{ y=2,x=1}.
Thus, we could, equivalently, have provided the
# let zero = { y=0.; x=0. };; val zero : vec2 = {x = 0.; y = 0.}
The fields in this record can be extracted individually using the
notation record
# zero.x;;  : float = 0.
Also, a shorthand
# let x_axis = { zero with x=1. };; val x_axis : vec2 = {x = 1.; y = 0.}
Although OCaml is a functional language, OCaml does support imperative
programming. Fundamentally, record fields can be marked as mutable,
in which case their value may be changed. For example, the type of
a mutable, twodimensional vector called
# type vec2 = { mutable x:float; mutable y:float };; type vec2 = { mutable x : float; mutable y : float; }
A value
# let r = { x=1.; y=2. };; val r : vec2 = {x = 1.; y = 2.}
The xcoordinate of the vector
# r.x < 3.;;  : unit = ()
The sideeffect of this expression has mutated the value of the variable
# r;;  : vec = {x = 3.; y = 2.}
However, a record with a single, mutable field can often be useful.
This data structure, called a reference, is already provided
by the type
# let a = ref 2;; val a : int ref = {contents = 2}
The type of
# !a;;  : int = 2
The value of
# a := 3;  : unit = () # a;;  : int ref = {contents = 3}
In the case of references to integers, two additional functions are
provided,
# incr a;;  : unit = () # a;; val a : int ref = {contents = 4}
The types of values stored in tuples and records are defined at compiletime. OCaml completely verifies the correct use of these types at compiletime. However, this is too restrictive in many circumstances. These requirements can be slightly relaxed by allowing a type to be defined which can acquire one of several possible types at runtime. These are known as variant types. OCaml still verifies the correct use of variant types as far as is theoretically possible.
Variant types are defined using the
# type button = On  Off;; type button = On  Off
The constructors
# On;;  : button = On # Off;;  : button = Off
In this case, the constructors
More usefully, constructors may take arguments, allowing them to convey
information by carrying data. The arguments are defined using
# type button = On of int * string  Off;; type button = On of int * string  Off
The
# On (1, "mine");;  : button = On (1, "mine") # On (2, "hers");;  : button = On (2, "hers") # Off;;  : button = Off
Types can also be defined recursively, which is very useful when defining more sophisticated data structures, such as trees. For example, a binary tree contains either zero or two binary trees and can be defined as:
# type binary_tree = Leaf  Node of binary_tree * binary_tree;; type binary_tree = Leaf  Node of binary_tree * binary_tree
A value of type
# Node (Node (Leaf, Leaf), Leaf);;  : binary_tree = Node (Node (Leaf, Leaf), Leaf)
Of course, we could also place data in the nodes to make a more useful data structure. This line of thinking will be pursued in chapter . In the mean time, let us consider two special data structures which have notations built into the language.
Lists and arrays
Lists are written
The types of lists and arrays of integers, for example, are written
# [1; 2; 3];;  : int list = [1; 2; 3] # [1; 2; 3];;  : int array = [1; 2; 3]
In the case of lists, the infix cons operator
# 1 :: [2; 3];;  : int list = [1; 2; 3]
In the case of arrays, the notation array
# [1; 2; 3].(1);;  : int = 2
Also, a shorthand notation can be used to represent lists or arrays
of tuples by omitting the parentheses. For example,
# [1, 2; 3, 4];;  : (int * int) list = [(1, 2); (3, 4)] # [1, 2; 3, 4];;  : (int * int) array = [(1, 2); (3, 4)]
The use and properties of lists, arrays and several other data structures will be discussed in chapter . In the mean time, we shall examine programming constructs which allow more interesting computations to be performed.
If
Like many other programming languages, OCaml provides an
if expr1 then expr2 if expr1 then expr2 else expr3
In both cases, expr1 must evaluate to a value of type
The former evaluates the boolean expression expr1 and, only
if the result is
if expr1 then expr2 else ()
The latter similarly evaluates expr1 but returning the result
of either expr2, if expr1 evaluated to
For example, the following function prints “Less than three” if the given argument is less than three:
# let f x = if x < 3 then print_endline "Less than three";; val f : int > unit = <fun> # f 1;; Less than three  : unit = () # f 5;;  : unit = ()
The following function returns the string “Less” if the argument is less than 3 and “Greater” otherwise:
# let f x = if x < 3 then "Less" else "Greater";; val f : int > string = <fun> # f 1;;  : string = "Less" # f 5;;  : string = "Greater"
The parts of the language we have covered can already be used to write some interesting programs. However, attention should be paid to the way in which programs are constructed from these parts.
Program composition
As we have seen, program segments may be written in the toplevel
which replies by reciting the automatically inferred types and executing
expressions. However, the
# let f1 x = if x < 3 then print_endline "Less than three" let f2 x = if x < 3 then "Less" else "Greater";; val f1 : int > unit = <fun> val f2 : int > string = <fun>
Note that OCaml has determined that this input corresponds to two
separate function definitions. In fact, when written for the
let f1 x = if x < 3 then print_endline "Less than three" let f2 x = if x < 3 then "Less" else "Greater"
As we have seen, expressions which act by way of a sideeffect (such
as printing) produce the value
# let f () = print_endline "A"; print_endline "B"; print_endline "C";; val f : unit > unit = <fun>
Note that there is no final
More about functionsFunctions can also be defined anonymously, known as λabstraction in computer science. For example, the following defines a function f(x)=x� x which has a type representing^{} f:Z→Z:
# fun x > x * x;;  : int > int = <fun>
This is an anonymous equivalent to the
# (fun x > x * x) 2;; val : int = 4
Consequently, we could have defined
# let sqr = fun x > x * x;; val sqr : int > int = <fun>
Once defined, this version of the
The
# let ipow3 x = let sqr x = x * x in x * sqr x;; val ipow3 : int > int = <fun>
Note that the function application
The
# let (a, b) = (3, 4);; val a : int = 3 val b : int = 4
This is particularly useful when factoring code. For example, the
following definition of the
# let ipow4 x = let sqr x = x * x in (sqr x) * (sqr x);; val ipow4 : int > int = <fun>
Just as common subexpressions in a mathematical expression can be
factored, so the
# let (ipow3, ipow4) = let sqr x = x * x in ((fun x > x * (sqr x)), fun x > (sqr x) * (sqr x));; val ipow3 : int > int = <fun> val ipow4 : int > int = <fun>
Factoring code is an important way to keep programs manageable. In particular, programs can be factored much more aggressively in the presence of higherorder functions — something which can be done in OCaml but not Java, C++ or Fortran. We shall discuss such factoring of OCaml programs as a means of code structuring in chapter . In the mean time, we shall examine functions which perform computations by applying themselves.
As we have already seen, variable names in variable and function definitions
refer to their previously defined values. This default behaviour can
be overridden using the
# let rec ipow n m = if m = 0 then 1 else n * ipow n (m  1);; val ipow : int > int > int = <fun>
For example, 2^{16}=65,536:
# ipow 2 16;;  : int = 65536
All of the programming constructs we have just introduced may be structured into modules.
Modules
In OCaml, modules are the most commonly used means to encapsulate
related definitions. For example, many function definitions pertaining
to lists are encapsulated in the
# List.length ["one"; "two"; "three"];;  : int = 3
The
The OCaml module system and program structuring in general are examined in chapter . We shall now examine some of the more advanced features of OCaml in more detail.
Pattern matching
As a program is executed, it is quite often necessary to choose the
future course of action based upon the value of a previously computed
result. As we have already seen, a twoway choice can be implemented
using the
Unlike conventional languages, OCaml allows the value of a previous result to be compared against various patterns  pattern matching. As we shall see, this approach is considerably more powerful than the conventional approaches.
The most common pattern matching construct in OCaml is in the
match expr with pattern1 > expr1  pattern2 > expr2  pattern3 > expr3  …
This evaluates expr and compares the resulting value firstly with pattern1 then with pattern2 and so on, until a pattern is found which matches the value of expr, in which case the corresponding expression (e.g. expr2) is evaluated and returned. A pattern is an expression composed of constants and variable names. When a pattern matches an argument, the variables are bound to values of the corresponding expressions.
Patterns may contain arbitrary data structures (tuples, records, variant
types, lists and arrays) and, in particular, the cons operator
For example, the following function compares its argument with several
possible patterns of type
# let f i = match i with 0 > "Zero"  3 > "Three"  _ > "Neither zero nor three";;
Applying this function to some expressions of type
# f 0;;  : string = "Zero" # f 1;;  : string = "Neither zero nor three" # f (1 + 2);;  : string = "Three"
As pattern matching is such a fundamental concept in OCaml programming, we shall provide several more examples using pattern matching in this section.
A function
# let is_empty_list l = l = [];; val is_empty_list : 'a list > bool = <fun>
Using pattern matching, this example may be written using the
# let is_empty_list l = match l with [] > true  _ > false;; val is_empty_list : 'a list > bool = <fun>
Note the use of the anonymous
The
# let is_empty_list = function [] > true  _ > false;; val is_empty_list : 'a list > bool = <fun>
In general, functions which pattern match over their last argument
may be rewritten more succinctly using
Guarded patterns
Patterns can also have arbitrary tests associated with them, written
using the
# let f = function [i; j; k] when i  j  k = 0 > true  _ > false;; val f : int list > bool = <fun> # f [2; 3];;  : bool = false # f [5; 2; 3];;  : bool = true # f [1; 2; 3];;  : bool = false
Subsequent patterns sharing the same variable bindings and corresponding expression may be written in the shorthand notation:
match … with pattern1  pattern2  … > …  …
For example, the following function returns
# let is_sign = function 1  0  1 > true  _ > false;; val is_sign : int > bool = <fun>
The sophistication provided by pattern matching may be misused. Fortunately, the OCaml compilers go to great lengths to enforce correct use, even brashly criticising the programmers style when appropriate.
Erroneous patterns
Sequences of patterns which match to the same corresponding expression
are required to share the same set of variable bindings. For example,
although the following function makes sense to a human, the OCaml
compilers object to the patterns
# let product a b = match (a, b) with (a, 0.)  (0., b) > 0.  (a, b) > a *. b;; Variable a must occur on both sides of this  pattern
In this case, this function could be corrected by using the anonymous
# let product a b = match (a, b) with (_, 0.)  (0., _) > 0.  (a, b) > a *. b;; val product : float > float > float = <fun>
This actually conveys useful information about the code. Specifically,
that the values matched by
OCaml uses type information to determine the possible values of expression being matched over. If the set of pattern matches fails to cover all of the possible values of the input then, at compiletime, the compiler emits:
Warning: this patternmatching is not exhaustive
followed by examples of values which could not be matched. If a program
containing such pattern matches is executed and no matching pattern
is found at runtime then the
For example, in the context of the variant type:
# type int_option = None  Some of int;; type int_option = None  Some of int
The OCaml compiler will warn of a function matching only
# let extract = function Some i > i;; Warning: this patternmatching is not exhaustive. Here is an example of a value that is not matched: None val extract : int_option > int = <fun>
This
# extract (Some 3);;  : int = 3
but causes a
# extract None;; Exception: Match_failure ("", 5, 40).
As some approaches to pattern matching lead to more robust programs, some notions of good and bad programming styles arise in the context of pattern matching.
Good styleThe compiler cannot prove^{} that any given pattern match covers all eventualities in the general case. Thus, some style guidelines may be productively adhered to when writing pattern matches, to aid the compiler in its proofs:
As proof generation cannot be automated in general, the OCaml compilers do not try to prove that a sequence of guarded patterns will match all possible inputs. Instead, the programmer is expected to adhere to a good programming style, making the breadth of the final match explicit by removing the guard. For example, the OCaml compilers do not prove that the following pattern match covers all possibilities:
# let sign = function i when i < 0. > 1  0. > 0  i when i > 0. > 1;; Warning: this patternmatching is not exhaustive. Here is an example of a value that is not matched: 1. (However, some guarded clause may match this value.) val sign : float > int = <fun>
In this case, the function should have been written without the guard on the last pattern:
# let sign = function i when i < 0. > 1  0. > 0  _ > 1;; val sign : float > int = <fun>
Also, the OCaml compilers will try to determine any patterns which can never be matched. If such a pattern is found, the compiler will emit a warning. For example, in this case the first match accounts for all possible input values and, therefore, the second match will never be used:
# let product a b = match (a, b) with (a, b) > a *. b  (_, 0.)  (0., _) > 0.;; Warning: this match case is unused. val product : float > float > float = <fun>
When matching over the constructors of a type, all eventualities should be caught explicitly, i.e. the final pattern should not be made completely general. For example, in the context of a type which can represent different number representations:
# type number = Integer of int  Real of float;; type number = Integer of int  Real of float
A function to test for equality with zero could be written in the following, poor style:
# let bad_is_zero = function Integer 0 > true  Real 0. > true  _ > false;; val bad_is_zero : number > bool = <fun>
When applied to various values of type
# bad_is_zero (Integer (1));;  : bool = false # bad_is_zero (Integer 0);;  : bool = true # bad_is_zero (Real 0.);;  : bool = true # bad_is_zero (Real 2.6);;  : bool = false
Although the
# let good_is_zero = function Integer i > i = 0  Real x > x = 0.;; val good_is_zero : number > bool = <fun>
Not only is the latter style more concise but, more importantly, this
style is more robust. For example, if whilst developing our program,
we were to supplement the definition of our
# type number = Integer of int  Real of float  Complex of float * float;; type number = Integer of int  Real of float  Complex of float * float
the
# let bad_is_zero = function Integer 0 > true  Real 0. > true  _ > false;; val bad_is_zero : number > bool = <fun>
Specifically, this function treats all values which are not zerointegers and zeroreals as being nonzero. Thus, zerocomplex z=0+0i is incorrectly deemed to be nonzero:
# bad_is_zero (Complex (0., 0.));;  : bool = false
In contrast, the
# let good_is_zero = function Integer i > i = 0  Real x > x = 0.;; Warning: this patternmatching is not exhaustive. Here is an example of a value that is not matched: Complex (_, _) val good_is_zero : number > bool = <fun>
The programmer could then supplement this function with a case for complex numbers:
# let good_is_zero = function Integer i > i = 0  Real x > x = 0.  Complex (x, y) > x = 0. && y = 0.;; val good_is_zero : number > bool = <fun>
The resulting function would then provide the correct functionality:
# good_is_zero (Complex (0., 0.));;  : bool = true
Clearly, the ability have such safety checks performed at compiletime can be very valuable. This is another important aspect of safety provided by the OCaml language, which results in considerably more robust programs.
Due to the ubiquity of pattern matching in OCaml programs, the number and structure of pattern matches can be nontrivial. In particular, patterns may be nested and may be performed in parallel.
Nested patterns
In some cases, nested pattern matches may be desirable. Inner pattern
matches may be bundled either into parentheses
# let number_equal a b = match a with Integer i > begin match b with Integer j when i = j > true  Complex (x, 0.)  Real x when x = float_of_int i > true  Integer _  Real _  Complex _ > false end  Real x  Complex (x, 0.) > begin match b with Integer i when x = float_of_int i > true  Complex (y, 0.)  Real y when x = y > true  Integer _  Real _  Complex _ > false end  Complex (x1, y1) > begin match b with Complex (x2, y2) when x1 = x2 && y1 = y2 > true  Integer _  Real _  Complex _ > false end;; val number_equal : number > number > bool = <fun>
In many cases, nested patterns may be written more succinctly and, in fact, more efficient, when presented as a single pattern match which matches different values simultaneously.
Parallel pattern matchingIn many cases, nested pattern matches may be combined into a single pattern match. This functionality is often obtained by combining variables into a tuple which is then matched over. This is known as parallel pattern matching. For example, the previous function could have been written:
# let number_equal a b = match (a, b) with (Integer i, Integer j) > i = j  (Integer i, (Real x  Complex (x, 0.)))  ((Real x  Complex (x, 0.)), Integer i) > x = float_of_int i  ((Real x1  Complex (x1, 0.)), (Complex (x2, 0.)  Real x2)) > x1 = x2  ((Integer _  Real _), Complex _)  (Complex _, (Integer _  Real _)) > false  (Complex (x1, y1), Complex (x2, y2)) > x1 = x2 && y1 = y2;; val number_equal : number > number > bool = <fun>
As a core feature of the OCaml language, pattern matching will be used extensively in the remainder of this book, particularly when dissecting data structures in chapter . One remaining form of pattern matching in OCaml programs appears in the handling of exceptions.
ExceptionsIn many programming languages, program execution can be interrupted by the raising^{} of an exception. This is a useful facility, typically used to handle problems such as failing to open a file or an unexpected flow of execution (e.g. due to a program being given invalid input) but exceptions are also useful as an efficient means to escape a computation, as we shall see in section .
Like a variant constructor in OCaml, the name of an exception must begin with a capital letter and an exception may or may not carry an associated value. Before an exception can be used, it must declared. An exception which does not carry associated data may be declared as:
exception Name
An exception which carries associated data of type type may be declared:
exception Name of type
Exceptions are raised using the
# raise (Failure "My problem");; Exception: Failure "My problem".
Exceptions may also be caught using the syntax:
try expr with pattern1 > expr1  pattern2 > expr2  pattern3 > expr3  …
where expr is evaluated and its result returned if no exception was raised. If an exception was raised then the exception is matched against the patterns and the value of the corresponding expression (if any) is returned instead.
Note that, unlike other pattern matching constructs, patterns matching over exceptions need not account for all eventualities — any uncaught exceptions simply continue to propagate.
For example, an exception called
# exception ZeroLength;; exception ZeroLength
A function to normalise a 2D vector r=(x,y) to create a unit 2D vector r=r/r, catching the erroneous case of a zerolength vector, may then be written:
# let norm (x, y) = let l = sqrt (x*.x +. y*.y) in if l = 0. then raise ZeroLength else let il = 1. /. l in (il*.x, il*.y);; val norm : float * float > float * float = <fun>
Applying the
# norm (3., 4.);;  : float * float = (0.600000000000000089, 0.8)
Applying the
# norm (0., 0.);; Exception: ZeroLength.
A “safe” version of the norm function might catch this exception and return some reasonable result in the case of a zerolength vector:
# let safe_norm r = try norm r with ZeroLength > (0., 0.);; val safe_norm : float * float > float * float = <fun>
Applying the
# safe_norm (3., 4.);;  : float * float = (0.600000000000000089, 0.8)
However, applying the
# safe_norm (0., 0.);;  : float * float = (0., 0.)
The use of exceptions to handle unusual occurrences, such as in the
Another important application is the use of exceptions to escape computations. The usefulness of this way of exploiting exceptions cannot be fully understood without first understanding data structures and algorithms and, therefore, this topic will be discussed in much more detail in chapter and again, in the context of performance, in chapter .
The
Support for exceptions is not uncommon in modern languages. However, the automatic generalisation of functions over all types of data for which they are valid is rather unusual and is discussed next.
Polymorphism
As we have seen, OCaml will infer types in a program. But what if
a specific type cannot be inferred? In this case, OCaml will create
a polymorphic function which can act on any suitable type. For example,
the following defines a higherorder function
# let f g x = g (g x);; val f : ('a > 'a) > 'a > 'a = <fun>
Note that OCaml uses the notation
# f (fun x > x * x) 2;;  : int = 16 # f (fun x > x *. x) 2.;;  : float = 16.
Types may be constrained by specifying types in a definition using
the syntax
# let f g (x : float) = g (g x);; val f : (float > float) > float > float = <fun>
Although omitting the brackets results in the same types being inferred in this case:
# let f g x : float = g (g x);; val f : (float > float) > float > float = <fun>
The syntax of this latter form actually constrains the return type
of the function
Variant types may contain polymorphic types, in which case the name of the variant type must be preceded by its polymorphic type arguments. For example, the polymorphic option:
# type 'a option = None  Some of 'a type 'a option = None  Some of 'a
can have values such as
# (Some 1, Some 2, None);;  : int option * int option * 'a option = (Some 1, Some 2, None)
In contrast, the elements of a list must all be of the same type and,
therefore, a
# [Some 1; Some 2; None];;  : int option list = [Some 1; Some 2; None]
The polymorphic option type
Many polymorphic functions are provided by the language. Most notably
the comparison operators # compare 1 2;;  : int = 1 # compare "slug" "plug";;  : int = 1
The
# max 1 2;;  : int = 2 # min "slug" "plug";;  : string = "plug"
Before completing this introduction to OCaml, we have one remaining exotic topic to cover.
CurryingA curried function is a function which returns a function as its result. Curried functions are best introduced as a more powerful alternative to the conventional (noncurried) functions provided by imperative programming languages.
Effectively, imperative languages only allow functions which accept a single value (often a tuple) as an argument. For example, a raisetothepower function for integers would have to accept a single tuple as an argument which contained the two values used by the function:
# let rec ipow (x, n) = if n = 0 then 1. else x *. ipow (x, n  1);; val ipow : float * int > float = <fun>
But, as we have seen, OCaml also allows:
# let rec ipow x n = if n = 0 then 1. else x *. ipow x (n  1);; val ipow : float > int > float = <fun>
This latter approach is actually a powerful generalization of the former, only available in functional programming languages.
The difference between these two styles is subtle but important. In the latter case, the type can be understood to be:
val ipow : float > (int > float)
i.e. this
Now that we have examined the syntax of the OCaml language, we shall explain why the exotic programming styles offered by OCaml are highly relevant in the context of scientific computing.
Functional vs Imperative programmingThe vast majority of current programmers write in imperative languages and use an imperative style. This refers to the use of statements or expressions which are designed to act by way of a sideeffect.
For example, the following declares a mutable variable called
# let x = ref 2;; val x : int ref = {contents = 2} # x := !x * !x;;  : unit = () # !x;;  : int = 4
The only action of the statement “
The functional equivalent to this imperative style is to define a new value (in this case, of the same name so that the old value is superseded):
# let x = 2;; val x : int = 2 # let x = x * x;; val x : int = 4 # x;;  : int = 4
Purely functional programming has several advantages over imperative programming:
OCaml supports both functional and imperative programming and, hence, is known as an impure functional programming language. In particular, the OCaml core library provides implementations of several imperative data structures (strings, arrays and hash tables) as well as functional data structures (lists, sets and maps). We shall examine these data structures in detail in chapter .
In addition to mutable data structures, the OCaml language provides
looping constructs for imperative programming. The
# let x = ref 5;; val x : int ref = {contents = 5} # while !x > 0 do decr x; done;;  unit = () # !x;;  : int = 0
The
# for i = 1 to 5 do incr x; done;;  unit = () # !x;;  : int = 5
Thus,
In practice, the ability to choose between imperative and functional
styles when programming in OCaml is very productive. Many programming
tasks are naturally suited to either an imperative or a functional
style. For example, portions of a program which deal with user input,
such as mouse movements and keypresses, are likely to benefit from
an imperative style where the program maintains a state and user input
may result in a change of state. In contrast, functions dealing with
the manipulation of complex data structures, such as trees and graphs,
are likely to benefit from being written in a functional style, using
recursive functions and immutable data, as this greatly simplifies
the task of writing such functions correctly. In both cases, functions
can refer to themselves — recursive functions. However, recursive
functions are pivotal in functional programming, where they are used
to implement functionality equivalent to the
RecursionWhen a programmer is introduced to the concept of functional programming for the first time, the way to implement simple programming constructs such as loops does not appear obvious. If the loop variable cannot be changed then how can the loop proceed?
In essence, the answer to this question lies in the ability to convert looping constructs, such as mathematical sums and products, into recursive constructs, such as recurrence relations. For example, the factorial function is typically considered to be a product with the special case 0!=1. However, this may also be expressed as a recurrence relation. Both the product and recurrencerelation forms of the factorial function may be expressed in OCaml. The product form is most obviously implemented in an imperative style, using mutable variables which are iteratively updated to accumulate the value of the product:
# let factorial n = let ans = ref 1 and n = ref n in while (!n > 1) do ( ans := !ans * !n; decr n ) done; !ans;; val factorial : int > int = <fun> # factorial 5;;  val int = 120
In contrast, the recurrence relation can be implemented more simply, as a recursive function:
# let rec factorial n = if n < 1 then 1 else n * factorial (n  1);; val factorial : int > int = <fun> # factorial 5;;  val int = 120
In the case of the factorial function, the functional style is considerably more concise and, more importantly, is much easier to reason over, i.e. the functional version is more easily seen to be correct. For sophisticated and intrinsically complicated computations, these advantages result in functional programs often being both simpler and more reliable than their imperative equivalents.
However, functional programming is not always preferable to imperative programming. Many problems naturally lend themselves to either imperative or functional styles. Clearly the factorial function is most easily implemented when considered as a recurrence relation. Other computations are most naturally represented as sums and products. For example, the dot product a.b of a pair of ddimensional vectors a and b is most naturally represented as a sum. This sum can be computed by a rather obfuscated recursive function:
# let dot a b = let len = Array.length a in if len <> Array.length b then invalid_arg “dot” else let rec aux i accu = if i < len then aux (i+1) (accu +. a.(i) *. b.(i)) else accu in aux 0 0.;; val dot : float array > float array > float = <fun>
or by a clearer iterative function:
# let dot a b = let len = Array.length a in if len <> Array.length b then invalid_arg “dot” else let r = ref 0. in for i = 0 to len  1 do r := !r +. a.(i) *. b.(i) done; !r;; val dot : float array > float array > float = <fun>
For example, (1,2,3).(2,3,4)=20:
# dot [1.; 2.; 3.] [2.; 3.; 4.];;  : float = 20.
In this case, the imperative form of the vector dot product is easier to understand than the recursive form. Regardless of the choice of functional or imperative style, structured design and implementation is an important way to manage complicated problems.
Finally, this introductory chapter would not be complete without providing a taste of the value of OCaml in the context of scientific computing.
ApplicabilityConventional languages vehemently separate functions from data. In contrast, OCaml allows the seamless treatment of functions as data. Specifically, OCaml allows functions to be stored as values in data structures, passed as arguments to other functions and returned as the results of expressions, including the returnvalues of functions. As we shall now demonstrate, this ability can be of direct relevance to scientific applications.
Many numerical algorithms are most obviously expressed as a function
which accepts and acts upon another function. For example, consider
a function called d which calculates a numerical approximation
to the derivative of a given, oneargument function. The function
d accepts a function f:R→R and a
value x and returns a function to compute an approximation to the
derivative df/dx given by d. This is easily written in OCaml as the curried function
# let d f x = let eps = sqrt epsilon_float in ((f (x +. eps)) . (f (x . eps))) /. (2. *. eps);; val d : (float > float) > float > float = <fun>
For example, consider the function f(x)=x^{3}−x−1:
# let f x = x *. x *. x . x . 1.;; val f : float > float = <fun>
The higherorder function
# d f 2.;;  : float = 10.9999999701976776
More importantly, as
# let f' = d f;; val f' : float > float = <fun>
The function f' can now be used to calculate a numerical approximation to the derivative of f for any x. For example, f'(2)=11:
# f' 2.;;  : float = 10.9999999701976776
As this demonstrates, functional programming languages such as OCaml offer many considerable improvements over conventional languages used for scientific computing. Before continuing, readers should be warned that, once learned, the techniques presented in this book soon become indispensable and, therefore, there is no going back after this chapter.

© Flying Frog Consultancy Ltd., 2007  Contact the webmaster 