![]() |
![]() |
| OCaml for Scientists |
Objective Caml for ScientistsThis is a web-friendly version of the first chapter of the book OCaml for Scientists. Chapter 1: IntroductionA Brief History of OCamlThe Meta-Language (ML) was originally developed at Edinburgh University in the 1970's as a language designed to efficiently represent other languages. The language was pioneered by Robin Milner for the Logic of Computable Functions (LCF) theorem prover. The original ML, and its derivatives, were designed to stretch theoretical computer science to the limit, yielding remarkably robust and concise programming languages which can also be very efficient.
The Categorical Abstract Machine Language (CAML) was the acronym originally used to describe what is now known as the Caml family of languages. Gérard Huet designed and implemented Caml at Institut National de Recherche en Informatique et en Automatique (INRIA) in France, until 1994. Since then, development has continued as part of projet Cristal, now led by Xavier Leroy.
Objective Caml (OCaml) is the current flagship language of projet Cristal. The Cristal group have produced freely available tools for this language. Most notably, an interpreter which runs OCaml code in a virtual machine (VM) and two compilers, one which compiles OCaml to a machine independent byte-code which can then be executed by a byte-code interpreter and another which compiles OCaml directly to native code. At the time of writing, the native-code compiler is capable of producing code for Alpha, Sparc, x86, MIPS, HPPA, PowerPC, ARM, ia64 and x86-64 CPUs and the associated run-time environment has been ported to the Linux, Windows, MacOS X, BSD, Solaris, HPUX, IRIX and Tru64 operating systems.
Benefits of OCamlBefore delving into the syntax of the language itself, we shall list the main, advantageous features offered by the OCaml language:
Running OCaml programsOCaml provides three different ways to execute code. We shall now examine each of these three approaches, explaining how code can be executed using them and noting their relative advantages and disadvantages.
Top-level
The OCaml top-level interactively interprets OCaml code and
is started by running the program
$ ocaml Objective Caml version 3.08.0 #
OCaml code may then be entered at this
# 1 + 3;; - : int = 4 #
The top-level will also print the type of the result as well as its
value (when the result has a value). For example, the following defines
a variable called
# let sqr x = x *. x;; val sqr : float -> float = <fun> #
This response indicates that a function called
- : type = value
or consisting of one or more descriptions of the form:
val name : type = value
where
Programs entered into the top-level execute almost as quickly as byte-code compiled programs (which is often quite a bit slower than native-code compiled programs). However, the interactivity of the top-level makes testing the validity of code segments much easier.
In the remainder of this book, we shall write numerous code snippets in this style, as if they had been entered into the top-level.
Byte-code compilation
When stored in a plain text file with the suffix “.ml”, an OCaml
program can be compiled to a machine independent byte-code using the
let _ = print_endline "Hello world!"
This file may be compiled at the Unix shell
$ ocamlc test.ml -o test
and then executed:
$ ./test Hello world!
In this case, the result was to print the string “Hello world!” onto the screen. Byte-code compilation is an adequate way to execute OCaml programs which do not perform intensive computations. If the time taken to execute a program needs to be reduced then native-code compilation can be used instead.
Native-code compilationThe “test.ml” program could equivalently have been compiled to native code, creating a stand-alone, native-code executable called “test”, using:
$ ocamlopt test.ml -o test
The resulting executable runs in exactly the same way:
$ ./test Hello world!
Programs compiled to native code, particularly in the context of numerically intensive programs, can be considerably faster to execute.
OCaml syntaxBefore we consider the features offered by OCaml, a brief overview of the syntax of the language is instructive, so that we can provide actual code examples later. Other books give more systematic, thorough and formal introductions to the whole of the OCaml language.
Language overviewIn this section we shall evolve the notions of values, types, variables, functions, simple containers (lists and arrays) and program flow control. These notions will then be used to introduce more advanced features in the later sections of this chapter.
When presented with a block of code, even the most seasoned and fluent
programmer will not be able to infer the purpose of the code. Consequently,
programs should contain additional descriptions written in plain English,
known as comments. In OCaml, comments are enclosed between
Just as numbers can be defined to be members of sets such as integer ( in Z), real ( in R), complex ( in C) and so on, so values in programs are also defined to be members of sets. These sets are known as types.
Types
Fundamentally, languages provide basic types and, often, allow more
sophisticated types to be defined in terms of the basic types. OCaml
provides a number of built-in types:
Only one value is of type
Integers are written
# 3;; - : int = 3 # 5.;; - : float = 5.
Arithmetic can be performed using the conventional
# 1 * 2 + 2 * 3;; - : int = 8
The floating-point infix functions have slightly different names,
suffixed by a full-stop:
# 1. *. 2. +. 2. *. 3.;; - : float = 8.
The distinct names of the operators for different types arises as
the most elegant solution to allowing the unambiguous inference of
types in the presence of different forms of arithmetic. The definition
of new operators is discussed later, in section .
In order to perform arithmetic using mixed types, functions such as
Unlike other languages, OCaml is phenomenally pedantic about types.
For example, the following fails because
# 2 * 2.;; This expression has type float but is here used with type int
Note that the OCaml top-level underlines the erroneous portion of the code.
Explicitly converting the value of type
# 2 * (int_of_float 2.);; - : int = 4
In general, arithmetic is typically performed using a single number
representation (e.g. either
Single characters (of type
Strings are written in double quotes, e.g.
# "Hello world!".[4];; - : char = 'o'
The character at index
A pair of strings may be concatenated using the
# "Hello " ^ "world!";; - : string = "Hello world!"
Booleans are either
# 1 < 3 && 2.5 < 2.7;; - : bool = true
Values may be assigned, or bound, to names. As OCaml is a functional language, these values may be expressions mapping values to values — functions. We shall now examine the binding of values and expressions to variable and function names.
Variables and functions
Variables and functions are both defined using the
# let a = 2;; val a : int = 2
Note that the language automatically infers types. In this case,
Definitions using
let name = expr1 in expr2
This evaluates expr1 and binds the result to the variable name before evaluating expr2. For example, the following evaluates a2 in the context a=3, giving 9:
# let a = 3 in a * a;; - : int = 9
Note that the value
# a;; - : int = 2
More recent definitions shadow previous definitions. For example, the following supersedes the definition a=2 with a=a× a in order to calculate 2×2×2×2=16:
# let a = 2 in let a = a * a in a * a;; - : int = 16
As OCaml is a functional language, values can be functions and variables
can be bound to them in exactly the same way as we have just seen.
Specifically, function definitions append a list of arguments between
the name of the function and the
# let sqr x = x * x;; val sqr : int -> int = <fun>
In this case, the use of the integer multiply
The function
# sqr 5;; - : int = 25
Typically, more sophisticated computations require the use of more complicated types. We shall now examine the three simplest ways by which more complicated types may be constructed.
Tuples, records and variants
Tuples are the simplest form of compound types, containing a fixed
number of values which may be of different types. The type of a tuple
is written analogously to conventional set-theoretic style, using
# (1, 2, 3);; - : int * int * int = (1, 2, 3)
A tuple containing n values is described as an n-tuple, e.g. the
tuple
Records are essentially tuples with named components, known as fields.
Records and, in particular, the names of their fields must be defined
using a
# type vec2 = { x:float; y:float };;
type vec2 = { x:float; y:float }
A value of this type representing the zero vector can then be defined using:
# let zero = { x=0.; y=0. };;
val zero : vec2 = {x = 0.; y = 0.}
Note that the use of a record with fields
Whereas the tuples are order-dependent, i.e. (1,2)≠(2,1), the
named fields of a record may appear in any order, i.e. { x=1,y=2}≡{ y=2,x=1}.
Thus, we could, equivalently, have provided the
# let zero = { y=0.; x=0. };;
val zero : vec2 = {x = 0.; y = 0.}
The fields in this record can be extracted individually using the
notation record
# zero.x;; - : float = 0.
Also, a shorthand
# let x_axis = { zero with x=1. };;
val x_axis : vec2 = {x = 1.; y = 0.}
Although OCaml is a functional language, OCaml does support imperative
programming. Fundamentally, record fields can be marked as mutable,
in which case their value may be changed. For example, the type of
a mutable, two-dimensional vector called
# type vec2 = { mutable x:float; mutable y:float };;
type vec2 = { mutable x : float; mutable y : float; }
A value
# let r = { x=1.; y=2. };;
val r : vec2 = {x = 1.; y = 2.}
The x-coordinate of the vector
# r.x <- 3.;; - : unit = ()
The side-effect of this expression has mutated the value of the variable
# r;;
- : vec = {x = 3.; y = 2.}
However, a record with a single, mutable field can often be useful.
This data structure, called a reference, is already provided
by the type
# let a = ref 2;;
val a : int ref = {contents = 2}
The type of
# !a;; - : int = 2
The value of
# a := 3;
- : unit = ()
# a;;
- : int ref = {contents = 3}
In the case of references to integers, two additional functions are
provided,
# incr a;;
- : unit = ()
# a;;
val a : int ref = {contents = 4}
The types of values stored in tuples and records are defined at compile-time. OCaml completely verifies the correct use of these types at compile-time. However, this is too restrictive in many circumstances. These requirements can be slightly relaxed by allowing a type to be defined which can acquire one of several possible types at run-time. These are known as variant types. OCaml still verifies the correct use of variant types as far as is theoretically possible.
Variant types are defined using the
# type button = On | Off;; type button = On | Off
The constructors
# On;; - : button = On # Off;; - : button = Off
In this case, the constructors
More usefully, constructors may take arguments, allowing them to convey
information by carrying data. The arguments are defined using
# type button = On of int * string | Off;; type button = On of int * string | Off
The
# On (1, "mine");; - : button = On (1, "mine") # On (2, "hers");; - : button = On (2, "hers") # Off;; - : button = Off
Types can also be defined recursively, which is very useful when defining more sophisticated data structures, such as trees. For example, a binary tree contains either zero or two binary trees and can be defined as:
# type binary_tree = Leaf | Node of binary_tree * binary_tree;; type binary_tree = Leaf | Node of binary_tree * binary_tree
A value of type
# Node (Node (Leaf, Leaf), Leaf);; - : binary_tree = Node (Node (Leaf, Leaf), Leaf)
Of course, we could also place data in the nodes to make a more useful data structure. This line of thinking will be pursued in chapter . In the mean time, let us consider two special data structures which have notations built into the language.
Lists and arrays
Lists are written
The types of lists and arrays of integers, for example, are written
# [1; 2; 3];; - : int list = [1; 2; 3] # [|1; 2; 3|];; - : int array = [|1; 2; 3|]
In the case of lists, the infix cons operator
# 1 :: [2; 3];; - : int list = [1; 2; 3]
In the case of arrays, the notation array
# [|1; 2; 3|].(1);; - : int = 2
Also, a short-hand notation can be used to represent lists or arrays
of tuples by omitting the parentheses. For example,
# [1, 2; 3, 4];; - : (int * int) list = [(1, 2); (3, 4)] # [|1, 2; 3, 4|];; - : (int * int) array = [|(1, 2); (3, 4)|]
The use and properties of lists, arrays and several other data structures will be discussed in chapter . In the mean time, we shall examine programming constructs which allow more interesting computations to be performed.
If
Like many other programming languages, OCaml provides an
if expr1 then expr2 if expr1 then expr2 else expr3
In both cases, expr1 must evaluate to a value of type
The former evaluates the boolean expression expr1 and, only
if the result is
if expr1 then expr2 else ()
The latter similarly evaluates expr1 but returning the result
of either expr2, if expr1 evaluated to
For example, the following function prints “Less than three” if the given argument is less than three:
# let f x = if x < 3 then print_endline "Less than three";; val f : int -> unit = <fun> # f 1;; Less than three - : unit = () # f 5;; - : unit = ()
The following function returns the string “Less” if the argument is less than 3 and “Greater” otherwise:
# let f x = if x < 3 then "Less" else "Greater";; val f : int -> string = <fun> # f 1;; - : string = "Less" # f 5;; - : string = "Greater"
The parts of the language we have covered can already be used to write some interesting programs. However, attention should be paid to the way in which programs are constructed from these parts.
Program composition
As we have seen, program segments may be written in the top-level
which replies by reciting the automatically inferred types and executing
expressions. However, the
# let f1 x = if x < 3 then print_endline "Less than three" let f2 x = if x < 3 then "Less" else "Greater";; val f1 : int -> unit = <fun> val f2 : int -> string = <fun>
Note that OCaml has determined that this input corresponds to two
separate function definitions. In fact, when written for the
let f1 x = if x < 3 then print_endline "Less than three" let f2 x = if x < 3 then "Less" else "Greater"
As we have seen, expressions which act by way of a side-effect (such
as printing) produce the value
# let f () = print_endline "A"; print_endline "B"; print_endline "C";; val f : unit -> unit = <fun>
Note that there is no final
More about functionsFunctions can also be defined anonymously, known as λ-abstraction in computer science. For example, the following defines a function f(x)=x× x which has a type representing f:Z→Z:
# fun x -> x * x;; - : int -> int = <fun>
This is an anonymous equivalent to the
# (fun x -> x * x) 2;; val : int = 4
Consequently, we could have defined
# let sqr = fun x -> x * x;; val sqr : int -> int = <fun>
Once defined, this version of the
The
# let ipow3 x = let sqr x = x * x in x * sqr x;; val ipow3 : int -> int = <fun>
Note that the function application
The
# let (a, b) = (3, 4);; val a : int = 3 val b : int = 4
This is particularly useful when factoring code. For example, the
following definition of the
# let ipow4 x = let sqr x = x * x in (sqr x) * (sqr x);; val ipow4 : int -> int = <fun>
Just as common subexpressions in a mathematical expression can be
factored, so the
# let (ipow3, ipow4) = let sqr x = x * x in ((fun x -> x * (sqr x)), fun x -> (sqr x) * (sqr x));; val ipow3 : int -> int = <fun> val ipow4 : int -> int = <fun>
Factoring code is an important way to keep programs manageable. In particular, programs can be factored much more aggressively in the presence of higher-order functions — something which can be done in OCaml but not Java, C++ or Fortran. We shall discuss such factoring of OCaml programs as a means of code structuring in chapter . In the mean time, we shall examine functions which perform computations by applying themselves.
As we have already seen, variable names in variable and function definitions
refer to their previously defined values. This default behaviour can
be overridden using the
# let rec ipow n m = if m = 0 then 1 else n * ipow n (m - 1);; val ipow : int -> int -> int = <fun>
For example, 216=65,536:
# ipow 2 16;; - : int = 65536
All of the programming constructs we have just introduced may be structured into modules.
Modules
In OCaml, modules are the most commonly used means to encapsulate
related definitions. For example, many function definitions pertaining
to lists are encapsulated in the
# List.length ["one"; "two"; "three"];; - : int = 3
The
The OCaml module system and program structuring in general are examined in chapter . We shall now examine some of the more advanced features of OCaml in more detail.
Pattern matching
As a program is executed, it is quite often necessary to choose the
future course of action based upon the value of a previously computed
result. As we have already seen, a two-way choice can be implemented
using the
Unlike conventional languages, OCaml allows the value of a previous result to be compared against various patterns - pattern matching. As we shall see, this approach is considerably more powerful than the conventional approaches.
The most common pattern matching construct in OCaml is in the
match expr with pattern1 -> expr1 | pattern2 -> expr2 | pattern3 -> expr3 | …
This evaluates expr and compares the resulting value firstly with pattern1 then with pattern2 and so on, until a pattern is found which matches the value of expr, in which case the corresponding expression (e.g. expr2) is evaluated and returned. A pattern is an expression composed of constants and variable names. When a pattern matches an argument, the variables are bound to values of the corresponding expressions.
Patterns may contain arbitrary data structures (tuples, records, variant
types, lists and arrays) and, in particular, the cons operator
For example, the following function compares its argument with several
possible patterns of type
# let f i = match i with 0 -> "Zero" | 3 -> "Three" | _ -> "Neither zero nor three";;
Applying this function to some expressions of type
# f 0;; - : string = "Zero" # f 1;; - : string = "Neither zero nor three" # f (1 + 2);; - : string = "Three"
As pattern matching is such a fundamental concept in OCaml programming, we shall provide several more examples using pattern matching in this section.
A function
# let is_empty_list l = l = [];; val is_empty_list : 'a list -> bool = <fun>
Using pattern matching, this example may be written using the
# let is_empty_list l = match l with [] -> true | _ -> false;; val is_empty_list : 'a list -> bool = <fun>
Note the use of the anonymous
The
# let is_empty_list = function [] -> true | _ -> false;; val is_empty_list : 'a list -> bool = <fun>
In general, functions which pattern match over their last argument
may be rewritten more succinctly using
Guarded patterns
Patterns can also have arbitrary tests associated with them, written
using the
# let f = function [i; j; k] when i - j - k = 0 -> true | _ -> false;; val f : int list -> bool = <fun> # f [2; 3];; - : bool = false # f [5; 2; 3];; - : bool = true # f [1; 2; 3];; - : bool = false
Subsequent patterns sharing the same variable bindings and corresponding expression may be written in the short-hand notation:
match … with pattern1 | pattern2 | … -> … | …
For example, the following function returns
# let is_sign = function -1 | 0 | 1 -> true | _ -> false;; val is_sign : int -> bool = <fun>
The sophistication provided by pattern matching may be misused. Fortunately, the OCaml compilers go to great lengths to enforce correct use, even brashly criticising the programmers style when appropriate.
Erroneous patterns
Sequences of patterns which match to the same corresponding expression
are required to share the same set of variable bindings. For example,
although the following function makes sense to a human, the OCaml
compilers object to the patterns
# let product a b = match (a, b) with (a, 0.) | (0., b) -> 0. | (a, b) -> a *. b;; Variable a must occur on both sides of this | pattern
In this case, this function could be corrected by using the anonymous
# let product a b = match (a, b) with (_, 0.) | (0., _) -> 0. | (a, b) -> a *. b;; val product : float -> float -> float = <fun>
This actually conveys useful information about the code. Specifically,
that the values matched by
OCaml uses type information to determine the possible values of expression being matched over. If the set of pattern matches fails to cover all of the possible values of the input then, at compile-time, the compiler emits:
Warning: this pattern-matching is not exhaustive
followed by examples of values which could not be matched. If a program
containing such pattern matches is executed and no matching pattern
is found at run-time then the
For example, in the context of the variant type:
# type int_option = None | Some of int;; type int_option = None | Some of int
The OCaml compiler will warn of a function matching only
# let extract = function Some i -> i;; Warning: this pattern-matching is not exhaustive. Here is an example of a value that is not matched: None val extract : int_option -> int = <fun>
This
# extract (Some 3);; - : int = 3
but causes a
# extract None;;
Exception: Match_failure ("", 5, -40).
As some approaches to pattern matching lead to more robust programs, some notions of good and bad programming styles arise in the context of pattern matching.
Good styleThe compiler cannot prove that any given pattern match covers all eventualities in the general case. Thus, some style guidelines may be productively adhered to when writing pattern matches, to aid the compiler in its proofs:
As proof generation cannot be automated in general, the OCaml compilers do not try to prove that a sequence of guarded patterns will match all possible inputs. Instead, the programmer is expected to adhere to a good programming style, making the breadth of the final match explicit by removing the guard. For example, the OCaml compilers do not prove that the following pattern match covers all possibilities:
# let sign = function i when i < 0. -> -1 | 0. -> 0 | i when i > 0. -> 1;; Warning: this pattern-matching is not exhaustive. Here is an example of a value that is not matched: 1. (However, some guarded clause may match this value.) val sign : float -> int = <fun>
In this case, the function should have been written without the guard on the last pattern:
# let sign = function i when i < 0. -> -1 | 0. -> 0 | _ -> 1;; val sign : float -> int = <fun>
Also, the OCaml compilers will try to determine any patterns which can never be matched. If such a pattern is found, the compiler will emit a warning. For example, in this case the first match accounts for all possible input values and, therefore, the second match will never be used:
# let product a b = match (a, b) with (a, b) -> a *. b | (_, 0.) | (0., _) -> 0.;; Warning: this match case is unused. val product : float -> float -> float = <fun>
When matching over the constructors of a type, all eventualities should be caught explicitly, i.e. the final pattern should not be made completely general. For example, in the context of a type which can represent different number representations:
# type number = Integer of int | Real of float;; type number = Integer of int | Real of float
A function to test for equality with zero could be written in the following, poor style:
# let bad_is_zero = function Integer 0 -> true | Real 0. -> true | _ -> false;; val bad_is_zero : number -> bool = <fun>
When applied to various values of type
# bad_is_zero (Integer (-1));; - : bool = false # bad_is_zero (Integer 0);; - : bool = true # bad_is_zero (Real 0.);; - : bool = true # bad_is_zero (Real 2.6);; - : bool = false
Although the
# let good_is_zero = function Integer i -> i = 0 | Real x -> x = 0.;; val good_is_zero : number -> bool = <fun>
Not only is the latter style more concise but, more importantly, this
style is more robust. For example, if whilst developing our program,
we were to supplement the definition of our
# type number = Integer of int | Real of float | Complex of float * float;; type number = Integer of int | Real of float | Complex of float * float
the
# let bad_is_zero = function Integer 0 -> true | Real 0. -> true | _ -> false;; val bad_is_zero : number -> bool = <fun>
Specifically, this function treats all values which are not zero-integers and zero-reals as being non-zero. Thus, zero-complex z=0+0i is incorrectly deemed to be non-zero:
# bad_is_zero (Complex (0., 0.));; - : bool = false
In contrast, the
# let good_is_zero = function Integer i -> i = 0 | Real x -> x = 0.;; Warning: this pattern-matching is not exhaustive. Here is an example of a value that is not matched: Complex (_, _) val good_is_zero : number -> bool = <fun>
The programmer could then supplement this function with a case for complex numbers:
# let good_is_zero = function Integer i -> i = 0 | Real x -> x = 0. | Complex (x, y) -> x = 0. && y = 0.;; val good_is_zero : number -> bool = <fun>
The resulting function would then provide the correct functionality:
# good_is_zero (Complex (0., 0.));; - : bool = true
Clearly, the ability have such safety checks performed at compile-time can be very valuable. This is another important aspect of safety provided by the OCaml language, which results in considerably more robust programs.
Due to the ubiquity of pattern matching in OCaml programs, the number and structure of pattern matches can be non-trivial. In particular, patterns may be nested and may be performed in parallel.
Nested patterns
In some cases, nested pattern matches may be desirable. Inner pattern
matches may be bundled either into parentheses
# let number_equal a b = match a with Integer i -> begin match b with Integer j when i = j -> true | Complex (x, 0.) | Real x when x = float_of_int i -> true | Integer _ | Real _ | Complex _ -> false end | Real x | Complex (x, 0.) -> begin match b with Integer i when x = float_of_int i -> true | Complex (y, 0.) | Real y when x = y -> true | Integer _ | Real _ | Complex _ -> false end | Complex (x1, y1) -> begin match b with Complex (x2, y2) when x1 = x2 && y1 = y2 -> true | Integer _ | Real _ | Complex _ -> false end;; val number_equal : number -> number -> bool = <fun>
In many cases, nested patterns may be written more succinctly and, in fact, more efficient, when presented as a single pattern match which matches different values simultaneously.
Parallel pattern matchingIn many cases, nested pattern matches may be combined into a single pattern match. This functionality is often obtained by combining variables into a tuple which is then matched over. This is known as parallel pattern matching. For example, the previous function could have been written:
# let number_equal a b = match (a, b) with (Integer i, Integer j) -> i = j | (Integer i, (Real x | Complex (x, 0.))) | ((Real x | Complex (x, 0.)), Integer i) -> x = float_of_int i | ((Real x1 | Complex (x1, 0.)), (Complex (x2, 0.) | Real x2)) -> x1 = x2 | ((Integer _ | Real _), Complex _) | (Complex _, (Integer _ | Real _)) -> false | (Complex (x1, y1), Complex (x2, y2)) -> x1 = x2 && y1 = y2;; val number_equal : number -> number -> bool = <fun>
As a core feature of the OCaml language, pattern matching will be used extensively in the remainder of this book, particularly when dissecting data structures in chapter . One remaining form of pattern matching in OCaml programs appears in the handling of exceptions.
ExceptionsIn many programming languages, program execution can be interrupted by the raising of an exception. This is a useful facility, typically used to handle problems such as failing to open a file or an unexpected flow of execution (e.g. due to a program being given invalid input) but exceptions are also useful as an efficient means to escape a computation, as we shall see in section .
Like a variant constructor in OCaml, the name of an exception must begin with a capital letter and an exception may or may not carry an associated value. Before an exception can be used, it must declared. An exception which does not carry associated data may be declared as:
exception Name
An exception which carries associated data of type type may be declared:
exception Name of type
Exceptions are raised using the
# raise (Failure "My problem");; Exception: Failure "My problem".
Exceptions may also be caught using the syntax:
try expr with pattern1 < |