![]() |
![]() |
| Benefits of OCaml |
Benefits of OCamlParsingMany applications, most notably compiler writing, require streams of characters to be converted into data structures such as trees. This conversion is known as parsing. As a language designed for compiler writing, OCaml provides tools to help with the parsing of character data. OCaml provides two different forms of parsing:
Recursive descent parsers are easily written using OCaml's stream parsing syntax. In this case, a parser typically consists of either a simple lexer generated by the Grammar-based parsing requires a grammar written in Backus-Naur Form (BNF) with associated actions written in OCaml. The ocamlyacc compiler can then be used to compile this grammar into OCaml source code implementing a parser for the grammar. Recursive descent parsers tend to be faster but can be more complicated. Consequently, recursive descent parsers are generally only used for simple tasks and grammar based parsers are used to parse more sophisticated languages. These techniques are best introduced by way of a simple calculator example. Our examples section contains a walkthrough of a complete interpreter for a purely functional programming language. Recursive descent parsingA calculator can be implemented in only a page of OCaml code using stream-based recursive descent parsing. The OCaml syntax extension for stream parsing is provided by camlp4, which must be loaded before the rest of the program:
# #load "camlp4o.cma";;
Camlp4 Parsing version 3.09.2
#
Now we're ready to start writing the calculator. We begin by defining a lexer:
# open Genlex;;
# let keywords = ["("; ")"; "+"; "-"; "*"];;
val keywords : string list = ["("; ")"; "+"; "-"; "*"]
# let lex stream =
let rec aux = parser
| [< 'Int n when n<0; t=aux >] -> [< 'Kwd "-"; 'Int (-n); t >]
| [< 'h; t=aux >] -> [< 'h; t >]
| [< >] -> [< >] in
aux(make_lexer keywords stream);;
val lex : char Stream.t -> Genlex.token Stream.t = <fun>
Note that we are careful to replace negative integers in the input stream with a minus sign followed by a positive integer. Next, we define a type to represent abstract syntax trees: # type expr = Num of int | Add of expr * expr | Mul of expr * expr;; type expr = Num of int | Add of expr * expr | Mul of expr * expr The recursive descent parser consists of three mutually-recursive functions:
# let rec parse_atom = parser
| [< 'Int n >] -> Num n
| [< 'Kwd "("; e=parse_expr; 'Kwd ")" >] -> e
and parse_factor = parser
| [< e1=parse_atom; stream >] ->
(parser
| [< 'Kwd "*"; e2=parse_factor >] -> Mul(e1, e2)
| [< >] -> e1) stream
and parse_expr = parser
| [< e1=parse_factor; stream >] ->
(parser
| [< 'Kwd "+"; e2=parse_expr >] -> Add(e1, e2)
| [< 'Kwd "-"; e2=parse_expr >] ->
Add(e1, Mul(Num(-1), e2))
| [< >] -> e1) stream;;
val parse_atom : Genlex.token Stream.t -> expr = <fun>
val parse_factor : Genlex.token Stream.t -> expr = <fun>
val parse_expr : Genlex.token Stream.t -> expr = <fun>
That is all that is required to parse simple arithmetic expressions. We can test it by lexing and parsing a string to get the abstract syntax tree representing the expression: # parse_expr(lex(Stream.of_string "1+2*(3+4)"));; - : expr = Add (Num 1, Mul (Num 2, Add (Num 3, Num 4))) The same program can be written slightly more verbosely but more extensibly using ocamlyacc. Grammar-based parsingA grammar-based parser is split into an ocamllex lexer, LexingWe begin by describing a lexer in ocamllex syntax in the file "calcLexer.mll":
rule lex = parse
| [' ' '\n' '\t'] { lex lexbuf }
| ['0'-'9']+ as s { INT(int_of_string s) }
| '+' { PLUS }
| '-' { MINUS }
| '*' { TIMES }
| '(' { OPEN }
| ')' { CLOSE }
| eof { EOF }
This lexer ignores whitespace, converts sequences of one or more digits into The tokens are actually defined in the parser (i.e. the lexer depends upon the parser). ParserYacc grammars are defined in Backus Naur form (BNF) and work by spotting patterns in an incoming stream of tokens. A simple ocamlyacc grammar for a calculator might be:
%{ type expr = Int of int | Add of expr * expr | Mul of expr * expr %}
%token <int> INT
%token PLUS MINUS TIMES OPEN CLOSE EOF
%start expr1
%type <expr> expr1
%left PLUS MINUS
%left TIMES DIVIDE
%%
expr:
| INT { Int $1 }
| OPEN expr CLOSE { $2 }
| expr PLUS expr { Add($1, $3) }
| expr MINUS expr { Add($1, Mul(Int (-1), $3)) }
| expr TIMES expr { Mul($1, $3) };
expr1:
| expr EOF { $1 };
The parser contains a preamble that defines the We can test this lexer and parser from the top-level by compiling them to OCaml source code: $ ocamllex calcLexer.mll 9 states, 267 transitions, table size 1122 bytes $ ocamlyacc calcParser.mly and then loading them into a running top-level: # #use "calcParser.ml";; ... # #use "calcLexer.ml";; ... Lexing is performed by a function called # expr1 lex (Lexing.from_string "1+2*(3+4)");; - : expr = Add (Int 1, Mul (Int 2, Add (Int 3, Int 4))) The ability to write efficient lexers and parsers succinctly in OCaml makes it ideally suited to interpreter and compiler writing.
|
| © Flying Frog Consultancy Ltd., 2007 | Contact the webmaster |