Batch compilation (camlc)
This chapter describes how Caml Light programs can be compiled
non-interactively, and turned into standalone executable files. This
is achieved by the command camlc, which compiles and links Caml
Light source files.
- Mac:
- This command is not a standalone Macintosh application.
To run camlc, you need the Macintosh Programmer's Workshop (MPW)
programming environment. The programs generated by camlc are also
MPW tools, not standalone Macintosh applications.
Overview of the compiler
The camlc command has a command-line interface similar to that of
most C compilers. It accepts several types of arguments: source files
for module implementations; source files for module interfaces; and
compiled module implementations.
-
Arguments ending in .mli are taken to be source files for module
interfaces. Module interfaces declare exported global identifiers,
define public data types, and so on. From the file x.mli, the camlc
compiler produces a compiled interface in the file x.zi.
-
Arguments ending in .ml are taken to be source files for module
implementations. Module implementations bind global identifiers to
values, define private data types, and contain expressions to be
evaluated for their side-effects. From the file x.ml, the camlc
compiler produces compiled object code in the file x.zo. If
the interface file x.mli exists, the module implementation x.ml
is checked against the corresponding compiled interface x.zi,
which is assumed to exist. If no interface x.mli is provided, the
compilation of x.ml produces a compiled interface file x.zi in
addition to the compiled object code file x.zo. The
file x.zi produced corresponds to an interface that exports
everything that is defined in the implementation x.ml.
-
Arguments ending in .zo are taken to be compiled object code. These
files are linked together, along with the object code files obtained
by compiling .ml arguments (if any), and the Caml Light standard
library, to produce a standalone executable program. The order in
which .zo and .ml arguments are presented on the command line is
relevant: global identifiers are initialized in that order at
run-time, and it is a link-time error to use a global identifier
before having initialized it. Hence, a given x.zo file must come
before all .zo files that refer to identifiers defined in the file
x.zo.
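The ordering rule above can be modelled in a few lines. The following is only a sketch, written in OCaml (the successor of Caml Light) so that it runs with current tools; it is not how the camlc linker is implemented. The file names and the dependency table deps are hypothetical.

```ocaml
(* Toy model of the link-order rule: an object file must appear on the
   command line after every object file whose globals it refers to.
   The file names and the dependency table are hypothetical. *)

(* [deps file] lists the object files whose globals [file] refers to. *)
let deps = function
  | "main.zo" -> ["parser.zo"; "util.zo"]
  | "parser.zo" -> ["util.zo"]
  | _ -> []

(* Scan the command line from left to right. Return the first
   (file, missing dependency) pair that breaks the rule, or None if
   the order is acceptable. *)
let first_violation (cmdline : string list) : (string * string) option =
  let rec scan initialized = function
    | [] -> None
    | file :: rest ->
      let missing d = not (List.mem d initialized) in
      (match List.find_opt missing (deps file) with
       | Some d -> Some (file, d)
       | None -> scan (file :: initialized) rest)
  in
  scan [] cmdline

let good = first_violation ["util.zo"; "parser.zo"; "main.zo"]
let bad = first_violation ["parser.zo"; "util.zo"; "main.zo"]
```

In this model, good is None, while bad reports that parser.zo comes before util.zo, which it depends on.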
The output of the linking phase is a file containing compiled code
that can be executed by the Caml Light runtime system: the
command named camlrun. If caml.out is the name of the file
produced by the linking phase, the command
camlrun caml.out arg1 arg2 ... argn
executes the compiled code contained in caml.out, passing it as
arguments the character strings arg1 to argn.
(See chapter 7 for more details.)
- Unix:
- On most Unix systems, the file produced by the linking
phase can be run directly, as in:
./caml.out arg1 arg2 ... argn
The produced file has the executable bit set, and it manages to launch
the bytecode interpreter by itself.
- PC:
- The output file produced by the linking phase is directly
executable, provided it is given extension .EXE. Hence, if the
output file is named caml_out.exe, you can execute it with the
command
caml_out arg1 arg2 ... argn
Actually, the produced file caml_out.exe is a tiny executable
file prepended to the bytecode file. The executable simply runs the
camlrun runtime system on the remainder of the file. (As a
consequence, this is not a standalone executable: it still requires
camlrun.exe to reside in one of the directories in the path.)
Options
The following command-line options are recognized by camlc.
- -c
-
Compile only. Suppress the linking phase of the
compilation. Source code files are turned into compiled files, but no
executable file is produced. This option is useful to
compile modules separately.
- -ccopt option
-
Pass the given option to the C compiler and linker, when linking in
``custom runtime'' mode (see the -custom option). For instance,
-ccopt -Ldir causes the C linker to search for C libraries in
directory dir.
- -custom
-
Link in ``custom runtime'' mode. In the default linking mode, the
linker produces bytecode that is intended to be executed with the
shared runtime system, camlrun. In the custom runtime mode, the
linker produces an output file that contains both the runtime system
and the bytecode for the program. The resulting file is considerably
larger, but it can be executed directly, even if the camlrun command
is not installed. Moreover, the ``custom runtime'' mode enables
linking Caml Light code with user-defined C functions, as described in
chapter 13.
- Unix:
- Never strip an executable produced with the -custom
option.
- PC:
- This option requires the DJGPP port of the GNU C compiler
to be installed.
- -g
-
Cause the compiler to produce additional debugging information.
During the linking phase, this option adds information at the end of
the executable bytecode file produced. This information is required
by the debugger camldebug and also by the catch-all exception
handler from the standard library module printexc.
During the compilation of an implementation file (.ml file), when
the -g option is set, the compiler adds debugging information to the
.zo file. It also writes a .zix file that describes the full
interface of the .ml file, that is, all types and values defined in
the .ml file, including those that are local to the .ml file
(i.e. not declared in the .mli interface file). Used in conjunction
with the -g option to the toplevel system
(chapter 6), the .zix file gives access to the local
values of the module, making it possible to print or ``trace'' them.
The .zix file is not produced if the implementation file has no
explicit interface, since, in this case, the module has no local
values.
- -i
-
Cause the compiler to print the declared types, exceptions, and global
variables (with their inferred types) when compiling an implementation
(.ml file). This can be useful to check the types inferred by the
compiler. Also, since the output follows the syntax of module
interfaces, it can help in writing an explicit interface (.mli file)
for a file: just redirect the standard output of the compiler to a
.mli file, and edit that file to remove all declarations of
unexported globals.
- -I directory
-
Add the given directory to the list of directories searched for
compiled interface files (.zi) and compiled object code files
(.zo). By default, the current directory is searched first, then the
standard library directory. Directories added with -I are searched
after the current directory, but before the standard library
directory. When several directories are added with several -I
options on the command line, these directories are searched from right
to left (the rightmost directory is searched first, the leftmost is
searched last). (Directories can also be added to the search path from
inside the programs with the #directory directive; see
chapter 4.)
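The search order described for -I can be sketched as follows. This is an illustrative model in OCaml, not the compiler's code; the directory names, the standard library path and the exists predicate (which stands in for a real file system test) are assumptions.

```ocaml
(* Directories searched for .zi and .zo files, in search order.
   [i_dirs] lists the -I directories as written on the command line,
   from left to right; [stdlib] is the standard library directory. *)
let search_order ~(i_dirs : string list) ~(stdlib : string) : string list =
  ("." :: List.rev i_dirs) @ [stdlib]

(* Return the first directory of [dirs] in which [name] is found.
   [exists dir name] is a stand-in for a file system lookup. *)
let find_dir ~(exists : string -> string -> bool) (dirs : string list)
    (name : string) : string option =
  List.find_opt (fun dir -> exists dir name) dirs

(* Model of: camlc -I lib -I vendor ...
   The standard library path is hypothetical. *)
let order =
  search_order ~i_dirs:["lib"; "vendor"] ~stdlib:"/usr/local/lib/caml-light"

(* Pretend that list2.zi exists in both lib and vendor. *)
let exists dir name = name = "list2.zi" && (dir = "lib" || dir = "vendor")
let found = find_dir ~exists order "list2.zi"
```

Because the rightmost -I directory is searched first, the copy in vendor is the one selected in this model.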
- -lang language-code
-
Translate the compiler messages to the specified language.
The language-code is fr for French, es for Spanish, de for
German, ... (See the file camlmsgs.txt in the Caml Light
standard library directory for a list of available languages.)
When an unknown language is specified, or no translation is available
for a message, American English is used by default.
- -o exec-file
-
Specify the name of the output file produced by the linker.
- Unix:
- The default output name is a.out, in keeping with the
tradition.
- PC:
- The default output name is caml_out.exe.
- Mac:
- The default output name is Caml.Out.
- -O module-set
-
Specify which set of standard modules is to be implicitly ``opened''
at the beginning of a compilation. There are three module sets
currently available:
- cautious
- provides the standard operations on integers,
floating-point numbers, characters, strings, arrays, ..., as well
as exception handling, basic input/output, etc. Operations from the
cautious set perform range and bound checking on string and array
operations, as well as various sanity checks on their arguments.
- fast
- provides the same operations as the cautious set, but
without sanity checks on their arguments. Programs compiled with
-O fast are therefore slightly faster, but unsafe.
- none
- suppresses all automatic opening of modules. Compilation
starts in an almost empty environment. This option is not of general
use, except to compile the standard library itself.
The default compilation mode is -O cautious. See
chapter 14 for a complete listing of the modules in the
cautious and fast sets.
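The difference between the cautious and fast sets can be pictured with a pair of array accessors. This is a sketch in OCaml, assuming that the difference amounts to a bounds test before the access; the function names and the error message are made up for the illustration and are not those of the Caml Light library.

```ocaml
(* "Cautious" access: check the index before touching the array. *)
let cautious_get (a : 'a array) (i : int) : 'a =
  if i < 0 || i >= Array.length a then invalid_arg "index out of bounds"
  else Array.unsafe_get a i

(* "Fast" access: no check. Calling it with an index outside the array
   has undefined behaviour, which is why such code is unsafe. *)
let fast_get (a : 'a array) (i : int) : 'a = Array.unsafe_get a i

let v = [| 10; 20; 30 |]
let second = cautious_get v 1

(* An out-of-range index is caught by the cautious version. *)
let caught =
  try ignore (cautious_get v 5); false
  with Invalid_argument _ -> true
```

Both accessors return the same result for valid indices; only the cautious one turns an invalid index into an exception.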
- -p
-
Compile and link in profiling mode. See the description of the
profiler camlpro in chapter 11.
- -v
-
Print the version number of the compiler.
- -W
-
Print extra warning messages for the following events:
- A #open directive is useless (no identifier in the opened
module is ever referenced).
- A variable name in a pattern matching is capitalized (often
corresponds to a misspelled constant constructor).
- Unix:
- The following environment variable is also consulted:
- LANG
- When set, controls which language is used to print the
compiler messages (see the -lang command-line option).
- PC:
- The following option is also supported:
- @response-file
-
Process the files whose names are listed in file
response-file, just as if these names appeared on the command line.
File names in response-file are separated by blanks (spaces,
tabs, newlines). This option makes it possible to work around limits on
the length of the command line.
The following environment variables are also consulted:
- CAMLLIB
- Contains the path to the standard library directory.
- LANG
- When set, controls which language is used to print the
compiler messages (see the -lang command-line option).
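The handling of the @response-file option described above (PC only) can be sketched as follows. This is a minimal model in OCaml, not the actual camlc code: the file name files.rsp is hypothetical, and read_file is a stand-in that returns the contents of a file as a string.

```ocaml
(* Blanks are spaces, tabs and newlines. *)
let is_blank c = c = ' ' || c = '\t' || c = '\n'

(* Split a string into the words separated by runs of blanks. *)
let split_on_blanks (s : string) : string list =
  let words = ref [] in
  let buf = Buffer.create 16 in
  let emit () =
    if Buffer.length buf > 0 then begin
      words := Buffer.contents buf :: !words;
      Buffer.clear buf
    end
  in
  String.iter (fun c -> if is_blank c then emit () else Buffer.add_char buf c) s;
  emit ();
  List.rev !words

(* Replace an argument of the form @file by the names listed in file;
   leave any other argument unchanged. *)
let expand_arg ~(read_file : string -> string) (arg : string) : string list =
  if String.length arg > 1 && arg.[0] = '@' then
    split_on_blanks (read_file (String.sub arg 1 (String.length arg - 1)))
  else [arg]

let expand_args ~read_file (args : string list) : string list =
  List.concat (List.map (expand_arg ~read_file) args)

(* Hypothetical response file, mixing tabs, spaces and newlines. *)
let fake_read = function
  | "files.rsp" -> "util.ml\tparser.ml\n  main.ml\n"
  | _ -> ""

let expanded = expand_args ~read_file:fake_read ["-c"; "@files.rsp"]
```

In this model, the command line -c @files.rsp is treated as if the three file names had been typed after -c.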
Modules and the file system
This short section is intended to clarify the relationship between the
names of the modules and the names of the files that contain their
compiled interface and compiled implementation.
The compiler always derives the name of the compiled module by taking
the base name of the source file (.ml or .mli file). That is, it
strips the leading directory name, if any, as well as the .ml or
.mli suffix. The produced .zi and .zo files have the same base
name as the source file; hence, the compiled files produced by the
compiler always have their base name equal to the name of the module
they describe (for .zi files) or implement (for .zo files).
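The naming rule can be expressed as a small function. The sketch below is in OCaml and uses its standard Filename module; the helper names module_name, compiled_interface and compiled_object, as well as the example paths, are introduced only for this illustration.

```ocaml
(* Derive the module name from a source file name: drop the leading
   directory, if any, then the .ml or .mli suffix. *)
let module_name (path : string) : string =
  let base = Filename.basename path in
  if Filename.check_suffix base ".mli" then Filename.chop_suffix base ".mli"
  else if Filename.check_suffix base ".ml" then Filename.chop_suffix base ".ml"
  else base

(* Base names of the compiled files for a given source file. *)
let compiled_interface (path : string) : string = module_name path ^ ".zi"
let compiled_object (path : string) : string = module_name path ^ ".zo"

let m = module_name "src/lexer.ml"
let zi = compiled_interface "src/lexer.mli"
let zo = compiled_object "src/lexer.ml"
```

Here the module is named lexer whether it is reached through lexer.ml or lexer.mli, and the compiled files are lexer.zi and lexer.zo.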
For compiled interface files (.zi files), this invariant must be
preserved at all times, since the compiler relies on it to load the
compiled interface file for the modules that are used from the module
being compiled. Hence, it is risky and generally incorrect to rename
.zi files. It is admissible to move them to another directory, if
their base name is preserved, and the correct -I options are given to
the compiler.
Compiled bytecode files (.zo files), on the other hand, can be
freely renamed once created, for two reasons: (1) .zo files contain the
true name of the module they define, so there is no need to derive
that name from the file name; (2) the linker never attempts to find by
itself the .zo file that implements a module of a given name: it
relies on the user providing the list of .zo files by hand.
Common errors
This section describes and explains the most frequently encountered
error messages.
- Cannot find file filename
-
The named file could not be found in the current directory, nor in the
directories of the search path. The filename is either a
compiled interface file (.zi file), or a compiled bytecode file
(.zo file). If filename has the format mod.zi, this
means you are trying to compile a file that references identifiers
from module mod, but you have not yet compiled an interface for
module mod. Fix: compile mod.mli or mod.ml
first, to create the compiled interface mod.zi.
If filename has the format mod.zo, this
means you are trying to link a bytecode object file that does not
exist yet. Fix: compile mod.ml first.
If your program spans several directories, this error can also appear
because you haven't specified the directories to look into. Fix: add
the correct -I options to the command line.
- Corrupted compiled interface file filename
-
The compiler produces this error when it tries to read a compiled
interface file (.zi file) that has the wrong structure. This means
something went wrong when this .zi file was written: the disk was
full, the compiler was interrupted in the middle of the file creation,
and so on. This error can also appear if a .zi file is modified after
its creation by the compiler. Fix: remove the corrupted .zi file,
and rebuild it.
- This expression has type t1, but is used with type t2
-
This is by far the most common type error in programs. Type t1 is
the type inferred for the expression (the part of the program that is
displayed in the error message), by looking at the expression itself.
Type t2 is the type expected by the context of the expression; it
is deduced by looking at how the value of this expression is used in
the rest of the program. If the two types t1 and t2 are not
compatible, then the error above is produced.
In some cases, it is hard to understand why the two types t1 and
t2 are incompatible. For instance, the compiler can report that
``expression of type foo cannot be used with type foo'', and it
really seems that the two types foo are compatible. This is not
always true. Two type constructors can have the same name, but
actually represent different types. This can happen if a type
constructor is redefined. Example:
type foo = A | B;;
let f = function A -> 0 | B -> 1;;
type foo = C | D;;
f C;;
This results in the error message ``expression C of type foo cannot
be used with type foo''.
Incompatible types with the same names can also appear when a module
is changed and recompiled, but some of its clients are not recompiled.
That's because type constructors in .zi files are not represented by
their name (that would not suffice to identify them, because of type
redefinitions), but by unique stamps that are assigned when the type
declaration is compiled. Consider the three modules:
mod1.ml: type t = A | B;;
         let f = function A -> 0 | B -> 1;;
mod2.ml: let g x = 1 + mod1__f(x);;
mod3.ml: mod2__g mod1__A;;
Now, assume mod1.ml is changed and recompiled, but mod2.ml is not
recompiled. The recompilation of mod1.ml can change the stamp
assigned to type t. But the interface mod2.zi will still use the
old stamp for mod1__t in the type of mod2__g. Hence, when
compiling mod3.ml, the system complains that the argument type of
mod2__g (that is, mod1__t with the old stamp) is not compatible
with the type of mod1__A (that is, mod1__t with the new stamp).
Fix: use make or a similar tool to ensure that all clients of a
module mod are recompiled when the interface mod.zi
changes. To check that the Makefile contains the right dependencies,
remove all .zi files and rebuild the whole program; if no ``Cannot
find file'' error appears, you're all set.
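The role of stamps in the mod1, mod2, mod3 scenario can be illustrated with a toy model. The OCaml sketch below assumes that a type constructor is identified by its name together with a stamp drawn from a counter; the record type tycon and the functions declare and same_type are inventions of this example, not the representation actually used in .zi files.

```ocaml
(* A type constructor, identified by its name and a unique stamp. *)
type tycon = { name : string; stamp : int }

let next_stamp = ref 0

(* Compiling a type declaration assigns it a fresh stamp. *)
let declare (name : string) : tycon =
  incr next_stamp;
  { name; stamp = !next_stamp }

(* Two type constructors are the same type only if both the name and
   the stamp agree. *)
let same_type (a : tycon) (b : tycon) : bool =
  a.name = b.name && a.stamp = b.stamp

(* First compilation of mod1.ml. *)
let t_old = declare "mod1__t"

(* mod2.zi records the argument type of mod2__g with that stamp. *)
let g_argument = t_old

(* mod1.ml is recompiled: same name, new stamp. *)
let t_new = declare "mod1__t"
```

In this model, g_argument and t_new print identically as mod1__t, yet same_type rejects them, which mirrors the confusing error message discussed above.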
- The type inferred for name,
that is, t,
contains non-generalizable type variables
-
Type variables ('a, 'b, ...) in a type t can be in either
of two states: generalized (which means that the type t is valid
for all possible instantiations of the variables) and not generalized
(which means that the type t is valid only for one instantiation
of the variables). In a let binding let name = expr,
the type-checker normally generalizes as many type variables as
possible in the type of expr. However, this leads to unsoundness
(a well-typed program can crash) in conjunction with polymorphic
mutable data structures. To avoid this, generalization is performed at
let bindings only if the bound expression expr belongs to the
class of ``syntactic values'', which includes constants, identifiers,
functions, tuples of syntactic values, etc. In all other cases (for
instance, when expr is a function application), a polymorphic mutable
data structure could have been created, and generalization is therefore turned off.
Non-generalized type variables in a type cause no difficulties inside
a given compilation unit (the contents of a .ml file,
or an interactive session), but they cannot be allowed in types
written in a .zi compiled interface file, because they could be used
inconsistently in other compilation units. Therefore, the compiler
flags an error when a .ml implementation without a .mli interface
defines a global variable name whose type contains non-generalized type
variables. There are two solutions to this problem:
- Add a type constraint or a .mli interface to give a
monomorphic type (without type variables) to name. For instance,
instead of writing
let sort_int_list = sort (prefix <);;
(* inferred type 'a list -> 'a list, with 'a not generalized *)
write
let sort_int_list = (sort (prefix <) : int list -> int list);;
- If you really need name to have a polymorphic type, turn
its defining expression into a function by adding an extra parameter.
For instance, instead of writing
let map_length = map vect_length;;
(* inferred type 'a vect list -> int list, with 'a not generalized *)
write
let map_length lv = map vect_length lv;;
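The second solution can be tried out with current tools. The sketch below transposes the map_length example to OCaml, which applies a similar restriction on generalization; List.map and Array.length are the OCaml counterparts assumed here for map and vect_length.

```ocaml
(* Partial application: the right-hand side is a function application,
   not a syntactic value, so the type variable is not generalized:

     let map_length = List.map Array.length

   Eta-expanded version: the right-hand side is a function, hence a
   syntactic value, and 'a is generalized. *)
let map_length lv = List.map Array.length lv

(* The same definition is now usable at two different element types. *)
let int_lengths = map_length [ [| 1; 2; 3 |]; [||] ]
let string_lengths = map_length [ [| "a" |]; [| "b"; "c" |] ]
```

With the partial application form, the second use would be rejected, because the first use fixes the element type once and for all.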
- mod__name is referenced before being defined
-
This error appears when trying to link an incomplete or incorrectly
ordered set of files. There are two possible causes. Either you have
forgotten to provide an implementation for the module named mod on the
command line (typically, the file named mod.zo, or a library containing
that file). Fix: add the missing .ml or .zo file to the command
line. Or you have provided an implementation for the module named
mod, but it comes too late on the command line: the
implementation of mod must come before all bytecode object files
that reference one of the global variables defined in module
mod. Fix: change the order of .ml and .zo files on the command
line.
Of course, you will always encounter this error if you have mutually
recursive functions across modules. That is, function mod1__f calls
function mod2__g, and function mod2__g calls function mod1__f.
In this case, no matter what permutations you perform on the command
line, the program will be rejected at link-time. Fixes:
- Put f and g in the same module.
- Parameterize one function by the other.
That is, instead of having
mod1.ml: let f x = ... mod2__g ... ;;
mod2.ml: let g y = ... mod1__f ... ;;
define
mod1.ml: let f g x = ... g ... ;;
mod2.ml: let rec g y = ... mod1__f g ... ;;
and link mod1 before mod2.
- Use a reference to hold one of the two functions, as in:
mod1.ml: let forward_g =
           ref((fun x -> failwith "forward_g") : <type>);;
         let f x = ... !forward_g ... ;;
mod2.ml: let g y = ... mod1__f ... ;;
         mod1__forward_g := g;;
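The forward-reference technique can be seen at work in a complete, if artificial, example. The sketch below is in OCaml and uses two submodules of a single file to stand for mod1.ml and mod2.ml; the functions is_even and is_odd are chosen only because they are mutually recursive.

```ocaml
(* Stands for mod1.ml, linked first. *)
module Mod1 = struct
  (* Placeholder that fails until Mod2 installs the real function. *)
  let forward_is_odd : (int -> bool) ref =
    ref (fun _ -> failwith "forward_is_odd")

  (* Calls Mod2.is_odd indirectly, through the reference. *)
  let is_even n = if n = 0 then true else !forward_is_odd (n - 1)
end

(* Stands for mod2.ml, linked second: it may refer to Mod1 directly. *)
module Mod2 = struct
  let is_odd n = if n = 0 then false else Mod1.is_even (n - 1)

  (* Initialization code: plug the real function into the reference. *)
  let () = Mod1.forward_is_odd := is_odd
end

let ten_is_even = Mod1.is_even 10
let seven_is_odd = Mod2.is_odd 7
```

The assignment in Mod2 must be executed before is_even is first called on a nonzero argument; otherwise the placeholder raises the Failure exception.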
- Unavailable C primitive f
-
This error appears when trying to link code that calls external
functions written in C in ``default runtime'' mode. As explained in
chapter 13, such code must be linked in ``custom runtime''
mode. Fix: add the -custom option, as well as the (native code)
libraries and (native code) object files that implement the required
external functions.