Batch compilation (camlc)

This chapter describes how Caml Light programs can be compiled non-interactively, and turned into standalone executable files. This is achieved by the command camlc, which compiles and links Caml Light source files.

Mac: This command is not a standalone Macintosh application. To run camlc, you need the Macintosh Programmer's Workshop (MPW) programming environment. The programs generated by camlc are also MPW tools, not standalone Macintosh applications.

Overview of the compiler

The camlc command has a command-line interface similar to that of most C compilers. It accepts several types of arguments: source files for module implementations; source files for module interfaces; and compiled module implementations.

Arguments ending in .mli are taken to be source files for module interfaces. Module interfaces declare exported global identifiers, define public data types, and so on. From the file x.mli, the camlc compiler produces a compiled interface in the file x.zi.
Arguments ending in .ml are taken to be source files for module implementations. Module implementations bind global identifiers to values, define private data types, and contain expressions to be evaluated for their side-effects. From the file x.ml, the camlc compiler produces compiled object code in the file x.zo. If the interface file x.mli exists, the module implementation x.ml is checked against the corresponding compiled interface x.zi, which is assumed to exist. If no interface x.mli is provided, the compilation of x.ml produces a compiled interface file x.zi in addition to the compiled object code file x.zo. The file x.zi produced corresponds to an interface that exports everything that is defined in the implementation x.ml.
Arguments ending in .zo are taken to be compiled object code. These files are linked together, along with the object code files obtained by compiling .ml arguments (if any), and the Caml Light standard library, to produce a standalone executable program. The order in which .zo and .ml arguments are presented on the command line is relevant: global identifiers are initialized in that order at run-time, and it is a link-time error to use a global identifier before having initialized it. Hence, a given x.zo file must come before all .zo files that refer to identifiers defined in the file x.zo.
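
The ordering rule in the last item can be pictured with a small model. The following Python sketch is only an illustration, not a description of how camlc is implemented: the function first_order_error and the hand-written table refers_to are hypothetical names made up for this example. Given the .zo files in command-line order, and for each file the files whose globals it refers to, it reports the first file that comes too early.

```python
def first_order_error(command_line, refers_to):
    """Return (file, needed_file) for the first file that refers to a file
    not yet initialized, or None if the order is acceptable.

    command_line: .zo file names, in the order given to the linker.
    refers_to: maps a file name to the files whose globals it uses
               (hypothetical data, written by hand for this example).
    """
    initialized = set()
    for name in command_line:
        for needed in refers_to.get(name, ()):
            if needed not in initialized:
                return (name, needed)
        # The globals of this file count as initialized from here on.
        initialized.add(name)
    return None


refers_to = {"mod2.zo": ["mod1.zo"], "main.zo": ["mod1.zo", "mod2.zo"]}
good_order = first_order_error(["mod1.zo", "mod2.zo", "main.zo"], refers_to)
bad_order = first_order_error(["mod2.zo", "mod1.zo", "main.zo"], refers_to)
```

In this model, listing mod1.zo first is accepted, while listing mod2.zo first reproduces the situation that the linker rejects.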

The output of the linking phase is a file containing compiled code that can be executed by the Caml Light runtime system: the command named camlrun. If caml.out is the name of the file produced by the linking phase, the command

        camlrun caml.out arg₁ arg₂ ... argₙ

executes the compiled code contained in caml.out, passing it as arguments the character strings arg₁ to argₙ. (See chapter 7 for more details.)

Unix:

On most Unix systems, the file produced by the linking phase can be run directly, as in:

        ./caml.out arg₁ arg₂ ... argₙ

The produced file has the executable bit set, and it manages to launch the bytecode interpreter by itself.

PC:

The output file produced by the linking phase is directly executable, provided it is given the extension .EXE. Hence, if the output file is named caml_out.exe, you can execute it with the command

        caml_out arg₁ arg₂ ... argₙ

Actually, the produced file caml_out.exe is a tiny executable file prepended to the bytecode file. The executable simply runs the camlrun runtime system on the remainder of the file. (As a consequence, this is not a standalone executable: it still requires camlrun.exe to reside in one of the directories in the path.)

Options

The following command-line options are recognized by camlc.

-c

Compile only. Suppress the linking phase of the compilation. Source code files are turned into compiled files, but no executable file is produced. This option is useful to compile modules separately.

-ccopt option

Pass the given option to the C compiler and linker, when linking in ``custom runtime'' mode (see the -custom option). For instance, -ccopt -Ldir causes the C linker to search for C libraries in directory dir.

-custom

Link in ``custom runtime'' mode. In the default linking mode, the linker produces bytecode that is intended to be executed with the shared runtime system, camlrun. In the custom runtime mode, the linker produces an output file that contains both the runtime system and the bytecode for the program. The resulting file is considerably larger, but it can be executed directly, even if the camlrun command is not installed. Moreover, the ``custom runtime'' mode enables linking Caml Light code with user-defined C functions, as described in chapter 13.

Unix: Never strip an executable produced with the -custom option.

PC: This option requires the DJGPP port of the GNU C compiler to be installed.

-g

Cause the compiler to produce additional debugging information. During the linking phase, this option adds information at the end of the executable bytecode file produced. This information is required by the debugger camldebug and also by the catch-all exception handler from the standard library module printexc.

During the compilation of an implementation file (.ml file), when the -g option is set, the compiler adds debugging information to the .zo file. It also writes a .zix file that describes the full interface of the .ml file, that is, all types and values defined in the .ml file, including those that are local to the .ml file (i.e. not declared in the .mli interface file). Used in conjunction with the -g option to the toplevel system (chapter 6), the .zix file gives access to the local values of the module, making it possible to print or ``trace'' them. The .zix file is not produced if the implementation file has no explicit interface, since, in this case, the module has no local values.

-i

Cause the compiler to print the declared types, exceptions, and global variables (with their inferred types) when compiling an implementation (.ml file). This can be useful to check the types inferred by the compiler. Also, since the output follows the syntax of module interfaces, it can help in writing an explicit interface (.mli file) for a file: just redirect the standard output of the compiler to a .mli file, and edit that file to remove all declarations of unexported globals.

-I directory

Add the given directory to the list of directories searched for compiled interface files (.zi) and compiled object code files (.zo). By default, the current directory is searched first, then the standard library directory. Directories added with -I are searched after the current directory, but before the standard library directory. When several directories are added with several -I options on the command line, these directories are searched from right to left (the rightmost directory is searched first, the leftmost is searched last). (Directories can also be added to the search path from inside the programs with the #directory directive; see chapter 4.)
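
The search order described above can be summarized by a short Python sketch. This is a model of the rule as stated, not code taken from the compiler; the function name search_path and the standard library path used in the example are placeholders chosen for illustration.

```python
def search_path(include_dirs, stdlib_dir):
    """Return the directories in the order in which they are searched.

    include_dirs: the arguments of the -I options, in command-line order
                  (left to right).
    stdlib_dir: the standard library directory (a placeholder value here).
    """
    # Current directory first, then the -I directories from right to left,
    # and the standard library directory last.
    return ["."] + list(reversed(include_dirs)) + [stdlib_dir]


# Models the options: -I lib/a -I lib/b
path = search_path(["lib/a", "lib/b"], "/usr/local/lib/caml-light")
```

With the two options given as -I lib/a -I lib/b, the model searches lib/b before lib/a, as the text above describes.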

-lang language-code

Translate the compiler messages to the specified language. The language-code is fr for French, es for Spanish, de for German, ... (See the file camlmsgs.txt in the Caml Light standard library directory for a list of available languages.) When an unknown language is specified, or no translation is available for a message, American English is used by default.

-o exec-file

Specify the name of the output file produced by the linker.

Unix: The default output name is a.out, in keeping with the tradition.

PC: The default output name is caml_out.exe.

Mac: The default output name is Caml.Out.

-O module-set

Specify which set of standard modules is to be implicitly ``opened'' at the beginning of a compilation. There are three module sets currently available:

cautious: provides the standard operations on integers, floating-point numbers, characters, strings, arrays, ..., as well as exception handling, basic input/output, etc. Operations from the cautious set perform range and bound checking on string and array operations, as well as various sanity checks on their arguments.
fast: provides the same operations as the cautious set, but without sanity checks on their arguments. Programs compiled with -O fast are therefore slightly faster, but unsafe.
none: suppresses all automatic opening of modules. Compilation starts in an almost empty environment. This option is not of general use, except to compile the standard library itself.

The default compilation mode is -O cautious. See chapter 14 for a complete listing of the modules in the cautious and fast sets.

-p

Compile and link in profiling mode. See the description of the profiler camlpro in chapter 11.

-v

Print the version number of the compiler.

-W

Print extra warning messages for the following events:

A #open directive is useless (no identifier in the opened module is ever referenced).
A variable name in a pattern is capitalized (this often corresponds to a misspelled constant constructor).

Unix:

The following environment variable is also consulted:

LANG: When set, controls which language is used to print the compiler messages (see the -lang command-line option).

PC:

The following option is also supported:

@response-file: Process the files whose names are listed in file response-file, just as if these names appeared on the command line. File names in response-file are separated by blanks (spaces, tabs, newlines). This option makes it possible to overcome silly limitations on the length of the command line.
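
The effect of a response file can be modeled in a few lines of Python. This is a sketch of the behavior described above under stated assumptions, not the actual camlc code: the function expand_arguments, the file name files.rsp, and the in-memory table standing in for the file system are all hypothetical. The sketch does not address whether a response file may itself mention another response file, since the text does not say.

```python
def expand_arguments(args, read_text):
    """Replace each @response-file argument by the names listed in that file.

    args: command-line arguments, in order.
    read_text: function returning the contents of a file as a string
               (passed in so that the example needs no real files).
    """
    expanded = []
    for arg in args:
        if arg.startswith("@"):
            # split() with no argument separates on runs of spaces,
            # tabs, and newlines.
            expanded.extend(read_text(arg[1:]).split())
        else:
            expanded.append(arg)
    return expanded


fake_files = {"files.rsp": "mod1.zo mod2.zo\n\tmain.ml\n"}
expanded = expand_arguments(
    ["-o", "prog.exe", "@files.rsp"], fake_files.__getitem__
)
```

The names read from the file take the place of the @files.rsp argument, in the order in which they are listed.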

The following environment variables are also consulted:

CAMLLIB: Contains the path to the standard library directory.
LANG: When set, controls which language is used to print the compiler messages (see the -lang command-line option).

Modules and the file system

This short section is intended to clarify the relationship between the names of the modules and the names of the files that contain their compiled interface and compiled implementation.

The compiler always derives the name of the compiled module by taking the base name of the source file (.ml or .mli file). That is, it strips the leading directory name, if any, as well as the .ml or .mli suffix. The produced .zi and .zo files have the same base name as the source file; hence, the compiled files produced by the compiler always have their base name equal to the name of the module they describe (for .zi files) or implement (for .zo files).
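
As an illustration of this naming rule, here is a minimal Python sketch. It only models the derivation of the module name as described above; the function name module_name and the example paths are made up for the illustration.

```python
import os.path


def module_name(source_file):
    """Derive the module name from the name of a .ml or .mli source file."""
    # Strip the leading directory name, if any.
    base = os.path.basename(source_file)
    # Strip the .ml or .mli suffix.
    for suffix in (".mli", ".ml"):
        if base.endswith(suffix):
            return base[: -len(suffix)]
    raise ValueError("not a .ml or .mli file: " + source_file)


name_from_ml = module_name("src/util/sort.ml")
name_from_mli = module_name("sort.mli")
```

Both source files in the example describe the same module, sort, so the compiled files are named sort.zi and sort.zo.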

For compiled interface files (.zi files), this invariant must be preserved at all times, since the compiler relies on it to load the compiled interface file for the modules that are used from the module being compiled. Hence, it is risky and generally incorrect to rename .zi files. It is admissible to move them to another directory, if their base name is preserved, and the correct -I options are given to the compiler.

Compiled bytecode files (.zo files), on the other hand, can be freely renamed once created, for two reasons: (1) .zo files contain the true name of the module they define, so there is no need to derive that name from the file name; (2) the linker never attempts to find by itself the .zo file that implements a module of a given name: it relies on the user providing the list of .zo files by hand.

Common errors

This section describes and explains the most frequently encountered error messages.

Cannot find file filename

The named file could not be found in the current directory, nor in the directories of the search path. The filename is either a compiled interface file (.zi file), or a compiled bytecode file (.zo file). If filename has the format mod.zi, this means you are trying to compile a file that references identifiers from module mod, but you have not yet compiled an interface for module mod. Fix: compile mod.mli or mod.ml first, to create the compiled interface mod.zi.

If filename has the format mod.zo, this means you are trying to link a bytecode object file that does not exist yet. Fix: compile mod.ml first.

If your program spans several directories, this error can also appear because you haven't specified the directories to look into. Fix: add the correct -I options to the command line.

Corrupted compiled interface file filename

The compiler produces this error when it tries to read a compiled interface file (.zi file) that has the wrong structure. This means something went wrong when this .zi file was written: the disk was full, the compiler was interrupted in the middle of the file creation, and so on. This error can also appear if a .zi file is modified after its creation by the compiler. Fix: remove the corrupted .zi file, and rebuild it.

This expression has type t₁, but is used with type t₂

This is by far the most common type error in programs. Type t₁ is the type inferred for the expression (the part of the program that is displayed in the error message), by looking at the expression itself. Type t₂ is the type expected by the context of the expression; it is deduced by looking at how the value of this expression is used in the rest of the program. If the two types t₁ and t₂ are not compatible, then the error above is produced.

In some cases, it is hard to understand why the two types t₁ and t₂ are incompatible. For instance, the compiler can report that ``expression of type foo cannot be used with type foo'', and it really seems that the two types foo are compatible. This is not always true. Two type constructors can have the same name, but actually represent different types. This can happen if a type constructor is redefined. Example:

        type foo = A | B;;
        let f = function A -> 0 | B -> 1;;
        type foo = C | D;;
        f C;;

This results in the error message ``expression C of type foo cannot be used with type foo''.

Incompatible types with the same names can also appear when a module is changed and recompiled, but some of its clients are not recompiled. That's because type constructors in .zi files are not represented by their name (that would not suffice to identify them, because of type redefinitions), but by unique stamps that are assigned when the type declaration is compiled. Consider the three modules:

        mod1.ml:    type t = A | B;;
                    let f = function A -> 0 | B -> 1;;

        mod2.ml:    let g x = 1 + mod1__f(x);;

        mod3.ml:    mod2__g mod1__A;;

Now, assume mod1.ml is changed and recompiled, but mod2.ml is not recompiled. The recompilation of mod1.ml can change the stamp assigned to type t. But the interface mod2.zi will still use the old stamp for mod1__t in the type of mod2__g. Hence, when compiling mod3.ml, the system complains that the argument type of mod2__g (that is, mod1__t with the old stamp) is not compatible with the type of mod1__A (that is, mod1__t with the new stamp). Fix: use make or a similar tool to ensure that all clients of a module mod are recompiled when the interface mod.zi changes. To check that the Makefile contains the right dependencies, remove all .zi files and rebuild the whole program; if no ``Cannot find file'' error appears, you're all set.
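
Both situations described in this section come down to types being compared by stamp rather than by name. The following Python sketch models that idea only; the class TypeConstructor, the use of consecutive integers as stamps, and the variable names are assumptions made for the illustration, since the actual representation of stamps is not described here.

```python
import itertools

# Assumption for this model: stamps are consecutive integers.
_next_stamp = itertools.count(1)


class TypeConstructor:
    """A type constructor identified by a stamp; the name is for display."""

    def __init__(self, name):
        self.name = name
        self.stamp = next(_next_stamp)  # assigned when the declaration is compiled

    def same_type_as(self, other):
        # Compatibility is decided by the stamp, never by the name.
        return self.stamp == other.stamp


# First compilation of mod1.ml declares type t.
t_old = TypeConstructor("mod1__t")
# mod2.zi records the argument type of mod2__g at that time.
g_argument_type = t_old
# mod1.ml is changed and recompiled: type t receives a new stamp.
t_new = TypeConstructor("mod1__t")

same_name = g_argument_type.name == t_new.name
compatible = g_argument_type.same_type_as(t_new)
```

In this model the two types print identically but are not compatible, which is the situation reported by the error message.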

The type inferred for name, that is, t, contains non-generalizable type variables

Type variables ('a, 'b, ...) in a type t can be in either of two states: generalized (which means that the type t is valid for all possible instantiations of the variables) and not generalized (which means that the type t is valid only for one instantiation of the variables). In a let binding let name = expr, the type-checker normally generalizes as many type variables as possible in the type of expr. However, this leads to unsoundness (a well-typed program can crash) in conjunction with polymorphic mutable data structures. To avoid this, generalization is performed at let bindings only if the bound expression expr belongs to the class of ``syntactic values'', which includes constants, identifiers, functions, tuples of syntactic values, etc. In all other cases (for instance, expr is a function application), a polymorphic mutable data structure could have been created and generalization is therefore turned off.

Non-generalized type variables in a type cause no difficulties inside a given compilation unit (the contents of a .ml file, or an interactive session), but they cannot be allowed in types written in a .zi compiled interface file, because they could be used inconsistently in other compilation units. Therefore, the compiler flags an error when a .ml implementation without a .mli interface defines a global variable name whose type contains non-generalized type variables. There are two solutions to this problem:

Add a type constraint or a .mli interface to give a monomorphic type (without type variables) to name. For instance, instead of writing

        let sort_int_list = sort (prefix <);;
        (* inferred type 'a list -> 'a list, with 'a not generalized *)

write

        let sort_int_list = (sort (prefix <) : int list -> int list);;

If you really need name to have a polymorphic type, turn its defining expression into a function by adding an extra parameter. For instance, instead of writing

        let map_length = map vect_length;;
        (* inferred type 'a vect list -> int list, with 'a not generalized *)

write

        let map_length lv = map vect_length lv;;

mod__name is referenced before being defined

This error appears when trying to link an incomplete or incorrectly ordered set of files. The first possible cause is that you have forgotten to provide an implementation for the module named mod on the command line (typically, the file named mod.zo, or a library containing that file). Fix: add the missing .ml or .zo file to the command line. The second possible cause is that you have provided an implementation for the module named mod, but it comes too late on the command line: the implementation of mod must come before all bytecode object files that reference one of the global variables defined in module mod. Fix: change the order of .ml and .zo files on the command line.

Of course, you will always encounter this error if you have mutually recursive functions across modules. That is, function mod1__f calls function mod2__g, and function mod2__g calls function mod1__f. In this case, no matter what permutations you perform on the command line, the program will be rejected at link-time. Fixes:

Put f and g in the same module.

Parameterize one function by the other. That is, instead of having

        mod1.ml:    let f x = ... mod2__g ... ;;
        mod2.ml:    let g y = ... mod1__f ... ;;

define

        mod1.ml:    let f g x = ... g ... ;;
        mod2.ml:    let rec g y = ... mod1__f g ... ;;

and link mod1 before mod2.

Use a reference to hold one of the two functions, as in:

        mod1.ml:    let forward_g =
                        ref((fun x -> failwith "forward_g") : <type>);;
                    let f x = ... !forward_g ... ;;
        mod2.ml:    let g y = ... mod1__f ... ;;
                    mod1__forward_g := g;;
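
The claim that no permutation of the command line helps with mutually recursive modules can be checked with a small model. The Python sketch below uses the standard graphlib module to look for an order in which every module comes after the modules it references; it is an illustration of the ordering constraint, not of the linker itself, and the function find_link_order and the hand-written reference tables are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def find_link_order(references):
    """Return the modules in an order where each module comes after the
    modules it references, or None if no such order exists.

    references: maps a module name to the modules whose globals it uses
                (hand-written example data, not read from .zo files).
    """
    try:
        return list(TopologicalSorter(references).static_order())
    except CycleError:
        return None


# mod1__f calls mod2__g and mod2__g calls mod1__f: a cycle.
before_fix = find_link_order({"mod1": ["mod2"], "mod2": ["mod1"]})

# After the second or third fix, mod1 no longer references mod2.
after_fix = find_link_order({"mod1": [], "mod2": ["mod1"]})
```

In the model, the cyclic case has no valid order at all, while removing the reference from mod1 to mod2 leaves exactly the order given above: mod1 before mod2.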

Unavailable C primitive f

This error appears when trying to link, in ``default runtime'' mode, code that calls external functions written in C. As explained in chapter 13, such code must be linked in ``custom runtime'' mode. Fix: add the -custom option, as well as the (native code) libraries and (native code) object files that implement the required external functions.