ML Notes: Q&A

How can I return early from a function?
Can I generate a segfault in OCaml?
I just changed this function's type. Why is OCaml giving me errors like it hasn't changed?
How can I use a ppx with utop? -ppx tells me it can't find some command...
What's mangling the binary data that I'm inserting into this sqlite3 database?
Why does List.map manually unroll the loop?

How can I return early from a function?

Raise an exception. It's probably much more efficient than you expect.

let f x y z =
  let exception Return of int in
  try
    if x = 1 then raise (Return 0);
    if y = 2 then raise (Return 1);
    z
  with Return x -> x

Emphatically from that thread:

This -is- the way to do it in OCaml. Each language has idioms that are …. “intended”, and others that are “useful” or “it works”. And exceptions are -the- “intended” way to do nonlocal but AST-structural control-flow.
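The same idiom covers leaving a loop or an iterator early. A minimal sketch, assuming a made-up helper first_negative that is not from the thread: the local exception carries the answer out of Array.iteri as soon as it is known.

```ocaml
(* Hypothetical helper: index of the first negative element, if any. *)
let first_negative (a : int array) : int option =
  (* Local exception: visible only inside this function. *)
  let exception Found of int in
  try
    Array.iteri (fun i x -> if x < 0 then raise (Found i)) a;
    None
  with Found i -> Some i

let () =
  assert (first_negative [| 3; 1; -4; 1; -5 |] = Some 2);
  assert (first_negative [| 1; 2; 3 |] = None)
```

Because the exception is declared inside the function, no other code can raise or catch it by accident.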

Can I generate a segfault in OCaml?

No. For example,

(* segfault.ml *)
external segfault : unit -> unit = "segfault"
let () = segfault ()

/* segfault_helper.c */
#include <caml/mlvalues.h>
#include <caml/bigarray.h>

CAMLprim value segfault(value unit) {
	int *p = 0;
	*p=1;
	return unit;
}

this does not demonstrate a segfault in OCaml:

$ gcc -I$(opam var lib)/ocaml -c -fPIC segfault_helper.c
$ ocamlopt -o segfault segfault.ml segfault_helper.o
$ ./segfault 
Segmentation fault (core dumped)

Because that's not OCaml. That's OCaml and C.

Likewise this does not demonstrate a segfault in OCaml:

$ ocaml
OCaml version 5.4.0
Enter #help;; for help.

# print_endline (Obj.magic 100);;
Segmentation fault (core dumped)

Because that's not OCaml. That's OCaml and the OCaml compiler implementation.

By the way, it's not important for the C file to be named _helper, but it can't share the name of the .ml file, or you'll get a bunch of linking errors like:

:(.data+0x28): multiple definition of `camlSegfault.data_end'; segfault.o::(.data+0x28): first defined here
/usr/bin/ld: segfault.o: in function `camlSegfault.data_end':
:(.data+0x30): multiple definition of `camlSegfault.frametable'; segfault.o::(.data+0x30): first defined here

This also isn't a segfault in OCaml, but a segfault in Ctypes:

# #require "ctypes";;
# Ctypes.(!@(coerce (Ctypes.ptr void) (Ctypes.ptr int) null));;
Segmentation fault         (core dumped) utop

I just changed this function's type. Why is OCaml giving me errors like it hasn't changed?

OCaml's using the type in your .mli file, which you didn't update when you changed the function's type in the .ml file.
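The effect can be reproduced in one file, with a signature standing in for the stale .mli and a struct standing in for the edited .ml. This is only a sketch; the names Stale and id are made up for illustration. Here the implementation has become more general than the interface, so it still compiles, but callers only ever see the interface's type.

```ocaml
(* The signature plays the role of a stale .mli;
   the struct plays the role of the edited .ml. *)
module Stale : sig
  val id : int -> int          (* what the interface still says *)
end = struct
  let id x = x                 (* inferred on its own as 'a -> 'a *)
end

let () =
  (* Callers are checked against the signature, not the struct. *)
  assert (Stale.id 3 = 3)
  (* Stale.id "three" would be rejected by the type checker, even
     though the implementation would accept a string. *)
```

If the new implementation type is incompatible with the interface instead, the compiler reports a mismatch between the implementation and the interface rather than silently using either one.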

How can I use a ppx with utop? `-ppx` tells me it can't find some command...

-ppx is indeed for when you have a preprocessor command. A normal ppx you can -require like a library:

$ utop -require ppx_defer
utop # [%defer print_endline "world"]; print_endline "hello";;
hello
world
- : unit = ()

Or after starting utop without any options, with #require "ppx_defer";;

What's mangling the binary data that I'm inserting into this sqlite3 database?

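Probably nothing is. What's surprising you is that the sqlite3 CLI application itself mangles the data when it prints a query result: it treats the blob as text, so in the output below the control bytes show up in caret notation and the output stops at the first \0. Writing the blob to a file with writefile gives back all 36 bytes intact. Consider: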

module DB = Sqlite3

let () =
  let gz = "before" ^ Ezgzip.compress "hello" ^ "after" in
  let db = DB.db_open "test.sqlite" in
  DB.Rc.check (DB.exec db "CREATE TABLE IF NOT EXISTS t (b BLOB)");
  let stmt = DB.prepare db "INSERT INTO t (b) VALUES (?)" in
  DB.Rc.check (DB.bind_values stmt [DB.Data.BLOB gz]);
  DB.Rc.check (DB.step stmt);
  DB.Rc.check (DB.finalize stmt);
  if not (DB.db_close db) then failwith "db_close"

and this interaction with the produced test.sqlite:

$ sqlite3 test.sqlite 'select b from t limit 1'|od -c
0000000   b   e   f   o   r   e   ^   _ 213   ^   H  \n
0000014
$ sqlite3 test.sqlite "select writefile('out.bin', b) from t limit 1"
36
$ od -c < out.bin
0000000   b   e   f   o   r   e 037 213  \b  \0  \0  \0  \0  \0  \0 377
0000020 313   H 315 311 311  \a  \0 206 246 020   6 005  \0  \0  \0   a
0000040   f   t   e   r
0000044
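The first query's output ends exactly where the first \0 of the gzip header sits in out.bin, which is consistent with the blob being printed as a NUL-terminated string. A stdlib-only illustration of that effect follows. It is not the sqlite3 CLI's code; as_c_string and blob are made-up names, and blob only reuses the first few bytes of the real data.

```ocaml
(* Illustration only: mimic a printer that treats a blob as a
   C-style string and stops at the first NUL byte. *)
let as_c_string (s : string) : string =
  match String.index_opt s '\000' with
  | Some i -> String.sub s 0 i
  | None -> s

(* "before", then the start of a gzip header (037 213 010 in octal),
   a few NUL bytes, then "after". *)
let blob = "before\x1f\x8b\x08\x00\x00\x00after"

let () =
  Printf.printf "stored %d bytes, text view keeps %d\n"
    (String.length blob)
    (String.length (as_c_string blob))
```

An OCaml string carries its own length, so the NUL bytes survive the trip through the database; only a text-oriented view loses them.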

Why does List.map manually unroll the loop?

In response to deech's FP puzzle, OCaml has @tail_mod_cons. But if you look at List.map, it's not quite what you'd expect:

let[@tail_mod_cons] rec map f = function
    [] -> []
  | [a1] ->
      let r1 = f a1 in
      [r1]
  | a1::a2::l ->
      let r1 = f a1 in
      let r2 = f a2 in
      r1::r2::map f l

Why this extra trouble? silene explains: the TRMC transformation adds a call per recursion to caml_initialize which is expensive enough to cause a performance regression when List.map was made tail-recursive with this annotation. That C function is only

CAMLexport CAMLweakdef void caml_initialize (volatile value *fp, value val) {
  *fp = val;
  if (!Is_young((value)fp) && Is_block_and_young (val))
    Ref_table_add(&Caml_state->minor_tables->major_ref, fp);
}

but it's still not nothing.
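For comparison, here is a sketch of the version you might have expected, handling one element per call. The name map_plain is made up, and the attribute needs OCaml 4.14 or later. The sketch only checks that the result is right on a long list; it makes no claim about relative speed, which depends on the compiler version and workload.

```ocaml
(* Plain tail-modulo-cons map: one element per recursive call. *)
let[@tail_mod_cons] rec map_plain f = function
  | [] -> []
  | x :: l ->
      let y = f x in
      y :: map_plain f l

let () =
  let big = List.init 1_000_000 Fun.id in
  let out = map_plain succ big in
  assert (List.length out = 1_000_000);
  assert (List.hd out = 1)
```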

Bonus: why name the results and not just return f a1 :: f a2 :: map f l? To ensure evaluation order, to not produce unnecessary garbage if one of the calls to f raises an exception, and to simplify code for the optimizer including the TRMC transformation. It's also nicer for debugging, and it's a very easy discipline that separates list construction from f's work.
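The evaluation-order point can be checked directly. In this sketch (calls, f and pair_named are made-up names) the let bindings run top to bottom, so f is applied to the first argument before the second. Without the names, the order in which the arguments of :: are evaluated is left unspecified by the language, so nothing is asserted about that form here.

```ocaml
(* Record the order in which f is applied. *)
let calls : int list ref = ref []

let f x =
  calls := x :: !calls;
  x * 10

(* Named results: the lets run top to bottom, so f a1 runs before f a2. *)
let pair_named a1 a2 =
  let r1 = f a1 in
  let r2 = f a2 in
  [r1; r2]

let () =
  calls := [];
  let l = pair_named 1 2 in
  assert (l = [10; 20]);
  (* calls holds the most recent call first, so reverse it. *)
  assert (List.rev !calls = [1; 2])
```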

More discussion: antron's List.map alternative.