-
null : 'a list -> bool
returns true
if the list
in argument is empty. This is equivalent to
fn nil => true | _ => false
hd : 'a list -> 'a
returns the first element of
the list in argument, or raises Match
if the list is empty.
This is equivalent to:
fun hd (x :: _) = x
tl : 'a list -> 'a list
returns the list composed
of all but the first element of the list in argument, or raises
Match
if the list is empty. This is equivalent to:
fun tl (_ :: x) = x
len : 'a list -> int
computes the
length of a list.
nth : 'a list * int -> 'a
gets the n+1th element of
the list l when applied to the arguments l and n. If n is
out of range (n<0 or n>=len
l), Nth
is raised.
Note that elements are indexed starting at 0. nth
is
declared infix, left associative of precedence 9.
nthtail : 'a list * int -> 'a list
gets the nth tail of the list l when applied
to the arguments l and n. If n out of range (n<0 or
n>len
l), Nth
is raised. Note that tails are indexed starting at 0.
When n<len
l, op nth (
l,
n)
is the
first element of nthtail
l,
n)
; if n=len
l,
nthtail
l,
n)
is nil
.
@ : 'a list * 'a list -> 'a list
concatenates two lists. It could have been defined as:
fun op @ (nil,l) = l | op @ (a::l1,l2) = a:: op @(l1,l2)
@
is declared infix, right associative (as in Standard ML of
New Jersey; the Definition of
Standard ML defines it to be left associative) of
precedence 5.
append : 'a list list -> 'a list
is the
distributed concatenation of lists.
It could be defined as:
fun append nil = nil | append (l::r) = l @ append r
The application of append
to a list
comprehension is compiled in an
optimized form, where the concatenations are done on the fly, without
building the list comprehension first.
revappend : 'a list * 'a list -> 'a list
appends the
reversion of the first list to the
second list. We could define:
fun revappend (nil,l) = l | revappend (a::l1,l2) = revappend (l1,a::l2)
rev : 'a list -> 'a list
reverses
a list. It could be defined as:
fun rev l = revappend (l,nil)
map : ('a -> 'b) -> 'a list -> 'b list
applies its
first argument to each element in turn of the list in second
argument, and return the list of all results. This is equivalent
to:
fun map f l = [f x | x in list l]
app : ('a -> 'b) -> 'a list -> unit
applies its
first argument to each element in turn of the list in second
argument. This is used purely for side-effects.
fun app f l = iterate f x | x in list l end
fold : ('a * 'b -> 'b) -> 'b -> 'a list -> 'b
combines the elements of the list in third argument by applying
the binary operation in first argument on all elements of the
list. The second argument is assumed to be the neutral element of
the binary operation. For example, fold (op +) 0 l
computes
the sum of the elements of the list l
. fold
computes
from the right end of the list towards the left end. It can be
defined by:
fun fold f b =
let fun f2 nil = b
| f2 (e :: r) = f(e,f2 r)
in
f2
end
revfold : ('a * 'b -> 'b) -> 'b -> 'a list -> 'b
combines the elements of the list in third argument by applying
the binary operation in first argument on all elements of the
list. The second argument is assumed to be the neutral element of
the binary operation. For example, revfold (op +) 0 l
computes the sum of the elements of the list l
.
revfold
computes from the left end of the list towards the
right end. It can be defined by:
fun revfold f b =
let fun f2 b nil = b
| f2 b (e::r) = f2 (f (e,b)) r
in
f2
end
-
empty : (''a -m> 'b) -> bool
tests whether
a map is empty. It is equivalent to fn m => m={}
.
? : (''a -m> 'b) -> ''a -> 'b
applies a map to an element.
It is currified, so that the expression ?m(a)
retrieves the
element associated with a
in m
. If there is no
element associated with a
in m
, the exception
MapGet
is raised.
inset : ''a * (''a -m> 'b) -> bool
returns true
if its first argument belongs to the domain of the map in second argument. It can be defined as:
fun op inset (a,m) = (?m(a); true) handle MapGet => false
It is declared infix, of precedence 4.
?
and inset
share a one-entry cache, where the last maplet is stored, so that testing
inset
then using ?
incurs almost no speed penalty.
inmap : (''a * ''b) * (''a -m> 'b) -> bool
returns true
if its first argument is a couple (x,y) such that the maplet
x =>
y is
in the map in second argument. It can be defined as:
fun op inset (a,m) = (?m(a)=b) handle MapGet => false
It is declared infix, of precedence 4.
?
and inset
share a one-entry cache, where the last maplet is stored, so that testing
inset
then using ?
incurs almost no speed penalty.
subset : (''a -m> 'b) * (''a -m> 'b) -> bool
tests the inclusion of
the domains of maps in argument. It may be defined as:
fun op subset (m1,m2) = all x inset m2 | x in set m1 end
It is declared infix, of precedence 4.
submap : (''a -m> ''b) * (''a -m> ''b) -> bool
tests the
inclusion of maps: it
returns true if the first map is a submap of the second. It may be
defined as:
fun op submap (m1,m2) =
all x inset m2 andalso ?m1(x) = ?m2(x) | x in set m1 end
It is declared infix, of precedence 4.
dom : (''a -m> 'b) -> ''a set
computes the domain of a map; it is the
identity on sets. It is syntactic sugar for:
fun dom m = {x | x in set m}
rng : (''a -m> ''b) -> ''b set
computes the range of a map. The range of
a non-empty set is always {()}
. rng
is syntactic
sugar for:
fun rng m = {y | _ => y in map m}
card : (''a -m> 'b) -> int
computes
the cardinal of a map. It may be defined as:
fun card {} = 0 | card {_ => _} U m' = 1+card m'
<| : (''a -m> 'c) * (''a -m> 'b) -> (''a -m> 'b)
.
If s is a map and m is another map, then s <|
m is
the map m, whose domain is restricted to the elements in the
domain of s. This may be defined by:
fun op <| (s,m) = {x => y | x => y in map m such that x inset s}
It is declared infix, right associative of precedence 7.
<-| : (''a -m> 'c) * (''a -m> 'b) -> (''a -m> 'b)
.
If s is a map and m is another map, then s <-|
m is
the map m, whose domain is restricted to the elements outside
the domain of s. This may be defined by:
fun op <-| (s,m) = {x => y | x => y in map m such that not (x inset s)}
It is declared infix, right associative of precedence 7.
|> : (''a -m> ''b) * (''b -m> 'c) -> (''a -m> ''b)
if
the “range restrict to” function. If
s is a map and m is another map, the m |>
s is the
map m, where only the maplets x =>
y such that y is
in the domain of s are considered. This may be defined by:
fun op |> (m,s) = {x => y | x => y in map m such that y inset s}
It is declared infix, left associative of precedence 7.
|-> : (''a -m> ''b) * (''b -m> 'c) -> (''a -m> ''b)
if
the “range restrict by” function. If
s is a map and m is another map, the m |->
s is the
map m, where only the maplets x =>
y such that y is
outside the domain of s are considered. This may be defined by:
fun op |-> (m,s) = {x => y | x => y in map m such that not (y inset s)}
It is declared infix, left associative of precedence 7.
++ : (''a -m> 'b) * (''a -m> 'b) -> (''a -m> b)
is the map
overwriting operator. It takes two maps m
and m', and returns a map m'', whose domain is the union of
the domains of m and m', and which maps every element x of
the domain of m' to ?
m'(
x)
, and every
other element x to ?
m(
x)
. Thus m'' is
m, over which the associations of m' have been written.
To underwrite instead of overwriting, write
m' ++
m instead of m ++
m'. The only
difference is that m' will be evaluated before m.
++
is declared infix, left associative of precedence 6.
overwrite : (''a -m> 'b) list -> ''a -m> 'b
is the
distributed overwriting function. It returns
the first map in the list, overwritten by the second, the third,
etc. It is equivalent to:
fun overwrite nil = {}
| overwrite (m1::rest) = m1 ++ overwrite rest
The application of overwrite
to a list
comprehension is compiled in an
optimized form, where the overwritings are done on the fly, without
building the list comprehension first.
underwrite : (''a -m> 'b) list -> ''a -m> 'b
is the
distributed underwriting function. It
returns the last map in the list, overwritten by the next to last,
the penultimate, etc. It is equivalent to:
fun underwrite nil = {}
| underwrite (m1::rest) = underwrite rest ++ m1
The application of underwrite
to a list
comprehension is compiled in an
optimized form, where the underwritings are done on the fly, without
building the list comprehension first.
Note that overwrite o rev
=underwrite
, and
underwrite o rev
=overwrite
.
delta : (''a -m> 'b) * (''a -m> 'b) -> (''a -m> 'b)
computes the symmetric difference of
two maps. This is the overwriting of one map by another,
restricted by the intersection of the domains. It may be defined
by:
fun op delta (m,m') = (m' <-| m) ++ (m <-| m')
(or equivalently, (m <-| m') ++ (m' <-| m)
). It generalizes the
classical notion of symmetric difference of sets. It is declared
infix, left associative of precedence 7.
choose : (''a -m> 'b) -> ''a
chooses an element in the
domain of the map in argument. It raises Empty
if the map
is empty. It is syntactic sugar for:
fun choose {} => raise Empty
| choose {x => _,...} = x
or for:
fun choose m = case (some x | x in set m end) of
SOME x => x
| NONE => raise Empty
choose_rng : (''a -m> 'b) -> 'b
chooses an element in the
range, and raises Empty
if the map is empty. The element
chosen in the range is precisely the image by the map of the one
chosen by choose
, as shows the equivalent form:
fun choose_rng m = ?m(choose m)
split : (''a -m> 'b) -> (''a -m> 'b) * (''a -m> 'b)
splits a map in two disjoint maps, whose union is the original
one. In general, the splitting won't yield
maps of equal (or nearly equal) cardinalities. However, the
splitting has the following properties:
-
Splitting a map of cardinality greater than or equal to 2
yields two non empty maps. Hence, recursive procedures that
decompose their map arguments with
split
until its
cardinality goes down to 1 or 0 will always terminate.
- If
(
m1,
m2)=split
m, then all
elements in the domain of m1 are less than elements in the
domain of m2 in the system order.
- Splitting depends only on the domain, not on the range. That is, assume
if
dom
m=dom
m', and both
(
m1,
m2)
=split
m and
(
m'1,
m'2)
=split
m', then dom
m1=dom
m'1 and dom
m2=dom
m'2.
- Splitting has the statistical property that if two maps have
similar domains (that is, their domains differ for a small number of
elements), recursively splitting both domains will on average
unearth an optimum number of common subdomains. For a more detailed
description of this property, refer to [11].
As a consequence, recursive memo functions with
map arguments that recurse through splittings of their arguments
should be good incremental functions,
recomputing quickly their results on data similar to previous
similar data.
U : ''a set * ''a set -> ''a set
is the union of two sets. It may be defined as a
special case of map overwriting, because sets
are special cases of maps:
fun op U (s : ''a set,s' : ''a set) = s ++ s'
(or equivalently, s' ++ s
). It is declared infix, left
associative, of precedence 6.
& : ''a set * ''a set -> ''a set
is the intersection of two sets. It may
be defined as a special case of map domain restriction,
because sets are special cases of maps:
fun op & (s : ''a set,s' : ''a set) = s <| s'
It is declared infix, left associative, of precedence 7.
intersects : (''a -m> 'b) * (''a -m> 'c) -> bool
returns true
if its two arguments
have domains with a non-empty
intersection. It can be defined as:
fun op intersects (m,m') = not (empty (m <| m'))
It is declared infix, of precedence 4.
\ : ''a set * ''a set -> ''a set
is the
difference of two sets. It may be
defined as a special case of the domain restrict by operator,
because sets are special cases of maps:
fun op & (s : ''a set,s' : ''a set) = s' <-| s
It is declared infix, left associative, of precedence 7.
union : (''a set -m> 'b) -> ''a set
is the distributed union of the sets in the domain
of the argument, it is quite similar to the overwrite
and
underwrite
functions. It can be defined as:
fun union {} = {}
| union {s => _} = s
| union ss = let val (s1,s2) = split ss
in
union s1 U union s2
end
The application of union
to a set or map
comprehension is compiled in an
optimized form, where the unions are done on the fly, without building
the set comprehension first.
inter : ((''a -m> ''b) -m> 'c) -> (''a -m> ''b)
is the distributed intersection of maps (and
hence of sets, too) in the domain of its arguments. It is mostly
used when ''b
and 'c
are both unit
, in which
case it computes the distributed intersection of a set of sets.
It raises Empty
on the empty set. It can be defined as:
fun inter {} = raise Empty
| inter {s => _} = s
| inter ss = let val (s1,s2) = split ss
in
inter s1 & inter s2
end
The application of inter
to a set or map
comprehension is compiled in an
optimized form, where the intersections are done on the fly, without
building the set comprehension first.
mapadd : (''a * 'b) * (''a -m> 'b) -> (''a -m> 'b)
adds one maplet to a map,
overwriting the map. It may be defined as:
fun mapadd ((x,y),m) = m ++ {x => y}
It is only a bit faster than writing m ++ {x => y}
.
mapaddunder : (''a * 'b) * (''a -m> 'b) -> (''a -m> 'b)
adds one maplet to a map,
underwriting the map. It may be defined as:
fun mapaddunder ((x,y),m) = {x => y} ++ m
It is only a bit faster than writing {x => y} ++ m
(and the
order of evaluation is different).
mapremove : ''a * (''a -m> 'b) -> (''a -m> 'b)
removes a maplet from a
map. It may be defined as:
fun mapremove (x,m) = {x} <-| m
It is only a bit faster than writing {x} <-| m
.
inv : (''a -m> ''b) -> (''b -m> ''a)
inverses a map. Its definition is the
same as:
fun inv m = {y => x | x => y in map m}
so in case m
is not invertible, inv
returns a map that
maps y
to the largest x
in the system order such that
m
maps x
to y
.
O : (''b -m> 'c) * (''a -m> ''b) -> (''a -m> 'c)
is
the composition of maps. It is
precisely defined by:
fun op O (m,m') = {x => ?m(y) | x => y in map m' such that y inset m}
It is declared infix, left associative of precedence 3. It is the
map version of the o
function composition operator.
to : int * int -> int set
is
the range function. If a and b are
the first and second argument respectively, to
returns the
set of all integers x such that a≤ x≤ b. We could have
defined to
by:
fun op to (a,b) =
let fun f x = if x>b then {} else {x} U f(x+1)
in
f a
end
to
is declared infix, left associative of precedence 9, so
that we may write a to b
.
mapoflist : 'a list -> (int -m> 'a)
converts a list to a map from
its indices to its elements. It may be defined by:
fun mapoflist l =
let fun f(_,nil) = {}
| f(n,a::l) = {n => a} ++ f(n+1,l)
in
f(0,l)
end
inds : 'a list -> int set
computes
the set of indices of a list l
,
i.e, {0,
…,len l-1}
. It may be defined by:
fun inds l = 0 to (len l-1)
or by val inds = dom o mapoflist
.
elems : ''a list -> ''a set
computes the set of elements
of the list in argument. It may be defined by:
fun elems l = {x | x in list l}
or by val elems = rng o mapoflist
.
There is now an alternative data type for maps in HimML, the
-
table : unit -> (''_a, '_b) table
creates a fresh,
empty table. If tables were implemented as objects
of type (''a -m> 'b) ref
, this would be equivalent to:
fun table () = ref {};
t_get : (''a, 'b) table -> ''b -> 'a option
reads
an element off a table. Precisely, t_get t x
returns SOME y
if x
is mapped to y
by the table t
, and NONE
if
x
has no entry in t
. If tables were implemented as objects
of type (''a -m> 'b) ref
, this would be equivalent to:
fun t_get t x =
SOME (?(!t) x) handle MapGet => NONE;
t_put : (''a, 'b) table -> ''a * 'b -> unit
, called
as t_put t (x, y)
adds a new binding from x
to y
to the
table t
, erasing any previously existing binding for x
. If
tables were implemented as objects of type (''a -m> 'b) ref
, this
would be equivalent to:
fun t_put t (x, y) =
t := !t ++ {x => y};
t_put_behind : (''a, 'b) table -> ''a * 'b -> unit
, called
as t_put_behind t (x, y)
adds a new binding from x
to y
to the
table t
, except if any binding for x
already existed. If
tables were implemented as objects of type (''a -m> 'b) ref
, this
would be equivalent to:
fun t_put_behind t (x, y) =
t := {x => y} ++ !t;
t_remove : (''a, 'b) table -> ''a -> unit
,
removes any entry associated with its second argument from the table given
in first argument. If tables were implemented as objects of type (''a -m> 'b) ref
, this would be equivalent to:
fun t_remove t x =
t := {x} <-| !t;
t_iter : (''a, 'b) table -> (''a * 'b -> bool) -> bool
is the standard iterator over tables. This is meant to implement a form of
existential quantification, but can be used to loop over the table, and
doing some computation on each entry, just like the iterate
quantifier. Calling t_iter t f
iterates over all elements of the
table t
, in some indefinite order, calls f (x,y)
for each
entry x => y
in t
. This stops when the table has been fully
traversed, and then returns false
, or after the first call to
f
returns true
, in which case iter t f
returns
true
.
If tables were implemented as objects of type (''a -m> 'b) ref
, this would be equivalent to:
fun t_iter t f =
exists
f (x, y)
| x => y in map !t
end;
This can be used to implement iterate
-style loops, by writing
t_iter t (fn (x, y) => (f(x,y); false)); ()
Warning: it is definitely a bad idea to modify the table t
while you
iterate on it.
t_collect : (''a, 'b) table -> (''a * 'b -> (''c -m> 'd)) -> (''c -m> 'd)
is an iterator over tables, just like t_iter
.
This one is rather meant to implement set and map comprehensions:
t_collect t f
iterates over all elements of the table t
,
in some indefinite order (although it is guaranteed to be the same as
the one used by t_iter
), calls f (x,y)
for each entry x => y
in t
, and computes the overwrite
of all maps returned by each
call to f
.
If tables were implemented as objects of type (''a -m> 'b) ref
, this would be equivalent to:
fun t_collect t f =
overwrite [f (x, y)
| x => y in map !t];
For example, getting the contents of the table t
as a map
can be effected as:
t_collect t (fn (x, y) => {x => y})
Building the map of all x => y
in t
such that the predicate
P (x,y)
holds can be done by:
t_collect t (fn (x, y) => if P (x,y) then {x => y} else {})
Warning: it is definitely a bad idea to modify the table t
while you
iterate on it.
t_reset : (''a, 'b) table -> unit
resets the
table in argument. If tables were implemented as objects of type (''a -m> 'b) ref
, this would be equivalent to:
fun t_reset t = (t := {});
-
explode : string -> string list
converts a string into the list of its characters, where
characters are represented as one element strings. For instance
explode "abc"
is ["a","b","c"]
.
implode : string list -> string
is the inverse of explode
. If given a list of characters (one-character strings), it
produces the corresponding string. Actually, it accepts any list of
strings, whatever their size, and concatenates them, as a generalization of the intended semantics.
^ : string * string -> string
concatenates two strings. It
could have been defined as:
fun op ^ (s,s') = implode (explode s @ explode s')
(or implode [s,s']
, noticing the generalization of
implode
though ^
is more efficient. ^
is declared
infix, right associative of precedence 6.
concat : string list -> string
is the distributed concatenation
of strings. Because of the general character of implode
,
implode
and concat
are synonymous.
size : string -> int
computes the size of a string.
This may be defined by:
fun size s = len (explode s)
although it does not build the exploded list.
chr : int -> string
builds the string
having as only character the character with code n given as
argument. If n is
not the code of a character, the exception Ascii
is raised.
ord : string -> int
gets the code of the first character
of the string. The exception StringNth
is raised if the string
is empty (""
).
ordof : string * int -> int
gets the code of the n+1th character
in the string s, where s and n are the arguments. The index
n starts from 0. If n<0 or n is greater than or equal to
the size of s, the exception StringNth
is raised. This is
equivalent to:
fun ordof (s,n) = ord (explode s nth n) handle Nth => raise StringNth
substr : string * int * int -> string
extracts a substring from the string s, given
as first argument. Call i the second argument, and j the
third. substr(
s,
i,
j)
returns the
string of all characters from indices i included to j
excluded. substr
raises StringNth
if i<0 or
j>len
s, or i or j is not integer. It may be
defined as:
fun substr (s,i,j) =
let fun f k =
if k>=j
then ""
else chr (ordof (s,k)) ^ f(k+1)
in
f i
end
strless : string * string -> bool
compares strings in lexicographical
order, the order on characters being defined by the order on their
codes. Equivalently:
fun op strless (s,s') =
let fun less (_,nil) = false
| less (nil,_) = true
| less (c::l,c'::l') =
ord c<ord c' orelse
ord c=ord c' andalso less(l,l')
in
less (explode s,explode s')
end
strless
is declared infix, of precedence 4.
regexp : string ->
|[match : string -> (int * int) option,
matches : string -> bool,
matchsub : string * int * int -> (int * int) option,
subst : string -> string]|
builds a regular expression matcher
and substitution engine, on the model of the Unix grep
utility. This is based on Harry Spencer's regexp(3) package.
regexp
re builds a non-deterministic finite automaton
with some optimizations for recognizing the language defined by
the regular expression re. It returns a record with four methods,
match
, matches
and
matchsub
for matching, and subst
for substituting substrings matched previously by match
in a given string.
The match
function, applied on a string s to be searched,
looks for the first (leftmost) occurrence
of a substring of s that is matched by the
regular expression re. It returns
SOME
(i,j) if it has found one, and then
substring
(s,i,j) is the matched substring. If it
cannot find one, it returns NONE
.
For example:
val |[match,...]| = regexp "yp";
builds a match
function for finding the first occurrence of
"yp"
in a string. Then, to look for an occurrence of the latter
in various strings:
match "type checking";
returns SOME (1,3)
.
The syntax of regular expressions is mostly as in any other regexp
package. The special characters are:
-
^
matches the empty string, but only at the beginning
of the string s to be matched.
$
, similarly, matches the empty string, but at the
end of the string s to be matched. Therefore, to test whether
a string s is in the language defined by re, it is enough
to test:
case #match (regexp "^
re$") (
s,0,size
s) of SOME _ => true | _ => false
\\<
matches the empty string at the beginning of a word
(any sequence of letters, digits and underscores).
\\>
matches the empty string at the end of a word.
\\b
matches the empty string at the beginning or end of a word.
\\B
matches the empty string everywhere except
at the beginning or end of a word.
\\w
matches any word-constituent character (a letter, a digit,
or an underscore).
\\W
matches any non word-constituent character (anything but
a letter, a digit, or an underscore).
\\
(one backslash character) can be used to quote
the next character. This is useful when you wish to include
a special character, such as .
, as an ordinary character.
.
matches any single character.
[
...]
defines a character group, so that
[a-z]
matches any letter in small caps, [^A-Z]
matches any character that is not a capital letter, and so on.
(
...)
defines a group, on which operators
like *
or +
for example can be applied as a whole.
Groups can also be used to mark matched substrings for future
reference by the subst
function. For example:
val |[match,subst,...]| = regexp "^(foo)*(a*)(bar)*$";
match "foofooaaaaaabar";
subst "\\0;\\1;\\2;\\3";
returns "foofooaaaaaabar;foo;aaaaaa;bar"
.
\\1
, ..., \\9
can also be used to match
whatever was last recognized by the corresponding parenthesized
group. For example:
val |[subst,match,...]| = regexp "^(a*)b\\1$";
match "aaabaaa";
returns SOME (0,7)
, and indeed "aaabaaa"
is a group of
n a
s (here, n=3), followed by a b
, and then the
same group of n a
s. This can be used to recognize sets of
words that are non regular. (This feature does not work in Harry
Spencer's version of the regexp package on which the HimML version
is based, but works in HimML.)
*
is a suffix. If g is a letter, a character group or
a parenthesized group, g*
matches any sequence of zero
or more words, each one matched by g.
+
is as *
, except that it matches a sequence of
at least one word.
?
is as *
, except that it matches a sequence of
at most one word.
The matches
function is a trimmed down version of match
,
which only returns whether there is a match or none. So
#matches (regexp
re)
s is equivalent to:
(case #match (regexp
re)
s of SOME _ => true | NONE => false)
The matchsub
function is, on the contrary, a puffed up version
of match
, which does not examine the whole string to be
matched, but only a substring: #matchsub (regexp
re) (
s,
i,j)
looks for a substring of s that would be matched by
re, but only between positions i and j. It then returns
SOME (
i',j')
if it has found a match: i' and
j' are the bounds for the leftmost match between positions
i and j. Otherwise, it returns NONE
. Notice that
i' and j' are positions in s, not in the substring
substr (
s,i,j)
. In fact, #matchsub (regexp
re) (
s,
i,j)
is equivalent to:
(case |
#match (regexp re) (substr ( s,i,j)) of |
|
SOME (i',j') => SOME (i+i',i+j') |
|
| NONE => NONE)
|
As substr
, matchsub
raise the exception StringNth
if the positions i, j are out of bounds.
Any of these functions may raise the exception RE
n, where
n is an error code. regexp
itself may raise it, when the
regular expression re contains a syntax error, or contains too
deeply nested parentheses (the current limit is 10 levels), etc.
The remsg
function provides a hopefully legible error message.
A possible bug is that NUL characters (of code 0) may be recognized
erratically. It is better not to include any NUL characters in either
the regular expression or the string to be matched.
remsg : int -> string
translates the regular
expression error code in argument (as returned as argument of a RE
exception) into a string in human-readable form.
intofstring : string -> int
converts the
input string into an integer. This assumes the integer is written in
decimal; minus signs may be written in standard notation (-
) or
as in ML (~
). No error is returned, even if the input string is
not the decimal representation of an integer. To check this, call
regexp "[~-]?[0-9]+"
first.
Note: to convert back an integer i
to a string, use
let val f as |[convert, ...]| = outstring ""
in
print f (pack (i:int));
convert ()
end
numofstring : string -> num
converts the
input string into a real number. This assumes the real is written in
decimal; minus signs may be written in standard notation (-
) or
as in ML (~
). No error is returned, even if the input string is
not the decimal representation of a real. To check this, call
regexp "[~-]?[0-9]+(\\.[0-9]*)?([eEfFgG][~-]?[0-9]+)?"
first.
Note: to convert back a number x
to a string, use
let val f as |[convert, ...]| = outstring ""
in
print f (pack (x:num));
convert ()
end
-
primes : unit -> unit -> int
is a small prime enumerator
generator. That is, first call
primes ()
; this returns a function, call it nextprime
. Then
calling nextprime ()
repeatedly returns 2, then 3, 5, 7,
11, ..., i.e., all primes that hold in a machine integer (small
primes). After nextprime
has exhausted all small primes, raises
Lip 17
.
Note that each new call to primes ()
generates a new prime enumerator.
That is, just calling primes () ()
repeatedly will just return 2 each time,
by computing a new enumerator each time, and calling it only once.
zpollardrho : Int * int -> (Int * Int) option
is an implementation of Pollard's ρ algorithm. zpollardrho
(n, k)
tries to factor n, in k iterations or less. The value k=0 is special
and is meant to denote no bound on the number of iterations.
If the algorithm succeeds, it returns SOME
(r, c), where
r is a non-trivial factor of n, c = n/r and r≤ c. If
n < 0, returns SOME
(−1, n). If the algorithm fails,
returns NONE
.
Raises Lip 12
if a bug occurred in zpollardrho
. If
this happens, this should be reported to the author
goubault@lsv.ens-cachan.fr
, see MAINTENANCE at the end of
the OPTIONS file.
ztrydiv : Int * int * int -> (int * Int) option
applies to a large integer n, and two machine integers a and
b, and computes the smallest (positive) prime divisor p of n
that is ≥ a and ≤ b. If p exists, it returns
SOME
(p, n/p), otherwise NONE
.
This works by calling primes ()
to get a function that
enumerates all small primes. All primes < a are first
discarded, then all primes between a and b are tested. This
is only intended for large n, as it does not stop as soon as it
goes past sqrt(n), and for small a, as enumerating all primes
<a takes some time for large a.
zprime : Int * int -> bool
takes a large integer m
and a machine integer k, and tests whether m is prime. This runs a
probabilistic tests for at most k+1 runs. It first uses ztrydiv
to find small factors first. If none is found, the Miller-Rabin test
is run k+1 times: when zprime
returns false
, then m
is definitely not prime, otherwise it is prime with probability at
least 1−(1/4)k+1.
zrandl : int * (int -> int) -> Int
draws an equidistributed large integer of |k| bits, where
k is the first argument. If k<0, returns opposites of such
numbers. The second argument is a pseudo-random generator
function, like irand
, which takes a bound x and returns
an equidistributed random integer in [0, x[. Using irand
is not recommended for cryptographic applications, instead
cryptographically secure pseudo-random number generators should be
used.
The second argument should preferrably be a function that does not
raise any exception, otherwise memory leaks may occur.
zrandl1 : int -> Int
is equivalent to
fn nbits => zrandl (nbits, irand)
, but faster.
zrand : Int * (int -> int) -> Int
draws an equidistributed large integer of absolute value <m, where
m is the first argument (or returns 0 if m=0), and of the same
sign as m. The second argument is a pseudo-random generator
function, like irand
, which takes a bound x and returns
an equidistributed random integer in [0, x[. Using irand
is not recommended for cryptographic applications, instead
cryptographically secure pseudo-random number generators should be
used.
The second argument should preferrably be a function that does not
raise any exception, otherwise memory leaks may occur.
zrand1 : Int -> Int
is equivalent to
fn nbits => zrand (nbits, irand)
, but faster.
zrandprime : int * int * (int -> int) -> Int
applied to k, n and f, draws random primes of n bits.
This calls f to draw random numbers, which are then tested for
primality.
More precisely, if |n|≥ 2, then this returns a random probable
prime of |n| bits, where prime testing is done by calling
zprime
with argument k; if |n|<2, then it raises Lip 18
.
If n<0, in addition, the returned prime number is congruent
to 3 modulo 4.
This works by picking odd numbers of the right size, keeping
adding two until it is probably prime; or it is too large, in
which case we pick again, and start adding again, in accordance
with NIST DSS Appendix. Using f=irand
is not recommended for cryptographic applications, instead
cryptographically secure pseudo-random number generators should be
used.
The third argument should preferrably be a function that does not
raise any exception, otherwise memory leaks may occur.
zrandprime1 : int * int -> Int
is equivalent to
fn (nbits, ntries) => zrandprime (nbits, ntries, irand)
zrandpprime : int * int * Int * (int -> int) -> Int * Int
is used to generate a probable prime number p of exactly |k|
bits such that q divides p−1, where q is given. The
arguments are k, a machine integer n (used in calling
zprime
, so that p is prime with probability ≥ 1 −
(1/4)n+1), the large integer q, and a pseudo-random
generator function f taking a bound x and returning an
equidistributed random integer in [0, x[, e.g., irand
.
Raises Lip 14
if q≤ 0, Lip 18
if failed to
generate p, typically because q already uses ≥ |k| bits.
Otherwise, returns (p, ⌊ p/q ⌋). If k<0, in
addition, p will be congruent to 3 modulo 4.
This works by generating p at random amongst |k|-bit numbers such
that q divides p−1, until p is probably prime, in accordance
with NIST DSS Appendix. This only works if n is substantially larger
than znbits
(q). Using f=irand
is not recommended for cryptographic applications, instead
cryptographically secure pseudo-random number generators should be
used.
zrandpprime
is used to generate pairs (p, q) of prime numbers such
that q divides p−1, where p and q have specified numbers of bits.
To do this, first generate q using zrandprime
, then call zrandpprime
to generate p.
The fourth argument should preferrably be a function that does not
raise any exception, otherwise memory leaks may occur.
zrandpprime1 : int * int * Int -> Int * Int
is equivalent to
fn (k, n, q) => zrandpprime (k, n, q, irand)
zrandfprime : int * int * Int * (int -> int) -> Int * Int
generates a pair of probable prime numbers p and q such that
q has |k| bits and p = qr+1, where k and r are given. The
arguments are (k, n, r, f), where n is given to zprime
to test
whether both p and q are probably prime, r is the machine integer
above, and f is a pseudo-random generator taking a bound x and
returning an equidistributed random integer in [0, x[, e.g.,
irand
. If k<0, in addition q will be congruent to 3 modulo
4. Raises Lip 18
if fails, typically if r<2 or r is odd.
Using f=irand
is not recommended for cryptographic
applications, instead cryptographically
secure pseudo-random number generators should be used.
The fourth argument should preferrably be a function that does not
raise any exception, otherwise memory leaks may occur.
zrandfprime1 : int * int * Int -> Int * Int
is equivalent to
fn (k, n, r) => zrandfprime (k, n, r, irand)
zrandgprime : int * int * bool * (int -> int) -> Int * Int * Int
takes four arguments k, n, first, and f, and returns p, q, g
where p and q random probable primes (tested with zprime
with
argument n; if k<0, in addition, q is congruent to 3 modulo 4)
such that p=2q+1 (this uses zrandfprime
), and a generator g of
the multiplicative group of all numbers prime to p less than p. If
first is true
, then g will be the smallest generator, otherwise
g will be selected at random. Raises Lip 18
if it fails.
Using f=irand
is not recommended for cryptographic
applications, instead cryptographically
secure pseudo-random number generators should be used.
The fourth argument should preferrably be a function that does not
raise any exception, otherwise memory leaks may occur.
zrandgprime1 : int * int * bool -> Int * Int * Int
is equivalent to
fn (k, n, first) => zrandfprime (k, n, first, irand)
zecm : Int * int * int * int -> (Int * bool) option
, called on
(n, ncurves, ϕ1, ntries), tries to factor the large integer n.
It returns NONE
if n was found to be probably prime after ntries
primality tests. Otherwise, it returns SOME
(f, b): if b=true
,
then f is a non-trivial factor of n; if b=false
(which should
only happen in very exceptional circumstances), then a non-trivial factor
of n can be obtained by looking at the factorization of fn−f.
This runs by using Arjen Lenstra's elliptic curve
algorithm. Up to ncurves random elliptic curves are drawn with first
phase bound ϕ1 (this increases by 1.02 at each new elliptic curve).
This may raise Lip 19
if zecm
fails to reach any conclusion,
or if n is too small (on 32-bit machines, n≤ 215=32768; in general,
n > sqrt(max_int) is never too small: in these cases, use ztrydiv
first).
Raises Lip 15
if a bug occurred in zecm
. In case this
happens, this should be reported to the author
goubault@lsv.ens-cachan.fr
, see MAINTENANCE at the end of the
OPTIONS file.
-
print : 'a outstream -> dynamic -> unit
prints the value stored in the
dynamic on the specified output stream, in the
same format that is used by the toplevel loop.
The type is not printed, though. Strings are printed
with quotes, and with control characters escaped. If it is
desired to print strings merely as the sequence of their
characters, apply the put
method of the stream. Try:
#put stdout "Hello, world\n"
. (And don't forget to
#flush stdout ()
afterwards to see your result printed.)
pretty : 'a outstream -> dynamic -> unit
pretty-prints
the value stored in the dynamic on the specified
output stream, as done by the toplevel loop. The
type is not printed, and the right margin is the system default
(usually, 80).
stdout : |[put:string -> unit,flush:unit -> unit]|
is the standard output stream, that
prints on the console if not redirected. This stream is buffered,
the flush
method flushes the buffer.
stderr : |[put:string -> unit,flush:unit -> unit]|
is the standard error stream; it prints on the
console if not redirected. This stream is buffered, the flush
method flushes the buffer (on most operating systems, an end-of-line also
flushes the buffer).
outfile : string -> |[put:string -> unit,
flush:unit -> unit,
seek:int -> unit,
advance:int -> unit,
seekend:int -> unit,
tell:unit -> int,
truncate:unit -> unit,
close:unit -> unit]|
opens the file whose name is
given, creating it if it does not exist, reinitializing it to zero
length, and returns an output stream. If the file could not be
opened, the IO
exception is raised, applied to the error
code.
The stream is buffered, the flush
method flushes the buffer.
appendfile : string -> |[put:string -> unit,
flush:unit -> unit,
seek:int -> unit,
advance:int -> unit,
seekend:int -> unit,
tell:unit -> int,
truncate:unit -> unit,
close:unit -> unit]|
opens the file whose name is
given, creating it if it does not exist, setting the current
position to the end of file, and returns an output stream. If the
file could not be opened, the IO
exception is raised, applied
to the error code.
The stream is buffered, the flush
method flushes the buffer.
outprocess : string * string list -> |[put:string -> unit,
flush:unit -> unit,
kill:unit -> unit]|
creates a process which will execute the shell command given in
argument in parallel with the current HimML process. This shell
command is given in the form (command name, arguments). The
command name is searched for, using the PATH
shell variable,
with arguments as given.
Strings can be sent to the standard input of this process by using
the put
method (followed by flush
to really send
the message), and the process can be terminated by calling the
kill
method.
Contrarily to files, which can be closed and then revived when
necessary, a killed process cannot be revived, and an IO
exception will be raised when attempting to write to a dead process.
If the process could not be created, then an IO
exception is
raised (normally, IO 2
, “no such file or directory”).
The kill
method raises an IO
exception when the process
exited with a non-zero exit code (as returned by the C function
exit
). Then, if n is this code, IO
n is raised.
(To be fair, n is first taken modulo 256, and only if the result
is non-zero is the exception raised, with n mod 256 as argument.)
The child process can exit by itself, and this can be detected by
the fact that put
ting a string to the child (then
flush
ing, to be sure that the message has really been sent)
will raise an IO
error (normally, IO 32
, “broken
pipe”). It is then good policy to kill
the process, as it
allows the operating system to reclaim process structures allocated
for the child (at least on Unix, where this is necessary).
stdin : |[get:int -> string,getline:unit -> string]|
is the standard input stream, that reads
from the console if not redirected. This stream is usually
buffered, so that characters cannot be read until a newline
character \n
is typed as input.
infile : string -> |[get:int -> string,
getline:unit -> string,
seek:int -> unit,
advance:int -> unit,
seekend:int -> unit,
tell:unit -> int,
close:unit -> unit]|
opens the file whose name is
given, and returns an input stream. If the file could not be opened,
the IO
exception is raised, applied to the error code. The
stream is buffered for speed, but no flushing method should be
necessary.
inprocess : string * string list -> |[get:int -> string,
getline:unit -> string,
kill:unit -> unit]|
creates a process which will execute the shell command given in
argument in parallel with the current HimML process. This shell
command is given in the form (command name, arguments). The
command name is searched for, using the PATH
shell variable,
with arguments as given.
All text that is printed on the parallel process's standard output can be
read by the HimML process by using the get
and getline
methods, and the process can be terminated by calling the kill
method.
Contrarily to files, which can be closed and then revived when
necessary, a killed process cannot be revived, and an IO
exception will be raised when attempting to read from a dead process.
If the process could not be created, then an IO
exception is
raised (normally, IO 2
, “no such file or directory”).
The kill
method raises an IO
exception when the process
exited with a non-zero exit code (as returned by the C function
exit
). Then, if n is this code, IO
n is raised.
(To be fair, n is first taken modulo 256, and only if the result
is non-zero is the exception raised, with n mod 256 as argument.)
The child process can exit by itself, and this can be detected by
the fact that the get
and getline
methods will all
return empty strings (end of file, at least after the internal
buffer is emptied). It is then good policy to kill
the process,
as it allows the operating system to reclaim process structures
allocated for the child (at least on Unix, where this is necessary).
instring : string -> |[get:int -> string,
getline:unit -> string,
seek:int -> unit,
advance:int -> unit,
seekend:int -> unit,
tell:unit -> int]|
opens a stream for reading on the string given as argument
. This is a souped up version of the
input stream example at the beginning of the section.
outstring : string -> |[put:string -> unit,
seek:int -> unit,
advance:int -> unit,
seekend:int -> unit,
tell:unit -> int,
truncate:unit -> unit,
convert:unit -> string]|
opens a stream to write on, as if it were a file. The convert
method is used to get the current contents of the stream in the form
of a string. This is useful notably to print data to a
string. The stream is initialized with
the string given as argument to outstring
.
This is a souped up version of the output stream example at the
beginning of the section.
inoutprocess : string * string list -> |[get:int -> string,
getline:unit -> string,
put:string -> int,
flush:unit -> unit,
kill:unit -> unit]|
creates a process which will execute the shell command given in
argument in parallel with the current HimML process. This shell
command is given in the form (command name, arguments). The
command name is searched for, using the PATH
shell variable,
with arguments as given.
All text that is printed on the parallel process's standard output
can be read by the HimML process by using the get
and getline
methods; moreover, HimML can sent data, as text, by writing to its standard
input with the put
method, while flush
empties the output
buffers to really send the text to the process; and the latter can be
terminated by calling the kill
method.
Contrarily to files, which can be closed and then revived when
necessary, a killed process cannot be revived, and an IO
exception will be raised when attempting to read from a dead process.
For other remarks, see items inprocess
and outprocess
.
delete : string -> unit
deletes the file
whose name is given in argument. If any I/O error occurs, then the
exception IO
is raised, applied to the corresponding error
code. In particular, if the argument is the name of a directory,
rmdir
should be used instead.
rename : string * string -> unit
renames
the file whose name is given as first argument to the name given
as second argument. If any I/O error occurs, then the exception
IO
is raised, applied to the corresponding error code. Note
that the capabilities of rename
vary greatly from system to
system. For example, rename
can move files from any place
to any other place on BSD Unix; this is restricted on Unix System
V, Amiga and Mac systems to move files only inside the same volume
(file system).
filetype : string -> string set
takes
the name of a file and returns a set of properties of this file as
character strings. If the file does not exist or any other I/O
error occurs, then the exception IO
is raised, applied to
the corresponding error code. Otherwise, the following strings are
used for properties:
-
"a"
means the file is an archive (Amiga only);
"b"
means this is a block special file (Unix only);
"c"
means this is a character special file (Unix only);
"d"
means the file is a directory;
"e"
means the file is erasable, either by delete
or by rmdir
(by the owner, on Unix systems; by all, on other
systems); on Unix, "eg"
means the file is writable by users
in the same group, "eo"
means the file is writable by all
other users.
"f"
means this is a fifo, a.k.a. a pipe (Unix only);
"g"
means the file has the setgid bit set (Unix only);
"l"
means the file is a symbolic link (Unix only);
"n"
means this is a regular (normal) file;
"p"
means the file is pure (Amiga only);
"r"
means the file is readable (by the owner, on Unix
systems; by all, on other systems); on Unix, "rg"
means the
file is readable by users in the same group, "ro"
means the
file is readable by all other users.
"s"
means the file is a script file (Amiga only);
"S"
means the file is a socket (Unix only);
"u"
means the file has the setuid bit set (Unix only);
"v"
means the file has the “save swapped text after
use” option (Unix only, should be obsolete);
"w"
means the file is writable (by the owner, on Unix
systems; by all, on other systems); on Unix systems, if a file is
writable, it is also erasable; on Unix again, "wg"
means the
file is writable by users in the same group, "wo"
means the
file is writable by all other users.
"x"
means the file is executable (by the owner, on Unix
systems; by all, on other systems); on Unix again, "xg"
means
the file is writable by users in the same group, "xo"
means
the file is writable by all other users.
dir : string -> string set
takes a path
as input, which should be a correct path name for a directory, and
returns the set of all file names (except .
and ..
on Unix systems) inside this directory. If any I/O error occurs,
then the exception IO
is raised, applied to the
corresponding error code.
cd : string -> unit
changed the current directory
to the one specified by the path given as input. If any I/O error
occurs, then the exception IO
is raised, applied to the
corresponding error code.
pwd : unit -> string
returns a name for the
current directory, as changed by cd
. If any I/O error
occurs, then the exception IO
is raised, applied to the
corresponding error code. This may notably be the case if the path
name is too long.
mkdir : string -> unit
creates a new directory
by the name given in argument. If any I/O error occurs, then the
exception IO
is raised, applied to the corresponding error
code.
rmdir : string -> unit
creates a new directory
by the name given in argument. If any I/O error occurs, then the
exception IO
is raised, applied to the corresponding error
code. In particular, on most operating systems, rmdir
can
be used on empty directories only. To delete files, use
delete
.
system : string -> int
issues the shell
command given as argument. On Unix systems, this calls the
system(3S)
function, which calls a sh
-compatible
shell to execute the command. system
launches the
command, waits for it to return, and returns the exit code
of the command (0 if all went well). If an error occurs
(not enough memory, not enough processes available, command
interrupted by a signal, typically), then an IO
exception
is raised.
getenv : string -> string option
reads
the value of the environment variable whose name is given, and
returns NONE
if it has not been defined, and SOME
of its value, otherwise.
args : unit -> string list
returns the list of command-line
options given after the --
switch on HimML's command-line.
iomsg : int -> string
translates the I/O error
code in argument (as returned as argument of a IO
exception)
into a string in human-readable form. This is the same message as
the one printed by the C function perror()
on Unix systems.
leftmargin : int ref
defines the left
margin for printing, as a number of spaces to print at the
beginning of a new line when pretty-printing. It is ref 0
by default.
rightmargin : int ref
defines the
right margin for printing, as a number of columns (counted as
characters) from the beginning of the line where a new line has to
be forced by the pretty-printing functions. It is ref 80
by
default.
maxprintlines : int ref
defines
the limit on the number of lines printed by pretty-printing
functions, mainly to avoid looping while printing infinite
structures, or to avoid printing structures of humongous sizes
fully. This is also valid for printing values defined on the
toplevel, since pretty
is used for this purpose. The
default value is ref 100
. To suppress the limit in
practice, write maxprintlines := max_int
.
numformat : string ref
is a reference
to the string that is used to format floating-point values for
printing. Any C-style format for printing doubles may be used,
i.e. it is
%
[-
∣+
∣
∣#
]*[0−9]*(\ .[0−9]+)?[feEgG
], where -
forces left adjustment, +
forces a sign in front of the value, a blank puts a space instead of a
plus sign in front of the value, a hash sign (#
) forces a radix
character (i.e, a dot in general; besides, trailing zeroes are not
removed in the case of g
and G
conversions); the
following optional decimal number specifies the minimum field width
(with the left adjustment flag -
, the number is padded on the
right if it is not large enough); if this is followed by a dot, and a
decimal number, this specifies the number of digits to appear after
the radix character for e
and f
conversions, the maximum
number of significant digits for the g
conversion. The possible
conversion characters are: f
prints in the format [ ] ddd.ddd
with 6 digits by default (this can be modified by precision
specifications; 0 says not to output the radix character); e
or
E
print in the format [ ] d.dde[+ ]dd (or with E instead of e,
respectively), with one digit before the radix character, and 6 digits
by default; g
or G
choose between f
and e
(resp. E
) conversions: the latter is chosen if the exponent is
less than −4 or greater than or equal to the precision (by default,
6).
The default is "%G"
.
-
it : unit
is a special variable containing the last
expression evaluated at
toplevel. It is implicitly redefined every time
an expression (not a declaration) is input at
toplevel. Indeed, an expression e at
toplevel is conceptually equivalent to writing
the declaration:
val it =
e
features : |[OS : string,
reftyping : string,
continuations : string set,
numbers : string set,
structures : string set,
profiling : string set,
debugging : string set,
floatformat : string,
version : int * int * string * int,
recompiled : string,
started : string,
maintenance : string,
creator : string]|
(may be extended in future versions) is a description of the
features in the implementation and the
session. Its use is informative, but it may also serve for
conditional compilation. The fields currently defined are:
-
OS
defines the operating system
for which this HimML compiler was compiled; it may be
"Mac"
, "Amiga"
, "Unix System V"
or
"Unix BSD"
for now. Currently, The Mac port is lagging
behind, due to the demise of my old Mac II. A port for
PCs should be possible, though there seems to be some
major hurdles to overcome (mainly because of segmenting schemes).
reftyping
defines the scheme used to handle types of
imperative constructs in HimML. Its
value may be "imperative tyvars"
, meaning the Standard
ML typing scheme [6], "weak tyvars"
, meaning the
Standard ML of New Jersey
typing scheme with weak type variables (assumed throughout this
document), or "effect tyvars"
, meaning typing with more precise effect
descriptions[14], or yet another, not yet
implemented.
Frankly, the latter was never fully implemented, and while the
weak tyvars discipline remained the default for years, and was
buggy and not really required. Today, HimML uses the imperative
tyvars discipline.
continuations
defines the style of continuation reifying
mechanism. It is a set of strings, which may contain
"callcc"
(callcc
reifies the whole functional state
of evaluation, i.e. the whole stack) and "catch"
(catch
does the same as callcc
, but the continuation
it reifies is valid only
while we are still evaluating the catch
call).
numbers
is a set of strings defining the styles of numerical
objects we may handle: it may contain "complex floating point"
, which defines complex floating
point values with attached
dimensions and scales, and also "big rationals"
which
defines infinite precision rational numbers. Only the first of these options has been implemented.
structures
defines the structuring mechanism used in HimML
(modules). Its value is a
set of strings that may contain "Standard ML structures"
, indicating the structure system
described in [6] and [9], and "separate compilation a la CaML"
, indicating we have a separate compilation
module system a la CaML. The first (not yet implemented) provides
a sound way of structuting name spaces for types and values. The
second only provides separate compilation facilities. It is
described in Section 5.6.
profiling
defines the profiling
possibilities. It is a set that may contain:
-
"call counts"
, meaning that the number of times each function
is called is tallied,
"proper times"
, meaning that a statistical estimation of the
time spent in each function (excluding any other profiled function it
has called) is computed,
"total times"
(same, but including time spent in every called
function),
"proper allocations"
and "total allocations"
. The
last two options mean that memory consumption is recorded, as well, but
have not been implemented yet. See Section 5.4 for a
detailed description of the profiler.
debugging
defines the debugging
support. It may contain:
-
"standard"
, which means we can set
breakpoints and read values at breakpoints, and we
can consult the call stack;
- and
"replaying debugger"
, if the
debugger may reverse execution up to
specified control points to show the origins of a bug. Only the last
option has not been implemented in HimML. (See Section 5.3
for a description of the debugger.)
floatformat
defines
the format of floating point
numbers. This is important since certain numerical features are
not provided in certain modes, and since the management of
rounding may differ with a format or another. The
floatformat
may be "IEEE 754"
or
"unknown"
.
version
defines the current version
of HimML. It is composed of:
-
a major version (currently 1), incremented each time a
fundamental decision is made by the creator of the language which
affects deeply the way we program in it (deletions or modifications
of values or constructions in the language may qualify if they are
important enough)
- a minor version (currently 0), incremented for any minor
revision (addition of functionalities qualify; deletion or
modification of minor functions qualify too)
- the code status, a string which may be:
-
"alpha"
- , if only the creator (or close friends)
have access to the implementation;
- "beta"
- , if released for beta-testing to a group
of people;
- "gamma"
- , if deemed enough stable and bug-free to
be used by final users;
- "release"
- , if distributable.
- the revision (currently 11), incremented each time a bug
is corrected or something is made better inside the language,
without changing its overall behavior.
recompiled
is the last date of recompilation of
this revision of HimML.
started
is the date the current session was started. This may be used inside a HimML program to see if it
has been restarted from disk or if it has just been transferred to
another session.
maintenance
is a description of the person or service
the user should contact if there is any problem with
HimML.
creator
is the name of the creator
of the language, that is, "Jean Goubault"
.
times : unit -> time * time
computes the user time
(first component of the returned couple) and the system time
(second component). On Unix systems, this time takes its origin
at the moment the current HimML process was launched. On other
systems, the origin may be the time at which the machine was last
booted. These times are in seconds, the default
scale of dimension time
, defined as:
dimension time(s)
force : 'a promise -> 'a
forces
evaluation of a promise (created with the
delay
construct), and returns the result. A delayed
expression is evaluated at most once: the result
of forcing is memoized.
callcc : ('1a cont -> '1a) -> '1a
reifies the current continuation,
and passes it to the function in argument. This continuation
includes a copy of the current stack of exception
handlers, so that triggering a
continuation reinstalls the exception handlers in use when
reifying the continuation: this
is the reason why imperative types are used.
The extent of the continuation is indefinite, i.e. it can be thrown at
any time, even after the call to callcc
is completed.
catch : ('1a cont -> '1a) -> '1a
reifies the current continuation,
and passes it to the function in argument. This continuation
includes a copy of the current stack of exception
handlers, so that triggering a
continuations reinstalls the exception handlers in use when
reifying
the continuation: this
is the reason why imperative types are used.
Contrarily to callcc
, the extent of the continuation is not
indefinite, that is, throwing the reified continuation is valid only
until the call to catch
is completed. Afterwards, any attempt
to throw the continuation may raise the exception Catch
. (This
is not systematic, however.)
catch
is the basic mechanism for implementing
threads, a.k.a. coroutines,
i.e. separate computations sharing the same process space, and which
can be suspended or resumed at will. The current implementation
actually builds catch
as basically a thread mechanism, where
the process stack is split up in several different stack zones, called
thread caches, and thread switching is implemented as changing caches.
callcc
is implemented in the same way, except that thread
caches are copied back to the heap for possible future reuse. Thread
caches may also be copied to the heap to make room for a new thread
when there is no remaining available cache in catch
or
callcc
, and they are copied back from the heap when executing
throw
throw : 'a cont -> 'a -> 'b
reactivates a continuation, by
passing it the return value of the continuation.
gc : unit -> unit
forces
a garbage collection—not necessarily
a major one—to take place. This is usually not needed. It may
be used in alpha status to scramble the heap and see if it incurs
any nasty memory management bug in the HimML run-time. It may also
be used prior to doing some benchmarks, but as of yet, there is no
guarantee that the garbage collection will indeed collect every
free objects (this does not force a major collection).
major_gc : string -> unit
forces a major garbage collection. This can be used in theory to get back some memory
that you know can be freed right away, but it takes time. Its
main use is in conjunction with the -gctrace
<file-name>
command-line option: then each garbage collection
will write some statistics to <file-name>, which can be used
to detect space leaks.
Typically, when you wish to see how much memory is allocated or
freed in a call to some function f
, call major_gc "before f"
before calling f
, and major_gc "after f"
afterwards. Then consult <file-name>. There should be some
information on the status of memory before f
, beginning
with a line of the form:
==[before f]=========================
then sometime later another block of information on the status of
memory after the call to f
, beginning with a line of the
form:
==[after f]=========================
abort : unit -> 'a
aborts the current
computation, and returns to
toplevel without completing the
current toplevel declaration (actually, it
invokes the toplevel continuation).
This is the function that is internally used when syntax errors,
type errors or uncaught exceptions are triggered. Following the
definition of Standard ML, the current
toplevel declaration is abandoned, so that its
effect are nil, except possibly on the store.
quit : int -> 'a
aborts all computations, and returns to the operating
system with the return code in argument.
stamp_world : unit -> unit
internally records
a checksum and a copy of the current HimML environment. All
modules (see Section 5.6) are compiled in the latest
stamped copy of the HimML environment, and are saved to disk with
the corresponding checksum. This way, when a module is reloaded,
its checksum is compared to the checksums of all stamped
environments in memory, and an error is triggered if no
environment exists with the same checksum (meaning that we cannot
safely use the module, either because it needed a richer or
incompatible environment, or because the version of HimML has
changed too much). The environment is automatically stamped just
before HimML starts up.
system_less : ''a * ''a -> bool
is
the system order. It returns true
if the first argument is
less than the second in the system order. You may use this ordering as a total
ordering on all data of equality admitting types.
Another total ordering on equality admitting types is
fun system_less_2 (x,y) =
case {x,y} of {z,...} => z=x
but this is slower. It is also a different order.
inline_limit : int ref
is a reference to a limit
on the size of functions that the compiler will accept to inline.
Its default value is 30. (Units are unspecified, but are roughly
the numbers of basic operations and of variables in the function.)
In version 1.0 alpha 17, there is no compiler yet, but one that
produces C source code to feed to your favorite C compiler is in
the works.