mm(1) FreeBSD General Commands Manual mm(1)
NAME
MiniMunger -- Language for writing text-processing filters
SYNOPSIS
mm.munger <source-file>
DESCRIPTION
MiniMunger is a properly tail-recursive, compilable subset of Munger(1)
with first-class continuations, but without first-class lists,
first-class symbols, local side-effects, macros, "eval", "extend", or
runtime error-checking. MM is specialized for, and limited to, writing
filters.
This manual page describes only the differences between Munger and MM.
For more information, see the Munger manual page.
Some example programs are included in the source distribution:
grep.mm is an egrep-like filter.
fmt.mm is a fmt-like filter.
options.mm helps process command-line arguments.
stacks.mm provides functions to apply functions to the elements
of stacks.
cgi.mm provides functions to simplify CGI scripts.
IMPLEMENTATION NOTES
The MM compiler is a whole-program compiler, reading one input file, and
producing two output files of C code, which may then be compiled by the C
compiler and linked with the MM object file to produce an executable.
Instructions for using the compiler follow this section.
o MM does not support lists nor any list-related functions. Programs
are written as S-expressions, but programs may not create S-expres-
sions. The standard aggregate type of MM is the stack, a dynami-
cally-resizable, one-dimensional array.
o Side-effects are only permissible on globals.
o The first-class data types supported are: stacks, tables, records,
closures, continuations, compiled regular expressions, 8-bit-clean
strings and fixnums. A fixnum is 1 bit less in size than the size of
a C int on the hardware on which MM is running.
o For maximum execution speed, the MM runtime does no type-checking.
If you call an intrinsic with an argument of the wrong type, your
program will crash.
o Although lambda-expressions can be bound to variables in "let" and
"letn" forms, there is no "labels" nor "letf" to allow temporary
functions to see their own bindings. Any function which calls itself
must have a toplevel binding. MM forces the programmer to break out
all but the most trivial helper functions, into separately-defined
functions.
o There are no looping constructs. All iteration is done via recur-
sion. CPS conversion is performed during compilation, turning all
calls into tailcalls. Despite this, tail-recursion will be more
space-efficient than recursion from non-tail positions, because the
CPS-converted code of functions which recurse in non-tail positions
will create closures to capture state. The stack won't grow, but the
heap will.
o First-class continuations are captured with the "call_cc" intrinsic
which behaves like "call/cc" does in Scheme.
o User-defined functions have fixed-size argument lists.
o User-defined macros are not supported.
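The note above on tail and non-tail recursion can be illustrated outside
MM. The following Python sketch is not compiler output. It hand-converts
two factorial functions to continuation-passing style to show why a
recursive call in a non-tail position allocates one closure per level,
while a recursive call in tail position passes the same continuation
along. All names in it (fact_nontail_cps, fact_tail_cps, stats) are
hypothetical.

```python
# Illustrative model only: hand-written CPS, not what the MM compiler emits.

stats = {"closures": 0}  # counts continuation closures allocated


def fact_nontail_cps(n, k):
    """CPS form of fact(n) = n * fact(n - 1). The multiplication happens
    after the recursive call, so each level captures n in a new closure."""
    if n == 0:
        return k(1)
    stats["closures"] += 1
    return fact_nontail_cps(n - 1, lambda result: k(n * result))


def fact_tail_cps(n, acc, k):
    """CPS form of the accumulator version. The recursive call is already
    a tail call, so the continuation k is passed along unchanged."""
    if n == 0:
        return k(acc)
    return fact_tail_cps(n - 1, acc * n, k)


def identity(value):
    return value


nontail_result = fact_nontail_cps(5, identity)
nontail_closures = stats["closures"]

stats["closures"] = 0
tail_result = fact_tail_cps(5, 1, identity)
tail_closures = stats["closures"]
```

Both versions compute the same value, but only the first allocates
closures in proportion to the recursion depth. Python itself does not
eliminate tail calls, so the sketch models only the heap behavior
described above, not the constant stack depth.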
COMPILING MM PROGRAMS
MM depends upon the libtre regular expression library, which must be
installed before you can compile MM programs. If you are using the ports
system, libtre resides in textproc/libtre. The following example
assumes PREFIX in your build of libtre was /usr/local.
The MM compiler is written in Munger, and compiles MM code to an interme-
diate language, defined as a set of macros in the source of the C run-
time. Most of the macros expand to in-line code for speed, resulting in
larger executables than might be expected from the size of the original
MM programs. The compiler assumes munger is in /usr/local/bin. To
change this location, edit the shebang line at the top of mm.munger.
To compile a MM program the compiler must be invoked on the main source
file:
% ./mm.munger ./grep.mm
The compiler will take some time to perform source-to-source conversions
before it begins to emit code, printing status messages as it does so.
When it has finished, two files will have been created, one named
"functions.c" and one named "functions.h". To create an executable from
these files, one must invoke the C compiler on them and the MM object
files.
% make
% cc -o grep functions.c runtime.o -I/usr/local/include -L/usr/local/lib -ltre
The main source file of a program may include other source files with the
"include" directive. The "include" directive resembles its similarly-named
C preprocessor counterpart, and consists of the word "include" preceded by
an octothorpe (#) and followed by a filename delimited by double quotes.
For example:
#include "options.mm"
If the filename itself contains double quotes, they do not need to be
escaped. Include directives must start in column zero to be recognized.
Otherwise, they will be treated as comments. Included files themselves
may also "include" other source files.
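The recognition rules for include directives can be sketched as a small
Python function. This is a model of the rules as stated above, not the
compiler's own code. It assumes exactly one space between "#include" and
the opening quote, and it assumes that the last double quote on the line
closes the filename, which is one way unescaped quotes inside a filename
could work. The name parse_include is hypothetical.

```python
def parse_include(line):
    """Return the included filename, or None if the line is not a directive."""
    prefix = '#include "'
    # The directive must begin in column zero; any leading whitespace
    # means the line is not recognized as a directive.
    if not line.startswith(prefix):
        return None
    rest = line[len(prefix):].rstrip("\n")
    end = rest.rfind('"')  # assumption: the LAST quote closes the filename
    if end == -1:
        return None
    return rest[:end]


plain = parse_include('#include "options.mm"')
indented = parse_include(' #include "options.mm"')
quoted = parse_include('#include "my "quoted" file.mm"')
```

Here plain is the filename options.mm, indented is None because the
directive does not start in column zero, and quoted keeps its embedded
double quotes.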
DEBUGGING MM PROGRAMS
To debug MM programs with gdb, you must create a debugging copy of the
runtime. You can build the debugging runtime by invoking:
make debug
in the Munger source directory. The current compiler will not detect two
types of errors. The first type of error is the referencing of a vari-
able before it has been initialized with "setq". The reference will
cause a global variable to be automatically created and initialized to a
NULL pointer internally in your program. Attempting to access the NULL
pointer will crash your program, as the runtime engine performs no type-
checking for maximum speed. There are comments placed in the file func-
tions.c, output by the compiler, beside each variable reference, naming
the accessed variable, which will help the programmer find and fix errors
using gdb. The second type of error is the calling of an intrinsic
function with arguments of the wrong type. This error will also cause
your program to crash.
When debugging, the source displayed will consist mostly of the C macros
which define the intermediate language output by the compiler. The pro-
grammer may find it useful to run the C preprocessor over functions.c
first, separately, to generate an expanded source file, and then compile
that, in order to see the actual C being executed while tracing programs.
cc -E -o grep.c functions.c
cc -ggdb -O0 -o grep grep.c runtime.o -I/usr/local/include -L/usr/local/lib -ltre
THE INTRINSICS
The MM intrinsics bear a strong resemblance to their similarly-named Munger
counterparts. Some behave differently. Some accept a differing number
of arguments. Some accept differing types of arguments. Some have dif-
ferent names. The differences, in all cases, however, are minor. This
summary does not completely document the operation of the intrinsic func-
tions, but merely lists which are available and how they differ from
their Munger counterparts. For complete documentation of an intrinsic,
see the Munger(1) manual page.
Control Flow / Side-Effects
The empty string and 0 are boolean false values. All other objects are
considered boolean true values. The forms below function identically to
their Munger counterparts, with the exception of the conditionals. Note
that "setq" is the only means of accomplishing side-effects on variables,
and that side-effects are only permissible upon globals.
When "if" is invoked with only a "true" subsequent clause, and the test
condition evaluates to a false value, 0 is returned, and not the value of
the failed test condition. Similarly, if all test clauses of an invoca-
tion of "cond" fail, then 0 is returned, rather than the value of the
last failed test condition. Both "when" and "unless" also return 0 if
their test conditions fail.
Form    Use
setq    (setq symbol expr)
if      (if test expr1 expr2 ...)
cond    (cond (test_expr subsequent ...)+)
when    (when test expr ...)
unless  (unless test expr ...)
progn   (progn expr ...)
eq      (eq expr1 expr2)
or      (or expr ...)
and     (and expr ...)
not     (not expr)
let     (let ((symbol expr)+) expr+)
letn    (letn ((symbol expr)+) expr+)
exit    (exit expr)
die     (die ...)
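The truth rules and the return values of failed conditionals described
above can be modeled in a few lines of Python. This is a sketch of the
documented behavior only. The helper names (is_true, mm_if, mm_when,
mm_unless) are hypothetical, and the branches are passed as functions so
that only the chosen branch is evaluated.

```python
def is_true(value):
    """Only the empty string and 0 are false; every other object is true."""
    return not (value == "" or value == 0)


def mm_if(test, then_branch, else_branch=None):
    if is_true(test):
        return then_branch()
    if else_branch is None:
        return 0  # 0 is returned, not the value of the failed test
    return else_branch()


def mm_when(test, body):
    return body() if is_true(test) else 0


def mm_unless(test, body):
    return 0 if is_true(test) else body()


failed_if = mm_if("", lambda: "yes")      # test is the empty string
passed_if = mm_if("x", lambda: "yes")
failed_when = mm_when(0, lambda: "ran")
passed_unless = mm_unless("", lambda: 7)
empty_stack_is_true = is_true([])         # a list stands in for a stack
```

Note that failed_if is 0 even though the failed test value was the empty
string.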
call_cc is used to capture the current continuation. It functions
exactly as call/cc does in Scheme:
call_cc (call_cc monadic_function)
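Because the compiler converts programs to continuation-passing style,
capturing a continuation amounts to handing the current continuation to
the called function as an ordinary value. The following Python sketch
models that idea by hand. It is not the MM runtime's implementation, and
the names call_cc, product, and stats are hypothetical. The example
multiplies the items of a list and uses the captured continuation to
abandon the pending multiplications as soon as a zero is seen.

```python
stats = {"multiplications": 0}


def call_cc(function, k):
    """CPS model of call_cc: pass `function` a callable copy of k."""
    def reified(value, _discarded_k):
        # Invoking the captured continuation abandons whatever was
        # pending and resumes the computation waiting on call_cc.
        return k(value)
    return function(reified, k)


def product(items, k):
    def body(escape, body_k):
        def loop(index, loop_k):
            if index == len(items):
                return loop_k(1)
            if items[index] == 0:
                return escape(0, loop_k)  # skip all pending multiplications

            def after(rest):
                stats["multiplications"] += 1
                return loop_k(items[index] * rest)
            return loop(index + 1, after)
        return loop(0, body_k)
    return call_cc(body, k)


def identity(value):
    return value


full = product([2, 3, 4], identity)
full_multiplications = stats["multiplications"]

stats["multiplications"] = 0
short = product([2, 0, 4], identity)
short_multiplications = stats["multiplications"]
```

In the second call no multiplication is performed at all, because the
escape bypasses the closures that were waiting to multiply.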
Regular Expressions
Note that "regexpp" in Munger is "regexp" in MM.
Intrinsic   Use                          Return Value
regcomp     (regcomp str)                compiled rx
match       (match rx str)               0 or stack of 2 fixnums
matches     (matches rx str)             stack of 20 strings
substitute  (substitute rx rep str cnt)  string
regexp      (regexp expr)                0 or 1
Tables
Note that the "hash" and "unhash" intrinsics of Munger become "associate"
and "dissociate", and that both return the affected table.
Intrinsic   Use                            Return Value
table       (table)                        new table
tablep      (tablep expr)                  0 or 1
associate   (associate table expr1 expr2)  table
dissociate  (dissociate table expr1)       table
lookup      (lookup table expr)            associated expr
keys        (keys table)                   stack of keys
values      (values table)                 stack of values
Stacks
Note that the "unshift", "push", and "store", intrinsics all return the
affected stack instead of their second arguments.
The "exec_stack" and "join_stack" intrinsics both take a stack of strings
as argument.
"exec_stack" treats the first element as a program to be fed to the
execvp() system call, and the remaining elements as the arguments to that
program. Note that the MM runtime will automatically ensure that the
first element is also included in the argument list, in order to adhere
to the UNIX convention that the first argument to a program be the name
under which it was invoked.
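The argument convention described above for "exec_stack" can be sketched
in Python. The function below only builds the pair of values that a call
such as os.execvp would receive; it does not replace the running
process. It is a model of the stated behavior, and the name
execvp_arguments is hypothetical.

```python
def execvp_arguments(stack):
    """Return the (file, argv) pair that an execvp call would be given."""
    program = stack[0]
    # argv[0] repeats the program name, per the UNIX convention.
    argv = [program] + list(stack[1:])
    return program, argv


program, argv = execvp_arguments(["ls", "-l", "/tmp"])
```

Passing program and argv to os.execvp(program, argv) would then run the
program in place of the current process.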
"join_stack" joins a stack of strings together, treating the first ele-
ment of the stack as a delimiter to place in between each of the other
elements in the string being composed.
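As a sketch of the behavior just described, with a Python list standing
in for a stack of strings, the function below treats the first element as
the delimiter and joins the rest. The Python function is a model only,
not the runtime's code.

```python
def join_stack(stack):
    """Join stack[1:] using stack[0] as the delimiter."""
    delimiter, pieces = stack[0], stack[1:]
    return delimiter.join(pieces)


joined = join_stack([", ", "a", "b", "c"])
single = join_stack([":", "x"])
```

With a single element after the delimiter, no delimiter appears in the
result.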
The "append" intrinsic appends one or more stacks into a single stack.
The function creates a new stack and fills it with all the members of all
its arguments, in order. The "substack" intrinsic returns a contiguous
subset of the elements of a stack, as a new stack. The first argument
must evaluate to a stack, while the second and third arguments must
evaluate to numbers specifying the range of indices to be included in the
substack.
Intrinsic     Use                         Return Value
stack         (stack)                     new stack
shift         (shift stack)               item at index 0
unshift       (unshift stack expr)        stack
push          (push stack expr)           stack
pop           (pop stack)                 item at topidx
assign        (assign stack ...)          stack
append        (append stack ...)          new stack
substack      (substack stack expr expr)  new stack
index         (index stack expr)          item at index expr
store         (store stack fixnum expr)   stack
clear         (clear stack)               stack
used          (used stack)                stored item count
sort_numbers  (sort_numbers stack)        stack (sorted in situ)
sort_strings  (sort_strings stack)        stack (sorted in situ)
topidx        (topidx stack)              index of top item
exec_stack    (exec_stack stack)          does not return
join_stack    (join_stack stack)          string
stackp        (stackp expr)               0 or 1
Records
Intrinsic  Use                           Return Value
record     (record n)                    new record of size n
setfield   (setfield expr1 expr2 expr3)  expr3
getfield   (getfield expr1 expr2)        item in pos expr2
Fixnums
Each of these functions accepts only TWO arguments, unlike their Munger
counterparts. Note that "=" is actually a synonym for "eq".
Intrinsic  Use               Return Value
=          (= expr1 expr2)   0 or 1
<          (< expr1 expr2)   0 or 1
<=         (<= expr1 expr2)  0 or 1
>          (> expr1 expr2)   0 or 1
>=         (>= expr1 expr2)  0 or 1
+          (+ expr1 expr2)   sum
-          (- expr1 expr2)   difference
*          (* expr1 expr2)   product
%          (% expr1 expr2)   remainder
/          (/ expr1 expr2)   quotient
abs        (abs expr)        absolute value
Note that "stringify" accepts only one argument, which must evaluate to a
fixnum.
Intrinsic  Use               Return Value
stringify  (stringify expr)  string representation of expr
numberp    (numberp expr)    0 or 1
char       (char expr)       one-character string
I/O
These are the general I/O functions. "print" and "println" are compiler
macros which insert code to call "display" for each of their arguments.
Note that both "getline" and "readchars" return the empty string, instead
of 0, upon encountering EOF. "display_error" and "newline_error" send
their output to the standard error.
Intrinsic      Use                   Return Value
print          (print expr ...)      value of last expr
println        (println expr ...)    1
display        (display expr)        value of expr
display_error  (display_error expr)  value of expr
newline        (newline)             1
newline_error  (newline_error)       1
die            (die ...)             does not return
warn           (warn expr ...)       value of last expr
getline        (getline)             string
readchars      (readchars expr)      string
These are the intrinsics redirecting the standard descriptors onto files
and processes. These functions return 1 upon success, or a string
describing an error condition.
Intrinsic                   Use
pipe                        (pipe desc program)
with_input_process          (with_input_process program expr ...)
with_output_process         (with_output_process program expr ...)
redirect                    (redirect desc file appending)
with_input_file             (with_input_file file expr ...)
with_output_file            (with_output_file file expr ...)
with_output_file_appending  (with_output_file_appending file expr ...)
resume                      (resume desc)
System-Related
"random" returns a fixnum in the range of 0 to one less than its argu-
ment. "setenv" returns the value of the setenv system call, therefore 0
indicates success. Because fixnums are 31 bits in width on a 32-bit
machine, it is impossible to have the "time" intrinsic return a fixnum,
so it returns a string padded with leading zeroes until it occupies six-
teen characters. The "stat" intrinsic returns a five element stack, con-
taining all strings: owner name or uid, group name or uid, time of last
access, time of last modification, and size. The last three values are
all padded with leading zeros to become sixteen-character strings, so
that they may be compared with "strcmp" to "time" values and each other,
to determine which represents an earlier time.
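The reason for the zero padding can be shown with a short Python sketch.
It assumes, for illustration only, that "time" values are counts of
seconds since the UNIX epoch and that fixnums are signed; neither detail
is stated above. The names pad_time, FIXNUM_BITS, and FIXNUM_MAX are
hypothetical.

```python
FIXNUM_BITS = 31                         # fixnum width with a 32-bit C int
FIXNUM_MAX = 2 ** (FIXNUM_BITS - 1) - 1  # assumption: fixnums are signed


def pad_time(seconds):
    """Render a count of seconds as a sixteen-character, zero-padded string."""
    return str(seconds).zfill(16)


earlier_seconds = 999999999   # a nine-digit timestamp
later_seconds = 1236556800    # a ten-digit timestamp, roughly March 2009

earlier = pad_time(earlier_seconds)
later = pad_time(later_seconds)

fits_in_fixnum = later_seconds <= FIXNUM_MAX
padded_order_ok = earlier < later                            # string comparison
unpadded_order_ok = str(earlier_seconds) < str(later_seconds)
```

Under these assumptions the later timestamp does not fit in a fixnum.
The padded strings compare in the same order as the numbers they
represent, while the unpadded strings do not, because "9" sorts after
"1".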
Intrinsic  Use                Return Value
basename   (basename path)    string
dirname    (dirname path)     string
directory  (directory expr)   stack of filenames
symlink    (symlink from to)  0 or error string
rename     (rename from to)   0 or error string
remove     (remove expr)      0 or error string
stat       (stat expr)        stack or error string
setenv     (setenv str str)   fixnum
getenv     (getenv string)    string or 0
system     (system string)    0 or error code
exec       (exec expr)        does not return
fork       (fork)             same as fork(2)
time       (time)             string
date       (date)             string
random     (random expr)      fixnum
Command-Line Args
These function identically to their Munger counterparts.
Intrinsic  Use         Return Value
next       (next)      0 or string
previous   (previous)  0 or string
current    (current)   string
rewind     (rewind)    string
Strings
The ability of the "split" intrinsic in Munger to explode a string into a
list of one-character strings is not present in the MM "split". The
"explode" intrinsic does this instead.
Intrinsic    Use                             Return Value
chop         (chop expr)                     string
chomp        (chomp expr)                    string
length       (length expr)                   fixnum
digitize     (digitize expr)                 fixnum
code         (code expr)                     fixnum
explode      (explode expr)                  stack of strings
stringp      (stringp expr)                  0 or 1
join         (join delim expr ...)           string
split        (split delims string)           stack of strings
concat       (concat expr1 expr2 ...)        string
substring    (substring string expr1 expr2)  string
strcmp       (strcmp expr1 expr2)            fixnum
expand_tabs  (expand_tabs expr1 string)      string
AUTHORS
James Bailie <jimmy@mammothcheese.ca>
http://www.mammothcheese.ca
Mar 09, 2009