mm(1)                   FreeBSD General Commands Manual                  mm(1)

NAME
     MiniMunger -- Language for writing text-processing filters

SYNOPSIS
     mm.munger <source-file>

DESCRIPTION
     MiniMunger is a properly tail-recursive, compilable subset of Munger(1)
     with first-class continuations, but without first-class lists,
     first-class symbols, local side-effects, macros, "eval", "extend", or
     runtime error-checking.  MM is specialized for, and limited to, writing
     filters.  This manual page describes only the differences between Munger
     and MM.  For more information, see the Munger manual page.

     Some example programs are included in the source distribution:

     grep.mm           is an egrep-like filter.

     fmt.mm            is a fmt-like filter.

     options.mm        helps process command-line arguments.

     stacks.mm         provides functions to apply functions to the elements
                       of stacks.

     cgi.mm            provides functions to simplify CGI scripts.

IMPLEMENTATION NOTES
     The MM compiler is a whole-program compiler, reading one input file, and
     producing two output files of C code, which may then be compiled by the C
     compiler and linked with the MM object file to produce an executable.
     Instructions for using the compiler follow this section.

     o   MM does not support lists or any list-related functions.  Programs
         are written as S-expressions, but programs may not create
         S-expressions.  The standard aggregate type of MM is the stack, a
         dynamically-resizable, one-dimensional array.

     o   Side-effects are permissible only on globals.

     o   The first-class data types supported are:  stacks, tables, records,
         closures, continuations, compiled regular expressions, 8-bit-clean
         strings and fixnums.  A fixnum is 1 bit less in size than the size of
         a C int on the hardware on which MM is running.

     o   For maximum execution speed, the MM runtime does no type-checking.
         If you call an intrinsic with an argument of the wrong type, your
         program will crash.

     o   Although lambda-expressions can be bound to variables in "let" and
         "letn" forms, there is no "labels" or "letf" to allow temporary
         functions to see their own bindings.  Any function which calls itself
         must have a toplevel binding.  MM forces the programmer to break out
         all but the most trivial helper functions into separately-defined
         functions.

     o   There are no looping constructs.  All iteration is done via recur-
         sion.  CPS conversion is performed during compilation, turning all
         calls into tailcalls.  Despite this, tail-recursion will be more
         space-efficient than recursion from non-tail positions, because the
         CPS-converted code of functions which recurse in non-tail positions
         will create closures to capture state.  The stack won't grow, but the
         heap will.

     o   First-class continuations are captured with the "call_cc" intrinsic
         which behaves like "call/cc" does in Scheme.

     o   User-defined functions have fixed-size argument lists.

     o   User-defined macros are not supported.
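
     The effect of CPS conversion described in the list above can be
     illustrated with a small sketch.  It is written in Python, not MM, and
     is not the code the MM compiler emits; the names fact_cps and trampoline
     and the use of thunks are assumptions made purely for illustration.  A
     factorial that recurses from a non-tail position is rewritten so that
     every call is a tail call, and each pending multiplication is captured
     in a closure on the heap instead of in a stack frame.

```python
def fact_direct(n):
    # Non-tail recursion: the multiplication happens after the call returns,
    # so every level of recursion needs its own stack frame.
    if n == 0:
        return 1
    return n * fact_direct(n - 1)


def fact_cps(n, k):
    # CPS form: every call is a tail call.  The pending multiplication is
    # stored in the continuation closure k instead of on the call stack.
    if n == 0:
        return lambda: k(1)
    return lambda: fact_cps(n - 1, lambda v: lambda: k(n * v))


def trampoline(thunk):
    # Run the tail calls in a loop so the native stack does not grow.
    result = thunk
    while callable(result):
        result = result()
    return result


def fact(n):
    # The initial continuation simply returns the final value.
    return trampoline(fact_cps(n, lambda v: v))
```

     In this sketch, fact(5000) completes in constant stack depth, while
     fact_direct(5000) exceeds Python's default recursion limit.  The chain
     of closures built by fact_cps still grows with n, which corresponds to
     the heap growth mentioned above.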

COMPILING MM PROGRAMS
     MM depends upon the libtre regular expression library, which must be
     installed before you can compile MM programs.  If you are using the ports
     system, libtre resides in textproc/libtre.  The following example
     assumes PREFIX in your build of libtre was /usr/local.

     The MM compiler is written in Munger, and compiles MM code to an interme-
     diate language, defined as a set of macros in the source of the C run-
     time.  Most of the macros expand to in-line code for speed, resulting in
     larger executables than might be expected from the size of the original
     MM programs.  The compiler assumes munger is in /usr/local/bin.  To
     change this location, edit the shebang line at the top of mm.munger.

     To compile an MM program, the compiler must be invoked on the main source
     file:

     % ./mm.munger ./grep.mm

     The compiler will take some time to perform source-to-source conversions
     before it begins to emit code, printing status messages as it does so.
     When it has finished, two files will have been created, one named
     "functions.c" and one named "functions.h".  To create an executable from
     these files, one must invoke the C compiler on them and the MM object
     files.

     % make
     % cc -o grep functions.c runtime.o -I/usr/local/include -L/usr/local/lib -ltre

     The main source file of a program may include other source files with the
     "include" directive.  The "include" directive resembles its
     similarly-named C preprocessor counterpart, and consists of the word
     "include" preceded by an octothorpe (#) and followed by a
     double-quote-delimited filename.  For example:

     #include "options.mm"

     If the filename itself contains double quotes, they do not need to be
     escaped.  Include directives must start in column zero to be recognized.
     Otherwise, they will be treated as comments.  Included files themselves
     may also "include" other source files.
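
     The rules above can be modelled with a short sketch.  It is written in
     Python and is not taken from the MM compiler; the function name expand,
     the read_file callback, and the exact pattern are assumptions.  In
     particular, the sketch assumes the filename runs from the first double
     quote to the last one on the line, which is one way unescaped quotes
     inside a filename could work.

```python
import re

# Assumed pattern: '#include' at column zero, then a filename running from
# the first double quote to the last double quote on the line.
INCLUDE = re.compile(r'^#include\s*"(.*)"\s*$')


def expand(name, read_file):
    # read_file is a caller-supplied function mapping a filename to its text.
    out = []
    for line in read_file(name).splitlines():
        m = INCLUDE.match(line)
        if m:
            # Included files may themselves include other files.
            out.extend(expand(m.group(1), read_file))
        else:
            # Anything else, including an indented '#include', is passed
            # through untouched.
            out.append(line)
    return out
```

     The sketch performs no cycle detection, so two files that include each
     other would recurse without end.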

DEBUGGING MM PROGRAMS
     To debug MM programs with gdb, you must create a debugging copy of the
     runtime.  You can build the debugging runtime by invoking:

     make debug

     in the Munger source directory.  The current compiler will not detect two
     types of errors.  The first type of error is the referencing of a vari-
     able before it has been initialized with "setq".  The reference will
     cause a global variable to be automatically created and initialized to a
     NULL pointer internally in your program.  Attempting to access the NULL
     pointer will crash your program, as the runtime engine performs no type-
     checking for maximum speed.  There are comments placed in the file func-
     tions.c, output by the compiler, beside each variable reference, naming
     the accessed variable, which will help the programmer find and fix errors
     using gdb.  The second type of error is the calling of an intrinsic
     function with arguments of the wrong type.  This error will also cause
     your program to crash.

     When debugging, the source displayed will consist mostly of the C macros
     which define the intermediate language output by the compiler.  The pro-
     grammer may find it useful to run the C preprocessor over functions.c
     first, separately, to generate an expanded source file, and then compile
     that, in order to see the actual C being executed while tracing programs.

     cc -E -o grep.c functions.c
     cc -ggdb -O0 -o grep grep.c runtime.o -I/usr/local/include -L/usr/local/lib -ltre

THE INTRINSICS
     The MM intrinsics bear a strong resemblance to their similarly-named
     Munger counterparts.  Some behave differently.  Some accept a differing
     number of arguments.  Some accept differing types of arguments.  Some
     have different names.  The differences, in all cases, however, are minor.
     This summary does not completely document the operation of the intrinsic
     functions, but merely lists which are available and how they differ from
     their Munger counterparts.  For complete documentation of an intrinsic,
     see the Munger(1) manual page.

   Control Flow / Side-Effects
     The empty string and 0 are boolean false values.  All other objects are
     considered boolean true values.  The forms below function identically to
     their Munger counterparts, with the exception of the conditionals.  Note
     that "setq" is the only means of accomplishing side-effects on variables,
     and that side-effects are permissible only upon globals.

     When "if" is invoked with only a "true" subsequent clause, and the test
     condition evaluates to a false value, 0 is returned, and not the value of
     the failed test condition.  Similarly, if all test clauses of an invoca-
     tion of "cond" fail, then 0 is returned, rather than the value of the
     last failed test condition.  Both "when" and "unless" also return 0 if
     their test conditions fail.

         Form      Use
         setq      (setq symbol expr)
         if        (if test expr1 expr2 ...)
         cond      (cond (test_expr subsequent ...)+)
         when      (when test expr ...)
         unless    (unless test expr ...)
         progn     (progn expr ...)
         eq        (eq expr1 expr2)
         or        (or expr ...)
         and       (and expr ...)
         not       (not expr)
         let       (let ((symbol expr)+) expr+)
         letn      (letn ((symbol expr)+) expr+)
         exit      (exit expr)
         die       (die ...)
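
     The truth rules and the return value of a failed one-armed "if" can be
     modelled as follows.  This is a Python sketch, not MM; the names is_true
     and mm_if are hypothetical, and Python lists stand in for stacks.

```python
def is_true(value):
    # Only the empty string and 0 are false.
    if isinstance(value, str):
        return value != ""
    if isinstance(value, int):
        return value != 0
    # Every other object (stack, table, closure, ...) is true,
    # even when it is empty.
    return True


def mm_if(test, then, otherwise=None):
    # then and otherwise are thunks, so only the chosen branch is evaluated.
    if is_true(test):
        return then()
    if otherwise is not None:
        return otherwise()
    # No "else" clause: the result is 0, not the value of the failed test.
    return 0
```

     Under this model, a failed test on the empty string yields 0 and not the
     empty string, and an empty stack counts as true.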

     call_cc is used to capture the current continuation.  It functions
     exactly as call/cc does in Scheme:

         call_cc  (call_cc monadic_function)
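
     Because MM programs are CPS-converted, capturing a continuation amounts
     to handing the program the continuation it is already carrying.  The
     following Python sketch shows the idea with explicit continuation
     arguments.  It is not MM code and does not claim to match the MM
     runtime; the names reified and product are hypothetical.

```python
def call_cc(f, k):
    # Package the current continuation k as a function the program can call.
    # When it is invoked, the continuation current at that point is dropped.
    def reified(value, _discarded_k):
        return k(value)
    return f(reified, k)


def product(items, k):
    # Multiplies the items, using the captured continuation as an early
    # exit when a zero is found.
    def body(escape, k2):
        def walk(i, k3):
            if i == len(items):
                return k3(1)
            if items[i] == 0:
                # Jump straight out: the multiplications pending in k3
                # are never performed.
                return escape(0, k3)
            return walk(i + 1, lambda v: k3(items[i] * v))
        return walk(0, k2)
    return call_cc(body, k)
```

     Calling product([2, 3, 4], lambda v: v) runs every pending
     multiplication, while product([2, 0, 4], lambda v: v) returns through
     the captured continuation as soon as the zero is seen.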

   Regular Expressions
     Note that "regexpp" in Munger is "regexp" in MM.

         Intrinsic     Use                              Return Value
         regcomp       (regcomp str)                    compiled rx
         match         (match rx str)                   0 or stack of 2 fixnums
         matches       (matches rx str)                 stack of 20 strings
         substitute    (substitute rx rep str cnt)      string
         regexp        (regexp expr)                    0 or 1

   Tables
     Note that the "hash" and "unhash" intrinsics of Munger become "associate"
     and "dissociate", and that both return the affected table.

         Intrinsic     Use                              Return Value
         table         (table)                          new table
         tablep        (tablep expr)                    0 or 1
         associate     (associate table expr1 expr2)    table
         dissociate    (dissociate table expr1)         table
         lookup        (lookup table expr)              associated expr
         keys          (keys table)                     stack of keys
         values        (values table)                   stack of values

   Stacks
     Note that the "unshift", "push", and "store" intrinsics all return the
     affected stack instead of their second arguments.

     The "exec_stack" and "join_stack" intrinsics both take a stack of strings
     as argument.

     "exec_stack" treats the first element as a program to be fed to the
     execvp() system call, and the remaining elements as the arguments to that
     program.  Note that the MM runtime will automatically ensure that the
     first element is also included in the argument list, in order to adhere
     to the UNIX convention that the first argument to a program be the name
     under which it was invoked.

     "join_stack" joins a stack of strings together, treating the first ele-
     ment of the stack as a delimiter to place in between each of the other
     elements in the string being composed.
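
     The argument conventions of these two intrinsics can be modelled as
     follows.  This is a Python sketch with lists standing in for stacks, not
     MM code; exec_argv is a hypothetical helper that only computes what
     would be passed to execvp() and does not execute anything.

```python
def join_stack(stack):
    # The first element is the delimiter; the rest are the pieces to join.
    delimiter, pieces = stack[0], stack[1:]
    return delimiter.join(pieces)


def exec_argv(stack):
    # Split a stack into the (program, argv) pair that execvp() expects,
    # with the program name repeated as the first argument.
    program = stack[0]
    return program, [program] + stack[1:]
```

     For example, join_stack([", ", "a", "b", "c"]) builds the string
     "a, b, c".  The behaviour for an empty stack is not described above; the
     sketch simply raises an error in that case.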

     The "append" intrinsic appends one or more stacks into a single stack.
     The function creates a new stack and fills it with all the members of all
     its arguments, in order.  The "substack" intrinsic returns a contiguous
     subset of the elements of a stack, as a new stack.  The first argument
     must evaluate to a stack, while the second and third arguments must
     evaluate to numbers specifying the range of indices to be included in the
     substack.

         Intrinsic       Use                           Return Value
         stack           (stack)                       new stack
         shift           (shift stack)                 item at index 0
         unshift         (unshift stack expr)          stack
         push            (push stack expr)             stack
         pop             (pop stack)                   item at topidx
         assign          (assign stack ...)            stack
         append          (append stack ...)            new stack
         substack        (substack stack expr expr)    new stack
         index           (index stack expr)            item at index expr
         store           (store stack fixnum expr)     stack
         clear           (clear stack)                 stack
         used            (used stack)                  stored item count
         sort_numbers    (sort_numbers stack)          stack (sorted in situ)
         sort_strings    (sort_strings stack)          stack (sorted in situ)
         topidx          (topidx stack)                index of top item
         exec_stack      (exec_stack stack)            does not return
         join_stack      (join_stack stack)            string
         stackp          (stackp expr)                 0 or 1

   Records
         Intrinsic    Use                             Return Value
         record       (record n)                      new record of size n
         setfield     (setfield expr1 expr2 expr3)    expr3
         getfield     (getfield expr1 expr2)          item in pos expr2

   Fixnums
     Each of these functions, except "abs", accepts only TWO arguments, unlike
     its Munger counterpart.  Note that "=" is actually a synonym for "eq".

         Intrinsic    Use                 Return Value
         =            (= expr1 expr2)     0 or 1
         <            (< expr1 expr2)     0 or 1
         <=           (<= expr1 expr2)    0 or 1
         >            (> expr1 expr2)     0 or 1
         >=           (>= expr1 expr2)    0 or 1
         +            (+ expr1 expr2)     sum
         -            (- expr1 expr2)     difference
         *            (* expr1 expr2)     product
         %            (% expr1 expr2)     remainder
         /            (/ expr1 expr2)     quotient
         abs          (abs expr)          absolute value

     Note that "stringify" accepts only one argument, which must evaluate to a
     fixnum.

         Intrinsic    Use                 Return Value
         stringify    (stringify expr)    string representation of expr
         numberp      (numberp expr)      0 or 1
         char         (char expr)         one-character string

   I/O
     These are the general I/O functions.  "print" and "println" are compiler
     macros which insert code to call "display" for each of their arguments.
     Note that both "getline" and "readchars" return the empty string, instead
     of 0, upon encountering EOF.  "display_error" and "newline_error" send
     their output to the standard error.

         Intrinsic        Use                     Return Value
         print            (print expr ...)        value of last expr
         println          (println expr ...)      1
         display          (display expr)          value of expr
         display_error    (display_error expr)    value of expr
         newline          (newline)               1
         newline_error    (newline_error)         1
         die              (die ...)               does not return
         warn             (warn expr ...)         value of last expr
         getline          (getline)               string
         readchars        (readchars expr)        string

     These are the intrinsics redirecting the standard descriptors onto files
     and processes.  These functions return 1 upon success, or a string
     describing an error condition.

         Intrinsic                     Use
         pipe                          (pipe desc program)
         with_input_process            (with_input_process program expr ...)
         with_output_process           (with_output_process program expr ...)
         redirect                      (redirect desc file appending)
         with_input_file               (with_input_file file expr ...)
         with_output_file              (with_output_file file expr ...)
         with_output_file_appending    (with_output_file_appending file expr ...)
         resume                        (resume desc)

   System-Related
     "random" returns a fixnum in the range of 0 to one less than its
     argument.  "setenv" returns the value of the setenv(3) library call;
     therefore, 0 indicates success.  Because fixnums are 31 bits in width on
     a 32-bit machine, it is impossible to have the "time" intrinsic return a
     fixnum, so it returns a string padded with leading zeros until it
     occupies sixteen characters.  The "stat" intrinsic returns a five-element
     stack of strings: owner name or uid, group name or uid, time of last
     access, time of last modification, and size.  The last three values are
     all padded with leading zeros to become sixteen-character strings, so
     that they may be compared with "strcmp" to "time" values and each other,
     to determine which represents an earlier time.

         Intrinsic    Use                 Return Value
         basename     (basename path)     string
         dirname      (dirname path)      string
         directory    (directory expr)    stack of filenames
         symlink      (symlink from to)   0 or error string
         rename       (rename from to)    0 or error string
         remove       (remove expr)       0 or error string
         stat         (stat expr)         stack or error string
         setenv       (setenv str str)    fixnum
         getenv       (getenv string)     string or 0
         system       (system string)     0 or error code
         exec         (exec expr)         does not return
         fork         (fork)              same as fork(2)
         time         (time)              string
         date         (date)              string
         random       (random expr)       fixnum
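
     The reason for the sixteen-character padding can be seen in a short
     sketch.  It is written in Python, not MM; pad_time and this strcmp are
     illustrative stand-ins, and FIXNUM_MAX assumes the 31-bit fixnum is
     signed, which the text above does not state.

```python
FIXNUM_BITS = 31                          # per the text, on a 32-bit machine
FIXNUM_MAX = 2 ** (FIXNUM_BITS - 1) - 1   # assumes a signed fixnum


def pad_time(seconds):
    # Zero-pad to sixteen characters, as "time" and "stat" are said to do.
    return str(seconds).rjust(16, "0")


def strcmp(a, b):
    # Same sign convention as C strcmp: negative, zero, or positive.
    return (a > b) - (a < b)
```

     With equal-length, zero-padded strings, string order matches numeric
     order, so strcmp(pad_time(999), pad_time(1000)) is negative.  Without
     the padding, "999" would sort after "1000".  Under the signed-fixnum
     assumption, a timestamp such as 1236556800 is larger than FIXNUM_MAX.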

   Command-Line Args
     These function identically to their Munger counterparts.

         Intrinsic   Use           Return Value
         next        (next)        0 or string
         previous    (previous)    0 or string
         current     (current)     string
         rewind      (rewind)      string

   Strings
     The ability of the "split" intrinsic in Munger to explode a string into a
     list of one-character strings is not present in the MM "split".  The
     "explode" intrinsic does this.

         Intrinsic      Use                               Return Value
         chop           (chop expr)                       string
         chomp          (chomp expr)                      string
         length         (length expr)                     fixnum
         digitize       (digitize expr)                   fixnum
         code           (code expr)                       fixnum
         explode        (explode expr)                    stack of strings
         stringp        (stringp expr)                    0 or 1
         join           (join delim expr ...)             string
         split          (split delims string)             stack of strings
         concat         (concat expr1 expr2 ...)          string
         substring      (substring string expr1 expr2)    string
         strcmp         (strcmp expr1 expr2)              fixnum
         expand_tabs    (expand_tabs expr1 string)        string

AUTHORS
     James Bailie <jimmy@mammothcheese.ca>
     http://www.mammothcheese.ca

                                 Mar 09, 2009