12. Larceny's R5RS libraries

The procedures described in this chapter are nonstandard. Some are deprecated after being rendered obsolete by R7RS or R6RS standard libraries. Others still provide useful capabilities that the standard libraries don't.

12.1. Strings

Larceny provides Unicode strings with R6RS semantics.

The string-downcase and string-upcase procedures perform Unicode-compatible case folding, which can result in a string whose length is different from that of the original.
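As a small illustration of the length change, the sketch below upcases a string containing the German sharp s, which the R6RS specifies as upcasing to two characters. The variable names are illustrative only, and the sharp s is built with integer->char to avoid depending on the source file's encoding.

```scheme
;; "Stra" + sharp s (U+00DF) + "e": six characters.
(define s (string #\S #\t #\r #\a (integer->char #xDF) #\e))

;; Upcasing turns the single sharp s into the two characters "SS".
(define u (string-upcase s))

(display (list (string-length s) (string-length u)))  ; displays (6 7)
(newline)
```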

Larceny may still provide string-downcase! and string-upcase! procedures, but they are deprecated.

12.2. Bytevectors

A bytevector is a data structure that stores bytes — exact 8-bit unsigned integers. Bytevectors are useful in constructing system interfaces and other low-level programming. In Larceny, many bytevector-like structures — bignums, for example — are implemented in terms of a lower-level bytevector-like data type. The operations on generic bytevector-like structures are particularly fast but useful largely in code that manipulates Larceny's data representations.

The (rnrs bytevectors) library now provides a large set of procedures that, in Larceny, are defined using the procedures described below.

Integrable procedure make-bytevector

(make-bytevector length) => bytevector

(make-bytevector length fill) => bytevector

Returns a bytevector of the desired length. If no second argument is given, then the bytevector has not been initialized and most likely contains garbage.

Operations on bytevector structures

(bytevector? obj) => boolean

(bytevector-length bytevector) => integer

(bytevector-ref bytevector offset) => byte

(bytevector-set! bytevector offset byte) => unspecified

(bytevector-equal? bytevector1 bytevector2) => boolean

(bytevector-fill! bytevector byte) => unspecified

(bytevector-copy bytevector) => bytevector

These procedures test for, measure, read, write, compare, fill, and copy bytevectors, in the order listed above. All are integrable, except bytevector-equal? and bytevector-copy. The name bytevector-equal? is deprecated, since the R6RS calls it bytevector=?.
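The following sketch exercises the operations listed above. It assumes the procedure names exactly as given in this section (as available in Larceny's R5RS mode); portable R6RS code would use the corresponding (rnrs bytevectors) procedures instead.

```scheme
(define bv (make-bytevector 4 0))    ; four bytes, explicitly filled with 0
(bytevector-set! bv 0 255)
(bytevector-set! bv 3 7)

(define copy (bytevector-copy bv))   ; a separate bytevector with the same bytes
(bytevector-fill! bv 1)              ; overwrites every byte of bv, not of copy

(bytevector-length bv)               ; => 4
(bytevector-ref bv 0)                ; => 1
(bytevector-ref copy 0)              ; => 255
(bytevector-ref copy 3)              ; => 7
```

Passing the fill argument to make-bytevector avoids reading uninitialized contents.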

Operations on bytevector-like structures

(bytevector-like? obj) => boolean

(bytevector-like-length bytevector) => integer

(bytevector-like-ref bytevector offset) => byte

(bytevector-like-set! bytevector offset byte) => unspecified

(bytevector-like-equal? bytevector1 bytevector2) => boolean

(bytevector-like-copy bytevector) => bytevector

A bytevector-like structure is a low-level representation for indexed arrays of uninterpreted bytes. Bytevector-like structures are used to represent types such as bignums and flonums.

There is no way to construct a "generic" bytevector-like structure; use the constructors for specific bytevector-like types.

The bytevector-like operations operate on all bytevector-like structures. All are integrable, except bytevector-like-equal? and bytevector-like-copy. All are deprecated because they violate abstraction barriers and make your code representation-dependent; they are useful mainly to Larceny developers, who might otherwise be tempted to write some low-level operations in C or assembly language.

12.3. Vectors

Procedure vector-copy

(vector-copy vector) => vector

Returns a shallow copy of its argument.

Operations on vector-like structures

(vector-like? object) => boolean

(vector-like-length vector-like) => fixnum

(vector-like-ref vector-like k) => object

(vector-like-set! vector-like k object) => unspecified

A vector-like structure is a low-level representation for indexed arrays of Scheme objects. Vector-like structures are used to represent types such as vectors, records, symbols, and ports.

There is no way to construct a "generic" vector-like structure; use the constructors for specific data types.

The vector-like operations operate on all vector-like structures. All are integrable. All are deprecated because they violate abstraction barriers and make your code representation-dependent; they are useful mainly to Larceny developers, who might otherwise be tempted to write some low-level operations in C or assembly language.

12.4. Procedures

Operations on procedures

(make-procedure length) => procedure

(procedure-length procedure) => fixnum

(procedure-ref procedure offset) => object

(procedure-set! procedure offset object) => unspecified

These procedures operate on the representations of procedures and allow user programs to construct, inspect, and alter procedures.

Procedure procedure-copy

(procedure-copy procedure) => procedure

Returns a shallow copy of the procedure.

Warning

The procedures above are deprecated because they violate abstraction barriers and make your code representation-dependent; they are useful mainly to Larceny developers, who might otherwise be tempted to write some low-level operations in C or assembly language.

The rest of this section describes some procedures that reach through abstraction barriers in a more controlled way to extract heuristic information from procedures for debugging purposes.

Note

The following text is copied from a straw proposal authored by Will Clinger and sent to rrrs-authors on 09 May 1996. The text has been edited lightly. See the end for notes about the Larceny implementation.

The procedures that extract heuristic information from procedures are permitted to return any result whatsoever. If the type of a result is not among those listed below, then the result represents an implementation-dependent extension to this interface, which may safely be interpreted as though no information were available from the procedure. Otherwise the result is to be interpreted as described below.

Procedure procedure-arity

(procedure-arity proc)

Returns information about the arity of proc. If the result is #f, then no information is available. If the result is an exact non-negative integer k, then proc requires exactly k arguments. If the result is an inexact non-negative integer n, then proc requires n or more arguments. If the result is a pair, then it is a list of non-negative integers, each of which indicates a number of arguments that will be accepted by proc; the list is not necessarily exhaustive.
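Because the result can take several shapes, client code has to dispatch on its type. The helper below is a hypothetical sketch (arity-check is not part of Larceny) that interprets a result of procedure-arity for a given argument count n and answers yes, no, or unknown.

```scheme
;; arity: any value returned by procedure-arity
;; n:     an exact non-negative integer, the argument count in question
(define (arity-check arity n)
  (cond ((and (integer? arity) (exact? arity))
         ;; exactly k arguments required
         (if (= n arity) 'yes 'no))
        ((and (integer? arity) (inexact? arity))
         ;; n or more arguments required
         (if (>= n arity) 'yes 'no))
        ((pair? arity)
         ;; the list need not be exhaustive, so absence proves nothing
         (if (memv n arity) 'yes 'unknown))
        (else
         ;; #f or an implementation-dependent extension: no information
         'unknown)))

(arity-check 2 2)        ; => yes
(arity-check 1.0 0)      ; => no
(arity-check '(1 3) 2)   ; => unknown
```

A typical call would be (arity-check (procedure-arity proc) 2); what that returns depends on the information recorded for proc.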

Procedure procedure-documentation-string

(procedure-documentation-string proc)

Returns general information about proc. If the result is #f, then no information is available. If the result is a string, then it is to be interpreted as a "documentation string" (see Common Lisp).

Procedure procedure-name

(procedure-name proc)

Returns information about the name of proc. If the result is #f, then no information is available. If the result is a symbol or string, then it represents a name. If the result is a pair, then it is a list of symbols and/or strings representing a path of names; the first element represents an outer name and the last element represents an inner name.

Procedure procedure-source-file

(procedure-source-file proc)

Returns information about the name of a file that contains the source code for proc. If the result is #f, then no information is available. If the result is a string, then the string is the name of a file.

Procedure procedure-source-position

(procedure-source-position proc)

Returns information about the position of the source code for proc within the source file specified by procedure-source-file. If the result is #f, then no information is available. If the result is an exact integer k, then k characters precede the opening parenthesis of the source code for proc within that source file.

Procedure procedure-expression

(procedure-expression proc)

Returns information about the source code for proc. If the result is #f, then no information is available. If the result is a pair, then it is a lambda expression in the traditional representation of a list.

Procedure procedure-environment

(procedure-environment proc)

Returns information about the environment of proc. If the result is #f, then no information is available. In any case the result may be passed to any of the environment inquiry functions.

Notes on the Larceny implementation

Twobit does not yet produce data for all of these functions, so some of them always return #f.

12.5. Pairs and Lists

The (rnrs lists) library now provides a set of procedures that may supersede some of the procedures described below. If one of Larceny's procedures duplicates the semantics of an R6RS procedure whose name is different, then Larceny's name is deprecated.

Procedure append!

(append! list1 list2 … obj) => object

append! destructively appends its arguments, which must be lists, and returns the resulting list. The last argument can be any object. The argument lists are appended by changing the cdr of the last pair of each argument except the last to point to the next argument.
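A minimal sketch of the destructive behavior, using freshly allocated lists (mutating quoted constants would be an error). The variable names are illustrative.

```scheme
(define a (list 1 2))
(define b (list 3 4))

;; The cdr of a's last pair is changed to point at b.
(define c (append! a b))

c   ; => (1 2 3 4)
a   ; => (1 2 3 4), because a's last pair was modified
b   ; => (3 4)
```

Since an empty list has no last pair to modify, it is safest to use the value returned by append! rather than rely on the side effect.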

Procedure every?

(every? procedure list1 list2 …) => object

every? applies procedure to each element tuple of the lists in first-to-last order, and returns #f as soon as procedure returns #f. If procedure does not return #f for any element tuple of the lists, then the value returned by procedure for the last element tuple of the lists is returned.
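The calls below sketch both outcomes; scale-if-positive is a hypothetical helper introduced only for this example.

```scheme
;; Returns x * 10 for positive x, and #f otherwise.
(define (scale-if-positive x)
  (and (positive? x) (* x 10)))

(every? scale-if-positive '(1 2 3))    ; => 30, the value for the last element
(every? scale-if-positive '(1 -2 3))   ; => #f, stops at -2
(every? < '(1 2 3) '(2 3 4))           ; => #t, compares (1 2), (2 3), (3 4)
```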

Procedure last-pair

(last-pair list-structure) => pair

last-pair returns the last pair of the list structure, which must be a sequence of pairs linked through the cdr fields.

Procedure list-copy

(list-copy list) => list

list-copy makes a shallow copy of the list and returns that copy.

Procedure remove

(remove key list) => list

Procedure remq

(remq key list) => list

Procedure remv

(remv key list) => list

Procedure remp

(remp pred? list) => list

Each of these procedures returns a new list which contains all the elements of list in the original order, except that those elements of the original list that were equal to key (or that satisfy pred?) are not in the new list. Remove uses equal? as the equivalence predicate; remq uses eq?, and remv uses eqv?.
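A short sketch of the four variants follows; the list xs is illustrative. Note that the original list is left unchanged.

```scheme
(define xs (list 1 2 3 2 4))

(remv 2 xs)                    ; => (1 3 4), compares with eqv?
(remq 'b '(a b c b))           ; => (a c), compares with eq?
(remove "b" '("a" "b" "c"))    ; => ("a" "c"), compares with equal?
(remp even? xs)                ; => (1 3), drops elements satisfying the predicate

xs                             ; => (1 2 3 2 4)
```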

Procedure remove!

(remove! key list) => list

Procedure remq!

(remq! key list) => list

Procedure remv!

(remv! key list) => list

Procedure remp!

(remp! pred? list) => list

These procedures are like remove, remq, remv, and remp, except they modify list instead of returning a fresh list.

Procedure reverse!

(reverse! list) => list

reverse! destructively reverses its argument and returns the reversed list.

Procedure some?

(some? procedure list1 list2 …) => object

some? applies procedure to each element tuple of the lists in first-to-last order, and returns the first non-false value returned by procedure. If procedure does not return a true value for any element tuple of the lists, then some? returns #f.
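As a sketch, the calls below show that some? returns the value produced by procedure, not the matching element; square-if-even is a hypothetical helper used only here.

```scheme
;; Returns x squared for even x, and #f otherwise.
(define (square-if-even x)
  (and (even? x) (* x x)))

(some? square-if-even '(1 3 4 6))   ; => 16, from the first even element
(some? even? '(1 3 5))              ; => #f
(some? > '(1 5) '(2 3))             ; => #t, because (> 5 3)
```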

12.6. Sorting

The (rnrs sorting) library now provides a small set of procedures that supersede most of the procedures described below. All of the procedures described below are therefore deprecated.

Procedures sort and sort!

(sort list less?) => list

(sort vector less?) => vector

(sort! list less?) => list

(sort! vector less?) => vector

These procedures sort their argument (a list or a vector) according to the predicate less?, which must implement a total order on the elements in the data structures that are sorted.

sort returns a fresh data structure containing the sorted data; sort! sorts the data structure in-place.
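The sketch below contrasts the two procedures, with the data first and the predicate second as in the signatures above. The variable names are illustrative.

```scheme
(define xs (list 3 1 2))
(define sorted (sort xs <))    ; fresh list

sorted   ; => (1 2 3)
xs       ; => (3 1 2), unchanged

(define v (vector "pear" "apple" "fig"))
(sort! v string<?)             ; sorts v in place

v        ; => #("apple" "fig" "pear")
```

When sort! is applied to a list, it is safest to use the returned list rather than the original variable, since the first pair of the argument need not remain first.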

12.7. Records

Note

Larceny's records have been extended to implement all SRFI 99 and R6RS procedures from

(srfi :99 records procedural)
(srfi :99 records inspection)
(rnrs records procedural)
(rnrs records inspection)

We recommend that Larceny programmers use the SRFI 99 APIs instead of the R6RS APIs. This should entail no loss of portability, since the standard reference implementation of SRFI 99 records should run efficiently in any implementation of the R7RS/R6RS that permits new libraries to be defined at all.

Larceny now has two kinds of records: old-style and R7RS/R6RS/SRFI99/ERR5RS. Old-style records cannot be created in R6RS-conforming mode, so our extension of R6RS procedures to accept old-style records does not affect R6RS conformance.

Note

The following specification describes Larceny's old-style record API, which is now deprecated. It is based on a proposal posted by Pavel Curtis to rrrs-authors on 10 Sep 1989, and later re-posted by Norman Adams to comp.lang.scheme on 5 Feb 1992. The authorship and copyright status of the original text are unknown to me.

This document differs from the original proposal in that its record types are extensible, and that it specifies the type of record-type descriptors.

12.7.1. Specification

Procedure make-record-type

(make-record-type type-name field-names)

(make-record-type type-name field-names parent-rtd)

Returns a "record-type descriptor", a value representing a new data type, disjoint from all others. The type-name argument must be a string, but is only used for debugging purposes (such as the printed representation of a record of the new type). The field-names argument is a list of symbols naming the "fields" of a record of the new type. It is an error if the list contains any duplicates.

If the parent-rtd argument is provided, then the new type will be a subtype of the type represented by parent-rtd, and the field names of the new type will include all the field names of the parent type. It is an error if the complete list of field names contains any duplicates.

Record-type descriptors are themselves records. In particular, record-type descriptors have a field printer that is either #f or a procedure. If the value of the field is a procedure, then the procedure will be called to print records of the type represented by the record-type descriptor. The procedure must accept two arguments: the record object to be printed and an output port.

Procedure record-constructor

(record-constructor rtd)

(record-constructor rtd field-names)

Returns a procedure for constructing new members of the type represented by rtd. The returned procedure accepts exactly as many arguments as there are symbols in the given list, field-names; these are used, in order, as the initial values of those fields in a new record, which is returned by the constructor procedure. The values of any fields not named in that list are unspecified. The field-names argument defaults to the list of field-names in the call to make-record-type that created the type represented by rtd; if the field-names argument is provided, it is an error if it contains any duplicates or any symbols not in the default list.

Procedure record-predicate

(record-predicate rtd)

Returns a procedure for testing membership in the type represented by rtd. The returned procedure accepts exactly one argument and returns a true value if the argument is a member of the indicated record type or one of its subtypes; it returns a false value otherwise.

Procedure record-accessor

(record-accessor rtd field-name)

Returns a procedure for reading the value of a particular field of a member of the type represented by rtd. The returned procedure accepts exactly one argument which must be a record of the appropriate type; it returns the current value of the field named by the symbol field-name in that record. The symbol field-name must be a member of the list of field-names in the call to make-record-type that created the type represented by rtd, or a member of the field-names of the parent type of the type represented by rtd.

Procedure record-updater

(record-updater rtd field-name)

Returns a procedure for writing the value of a particular field of a member of the type represented by rtd. The returned procedure accepts exactly two arguments: first, a record of the appropriate type, and second, an arbitrary Scheme value; it modifies the field named by the symbol field-name in that record to contain the given value. The returned value of the updater procedure is unspecified. The symbol field-name must be a member of the list of field-names in the call to make-record-type that created the type represented by rtd, or a member of the field-names of the parent type of the type represented by rtd.

Procedure record?

(record? obj)

Returns a true value if obj is a record of any type and a false value otherwise. Note that record? may be true of any Scheme value; of course, if it returns true for some particular value, then record-type-descriptor is applicable to that value and returns an appropriate descriptor.

Procedure record-type-descriptor

(record-type-descriptor record)

Returns a record-type descriptor representing the type of the given record. That is, for example, if the returned descriptor were passed to record-predicate, the resulting predicate would return a true value when passed the given record. Note that it is not necessarily the case that the returned descriptor is the one that was passed to record-constructor in the call that created the constructor procedure that created the given record.

Procedure record-type-name

(record-type-name rtd)

Returns the type-name associated with the type represented by rtd. The returned value is eqv? to the type-name argument given in the call to make-record-type that created the type represented by rtd.

Procedure record-type-field-names

(record-type-field-names rtd)

Returns a list of the symbols naming the fields in members of the type represented by rtd.

Procedure record-type-parent

(record-type-parent rtd)

Returns a record-type descriptor for the parent type of the type represented by rtd, if that type has a parent type, or a false value otherwise. The type represented by rtd has a parent type if the call to make-record-type that created rtd provided the parent-rtd argument.

Procedure record-type-extends?

(record-type-extends? rtd1 rtd2)

Returns a true value if the type represented by rtd1 is a subtype of the type represented by rtd2 and a false value otherwise. A type s is a subtype of a type t if s=t or if the parent type of s, if it exists, is a subtype of t.
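The following sketch ties the old-style procedures together. All names beginning with point are hypothetical, and the subtype definition assumes that the field-names argument lists only the fields added by the subtype.

```scheme
(define point-rtd (make-record-type "point" '(x y)))

(define make-point   (record-constructor point-rtd))    ; takes x and y, in order
(define point?       (record-predicate point-rtd))
(define point-x      (record-accessor point-rtd 'x))
(define set-point-x! (record-updater point-rtd 'x))

(define p (make-point 3 4))
(point? p)                              ; true
(point? 5)                              ; false
(point-x p)                             ; => 3
(set-point-x! p 10)
(point-x p)                             ; => 10

(record-type-name point-rtd)            ; => "point"
(record-type-field-names point-rtd)     ; => (x y)

;; A subtype that adds one field to point.
(define point3d-rtd (make-record-type "point3d" '(z) point-rtd))
(record-type-extends? point3d-rtd point-rtd)   ; true
(record-type-extends? point-rtd point3d-rtd)   ; false
```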

12.7.2. Implementation

The R6RS spouts some tendentious nonsense about procedural records being slower than syntactic records, but this is not true of Larceny's records, and is unlikely to be true of other implementations either. Larceny's procedural records are fairly efficient already, and will become even more efficient in future versions as interlibrary optimizations are added.

12.8. Input, Output, and Files

The (scheme base), (scheme file), (rnrs io ports), and (rnrs files) libraries now provide a set of procedures that may supersede some of the procedures described below. If one of Larceny's procedures duplicates the semantics of an R7RS or R6RS procedure whose name is different, then Larceny's name is deprecated.

Procedure close-open-files

(close-open-files ) => unspecified

Closes all open files.

Procedure console-input-port

(console-input-port ) => input-port

Returns a character input port such that no read from the port has signalled an error or returned the end-of-file object.

Rationale: console-input-port and console-output-port are artifacts of Unix interactive I/O conventions, where an interactive end-of-file does not mean "quit" but rather "done here". Under these conventions the console port should be reset following an end-of-file. Resetting conflicts with the semantics of ports in Scheme, so console-input-port and console-output-port return a new port if the current port is already at end-of-file.

Since it is convenient to handle errors in the same manner as end-of-file, these procedures also return a new port if an error has been signalled during an I/O operation on the port.

Console-input-port and console-output-port simply call the port generators installed in the parameters console-input-port-factory and console-output-port-factory, which allow user programs to install their own console port generators.

Procedure console-output-port

(console-output-port ) => output-port

Returns a character output port such that no write to the port has signalled an error.

See console-input-port for a full explanation.

Parameter console-input-port-factory

The value of this parameter is a procedure that returns a character input port such that no read from the port has signalled an error or returned the end-of-file object.

See console-input-port for a full explanation.

Parameter console-output-port-factory

The value of this parameter is a procedure that returns a character output port such that no write to the port has signalled an error.

See console-input-port for a full explanation.

Parameter current-input-port

The value of this parameter is a character input port.

Parameter current-output-port

The value of this parameter is a character output port.

Procedure delete-file

(delete-file filename) => unspecified

Deletes the named file. No error is signalled if the file does not exist.

Procedure eof-object

(eof-object ) => end-of-file object

Eof-object returns an end-of-file object.

Procedure file-exists?

(file-exists? filename) => boolean

File-exists? returns #t if the named file exists at the time the procedure is called.

Procedure file-modification-time

(file-modification-time filename) => vector or #f

File-modification-time returns the time of last modification of the file as a vector, or #f if the file does not exist. The vector has six elements: year, month, day, hour, minute, second, all of which are exact nonnegative integers. The time returned is relative to the local timezone.

(file-modification-time "larceny") => #(1997 2 6 12 51 13)

(file-modification-time "geekdom") => #f

Procedure flush-output-port

(flush-output-port ) => unspecified

(flush-output-port port) => unspecified

Write any buffered data in the port to the underlying output medium.

Procedure get-output-string

(get-output-string string-output-port) => string

Retrieve the output string from the given string output port.

Procedure open-input-string

(open-input-string string) => input-port

Creates an input port that reads from string. The string may be shared with the caller. A string input port does not need to be closed, although closing it will prevent further reads from it.

Procedure open-output-string

(open-output-string ) => output-port

Creates an output port where any output is written to a string. The accumulated string can be retrieved with get-output-string at any time.

Procedure port?

(port? object) => boolean

Tests whether its argument is a port.

Procedure port-name

(port-name port) => string

Returns the name associated with the port; for file ports, this is the file name.

Procedure port-position

(port-position port) => fixnum

Returns the number of characters that have been read from or written to the port.

Procedure rename-file

(rename-file from to) => unspecified

Renames the file from and gives it the name to. No error is signalled if from does not exist or to exists.

Procedure reset-output-string

(reset-output-string port) => unspecified

Given a port created with open-output-string, deletes from the port all the characters that have been output so far.

Procedure with-input-from-port

(with-input-from-port input-port thunk) => object

Calls thunk with current input bound to input-port in the dynamic extent of thunk. Returns whatever value was returned from thunk.

Procedure with-output-to-port

(with-output-to-port output-port thunk) => object

Calls thunk with current output bound to output-port in the dynamic extent of thunk. Returns whatever value was returned from thunk.
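As a sketch of how the string-port and redirection procedures in this section combine, the code below captures output in a string and reads data back from a string. The variable names are illustrative.

```scheme
(define out (open-output-string))

;; display writes to the current output port, which is out inside the thunk.
(with-output-to-port out
  (lambda ()
    (display "hello")
    (display 42)))

(define captured (get-output-string out))   ; => "hello42"

(reset-output-string out)                   ; discard what was written so far
(define after-reset (get-output-string out)) ; => ""

;; read consumes from the current input port, here a string port.
(define datum
  (with-input-from-port (open-input-string "(1 2 3)")
    (lambda () (read))))                    ; => (1 2 3)
```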

12.9. Operating System Interface

Procedure command-line-arguments

(command-line-arguments ) => vector

Returns a vector of strings: the arguments supplied to the program by the user or the operating system.

Procedure dump-heap

(dump-heap filename procedure) => unspecified

Dump a heap image to the named file that will start up with the supplied procedure. Before procedure is called, command line arguments will be parsed and any init procedures registered with add-init-procedure! will be called.

Note: Currently, heap dumping is only available with the stop-and-copy collector (-stopcopy command line option), although the heap image can be used with all the other collectors.

Procedure dump-interactive-heap

(dump-interactive-heap filename) => unspecified

Dump a heap image to the named file that will start up with the standard read-eval-print loop. Before the read-eval-print loop is called, command line arguments will be parsed and any init procedures registered with add-init-procedure! will be called.

Note: Currently, heap dumping is only available with the stop-and-copy collector (-stopcopy command line option), although the heap image can be used with all the other collectors.

Procedure getenv

(getenv key) => string or #f

Returns the operating system environment mapping for the string key, or #f if there is no mapping for key.

Note

This is now a synonym for the get-environment-variable exported by the (scheme process-context) library.

Procedure setenv

(setenv key val) => unspecified

Sets the operating system environment mapping for the string key to val.

Procedure system

(system command) => status

Send the command to the operating system's command processor and return the command's exit status, if any. On Unix, command is a string and status is an exact integer.

12.10. Fixnum primitives

Fixnums are small exact integers that are likely to be represented without heap allocation. Larceny never represents a number that can be represented as a fixnum any other way, so programs that can use fixnums will do so automatically. However, operations that work only on fixnums can sometimes be substantially faster than generic operations, and the following primitives are provided for use in those programs that need especially good performance.

The (rnrs arithmetic fixnums) library now provides a large set of procedures, some of them similar to the procedures described below. If one of Larceny's procedures duplicates the semantics of an R6RS procedure whose name is different, then Larceny's name is deprecated within R7RS/R6RS code.

All arguments to the following procedures must be fixnums.

Procedure fixnum?

(fixnum? obj) => boolean

Returns #t if its argument is a fixnum, and #f otherwise.

Procedure fx+

(fx+ fix1 fix2) => fixnum

Returns the fixnum sum of its arguments. If the result is not representable as a fixnum, then an error is signalled (unless error checking has been disabled).

Procedure fx-

(fx- fix1 fix2) => fixnum

Returns the fixnum difference of its arguments. If the result is not representable as a fixnum, then an error is signalled.

Procedure fx--

(fx-- fix1) => fixnum

Returns the fixnum negative of its argument. If the result is not representable as a fixnum, then an error is signalled.

Procedure fx*

(fx* fix1 fix2) => fixnum

Returns the fixnum product of its arguments. If the result is not representable as a fixnum, then an error is signalled.

Procedure fx=

(fx= fix1 fix2) => boolean

Returns #t if its arguments are equal, and #f otherwise.

Procedure fx<

(fx< fix1 fix2) => boolean

Returns #t if fix1 is less than fix2, and #f otherwise.

Procedure fx<=

(fx<= fix1 fix2) => boolean

Returns #t if fix1 is less than or equal to fix2, and #f otherwise.

Procedure fx>

(fx> fix1 fix2) => boolean

Returns #t if fix1 is greater than fix2, and #f otherwise.

Procedure fx>=

(fx>= fix1 fix2) => boolean

Returns #t if fix1 is greater than or equal to fix2, and #f otherwise.

Procedure fxnegative?

(fxnegative? fix) => boolean

Returns #t if its argument is less than zero, and #f otherwise.

Procedure fxpositive?

(fxpositive? fix) => boolean

Returns #t if its argument is greater than zero, and #f otherwise.

Procedure fxzero?

(fxzero? fix) => boolean

Returns #t if its argument is zero, and #f otherwise.

Procedure fxlogand

(fxlogand fix1 fix2) => fixnum

Returns the bitwise and of its arguments.

Procedure fxlogior

(fxlogior fix1 fix2) => fixnum

Returns the bitwise inclusive or of its arguments.

Procedure fxlognot

(fxlognot fix) => fixnum

Returns the bitwise not of its argument.

Procedure fxlogxor

(fxlogxor fix1 fix2) => fixnum

Returns the bitwise exclusive or of its arguments.

Procedure fxlsh

(fxlsh fix1 fix2) => fixnum

Returns fix1 shifted left fix2 places, shifting in zero bits at the low end. If the shift count exceeds the number of bits in the machine's word size, then the results are machine-dependent.

Procedure most-positive-fixnum

(most-positive-fixnum ) => fixnum

Returns the largest representable positive fixnum.

Procedure most-negative-fixnum

(most-negative-fixnum ) => fixnum

Returns the smallest representable negative fixnum.

Procedure fxrsha

(fxrsha fix1 fix2) => fixnum

Returns fix1 shifted right fix2 places, shifting in a copy of the sign bit at the left end. If the shift count exceeds the number of bits in the machine's word size, then the results are machine-dependent.

Procedure fxrshl

(fxrshl fix1 fix2) => fixnum

Returns fix1 shifted right fix2 places, shifting in zero bits at the high end. If the shift count exceeds the number of bits in the machine's word size, then the results are machine-dependent.
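A few sample calls, as a sketch of the arithmetic, comparison, and bitwise primitives described in this section. Every argument is a small integer, so all are fixnums.

```scheme
(fx+ 1 2)          ; => 3
(fx- 10 3)         ; => 7
(fx* 6 7)          ; => 42
(fx< 1 2)          ; => #t
(fxzero? 0)        ; => #t

(fxlogand 12 10)   ; => 8    (1100 and 1010 = 1000)
(fxlogior 12 10)   ; => 14   (1100 or  1010 = 1110)
(fxlogxor 12 10)   ; => 6    (1100 xor 1010 = 0110)
(fxlognot 5)       ; => -6

(fxlsh 3 2)        ; => 12   (3 shifted left two places)
(fxrsha -16 2)     ; => -4   (the sign bit is copied in)
```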

12.11. Numbers

Larceny has six representations for numbers: fixnums are small, exact integers; bignums are unlimited-precision exact integers; ratnums are exact rationals; flonums are inexact rationals; rectnums are exact complexes; and compnums are inexact complexes.

Number-representation predicates

(fixnum? obj) => boolean

(bignum? obj) => boolean

(ratnum? obj) => boolean

(flonum? obj) => boolean

(rectnum? obj) => boolean

(compnum? obj) => boolean

These predicates test whether an object is a number of a particular representation and return #t if so, #f if not.
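The sketch below applies each predicate to a literal of the matching representation, following the definitions given at the start of this section.

```scheme
(fixnum?  42)             ; => #t
(bignum?  (expt 2 200))   ; => #t, far too large for a fixnum
(ratnum?  1/3)            ; => #t
(flonum?  0.5)            ; => #t
(rectnum? 1+2i)           ; => #t, exact complex
(compnum? 1.0+2.0i)       ; => #t, inexact complex

;; Exact results are normalized: 4/2 is the integer 2, not a ratnum.
(ratnum? 4/2)             ; => #f
(fixnum? 4/2)             ; => #t
```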

Procedure random

(random limit) => exact integer

Returns a pseudorandom nonnegative exact integer in the range 0 through limit-1.

12.12. Hashtables and hash functions

Hashtables represent finite mappings from keys to values. If the hash function is a good one, then the value associated with a key may be looked up in constant time (on the average).

Note

R6RS hashtables are a big improvement over Larceny's traditional hash tables, and should be used instead of the API described below.

Note

To resolve a clash of names and semantics with the R6RS make-hashtable procedure, Larceny's traditional make-hashtable procedure has been renamed to make-oldstyle-hashtable.

12.12.1. Hash tables

Procedure make-oldstyle-hashtable

(make-oldstyle-hashtable hash-function bucket-searcher size) => hashtable

Returns a newly allocated mutable hash table that uses hash-function as its hash function and bucket-searcher (for example assq, assv, or assoc) to search a bucket. The table starts with size buckets and expands the number of buckets as needed. The hash-function must accept a key and return a non-negative exact integer.

(make-oldstyle-hashtable hash-function bucket-searcher) => hashtable

Equivalent to (make-oldstyle-hashtable hash-function bucket-searcher n) for some value of n chosen by the implementation.

(make-oldstyle-hashtable hash-function) => hashtable

Equivalent to (make-oldstyle-hashtable hash-function assv).

(make-oldstyle-hashtable ) => hashtable

Equivalent to (make-oldstyle-hashtable object-hash assv).

Procedure hashtable-contains?

(hashtable-contains? hashtable key) => bool

Returns true iff the hashtable contains an entry for key.

Procedure hashtable-fetch

(hashtable-fetch hashtable key flag) => object

Returns the value associated with key in the hashtable if the hashtable contains key; otherwise returns flag.

Procedure hashtable-get

(hashtable-get hashtable key) => object

Equivalent to (hashtable-fetch hashtable key #f).

Procedure hashtable-put!

(hashtable-put! hashtable key value) => unspecified

Changes the hashtable to associate key with value, replacing any existing association for key.

Procedure hashtable-remove!

(hashtable-remove! hashtable key) => unspecified

Removes any association for key within the hashtable.

Procedure hashtable-clear!

(hashtable-clear! hashtable) => unspecified

Removes all associations from the hashtable.

Procedure hashtable-size

(hashtable-size hashtable) => integer

Returns the number of keys contained within the hashtable.

Procedure hashtable-for-each

(hashtable-for-each procedure hashtable) => unspecified

The procedure must accept two arguments, a key and the value associated with that key. Calls the procedure once for each key-value association in hashtable. The order of these calls is indeterminate.

Procedure hashtable-map

(hashtable-map procedure hashtable) => list

The procedure must accept two arguments, a key and the value associated with that key. Calls the procedure once for each key-value association in hashtable, and returns a list of the results. The order of the calls is indeterminate.

Procedure hashtable-copy

(hashtable-copy hashtable) => hashtable

Returns a copy of the hashtable.
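
The procedures above can be combined as in the following sketch. It assumes Larceny's R5RS mode, where these procedures are predefined; the table name ages and its keys are made up for illustration.

```scheme
; Keys are symbols, so the default table (object-hash with assv) suffices.
(define ages (make-oldstyle-hashtable))

(hashtable-put! ages 'alice 31)
(hashtable-put! ages 'bob 27)
(hashtable-put! ages 'bob 28)           ; replaces the earlier association

(hashtable-contains? ages 'alice)       ; => #t
(hashtable-get ages 'bob)               ; => 28
(hashtable-get ages 'carol)             ; => #f
(hashtable-fetch ages 'carol 'unknown)  ; => unknown
(hashtable-size ages)                   ; => 2

; Sum all values; the order of the calls is indeterminate.
(define total 0)
(hashtable-for-each
 (lambda (key value) (set! total (+ total value)))
 ages)
total                                   ; => 59

; Collect the keys into a list (in no particular order).
(define keys (hashtable-map (lambda (key value) key) ages))

; Removing an entry shrinks the table.
(hashtable-remove! ages 'alice)
(hashtable-size ages)                   ; => 1
```

Because hashtable-get returns #f for a missing key, use hashtable-fetch with a distinctive flag, or hashtable-contains?, when #f is itself a possible value.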

12.12.2. Hash functions

The hash values returned by these functions are nonnegative exact integers suitable as hash values for the hashtable functions.

Procedure equal-hash

(equal-hash object) => integer

Returns a hash value for object based on its contents.

Procedure object-hash

(object-hash object) => integer

Returns a hash value for object based on its identity.

Warning

This hash function performs extremely poorly on pairs, vectors, strings, and bytevectors, which are the objects with which it is most likely to be used. For efficient hashing on object identity, create the hashtable with make-eq-hashtable or make-eqv-hashtable of the (rnrs hashtables) library.

Procedure string-hash

(string-hash string) => fixnum

Returns a hash value for string based on its content.

Procedure symbol-hash

(symbol-hash symbol) => fixnum

Returns a hash value for symbol based on its print name. The symbol-hash is very fast, because the hash code is cached in the symbol data structure.
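
As a minimal sketch of pairing a hash function with a matching bucket searcher, the following creates old-style tables keyed on content rather than identity. The table names and entries are illustrative; assoc is used because it compares keys with equal?, which agrees with string-hash and equal-hash.

```scheme
; String keys: hash on content, search buckets with assoc (equal?).
(define capitals (make-oldstyle-hashtable string-hash assoc))
(hashtable-put! capitals "France" "Paris")

; A freshly built string with the same content finds the entry.
(hashtable-get capitals (string-append "Fra" "nce"))   ; => "Paris"

; Structured keys: equal-hash hashes on contents.
(define labels (make-oldstyle-hashtable equal-hash assoc))
(hashtable-put! labels '(1 2) 'corner)
(hashtable-get labels (list 1 2))                      ; => corner
```

Pairing a content-based searcher with an identity-based hash function such as object-hash would not work reliably here, since two equal keys could land in different buckets.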

12.13. Parameters

Parameters are procedures that serve as containers for values.

When called with no arguments, a parameter returns its current value. The value of a parameter can be changed temporarily using the parameterize syntax described below.

The effect of passing arguments to a parameter is implementation-dependent. In Larceny, passing one argument to a parameter changes the current value of the parameter to the result of applying a converter procedure to that argument, as described by SRFI 39.

Procedure make-parameter

(make-parameter init) => procedure

(make-parameter init converter) => procedure

(make-parameter name init predicate) => procedure

Creates a parameter.

When make-parameter is called with one argument init, the parameter's initial value is init, and the parameter's converter will be the identity function.

When make-parameter is called with two arguments, converter must be a procedure that accepts one argument, and the parameter's initial value is the result of calling converter on init.

Larceny extends SRFI 39 and the R7RS specification of make-parameter by allowing it to be called with three arguments. The first argument, name, must be a symbol or string giving the print name of the parameter. The second argument, init, will be the initial value of the parameter. The third argument is a predicate from which Larceny constructs a converter procedure that acts like the identity function on arguments that satisfy the predicate but raises an exception on arguments that don't.
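
A minimal sketch of the three-argument form follows. It relies on the Larceny-specific behavior described above; the parameter name verbosity is made up for illustration.

```scheme
; name = verbosity, init = 0, predicate = integer?
(define verbosity (make-parameter 'verbosity 0 integer?))

(verbosity)      ; => 0
(verbosity 2)    ; in Larceny, one argument changes the current value
(verbosity)      ; => 2

; (verbosity 'loud) would raise an exception,
; because 'loud does not satisfy integer?.
```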

(make-parameter name init) => procedure

Larceny's parameter objects predate SRFI 39. For backward compatibility, Larceny's make-parameter will accept two arguments even if the second is not a procedure, provided the first argument is a symbol or string. In that special case, the two arguments will be treated as the name and init arguments to Larceny's three-argument version, with the predicate defaulting to the identity function. This extension is strongly deprecated.

Syntax parameterize

(parameterize ((parameter0 value0) …) expr0 expr1 …)

Parameterize temporarily overrides the values of a set of parameters while the expressions in the body of the parameterize expression are evaluated. (It is like fluid-let for parameters instead of variables.)
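
The following sketch shows a temporary override and a converter at work. The names radix, show, and limit are illustrative only.

```scheme
(define radix (make-parameter 10))

(define (show n)
  (number->string n (radix)))

(show 5)                       ; => "5"
(parameterize ((radix 2))
  (show 5))                    ; => "101"
(show 5)                       ; => "5", the override has ended

; With a converter, both the initial value and any value supplied
; through parameterize pass through the converter.
(define limit (make-parameter "10" string->number))

(limit)                        ; => 10
(parameterize ((limit "20"))
  (limit))                     ; => 20
```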

12.14. Property Lists

The property list of a symbol is an association list that is attached to that symbol. The association list maps properties, which are themselves symbols, to arbitrary values.

Procedure putprop

(putprop symbol property obj) => unspecified

If an association exists for property on the property list of symbol, then its value is replaced by the new value obj. Otherwise, a new association is added to the property list of symbol that associates property with obj.

Procedure getprop

(getprop symbol property) => obj

If an association exists for property on the property list of symbol, then its value is returned. Otherwise, #f is returned.

Procedure remprop

(remprop symbol property) => unspecified

If an association exists for property on the property list of symbol, then that association is removed. Otherwise, this is a no-op.
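
A short sketch of the three procedures working together; the symbol apple and the property color are arbitrary examples.

```scheme
(putprop 'apple 'color 'red)
(getprop 'apple 'color)        ; => red

(putprop 'apple 'color 'green) ; replaces the existing value
(getprop 'apple 'color)        ; => green

(remprop 'apple 'color)
(getprop 'apple 'color)        ; => #f
(remprop 'apple 'color)        ; no association left, so this is a no-op
```

Because getprop returns #f when no association exists, a property whose value is #f cannot be distinguished from a missing property.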

12.15. Symbols

Procedure gensym

(gensym string) => symbol

Gensym returns a new uninterned symbol, the name of which contains the given string.
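
As a small illustration of what "uninterned" means in practice (the names g1 and g2 are arbitrary):

```scheme
(define g1 (gensym "tmp"))
(define g2 (gensym "tmp"))

(symbol? g1)     ; => #t
(eq? g1 g2)      ; => #f, each call returns a new symbol

; Interning a symbol with the same print name yields a different object,
; because g1 is not in the symbol table.
(eq? g1 (string->symbol (symbol->string g1)))   ; => #f
```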

Procedure oblist

(oblist ) => list

Oblist returns the list of interned symbols.

Procedure oblist-set!

(oblist-set! list) => unspecified

(oblist-set! list table-size) => unspecified

oblist-set! sets the list of interned symbols to those in the given list by clearing the symbol hash table and storing the symbols in list in the hash table. If the optional table-size is given, it is taken to be the desired size of the new symbol table.

See also: symbol-hash.

12.16. System Control and Performance Measurement

Procedure collect

(collect ) => unspecified

(collect generation) => unspecified

(collect generation method) => unspecified

Collect initiates a garbage collection. If the system has multiple generations, then the optional arguments are interpreted as follows. The generation is the generation to collect, where 0 is the youngest generation. The method determines how the collection is performed. If method is the symbol collect, then a full collection is performed in that generation, whatever that means — in a normal multi-generational copying collector, it means that all live objects in the generation's current semispace and all live objects from all younger generations are copied into the generation's other semispace. If method is the symbol promote, then live objects are promoted from younger generations into the target generation — in our example collector, that means that the objects are copied into the target generation's current semispace.

The default value for generation is 0, and the default value for method is collect.

Note that the collector's internal policy settings may cause it to perform a more major type of collection than the one requested; for example, an attempt to collect generation 2 could cause the collector to promote all live data into generation 3.
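
The calls below sketch the three argument patterns. Which generation numbers are meaningful depends on how the heap is configured, so generation 1 is only an assumption here.

```scheme
(collect)              ; same as (collect 0 'collect)
(collect 1)            ; full collection in generation 1
(collect 1 'promote)   ; promote live objects from younger
                       ; generations into generation 1
```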

Procedure gc-counter

(gc-counter ) => fixnum

gc-counter returns the number of garbage collections performed since startup. On a 32-bit system, the counter wraps around every 1,073,741,824 collections.

gc-counter is a primitive and compiles to a single load instruction on the x86 and ARM.

Procedure major-gc-counter

(major-gc-counter ) => fixnum

major-gc-counter returns the number of major garbage collections performed since startup, where a major collection is defined as a collection that may change the address of objects that have already survived a previous collection. On a 32-bit system, the counter wraps around every 1,073,741,824 collections.

major-gc-counter is a primitive and compiles to a single load instruction on the x86 and ARM.

Note

Larceny uses gc-counter and major-gc-counter to implement efficient hashtables that hash on object identity by using an object's current address to compute its hash code. Hash tables that use this kind of hash function (notably make-eq-hashtable and make-eqv-hashtable) may have to rehash some of their keys after a garbage collection that relocates objects.
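
The counters can also be used directly from user code. The helper below is a hypothetical sketch, not part of Larceny; it reports how many collections took place while a thunk ran and ignores the rare case of counter wraparound.

```scheme
; Returns the number of garbage collections performed while thunk runs.
(define (collections-during thunk)
  (let ((before (gc-counter)))
    (thunk)
    (- (gc-counter) before)))

; Example: count the collections caused by building a long list.
(collections-during
 (lambda ()
   (let loop ((i 0) (acc '()))
     (if (< i 1000000)
         (loop (+ i 1) (cons i acc))
         (length acc)))))
```

Substituting major-gc-counter for gc-counter would count only the collections that may relocate previously surviving objects.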

Procedure gcctl

(gcctl heap-number operation operand) => unspecified

[GCCTL is largely obsolete in the new garbage collector but may be resurrected in the future. It can still be used to control the non-predictive collector.]

gcctl controls garbage collection policy on a per-heap basis. The heap-number identifies the heap to operate on, numbered as for the command-line switches: heap 1 is the youngest. If the given heap number does not correspond to a heap, gcctl fails silently.

The operation is a symbol that selects the operation to perform, and the operand is the operand to that operation, always a number. For the non-predictive garbage collector, the following operation/operand pairs are meaningful:

  • j-fixed, n: after a collection, the collector parameter j should be set to the value n, if possible. (Non-predictive heaps only.)
  • j-percent, n: after a collection, the collector parameter j should be set to be n percent of the number of free steps. (Non-predictive heaps only.)
  • incr-fixed, n: when growing the heap, the growing should be done in increments of n. In the non-predictive heap, n is the number of steps. In other heaps, n denotes kilobytes.
  • incr-percent, n: when growing the heap, the growing should be done in increments of n percent.

Example: if the non-predictive heap is heap number 2, then the expressions

(gcctl 2 'j-fixed 0)
(gcctl 2 'incr-fixed 1)

make the non-predictive collector simulate a normal stop-and-copy collector (because j is always set to 0), and grow the heap only one step at a time as necessary. This may be useful for certain kinds of experiments.

Example: again assuming the non-predictive heap is heap number 2, the expressions

(gcctl 2 'j-percent 50)
(gcctl 2 'incr-percent 20)

select the default policy settings.

Note: The gcctl facility is experimental. A more developed facility will allow controlling heap contraction policy, as well as setting all the watermarks. Certainly one can envision other uses, too. Finally, it needs to be possible to get current values.

Note: Currently the non-predictive heap (np-sc-heap.c) and the standard stop-and-copy "old" heap (old-heap.c) are supported, but not the standard "young" heap (young-heap.c), nor the stop-and-copy collector (sc-heap.c).

Procedure sro

(sro pointer-tag type-tag limit) => vector

SRO ("standing room only") is a system primitive that traverses the entire heap and returns a vector that contains all live objects in the heap that satisfy the constraints imposed by its parameters:

  • If pointer-tag is -1, then object type is unconstrained; otherwise, the object type is constrained to have a pointer tag that matches pointer-tag. You can read all about pointer tags here, but the short story is that 1=pair, 3=vector-like, 5=bytevector-like, and 7=procedure-like.
  • If type-tag is -1, then object type is unconstrained by type-tag; otherwise, only objects with a matching type-tag are selected (after selection by pointer tag). Pairs don't have type-tags, but other objects do. You can read all about type-tags here.
  • Limit constrains the selected objects by the number of references. If limit is -1, then no constraints are imposed; otherwise, only objects (selected by pointer-tag and type-tag) with no more than limit references to them are selected.

For example, (sro -1 -1 -1) returns a vector that contains all live objects (not including the vector), and (sro 5 2 3) returns a vector containing all live flonums (bytevector-like, with typetag 2) that are referred to in no more than 3 places.
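
As a further sketch, sro can be used to take a rough census of the heap by pointer tag. The helper count-live is hypothetical; the tag numbers are the ones listed above.

```scheme
; Number of live objects with the given pointer tag,
; with no constraint on type tag or reference count.
(define (count-live pointer-tag)
  (vector-length (sro pointer-tag -1 -1)))

(count-live 1)   ; live pairs
(count-live 3)   ; live vector-like objects
(count-live 5)   ; live bytevector-like objects
(count-live 7)   ; live procedure-like objects
```

Since sro traverses the entire heap on every call, a census like this is best reserved for debugging sessions.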

Procedure stats-dump-on

(stats-dump-on filename) => unspecified

Stats-dump-on turns on garbage collection statistics dumping. After each collection, a complete RTS statistics dump is appended to the file named by filename.

The file format and contents are documented in a banner written at the top of the output file. In addition, accessor procedures for the output structure are defined in the program Util/process-stats.sch.

Stats-dump-on does not perform an initial dump when the file is first opened; only at the first collection is the first set of statistics dumped. The user might therefore want to initiate a minor collection just after turning on dumping in order to have a baseline set of data.

Procedure stats-dump-off

(stats-dump-off ) => unspecified

Stats-dump-off turns off garbage collection statistics dumping (which was turned on with stats-dump-on). It does not dump a final set of statistics before closing the file; therefore, the user may wish to initiate a minor collection before calling this procedure.
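
A typical session, following the advice above, might look like this sketch. The file name and the run-workload procedure are placeholders, and (collect) is used on the assumption that collecting the youngest generation counts as a minor collection.

```scheme
; Placeholder for the code being measured.
(define (run-workload)
  (let loop ((i 0) (acc '()))
    (if (< i 100000)
        (loop (+ i 1) (cons i acc))
        (length acc))))

(stats-dump-on "gc-stats.log")
(collect)            ; force a first dump to serve as a baseline
(run-workload)
(collect)            ; force a final dump before closing the file
(stats-dump-off)
```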

Procedure system-features

(system-features ) => alist

System-features returns an association list of system features. Most entries are self-explanatory. The following are more subtle:

  • The value of architecture-name is Larceny's notion of the architecture for which it was compiled, not the architecture the program is currently running on. For example, the value of this feature is "Standard-C" if you're running Petit Larceny.
  • The value of heap-area-info is a vector of vectors, one subvector for each heap area in the running system. The subvector has four entries: the generation number, the area type, the current size, and additional information.
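
Individual entries can be looked up with the usual association-list procedures. The helper below is a hypothetical sketch that assumes the keys of the list are symbols.

```scheme
; Returns the value of the named feature, or #f if it is absent.
(define (system-feature name)
  (let ((entry (assq name (system-features))))
    (and entry (cdr entry))))

(system-feature 'architecture-name)   ; a string such as "Standard-C"
(system-feature 'heap-area-info)      ; a vector of four-element vectors
```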

Procedure display-memstats

(display-memstats vector) => unspecified

(display-memstats vector minimal) => unspecified

(display-memstats vector minimal full) => unspecified

Display-memstats takes as its argument a vector as returned by memstats and displays the contents of the vector in human-readable form on the current output port. By default, not all of the values in the vector are displayed.

If the symbol minimal is passed as the second argument, then only a small number of statistics generally relevant to running benchmarks are displayed.

If the symbol full is passed as the second argument, then all statistics are displayed.

Procedure memstats

(memstats ) => vector

Memstats returns a freshly allocated vector containing run-time-system resource usage statistics. Many of these will make no sense whatsoever to you unless you also study the RTS sources. A listing of the contents of the vector is available here.
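
For example, the current statistics can be captured and printed as follows (a minimal sketch; the variable name snapshot is arbitrary):

```scheme
(define snapshot (memstats))

(display-memstats snapshot)            ; default selection of values
(display-memstats snapshot 'minimal)   ; benchmark-oriented subset
```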

Procedure run-with-stats

(run-with-stats thunk) => obj

Run-with-stats evaluates thunk, then prints a short summary of run-time statistics, as with

(display-memstats ... 'minimal),

and then returns the result of evaluating thunk.

Procedure run-benchmark

(run-benchmark name k thunk ok?) => obj

Run-benchmark prints a short banner (including the identifying name) to identify the benchmark, then runs thunk k times, and finally tests the value returned from the last call to thunk by applying the predicate ok? to it. If the predicate returns true, then run-benchmark prints summary statistics, as with

(display-memstats ... 'minimal).

If the predicate returns false, an error is signalled.
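
The sketch below times a small recursive function with both procedures. The fib procedure is only an example workload, and passing the benchmark name as a string is an assumption.

```scheme
(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

; Evaluate once, print a summary, and return the result.
(run-with-stats (lambda () (fib 25)))

; Run three times and check the result of the last run.
(run-benchmark "fib25"
               3
               (lambda () (fib 25))
               (lambda (result) (= result 75025)))
```

Supplying a meaningful ok? predicate guards against benchmarks that run quickly only because they compute the wrong answer.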

12.17. SRFI Support

SRFIs (Scheme Requests For Implementations) describe and implement additional Scheme libraries. The SRFI effort is open to anyone, and is described at http://srfi.schemers.org.

SRFIs are numbered. Importing SRFIs into an R7RS library or program is straightforward:

(import (srfi 19)
        (srfi 27))

The R6RS forbids numbers within library names, so R6RS libraries and programs must import SRFI libraries using the SRFI 97 naming convention in which a colon precedes the number:

(import (srfi :19)
        (srfi :27))

To test whether particular SRFIs are available, use the R7RS cond-expand feature:

(cond-expand
 ((and (library (srfi 19))
       (library (srfi 27)))
  (import (srfi 19))
  (import (srfi 27))))

cond-expand is not available to R6RS libraries or programs.

R5RS programs can use cond-expand as implemented by SRFI 0, "Feature-based conditional expansion construct". (SRFI 0 must be loaded into Larceny before it can be used; see below.) Larceny provides the following nonstandard key for use in SRFI 0:

    larceny

Larceny currently supports many SRFIs, though not as many as it should. Some SRFIs are built into Larceny's R5RS mode, but most must be loaded dynamically using Larceny's require procedure:

    > (require 'srfi-0)
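
Once SRFI 0 is loaded, an R5RS program can use the larceny key to select implementation-specific code. A minimal sketch:

```scheme
(require 'srfi-0)

(cond-expand
 (larceny
  (display "Running in Larceny"))
 (else
  (display "Running in some other Scheme")))
(newline)
```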

The design documents for SRFI 0 and other SRFIs are available at http://srfi.schemers.org.

12.18. SLIB support

SLIB is a large collection of useful libraries that have been written or collected by Aubrey Jaffer.

Larceny supports SLIB via SRFI 96, but SLIB itself is not shipped with Larceny; it must be downloaded separately and then installed. For the most up-to-date information on installing and using SLIB with Larceny, see doc/HOWTO-SLIB.

12.19. Foreign-Function Interface to C

Larceny provides a general foreign-function interface (FFI) substrate on which other FFIs can be built; see Larceny Note #7. The FFI described in this manual section is a simple example of a derived FFI. It is not yet fully evolved, but it is useful.

Warning

This section has undergone significant revision, but not all of the material has been properly vetted. Some of the information in this section may be out of date.

Note

Some of the text below is adapted from the 2008 Scheme Workshop paper, “The Layers of Larceny's Foreign Function Interface,” by Felix S Klock II. That paper may provide additional insight for those searching for implementation details and motivations.

12.19.1. Introducing the FFI

There are a number of different potential ways to use the FFI. One client may want to develop code in C and load it into Larceny. Another client may want to load native libraries provided by the host operating system, enabling invocation of foreign code from Scheme expressions without developing any C code or even running a C compiler. Larceny's FFI can be used for both of these cases, but many of its facilities target a third client in between the two extremes: a client with a C compiler and the header files and object code for the foreign libraries, but who wishes to avoid writing glue code in C to interface with the libraries.

There are four main steps to interacting with foreign code:

  1. identifying the space of values manipulated by the foreign code that will also be manipulated in Scheme,
  2. describing how to marshal values between foreign and Scheme code,
  3. loading library file(s) holding foreign object code, and
  4. linking procedures from the loaded library.

Step 1 is conceptual, while steps 2 through 4 yield artifacts in Scheme source code.

12.19.2. The space of foreign values

At the machine code level, foreign values are uninterpreted sequences of bits. Often foreign object code is oriented around manipulating word-sized bit-sequences (words) or arrays and tuples of words.

Many libraries are written with a particular interpretation of such values. In C code, explicit types are often used as hints to guide such interpretation; for example, a 0 of type bool is usually interpreted as false, while a 1 (or other non-zero value) of type bool is usually interpreted as true. Another example is C enumerations (or enums). An enum declaration defines a set of named integral constants. After the C declaration:

enum months { JAN = 1, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC };

a JAN in C code now denotes 1, FEB is 2, and so on. Furthermore, tools like debuggers may render a variable x dynamically assigned the value 2 (and of static type enum months) as FEB. Thus the enum declaration introduces a new interpretation for a finite set of integers.

This leads to questions for a client of an FFI; we explore some below.

  • Should foreign words be passed over to the Scheme world as uninterpreted numbers (and thus be converted into Scheme integers, usually fixnums), or should they be marshaled into interpreted values, such as #f and #t for the bool type, or the Scheme symbols {JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC} for the enum months type?
  • Similarly, how should Scheme values be marshaled into foreign words?
  • A foreign library might leave the mapping of names like FEB to words like 2 unspecified in the library interface. That is, while the C compiler will know FEB maps to 2 according to a particular version of the library's header file, the library designer may intend to change this mapping in the future, and clients writing C code should only use the names to refer to an enum months value, and not integer expressions.

    • How should this constraint be handled in the FFI; should the library client revise their code in reaction to such changes to the mapping?
    • Or should the system derive the mapping from the header files, in the same manner that the C compiler does?
  • Foreign libraries often manipulate mutable entities, like arrays of words where modifications can be observed (often by design).

    • How should such values be marshaled?
    • Is it sound to copy such values to the Scheme heap? If so, is a shallow copy sufficient?
  • Will the foreign code hold references to heap-allocated objects? Heap-allocated objects that leak out to foreign memory must be treated with care; garbage collection presents two main problems.

    • First, such objects must not move during a garbage collection; Larceny supports this via special-purpose allocation routines: cons-nonrelocatable, make-nonrelocatable-bytevector, and make-nonrelocatable-vector.
    • Second, the garbage collector must know to hold on to (i.e. trace) such values as long as they are needed by foreign code; otherwise the objects or their referents may be collected without the knowledge of the foreign code.

Answering these questions may require deep knowledge of the intended usage of the foreign library.

The Larceny FFI attempts to ease interfacing with foreign code in the presence of the above concerns, but the nature of the header files included with most foreign libraries means that the FFI cannot infer the answers unassisted.

Note

Foreign C code developed to work in concert with Larceny could hypothetically be written to cope with holding handles for objects managed by the garbage collector, but there is currently no significant support for this use-case.

Note

One class of foreign values is not addressed by the Larceny FFI: structures passed by value (as opposed to by reference, i.e. pointers to structures). There is no way to describe the interface to a foreign procedure that accepts or produces a C struct (at least not properly nor portably).

This tends to not matter for many foreign libraries (since many C programmers eschew passing structures by value), but it can arise.

If the foreign library of interest has procedures that accept or produce a C struct, we currently recommend either avoiding such procedures, or writing adapter code in C that marshals between values handled by the FFI and the C struct.

The conclusion is: when designing an interface to a foreign library, you should analyze the values manipulated on the foreign side and identify their relationship with values on the Scheme side. After you have identified the domains of interest, you then describe how the values will be marshaled back and forth between the two domains.

12.19.3. Marshalling via ffi-attributes

This section describes the marshalling protocol defined in lib/Base/std-ffi.sch.

Foreign functions automatically marshal their inputs and outputs according to type-descriptors attached to each foreign function.

Type-descriptors are S-expressions formed according to the following grammar:

TypeDesc ::= CoreAttr | ArrowT | MaybeT | OneOfT

CoreAttr ::= PrimAttr | VoidStar | ---

PrimAttr ::= CurrentPrimAttr | DeprecatedPrimAttr

CurrentPrimAttr
         ::= int | uint | byte | short | ushort | char | uchar
          |  long | ulong | longlong | ulonglong
          |  size_t | float | double |  bool | string | void

DeprecatedPrimAttr
         ::= unsigned | boxed

VoidStar ::= void* | ---

ArrowT   ::= (-> (TypeDesc ...) TypeDesc)

MaybeT   ::= (maybe TypeDesc)

OneOfT   ::= (oneof (Any Fixnum) ... TypeDesc)

where --- represents a user-extensible part of the grammar (see below), Any represents any Scheme value, and Fixnum represents any word-sized integer.

A central registry maps CoreAttr's to a foreign representation and two conversion routines: one to convert a Scheme value to a foreign argument, and another to convert a foreign result back to a Scheme value. The denoted components are collectively referred to as a type within the FFI documentation. The registry is extensible; the ffi-add-attribute-core-entry! procedure adds new CoreAttr's to the registry, and one can alternatively add short-hands for type-descriptors via the ffi-add-alias-of-attribute-entry! procedure. Finally, one can add new VoidStar productions (subtypes of the void* type-descriptor) via the ffi-install-void*-subtype procedure (defined in the lib/Standard/foreign-stdlib.sch library).

12.19.3.1. Primitive Attribute Types

The following is a list of the accepted types and their conversions at the boundary between Scheme and foreign code:

int
Exact integer values in the range [-2^31, 2^31-1]. Scheme integers in that range are converted to and from C "int".
uint
Exact integer values in the range [0, 2^32-1]. Scheme integers in that range are converted to and from C "unsigned int".
byte
Synonymous with int in the current implementation.
short
Synonymous with int in the current implementation.
ushort
Synonymous with unsigned in the current implementation.
char
Scheme ASCII characters are converted to and from C "char".
uchar
Scheme ASCII characters are converted to and from C "unsigned char".
long
Synonymous with int in the current implementation.
ulong
Synonymous with unsigned in the current implementation.
longlong
Exact integer values in the range [-2^63, 2^63-1]. Scheme integers in that range are converted to and from C "long long".
ulonglong
Exact integer values in the range [0, 2^64-1]. Scheme integers in that range are converted to and from C "unsigned long long".
size_t
Synonymous with uint in the current implementation.
float
Scheme flonums are converted to and from C "float". The conversion to float is performed via a C (float) cast from a C double.
double
Scheme flonums are converted to and from C "double".
bool
Scheme objects are converted to C "int"; #f is converted to 0, and all other objects to 1. In the reverse direction, 0 is converted to #f and all other integers to #t.
string
A Scheme string holding ASCII characters is copied into a NUL-terminated bytevector, passing a pointer to its first byte to the foreign procedure; #f is converted to a C "(char*)0" value. In the reverse direction, a pointer to a NUL-terminated sequence of bytes interpreted as ASCII characters is copied into a freshly allocated Scheme string; a NULL pointer is converted to #f.
void
No return value. (Only used in return position for foreign functions; all Scheme procedures passed to the FFI are invoked in a context expecting one value.)
unsigned
Synonymous with uint; deprecated.
boxed
Any heap-allocated data structure (pair, bytevector-like, vector-like, procedure) is converted to a C "void*" to the first element of the structure. The value #f is also acceptable. It is converted to a C "(void*)0" value. (Only used in argument position for foreign functions; foreign functions are not expected to return direct references to heap-allocated values.)

12.19.3.2. Extending the Core Attribute Registry

The public interface to many foreign libraries is written in terms of types defined within that foreign library. One can introduce new types to the Larceny FFI by extending the core attribute entry table.

Procedure ffi-add-attribute-core-entry!

(ffi-add-attribute-core-entry! entry-name rep-sym marshal unmarshal) => unspecified

ffi-add-attribute-core-entry! extends the internal registry with the new entry specified by its arguments.

  • entry-name is a symbol (the symbolic type name being introduced to the FFI).
  • rep-sym is a low-level type descriptor symbol, one of signed32, unsigned32, signed64, unsigned64 (representing varieties of fixed width integers), ieee32 (representing “floats”), ieee64 (representing “doubles”), or pointer (representing “(void*)” in C).
  • marshal is a marshaling function that accepts a Scheme object and a symbol (the name of the invoking procedure); it is responsible for checking the Scheme object's validity and then producing a corresponding instance of the low-level representation.
  • unmarshal is either #f or an unmarshalling function that accepts an instance of the low-level representation and produces a corresponding Scheme object.
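
As a sketch of how the pieces fit together, the following registers a hypothetical month attribute that marshals between Scheme symbols and the integers of the enum months declaration shown earlier. The attribute name and the helper procedures are made up for illustration, and the mapping is hard-coded rather than derived from a header file.

```scheme
(define month-names
  '(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC))

; Marshal: accepts a Scheme object and the name of the invoking
; procedure, checks validity, and returns the integer (JAN = 1).
(define (month->int sym who)
  (let loop ((names month-names) (i 1))
    (cond ((null? names)
           (error who "not a month symbol" sym))
          ((eq? (car names) sym) i)
          (else (loop (cdr names) (+ i 1))))))

; Unmarshal: converts a C integer back into the month symbol.
(define (int->month n)
  (list-ref month-names (- n 1)))

(ffi-add-attribute-core-entry! 'month 'signed32 month->int int->month)
```

After this call, month can appear wherever a core attribute is allowed in a type-descriptor.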

12.19.3.3. Attribute Type Constructors

Core attributes suffice for linking to simple functions. Constructed FFI attributes express more complex marshaling protocols.

Arrow Type Constructors. A structured FFI attribute of the form (-> (s_1 … s_n) s_r) (called an arrow type) allows passing functions from Scheme to C and back again. Each of the s_1, …, s_n, s_r is an FFI attribute. When an arrow type describes an input to a foreign function, it marshals a Scheme procedure to a C function pointer by generating glue code to hook the two together and marshal values as described by the FFI attributes within the arrow type. Likewise, when an arrow type describes an output from a foreign function, it marshals a C function pointer to a Scheme procedure, again by generating glue code. These two mappings naturally generalize to arbitrary nesting of arrow types, so one can create callbacks that consume callouts, return callouts that consume callbacks, and so on.

Warning

The current implementation of arrow types introduces an unnecessary space leak, because none of Larceny's current garbage collectors attempt to reclaim some of the structure allocated (in particular, the so-called trampolines) when functions are marshaled via arrow types.

The FFI could be revised to reduce the leak (e.g. it could keep a cache of generated trampolines and reuse them), but it currently does not do so.

Many foreign libraries have a structure where one only sets up a fixed set of callbacks, and then all further computation does not require arrow type marshaling. This is one reason why fixing this problem has been a low priority item for the Larceny development team.

Maybe Type Constructor. (maybe t) captures the pattern of passing NULL in C and #f in Scheme to represent the absence of information. The FFI attribute t within the maybe type describes the typical information passed; the constructed maybe type marshals #f to the foreign null pointer or 0 (as appropriate), and otherwise applies the marshaling of t. Likewise, it unmarshals the foreign null pointer and 0 to #f, and otherwise applies the unmarshaling of t.
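
Since type-descriptors are ordinary S-expressions, constructed attributes can be written as quoted data. The definitions below are only illustrative; the variable names are made up and the descriptors are not tied to any particular foreign library.

```scheme
; A pointer argument that may also be #f (marshaled to NULL).
(define nullable-pointer '(maybe void*))

; A callback taking two pointers and returning an int,
; the shape of a C comparison function.
(define comparator '(-> (void* void*) int))

; Arrow types nest: a function that accepts an int-to-int callback
; and returns an int.
(define callback-consumer '(-> ((-> (int) int)) int))
```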

(There are a few other built-in type constructors, such as the oneof type constructor, but they are not as fully-developed as the two above, and are intended for use only for internal development for now.)

12.19.3.4. void* Type Hierarchies

Using the void* attribute wraps foreign addresses up in a Larceny record, so that standard numeric operations cannot be directly applied by accident. The FFI uses two features of Larceny's record system: the record type descriptor is a first class value with an inspectable name, and record types are extensible via single-inheritance.

Basic Operations on void*. The FFI provides void*-rt, a record type descriptor with a single field (a wrapped address). There is also a family of functions for dereferencing the pointer within a void*-rt and manipulating the state it references.

Procedure void*->address

(void*->address x) => number

Extracts the underlying address held in a void*.

Procedure void*?

(void*? x) => boolean

Distinguishes void*'s from other Scheme values.

Procedure void*-byte-ref

(void*-byte-ref x idx) => number

Extracts the byte at offset idx from the address within x.

Procedure void*-byte-set!

(void*-byte-set! x idx val) => unspecified

Sets the byte at offset idx from the address within x to val.

Procedure void*-word-ref

(void*-word-ref x idx) => number

Extracts the word-sized integer at offset idx from the address within x.

Procedure void*-word-set!

(void*-word-set! x idx val) => unspecified

Sets the word-sized integer at offset idx from the address within x to val.

Procedure void*-void*-ref

(void*-void*-ref x idx) => void*

Extracts address (and wraps it in a void*) at offset from address within x.

Procedure void*-void*-set!

(void*-void*-set! x idx val) => unspecified

Modifies address at offset from address within x.

Procedure void*-double-ref

(void*-double-ref x idx) => number

Extracts 64-bit flonum at offset from address within x.

Procedure void*-double-set!

(void*-double-set! x idx val) => unspecified

Modifies 64-bit flonum at offset from address within x.
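The following sketch shows how the byte accessors might be used on memory obtained from C. It assumes that the FFI is loaded with (require 'std-ffi) and that the C library's malloc and free are visible to the FFI; the Scheme names c-malloc, c-free and first-byte-after-store are hypothetical.

```scheme
(require 'std-ffi)   ; assumption: this is how the FFI is loaded on your system

;; Link malloc and free so that addresses arrive wrapped as void* records.
(define c-malloc (foreign-procedure "malloc" '(int) 'void*))
(define c-free   (foreign-procedure "free" '(void*) 'void))

;; Store a byte at offset 0 of a fresh block, read it back, release the block.
(define (first-byte-after-store byte)
  (let ((p (c-malloc 16)))
    (void*-byte-set! p 0 byte)
    (let ((b (void*-byte-ref p 0)))
      (c-free p)
      b)))

(first-byte-after-store 65)
```

Because p is a record rather than a number, an accidental (+ p 1) is an error instead of silent pointer arithmetic; use void*->address when the raw address is really needed.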

Type Hierarchies. Procedures for establishing type hierarchies are provided by the lib/Standard/foreign-stdlib.sch library; see ffi-install-void*-subtype and establish-void*-subhierarchy!.

12.19.4. Creating loadable modules

You must first compile your C code and create one or more loadable object modules. These object modules may then be loaded into Larceny, and Scheme foreign functions may link to specific functions in the loaded module. Defining foreign functions in Scheme is covered in a later section.

The method for creating a loadable object module varies from platform to platform. In the following, assume you have two C source files file1.c and file2.c that define functions that you want to make available as foreign functions in Larceny.

12.19.4.1. SunOS 4

Compile your source files and create a shared library. Using GCC, the command line might look like this:

gcc -fPIC -shared file1.c file2.c -o my-library.so

The command creates my-library.so in the current directory. This library can now be loaded into Larceny using foreign-file. Any other shared libraries used by your library files should also be loaded into Larceny using foreign-file before any procedures are linked using foreign-procedure.

By default, /lib/libc.so is made available to the dynamic linker and to the foreign function interface, so there is no need for you to load that library explicitly.

12.19.4.2. SunOS 5

Compile your source files and create a shared library, linking with all the necessary libraries. Using GCC, the command line might look like this:

gcc -fPIC -shared file1.c file2.c -lc -lm -lsocket -o my-library.so

Now you can use foreign-file to load my-library.so into Larceny.

By default, /lib/libc.so is made available to the foreign function interface, so there is no need for you to load that library explicitly.

12.19.5. The Interface

12.19.5.1. Procedures

Procedure foreign-file

(foreign-file filename) => unspecified

foreign-file loads the named object file into Larceny and makes it available for dynamic linking.

Larceny uses the dynamic linker provided by the operating system to do dynamic linking. The operation of the dynamic linker varies from platform to platform:

  • On some versions of SunOS 4, if the linker is given a file that does not exist, it will terminate the process. (Most likely this is a bug.) This means you should never call foreign-file with the name of a file that does not exist.
  • On SunOS 5, if a foreign file is given to foreign-file without a directory specification, then the dynamic linker will search its load path (the LD_LIBRARY_PATH environment variable) for the file. Hence, a foreign file in the current directory should be "./file.so", not "file.so".

Procedure foreign-procedure

(foreign-procedure name (arg-type …) return-type) => procedure

FIXME: The interface to this function has been extended to support hooking into Windows procedures that use the Pascal calling convention instead of the C one. The way to select which convention to use should be documented.

Returns a Scheme procedure p that calls the foreign procedure whose name is name. When p is called, it will convert its parameters to representations indicated by the arg-types and invoke the foreign procedure, passing the converted values as parameters. When the foreign procedure returns, its return value is converted to a Scheme value according to return-type.

Types are described below.

The address of the foreign procedure is obtained by searching for name in the symbol tables of the foreign files that have been loaded with foreign-file.
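A minimal sketch of the load-then-link sequence follows. The C function my_add and the Scheme name my-add are hypothetical; the sketch assumes my_add was compiled into my-library.so as described under "Creating loadable modules" and that the FFI is loaded with (require 'std-ffi).

```scheme
(require 'std-ffi)   ; assumption: this is how the FFI is loaded on your system

;; Hypothetical C function compiled into ./my-library.so:
;;   int my_add(int a, int b) { return a + b; }

;; Load the shared object first. The leading "./" names the file in the
;; current directory explicitly instead of relying on the linker's load path.
(foreign-file "./my-library.so")

;; Link the C function: two int parameters, int result.
(define my-add
  (foreign-procedure "my_add" '(int int) 'int))

;; Arguments are converted to C ints; the result comes back as a Scheme integer.
(my-add 2 3)
```

Calling foreign-procedure before the corresponding foreign-file would leave the name unresolved, since the symbol is looked up only in files that have already been loaded.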

Procedure foreign-null-pointer

(foreign-null-pointer) => integer

Returns a foreign null pointer.

Procedure foreign-null-pointer?

(foreign-null-pointer? integer) => boolean

Tests whether its argument is a foreign null pointer.

12.19.6. Foreign Data Access

12.19.6.1. Raw memory access

The two primitives peek-bytes and poke-bytes are provided for reading and writing memory at specific addresses. These procedures are typically used for copying data from foreign data structures into Scheme bytevectors for subsequent decoding.

(The use of peek-bytes and poke-bytes can often be avoided by keeping foreign data in a Scheme bytevector and passing the bytevector to a call-out using the boxed parameter type. However, this technique is inappropriate if the foreign code retains a pointer to the Scheme datum, which may be moved by the garbage collector.)

Procedure peek-bytes

(peek-bytes addr bytevector count) => unspecified

Addr must be an exact nonnegative integer. Count must be a fixnum. The bytes in the range from addr through addr+count-1 are copied into bytevector, which must be long enough to hold that many bytes.

If any address in the range is not an address accessible to the process, unpredictable things may happen. Typically, you'll get a segmentation fault. Larceny does not yet catch segmentation faults.

Procedure poke-bytes

(poke-bytes addr bytevector count) => unspecified

Addr must be an exact nonnegative integer. Count must be a fixnum. The count first bytes from bytevector are copied into memory in the range from addr through addr+count-1.

If any address in the range is not an address accessible to the process, unpredictable things may happen. Typically, you'll get a segmentation fault. Larceny does not yet catch segmentation faults.

Also, it's possible to corrupt memory with poke-bytes. Don't do that.
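As an illustration, the sketch below copies a bytevector into a block of foreign memory with poke-bytes and reads it back with peek-bytes. It assumes that the FFI is loaded with (require 'std-ffi), that malloc and free from the C library are visible to the FFI, and that malloc succeeds; the names c-malloc, c-free, roundtrip-through-c-heap and sample are hypothetical.

```scheme
(require 'std-ffi)   ; assumption: this is how the FFI is loaded on your system

(define c-malloc (foreign-procedure "malloc" '(int) 'void*))
(define c-free   (foreign-procedure "free" '(void*) 'void))

;; Copies src into freshly allocated foreign memory, copies it back out
;; into a new bytevector, and releases the foreign block.
(define (roundtrip-through-c-heap src)
  (let* ((n    (bytevector-length src))
         (p    (c-malloc n))
         (addr (void*->address p))      ; peek/poke want a plain integer address
         (dst  (make-bytevector n 0)))
    (poke-bytes addr src n)             ; Scheme bytevector -> foreign memory
    (peek-bytes addr dst n)             ; foreign memory -> Scheme bytevector
    (c-free p)
    dst))

(define sample (make-bytevector 4 0))
(bytevector-set! sample 0 222)
(bytevector-set! sample 3 7)

(roundtrip-through-c-heap sample)
```

The malloc'ed block is not managed by the garbage collector, so its address stays valid until c-free is called.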

12.19.6.2. Foreign data sizes

The following constants define the sizes of basic C data types:

  • sizeof:short The size of a "short int".
  • sizeof:int The size of an "int".
  • sizeof:long The size of a "long int".
  • sizeof:pointer The size of any pointer type.

12.19.6.3. Decoding foreign data

Foreign data is visible to a Scheme program either as an object pointed to by a memory address (which is itself represented as an integer), or as a bytevector that contains the bytes of the foreign datum.

A number of utility procedures for reading and writing data of common C primitive types have been written for both kinds of foreign objects.

Bytevector accessor procedures

(%get16 bv i) => integer

(%get16u bv i) => integer

(%get32 bv i) => integer

(%get32u bv i) => integer

(%get-int bv i) => integer

(%get-unsigned bv i) => integer

(%get-short bv i) => integer

(%get-ushort bv i) => integer

(%get-long bv i) => integer

(%get-ulong bv i) => integer

(%get-pointer bv i) => integer

These procedures decode bytevectors that contain the bytes of foreign objects. In each case, bv is a bytevector and i is the offset of the first byte of a field in that bytevector. The field is fetched and returned as an integer (signed or unsigned as appropriate).

Bytevector updater procedures

(%set16 bv i val) => unspecified

(%set16u bv i val) => unspecified

(%set32 bv i val) => unspecified

(%set32u bv i val) => unspecified

(%set-int bv i val) => unspecified

(%set-unsigned bv i val) => unspecified

(%set-short bv i val) => unspecified

(%set-ushort bv i val) => unspecified

(%set-long bv i val) => unspecified

(%set-ulong bv i val) => unspecified

(%set-pointer bv i val) => unspecified

These procedures update bytevectors that contain the bytes of foreign objects. In each case, bv is a bytevector, i is an offset of the first byte of a field in that bytevector, and val is a value to be stored in that field. The values must be exact integers in a range implied by the data type.
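The sketch below packs and unpacks a small record with the bytevector updaters and accessors. The two-field layout (an int followed by an unsigned short, with no trailing padding) is hypothetical, as are the names flags-offset, record-size, make-record and record->list; the sketch assumes the FFI is loaded with (require 'std-ffi).

```scheme
(require 'std-ffi)   ; assumption: this is how the FFI is loaded on your system

;; Hypothetical layout: an int at offset 0, then an unsigned short.
(define flags-offset sizeof:int)
(define record-size  (+ sizeof:int sizeof:short))

;; Builds a bytevector holding the two fields in native representation.
(define (make-record id flags)
  (let ((bv (make-bytevector record-size 0)))
    (%set-int bv 0 id)
    (%set-ushort bv flags-offset flags)
    bv))

;; Decodes the two fields back into Scheme integers.
(define (record->list bv)
  (list (%get-int bv 0)
        (%get-ushort bv flags-offset)))

(record->list (make-record -123456 65535))
```

Offsets computed by hand like this are only safe for trivial layouts; for real C structures the offsets depend on the compiler and ABI.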

Foreign-pointer accessor procedures

(%peek8 addr) => integer

(%peek8u addr) => integer

(%peek16 addr) => integer

(%peek16u addr) => integer

(%peek32 addr) => integer

(%peek32u addr) => integer

(%peek-int addr) => integer

(%peek-long addr) => integer

(%peek-unsigned addr) => integer

(%peek-ulong addr) => integer

(%peek-short addr) => integer

(%peek-ushort addr) => integer

(%peek-pointer addr) => integer

(%peek-string addr) => string

These procedures read raw memory. In each case, addr is an address, and the value stored at that address (the size of which is indicated by the name of the procedure) is fetched and returned as an integer.

%Peek-string expects to find a NUL-terminated string of 8-bit bytes at the given address. It is returned as a Scheme string.

Foreign-pointer updater procedures

(%poke8 addr val) => unspecified

(%poke8u addr val) => unspecified

(%poke16 addr val) => unspecified

(%poke16u addr val) => unspecified

(%poke32 addr val) => unspecified

(%poke32u addr val) => unspecified

(%poke-int addr val) => unspecified

(%poke-long addr val) => unspecified

(%poke-unsigned addr val) => unspecified

(%poke-ulong addr val) => unspecified

(%poke-short addr val) => unspecified

(%poke-ushort addr val) => unspecified

(%poke-pointer addr val) => unspecified

These procedures update raw memory. In each case, addr is an address, and val is a value to be stored at that address.
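A short sketch of the raw-memory procedures: it writes a NUL-terminated two-character string into foreign memory one byte at a time and reads it back with %peek-string. It assumes that the FFI is loaded with (require 'std-ffi) and that malloc and free are visible to the FFI; c-malloc, c-free and poke-and-peek-hi are hypothetical names.

```scheme
(require 'std-ffi)   ; assumption: this is how the FFI is loaded on your system

(define c-malloc (foreign-procedure "malloc" '(int) 'void*))
(define c-free   (foreign-procedure "free" '(void*) 'void))

(define (poke-and-peek-hi)
  (let* ((p    (c-malloc 3))
         (addr (void*->address p)))
    (%poke8u addr 104)          ; #\h
    (%poke8u (+ addr 1) 105)    ; #\i
    (%poke8u (+ addr 2) 0)      ; terminating NUL
    (let ((s (%peek-string addr)))
      (c-free p)
      s)))

(poke-and-peek-hi)
```

Address arithmetic is ordinary integer arithmetic here, which is exactly why these procedures offer no protection against writing to the wrong place.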

12.19.7. Heap dumping and the FFI

If foreign functions are linked into Larceny using the FFI, and a Larceny heap image is subsequently dumped (with dump-interactive-heap or dump-heap), then the foreign functions are not saved as part of the heap image. When the heap image is subsequently loaded into Larceny at startup, the FFI will attempt to re-link all the foreign functions in the heap image.

During the relinking phase, foreign files will again be loaded into Larceny, and Larceny's FFI will use the file names as they were originally given to the FFI when it tries to load the files. In particular, if relative pathnames were used, Larceny will not have converted them to absolute pathnames.

An error during relinking will result in Larceny aborting with an error message and returning to the operating system. This is considered a feature.

12.19.8. Examples

12.19.8.1. Change directory

This procedure uses the chdir() system call to set the process's current working directory. The string parameter type is used to pass a Scheme string to the C procedure.

(define cd
  (let ((chdir (foreign-procedure "chdir" '(string) 'int)))
    (lambda (newdir)
      (if (not (zero? (chdir newdir)))
          (error "cd: " newdir " is not a valid directory name."))
      (unspecified))))

12.19.8.2. Print Working Directory

This procedure uses the getcwd() (get current working directory) system call to retrieve the name of the process's current working directory. A bytevector is created and passed in as a buffer in which to store the return value — a 0-terminated ASCII string. Then the FFI utility function ffi/asciiz->string is called to convert the bytevector to a string.

(define pwd
  (let ((getcwd (foreign-procedure "getcwd" '(boxed int) 'int)))
    (lambda ()
      (let ((s (make-bytevector 1024)))
        (getcwd s 1024)
        (ffi/asciiz->string s)))))

12.19.8.3. Quicksort

Warning

This example is bogus. It is not safe to pass a collectable object into a C procedure when the callback invocation might cause a garbage collection, thus moving the object and invalidating the address stored in the C machine context.

This demonstrates how to use a callback such as the comparator argument to qsort. It is specified in the type signature using -> as a type constructor. (Note that one should probably use the built-in sort routines rather than call out like this; this example is for demonstrating callbacks, not how to sort.)

(define qsort!
  (foreign-procedure "qsort" '(boxed ushort ushort (-> (void* void*) int)) 'void))
(let ((bv (list->vector '(40 10 30 20 1 2 3 4))))
  (qsort! bv 8 4
          (lambda (x y)
            (let ((x (/ (void*-word-ref x 0) 4))
                  (y (/ (void*-word-ref y 0) 4)))
              (- x y))))
  bv)
(let ((bv (list->bytevector '(40 10 30 20 1 2 3 4))))
  (qsort! bv 8 1
          (lambda (x y)
            (let ((x (void*-byte-ref x 0))
                  (y (void*-byte-ref y 0)))
              (- x y))))
  bv)

12.19.8.4. Other examples

The Experimental directory contains several examples of use of the FFI. See in particular the files unix.sch (Unix system calls) and socket.sch (procedures for communicating over sockets).

12.19.9. Higher level layers

The general foreign-function interface functionality described above is powerful but awkward to use in practice. A user might be tempted to hard-code values of offsets or constants that are compiler-dependent. Also, the FFI will marshal some low-level values such as strings or integers, but other values, such as enumerations that could be naturally mapped to sets of symbols, are not marshaled, since the host environment does not provide the necessary type information to the FFI.

This section documents a collection of libraries to mitigate these and other problems.

12.19.9.1. foreign-ctools

Foreign data access is performed by peeking at manually calculated addresses, but in practice one often needs to inspect fields of C structures, whose offsets are dependent on the application binary interface (ABI) of the host environment. Similarly, C programs often refer to values via constant macro definitions; since the values of such names are not provided by the object code and Scheme programs do not have a C preprocessor run on them prior to execution, it is difficult to refer to the same value without encoding "magic numbers" into the Scheme source code.

The foreign-ctools library is meant to mitigate problems like the two described above. It provides special forms for introducing global definitions of values typically available at compile-time for a C program. The library assumes the presence of a C compiler (such as cc on Unix systems or cl.exe on Windows systems). The special forms work by dynamically generating, compiling, and running C code at expansion time to determine the desired values of structure offsets or macro constants.

Here is a grammar for the define-c-info form provided by the foreign-ctools library.

<exp>     ::= (define-c-info <c-decl> ... <c-defn> ...)

<c-decl>  ::= (compiler <cc-spec>)
           |  (path <include-path>)
           |  (include <header>)
           |  (include<> <header>)

<cc-spec> ::= cc | cl

<c-defn>  ::= (const <id> <c-type> <c-expr>)
           |  (sizeof <id> <c-type-expr>)
           |  (struct <c-name> <field-clause> ...)
           |  (fields <c-name> <field-clause> ...)
           |  (ifdefconst <id> <c-type> <c-name>)

<c-type>  ::= int | uint | long | ulong

<include-path>
          ::= <string-literal>

<header>  ::= <string-literal>

<field-clause>
          ::= (<offset-id> <c-field>)
           |  (<offset-id> <c-field> <size-id>)

<c-expr>  ::= <string-literal>

<c-type-expr>
          ::= <string-literal>

<c-name>  ::= <string-literal>

<c-field> ::= <string-literal>

Syntax define-c-info

(define-c-info <c-decl> … <c-defn> …)

The <c-decl> clauses of define-c-info control how header files are processed. The compiler clause selects between cc (the default UNIX system compiler) and cl (the compiler included with Microsoft's Windows SDK). The path clause adds a directory to search when looking for header files. The include and include<> clauses indicate header files to include when executing the <c-defn> clauses; the two variants correspond to the quoted and bracketed forms of the C preprocessor's #include directive.

The <c-defn> clauses bind identifiers. A (const x t "ae") clause binds x to the integer value of ae according to the C language; ae can be any C arithmetic expression that evaluates to a value of type t. (The expected usage is for ae to be an expression that the C preprocessor expands to an arithmetic expression.)

The remaining clauses provide similar functionality:

  • (sizeof x "te") binds x to the size occupied by values of type te, where te is any C type expression.
  • (struct "cn" … (x "cf" y) …) binds x to the offset from the start of a structure of type struct cn to its cf field, and binds y, if present, to the field's size. A fields clause is similar, but it applies to structures of type cn rather than struct cn.
  • (ifdefconst x t "cn") binds x to the value of cn if cn is defined; x is otherwise bound to Larceny's unspecified value.
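The clauses above can be combined as in the following sketch, which queries the layout of struct tm and two errno constants. It assumes that the libraries are loaded with (require 'std-ffi) and (require 'foreign-ctools), and that a working cc and the standard headers are available. All bound identifiers (tm-size, tm-sec-offset, tm-year-offset, tm-year-size, errno-eintr, errno-enotsup) are names chosen for the example.

```scheme
(require 'std-ffi)          ; assumption: loads the FFI
(require 'foreign-ctools)   ; assumption: loads define-c-info

(define-c-info (compiler cc)
               (include<> "time.h")
               (include<> "errno.h")
  ;; Size of the whole structure, in bytes.
  (sizeof tm-size "struct tm")
  ;; Offsets of two fields of struct tm, plus the size of tm_year.
  (struct "tm"
    (tm-sec-offset  "tm_sec")
    (tm-year-offset "tm_year" tm-year-size))
  ;; Value of a preprocessor constant.
  (const errno-eintr int "EINTR")
  ;; Bound to the unspecified value if ENOTSUP is not defined.
  (ifdefconst errno-enotsup int "ENOTSUP"))

;; The results are ordinary Scheme integers, usable with the bytevector
;; accessors described earlier.
(define tm-buffer (make-bytevector tm-size 0))
(%set-int tm-buffer tm-year-offset 100)
(%get-int tm-buffer tm-year-offset)
```

Because the values are computed by compiling and running C code at expansion time, the same source adapts to whatever layout the host compiler uses.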

12.19.9.2. foreign-sugar

The foreign-procedure function is sufficient to link in dynamically loaded C procedures, but it can be annoying to use when there are many procedures to define that all follow a regular pattern where one could infer a mapping between Scheme identifiers and C function names.

For example, some libraries follow a naming convention where words within a name are separated by underscores; such functions could be immediately mapped to Scheme names where the underscores have been replaced by dashes.

The foreign-sugar library provides a special form, define-foreign, which lets the user define a foreign function by providing only the Scheme name, the argument types, and the return type. The define-foreign form then attempts to infer which C function the name refers to.

Syntax define-foreign

(define-foreign (name arg-type …) result-type)

Note

Other functionality lets the user introduce new rules for inferring C function names, but it is undocumented because it will probably have to change when we switch to an R6RS macro expander.

12.19.9.3. foreign-stdlib

Procedure stdlib/malloc

(stdlib/malloc rtd [ctor]) => procedure

Given a record extension of void*-rt, returns an allocator that uses the C malloc procedure to allocate instances of such an object. Note that the client is responsible for eventually freeing such objects with stdlib/free.

Procedure stdlib/free

(stdlib/free void*-obj)

Frees objects produced by allocators returned from stdlib/malloc.

Procedure ffi-install-void*-subtype

(ffi-install-void*-subtype rtd) => rtd

(ffi-install-void*-subtype string [parent-rtd]) => rtd

(ffi-install-void*-subtype symbol [parent-rtd]) => rtd

ffi-install-void*-subtype extends the core attribute registry with a new primitive entry for subtype. The parent-rtd argument should be a subtype of void*-rt and defaults to void*-rt. In the case of the symbol or string inputs, the procedure constructs a new record type subtyping the parent argument. In the case of the rtd input, the rtd record type must extend void*-rt. ffi-install-void*-subtype returns the subtype record type.

The returned record type represents a tagged wrapped C pointer, allowing one to encode type hierarchies.

Procedure establish-void*-subhierarchy!

(establish-void*-subhierarchy! symbol-tree) => unspecified

establish-void*-subhierarchy! is a convenience function for constructing large object hierarchies. It descends the symbol-tree, creates a record type descriptor for each symbol (where the root of the tree has the parent void*-rt), and invokes ffi-install-void*-subtype on all of the introduced types.
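As a small sketch of the signatures above, the following builds a two-level hierarchy one type at a time with ffi-install-void*-subtype. The type names shape* and circle* are hypothetical, and the sketch assumes the libraries are loaded with (require 'std-ffi) and (require 'foreign-stdlib).

```scheme
(require 'std-ffi)          ; assumption: loads the FFI
(require 'foreign-stdlib)   ; assumption: loads ffi-install-void*-subtype

;; A new record type extending void*-rt (the default parent).
(define shape*-rt
  (ffi-install-void*-subtype 'shape*))

;; A record type extending shape*-rt, so a circle* is also a shape*.
(define circle*-rt
  (ffi-install-void*-subtype 'circle* shape*-rt))
```

Both results are record type descriptors, so the usual record inspection facilities apply to them.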

Type char* extends void*

Procedure string->char*

(string->char* string) => char*

Procedure char*-strlen

(char*-strlen char*) => fixnum

Procedure char*->string

(char*->string char*) => string

(char*->string char* len) => string

Procedure call-with-char*

(call-with-char* string string-function) => value

Type char** extends void*

Procedure call-with-char**

(call-with-char** string-vector function) => value

Type int* extends void*

Procedure call-with-int*

(call-with-int* fixnum-vector function) => value

Type short* extends void*

Procedure call-with-short*

(call-with-short* fixnum-vector function) => value

Type double* extends void*

Procedure call-with-double*

(call-with-double* num-vector function) => value

FIXME: (There are other functions, but I want to test and document the ones above first…)

12.19.9.4. foreign-cstructs

The foreign-cstructs library provides a more direct interface to C structures. It provides the define-c-struct special form. This form is layered on top of define-c-info; the latter provides the structure field offsets and sizes used to generate constructors (which produce appropriately sized bytevectors, not record instances). The define-c-struct form combines these with marshaling and unmarshaling procedures to provide high-level access to a structure.

The grammar for the define-c-struct form is presented below.

<exp>    ::= (define-c-struct (<struct-type> <ctor-id> <c-decl> ...)
                <field-clause> ...)

<field-clause>
         ::= (<c-field> <getter>) | (<c-field> <getter> <setter>)

<getter> ::= (<id>) | (<id> <unmarshal>)

<setter> ::= (<id>) | (<id> <marshal>)

<marshal> ::= <ffi-attr-symbol> | <marshal-proc-exp>

<unmarshal> ::= <ffi-attr-symbol> | <unmarshal-proc-exp>

<struct-type> ::= <string-literal>

12.19.9.5. foreign-cenums

This library provides the special forms define-c-enum and define-c-enum-set, which associate the identifiers of a C enum type declaration with the integer values they denote.

The define-c-enum form describes enums encoding a discriminated sum; define-c-enum-set describes bitmasks, mapping them to R6RS enum-sets in Scheme.

The (define-c-enum en (<c-decl> …) (x "cn") …) form adds the en FFI attribute. The attribute marshals each symbol x to the integer value that cn denotes in C; unmarshaling does the inverse translation.

The (define-c-enum-set ens (<c-decl> …) (x "cn") …) form binds ens to an R6RS enum-set constructor with universe resulting from (make-enumeration '(x …)); it also adds the ens FFI attribute. The attribute marshals an enum-set s constructed by ens to the corresponding bitmask in C (that is, the integer one would get by logically or'ing all cn such that the corresponding x is in s). Unmarshaling attempts to do the inverse translation.
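The sketch below applies define-c-enum-set to the R_OK, W_OK and X_OK flags of the POSIX access() function. It assumes that the libraries are loaded with (require 'std-ffi) and (require 'foreign-cenums), that a C compiler and unistd.h are available, and that the C library is visible to the FFI. The names access-mode, c-access and readable-and-writable? are chosen for the example.

```scheme
(require 'std-ffi)          ; assumption: loads the FFI
(require 'foreign-cenums)   ; assumption: loads define-c-enum-set

;; Each symbol stands for one bit flag accepted by access().
(define-c-enum-set access-mode ((include<> "unistd.h"))
  (readable   "R_OK")
  (writable   "W_OK")
  (executable "X_OK"))

;; C prototype: int access(const char *path, int mode);
;; The second parameter uses the access-mode attribute added above.
(define c-access
  (foreign-procedure "access" '(string access-mode) 'int))

;; access-mode is an R6RS enum-set constructor, so it takes a list of
;; symbols; the resulting enum-set is marshaled to R_OK | W_OK.
(define (readable-and-writable? path)
  (zero? (c-access path (access-mode '(readable writable)))))

(readable-and-writable? "/tmp")
```

A define-c-enum attribute would be used the same way, except that the Scheme side passes a single symbol rather than an enum-set.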

The grammar for the two forms is presented below.

<exp> ::= (define-c-enum <enum-id> (<c-decl> ...)
            (<id> <c-name>) ...)

<exp> ::= (define-c-enum-set <enum-id> (<c-decl> ...)
            (<id> <c-name>) ...)

<enum-id> ::= <id>