The TUNES LLL Subproject

Some notes up front:

Our generic implementation of the LLL.
There is a page about the low-level issues of Storage: garbage collection, persistency, etc.
There are two sub-subprojects below the LLL: the i386 and the O'TOP subprojects.
A list of programming resources available to build upon or translate.

Relocate these to more appropriate places.

NOTE: existing code and corresponding technical comments lie in directory src/LLL of the source distribution.

You may also want to look at the what Review subproject said about implementations of other systems.

Principles

LLL objects are exactly those defined to assist in implementing arbitrary Tunes objects, whether high-level or low-level. This is the defining principle of the scope of the LLL subproject.

See also the HLL Principles, since it guides the principles of all Tunes system objects.

Portability (not interoperability or complete compatibility) across different architectures.
Suited both to implement and to use transparently an elaborate memory management system, with GC for persistent objects, and weak pointers.

At the core of the LLL, all the computation words of ANS Forth, excluding all that have to do with parsing, I/O and defining words; our version for these will be specialized. Above that, the LLL system provides some mechanism to do the memory management.

Low-level objects

Objects must be uniquely identifiable in some way.

The typical system for handling this issue:

Objects in the local address space are identified by their 32 bit CPU address (perhaps with an implementation-dependent offset).
Objects outside it are accessed using descriptors/handles inside it, so need not be considered as special objects with respect to the Low-Level memory system.
Objects in a same address space must trust each other anyway, so let them have a cooperative GC and multithreading.
Communication between address spaces happens directly through space-dependent drivers.

Annotations

Collect the standard mechanisms for annotations and their resolution.

Every annotation may have its own implementation, be it a hashtable of object-to-value associations, or an array of values, or some executable code. We can implement lazy evaluation and futures, typing, UI interface, scoping through annotations. For example, an object's concrete type (relative to the GC mechanism) may be determined from bits in its address, whether statically or dynamically.

These goals require a framework to be developed to systematize this approach.

Issues

See HLL issues.

Hardware independence

At what level shall word size be decided?
How can objects migrate back and forth between machines with different bytesize with no loss in information, yet while taking full advantage of available hardware?

Garbage Collection

Infix pointers (that do not point at some globally constant offset to the beginning of an allocation unit) greatly complicate the GC. They should be forbidden whenever possible, and replaced with their simulation by segment-offset pairs, with an aligned pointer and an offset inside the segment. Alternatively, efficient segment-offset infix pointer simulationscould use a cache of the single infix pointer, when in tight loops, as long as the cache is properly invalidated when objects move.

The GC may have to accept infix pointers for code return addresses, or else the calling convention may become grossly unefficient. This calls for some kind of static or dynamic typing accessible to the GC so as to treat return addresses specially without having to pay for ubiquitous special checks. An obvious way to do things is to segregate code in separate heaps from data, and distinguish code addresses by addresses, using a static or dynamic BIBOP technique.

Once it has been determined that a pointer is infix, usual techniques apply to find the head of the corresponding object: infix-pointable objects are grouped into pages (say 4K or 8K hardware pages), and you associate to every page, either on-page or off-page, enough information to track down object heads on the page (naive way: a linked list; more advanced: a binary tree; more efficient: just the size of objects, assuming they all have the same size). If the information is on-page, it means that objects may not cross page boundaries.

Big problem: how to efficiently differentiate pointers from numbers, etc.? Structural differentiation is powerful, but may slow the GC considerably, unless descriptors are simple (can be just an integer for length of a pointer array, for most objects), and forbids dynamic differentiation, mixing integers and pointers in an array (e.g. simple unframed stack), etc. That's why we'll use a simple bit pattern to differentiate integers (raw data) from pointers (structured data), and different kind of pointers from each other (that's a BIg Bunch Of Pages kind of GC). Full system integers can still be accessed if properly boxed or framed. This leads to slow interpretation, but interpretation is slow, anyway, and compiled code need pay {,un}{box,fram}ing only at enter and exit, not in tight loops.

Question: will integers be stripped of their low bit, which would simplify overflow testing code to naught, and make the implementation portable, but make a little harder doing pointer arithmetics and mixing of true integers with 31 bit ones. Or stripping them from their overflow bit, which makes integer overflows to generate GC-readjustable pointers, rather than providing flat modulo arithmetics, but allows easy pointer arithmetics and mixing of 31-bit integers and 32-bit ones?

If we tag the low bits, we must choose between integers having low bit set or low bit cleared. Having it set (and thus having bit cleared for pointers) may allow faster pointer access on RISC machines, but slows any arithmetics. Having bit set for pointers allow easier arithmetics, but forces the use of an offset for all memory accesses.

We shall meta-implement all the ways, and compare actual execution time and code space measurements! Tongue in cheek, perhaps?

A high-level page directory is used to determine the GC type of objects according to the page it is in. It is a multi-level hashed structure that may evolve with the GC code, so that it may allow to find quickly the type of objects. Typically a mix between arrays and balanced binary trees to recognize bit patterns.

The GC type of an object, as determined by its address gives us routines to update the object during a GC, to destroy the object when it is not accessed anymore, etc.

The GC type of a page chunk allows us to track down the beginning of individual objects pointed to on the page (in case infix pointers are used), also gives us the policy to follow when swapping out the page (which may be copying the page to disk, sending it to the network, or compressing it to memory for possible further actual swapping out of memory, etc).

Persistence

Be careful with distributed persistence: always remember previous states until all transactions using it are finished and confirmed.
Real-Time persistent objects will have to time-allocate the copying of their data into buffers for synchronized persistent store, even though this operation may be done only once in a while.

Plan

Implementation language

We shall use Scheme (or later our own HLL) to produce assembly source files for all our different target processors from mostly the same meta-source.
We shall use C as though it were a mostly-regular assembler, with labels, jumps, etc, so the same meta-source also produces the C source files. Hey, wasn't C called a portable assembler?
Actually, we may even depend on some properties of a particular C implementation with particular optimization flags so as to make assumptions on the generated C code, and be able to cut it in parts that allow for garbage collection, run-time code generation, or whatelse.

Modules

modules have some install/uninstall annotation fields explaining how to restore/resume the object from the state log as gotten from persistent store. In general, this will be a call to a standard trusted high-level module. However, this can be low-level code in the very first bootstrapping modules.
This scheme can be used for migration in general: persistence, garbage collection, dynamic linking, etc.

Mixed cooperative/preemptive multithreading:

Cooperation vs. Pre-emption

Preemption is necessary for real-time response as well as for unsecure tasks; but cooperation allows far better context switch time (on the i386, compare saving some 5 registers to saving the whole CPU+FPU+MMU state), not to talk about trivial implementation of mutual exclusion, and all kinds of similar system-wide assumptions, such as those needed for a garbage collector.

In simpler terms, cooperative multithreading is faster since the compiler performs the tasks of determining how to defer and save the computation, instead of providing a generic solution at run-time which has no access to the structure of the code, and therefore must act in as conservative manner as possible.

Solutions

Allow the HLL to be preemptively-multithreaded (or rather, it will manage seamlessly-concurrent objects), while the LLL is cooperatively-multithreaded.

Actually, if we define cooperation by the fact that you may rely on some software properties of code, and preemption by the fact that a thread may yield execution without having to poll the scheduler, then the two techniques are not incompatible, as we intend to show in this essay.

Modules to implement

Foundational Object Types

This outlines the general collection of object types relevant to the LLL in its role as an implementational foundation:

Bit

Machine-Word

Bit-field parametrized by size, and usually constant per architecture.

Processor

"Abstract" description of processing units, describing the instruction sets, internal state (all the registers), internal re-scheduling semantics (if any), and relation to the memory architecture.

Memory

An abstraction over some piece of hardware which has some kind of performance and persistence semantics. For example,

CPU cache
RAM
ROM
"flash"able RAM
floppy disks
hard disk drives
compact disks of various kinds

Or abstractions over this such as:

Memory pages
Memory-protection spaces
Memory-mappings and memory addressing modes

Channel

An abstraction over communications or input/output channels, which have state and modes, utilizing some cooperating concepts:

Signal: Information propagation and verification basics.
Bus: Shared resource for communication, which have state as well.

Thread

A voucher for a share of processor resources. This optionally contains additional aspects such as execution state and scheduling state. This definition is intentionally left abstract, so that there are various models that it can be extended into. In fact, this may turn into a specification for separate sub-object types.

Generic Memory-Management

Here are some grand end-point goals:

A generic generational garbage collector with full support for persistence (checkpointing and restarting of objects).
A generic module for inter-heap (perhaps distributed) synchronized persistent garbage collector.
A generic module for back-reference unsynchronized persistent conservative garbage collector.
Generic support for chunks of executable binary code.
Mechanisms for the migration manager (from the Migration subproject) to use: swapping to disk, to a compressed RAM zone, to another host, etc.

Direct Management of User Interface Hardware

Here are some basic needed drivers for interfaces:

A console adapter for text-mode.
a text-mode adapter for serial consoles (perhaps just use the ncurses package).
A screen windowing multiplexer for text-mode screens.
A generic output-synchronizing input multiplexer.
A standard combination of the previous two.
A fast graphics library.

File formats

A partition manager, that multiplexes hard disks according to standard partitioning methods.
Support for various existing file systems: MS-DOS FAT FS (and WindowsNT or Linux UMSDOS extensions), Linux EXT2 FS, etc.
Graphic file formats: GIF, JPEG, PCX, TIFF, etc.
Various Audio file formats for sound samples.

Why GC

This essay has been moved to the Wiki.

To Do

Find out how multiple architectures can be made interoperable.
Open a "Store" subproject about encodings and algorithms for the dynamically and actively annotable, distributed, persistent, garbage-collected store. Actually, distribution, persistence, and garbage-collection could be obtained by proper active annotations.