Plan 9 and Inferno at the Google Summer of Code

QEMU, as discussed earlier, uses a library of so-called "micro-ops" to emulate guest code. It freely copies and rewrites these micro-ops to create translation buffers of basic blocks of guest code. Doing this on Plan 9 has been considered possibly the most challenging part of the QEMU port. We now have a demo which, while hardly feature-complete, shows promise and narrows the room for technical glitches.

Our micro-ops library consists of exactly one function, foo. foo requires relocation of exactly one function from the host program, which simply calls print. foo passes one parameter to this function; the value of this parameter is determined by the host program using relocation to achieve immediate value folding. So how does this work? Our operations library, ops.c, compiles fully into a dlm, 8.ops, which contains:

In a high-level sense, dyngen's job is then to take 8.ops and turn it back into a form that the master program (QEMU) can use. Dyngen/UNIX can get away with leaving the compiled function bodies in a .o file, a luxury not available to us on Plan 9, where dyngen copies the compiled text out to C uchar arrays. These compiled text sections, though, need to be relocated, so dyngen's other primary job is to emit code to do this relocation at runtime, on a function-by-function basis (contrast libdynld's entire-dlm-at-once approach). Dyngen/UNIX does this by understanding the relocation record formats of every executable type and architecture it supports; I think the Plan 9 version can get away with understanding only the architecture-independent headers.
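
As a rough illustration of the "copy the compiled text out to C uchar arrays" step, here is a minimal sketch in portable C. The helper name emit_text_array and the _text suffix are made up for this example and are not taken from dyngen; uchar is the usual Plan 9 typedef for unsigned char.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of dyngen: format n bytes of compiled
 * text as a C array definition called <name>_text into out (capacity cap).
 * Returns the number of characters written, or -1 if out is too small.
 */
int
emit_text_array(char *out, size_t cap, const char *name,
	const unsigned char *text, size_t n)
{
	size_t used, i;
	int w;

	assert(out != NULL && name != NULL && text != NULL);

	w = snprintf(out, cap, "uchar %s_text[%lu] = {", name, (unsigned long)n);
	if(w < 0 || (size_t)w >= cap)
		return -1;
	used = (size_t)w;

	for(i = 0; i < n; i++){
		/* start a new indented row every 8 bytes */
		w = snprintf(out+used, cap-used, "%s0x%02X,",
			i%8 ? " " : "\n\t", (unsigned)text[i]);
		if(w < 0 || (size_t)w >= cap-used)
			return -1;
		used += (size_t)w;
	}

	w = snprintf(out+used, cap-used, "\n};\n");
	if(w < 0 || (size_t)w >= cap-used)
		return -1;
	used += (size_t)w;
	return (int)used;
}
```

Fed the bytes of one op_... function (as delimited by fnbounds()), this would produce a definition that can be pasted into the generated .c file. The real tool writes to a file rather than a string buffer.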

The mechanism for this portability will be to make the emitted code (and thereby the master program) a consumer of dynld's relocation facilities but not its loader.

Dyngen's basic code generation strategy is:

  1. Extract each exported symbol (op_...) and emit it as a byte array into a .c file. This uses fnbounds() and gives us an opportunity to cache the results for later steps.
  2. Emit something which will cause qemu to fill out the import table at startup. This basically means copying part of libdynld's dynloadgen() parser into dyngen. The current demo emits an array of "DyngenImport" structures in import table order; the master program then, once at startup, builds the real import table in memory. In retrospect, this seems to have been the wrong decision: no such structure is necessary, as we can make the loader build the import table for us from assembler source (where we can get away with not knowing the types of the symbols at hand).
  3. Minimally parse the import table itself, searching for the abusive symbols that are used for constant loading.
  4. Pair the relocation instructions with the code to copy that function. This consumes the fnbounds() table from above and involves a scan over the relocation table, again slurping in part of dynloadgen(). The emitted code will use libdynld's dynreloc() function to achieve platform independence of dyngen itself. This dramatically simplifies dyngen as it honors (part of) the information hiding properties of libdynld's API. On the other hand, it imposes the requirement on the master program of having the import table around. Each of these relocator invocations "pretends" that we're loading the entire object, adjusting the base pointer so that the relocations happen atop the region we just copied. Constant folding is achieved by patching the import table with the value desired before calling the relocator function.
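
The copy, patch and relocate sequence from steps 1, 3 and 4 can be sketched in portable C. This is only an illustration under stated assumptions: the Reloc structure, the relocation kinds, the import indices and gen_foo are all hypothetical, the relocations are applied by hand rather than through dynreloc() (whose signature is not given here), and the buffer's run-time address is passed in as a plain number so the sketch can run anywhere without executing the generated code. The function body is the foo text printed by the demo, with both operand fields zeroed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation kinds; the real dlm record format is not shown. */
enum { RelAbs32, RelPcrel32 };

/* Assumed import table order: the function first, then the constant. */
enum { Iexternfun, Iabuseme, Nimport };

typedef struct Reloc Reloc;
struct Reloc {
	int kind;
	uint32_t off;	/* offset of the 32-bit field inside the function */
	int import;	/* index into the import table */
};

static const unsigned char foo_text[20] = {
	0x83, 0xEC, 0x08,		/* sub esp, 8 */
	0xB8, 0x00, 0x00, 0x00, 0x00,	/* mov eax, imm32 */
	0x89, 0x04, 0x24,		/* mov [esp], eax */
	0xE8, 0x00, 0x00, 0x00, 0x00,	/* call rel32 */
	0x83, 0xC4, 0x08,		/* add esp, 8 */
	0xC3,				/* ret */
};

static const Reloc foo_relocs[] = {
	{ RelAbs32, 4, Iabuseme },	/* the folded constant */
	{ RelPcrel32, 12, Iexternfun },	/* the call target */
};

static void
put32le(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v & 0xFF);
	p[1] = (unsigned char)((v >> 8) & 0xFF);
	p[2] = (unsigned char)((v >> 16) & 0xFF);
	p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/*
 * Copy foo into dst, which is assumed to end up at address dstaddr,
 * folding imm in by patching the import table before relocating.
 */
void
gen_foo(unsigned char *dst, uint32_t dstaddr, uint32_t imports[Nimport], uint32_t imm)
{
	size_t i;
	uint32_t v;

	assert(dst != NULL && imports != NULL);

	imports[Iabuseme] = imm;	/* constant folding via the import slot */
	memcpy(dst, foo_text, sizeof foo_text);

	for(i = 0; i < sizeof foo_relocs / sizeof foo_relocs[0]; i++){
		const Reloc *r = &foo_relocs[i];

		v = imports[r->import];
		if(r->kind == RelPcrel32)
			v -= dstaddr + r->off + 4;	/* relative to the next instruction */
		put32le(dst + r->off, v);
	}
}
```

Calling gen_foo twice with different imm values gives two buffers that differ only in the bytes of the mov operand, which is the effect the import table patching is meant to achieve. In the real emitted code the base pointer adjustment and the relocation arithmetic are left to libdynld.
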
The master program then:

  1. At startup, follows the import table construction part of its agreement.
  2. Creates translation buffers by calling dyngen's emitted code.
  3. Calls into the translation buffer.

Our demo relocates foo twice, with two different immediate values, and calls each copy once. To compile, unpack dynld.tgz in /sys (possibly using divergefs) and wire it into the libs make infrastructure. Then unpack the demo and run mk all; 8.masterprog should then emit:

externfun 11e7
__abuseme cafebabe
 83 EC 08 B8 00 00 40 00 89 04 24 E8 00 00 00 00 83 C4 08 C3
 83 EC 08 B8 69 7A 00 00 89 04 24 E8 9F 59 FF FF 83 C4 08 C3
externfun called: 31337
externfun called: 732811277

The lines are, respectively, import information for the two symbols (the function and the abusee for constant import), the unrelocated version of foo, the relocated version for the first buffer, and the result of calling each relocated buffer. We hope to have multiple-function dispatches and proper handling of the immediate-value folding material shortly.
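
The relocated bytes can be checked by hand. The sketch below decodes the two patched fields of the second hex line; the helper names are invented for this example, and it assumes the standard 386 encodings (B8 is mov eax, imm32; E8 is call rel32, with the displacement taken relative to the instruction after the call).

```c
#include <assert.h>
#include <stdint.h>

/* The relocated foo for the first buffer, as printed by the demo. */
static const unsigned char relocated[20] = {
	0x83, 0xEC, 0x08,
	0xB8, 0x69, 0x7A, 0x00, 0x00,	/* mov eax, imm32 */
	0x89, 0x04, 0x24,
	0xE8, 0x9F, 0x59, 0xFF, 0xFF,	/* call rel32 */
	0x83, 0xC4, 0x08,
	0xC3,
};

/* Read a little-endian 32-bit value. */
uint32_t
get32le(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8
		| (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* The constant folded into foo: the operand of the mov at offset 3. */
uint32_t
foo_imm(const unsigned char *code)
{
	assert(code[3] == 0xB8);
	return get32le(code + 4);
}

/*
 * Address of the instruction after the call, recovered from the
 * displacement and the known call target: next = target - rel32.
 */
uint32_t
foo_call_next(const unsigned char *code, uint32_t target)
{
	assert(code[11] == 0xE8);
	return target - get32le(code + 12);
}
```

Here foo_imm(relocated) is 0x7A69, which is 31337 in decimal and matches the first "externfun called" line; the second value, 732811277, is 0x2BADD00D. With externfun at 11e7 as printed, foo_call_next(relocated, 0x11e7) comes out as 0xB848, which would put the start of the first buffer at 0xB838 if the call sits at offset 11 as shown.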