QEMU, as discussed earlier, uses a library of so-called "micro-ops" to
emulate guest code. It freely copies and rewrites these micro-ops to create
translation buffers of basic blocks of guest code. Doing this on Plan 9 has
been considered possibly the most challenging part of the QEMU port. We now
have a demo which, while hardly feature-complete, shows promise and narrows
the room for technical surprises.
Our micro-ops library consists of exactly one function, foo. foo requires
relocation of exactly one function from the host program, which simply calls
print. foo passes one parameter to this function; the value of this parameter
is determined by the host program using relocation to achieve immediate value
folding. So how does this work?
Our operations library, ops.c, compiles fully into a dlm, 8.ops, which contains:
- compiled text
- symbol table annotations (which allow fnbounds() to do its thing)
- an export table
- an import table
- a relocation table
In a high-level sense, dyngen's job is then to take 8.ops and turn it back
into a form that the master program (QEMU) can use. Dyngen/UNIX can get
away with leaving the compiled function bodies in a .o file, a luxury not
available to us on Plan 9, where dyngen copies the compiled text out to C
uchar arrays. These compiled text sections, though, need to be relocated,
so dyngen's other primary job is to emit code to do this relocation at
runtime, on a function-by-function basis (contrast libdynld's
entire-dlm-at-once approach). Dyngen/UNIX does this by understanding the
relocation record formats of every executable type and architecture it
supports; I think the Plan 9 version can get away with understanding only the
architecture-independent headers.
The mechanism for this portability will be to make the emitted code (and
thereby master program) a consumer of dynld's relocation facilities but not
Dyngen's basic code generation strategy is:
- Extract each exported symbol (op_...) and emit it as a byte array
into a .c file. This uses fnbounds() and gives us an opportunity to cache
the results for later steps.
- Emit something which will cause qemu to fill out the import table at
startup. This basically means copying part of libdynld's dynloadgen()
parser into dyngen.
The current demo emits an array of "DyngenImport" structures in import
table order; the master program then, once at start up, builds the real
import table in memory. In retrospect, this seems to have been the wrong
decision: no such structure is necessary as we can make the loader build
the import table for us from assembler source (where we can get away with
not knowing the types of the symbols at hand).
- Minimally parse the import table itself, searching for the abusive
symbols that are used for constant loading.
- Pair the relocation instructions with the code to copy each function.
This consumes the fnbounds() table from above and involves a scan over the
relocation table, again slurping in part of dynloadgen().
The emitted code will use libdynld's dynreloc() function to achieve
platform independence of dyngen itself. This dramatically simplifies
dyngen as it honors (part of) the information hiding properties of
libdynld's API. On the other hand, it imposes the requirement on the
master program of having the import table around.
Each of these relocator invocations "pretends" that we're loading the
entire object, adjusting the base pointer so that the relocations happen
atop the region we just copied.
Constant folding is achieved by patching the import table with the value
desired before calling the relocator function.
The master program then:
- At startup, follows the import table construction part of its agreement.
- Creates translation buffers by calling dyngen's emitted code.
- Calls into the translation buffer.
Our demo relocates foo twice, with two different immediate values, and calls
each relocated copy once. To compile, unpack dynld.tgz in /sys (possibly using
divergefs) and wire it into the libs make infrastructure. Then unpack the
demo and run mk all; 8.masterprog should then emit:
83 EC 08 B8 00 00 40 00 89 04 24 E8 00 00 00 00 83 C4 08 C3
83 EC 08 B8 69 7A 00 00 89 04 24 E8 9F 59 FF FF 83 C4 08 C3
externfun called: 31337
externfun called: 732811277
The lines are, respectively, import information for the two symbols (the
function and the abusee for constant import), the unrelocated version of
foo, the relocated version for the first buffer, and the result of calling
each relocated buffer.
We hope to have multiple-function dispatches and proper handling of the
immediate-value folding material shortly.