Cell Memory Management (2007/07/29)
Several bugs have been identified and fixed in the MMU code, although the first user process still dies prematurely. I hope that we're close to finding the remaining issues. The simulator output has been very useful, although single-stepping through long runs of instructions to find where things go wrong is really tedious.
Most changes were made in l.s, an assembly file containing processor initialization code and other essential functions, and mmu.c, which has the MMU code specific to the Cell BE. This is a brief summary of the issues solved so far:
- As an enhancement, l.s now uses the TLBIA instruction to invalidate the whole TLB, instead of looping through all entries. This is both smaller & faster.
- The simulator was producing warnings about bad values used in specifying the TLB way. I found that a bit-mask used to specify the way had an invalid value.
- We're software-managing the TLB, and I found that in certain places the hardware TLB and the cached software TLB were getting out of sync.
- Kernel pages weren't mapped correctly in the TLB when switching between processes.
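As a rough illustration of the way-mask and TLB-consistency items above, here is a minimal C sketch. It is not the code in mmu.c: the names (TlbEntry, softtlb, hwtlb, waymask, validwaymask, tlbput, tlbconsistent), the 256-set geometry, the entry layout and the one-hot encoding of the way are all assumptions, and a plain array stands in for the hardware TLB. The idea is only that every update goes through one function that writes both copies, and that a way mask is checked before it is used.

```c
#include <assert.h>
#include <stdint.h>

/* Four ways as mentioned in these notes; the set count is an assumption. */
enum { NWAYS = 4, NSETS = 256 };

/* Hypothetical entry layout, not the real Cell BE PTE format. */
typedef struct {
    uint64_t hi;    /* virtual page tag and valid bit */
    uint64_t lo;    /* real page number and protection bits */
} TlbEntry;

static TlbEntry softtlb[NSETS][NWAYS];  /* cached software copy */
static TlbEntry hwtlb[NSETS][NWAYS];    /* stand-in for the hardware TLB */

/* Assumed encoding: a one-hot mask with one bit per way. */
unsigned waymask(unsigned way)
{
    assert(way < NWAYS);
    return 1u << way;
}

/* A mask is valid if exactly one of the low NWAYS bits is set. */
int validwaymask(unsigned mask)
{
    return mask != 0
        && (mask & (mask - 1)) == 0
        && mask < (1u << NWAYS);
}

/* Single entry point: update the hardware and the cached copy together. */
void tlbput(unsigned set, unsigned way, TlbEntry e)
{
    assert(set < NSETS);
    assert(validwaymask(waymask(way)));
    hwtlb[set][way] = e;    /* real code would write the hardware TLB here */
    softtlb[set][way] = e;
}

/* Debug check: returns 1 if both copies agree everywhere, 0 otherwise. */
int tlbconsistent(void)
{
    unsigned s, w;

    for (s = 0; s < NSETS; s++)
        for (w = 0; w < NWAYS; w++)
            if (hwtlb[s][w].hi != softtlb[s][w].hi
             || hwtlb[s][w].lo != softtlb[s][w].lo)
                return 0;
    return 1;
}
```

A check like tlbconsistent() is only useful as a debugging aid in a model like this one, since the real hardware TLB can't be compared so cheaply, but routing all writes through a single function removes the places where the two copies can drift apart.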
Since we're working at a very early stage of the kernel, fixing all of these issues hasn't shown any perceivable improvement yet (other than the internal state of the processor being more like what we expect), simply because any little remaining issue is capable of screwing everything up!
By enabling the Memory XLATE_FAULTS debug control in the simulator, I get a message saying “hw tlb loading disabled w/o sw loading available” right after switching to the user process. This is interesting because software management of the TLB is enabled at the very beginning, long before the user process is created, and I don't see that message at that point. Anyway, at least I have a clue about where to look next!
Inferno Port to The Cell BE (2007/06/24)
So, this is where I’ll be posting bits of information about what’s happening in the Cell BE port. The current code base is built on the PPC440 port, which is used in IBM’s Blue Gene. Currently, most attention is given to MMU code, which is substantially different from PPC440. The kernel now boots on IBM’s systemsim simulator, but only to panic after mapping the first user page. I noticed that the number of kernel pages is being calculated differently in several locations, and this may be leading to inconsistency. I’m almost sure that this is one of the problems because I noticed that the kernel is reporting the wrong amount of physical memory. I hope to be able to figure out the problem soon. Also, to make mapping of kernel pages easier, one of the four ways of the TLB is being monopolized by kpages alone, which has to change.
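One way to avoid the kernel page count being calculated differently in several places is to compute it in a single helper that everything else calls. The following is a minimal C sketch of that idea, not the port's actual code: the function names (pgtrunc, pground, nkpages), the 4096-byte page size and the choice to round the start down and the end up are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the sketch. */
enum { BY2PG = 4096 };

/* Round an address down to the start of its page. */
uint64_t pgtrunc(uint64_t a)
{
    return a & ~(uint64_t)(BY2PG - 1);
}

/* Round an address up to the next page boundary. */
uint64_t pground(uint64_t a)
{
    return (a + BY2PG - 1) & ~(uint64_t)(BY2PG - 1);
}

/*
 * Number of pages touched by the kernel image occupying [kstart, kend).
 * Hypothetical helper: every caller that needs the kernel page count
 * would use this instead of doing its own arithmetic.
 */
uint64_t nkpages(uint64_t kstart, uint64_t kend)
{
    assert(kend >= kstart);
    return (pground(kend) - pgtrunc(kstart)) / BY2PG;
}
```

With a single definition, the memory size report and the TLB mapping code cannot disagree about how partial pages at either end of the kernel are counted.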
The next step, I guess, would be enabling the second PPU thread, as we’re only using one. Then it might be a suitable time to turn our attention to the SPUs, which should be interesting.
Till the next time…