Short review of guarded page-tables

Wed 05 September 2007

Anyone who has done an operating systems 101 course is familiar with the standard hierarchical page-table structure. This is essentially a radix-tree where bits of the virtual address are taken as indexes into an array of pointers, each pointing to the next level down or, at the leaves, to a physical entry.

One potential issue with this style of page-table is with a very large, very sparse address space (such as in a single address-space operating system) there will be many intermediate steps with possibly only a single translation at the end (i.e. lots of single pages scattered all around your address-space).

Thus we can add a bit-field guard to each of the entries in the page table (Liedtke, 1994) to help indicate only those paths through the tree which are valid (similar to a Patricia tree).

Translation first proceeds as usual; the top bits are used as an index into the top level table. However, each entry now contains extra information about further valid addresses. The guard is masked against the next bits in the virtual address; if it matches (e.g. in our example is 0x2) the next step can proceed directly to the leaf node, bypassing the middle look-up. The entry can be marked as either final (i.e. pointing to a leaf node) or intermediate, in which case the process continues with the next level (which also has its own guard, and so forth). Any entry that does not match the guard is known to be false, and therefore a fault can be raised.

To examine this further, consider that we know that each page in our 24-bit address space above is 4KiB, thanks to the 2¹² = 4KiB offset. Therefore each of the 16 possible values of the top four bits selects 1MiB of the 2²⁴ = 16MiB address space; e.g. 0x0 selects the first megabyte, 0x1 the second and so forth. The second 4 bits selects a 64KiB region of this 1MiB, and the last 4 bits selects a 4KiB region of this 64KiB region.

This scheme could work quite well if we were using only 16 pages each separated by 1MiB. Each of these pages would only require an entry in the top level with an appropriate guard which could then be marked as final and point directly to a physical frame.

Once things become less sparse however, we need to start filling in middle levels. If you have 2 pages within a 16MiB region, you need at least one extra bit of address to tell them apart. If you have 3 or 4 pages, you need at least 2 bits to tell them apart, and so forth.

It would therefore be beneficial to have variable level sizes to dynamically adapt and create the most efficient lookup possible given the address space layout of a process. For example, if you knew your process was only going map pages 0x123 (100100011) and 0x124 (100100100) a good option would be to make your top-level only two entries (i.e. check the top bit) with a guard on the entry at index 0x1 of 00100. You then need a second level with 8 pointers to differentiate the remaining 3 bits (each of these would be final and hence point directly to a physical page).

The more pages you start mapping, the less bits you can discard with the guard. If you follow this through to a guard of 0 bits you end up with a linear page-table.

It has been shown (Elphinstone, 1999) that fixed level sizes with 16 entries tends to work well. For a 32-bit address-space with 4KiB pages (e.g. 12 bits offset) this leaves 20 bits to be mapped by the page-tables; with each level mapping 2⁴ bits this means a 5 level tree. The guard values provide enough path-compression that the table is not constantly walked. The disadvantage is that it may "over-allocate" meaning more space is dedicated to page-tables than strictly required.

Deciding where and when to split levels, what size indexes to put at each level and updating all the guard values dynamically to make the smallest, most efficient page-table would create a variable-radix page-table (Szmajda and Heiser, 2003).

In summary, a guarded page-table is more useful the more sparse your address space is. A variable-radix guarded page-table is complex, but could offer advantages for implementing variable page-size support and thus having positive effects on TLB coverage.