Virtual Linear Page Table

Several people asked me to re-explain virtual linear page tables after my SLUG talk. So here it goes!

3 level page table

First, let's review the standard translation mechanism. We have a large address space, which we divide up into pages. We map pages to frames of physical memory.

The processor has a translation look-aside buffer or TLB which stores a cache of virtual page to physical frame mappings. It is small, but fast. Like any cache, it runs out and needs to be filled. The operating system does this from the page tables, which keep a virtual to physical mapping.

On Linux, we use a multi-level hierarchical page table, illustrated above. First you take some of the top bits, and use that as an index into the page global directory (PGD). You then follow that to a page middle directory entry (PMD), take more bits as an index into that page, and finally follow that to page table entries. Finally, you take another index to find the PTE entry, which points to the physical page the given virtual address is mapped to.

So far, so good. But this walking process is very slow. It requires a lot of memory references and hence cache traffic, and has to be performed at regular intervals.

linear page table

What would be good is if we dispensed with the upper levels, as per the above diagram. So we just keep the PTE entries (which point directly to allocated physical pages) in one long array, and then simply use the VPN as an index into it. This would require much less effort.

The problem is that we can not find that many contiguous (next to each other) physical pages. Even if we could, the amount of physical memory we had to allocate to page tables would be huge! So this scheme won't work.

But we have a large virtual address space. Why not dedicate the top portion of that to be a linear array of PTE entries, and use the TLB to map the virtual pages full of PTE's to actual pages full of PTE's.

virtual linear page table

So when we ask for a virtual page, and it is not mapped by the TLB, we can use the virtual page number as an index into our virtually linear page table (note above we mention that we hash the virtual page number, it equates to the same thing as just using it as an index). We can now go directly to the PTE entry, and as long as the virtual page that PTE entry exists in has a mapping to the underlying physical page full of PTE entries, there is no need to walk the tree - the "virtual" PTE is now a pointer to the real thing! From there, it is a simple matter to read the PTE entry and find the physical frame for the virtual page we are looking for.

Of course, as with any engineering effort, there are trade-offs. Firstly, we have to sacrifice some of our virtual address space to the virtual linear table. We have lots of address space, so this is not too severe. The other trade-off is that you now have an extra TLB entry; as we mentioned the TLB is small, so taking up more entries can have a penalising effect. However, if you are walking linearly through pages, as a program usually does, you will not need to do a full page table walk for a whole page full of PTE entries and it only cost you one TLB entry; a significant win.

Itanium attempts to find the hashed PTE address in hardware, and calls is the virtually hashed page table walker or VHPT walker. If it can not resolve the situation, it returns control to the operating system, which will then have to do a full walk of the page tables. There's a few more details, but that's the broad strokes.

I hope that clears things up for a few people ...

... however if it does not, please feel free to modify the figures to make them clearer!