When playing around with memory management a matrix multiply is often mentioned as a TLB thrasher, and a good way to stress your system. It's interesting to illustrate this process a little.

The first thing to remember is that C99 (6.5.2.1 to be precise) dictates that arrays are always stored in row-major order, as shown below.

So, remembering back to first year algebra, to multiply two matrices we multiply and sum rows and columns, illustrated linearly as the computer see it below. Black lines are page boundaries, while boxes are matrix elements.

The simplest code to do this is usually along the lines of

int a[LEN][LEN]; int b[LEN][LEN]; int r[LEN][LEN]; int i,j,k for(i=0; i < LEN; i++) for(j = 0; i < LEN; j++) for(k = 0; k < LEN; k++) r[i][j] += a[i][k] + b[k][j];

We can see how we have a repeated linear walk over the memory holding
matrix `b`, done for each element in `a`. If we make `b`
sufficiently large (say, a few hundred pages) we can be sure that by the
time we get to the end, we have kicked out the earlier TLB entries. Then
we start all over again, and start faulting pages back in. Thus a matrix
multiply problem comes down to doing repeated page-sized stride walks
over a linear region.

Looking at the diagram, it is also clearer what can solve this problem. We can either have more TLB entries, so we don't have to kick out entries which we will use again, or have larger pages -- e.g. less black lines. Or a smarter algorithm ...