RSS | technovelty home | page of ian | ianw@ieee.org
We were visiting our
friends at HP yesterday, and whilst
discussing some of the Itanium product line plans, the comment that
certain TLA's
have a real sweet spot for Itanium because it can tell them the number
of set bits in a word very quickly. I was aware of the
popcnt instruction, but wasn't aware people were buying
machines to use it.
I instrumented some code to run a
test -- one using the GCC built-in (which falls through to the
popcnt instruction on Itanium), one using a
re-implementation of the generic GCC version, and one described in the
excellent "Algorithms for
programmers" book by Jörg Arndt (download it now if you don't
have it!).
A naive implementation would do it with 64 shifts, which would take
forever (for a fairly good explanation of why shifting takes so long,
see the patent on optimised
bit-shifting). The gcc built-in way is a clever
hack:
const int count_table[256] =
{
0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8
};
static inline unsigned long countit_table(unsigned long x) {
int i, ret = 0;
for (i = 0; i < 64; i += 8)
ret += count_table[(x >> i) & 0xff];
return ret;
}
The results I saw, in cycles, were:
| Algorithm | 2Ghz AMD Opteron 270 | 1.5Ghz Itanium2 |
|---|---|---|
| Built in | 30 | 6 |
| Table lookup | 45 | 27 |
| Shifting | 60 | 48 |
The AMD GCC built-in seems to create some slightly tighter code than my extract table lookup version -- it doesn't unroll the loop and I assume the cache does a good job of the prediction. But if your applications rely on finding set bits heavily, then Itanium is the platform for you!
posted at: Fri, 15 Dec 2006 13:39 | in /code/arch | permalink | add comment (2 others)

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.