user manual

Efficient Implementation of Population Count Function 91

22007E/0—November 1999 AMD Athlon™ Processor x86 Code Optimization

Efficient Implementation of Population Count Function

Population count is an operation that determines the number of

set bits in a bit string. For example, this can be used to

determine the cardinality of a set. The following example code

shows how to efficiently implement a population count

operation for 32-bit operands. The example is written for the

inline assembler of Microsoft Visual C.

Function popcount() implements a branchless computation of

the population count. It is based on a O(log(n)) algorithm that

successively groups the bits into groups of 2, 4, 8, 16, and 32,

while maintaining a count of the set bits in each group. The

algorithms consist of the following steps:

Step 1 Partition the integer into groups of two bits. Compute the

population count for each 2-bit group and store the result in the

2-bit group. This calls for the following transformation to be

performed for each 2-bit group:

00b -> 00b

01b -> 01b

10b -> 01b

11b -> 10b

If the original value of a 2-bit group is v, then the new value will

be v - (v >> 1). In order to handle all 2-bit groups simultaneously,

it is necessary to mask appropriately to prevent spilling from

one bit group to the next lower bit group. Thus:

w = v - ((v >> 1) & 0x55555555)

Step 2 Add the population count of adjacent 2-bit group and store the

sum to the 4-bit group resulting from merging these adjacent

2-bit groups. To do this simultaneously to all groups, mask out

the odd numbered groups, mask out the even numbered groups,

and then add the odd numbered groups to the even numbered

groups:

x = (w & 0x33333333) + ((w >> 2) & 0x33333333)

Each 4-bit field now has value 0000b, 0001b, 0010b, 0011b, or

0100b.