Compute parity of 64-bit unsigned integer

21 July 2015

Introduction

I'll go different ways to compute parity of 64-bit unsigned integer. For parity computing we should first find out whether the count of ones in binary representation of number is even or odd. To analyze algorithm efficiency we'll use four different functions. The first function is easiest to write, but it's not very efficient. The last one takes longer to code, but it provides the best efficiency. Let's compare some benchmarks.

Computing functions

1) Function f1 converts the number to binary representation and counts '1' characters inside a string.

func f1(n uint64) int {
    bs := strconv.FormatUint(n, 2)
    return strings.Count(bs, "1")
}

2) Function f2 checks if last bit of a number is one and shifts number with every iteration until it's zero (no more ones). Loop stops when we find the last bit set.

func f2(n uint64) int {
    cnt := 0
    for n > 0 {
        if n&1 == 1 {
            cnt++
        }
        n = n >> 1
    }
    return cnt
}

3) Function f3 decreases number by one and performs AND operation with the original number on each iteration, until the resulting number is zero. There are as much iterations as there are ones. It clears bit by bit.

func f3(n uint64) int {
    cnt := 0
    for n > 0 {
        n &= (n - 1)
        cnt++
    }
    return cnt
}

4) Function f4 uses lookup table for the first 255 numbers. Every table entry contains total count of ones for corresponding number. The idea is to divide 64-bit integer into 8 unsigned bytes (numbers) and then the final count is calculated as a sum of ones found in each one of divided bytes. The procedure steps are always the same, independently of varying arguments.

var bitCounts = []int8{
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4,
    1, 2, 2, 3, 2, 3, 3, 4,
    2, 3, 3, 4, 3, 4, 4, 5,
    1, 2, 2, 3, 2, 3, 3, 4,
    2, 3, 3, 4, 3, 4, 4, 5,
    2, 3, 3, 4, 3, 4, 4, 5,
    3, 4, 4, 5, 4, 5, 5, 6,
    1, 2, 2, 3, 2, 3, 3, 4,
    2, 3, 3, 4, 3, 4, 4, 5,
    2, 3, 3, 4, 3, 4, 4, 5,
    3, 4, 4, 5, 4, 5, 5, 6,
    2, 3, 3, 4, 3, 4, 4, 5,
    3, 4, 4, 5, 4, 5, 5, 6,
    3, 4, 4, 5, 4, 5, 5, 6,
    4, 5, 5, 6, 5, 6, 6, 7,
    1, 2, 2, 3, 2, 3, 3, 4,
    2, 3, 3, 4, 3, 4, 4, 5,
    2, 3, 3, 4, 3, 4, 4, 5,
    3, 4, 4, 5, 4, 5, 5, 6,
    2, 3, 3, 4, 3, 4, 4, 5,
    3, 4, 4, 5, 4, 5, 5, 6,
    3, 4, 4, 5, 4, 5, 5, 6,
    4, 5, 5, 6, 5, 6, 6, 7,
    2, 3, 3, 4, 3, 4, 4, 5,
    3, 4, 4, 5, 4, 5, 5, 6,
    3, 4, 4, 5, 4, 5, 5, 6,
    4, 5, 5, 6, 5, 6, 6, 7,
    3, 4, 4, 5, 4, 5, 5, 6,
    4, 5, 5, 6, 5, 6, 6, 7,
    4, 5, 5, 6, 5, 6, 6, 7,
    5, 6, 6, 7, 6, 7, 7, 8,
}

func f4(n uint64) int {
    var cnt int8
    cnt += bitCounts[n&255]
    cnt += bitCounts[n>>8&255]
    cnt += bitCounts[n>>16&255]
    cnt += bitCounts[n>>24&255]
    cnt += bitCounts[n>>32&255]
    cnt += bitCounts[n>>40&255]
    cnt += bitCounts[n>>48&255]
    cnt += bitCounts[n>>56&255]
    return int(cnt)
} // end-of-f4

Benchmark

We run functions f1, f2, f3 and f4 with different size of integer arguments (0xF, 0xFF, 0xFFF, 0xFFFF, 0xFFFFF, 0xFFFFFF, 0xFFFFFFF, 0xFFFFFFFF). The number of ones of integer arguments varies from 8 to 64. We are interested in how different algorithm design reflects function execution time in case when we vary arguments (the number of ones to count).

After benchmarking we get these results:

go test -bench .|benchgraph -title="Graph1: Benchmark F(x) in ns/op" -obn="F2,F3,F4" -oba="F,FF,FFF,FFFF,FFFFF,FFFFFF,FFFFFFF,FFFFFFFF"

√ BenchmarkF1_F-4           20000000            54.1 ns/op
√ BenchmarkF1_FF-4          20000000            61.0 ns/op
√ BenchmarkF1_FFF-4         20000000            70.0 ns/op
√ BenchmarkF1_FFFF-4        20000000            78.9 ns/op
√ BenchmarkF1_FFFFF-4       20000000            90.6 ns/op
√ BenchmarkF1_FFFFFF-4      20000000            99.1 ns/op
√ BenchmarkF1_FFFFFFF-4     20000000           111 ns/op
√ BenchmarkF1_FFFFFFFF-4    10000000           137 ns/op
√ BenchmarkF2_F-4           500000000             3.57 ns/op
√ BenchmarkF2_FF-4          300000000             5.69 ns/op
√ BenchmarkF2_FFF-4         200000000             8.54 ns/op
√ BenchmarkF2_FFFF-4        100000000            10.5 ns/op
√ BenchmarkF2_FFFFF-4       100000000            12.9 ns/op
√ BenchmarkF2_FFFFFF-4      100000000            14.7 ns/op
√ BenchmarkF2_FFFFFFF-4     100000000            16.4 ns/op
√ BenchmarkF2_FFFFFFFF-4    100000000            18.4 ns/op
√ BenchmarkF3_F-4           500000000             3.47 ns/op
√ BenchmarkF3_FF-4          300000000             5.55 ns/op
√ BenchmarkF3_FFF-4         200000000             7.67 ns/op
√ BenchmarkF3_FFFF-4        200000000             9.86 ns/op
√ BenchmarkF3_FFFFF-4       100000000            12.8 ns/op
√ BenchmarkF3_FFFFFF-4      100000000            14.2 ns/op
√ BenchmarkF3_FFFFFFF-4     100000000            16.2 ns/op
√ BenchmarkF3_FFFFFFFF-4    100000000            18.3 ns/op
√ BenchmarkF4_F-4           300000000             5.54 ns/op
√ BenchmarkF4_FF-4          300000000             5.59 ns/op
√ BenchmarkF4_FFF-4         300000000             5.55 ns/op
√ BenchmarkF4_FFFF-4        300000000             5.54 ns/op
√ BenchmarkF4_FFFFF-4       300000000             5.54 ns/op
√ BenchmarkF4_FFFFFF-4      300000000             5.59 ns/op
√ BenchmarkF4_FFFFFFF-4     300000000             5.55 ns/op
√ BenchmarkF4_FFFFFFFF-4    300000000             5.53 ns/op

And for other set of arguments (0xF, 0xF0, 0xF00, 0xF000, 0xF0000, 0xF00000, 0xF000000, 0xF0000000), having same number of ones set on different positions:

go test -bench .|benchgraph -title="Graph2: Benchmark F(x) in ns/op" -obn="F2,F3,F4" -oba="0F,F0,F00,F000,F0000,F00000,F000000,F0000000"

√ BenchmarkF1_0F-4          30000000            53.6 ns/op
√ BenchmarkF1_F0-4          20000000            61.8 ns/op
√ BenchmarkF1_F00-4         20000000            70.4 ns/op
√ BenchmarkF1_F000-4        20000000            79.2 ns/op
√ BenchmarkF1_F0000-4       20000000            93.2 ns/op
√ BenchmarkF1_F00000-4      20000000           101 ns/op
√ BenchmarkF1_F000000-4     20000000           117 ns/op
√ BenchmarkF1_F0000000-4    10000000           129 ns/op
√ BenchmarkF2_0F-4          500000000             3.50 ns/op
√ BenchmarkF2_F0-4          200000000             7.91 ns/op
√ BenchmarkF2_F00-4         100000000            10.8 ns/op
√ BenchmarkF2_F000-4        100000000            14.1 ns/op
√ BenchmarkF2_F0000-4       100000000            17.2 ns/op
√ BenchmarkF2_F00000-4      50000000            24.2 ns/op
√ BenchmarkF2_F000000-4     50000000            26.5 ns/op
√ BenchmarkF2_F0000000-4    50000000            28.9 ns/op
√ BenchmarkF3_0F-4          500000000             3.48 ns/op
√ BenchmarkF3_F0-4          500000000             3.47 ns/op
√ BenchmarkF3_F00-4         500000000             3.50 ns/op
√ BenchmarkF3_F000-4        500000000             3.50 ns/op
√ BenchmarkF3_F0000-4       500000000             3.46 ns/op
√ BenchmarkF3_F00000-4      500000000             3.43 ns/op
√ BenchmarkF3_F000000-4     500000000             3.45 ns/op
√ BenchmarkF3_F0000000-4    500000000             3.52 ns/op
√ BenchmarkF4_0F-4          300000000             5.67 ns/op
√ BenchmarkF4_F0-4          300000000             5.62 ns/op
√ BenchmarkF4_F00-4         300000000             5.55 ns/op
√ BenchmarkF4_F000-4        300000000             5.61 ns/op
√ BenchmarkF4_F0000-4       300000000             5.57 ns/op
√ BenchmarkF4_F00000-4      300000000             5.54 ns/op
√ BenchmarkF4_F000000-4     300000000             5.54 ns/op
√ BenchmarkF4_F0000000-4    300000000             5.58 ns/op

We will ignore f1 benchmarks because the results are significantly worse than in case of f2, f3, and f4. Function f1 was just lazy written code.

Let's see at benchgraph visualization of benchmarking results:

Conclusion

From Graph1 above we conclude that functions f2 and f3 require more CPU time to complete as size of integer arguments increases (more ones). Function f4 implements algorithm of complexity O(1). No matter what is the size of argument, it takes approximately equal time to complete. It's like a dream for algorithm designers.

The interesting part is what happens if we take the same number of ones but on different positions. On Graph2 we can see that f2 depends on position of ones, while f3 and f4 don't. They execute in constant time. In some cases f3 is even faster than f4, but still it implements less efficiant algorithm of O(n).

Find code samples here github.com/CodingBerg/golang-interview.

For benchmark visualization we have used github.com/CodingBerg/benchgraph.

By one coder