google.sparsegroup could be better

Author: rockeet  Published: 2010-02-22  Categories: C++, HashTable  Comments: 1  Views: 4,046

sparsegroup is the lowest-level data structure in the google.sparseXXXX (sparsehashmap) family. The sparseXXX classes depend on one another as follows:

- sparsegroup
  - sparsetable
    - sparsehashtable
      - sparse_hash_map
      - sparse_hash_set

The performance of the sparsegroup implementation therefore directly affects the whole sparseXXXX family.

First, look at a fragment of sparsegroup:

template <class T, u_int16_t GROUP_SIZE>
class sparsegroup {
// ... omitted ...
  typedef T value_type;
  typedef u_int16_t size_type;                  // max # of buckets

// ... omitted ...
 private:
  // The actual data
  value_type *group;                            // (small) array of T's
  unsigned char bitmap[(GROUP_SIZE-1)/8 + 1];   // fancy math is so we round up
  size_type num_buckets;                        // limits GROUP_SIZE to 64K
};

sparsegroup has only 3 data members:

group

bitmap

num_buckets

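With the default GROUP_SIZE of 48, bitmap + num_buckets line up exactly to pointer size (6 bytes of bitmap plus 2 bytes of num_buckets). On a 64-bit platform the whole sparsegroup is therefore exactly 16 bytes, i.e. two pointers wide.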
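The size arithmetic can be checked with a small sketch. `group_layout` and `bitmap_bytes` are hypothetical names introduced here for illustration; the struct only mirrors the three members quoted above and is not the sparsehash class itself. The resulting sizes depend on the platform ABI.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: only the three data members quoted above, in the same order.
template <class T, std::uint16_t GROUP_SIZE>
struct group_layout {
    T* group;
    unsigned char bitmap[(GROUP_SIZE - 1) / 8 + 1];
    std::uint16_t num_buckets;
};

// Bytes taken by the bitmap for a given GROUP_SIZE (same round-up as above).
constexpr std::size_t bitmap_bytes(std::size_t group_size) {
    return (group_size - 1) / 8 + 1;
}

static_assert(bitmap_bytes(48) == 6, "48 bits round up to 6 bytes");
static_assert(bitmap_bytes(64) == 8, "64 bits fill exactly 8 bytes");
```

On a typical 64-bit ABI, sizeof(group_layout<int, 48>) should come out as 16. With GROUP_SIZE 64 the bitmap alone fills a full word, so num_buckets spills into a third word and the struct is usually padded to 24 bytes.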

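The main point I want to make is this:

sparsegroup treats computing popcount(bitmap, idx) as an expensive operation. On modern CPUs, however, that assumption is wrong! Most mainstream CPUs support popcount directly in hardware.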

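num_buckets is in fact redundant, because num_buckets == popcount(bitmap). It was added to sparsegroup purely as a speed-up, to avoid running popcount over the whole bitmap. On CPUs with hardware popcount, however, its presence actually hurts performance considerably: it keeps the bitmap from being sized to a whole machine word, so computing popcount needs an extra bitmask operation, and loading num_buckets may even be slower than popcount(int64). Suppose we change it directly to: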

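// Assume GROUP_SIZE == 64.
// Other GROUP_SIZE values can be handled with template partial specialization.
// For simplicity, this is written with gcc's __builtin_popcountll.
// Needs <cassert> and <stdint.h>.
value_type* group;
uint64_t bitmap;  // unsigned, so shifts up to bit 63 are well defined
// ...
int size() const { return __builtin_popcountll(bitmap); }
bool test(int pos) const { return (bitmap & (uint64_t(1) << pos)) != 0; }
const value_type& unsafe_get(int pos) const {
    assert(pos >= 0 && pos < GROUP_SIZE);
    assert(test(pos));
    // The mask keeps only the bits below pos; their count is the index into group.
    return group[__builtin_popcountll(bitmap & ((uint64_t(1) << pos) - 1))];
}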
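To make the idea concrete, here is a minimal runnable sketch of a 64-slot group whose size and element offsets both come from the bitmap. `mini_group64` and its members are hypothetical names; it uses std::vector instead of a raw pointer for brevity, assumes a compiler that provides __builtin_popcountll (gcc or clang), and is not how sparsehash itself is written.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration: a 64-slot sparse group with no stored element count.
template <class T>
class mini_group64 {
    std::vector<T> group_;      // occupied slots only, densely packed
    std::uint64_t bitmap_ = 0;  // bit i set <=> slot i is occupied

    // Number of occupied slots strictly below pos.
    static int rank(std::uint64_t bits, int pos) {
        return __builtin_popcountll(bits & ((std::uint64_t(1) << pos) - 1));
    }

public:
    int size() const { return __builtin_popcountll(bitmap_); }
    bool test(int pos) const { return ((bitmap_ >> pos) & 1u) != 0; }

    // Store value at logical slot pos, inserting or overwriting.
    void set(int pos, const T& value) {
        assert(pos >= 0 && pos < 64);
        int offset = rank(bitmap_, pos);
        if (test(pos)) {
            group_[offset] = value;
        } else {
            group_.insert(group_.begin() + offset, value);
            bitmap_ |= std::uint64_t(1) << pos;
        }
    }

    // Remove the element at logical slot pos, if there is one.
    void erase(int pos) {
        assert(pos >= 0 && pos < 64);
        if (!test(pos)) return;
        group_.erase(group_.begin() + rank(bitmap_, pos));
        bitmap_ &= ~(std::uint64_t(1) << pos);
    }

    const T& get(int pos) const {
        assert(pos >= 0 && pos < 64 && test(pos));
        return group_[rank(bitmap_, pos)];
    }
};
```

For example, after set(0, 1), set(5, 50) and set(63, 630), the packed array holds three elements and get(63) reads index 2, because two lower bits are set.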

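Google's implementation of popcount(bitmap, pos) is hundreds of times slower than the hardware-backed __builtin_popcountll(bitmap & ((uint64_t(1) << pos) - 1)). They do have a reason, of course: popcount does not have compiler and hardware support on every system!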
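One way to address the portability concern is to use the builtin where the compiler offers it and fall back to a software bit count elsewhere. The sketch below is only an illustration under that assumption; `popcount64` and `popcount64_portable` are hypothetical names, and the fallback is the well-known branch-free SWAR bit count, not Google's code.

```cpp
#include <cstdint>

// Portable fallback: sum bits in parallel within 2-, 4- and 8-bit fields,
// then add up the eight byte sums with one multiplication.
inline int popcount64_portable(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<int>((x * 0x0101010101010101ULL) >> 56);
}

// Uses the compiler builtin when available, the portable version otherwise.
inline int popcount64(std::uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);
#else
    return popcount64_portable(x);
#endif
}
```

Whether the builtin compiles to a single instruction depends on the target flags (for example -mpopcnt or -march=native on x86 with gcc); without them the compiler emits its own software routine.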

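For the typical value GROUP_SIZE == 64, this optimized version is clearly much better than Google's: it is both simpler and faster. If GROUP_SIZE has to grow, it should be doubled or tripled outright. GROUP_SIZE should not be too large, though, or insertions into and deletions from the group array become too expensive; 64 may well be the best value.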
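If GROUP_SIZE is grown to a multiple of 64, the same lookup extends to a bitmap made of several 64-bit words. The following is a hedged sketch of that extension; `rank_words` is a hypothetical helper, not part of sparsehash, and it assumes gcc or clang for __builtin_popcountll.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of set bits strictly below pos in a bitmap of
// WORDS 64-bit words (i.e. GROUP_SIZE == 64 * WORDS).
template <int WORDS>
inline int rank_words(const std::uint64_t (&bitmap)[WORDS], int pos) {
    assert(pos >= 0 && pos < 64 * WORDS);
    int count = 0;
    int word = pos / 64;
    // Whole words below the one containing pos.
    for (int i = 0; i < word; ++i)
        count += __builtin_popcountll(bitmap[i]);
    // Partial word: keep only the bits below pos % 64.
    std::uint64_t mask = (std::uint64_t(1) << (pos % 64)) - 1;
    return count + __builtin_popcountll(bitmap[word] & mask);
}
```

Each extra word adds one popcount to the lookup, while the cost of shifting elements on insert and delete grows with the number of occupied slots, which is the trade-off mentioned above.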
