mbox

[v7,0/4] hash: add extendable bucket and partial key hashing

Message ID 1539206862-306341-1-git-send-email-yipeng1.wang@intel.com (mailing list archive)
Headers

Message

Wang, Yipeng1 Oct. 10, 2018, 9:27 p.m. UTC
This patch has dependency on another bug fix patch set:
http://patchwork.dpdk.org/cover/45611/

This patch set makes two major optimizations over the current rte_hash
library.

First, it adds Extendable Bucket Table feature: a new structure that can
accommodate keys that failed to get inserted into the main hash table due to
the unlikely event of excessive hash collisions. The hash table buckets will
get extended using a linked list to host these keys. This new design will
guarantee insertion of 100% of the keys for a given hash table size with
minimal overhead. A new flag value is added for user to indicate if the
extendable bucket feature should be enabled or not. The linked list buckets is
similar concept to the extendable bucket hash table in packet framework.
In details, for insertion, the linked buckets will be used to store the keys
that fail to get in the primary and the secondary bucket and the cuckoo path
could not find an empty location for the maximum path length (small
probability). For lookup, the key is checked first in the primary, then the
secondary, then if the secondary is extended the linked list is traversed
for a possible match.

Second, the patch set changes the current hashing algorithm to be "partial-key
hashing". Partial-key hashing is the concept from Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
Hashing". Instead of storing both 32-bit signature and alternative signature
in the bucket, we only store a small 16-bit signature and calculate the
alternative bucket index by XORing the signature with the current bucket index.
This doubles the hash table memory efficiency since now one bucket
only occupies one cache line instead of two in the original design.

v6->v7:
Fix a bug accidentally introduced in V5. In iterate function, the main table
copmleted condition should be 
*next == total_entries_main instead of total_entries

v5->v6:
1. hash: fix a typo in comment, found by Honnappa.
2. Fix typos in commit messages.
3. Add review-by and acked-by.

v4->v5:
1. hash: for the first commit, move back the lock and read "position" in the
while condition as Honnappa suggested.
2. hash: minor coding style change (Honnappa) and commit message typo fix.
3. Add Review-by from Honnappa.

v3->v4:
1. hash: Revise commit message to be more clear for "utilization" (Honnappa)
2. hash: in delete key function, return bucket change to use rte_ring_sp_enqueue
instead of rte_ring_mp_enqueue, since it is already protected inside locks.
3. hash: update rte_hash_iterate comments (Honnappa)
4. hash: Add a new commit to fix race condition in the rte_hash_iterate (Honnappa)
5. hash/test: during utilization test, double check rte_hash_cnt returns correct
value (Honnappa)
6. hash: for partial-key-hashing commit, break the get_buckets_index function
into three. It may make future extension easier (Honnappa)
7. hash: change the comment for typedef uint32_t hash_sig_t to be more clear
to users (Honnappa)

v2->v3:
The first four commits were separated from this patch set as another
independent patch set:
https://mails.dpdk.org/archives/dev/2018-September/113118.html
1. hash: move snprintf for ext_ring name under the ext_table condition.
2. hash: fix memory leak by freeing ext_buckets in rte_hash_free.
3. hash: after failing cuckoo path, search not only ext buckets, but also the
secondary bucket first to see if there may be an empty location now.
4. hash: totally rewrote the key deleting function logic. If the deleted key was
not in the last bucket of the linked list when ext table enabled, the last
entry in the linked list will be placed in the vacant slot from the deleted
key. The purpose is to compact the entries in the linked list to be more close
to the main table. This is to make sure that not many extendable buckets are
wasted with only one or two entries after some time of running, also benefit
lookup speed.
5. Other minor coding style/comments improvements.

V1->V2:
1. hash: Rewrite rte_hash_get_last_bkt to be more concise.
2. hash: Reorder the rte_hash struct to align cache line better.
3. test: Minor changes in auto test to add key insertion failure check during
iteration test.
4. test: Add new commit to fix read-write test non-consecutive core issue.
4. hash: Add a new commit to remove unnecessary code introduced by previous
patches.
5. hash: Comments improvement and coding style improvements over multiple
places.

Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>

Yipeng Wang (4):
  hash: fix race condition in iterate
  hash: add extendable bucket feature
  test/hash: implement extendable bucket hash test
  hash: use partial-key hashing

 lib/librte_hash/rte_cuckoo_hash.c | 582 ++++++++++++++++++++++++++++----------
 lib/librte_hash/rte_cuckoo_hash.h |  11 +-
 lib/librte_hash/rte_hash.h        |   8 +-
 test/test/test_hash.c             | 159 ++++++++++-
 test/test/test_hash_perf.c        | 114 ++++++--
 5 files changed, 678 insertions(+), 196 deletions(-)