Message ID | cover.1584360151.git.vladimir.medvedkin@intel.com (mailing list archive) |
---|---|
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id EB5C3A0559; Mon, 16 Mar 2020 14:38:39 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 500E31BF30; Mon, 16 Mar 2020 14:38:39 +0100 (CET) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id DA082CF3 for <dev@dpdk.org>; Mon, 16 Mar 2020 14:38:37 +0100 (CET) IronPort-SDR: AilapkSkJMUKM0QkR550mEn8v/5axKryTvjbkcm/0KfmWQ7F0tiq46dwI4DfeLZU4o9R8rWNKV DqzgbCLDrv9Q== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Mar 2020 06:38:36 -0700 IronPort-SDR: NVOKJKjmeR1J4tl+OGWReb5ZkhwemqC5ezICod4ZWoNH5N0yK9tgQH3ED2jL9IdtLThJsFmlsz PPJOEpAAwmiQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,560,1574150400"; d="scan'208";a="233166904" Received: from silpixa00400072.ir.intel.com ([10.237.222.213]) by orsmga007.jf.intel.com with ESMTP; 16 Mar 2020 06:38:35 -0700 From: Vladimir Medvedkin <vladimir.medvedkin@intel.com> To: dev@dpdk.org Cc: konstantin.ananyev@intel.com, yipeng1.wang@intel.com, sameh.gobriel@intel.com, bruce.richardson@intel.com Date: Mon, 16 Mar 2020 13:38:26 +0000 Message-Id: <cover.1584360151.git.vladimir.medvedkin@intel.com> X-Mailer: git-send-email 2.7.4 Subject: [dpdk-dev] [PATCH 0/3] add new Double Word Key hash table X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions <dev.dpdk.org> List-Unsubscribe: <https://mails.dpdk.org/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://mails.dpdk.org/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <https://mails.dpdk.org/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Series |
add new Double Word Key hash table
|
|
Message
Vladimir Medvedkin
March 16, 2020, 1:38 p.m. UTC
Currently DPDK has a special implementation of a hash table for 4 byte keys which is called FBK hash. Unfortunately its main drawback is that it only supports 2 byte values. The new implementation called DWK (double word key) hash supports 8 byte values, which is enough to store a pointer. It would also be nice to get feedback on whether to leave the old FBK and new DWK implementations, or whether to deprecate the old one? Vladimir Medvedkin (3): hash: add dwk hash library test: add dwk hash autotests test: add dwk perf tests app/test/Makefile | 1 + app/test/meson.build | 1 + app/test/test_dwk_hash.c | 229 +++++++++++++++++++++++++++++ app/test/test_hash_perf.c | 81 +++++++++++ lib/Makefile | 2 +- lib/librte_hash/Makefile | 4 +- lib/librte_hash/meson.build | 5 +- lib/librte_hash/rte_dwk_hash.c | 271 +++++++++++++++++++++++++++++++++++ lib/librte_hash/rte_dwk_hash.h | 178 +++++++++++++++++++++++ lib/librte_hash/rte_hash_version.map | 5 + 10 files changed, 773 insertions(+), 4 deletions(-) create mode 100644 app/test/test_dwk_hash.c create mode 100644 lib/librte_hash/rte_dwk_hash.c create mode 100644 lib/librte_hash/rte_dwk_hash.h
Comments
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir Medvedkin > Sent: Monday, March 16, 2020 2:38 PM > > Currently DPDK has a special implementation of a hash table for > 4 byte keys which is called FBK hash. Unfortunately its main drawback > is that it only supports 2 byte values. > The new implementation called DWK (double word key) hash > supports 8 byte values, which is enough to store a pointer. > > It would also be nice to get feedback on whether to leave the old FBK > and new DWK implementations, or whether to deprecate the old one? <rant on> Who comes up with these names?!? FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean the same. Could you use 32 somewhere in the name instead, like in int32_t, instead of using a growing list of creative synonyms for the same thing? Pretty please, with a cherry on top! And if the value size is fixed too, perhaps the name should also indicate the value size. <rant off> It's a shame we don't have C++ class templates available in DPDK... In other news, Mellanox has sent an RFC for an "indexed memory pool" library [1] to conserve memory by using uintXX_t instead of pointers, so perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to 16 bit values in FBK and 64 bit in DWK) would be nice combination with that library. [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html Med venlig hilsen / kind regards - Morten Brørup
Hi Morten, On 16/03/2020 14:39, Morten Brørup wrote: >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir Medvedkin >> Sent: Monday, March 16, 2020 2:38 PM >> >> Currently DPDK has a special implementation of a hash table for >> 4 byte keys which is called FBK hash. Unfortunately its main drawback >> is that it only supports 2 byte values. >> The new implementation called DWK (double word key) hash >> supports 8 byte values, which is enough to store a pointer. >> >> It would also be nice to get feedback on whether to leave the old FBK >> and new DWK implementations, or whether to deprecate the old one? > <rant on> > > Who comes up with these names?!? > > FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean the same. Could you use 32 somewhere in the name instead, like in int32_t, instead of using a growing list of creative synonyms for the same thing? Pretty please, with a cherry on top! That's true, at first I named it as fbk2, but then it was decided to rename it "dwk", so that there was no confusion with the existing FBK library. Naming suggestions are welcome! > > And if the value size is fixed too, perhaps the name should also indicate the value size. > > <rant off> > > It's a shame we don't have C++ class templates available in DPDK... > > In other news, Mellanox has sent an RFC for an "indexed memory pool" library [1] to conserve memory by using uintXX_t instead of pointers, so perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to 16 bit values in FBK and 64 bit in DWK) would be nice combination with that library. > > [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html > > > Med venlig hilsen / kind regards > - Morten Brørup >
On Mon, 16 Mar 2020 18:27:40 +0000 "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote: > Hi Morten, > > > On 16/03/2020 14:39, Morten Brørup wrote: > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir Medvedkin > >> Sent: Monday, March 16, 2020 2:38 PM > >> > >> Currently DPDK has a special implementation of a hash table for > >> 4 byte keys which is called FBK hash. Unfortunately its main drawback > >> is that it only supports 2 byte values. > >> The new implementation called DWK (double word key) hash > >> supports 8 byte values, which is enough to store a pointer. > >> > >> It would also be nice to get feedback on whether to leave the old FBK > >> and new DWK implementations, or whether to deprecate the old one? > > <rant on> > > > > Who comes up with these names?!? > > > > FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean the same. Could you use 32 somewhere in the name instead, like in int32_t, instead of using a growing list of creative synonyms for the same thing? Pretty please, with a cherry on top! > > > That's true, at first I named it as fbk2, but then it was decided to > rename it "dwk", so that there was no confusion with the existing FBK > library. Naming suggestions are welcome! > > > > > And if the value size is fixed too, perhaps the name should also indicate the value size. > > > > <rant off> > > > > It's a shame we don't have C++ class templates available in DPDK... > > > > In other news, Mellanox has sent an RFC for an "indexed memory pool" library [1] to conserve memory by using uintXX_t instead of pointers, so perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to 16 bit values in FBK and 64 bit in DWK) would be nice combination with that library. > > > > [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html > > > > > > Med venlig hilsen / kind regards > > - Morten Brørup > > Why is this different (or better) than existing rte_hash. Having more flavors is not necessarily a good thing (except in Gelato)
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Medvedkin, Vladimir > Sent: Monday, March 16, 2020 7:28 PM > > Hi Morten, > > > On 16/03/2020 14:39, Morten Brørup wrote: > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir Medvedkin > >> Sent: Monday, March 16, 2020 2:38 PM > >> > >> Currently DPDK has a special implementation of a hash table for > >> 4 byte keys which is called FBK hash. Unfortunately its main drawback > >> is that it only supports 2 byte values. > >> The new implementation called DWK (double word key) hash > >> supports 8 byte values, which is enough to store a pointer. > >> > >> It would also be nice to get feedback on whether to leave the old FBK > >> and new DWK implementations, or whether to deprecate the old one? > > <rant on> > > > > Who comes up with these names?!? > > > > FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean the > same. Could you use 32 somewhere in the name instead, like in int32_t, > instead of using a growing list of creative synonyms for the same thing? > Pretty please, with a cherry on top! > > > That's true, at first I named it as fbk2, but then it was decided to > rename it "dwk", so that there was no confusion with the existing FBK > library. Naming suggestions are welcome! > OK... let me suggest a prefix with both key and value sizes in the name: Instead of rte_dwk_hash_xxx, use rte_hash_k32v64_xxx. However, that suggests that it is a generic hash... and your hash algorithm certainly has other properties too; so perhaps a specific name, saying something about the underlying algorithm, makes more sense after all. And the documentation will show its properties anyway. Whatever you choose, Double Word is certainly not a good name.... a word is 32 bits and a double word is 64 bits in many non-Intel CPU architectures, including some supported by DPDK. > > > > And if the value size is fixed too, perhaps the name should also indicate > the value size. > > > > <rant off> > > > > It's a shame we don't have C++ class templates available in DPDK... > > > > In other news, Mellanox has sent an RFC for an "indexed memory pool" > library [1] to conserve memory by using uintXX_t instead of pointers, so > perhaps a variant of a 32 bit key hash library with 32 bit values (in > addition to 16 bit values in FBK and 64 bit in DWK) would be nice > combination with that library. > > > > [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html > > > > > > Med venlig hilsen / kind regards > > - Morten Brørup > > > -- > Regards, > Vladimir >
> -----Original Message----- > From: Stephen Hemminger <stephen@networkplumber.org> > Sent: Monday, March 16, 2020 12:33 PM > To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com> > Cc: Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org; > Ananyev, Konstantin <konstantin.ananyev@intel.com>; Wang, Yipeng1 > <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; > Richardson, Bruce <bruce.richardson@intel.com>; Suanming Mou > <suanmingm@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>; > Xueming(Steven) Li <xuemingl@mellanox.com>; Andrew Rybchenko > <arybchenko@solarflare.com>; Asaf Penso <asafp@mellanox.com>; Ori Kam > <orika@mellanox.com> > Subject: Re: [dpdk-dev] [PATCH 0/3] add new Double Word Key hash table > > On Mon, 16 Mar 2020 18:27:40 +0000 > "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote: > > > Hi Morten, > > > > > > On 16/03/2020 14:39, Morten Brørup wrote: > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir > > >> Medvedkin > > >> Sent: Monday, March 16, 2020 2:38 PM > > >> > > >> Currently DPDK has a special implementation of a hash table for > > >> 4 byte keys which is called FBK hash. Unfortunately its main > > >> drawback is that it only supports 2 byte values. > > >> The new implementation called DWK (double word key) hash supports 8 > > >> byte values, which is enough to store a pointer. > > >> > > >> It would also be nice to get feedback on whether to leave the old > > >> FBK and new DWK implementations, or whether to deprecate the old > one? > > > <rant on> > > > > > > Who comes up with these names?!? > > > > > > FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean > the same. Could you use 32 somewhere in the name instead, like in int32_t, > instead of using a growing list of creative synonyms for the same thing? > Pretty please, with a cherry on top! > > > > > > That's true, at first I named it as fbk2, but then it was decided to > > rename it "dwk", so that there was no confusion with the existing FBK > > library. Naming suggestions are welcome! > > > > > > > > And if the value size is fixed too, perhaps the name should also indicate > the value size. > > > > > > <rant off> > > > > > > It's a shame we don't have C++ class templates available in DPDK... > > > > > > In other news, Mellanox has sent an RFC for an "indexed memory pool" > library [1] to conserve memory by using uintXX_t instead of pointers, so > perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to > 16 bit values in FBK and 64 bit in DWK) would be nice combination with that > library. > > > > > > [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html > > > > > > > > > Med venlig hilsen / kind regards > > > - Morten Brørup > > > > > Why is this different (or better) than existing rte_hash. > Having more flavors is not necessarily a good thing (except in Gelato) [Wang, Yipeng] Hi, Vladimir, As Stephen mentioned, I think it is good idea to explain the benefit of this new type of hash table more explicitly such as Specific use cases, differences with current rte_hash, and performance numbers, etc.
Hi Yipeng, Stephen, all, On 17/03/2020 19:52, Wang, Yipeng1 wrote: >> -----Original Message----- >> From: Stephen Hemminger <stephen@networkplumber.org> >> Sent: Monday, March 16, 2020 12:33 PM >> To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com> >> Cc: Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org; >> Ananyev, Konstantin <konstantin.ananyev@intel.com>; Wang, Yipeng1 >> <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; >> Richardson, Bruce <bruce.richardson@intel.com>; Suanming Mou >> <suanmingm@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>; >> Xueming(Steven) Li <xuemingl@mellanox.com>; Andrew Rybchenko >> <arybchenko@solarflare.com>; Asaf Penso <asafp@mellanox.com>; Ori Kam >> <orika@mellanox.com> >> Subject: Re: [dpdk-dev] [PATCH 0/3] add new Double Word Key hash table >> >> On Mon, 16 Mar 2020 18:27:40 +0000 >> "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote: >> >>> Hi Morten, >>> >>> >>> On 16/03/2020 14:39, Morten Brørup wrote: >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir >>>>> Medvedkin >>>>> Sent: Monday, March 16, 2020 2:38 PM >>>>> >>>>> Currently DPDK has a special implementation of a hash table for >>>>> 4 byte keys which is called FBK hash. Unfortunately its main >>>>> drawback is that it only supports 2 byte values. >>>>> The new implementation called DWK (double word key) hash supports 8 >>>>> byte values, which is enough to store a pointer. >>>>> >>>>> It would also be nice to get feedback on whether to leave the old >>>>> FBK and new DWK implementations, or whether to deprecate the old >> one? >>>> <rant on> >>>> >>>> Who comes up with these names?!? >>>> >>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean >> the same. Could you use 32 somewhere in the name instead, like in int32_t, >> instead of using a growing list of creative synonyms for the same thing? >> Pretty please, with a cherry on top! >>> >>> That's true, at first I named it as fbk2, but then it was decided to >>> rename it "dwk", so that there was no confusion with the existing FBK >>> library. Naming suggestions are welcome! >>> >>>> And if the value size is fixed too, perhaps the name should also indicate >> the value size. >>>> <rant off> >>>> >>>> It's a shame we don't have C++ class templates available in DPDK... >>>> >>>> In other news, Mellanox has sent an RFC for an "indexed memory pool" >> library [1] to conserve memory by using uintXX_t instead of pointers, so >> perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to >> 16 bit values in FBK and 64 bit in DWK) would be nice combination with that >> library. >>>> [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html >>>> >>>> >>>> Med venlig hilsen / kind regards >>>> - Morten Brørup >>>> >> Why is this different (or better) than existing rte_hash. >> Having more flavors is not necessarily a good thing (except in Gelato) > [Wang, Yipeng] > Hi, Vladimir, > As Stephen mentioned, I think it is good idea to explain the benefit of this new type of hash table more explicitly such as > Specific use cases, differences with current rte_hash, and performance numbers, etc. The main reason for this new hash library is performance. As I mentioned earlier, the current rte_fbk implementation is pretty fast but it has a number of drawbacks such as 2 byte values and limited collision resolving capabilities. On the other hand, rte_hash (cuckoo hash) doesn't have this drawbacks but at the cost of lower performance comparing to rte_fbk. If I understand correctly, performance penalty are due to : 1. Load two buckets 2. First compare signatures 3. If signature comparison hits get a key index and find memory location with a key itself and get the key 4. Using indirect call to memcmp() to compare two uint32_t. The new proposed 4 byte key hash table doesn't have rte_fbk drawbacks while offers the same performance as rte_fbk. Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4 byte key size. Replacing it with a new implementation gives about 30% in performance. The main disadvantage comparing to rte_hash is some performance degradation with high average table utilization due to chain resolving for 5th and subsequent collision.
26/03/2020 18:28, Medvedkin, Vladimir: > Hi Yipeng, Stephen, all, > > On 17/03/2020 19:52, Wang, Yipeng1 wrote: > > From: Stephen Hemminger <stephen@networkplumber.org> > >> On Mon, 16 Mar 2020 18:27:40 +0000 > >> "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote: > >> > >>> Hi Morten, > >>> > >>> > >>> On 16/03/2020 14:39, Morten Brørup wrote: > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir > >>>>> Medvedkin > >>>>> Sent: Monday, March 16, 2020 2:38 PM > >>>>> > >>>>> Currently DPDK has a special implementation of a hash table for > >>>>> 4 byte keys which is called FBK hash. Unfortunately its main > >>>>> drawback is that it only supports 2 byte values. > >>>>> The new implementation called DWK (double word key) hash supports 8 > >>>>> byte values, which is enough to store a pointer. > >>>>> > >>>>> It would also be nice to get feedback on whether to leave the old > >>>>> FBK and new DWK implementations, or whether to deprecate the old > >> one? > >>>> <rant on> > >>>> > >>>> Who comes up with these names?!? > >>>> > >>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean > >> the same. Could you use 32 somewhere in the name instead, like in int32_t, > >> instead of using a growing list of creative synonyms for the same thing? > >> Pretty please, with a cherry on top! > >>> > >>> That's true, at first I named it as fbk2, but then it was decided to > >>> rename it "dwk", so that there was no confusion with the existing FBK > >>> library. Naming suggestions are welcome! > >>> > >>>> And if the value size is fixed too, perhaps the name should also indicate > >> the value size. > >>>> <rant off> > >>>> > >>>> It's a shame we don't have C++ class templates available in DPDK... > >>>> > >>>> In other news, Mellanox has sent an RFC for an "indexed memory pool" > >> library [1] to conserve memory by using uintXX_t instead of pointers, so > >> perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to > >> 16 bit values in FBK and 64 bit in DWK) would be nice combination with that > >> library. > >>>> [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html Yes some work is in progress to propose a new memory allocator for small objects of fixed size with small memory overhead. > >> Why is this different (or better) than existing rte_hash. > >> Having more flavors is not necessarily a good thing (except in Gelato) > > [Wang, Yipeng] > > Hi, Vladimir, > > As Stephen mentioned, I think it is good idea to explain the benefit of this new type of hash table more explicitly such as > > Specific use cases, differences with current rte_hash, and performance numbers, etc. > > The main reason for this new hash library is performance. As I mentioned > earlier, the current rte_fbk implementation is pretty fast but it has a > number of drawbacks such as 2 byte values and limited collision > resolving capabilities. On the other hand, rte_hash (cuckoo hash) > doesn't have this drawbacks but at the cost of lower performance > comparing to rte_fbk. > > If I understand correctly, performance penalty are due to : > > 1. Load two buckets > > 2. First compare signatures > > 3. If signature comparison hits get a key index and find memory location > with a key itself and get the key > > 4. Using indirect call to memcmp() to compare two uint32_t. > > The new proposed 4 byte key hash table doesn't have rte_fbk drawbacks > while offers the same performance as rte_fbk. > > Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4 byte > key size. Replacing it with a new implementation gives about 30% in > performance. > > The main disadvantage comparing to rte_hash is some performance > degradation with high average table utilization due to chain resolving > for 5th and subsequent collision. Thanks for explaining. Please, such information should added in the documentation: doc/guides/prog_guide/hash_lib.rst
<snip> > > 26/03/2020 18:28, Medvedkin, Vladimir: > > Hi Yipeng, Stephen, all, > > > > On 17/03/2020 19:52, Wang, Yipeng1 wrote: > > > From: Stephen Hemminger <stephen@networkplumber.org> > > >> On Mon, 16 Mar 2020 18:27:40 +0000 > > >> "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote: > > >> > > >>> Hi Morten, > > >>> > > >>> > > >>> On 16/03/2020 14:39, Morten Brørup wrote: > > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir > > >>>>> Medvedkin > > >>>>> Sent: Monday, March 16, 2020 2:38 PM > > >>>>> > > >>>>> Currently DPDK has a special implementation of a hash table for > > >>>>> 4 byte keys which is called FBK hash. Unfortunately its main > > >>>>> drawback is that it only supports 2 byte values. > > >>>>> The new implementation called DWK (double word key) hash > > >>>>> supports 8 byte values, which is enough to store a pointer. > > >>>>> > > >>>>> It would also be nice to get feedback on whether to leave the > > >>>>> old FBK and new DWK implementations, or whether to deprecate the > > >>>>> old > > >> one? > > >>>> <rant on> > > >>>> > > >>>> Who comes up with these names?!? > > >>>> > > >>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean > > >> the same. Could you use 32 somewhere in the name instead, like in > > >> int32_t, instead of using a growing list of creative synonyms for the same > thing? > > >> Pretty please, with a cherry on top! > > >>> > > >>> That's true, at first I named it as fbk2, but then it was decided > > >>> to rename it "dwk", so that there was no confusion with the > > >>> existing FBK library. Naming suggestions are welcome! > > >>> > > >>>> And if the value size is fixed too, perhaps the name should also > > >>>> indicate > > >> the value size. > > >>>> <rant off> > > >>>> > > >>>> It's a shame we don't have C++ class templates available in DPDK... > > >>>> > > >>>> In other news, Mellanox has sent an RFC for an "indexed memory > pool" > > >> library [1] to conserve memory by using uintXX_t instead of > > >> pointers, so perhaps a variant of a 32 bit key hash library with 32 > > >> bit values (in addition to > > >> 16 bit values in FBK and 64 bit in DWK) would be nice combination > > >> with that library. > > >>>> [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html > > Yes some work is in progress to propose a new memory allocator for small > objects of fixed size with small memory overhead. > > > > >> Why is this different (or better) than existing rte_hash. > > >> Having more flavors is not necessarily a good thing (except in > > >> Gelato) > > > [Wang, Yipeng] > > > Hi, Vladimir, > > > As Stephen mentioned, I think it is good idea to explain the benefit > > > of this new type of hash table more explicitly such as Specific use cases, > differences with current rte_hash, and performance numbers, etc. > > > > The main reason for this new hash library is performance. As I > > mentioned earlier, the current rte_fbk implementation is pretty fast > > but it has a number of drawbacks such as 2 byte values and limited > > collision resolving capabilities. On the other hand, rte_hash (cuckoo > > hash) doesn't have this drawbacks but at the cost of lower performance > > comparing to rte_fbk. > > > > If I understand correctly, performance penalty are due to : > > > > 1. Load two buckets > > > > 2. First compare signatures > > > > 3. If signature comparison hits get a key index and find memory > > location with a key itself and get the key > > > > 4. Using indirect call to memcmp() to compare two uint32_t. > > > > The new proposed 4 byte key hash table doesn't have rte_fbk drawbacks > > while offers the same performance as rte_fbk. > > > > Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4 > > byte key size. Replacing it with a new implementation gives about 30% > > in performance. > > > > The main disadvantage comparing to rte_hash is some performance > > degradation with high average table utilization due to chain resolving > > for 5th and subsequent collision. rte_hash is linearly scalable across multiple cores for lookup due to lock-free algorithm. How is the scalability for the new algorithm? > > Thanks for explaining. > Please, such information should added in the documentation: > doc/guides/prog_guide/hash_lib.rst > >
Hi Thomas, -----Original Message----- From: Thomas Monjalon <thomas@monjalon.net> Sent: Tuesday, March 31, 2020 8:56 PM To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; dev@dpdk.org; Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Suanming Mou <suanmingm@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>; Xueming(Steven) Li <xuemingl@mellanox.com>; Andrew Rybchenko <arybchenko@solarflare.com>; Asaf Penso <asafp@mellanox.com>; Ori Kam <orika@mellanox.com> Subject: Re: [dpdk-dev] [PATCH 0/3] add new Double Word Key hash table 26/03/2020 18:28, Medvedkin, Vladimir: > Hi Yipeng, Stephen, all, > > On 17/03/2020 19:52, Wang, Yipeng1 wrote: > > From: Stephen Hemminger <stephen@networkplumber.org> > >> On Mon, 16 Mar 2020 18:27:40 +0000 > >> "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote: > >> > >>> Hi Morten, > >>> > >>> > >>> On 16/03/2020 14:39, Morten Brørup wrote: > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir > >>>>> Medvedkin > >>>>> Sent: Monday, March 16, 2020 2:38 PM > >>>>> > >>>>> Currently DPDK has a special implementation of a hash table for > >>>>> 4 byte keys which is called FBK hash. Unfortunately its main > >>>>> drawback is that it only supports 2 byte values. > >>>>> The new implementation called DWK (double word key) hash > >>>>> supports 8 byte values, which is enough to store a pointer. > >>>>> > >>>>> It would also be nice to get feedback on whether to leave the > >>>>> old FBK and new DWK implementations, or whether to deprecate the > >>>>> old > >> one? > >>>> <rant on> > >>>> > >>>> Who comes up with these names?!? > >>>> > >>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean > >> the same. Could you use 32 somewhere in the name instead, like in > >> int32_t, instead of using a growing list of creative synonyms for the same thing? > >> Pretty please, with a cherry on top! > >>> > >>> That's true, at first I named it as fbk2, but then it was decided > >>> to rename it "dwk", so that there was no confusion with the > >>> existing FBK library. Naming suggestions are welcome! > >>> > >>>> And if the value size is fixed too, perhaps the name should also > >>>> indicate > >> the value size. > >>>> <rant off> > >>>> > >>>> It's a shame we don't have C++ class templates available in DPDK... > >>>> > >>>> In other news, Mellanox has sent an RFC for an "indexed memory pool" > >> library [1] to conserve memory by using uintXX_t instead of > >> pointers, so perhaps a variant of a 32 bit key hash library with 32 > >> bit values (in addition to > >> 16 bit values in FBK and 64 bit in DWK) would be nice combination > >> with that library. > >>>> [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html Yes some work is in progress to propose a new memory allocator for small objects of fixed size with small memory overhead. > >> Why is this different (or better) than existing rte_hash. > >> Having more flavors is not necessarily a good thing (except in > >> Gelato) > > [Wang, Yipeng] > > Hi, Vladimir, > > As Stephen mentioned, I think it is good idea to explain the benefit > > of this new type of hash table more explicitly such as Specific use cases, differences with current rte_hash, and performance numbers, etc. > > The main reason for this new hash library is performance. As I > mentioned earlier, the current rte_fbk implementation is pretty fast > but it has a number of drawbacks such as 2 byte values and limited > collision resolving capabilities. On the other hand, rte_hash (cuckoo > hash) doesn't have this drawbacks but at the cost of lower performance > comparing to rte_fbk. > > If I understand correctly, performance penalty are due to : > > 1. Load two buckets > > 2. First compare signatures > > 3. If signature comparison hits get a key index and find memory > location with a key itself and get the key > > 4. Using indirect call to memcmp() to compare two uint32_t. > > The new proposed 4 byte key hash table doesn't have rte_fbk drawbacks > while offers the same performance as rte_fbk. > > Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4 > byte key size. Replacing it with a new implementation gives about 30% > in performance. > > The main disadvantage comparing to rte_hash is some performance > degradation with high average table utilization due to chain resolving > for 5th and subsequent collision. Thanks for explaining. Please, such information should added in the documentation: doc/guides/prog_guide/hash_lib.rst I'm going to submit v2 this week, will add documentation update.
Hi Honnappa, -----Original Message----- From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> Sent: Tuesday, March 31, 2020 10:17 PM To: thomas@monjalon.net; Medvedkin, Vladimir <vladimir.medvedkin@intel.com> Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; dev@dpdk.org; Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Suanming Mou <suanmingm@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>; Xueming(Steven) Li <xuemingl@mellanox.com>; Andrew Rybchenko <arybchenko@solarflare.com>; Asaf Penso <asafp@mellanox.com>; Ori Kam <orika@mellanox.com>; nd <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com> Subject: RE: [dpdk-dev] [PATCH 0/3] add new Double Word Key hash table <snip> > > 26/03/2020 18:28, Medvedkin, Vladimir: > > Hi Yipeng, Stephen, all, > > > > On 17/03/2020 19:52, Wang, Yipeng1 wrote: > > > From: Stephen Hemminger <stephen@networkplumber.org> > > >> On Mon, 16 Mar 2020 18:27:40 +0000 "Medvedkin, Vladimir" > > >> <vladimir.medvedkin@intel.com> wrote: > > >> > > >>> Hi Morten, > > >>> > > >>> > > >>> On 16/03/2020 14:39, Morten Brørup wrote: > > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir > > >>>>> Medvedkin > > >>>>> Sent: Monday, March 16, 2020 2:38 PM > > >>>>> > > >>>>> Currently DPDK has a special implementation of a hash table > > >>>>> for > > >>>>> 4 byte keys which is called FBK hash. Unfortunately its main > > >>>>> drawback is that it only supports 2 byte values. > > >>>>> The new implementation called DWK (double word key) hash > > >>>>> supports 8 byte values, which is enough to store a pointer. > > >>>>> > > >>>>> It would also be nice to get feedback on whether to leave the > > >>>>> old FBK and new DWK implementations, or whether to deprecate > > >>>>> the old > > >> one? > > >>>> <rant on> > > >>>> > > >>>> Who comes up with these names?!? > > >>>> > > >>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to > > >>>> mean > > >> the same. Could you use 32 somewhere in the name instead, like in > > >> int32_t, instead of using a growing list of creative synonyms for > > >> the same > thing? > > >> Pretty please, with a cherry on top! > > >>> > > >>> That's true, at first I named it as fbk2, but then it was > > >>> decided to rename it "dwk", so that there was no confusion with > > >>> the existing FBK library. Naming suggestions are welcome! > > >>> > > >>>> And if the value size is fixed too, perhaps the name should > > >>>> also indicate > > >> the value size. > > >>>> <rant off> > > >>>> > > >>>> It's a shame we don't have C++ class templates available in DPDK... > > >>>> > > >>>> In other news, Mellanox has sent an RFC for an "indexed memory > pool" > > >> library [1] to conserve memory by using uintXX_t instead of > > >> pointers, so perhaps a variant of a 32 bit key hash library with > > >> 32 bit values (in addition to > > >> 16 bit values in FBK and 64 bit in DWK) would be nice combination > > >> with that library. > > >>>> [1]: > > >>>> http://mails.dpdk.org/archives/dev/2019-October/147513.html > > Yes some work is in progress to propose a new memory allocator for > small objects of fixed size with small memory overhead. > > > > >> Why is this different (or better) than existing rte_hash. > > >> Having more flavors is not necessarily a good thing (except in > > >> Gelato) > > > [Wang, Yipeng] > > > Hi, Vladimir, > > > As Stephen mentioned, I think it is good idea to explain the > > > benefit of this new type of hash table more explicitly such as > > > Specific use cases, > differences with current rte_hash, and performance numbers, etc. > > > > The main reason for this new hash library is performance. As I > > mentioned earlier, the current rte_fbk implementation is pretty fast > > but it has a number of drawbacks such as 2 byte values and limited > > collision resolving capabilities. On the other hand, rte_hash > > (cuckoo > > hash) doesn't have this drawbacks but at the cost of lower > > performance comparing to rte_fbk. > > > > If I understand correctly, performance penalty are due to : > > > > 1. Load two buckets > > > > 2. First compare signatures > > > > 3. If signature comparison hits get a key index and find memory > > location with a key itself and get the key > > > > 4. Using indirect call to memcmp() to compare two uint32_t. > > > > The new proposed 4 byte key hash table doesn't have rte_fbk > > drawbacks while offers the same performance as rte_fbk. > > > > Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4 > > byte key size. Replacing it with a new implementation gives about > > 30% in performance. > > > > The main disadvantage comparing to rte_hash is some performance > > degradation with high average table utilization due to chain > > resolving for 5th and subsequent collision. rte_hash is linearly scalable across multiple cores for lookup due to lock-free algorithm. How is the scalability for the new algorithm? This library is scalable as well. It uses almost the same lock-free algorithm. The only difference with cuckoo is that cuckoo in lock-free implementation uses single global "change_counter" for all the table, and the proposed implementation uses fine grained approach with "change_counter" per bucket. So it should be more scalable with frequent concurrent updates. > > Thanks for explaining. > Please, such information should added in the documentation: > doc/guides/prog_guide/hash_lib.rst > >