mbox series

[0/3] add new Double Word Key hash table

Message ID cover.1584360151.git.vladimir.medvedkin@intel.com (mailing list archive)
Headers
Series add new Double Word Key hash table |

Message

Vladimir Medvedkin March 16, 2020, 1:38 p.m. UTC
  Currently DPDK has a special implementation of a hash table for
4 byte keys which is called FBK hash. Unfortunately its main drawback
is that it only supports 2 byte values.
The new implementation called DWK (double word key) hash
supports 8 byte values, which is enough to store a pointer.

It would also be nice to get feedback on whether to leave the old FBK
and new DWK implementations, or whether to deprecate the old one?

Vladimir Medvedkin (3):
  hash: add dwk hash library
  test: add dwk hash autotests
  test: add dwk perf tests

 app/test/Makefile                    |   1 +
 app/test/meson.build                 |   1 +
 app/test/test_dwk_hash.c             | 229 +++++++++++++++++++++++++++++
 app/test/test_hash_perf.c            |  81 +++++++++++
 lib/Makefile                         |   2 +-
 lib/librte_hash/Makefile             |   4 +-
 lib/librte_hash/meson.build          |   5 +-
 lib/librte_hash/rte_dwk_hash.c       | 271 +++++++++++++++++++++++++++++++++++
 lib/librte_hash/rte_dwk_hash.h       | 178 +++++++++++++++++++++++
 lib/librte_hash/rte_hash_version.map |   5 +
 10 files changed, 773 insertions(+), 4 deletions(-)
 create mode 100644 app/test/test_dwk_hash.c
 create mode 100644 lib/librte_hash/rte_dwk_hash.c
 create mode 100644 lib/librte_hash/rte_dwk_hash.h
  

Comments

Morten Brørup March 16, 2020, 2:39 p.m. UTC | #1
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir Medvedkin
> Sent: Monday, March 16, 2020 2:38 PM
> 
> Currently DPDK has a special implementation of a hash table for
> 4 byte keys which is called FBK hash. Unfortunately its main drawback
> is that it only supports 2 byte values.
> The new implementation called DWK (double word key) hash
> supports 8 byte values, which is enough to store a pointer.
> 
> It would also be nice to get feedback on whether to leave the old FBK
> and new DWK implementations, or whether to deprecate the old one?

<rant on>

Who comes up with these names?!?

FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean the same. Could you use 32 somewhere in the name instead, like in int32_t, instead of using a growing list of creative synonyms for the same thing? Pretty please, with a cherry on top!

And if the value size is fixed too, perhaps the name should also indicate the value size.

<rant off>

It's a shame we don't have C++ class templates available in DPDK...

In other news, Mellanox has sent an RFC for an "indexed memory pool" library [1] to conserve memory by using uintXX_t instead of pointers, so perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to 16 bit values in FBK and 64 bit in DWK) would be nice combination with that library.

[1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html


Med venlig hilsen / kind regards
- Morten Brørup
  
Vladimir Medvedkin March 16, 2020, 6:27 p.m. UTC | #2
Hi Morten,


On 16/03/2020 14:39, Morten Brørup wrote:
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir Medvedkin
>> Sent: Monday, March 16, 2020 2:38 PM
>>
>> Currently DPDK has a special implementation of a hash table for
>> 4 byte keys which is called FBK hash. Unfortunately its main drawback
>> is that it only supports 2 byte values.
>> The new implementation called DWK (double word key) hash
>> supports 8 byte values, which is enough to store a pointer.
>>
>> It would also be nice to get feedback on whether to leave the old FBK
>> and new DWK implementations, or whether to deprecate the old one?
> <rant on>
>
> Who comes up with these names?!?
>
> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean the same. Could you use 32 somewhere in the name instead, like in int32_t, instead of using a growing list of creative synonyms for the same thing? Pretty please, with a cherry on top!


That's true, at first I named it as fbk2, but then it was decided to 
rename it "dwk", so that there was no confusion with the existing FBK 
library. Naming suggestions are welcome!

>
> And if the value size is fixed too, perhaps the name should also indicate the value size.
>
> <rant off>
>
> It's a shame we don't have C++ class templates available in DPDK...
>
> In other news, Mellanox has sent an RFC for an "indexed memory pool" library [1] to conserve memory by using uintXX_t instead of pointers, so perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to 16 bit values in FBK and 64 bit in DWK) would be nice combination with that library.
>
> [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html
>
>
> Med venlig hilsen / kind regards
> - Morten Brørup
>
  
Stephen Hemminger March 16, 2020, 7:32 p.m. UTC | #3
On Mon, 16 Mar 2020 18:27:40 +0000
"Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote:

> Hi Morten,
> 
> 
> On 16/03/2020 14:39, Morten Brørup wrote:
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir Medvedkin
> >> Sent: Monday, March 16, 2020 2:38 PM
> >>
> >> Currently DPDK has a special implementation of a hash table for
> >> 4 byte keys which is called FBK hash. Unfortunately its main drawback
> >> is that it only supports 2 byte values.
> >> The new implementation called DWK (double word key) hash
> >> supports 8 byte values, which is enough to store a pointer.
> >>
> >> It would also be nice to get feedback on whether to leave the old FBK
> >> and new DWK implementations, or whether to deprecate the old one?  
> > <rant on>
> >
> > Who comes up with these names?!?
> >
> > FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean the same. Could you use 32 somewhere in the name instead, like in int32_t, instead of using a growing list of creative synonyms for the same thing? Pretty please, with a cherry on top!  
> 
> 
> That's true, at first I named it as fbk2, but then it was decided to 
> rename it "dwk", so that there was no confusion with the existing FBK 
> library. Naming suggestions are welcome!
> 
> >
> > And if the value size is fixed too, perhaps the name should also indicate the value size.
> >
> > <rant off>
> >
> > It's a shame we don't have C++ class templates available in DPDK...
> >
> > In other news, Mellanox has sent an RFC for an "indexed memory pool" library [1] to conserve memory by using uintXX_t instead of pointers, so perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to 16 bit values in FBK and 64 bit in DWK) would be nice combination with that library.
> >
> > [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html
> >
> >
> > Med venlig hilsen / kind regards
> > - Morten Brørup
> >  

Why is this different (or better) than existing rte_hash.
Having more flavors is not necessarily a good thing (except in Gelato)
  
Morten Brørup March 16, 2020, 7:33 p.m. UTC | #4
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Medvedkin, Vladimir
> Sent: Monday, March 16, 2020 7:28 PM
> 
> Hi Morten,
> 
> 
> On 16/03/2020 14:39, Morten Brørup wrote:
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir Medvedkin
> >> Sent: Monday, March 16, 2020 2:38 PM
> >>
> >> Currently DPDK has a special implementation of a hash table for
> >> 4 byte keys which is called FBK hash. Unfortunately its main drawback
> >> is that it only supports 2 byte values.
> >> The new implementation called DWK (double word key) hash
> >> supports 8 byte values, which is enough to store a pointer.
> >>
> >> It would also be nice to get feedback on whether to leave the old FBK
> >> and new DWK implementations, or whether to deprecate the old one?
> > <rant on>
> >
> > Who comes up with these names?!?
> >
> > FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean the
> same. Could you use 32 somewhere in the name instead, like in int32_t,
> instead of using a growing list of creative synonyms for the same thing?
> Pretty please, with a cherry on top!
> 
> 
> That's true, at first I named it as fbk2, but then it was decided to
> rename it "dwk", so that there was no confusion with the existing FBK
> library. Naming suggestions are welcome!
> 

OK... let me suggest a prefix with both key and value sizes in the name:
Instead of rte_dwk_hash_xxx, use rte_hash_k32v64_xxx.

However, that suggests that it is a generic hash... and your hash algorithm certainly has other properties too; so perhaps a specific name, saying something about the underlying algorithm, makes more sense after all. And the documentation will show its properties anyway.

Whatever you choose, Double Word is certainly not a good name.... a word is 32 bits and a double word is 64 bits in many non-Intel CPU architectures, including some supported by DPDK.

> >
> > And if the value size is fixed too, perhaps the name should also indicate
> the value size.
> >
> > <rant off>
> >
> > It's a shame we don't have C++ class templates available in DPDK...
> >
> > In other news, Mellanox has sent an RFC for an "indexed memory pool"
> library [1] to conserve memory by using uintXX_t instead of pointers, so
> perhaps a variant of a 32 bit key hash library with 32 bit values (in
> addition to 16 bit values in FBK and 64 bit in DWK) would be nice
> combination with that library.
> >
> > [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html
> >
> >
> > Med venlig hilsen / kind regards
> > - Morten Brørup
> >
> --
> Regards,
> Vladimir
>
  
Wang, Yipeng1 March 17, 2020, 7:52 p.m. UTC | #5
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, March 16, 2020 12:33 PM
> To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
> Cc: Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org;
> Ananyev, Konstantin <konstantin.ananyev@intel.com>; Wang, Yipeng1
> <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; Suanming Mou
> <suanmingm@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>;
> Xueming(Steven) Li <xuemingl@mellanox.com>; Andrew Rybchenko
> <arybchenko@solarflare.com>; Asaf Penso <asafp@mellanox.com>; Ori Kam
> <orika@mellanox.com>
> Subject: Re: [dpdk-dev] [PATCH 0/3] add new Double Word Key hash table
> 
> On Mon, 16 Mar 2020 18:27:40 +0000
> "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote:
> 
> > Hi Morten,
> >
> >
> > On 16/03/2020 14:39, Morten Brørup wrote:
> > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir
> > >> Medvedkin
> > >> Sent: Monday, March 16, 2020 2:38 PM
> > >>
> > >> Currently DPDK has a special implementation of a hash table for
> > >> 4 byte keys which is called FBK hash. Unfortunately its main
> > >> drawback is that it only supports 2 byte values.
> > >> The new implementation called DWK (double word key) hash supports 8
> > >> byte values, which is enough to store a pointer.
> > >>
> > >> It would also be nice to get feedback on whether to leave the old
> > >> FBK and new DWK implementations, or whether to deprecate the old
> one?
> > > <rant on>
> > >
> > > Who comes up with these names?!?
> > >
> > > FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean
> the same. Could you use 32 somewhere in the name instead, like in int32_t,
> instead of using a growing list of creative synonyms for the same thing?
> Pretty please, with a cherry on top!
> >
> >
> > That's true, at first I named it as fbk2, but then it was decided to
> > rename it "dwk", so that there was no confusion with the existing FBK
> > library. Naming suggestions are welcome!
> >
> > >
> > > And if the value size is fixed too, perhaps the name should also indicate
> the value size.
> > >
> > > <rant off>
> > >
> > > It's a shame we don't have C++ class templates available in DPDK...
> > >
> > > In other news, Mellanox has sent an RFC for an "indexed memory pool"
> library [1] to conserve memory by using uintXX_t instead of pointers, so
> perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to
> 16 bit values in FBK and 64 bit in DWK) would be nice combination with that
> library.
> > >
> > > [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html
> > >
> > >
> > > Med venlig hilsen / kind regards
> > > - Morten Brørup
> > >
> 
> Why is this different (or better) than existing rte_hash.
> Having more flavors is not necessarily a good thing (except in Gelato)
 [Wang, Yipeng]
Hi, Vladimir,
As Stephen mentioned, I think it is good idea to explain the benefit of this new type of hash table more explicitly such as
Specific use cases, differences with current rte_hash, and performance numbers, etc.
  
Vladimir Medvedkin March 26, 2020, 5:28 p.m. UTC | #6
Hi Yipeng, Stephen, all,

On 17/03/2020 19:52, Wang, Yipeng1 wrote:
>> -----Original Message-----
>> From: Stephen Hemminger <stephen@networkplumber.org>
>> Sent: Monday, March 16, 2020 12:33 PM
>> To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
>> Cc: Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org;
>> Ananyev, Konstantin <konstantin.ananyev@intel.com>; Wang, Yipeng1
>> <yipeng1.wang@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>;
>> Richardson, Bruce <bruce.richardson@intel.com>; Suanming Mou
>> <suanmingm@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>;
>> Xueming(Steven) Li <xuemingl@mellanox.com>; Andrew Rybchenko
>> <arybchenko@solarflare.com>; Asaf Penso <asafp@mellanox.com>; Ori Kam
>> <orika@mellanox.com>
>> Subject: Re: [dpdk-dev] [PATCH 0/3] add new Double Word Key hash table
>>
>> On Mon, 16 Mar 2020 18:27:40 +0000
>> "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote:
>>
>>> Hi Morten,
>>>
>>>
>>> On 16/03/2020 14:39, Morten Brørup wrote:
>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir
>>>>> Medvedkin
>>>>> Sent: Monday, March 16, 2020 2:38 PM
>>>>>
>>>>> Currently DPDK has a special implementation of a hash table for
>>>>> 4 byte keys which is called FBK hash. Unfortunately its main
>>>>> drawback is that it only supports 2 byte values.
>>>>> The new implementation called DWK (double word key) hash supports 8
>>>>> byte values, which is enough to store a pointer.
>>>>>
>>>>> It would also be nice to get feedback on whether to leave the old
>>>>> FBK and new DWK implementations, or whether to deprecate the old
>> one?
>>>> <rant on>
>>>>
>>>> Who comes up with these names?!?
>>>>
>>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean
>> the same. Could you use 32 somewhere in the name instead, like in int32_t,
>> instead of using a growing list of creative synonyms for the same thing?
>> Pretty please, with a cherry on top!
>>>
>>> That's true, at first I named it as fbk2, but then it was decided to
>>> rename it "dwk", so that there was no confusion with the existing FBK
>>> library. Naming suggestions are welcome!
>>>
>>>> And if the value size is fixed too, perhaps the name should also indicate
>> the value size.
>>>> <rant off>
>>>>
>>>> It's a shame we don't have C++ class templates available in DPDK...
>>>>
>>>> In other news, Mellanox has sent an RFC for an "indexed memory pool"
>> library [1] to conserve memory by using uintXX_t instead of pointers, so
>> perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to
>> 16 bit values in FBK and 64 bit in DWK) would be nice combination with that
>> library.
>>>> [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html
>>>>
>>>>
>>>> Med venlig hilsen / kind regards
>>>> - Morten Brørup
>>>>
>> Why is this different (or better) than existing rte_hash.
>> Having more flavors is not necessarily a good thing (except in Gelato)
>   [Wang, Yipeng]
> Hi, Vladimir,
> As Stephen mentioned, I think it is good idea to explain the benefit of this new type of hash table more explicitly such as
> Specific use cases, differences with current rte_hash, and performance numbers, etc.

The main reason for this new hash library is performance. As I mentioned 
earlier, the current rte_fbk implementation is pretty fast but it has a 
number of drawbacks such as 2 byte values and limited collision 
resolving capabilities. On the other hand, rte_hash (cuckoo hash) 
doesn't have this drawbacks but at the cost of lower performance 
comparing to rte_fbk.

If I understand correctly, performance penalty are due to :

1. Load two buckets

2. First compare signatures

3. If signature comparison hits get a key index and find memory location 
with a key itself and get the key

4. Using indirect call to memcmp() to compare two uint32_t.

The new proposed 4 byte key hash table doesn't have rte_fbk drawbacks 
while offers the same performance as rte_fbk.

Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4 byte 
key size. Replacing it with a new implementation gives about 30% in 
performance.

The main disadvantage comparing to rte_hash is some performance 
degradation with high average table utilization due to chain resolving 
for 5th and subsequent collision.
  
Thomas Monjalon March 31, 2020, 7:55 p.m. UTC | #7
26/03/2020 18:28, Medvedkin, Vladimir:
> Hi Yipeng, Stephen, all,
> 
> On 17/03/2020 19:52, Wang, Yipeng1 wrote:
> > From: Stephen Hemminger <stephen@networkplumber.org>
> >> On Mon, 16 Mar 2020 18:27:40 +0000
> >> "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote:
> >>
> >>> Hi Morten,
> >>>
> >>>
> >>> On 16/03/2020 14:39, Morten Brørup wrote:
> >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir
> >>>>> Medvedkin
> >>>>> Sent: Monday, March 16, 2020 2:38 PM
> >>>>>
> >>>>> Currently DPDK has a special implementation of a hash table for
> >>>>> 4 byte keys which is called FBK hash. Unfortunately its main
> >>>>> drawback is that it only supports 2 byte values.
> >>>>> The new implementation called DWK (double word key) hash supports 8
> >>>>> byte values, which is enough to store a pointer.
> >>>>>
> >>>>> It would also be nice to get feedback on whether to leave the old
> >>>>> FBK and new DWK implementations, or whether to deprecate the old
> >> one?
> >>>> <rant on>
> >>>>
> >>>> Who comes up with these names?!?
> >>>>
> >>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean
> >> the same. Could you use 32 somewhere in the name instead, like in int32_t,
> >> instead of using a growing list of creative synonyms for the same thing?
> >> Pretty please, with a cherry on top!
> >>>
> >>> That's true, at first I named it as fbk2, but then it was decided to
> >>> rename it "dwk", so that there was no confusion with the existing FBK
> >>> library. Naming suggestions are welcome!
> >>>
> >>>> And if the value size is fixed too, perhaps the name should also indicate
> >> the value size.
> >>>> <rant off>
> >>>>
> >>>> It's a shame we don't have C++ class templates available in DPDK...
> >>>>
> >>>> In other news, Mellanox has sent an RFC for an "indexed memory pool"
> >> library [1] to conserve memory by using uintXX_t instead of pointers, so
> >> perhaps a variant of a 32 bit key hash library with 32 bit values (in addition to
> >> 16 bit values in FBK and 64 bit in DWK) would be nice combination with that
> >> library.
> >>>> [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html

Yes some work is in progress to propose a new memory allocator
for small objects of fixed size with small memory overhead.


> >> Why is this different (or better) than existing rte_hash.
> >> Having more flavors is not necessarily a good thing (except in Gelato)
> >   [Wang, Yipeng]
> > Hi, Vladimir,
> > As Stephen mentioned, I think it is good idea to explain the benefit of this new type of hash table more explicitly such as
> > Specific use cases, differences with current rte_hash, and performance numbers, etc.
> 
> The main reason for this new hash library is performance. As I mentioned 
> earlier, the current rte_fbk implementation is pretty fast but it has a 
> number of drawbacks such as 2 byte values and limited collision 
> resolving capabilities. On the other hand, rte_hash (cuckoo hash) 
> doesn't have this drawbacks but at the cost of lower performance 
> comparing to rte_fbk.
> 
> If I understand correctly, performance penalty are due to :
> 
> 1. Load two buckets
> 
> 2. First compare signatures
> 
> 3. If signature comparison hits get a key index and find memory location 
> with a key itself and get the key
> 
> 4. Using indirect call to memcmp() to compare two uint32_t.
> 
> The new proposed 4 byte key hash table doesn't have rte_fbk drawbacks 
> while offers the same performance as rte_fbk.
> 
> Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4 byte 
> key size. Replacing it with a new implementation gives about 30% in 
> performance.
> 
> The main disadvantage comparing to rte_hash is some performance 
> degradation with high average table utilization due to chain resolving 
> for 5th and subsequent collision.

Thanks for explaining.
Please, such information should added in the documentation:
	doc/guides/prog_guide/hash_lib.rst
  
Honnappa Nagarahalli March 31, 2020, 9:17 p.m. UTC | #8
<snip>

> 
> 26/03/2020 18:28, Medvedkin, Vladimir:
> > Hi Yipeng, Stephen, all,
> >
> > On 17/03/2020 19:52, Wang, Yipeng1 wrote:
> > > From: Stephen Hemminger <stephen@networkplumber.org>
> > >> On Mon, 16 Mar 2020 18:27:40 +0000
> > >> "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote:
> > >>
> > >>> Hi Morten,
> > >>>
> > >>>
> > >>> On 16/03/2020 14:39, Morten Brørup wrote:
> > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir
> > >>>>> Medvedkin
> > >>>>> Sent: Monday, March 16, 2020 2:38 PM
> > >>>>>
> > >>>>> Currently DPDK has a special implementation of a hash table for
> > >>>>> 4 byte keys which is called FBK hash. Unfortunately its main
> > >>>>> drawback is that it only supports 2 byte values.
> > >>>>> The new implementation called DWK (double word key) hash
> > >>>>> supports 8 byte values, which is enough to store a pointer.
> > >>>>>
> > >>>>> It would also be nice to get feedback on whether to leave the
> > >>>>> old FBK and new DWK implementations, or whether to deprecate the
> > >>>>> old
> > >> one?
> > >>>> <rant on>
> > >>>>
> > >>>> Who comes up with these names?!?
> > >>>>
> > >>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean
> > >> the same. Could you use 32 somewhere in the name instead, like in
> > >> int32_t, instead of using a growing list of creative synonyms for the same
> thing?
> > >> Pretty please, with a cherry on top!
> > >>>
> > >>> That's true, at first I named it as fbk2, but then it was decided
> > >>> to rename it "dwk", so that there was no confusion with the
> > >>> existing FBK library. Naming suggestions are welcome!
> > >>>
> > >>>> And if the value size is fixed too, perhaps the name should also
> > >>>> indicate
> > >> the value size.
> > >>>> <rant off>
> > >>>>
> > >>>> It's a shame we don't have C++ class templates available in DPDK...
> > >>>>
> > >>>> In other news, Mellanox has sent an RFC for an "indexed memory
> pool"
> > >> library [1] to conserve memory by using uintXX_t instead of
> > >> pointers, so perhaps a variant of a 32 bit key hash library with 32
> > >> bit values (in addition to
> > >> 16 bit values in FBK and 64 bit in DWK) would be nice combination
> > >> with that library.
> > >>>> [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html
> 
> Yes some work is in progress to propose a new memory allocator for small
> objects of fixed size with small memory overhead.
> 
> 
> > >> Why is this different (or better) than existing rte_hash.
> > >> Having more flavors is not necessarily a good thing (except in
> > >> Gelato)
> > >   [Wang, Yipeng]
> > > Hi, Vladimir,
> > > As Stephen mentioned, I think it is good idea to explain the benefit
> > > of this new type of hash table more explicitly such as Specific use cases,
> differences with current rte_hash, and performance numbers, etc.
> >
> > The main reason for this new hash library is performance. As I
> > mentioned earlier, the current rte_fbk implementation is pretty fast
> > but it has a number of drawbacks such as 2 byte values and limited
> > collision resolving capabilities. On the other hand, rte_hash (cuckoo
> > hash) doesn't have this drawbacks but at the cost of lower performance
> > comparing to rte_fbk.
> >
> > If I understand correctly, performance penalty are due to :
> >
> > 1. Load two buckets
> >
> > 2. First compare signatures
> >
> > 3. If signature comparison hits get a key index and find memory
> > location with a key itself and get the key
> >
> > 4. Using indirect call to memcmp() to compare two uint32_t.
> >
> > The new proposed 4 byte key hash table doesn't have rte_fbk drawbacks
> > while offers the same performance as rte_fbk.
> >
> > Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4
> > byte key size. Replacing it with a new implementation gives about 30%
> > in performance.
> >
> > The main disadvantage comparing to rte_hash is some performance
> > degradation with high average table utilization due to chain resolving
> > for 5th and subsequent collision.
rte_hash is linearly scalable across multiple cores for lookup due to lock-free algorithm. How is the scalability for the new algorithm?

> 
> Thanks for explaining.
> Please, such information should added in the documentation:
> 	doc/guides/prog_guide/hash_lib.rst
> 
>
  
Vladimir Medvedkin April 1, 2020, 6:28 p.m. UTC | #9
Hi Thomas,

-----Original Message-----
From: Thomas Monjalon <thomas@monjalon.net> 
Sent: Tuesday, March 31, 2020 8:56 PM
To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; dev@dpdk.org; Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Suanming Mou <suanmingm@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>; Xueming(Steven) Li <xuemingl@mellanox.com>; Andrew Rybchenko <arybchenko@solarflare.com>; Asaf Penso <asafp@mellanox.com>; Ori Kam <orika@mellanox.com>
Subject: Re: [dpdk-dev] [PATCH 0/3] add new Double Word Key hash table

26/03/2020 18:28, Medvedkin, Vladimir:
> Hi Yipeng, Stephen, all,
> 
> On 17/03/2020 19:52, Wang, Yipeng1 wrote:
> > From: Stephen Hemminger <stephen@networkplumber.org>
> >> On Mon, 16 Mar 2020 18:27:40 +0000
> >> "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com> wrote:
> >>
> >>> Hi Morten,
> >>>
> >>>
> >>> On 16/03/2020 14:39, Morten Brørup wrote:
> >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir 
> >>>>> Medvedkin
> >>>>> Sent: Monday, March 16, 2020 2:38 PM
> >>>>>
> >>>>> Currently DPDK has a special implementation of a hash table for
> >>>>> 4 byte keys which is called FBK hash. Unfortunately its main 
> >>>>> drawback is that it only supports 2 byte values.
> >>>>> The new implementation called DWK (double word key) hash 
> >>>>> supports 8 byte values, which is enough to store a pointer.
> >>>>>
> >>>>> It would also be nice to get feedback on whether to leave the 
> >>>>> old FBK and new DWK implementations, or whether to deprecate the 
> >>>>> old
> >> one?
> >>>> <rant on>
> >>>>
> >>>> Who comes up with these names?!?
> >>>>
> >>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to mean
> >> the same. Could you use 32 somewhere in the name instead, like in 
> >> int32_t, instead of using a growing list of creative synonyms for the same thing?
> >> Pretty please, with a cherry on top!
> >>>
> >>> That's true, at first I named it as fbk2, but then it was decided 
> >>> to rename it "dwk", so that there was no confusion with the 
> >>> existing FBK library. Naming suggestions are welcome!
> >>>
> >>>> And if the value size is fixed too, perhaps the name should also 
> >>>> indicate
> >> the value size.
> >>>> <rant off>
> >>>>
> >>>> It's a shame we don't have C++ class templates available in DPDK...
> >>>>
> >>>> In other news, Mellanox has sent an RFC for an "indexed memory pool"
> >> library [1] to conserve memory by using uintXX_t instead of 
> >> pointers, so perhaps a variant of a 32 bit key hash library with 32 
> >> bit values (in addition to
> >> 16 bit values in FBK and 64 bit in DWK) would be nice combination 
> >> with that library.
> >>>> [1]: http://mails.dpdk.org/archives/dev/2019-October/147513.html

Yes some work is in progress to propose a new memory allocator for small objects of fixed size with small memory overhead.


> >> Why is this different (or better) than existing rte_hash.
> >> Having more flavors is not necessarily a good thing (except in 
> >> Gelato)
> >   [Wang, Yipeng]
> > Hi, Vladimir,
> > As Stephen mentioned, I think it is good idea to explain the benefit 
> > of this new type of hash table more explicitly such as Specific use cases, differences with current rte_hash, and performance numbers, etc.
> 
> The main reason for this new hash library is performance. As I 
> mentioned earlier, the current rte_fbk implementation is pretty fast 
> but it has a number of drawbacks such as 2 byte values and limited 
> collision resolving capabilities. On the other hand, rte_hash (cuckoo 
> hash) doesn't have this drawbacks but at the cost of lower performance 
> comparing to rte_fbk.
> 
> If I understand correctly, performance penalty are due to :
> 
> 1. Load two buckets
> 
> 2. First compare signatures
> 
> 3. If signature comparison hits get a key index and find memory 
> location with a key itself and get the key
> 
> 4. Using indirect call to memcmp() to compare two uint32_t.
> 
> The new proposed 4 byte key hash table doesn't have rte_fbk drawbacks 
> while offers the same performance as rte_fbk.
> 
> Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4 
> byte key size. Replacing it with a new implementation gives about 30% 
> in performance.
> 
> The main disadvantage comparing to rte_hash is some performance 
> degradation with high average table utilization due to chain resolving 
> for 5th and subsequent collision.

Thanks for explaining.
Please, such information should added in the documentation:
	doc/guides/prog_guide/hash_lib.rst


I'm going to submit v2 this week, will add documentation update.
  
Vladimir Medvedkin April 1, 2020, 6:37 p.m. UTC | #10
Hi Honnappa,

-----Original Message-----
From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> 
Sent: Tuesday, March 31, 2020 10:17 PM
To: thomas@monjalon.net; Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
Cc: Wang, Yipeng1 <yipeng1.wang@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; dev@dpdk.org; Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Gobriel, Sameh <sameh.gobriel@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Suanming Mou <suanmingm@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>; Xueming(Steven) Li <xuemingl@mellanox.com>; Andrew Rybchenko <arybchenko@solarflare.com>; Asaf Penso <asafp@mellanox.com>; Ori Kam <orika@mellanox.com>; nd <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
Subject: RE: [dpdk-dev] [PATCH 0/3] add new Double Word Key hash table

<snip>

> 
> 26/03/2020 18:28, Medvedkin, Vladimir:
> > Hi Yipeng, Stephen, all,
> >
> > On 17/03/2020 19:52, Wang, Yipeng1 wrote:
> > > From: Stephen Hemminger <stephen@networkplumber.org>
> > >> On Mon, 16 Mar 2020 18:27:40 +0000 "Medvedkin, Vladimir" 
> > >> <vladimir.medvedkin@intel.com> wrote:
> > >>
> > >>> Hi Morten,
> > >>>
> > >>>
> > >>> On 16/03/2020 14:39, Morten Brørup wrote:
> > >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladimir 
> > >>>>> Medvedkin
> > >>>>> Sent: Monday, March 16, 2020 2:38 PM
> > >>>>>
> > >>>>> Currently DPDK has a special implementation of a hash table 
> > >>>>> for
> > >>>>> 4 byte keys which is called FBK hash. Unfortunately its main 
> > >>>>> drawback is that it only supports 2 byte values.
> > >>>>> The new implementation called DWK (double word key) hash 
> > >>>>> supports 8 byte values, which is enough to store a pointer.
> > >>>>>
> > >>>>> It would also be nice to get feedback on whether to leave the 
> > >>>>> old FBK and new DWK implementations, or whether to deprecate 
> > >>>>> the old
> > >> one?
> > >>>> <rant on>
> > >>>>
> > >>>> Who comes up with these names?!?
> > >>>>
> > >>>> FBK (Four Byte Key) and DWK (Double Word Key) is supposed to 
> > >>>> mean
> > >> the same. Could you use 32 somewhere in the name instead, like in 
> > >> int32_t, instead of using a growing list of creative synonyms for 
> > >> the same
> thing?
> > >> Pretty please, with a cherry on top!
> > >>>
> > >>> That's true, at first I named it as fbk2, but then it was 
> > >>> decided to rename it "dwk", so that there was no confusion with 
> > >>> the existing FBK library. Naming suggestions are welcome!
> > >>>
> > >>>> And if the value size is fixed too, perhaps the name should 
> > >>>> also indicate
> > >> the value size.
> > >>>> <rant off>
> > >>>>
> > >>>> It's a shame we don't have C++ class templates available in DPDK...
> > >>>>
> > >>>> In other news, Mellanox has sent an RFC for an "indexed memory
> pool"
> > >> library [1] to conserve memory by using uintXX_t instead of 
> > >> pointers, so perhaps a variant of a 32 bit key hash library with 
> > >> 32 bit values (in addition to
> > >> 16 bit values in FBK and 64 bit in DWK) would be nice combination 
> > >> with that library.
> > >>>> [1]: 
> > >>>> http://mails.dpdk.org/archives/dev/2019-October/147513.html
> 
> Yes some work is in progress to propose a new memory allocator for 
> small objects of fixed size with small memory overhead.
> 
> 
> > >> Why is this different (or better) than existing rte_hash.
> > >> Having more flavors is not necessarily a good thing (except in
> > >> Gelato)
> > >   [Wang, Yipeng]
> > > Hi, Vladimir,
> > > As Stephen mentioned, I think it is good idea to explain the 
> > > benefit of this new type of hash table more explicitly such as 
> > > Specific use cases,
> differences with current rte_hash, and performance numbers, etc.
> >
> > The main reason for this new hash library is performance. As I 
> > mentioned earlier, the current rte_fbk implementation is pretty fast 
> > but it has a number of drawbacks such as 2 byte values and limited 
> > collision resolving capabilities. On the other hand, rte_hash 
> > (cuckoo
> > hash) doesn't have this drawbacks but at the cost of lower 
> > performance comparing to rte_fbk.
> >
> > If I understand correctly, performance penalty are due to :
> >
> > 1. Load two buckets
> >
> > 2. First compare signatures
> >
> > 3. If signature comparison hits get a key index and find memory 
> > location with a key itself and get the key
> >
> > 4. Using indirect call to memcmp() to compare two uint32_t.
> >
> > The new proposed 4 byte key hash table doesn't have rte_fbk 
> > drawbacks while offers the same performance as rte_fbk.
> >
> > Regarding use cases, in rte_ipsec_sad we are using rte_hash with 4 
> > byte key size. Replacing it with a new implementation gives about 
> > 30% in performance.
> >
> > The main disadvantage comparing to rte_hash is some performance 
> > degradation with high average table utilization due to chain 
> > resolving for 5th and subsequent collision.
rte_hash is linearly scalable across multiple cores for lookup due to lock-free algorithm. How is the scalability for the new algorithm?

This library is scalable as well. It uses almost the same lock-free algorithm. The only difference with cuckoo is that cuckoo in lock-free implementation uses single global "change_counter" for all the table, and the proposed implementation uses fine grained approach with "change_counter" per bucket. So it should be more scalable with frequent concurrent updates.

> 
> Thanks for explaining.
> Please, such information should added in the documentation:
> 	doc/guides/prog_guide/hash_lib.rst
> 
>