mbox series

[v2,0/3] Predictable RSS feature

Message ID 1617738643-258635-1-git-send-email-vladimir.medvedkin@intel.com (mailing list archive)
Headers
Series Predictable RSS feature |

Message

Vladimir Medvedkin April 6, 2021, 7:50 p.m. UTC
  This patch series introduces predictable RSS feature.
It is based on the idea of searching for partial hash collisions
within Toeplitz hash.

The Toeplitz hash function is a homomorphism between (G, ^) and (H, ^),
where (G, ^) - is a group of tuples and (H, ^) is a group of hashes
with respect to XOR operation. So tuples and hashes could be treated as
n-dimension and 32-dimension vector spaces over GF(2).
So, f(x ^ y) == f(x) ^ f(y)
where f - is the toeplitz hash function and x, y are tuples.

The ability to predict partial collisions allows user to compute
input hash value with desired LSB values.
Usually number of LSB's are defined by the size of RSS Redirection Table.

There could be number of use cases, for example:
1) NAT. Using this library it is possible to select a new port number
on a translation in the way that rss hash for original tuple will have
the same LSB's as rss hash for reverse tuple.
2) IPSec/MPLS/Vxlan. It is possible to choose tunnel id to be pinned to
a desired queue.
3) TCP stack. It is possible to choose a source port number for outgoing
connections in the way that received replies will be assigned to
desired queue.
4) RSS hash key generation. Hash key initialization with random values
does not guarantee an uniform distribution amongst queues. This library
uses mathematically proved algorithm to complete the rss hash key to
provide the best distribution.

v2:
- added extra API rte_thash_adjust_tuple()
- added extra tests for rte_thash_adjust_tuple()
- added extra fields to rte_thash_subtuple_helper struct
- fixed typos 

Vladimir Medvedkin (3):
  hash: add predictable RSS API
  hash: add predictable RSS implementation
  test/hash: add additional thash tests

 app/test/test_thash.c       | 468 +++++++++++++++++++++++++++++++-
 lib/librte_hash/meson.build |   3 +-
 lib/librte_hash/rte_thash.c | 637 ++++++++++++++++++++++++++++++++++++++++++++
 lib/librte_hash/rte_thash.h | 180 +++++++++++++
 lib/librte_hash/version.map |   8 +
 5 files changed, 1289 insertions(+), 7 deletions(-)
 create mode 100644 lib/librte_hash/rte_thash.c
  

Comments

Stephen Hemminger April 8, 2021, 3:56 p.m. UTC | #1
On Tue,  6 Apr 2021 20:50:40 +0100
Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:

> This patch series introduces predictable RSS feature.
> It is based on the idea of searching for partial hash collisions
> within Toeplitz hash.
> 
> The Toeplitz hash function is a homomorphism between (G, ^) and (H, ^),
> where (G, ^) - is a group of tuples and (H, ^) is a group of hashes
> with respect to XOR operation. So tuples and hashes could be treated as
> n-dimension and 32-dimension vector spaces over GF(2).
> So, f(x ^ y) == f(x) ^ f(y)
> where f - is the toeplitz hash function and x, y are tuples.
> 
> The ability to predict partial collisions allows user to compute
> input hash value with desired LSB values.
> Usually number of LSB's are defined by the size of RSS Redirection Table.
> 
> There could be number of use cases, for example:
> 1) NAT. Using this library it is possible to select a new port number
> on a translation in the way that rss hash for original tuple will have
> the same LSB's as rss hash for reverse tuple.
> 2) IPSec/MPLS/Vxlan. It is possible to choose tunnel id to be pinned to
> a desired queue.
> 3) TCP stack. It is possible to choose a source port number for outgoing
> connections in the way that received replies will be assigned to
> desired queue.
> 4) RSS hash key generation. Hash key initialization with random values
> does not guarantee an uniform distribution amongst queues. This library
> uses mathematically proved algorithm to complete the rss hash key to
> provide the best distribution.
> 
> v2:
> - added extra API rte_thash_adjust_tuple()
> - added extra tests for rte_thash_adjust_tuple()
> - added extra fields to rte_thash_subtuple_helper struct
> - fixed typos 
> 
> Vladimir Medvedkin (3):
>   hash: add predictable RSS API
>   hash: add predictable RSS implementation
>   test/hash: add additional thash tests
> 
>  app/test/test_thash.c       | 468 +++++++++++++++++++++++++++++++-
>  lib/librte_hash/meson.build |   3 +-
>  lib/librte_hash/rte_thash.c | 637 ++++++++++++++++++++++++++++++++++++++++++++
>  lib/librte_hash/rte_thash.h | 180 +++++++++++++
>  lib/librte_hash/version.map |   8 +
>  5 files changed, 1289 insertions(+), 7 deletions(-)
>  create mode 100644 lib/librte_hash/rte_thash.c
> 

It would be good to show how this could be used in an application.
Maybe yet another variant/flag to l3fwd example.
  
Wang, Yipeng1 April 10, 2021, 12:32 a.m. UTC | #2
> -----Original Message-----
> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
> Sent: Tuesday, April 6, 2021 12:51 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Chilikin, Andrey
> <andrey.chilikin@intel.com>; Kinsella, Ray <ray.kinsella@intel.com>; Wang,
> Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh
> <sameh.gobriel@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>
> Subject: [PATCH v2 0/3] Predictable RSS feature
> 
> This patch series introduces predictable RSS feature.
> It is based on the idea of searching for partial hash collisions within Toeplitz
> hash.
> 
> The Toeplitz hash function is a homomorphism between (G, ^) and (H, ^),
> where (G, ^) - is a group of tuples and (H, ^) is a group of hashes with respect
> to XOR operation. So tuples and hashes could be treated as n-dimension and
> 32-dimension vector spaces over GF(2).
> So, f(x ^ y) == f(x) ^ f(y)
> where f - is the toeplitz hash function and x, y are tuples.
> 
> The ability to predict partial collisions allows user to compute input hash value
> with desired LSB values.
> Usually number of LSB's are defined by the size of RSS Redirection Table.
> 
> There could be number of use cases, for example:
> 1) NAT. Using this library it is possible to select a new port number on a
> translation in the way that rss hash for original tuple will have the same LSB's
> as rss hash for reverse tuple.
> 2) IPSec/MPLS/Vxlan. It is possible to choose tunnel id to be pinned to a
> desired queue.
> 3) TCP stack. It is possible to choose a source port number for outgoing
> connections in the way that received replies will be assigned to desired
> queue.
> 4) RSS hash key generation. Hash key initialization with random values does
> not guarantee an uniform distribution amongst queues. This library uses
> mathematically proved algorithm to complete the rss hash key to provide the
> best distribution.
> 
> v2:
> - added extra API rte_thash_adjust_tuple()
> - added extra tests for rte_thash_adjust_tuple()
> - added extra fields to rte_thash_subtuple_helper struct
> - fixed typos
> 
> Vladimir Medvedkin (3):
>   hash: add predictable RSS API
>   hash: add predictable RSS implementation
>   test/hash: add additional thash tests
> 
>  app/test/test_thash.c       | 468 +++++++++++++++++++++++++++++++-
>  lib/librte_hash/meson.build |   3 +-
>  lib/librte_hash/rte_thash.c | 637
> ++++++++++++++++++++++++++++++++++++++++++++
>  lib/librte_hash/rte_thash.h | 180 +++++++++++++
>  lib/librte_hash/version.map |   8 +
>  5 files changed, 1289 insertions(+), 7 deletions(-)  create mode 100644
> lib/librte_hash/rte_thash.c
> 
> --
> 2.7.4

[Wang, Yipeng] 
Hi, Vladimir, thanks for the patch!
I haven't fully understood every bit of the algorithm yet, 
but I did see issues that this patch could potentially solve.
My understanding is that there are some restrictions for the current implementation,
for example, it only supports port(16-bit) manipulation, but not multiple fields or IP. 
Still, I think it should be good for the use cases you listed. I would love to hear
more feedbacks from people who are more familiar with doing NAT in production systems.

For me, besides the comments I sent earlier,
good documentation and references are needed with clear usage examples, as others pointed out already.

Also, the current API design seems a bit cumbersome.
To use the library, one needs:
Init_ctx
Add_helper.
Get_helper
Get_complement
Then in a loop:
Adjust_tuples
Then XOR with the current tuple

I wonder if an alternative all-in-one API could be designed for simpler use cases.

Thanks!
  
Vladimir Medvedkin April 11, 2021, 6:51 p.m. UTC | #3
Hi Stephen,

Thanks for the feedback,

On 08/04/2021 18:56, Stephen Hemminger wrote:
> On Tue,  6 Apr 2021 20:50:40 +0100
> Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote:
> 
>> This patch series introduces predictable RSS feature.
>> It is based on the idea of searching for partial hash collisions
>> within Toeplitz hash.
>>
>> The Toeplitz hash function is a homomorphism between (G, ^) and (H, ^),
>> where (G, ^) - is a group of tuples and (H, ^) is a group of hashes
>> with respect to XOR operation. So tuples and hashes could be treated as
>> n-dimension and 32-dimension vector spaces over GF(2).
>> So, f(x ^ y) == f(x) ^ f(y)
>> where f - is the toeplitz hash function and x, y are tuples.
>>
>> The ability to predict partial collisions allows user to compute
>> input hash value with desired LSB values.
>> Usually number of LSB's are defined by the size of RSS Redirection Table.
>>
>> There could be number of use cases, for example:
>> 1) NAT. Using this library it is possible to select a new port number
>> on a translation in the way that rss hash for original tuple will have
>> the same LSB's as rss hash for reverse tuple.
>> 2) IPSec/MPLS/Vxlan. It is possible to choose tunnel id to be pinned to
>> a desired queue.
>> 3) TCP stack. It is possible to choose a source port number for outgoing
>> connections in the way that received replies will be assigned to
>> desired queue.
>> 4) RSS hash key generation. Hash key initialization with random values
>> does not guarantee an uniform distribution amongst queues. This library
>> uses mathematically proved algorithm to complete the rss hash key to
>> provide the best distribution.
>>
>> v2:
>> - added extra API rte_thash_adjust_tuple()
>> - added extra tests for rte_thash_adjust_tuple()
>> - added extra fields to rte_thash_subtuple_helper struct
>> - fixed typos
>>
>> Vladimir Medvedkin (3):
>>    hash: add predictable RSS API
>>    hash: add predictable RSS implementation
>>    test/hash: add additional thash tests
>>
>>   app/test/test_thash.c       | 468 +++++++++++++++++++++++++++++++-
>>   lib/librte_hash/meson.build |   3 +-
>>   lib/librte_hash/rte_thash.c | 637 ++++++++++++++++++++++++++++++++++++++++++++
>>   lib/librte_hash/rte_thash.h | 180 +++++++++++++
>>   lib/librte_hash/version.map |   8 +
>>   5 files changed, 1289 insertions(+), 7 deletions(-)
>>   create mode 100644 lib/librte_hash/rte_thash.c
>>
> 
> It would be good to show how this could be used in an application.
> Maybe yet another variant/flag to l3fwd example.

Agree, I think it would be great to have a simple NAT implementation in 
examples. We've discussed this and will probably add in next releases.

>
  
Vladimir Medvedkin April 11, 2021, 6:51 p.m. UTC | #4
Hi Yipeng,

Thanks for the review,

On 10/04/2021 03:32, Wang, Yipeng1 wrote:
>> -----Original Message-----
>> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
>> Sent: Tuesday, April 6, 2021 12:51 PM
>> To: dev@dpdk.org
>> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Chilikin, Andrey
>> <andrey.chilikin@intel.com>; Kinsella, Ray <ray.kinsella@intel.com>; Wang,
>> Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh
>> <sameh.gobriel@intel.com>; Richardson, Bruce
>> <bruce.richardson@intel.com>
>> Subject: [PATCH v2 0/3] Predictable RSS feature
>>
>> This patch series introduces predictable RSS feature.
>> It is based on the idea of searching for partial hash collisions within Toeplitz
>> hash.
>>
>> The Toeplitz hash function is a homomorphism between (G, ^) and (H, ^),
>> where (G, ^) - is a group of tuples and (H, ^) is a group of hashes with respect
>> to XOR operation. So tuples and hashes could be treated as n-dimension and
>> 32-dimension vector spaces over GF(2).
>> So, f(x ^ y) == f(x) ^ f(y)
>> where f - is the toeplitz hash function and x, y are tuples.
>>
>> The ability to predict partial collisions allows user to compute input hash value
>> with desired LSB values.
>> Usually number of LSB's are defined by the size of RSS Redirection Table.
>>
>> There could be number of use cases, for example:
>> 1) NAT. Using this library it is possible to select a new port number on a
>> translation in the way that rss hash for original tuple will have the same LSB's
>> as rss hash for reverse tuple.
>> 2) IPSec/MPLS/Vxlan. It is possible to choose tunnel id to be pinned to a
>> desired queue.
>> 3) TCP stack. It is possible to choose a source port number for outgoing
>> connections in the way that received replies will be assigned to desired
>> queue.
>> 4) RSS hash key generation. Hash key initialization with random values does
>> not guarantee an uniform distribution amongst queues. This library uses
>> mathematically proved algorithm to complete the rss hash key to provide the
>> best distribution.
>>
>> v2:
>> - added extra API rte_thash_adjust_tuple()
>> - added extra tests for rte_thash_adjust_tuple()
>> - added extra fields to rte_thash_subtuple_helper struct
>> - fixed typos
>>
>> Vladimir Medvedkin (3):
>>    hash: add predictable RSS API
>>    hash: add predictable RSS implementation
>>    test/hash: add additional thash tests
>>
>>   app/test/test_thash.c       | 468 +++++++++++++++++++++++++++++++-
>>   lib/librte_hash/meson.build |   3 +-
>>   lib/librte_hash/rte_thash.c | 637
>> ++++++++++++++++++++++++++++++++++++++++++++
>>   lib/librte_hash/rte_thash.h | 180 +++++++++++++
>>   lib/librte_hash/version.map |   8 +
>>   5 files changed, 1289 insertions(+), 7 deletions(-)  create mode 100644
>> lib/librte_hash/rte_thash.c
>>
>> --
>> 2.7.4
> 
> [Wang, Yipeng]
> Hi, Vladimir, thanks for the patch!
> I haven't fully understood every bit of the algorithm yet,
> but I did see issues that this patch could potentially solve.
> My understanding is that there are some restrictions for the current implementation,
> for example, it only supports port(16-bit) manipulation, but not multiple fields or IP.

It supports any bit-range ( with size >= reta_sz, configured for the 
CTX) manipulations within a tuple as well as multiple number of them.

> Still, I think it should be good for the use cases you listed. I would love to hear
> more feedbacks from people who are more familiar with doing NAT in production systems.
> 
> For me, besides the comments I sent earlier,
> good documentation and references are needed with clear usage examples, as others pointed out already.
> 
> Also, the current API design seems a bit cumbersome.
> To use the library, one needs:
> Init_ctx
> Add_helper.
> Get_helper
> Get_complement
> Then in a loop:
> Adjust_tuples
> Then XOR with the current tuple
> 
> I wonder if an alternative all-in-one API could be designed for simpler use cases.

Agree, v3 will have a new rte_thash_adjust_tuple() implementation that 
will make everything much easier.

> 
> Thanks!
> >
>
  
Thomas Monjalon Oct. 22, 2021, 8:37 p.m. UTC | #5
11/04/2021 20:51, Medvedkin, Vladimir:
> On 08/04/2021 18:56, Stephen Hemminger wrote:
> >>   app/test/test_thash.c       | 468 +++++++++++++++++++++++++++++++-
> >>   lib/librte_hash/meson.build |   3 +-
> >>   lib/librte_hash/rte_thash.c | 637 ++++++++++++++++++++++++++++++++++++++++++++
> >>   lib/librte_hash/rte_thash.h | 180 +++++++++++++
> >>   lib/librte_hash/version.map |   8 +
> >>   5 files changed, 1289 insertions(+), 7 deletions(-)
> >>   create mode 100644 lib/librte_hash/rte_thash.c
> > 
> > It would be good to show how this could be used in an application.
> > Maybe yet another variant/flag to l3fwd example.
> 
> Agree, I think it would be great to have a simple NAT implementation in 
> examples. We've discussed this and will probably add in next releases.

Vladimir, any update please?
Could you provide it in 21.11?
  
Vladimir Medvedkin Oct. 24, 2021, 2:42 p.m. UTC | #6
Hi Thomas,

пт, 22 окт. 2021 г. в 22:56, Thomas Monjalon <thomas@monjalon.net>:

> 11/04/2021 20:51, Medvedkin, Vladimir:
> > On 08/04/2021 18:56, Stephen Hemminger wrote:
> > >>   app/test/test_thash.c       | 468 +++++++++++++++++++++++++++++++-
> > >>   lib/librte_hash/meson.build |   3 +-
> > >>   lib/librte_hash/rte_thash.c | 637
> ++++++++++++++++++++++++++++++++++++++++++++
> > >>   lib/librte_hash/rte_thash.h | 180 +++++++++++++
> > >>   lib/librte_hash/version.map |   8 +
> > >>   5 files changed, 1289 insertions(+), 7 deletions(-)
> > >>   create mode 100644 lib/librte_hash/rte_thash.c
> > >
> > > It would be good to show how this could be used in an application.
> > > Maybe yet another variant/flag to l3fwd example.
> >
> > Agree, I think it would be great to have a simple NAT implementation in
> > examples. We've discussed this and will probably add in next releases.
>
> Vladimir, any update please?
> Could you provide it in 21.11?
>
>
>
>
It won't go in 21.11.
The prioritization was done in favour of the vectorized implementation of
the Toeplitz hash function, the scalar implementation takes about 200
cycles for a 12 byte tuple, which affects on a connections per second
performance of the NAT example. The new implementation is about 20 times
faster.