Message ID | 1617738643-258635-1-git-send-email-vladimir.medvedkin@intel.com (mailing list archive) |
---|---|
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 70739A0546; Tue, 6 Apr 2021 21:50:50 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 30A3A406A2; Tue, 6 Apr 2021 21:50:50 +0200 (CEST) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by mails.dpdk.org (Postfix) with ESMTP id 067D24067C for <dev@dpdk.org>; Tue, 6 Apr 2021 21:50:48 +0200 (CEST) IronPort-SDR: hyGZOC+wnJDXlXCzmSykUB60CqjvGdk/B0fK6GWGbzD42UB+kpO9vM1p5oOeMw8bEvwIP7sgYy Wk6pPlMsElPA== X-IronPort-AV: E=McAfee;i="6000,8403,9946"; a="172619763" X-IronPort-AV: E=Sophos;i="5.82,201,1613462400"; d="scan'208";a="172619763" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Apr 2021 12:50:48 -0700 IronPort-SDR: FioEU3mT9P6Necp81bwOqYQlhy0a5iUxdcbnnZtdNcXDMVVrxO7auENTg2cY0LQESYOjuAuItt 3/JJIpz6pP6g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.82,201,1613462400"; d="scan'208";a="448740689" Received: from silpixa00400072.ir.intel.com ([10.237.222.213]) by fmsmga002.fm.intel.com with ESMTP; 06 Apr 2021 12:50:46 -0700 From: Vladimir Medvedkin <vladimir.medvedkin@intel.com> To: dev@dpdk.org Cc: konstantin.ananyev@intel.com, andrey.chilikin@intel.com, ray.kinsella@intel.com, yipeng1.wang@intel.com, sameh.gobriel@intel.com, bruce.richardson@intel.com Date: Tue, 6 Apr 2021 20:50:40 +0100 Message-Id: <1617738643-258635-1-git-send-email-vladimir.medvedkin@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1615919077-77774-1-git-send-email-vladimir.medvedkin@intel.com> References: <1615919077-77774-1-git-send-email-vladimir.medvedkin@intel.com> Subject: [dpdk-dev] [PATCH v2 0/3] Predictable RSS feature X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions <dev.dpdk.org> List-Unsubscribe: <https://mails.dpdk.org/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://mails.dpdk.org/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <https://mails.dpdk.org/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Series |
Predictable RSS feature
|
|
Message
Vladimir Medvedkin
April 6, 2021, 7:50 p.m. UTC
This patch series introduces predictable RSS feature. It is based on the idea of searching for partial hash collisions within Toeplitz hash. The Toeplitz hash function is a homomorphism between (G, ^) and (H, ^), where (G, ^) - is a group of tuples and (H, ^) is a group of hashes with respect to XOR operation. So tuples and hashes could be treated as n-dimension and 32-dimension vector spaces over GF(2). So, f(x ^ y) == f(x) ^ f(y) where f - is the toeplitz hash function and x, y are tuples. The ability to predict partial collisions allows user to compute input hash value with desired LSB values. Usually number of LSB's are defined by the size of RSS Redirection Table. There could be number of use cases, for example: 1) NAT. Using this library it is possible to select a new port number on a translation in the way that rss hash for original tuple will have the same LSB's as rss hash for reverse tuple. 2) IPSec/MPLS/Vxlan. It is possible to choose tunnel id to be pinned to a desired queue. 3) TCP stack. It is possible to choose a source port number for outgoing connections in the way that received replies will be assigned to desired queue. 4) RSS hash key generation. Hash key initialization with random values does not guarantee an uniform distribution amongst queues. This library uses mathematically proved algorithm to complete the rss hash key to provide the best distribution. v2: - added extra API rte_thash_adjust_tuple() - added extra tests for rte_thash_adjust_tuple() - added extra fields to rte_thash_subtuple_helper struct - fixed typos Vladimir Medvedkin (3): hash: add predictable RSS API hash: add predictable RSS implementation test/hash: add additional thash tests app/test/test_thash.c | 468 +++++++++++++++++++++++++++++++- lib/librte_hash/meson.build | 3 +- lib/librte_hash/rte_thash.c | 637 ++++++++++++++++++++++++++++++++++++++++++++ lib/librte_hash/rte_thash.h | 180 +++++++++++++ lib/librte_hash/version.map | 8 + 5 files changed, 1289 insertions(+), 7 deletions(-) create mode 100644 lib/librte_hash/rte_thash.c
Comments
On Tue, 6 Apr 2021 20:50:40 +0100 Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote: > This patch series introduces predictable RSS feature. > It is based on the idea of searching for partial hash collisions > within Toeplitz hash. > > The Toeplitz hash function is a homomorphism between (G, ^) and (H, ^), > where (G, ^) - is a group of tuples and (H, ^) is a group of hashes > with respect to XOR operation. So tuples and hashes could be treated as > n-dimension and 32-dimension vector spaces over GF(2). > So, f(x ^ y) == f(x) ^ f(y) > where f - is the toeplitz hash function and x, y are tuples. > > The ability to predict partial collisions allows user to compute > input hash value with desired LSB values. > Usually number of LSB's are defined by the size of RSS Redirection Table. > > There could be number of use cases, for example: > 1) NAT. Using this library it is possible to select a new port number > on a translation in the way that rss hash for original tuple will have > the same LSB's as rss hash for reverse tuple. > 2) IPSec/MPLS/Vxlan. It is possible to choose tunnel id to be pinned to > a desired queue. > 3) TCP stack. It is possible to choose a source port number for outgoing > connections in the way that received replies will be assigned to > desired queue. > 4) RSS hash key generation. Hash key initialization with random values > does not guarantee an uniform distribution amongst queues. This library > uses mathematically proved algorithm to complete the rss hash key to > provide the best distribution. > > v2: > - added extra API rte_thash_adjust_tuple() > - added extra tests for rte_thash_adjust_tuple() > - added extra fields to rte_thash_subtuple_helper struct > - fixed typos > > Vladimir Medvedkin (3): > hash: add predictable RSS API > hash: add predictable RSS implementation > test/hash: add additional thash tests > > app/test/test_thash.c | 468 +++++++++++++++++++++++++++++++- > lib/librte_hash/meson.build | 3 +- > lib/librte_hash/rte_thash.c | 637 ++++++++++++++++++++++++++++++++++++++++++++ > lib/librte_hash/rte_thash.h | 180 +++++++++++++ > lib/librte_hash/version.map | 8 + > 5 files changed, 1289 insertions(+), 7 deletions(-) > create mode 100644 lib/librte_hash/rte_thash.c > It would be good to show how this could be used in an application. Maybe yet another variant/flag to l3fwd example.
> -----Original Message----- > From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com> > Sent: Tuesday, April 6, 2021 12:51 PM > To: dev@dpdk.org > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Chilikin, Andrey > <andrey.chilikin@intel.com>; Kinsella, Ray <ray.kinsella@intel.com>; Wang, > Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh > <sameh.gobriel@intel.com>; Richardson, Bruce > <bruce.richardson@intel.com> > Subject: [PATCH v2 0/3] Predictable RSS feature > > This patch series introduces predictable RSS feature. > It is based on the idea of searching for partial hash collisions within Toeplitz > hash. > > The Toeplitz hash function is a homomorphism between (G, ^) and (H, ^), > where (G, ^) - is a group of tuples and (H, ^) is a group of hashes with respect > to XOR operation. So tuples and hashes could be treated as n-dimension and > 32-dimension vector spaces over GF(2). > So, f(x ^ y) == f(x) ^ f(y) > where f - is the toeplitz hash function and x, y are tuples. > > The ability to predict partial collisions allows user to compute input hash value > with desired LSB values. > Usually number of LSB's are defined by the size of RSS Redirection Table. > > There could be number of use cases, for example: > 1) NAT. Using this library it is possible to select a new port number on a > translation in the way that rss hash for original tuple will have the same LSB's > as rss hash for reverse tuple. > 2) IPSec/MPLS/Vxlan. It is possible to choose tunnel id to be pinned to a > desired queue. > 3) TCP stack. It is possible to choose a source port number for outgoing > connections in the way that received replies will be assigned to desired > queue. > 4) RSS hash key generation. Hash key initialization with random values does > not guarantee an uniform distribution amongst queues. This library uses > mathematically proved algorithm to complete the rss hash key to provide the > best distribution. > > v2: > - added extra API rte_thash_adjust_tuple() > - added extra tests for rte_thash_adjust_tuple() > - added extra fields to rte_thash_subtuple_helper struct > - fixed typos > > Vladimir Medvedkin (3): > hash: add predictable RSS API > hash: add predictable RSS implementation > test/hash: add additional thash tests > > app/test/test_thash.c | 468 +++++++++++++++++++++++++++++++- > lib/librte_hash/meson.build | 3 +- > lib/librte_hash/rte_thash.c | 637 > ++++++++++++++++++++++++++++++++++++++++++++ > lib/librte_hash/rte_thash.h | 180 +++++++++++++ > lib/librte_hash/version.map | 8 + > 5 files changed, 1289 insertions(+), 7 deletions(-) create mode 100644 > lib/librte_hash/rte_thash.c > > -- > 2.7.4 [Wang, Yipeng] Hi, Vladimir, thanks for the patch! I haven't fully understood every bit of the algorithm yet, but I did see issues that this patch could potentially solve. My understanding is that there are some restrictions for the current implementation, for example, it only supports port(16-bit) manipulation, but not multiple fields or IP. Still, I think it should be good for the use cases you listed. I would love to hear more feedbacks from people who are more familiar with doing NAT in production systems. For me, besides the comments I sent earlier, good documentation and references are needed with clear usage examples, as others pointed out already. Also, the current API design seems a bit cumbersome. To use the library, one needs: Init_ctx Add_helper. Get_helper Get_complement Then in a loop: Adjust_tuples Then XOR with the current tuple I wonder if an alternative all-in-one API could be designed for simpler use cases. Thanks!
Hi Stephen, Thanks for the feedback, On 08/04/2021 18:56, Stephen Hemminger wrote: > On Tue, 6 Apr 2021 20:50:40 +0100 > Vladimir Medvedkin <vladimir.medvedkin@intel.com> wrote: > >> This patch series introduces predictable RSS feature. >> It is based on the idea of searching for partial hash collisions >> within Toeplitz hash. >> >> The Toeplitz hash function is a homomorphism between (G, ^) and (H, ^), >> where (G, ^) - is a group of tuples and (H, ^) is a group of hashes >> with respect to XOR operation. So tuples and hashes could be treated as >> n-dimension and 32-dimension vector spaces over GF(2). >> So, f(x ^ y) == f(x) ^ f(y) >> where f - is the toeplitz hash function and x, y are tuples. >> >> The ability to predict partial collisions allows user to compute >> input hash value with desired LSB values. >> Usually number of LSB's are defined by the size of RSS Redirection Table. >> >> There could be number of use cases, for example: >> 1) NAT. Using this library it is possible to select a new port number >> on a translation in the way that rss hash for original tuple will have >> the same LSB's as rss hash for reverse tuple. >> 2) IPSec/MPLS/Vxlan. It is possible to choose tunnel id to be pinned to >> a desired queue. >> 3) TCP stack. It is possible to choose a source port number for outgoing >> connections in the way that received replies will be assigned to >> desired queue. >> 4) RSS hash key generation. Hash key initialization with random values >> does not guarantee an uniform distribution amongst queues. This library >> uses mathematically proved algorithm to complete the rss hash key to >> provide the best distribution. >> >> v2: >> - added extra API rte_thash_adjust_tuple() >> - added extra tests for rte_thash_adjust_tuple() >> - added extra fields to rte_thash_subtuple_helper struct >> - fixed typos >> >> Vladimir Medvedkin (3): >> hash: add predictable RSS API >> hash: add predictable RSS implementation >> test/hash: add additional thash tests >> >> app/test/test_thash.c | 468 +++++++++++++++++++++++++++++++- >> lib/librte_hash/meson.build | 3 +- >> lib/librte_hash/rte_thash.c | 637 ++++++++++++++++++++++++++++++++++++++++++++ >> lib/librte_hash/rte_thash.h | 180 +++++++++++++ >> lib/librte_hash/version.map | 8 + >> 5 files changed, 1289 insertions(+), 7 deletions(-) >> create mode 100644 lib/librte_hash/rte_thash.c >> > > It would be good to show how this could be used in an application. > Maybe yet another variant/flag to l3fwd example. Agree, I think it would be great to have a simple NAT implementation in examples. We've discussed this and will probably add in next releases. >
Hi Yipeng, Thanks for the review, On 10/04/2021 03:32, Wang, Yipeng1 wrote: >> -----Original Message----- >> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com> >> Sent: Tuesday, April 6, 2021 12:51 PM >> To: dev@dpdk.org >> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Chilikin, Andrey >> <andrey.chilikin@intel.com>; Kinsella, Ray <ray.kinsella@intel.com>; Wang, >> Yipeng1 <yipeng1.wang@intel.com>; Gobriel, Sameh >> <sameh.gobriel@intel.com>; Richardson, Bruce >> <bruce.richardson@intel.com> >> Subject: [PATCH v2 0/3] Predictable RSS feature >> >> This patch series introduces predictable RSS feature. >> It is based on the idea of searching for partial hash collisions within Toeplitz >> hash. >> >> The Toeplitz hash function is a homomorphism between (G, ^) and (H, ^), >> where (G, ^) - is a group of tuples and (H, ^) is a group of hashes with respect >> to XOR operation. So tuples and hashes could be treated as n-dimension and >> 32-dimension vector spaces over GF(2). >> So, f(x ^ y) == f(x) ^ f(y) >> where f - is the toeplitz hash function and x, y are tuples. >> >> The ability to predict partial collisions allows user to compute input hash value >> with desired LSB values. >> Usually number of LSB's are defined by the size of RSS Redirection Table. >> >> There could be number of use cases, for example: >> 1) NAT. Using this library it is possible to select a new port number on a >> translation in the way that rss hash for original tuple will have the same LSB's >> as rss hash for reverse tuple. >> 2) IPSec/MPLS/Vxlan. It is possible to choose tunnel id to be pinned to a >> desired queue. >> 3) TCP stack. It is possible to choose a source port number for outgoing >> connections in the way that received replies will be assigned to desired >> queue. >> 4) RSS hash key generation. Hash key initialization with random values does >> not guarantee an uniform distribution amongst queues. This library uses >> mathematically proved algorithm to complete the rss hash key to provide the >> best distribution. >> >> v2: >> - added extra API rte_thash_adjust_tuple() >> - added extra tests for rte_thash_adjust_tuple() >> - added extra fields to rte_thash_subtuple_helper struct >> - fixed typos >> >> Vladimir Medvedkin (3): >> hash: add predictable RSS API >> hash: add predictable RSS implementation >> test/hash: add additional thash tests >> >> app/test/test_thash.c | 468 +++++++++++++++++++++++++++++++- >> lib/librte_hash/meson.build | 3 +- >> lib/librte_hash/rte_thash.c | 637 >> ++++++++++++++++++++++++++++++++++++++++++++ >> lib/librte_hash/rte_thash.h | 180 +++++++++++++ >> lib/librte_hash/version.map | 8 + >> 5 files changed, 1289 insertions(+), 7 deletions(-) create mode 100644 >> lib/librte_hash/rte_thash.c >> >> -- >> 2.7.4 > > [Wang, Yipeng] > Hi, Vladimir, thanks for the patch! > I haven't fully understood every bit of the algorithm yet, > but I did see issues that this patch could potentially solve. > My understanding is that there are some restrictions for the current implementation, > for example, it only supports port(16-bit) manipulation, but not multiple fields or IP. It supports any bit-range ( with size >= reta_sz, configured for the CTX) manipulations within a tuple as well as multiple number of them. > Still, I think it should be good for the use cases you listed. I would love to hear > more feedbacks from people who are more familiar with doing NAT in production systems. > > For me, besides the comments I sent earlier, > good documentation and references are needed with clear usage examples, as others pointed out already. > > Also, the current API design seems a bit cumbersome. > To use the library, one needs: > Init_ctx > Add_helper. > Get_helper > Get_complement > Then in a loop: > Adjust_tuples > Then XOR with the current tuple > > I wonder if an alternative all-in-one API could be designed for simpler use cases. Agree, v3 will have a new rte_thash_adjust_tuple() implementation that will make everything much easier. > > Thanks! > > >
11/04/2021 20:51, Medvedkin, Vladimir: > On 08/04/2021 18:56, Stephen Hemminger wrote: > >> app/test/test_thash.c | 468 +++++++++++++++++++++++++++++++- > >> lib/librte_hash/meson.build | 3 +- > >> lib/librte_hash/rte_thash.c | 637 ++++++++++++++++++++++++++++++++++++++++++++ > >> lib/librte_hash/rte_thash.h | 180 +++++++++++++ > >> lib/librte_hash/version.map | 8 + > >> 5 files changed, 1289 insertions(+), 7 deletions(-) > >> create mode 100644 lib/librte_hash/rte_thash.c > > > > It would be good to show how this could be used in an application. > > Maybe yet another variant/flag to l3fwd example. > > Agree, I think it would be great to have a simple NAT implementation in > examples. We've discussed this and will probably add in next releases. Vladimir, any update please? Could you provide it in 21.11?
Hi Thomas, пт, 22 окт. 2021 г. в 22:56, Thomas Monjalon <thomas@monjalon.net>: > 11/04/2021 20:51, Medvedkin, Vladimir: > > On 08/04/2021 18:56, Stephen Hemminger wrote: > > >> app/test/test_thash.c | 468 +++++++++++++++++++++++++++++++- > > >> lib/librte_hash/meson.build | 3 +- > > >> lib/librte_hash/rte_thash.c | 637 > ++++++++++++++++++++++++++++++++++++++++++++ > > >> lib/librte_hash/rte_thash.h | 180 +++++++++++++ > > >> lib/librte_hash/version.map | 8 + > > >> 5 files changed, 1289 insertions(+), 7 deletions(-) > > >> create mode 100644 lib/librte_hash/rte_thash.c > > > > > > It would be good to show how this could be used in an application. > > > Maybe yet another variant/flag to l3fwd example. > > > > Agree, I think it would be great to have a simple NAT implementation in > > examples. We've discussed this and will probably add in next releases. > > Vladimir, any update please? > Could you provide it in 21.11? > > > > It won't go in 21.11. The prioritization was done in favour of the vectorized implementation of the Toeplitz hash function, the scalar implementation takes about 200 cycles for a 12 byte tuple, which affects on a connections per second performance of the NAT example. The new implementation is about 20 times faster.