[RFC,0/6] net/mlx5: introduce limit watermark and host shaper

Message ID 20220401032232.1267376-1-spiked@nvidia.com (mailing list archive)
Series net/mlx5: introduce limit watermark and host shaper

Message

Spike Du April 1, 2022, 3:22 a.m. UTC
LWM (limit watermark) is a per-Rx-queue attribute: when the Rx queue fullness
reaches the LWM limit, the HW sends an event to the DPDK application.
The host shaper can configure the shaper rate and the lwm-triggered flag for a
host port. The shaper limits the rate of traffic from the host port to the wire port.
If lwm-triggered is enabled, a 100 Mbps shaper is enabled automatically
when one of the host port's Rx queues receives an LWM event.

These two features can be combined to control traffic from the host port to the
wire port. The workflow is: configure an LWM on the Rx queue and enable the
lwm-triggered flag in the host shaper; after receiving an LWM event, wait until
the Rx queue is empty, then disable the shaper. Repeating this workflow reduces
Rx queue drops.
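
The workflow above can be sketched as a small software model. This is only an illustration under stated assumptions: the structures and function names below are hypothetical and are not the mlx5 PMD API, and the watermark is modeled as a percentage of queue fullness, as described in this cover letter. The real event delivery and shaper are done by the HW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one Rx queue; not an mlx5 or ethdev structure. */
struct rxq_model {
	uint32_t size;        /* number of descriptors in the queue */
	uint32_t used;        /* descriptors currently holding packets */
	uint8_t  lwm_percent; /* watermark as % of size; 0 disables it */
	bool     lwm_armed;   /* one event per arming (an assumption) */
};

/* Hypothetical model of the host shaper state. */
struct host_shaper_model {
	bool lwm_triggered;   /* turn the shaper on when an LWM event arrives */
	bool active;          /* true while the shaper is limiting traffic */
};

/* Add n packets; return true if an LWM event would be raised. */
static bool rxq_enqueue(struct rxq_model *q, uint32_t n)
{
	assert(q->size > 0 && q->lwm_percent <= 100);
	uint32_t room = q->size - q->used;

	if (n > room)
		n = room; /* the excess would be dropped */
	q->used += n;
	if (q->lwm_armed && q->lwm_percent != 0 &&
	    (uint64_t)q->used * 100 >= (uint64_t)q->size * q->lwm_percent) {
		q->lwm_armed = false;
		return true;
	}
	return false;
}

/* Remove up to n packets, as the forwarding loop would. */
static void rxq_drain(struct rxq_model *q, uint32_t n)
{
	q->used -= (n > q->used) ? q->used : n;
}

/* What the lwm-triggered flag does when the event fires. */
static void on_lwm_event(struct host_shaper_model *s)
{
	if (s->lwm_triggered)
		s->active = true;
}

/* Application step: once the queue is empty, disable the shaper and re-arm. */
static bool try_release(struct rxq_model *q, struct host_shaper_model *s)
{
	if (q->used != 0)
		return false;
	s->active = false;
	q->lwm_armed = true;
	return true;
}
```

An application following the workflow would call try_release() repeatedly after the event (for example from a delayed alarm) until it succeeds, at which point the cycle starts again.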

Spike Du (6):
  net/mlx5: add LWM support for Rxq
  common/mlx5: share interrupt management
  net/mlx5: add LWM event handling support
  net/mlx5: add private API to configure Rxq LWM
  net/mlx5: add private API to config host port shaper
  app/testpmd: add LWM and Host Shaper command

 app/test-pmd/cmdline.c                       | 149 ++++++++++++++++++
 app/test-pmd/config.c                        | 122 +++++++++++++++
 app/test-pmd/meson.build                     |   3 +
 app/test-pmd/testpmd.c                       |   3 +
 app/test-pmd/testpmd.h                       |   5 +
 doc/guides/nics/mlx5.rst                     |  87 +++++++++++
 doc/guides/rel_notes/release_22_03.rst       |   7 +
 drivers/common/mlx5/linux/meson.build        |  21 ++-
 drivers/common/mlx5/linux/mlx5_common_os.c   | 131 ++++++++++++++++
 drivers/common/mlx5/linux/mlx5_common_os.h   |  11 ++
 drivers/common/mlx5/mlx5_prm.h               |  26 ++++
 drivers/common/mlx5/version.map              |   3 +-
 drivers/common/mlx5/windows/mlx5_common_os.h |  24 +++
 drivers/net/mlx5/linux/mlx5_ethdev_os.c      |  71 ---------
 drivers/net/mlx5/linux/mlx5_os.c             | 132 ++++------------
 drivers/net/mlx5/linux/mlx5_socket.c         |  53 +------
 drivers/net/mlx5/mlx5.c                      |  61 ++++++++
 drivers/net/mlx5/mlx5.h                      |  12 +-
 drivers/net/mlx5/mlx5_devx.c                 |  57 ++++++-
 drivers/net/mlx5/mlx5_devx.h                 |   1 +
 drivers/net/mlx5/mlx5_rx.c                   | 221 ++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_rx.h                   |   9 ++
 drivers/net/mlx5/mlx5_txpp.c                 |  28 +---
 drivers/net/mlx5/rte_pmd_mlx5.h              |  62 ++++++++
 drivers/net/mlx5/version.map                 |   2 +
 drivers/net/mlx5/windows/mlx5_ethdev_os.c    |  22 ---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c          |  52 +------
 27 files changed, 1057 insertions(+), 318 deletions(-)
  

Comments

Jerin Jacob April 5, 2022, 8:58 a.m. UTC | #1
On Fri, Apr 1, 2022 at 8:53 AM Spike Du <spiked@nvidia.com> wrote:
>
> LWM(limit watermark) is per RX queue attribute, when RX queue fullness reach
> the LWM limit, HW sends an event to dpdk application.
> Host shaper can configure shaper rate and lwm-triggered for a host port.
> The shaper limits the rate of traffic from host port to wire port.
> If lwm-triggered is enabled, a 100Mbps shaper is enabled automatically
> when one of the host port's Rx queues receives LWM event.
>
> These two features can combine to control traffic from host port to wire port.
> The work flow is configure LWM to RX queue and enable lwm-triggered flag in
> host shaper, after receiving LWM event, delay a while until RX queue is empty
> , then disable the shaper. We recycle this work flow to reduce RX queue drops.
>
> Spike Du (6):
>   net/mlx5: add LWM support for Rxq
>   common/mlx5: share interrupt management
>   net/mlx5: add LWM event handling support
>   net/mlx5: add private API to configure Rxq LWM
>   net/mlx5: add private API to config host port shaper
>   app/testpmd: add LWM and Host Shaper command

+ @Andrew Rybchenko  @Ferruh Yigit cristian.dumitrescu@intel.com

I think case one can be easily abstracted by adding a new
rte_eth_event_type event, and
case two can be abstracted via the existing Rx meter framework in ethdev.

Also, updating generic testpmd to support PMD-specific APIs should be
avoided. I know there
is existing stuff in testpmd; I think we should have a policy on
adding PMD-specific commands to testpmd.

There are around 56 PMDs in ethdev now. If PMDs try to add PMD-specific
APIs to testpmd, it will become bloated; at minimum,
they should go in a separate file in testpmd if we choose to take that path.

+ @techboard@dpdk.org
  
Spike Du April 26, 2022, 2:42 a.m. UTC | #2
Hi Jerin,

Thanks for your comments, and sorry for the late response.

For case one, I think I can refine the design: add the LWM (limit watermark) to rte_eth_rxconf and add a new rte_eth_event_type event.
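
To make the shape of that proposal concrete, here is a minimal self-contained sketch. Everything in it is a stand-in: rxconf_model, EVT_RX_LWM and the evt_* helpers are hypothetical names that only mirror the callback pattern ethdev uses for events; they are not the ethdev API, and the final field and event names would be decided in review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event list; EVT_RX_LWM stands in for the proposed new event. */
enum evt_type { EVT_LINK_CHANGE, EVT_RX_LWM, EVT_MAX };

/* Hypothetical Rx queue config carrying the watermark as a percentage. */
struct rxconf_model {
	uint16_t nb_desc;
	uint8_t  lwm_percent; /* 0 means no watermark */
};

typedef int (*evt_cb_fn)(uint16_t port_id, enum evt_type type,
			 void *cb_arg, void *ret_param);

struct evt_slot { evt_cb_fn fn; void *arg; };
static struct evt_slot evt_table[EVT_MAX];

/* Descriptor count at which the event would fire for this config. */
static uint16_t rxq_lwm_threshold(const struct rxconf_model *conf)
{
	assert(conf->lwm_percent <= 100);
	return (uint16_t)((uint32_t)conf->nb_desc * conf->lwm_percent / 100);
}

/* Register one callback per event type; returns 0 on success. */
static int evt_callback_register(enum evt_type type, evt_cb_fn fn, void *arg)
{
	if ((unsigned int)type >= EVT_MAX || fn == NULL)
		return -1;
	evt_table[type].fn = fn;
	evt_table[type].arg = arg;
	return 0;
}

/* Driver side: deliver an event to the registered callback, if any. */
static int evt_raise(uint16_t port_id, enum evt_type type, void *ret_param)
{
	if ((unsigned int)type >= EVT_MAX || evt_table[type].fn == NULL)
		return -1;
	return evt_table[type].fn(port_id, type, evt_table[type].arg, ret_param);
}

/* Application side: count LWM events and remember the queue that fired. */
struct lwm_stats { unsigned int events; uint16_t last_queue; };

static int count_lwm(uint16_t port_id, enum evt_type type,
		     void *cb_arg, void *ret_param)
{
	struct lwm_stats *st = cb_arg;

	(void)port_id;
	(void)type;
	st->events++;
	if (ret_param != NULL)
		st->last_queue = *(uint16_t *)ret_param;
	return 0;
}
```

Passing the queue index through ret_param is just one possible way to tell the application which queue crossed the watermark; it is an assumption of this sketch.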

For case two (host shaper), I think we can't use the Rx meter, because it is actually a Tx shaper on a remote system. It is quite specific to the Mellanox/Nvidia BlueField 2 (BF2 for short) NIC, which contains an ARM system. We have two terms here: Host-system is the system the BF2 NIC is inserted into; ARM-system is the ARM system embedded in the BF2. The ARM-system does the forwarding. This is how the host shaper works: we configure the register on the ARM-system, but it affects the Host-system's Tx shaper. In other words, the shaper works on the remote port, so it is not an Rx meter concept and we can't use the DPDK Rx meter framework. I'd suggest still using a private API.

For the testpmd part, I understand your concern. Because we need a private API for the host shaper, and we need testpmd's forwarding code to show users how it works, we need to call the private API in testpmd. If the current patch is not acceptable, what is the correct way to do it? Is there any framework that isolates the PMD-private logic from the testpmd common code but still allows calling private APIs in testpmd?


Regards,
Spike.



  
Spike Du April 29, 2022, 5:48 a.m. UTC | #3
Hi Jerin,

Thanks for your comments, and sorry for the late response.

For case one, I think I can refine the design: add the LWM (limit watermark) to rte_eth_rxconf and add a new rte_eth_event_type event.

For case two (host shaper), I think we can't use the Rx meter, because it is actually a Tx shaper on a remote system. It is quite specific to the Mellanox/Nvidia BlueField 2 (BF2 for short) NIC, which contains an ARM system. We have two terms here: Host-system is the system the BF2 NIC is inserted into; ARM-system is the ARM system embedded in the BF2. The ARM-system does the forwarding. This is how the host shaper works: we configure the register on the ARM-system, but it affects the Host-system's Tx shaper. In other words, the shaper works on the remote port, so it is not an Rx meter concept and we can't use the DPDK Rx meter framework. I'd suggest still using a private API.

For the testpmd part, I understand your concern. Because we need a private API for the host shaper, and we need testpmd's forwarding code to show users how it works, we need to call the private API in testpmd. If the current patch is not acceptable, what is the correct way to do it? Is there any framework that isolates the PMD-private logic from the testpmd common code but still allows calling private APIs in testpmd?


Regards,
Spike.

  
Jerin Jacob May 1, 2022, 12:50 p.m. UTC | #4
On Tue, Apr 26, 2022 at 8:12 AM Spike Du <spiked@nvidia.com> wrote:
>
> Hi Jerin,

Hi Spike,

>         Thanks for your comments and sorry for the late response.
>
>         For case one, I think I can refine the design and add LWM(limit watermark) in rte_eth_rxconf, and add a new rte_eth_event_type event.

OK.

>
>         For case two(host shaper), I think we can't use RX meter, because it's actually TX shaper on a remote system. It's quite specific to Mellanox/Nvidia BlueField 2(BF2 for short) NIC. The NIC contains an ARM system. We have two terms here: Host-system stands for the system the BF2 NIC is inserted; ARM-system stands for the embedded ARM in BF2. ARM-system is doing the forwarding. This is the way host shaper works: we configure the register on ARM-system, but it affects Host-system's TX shaper, which means the shaper is working on the remote port, it's not a RX meter concept, hence we can't use DPDK RX meter framework. I'd suggest to still use private API.

OK. If the host is running a DPDK application, then rte_tm can be used
on the egress side to achieve the same. If it is not DPDK, then yes, we
need private APIs.

>
>         For testpmd part, I understand your concern. Because we need one private API for host shaper, and we need testpmd's forwarding code to show how it works to user, we need to call the private API in testpmd. If current patch is not acceptable, what's the correct way to do it? Any framework to isolate the PMD private logic from testpmd common code, but still give a chance to call private APIs in testpmd?

Please check "PMD API" item in
http://mails.dpdk.org/archives/dev/2022-April/239191.html

  
Spike Du May 2, 2022, 3:58 a.m. UTC | #5
Hi Jerin,
	
> >         For case two(host shaper), I think we can't use RX meter, because it's
> actually TX shaper on a remote system. It's quite specific to Mellanox/Nvidia
> BlueField 2(BF2 for short) NIC. The NIC contains an ARM system. We have
> two terms here: Host-system stands for the system the BF2 NIC is inserted;
> ARM-system stands for the embedded ARM in BF2. ARM-system is doing the
> forwarding. This is the way host shaper works: we configure the register on
> ARM-system, but it affects Host-system's TX shaper, which means the
> shaper is working on the remote port, it's not a RX meter concept, hence we
> can't use DPDK RX meter framework. I'd suggest to still use private API.
> 
> OK. If the host is using the DPDK application then rte_tm can be used on the
> egress side to enable the same. If it is not DPDK, then yes, we need private
> APIs.
I see your point. The Rx drop happens on the ARM-system, so by then it is too late to notify the Host-system to reduce its traffic rate. To achieve dropless operation, MLX developed
this feature to configure the host shaper on the remote port. The Host-system is flexible: it may or may not use DPDK.

Regards,
Spike.

