
[v3,0/7] introduce per-queue limit watermark and host shaper

Message ID 20220524152041.737154-1-spiked@nvidia.com (mailing list archive)

Message

Spike Du May 24, 2022, 3:20 p.m. UTC
  LWM (limit watermark) is a per-Rx-queue attribute: when the Rx queue fullness reaches the LWM limit, the HW sends an event to the DPDK application.
The host shaper can configure the shaper rate and the lwm-triggered flag for a host port.
The shaper limits the rate of traffic from the host port to the wire port.
If lwm-triggered is enabled, a 100 Mbps shaper is enabled automatically when one of the host port's Rx queues receives an LWM event.

These two features can be combined to control traffic from the host port to the wire port.
The workflow is: configure an LWM on the Rx queue and enable the lwm-triggered flag in the host shaper; after receiving an LWM event, delay a while until the Rx queue is empty, then disable the shaper. This workflow is repeated to reduce Rx queue drops.

Add a new ethdev library API to set the LWM, and a new rte event, RTE_ETH_EVENT_RXQ_LIMIT_REACHED, to handle the LWM event. For the host shaper, because it doesn't align with the existing DPDK framework and is specific to the NVIDIA NIC, use a PMD private API.

For integration with testpmd, put the private cmdline functions and the LWM event handler in the mlx5 PMD directory by adding a new file, mlx5_testpmd.c. Only minimal code is added in testpmd to invoke the interfaces from mlx5_testpmd.c.
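
As a quick illustration of the intended workflow, a minimal sketch of how an application could arm the watermark and register for the event follows. It is a sketch only: apart from the event name RTE_ETH_EVENT_RXQ_LIMIT_REACHED, the setter name rte_eth_rx_queue_lwm_set() and the percentage-based value are assumptions, so refer to patch 3 for the actual ethdev prototype.

    /*
     * Minimal usage sketch, not taken from the patches: only the event name
     * RTE_ETH_EVENT_RXQ_LIMIT_REACHED comes from this cover letter; the setter
     * rte_eth_rx_queue_lwm_set() and the percentage value are assumptions.
     */
    #include <stdio.h>
    #include <rte_common.h>
    #include <rte_ethdev.h>

    static int
    rxq_lwm_event_cb(uint16_t port_id, enum rte_eth_event_type type,
                     void *cb_arg, void *ret_param)
    {
        RTE_SET_USED(type);
        RTE_SET_USED(cb_arg);
        RTE_SET_USED(ret_param);
        /* One of the Rx queues of this port crossed its limit watermark. */
        printf("port %u: Rx queue limit watermark reached\n", port_id);
        return 0;
    }

    static int
    arm_rxq_lwm(uint16_t port_id, uint16_t rxq_id)
    {
        /* Hypothetical setter; the real prototype is added by patch 3
         * ("ethdev: introduce Rx queue based limit watermark"). The LWM is
         * assumed to be a percentage of the Rx queue size, e.g. 70%.
         */
        int ret = rte_eth_rx_queue_lwm_set(port_id, rxq_id, 70);

        if (ret != 0)
            return ret;
        return rte_eth_dev_callback_register(port_id,
                        RTE_ETH_EVENT_RXQ_LIMIT_REACHED,
                        rxq_lwm_event_cb, NULL);
    }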

Spike Du (7):
  net/mlx5: add LWM support for Rxq
  common/mlx5: share interrupt management
  ethdev: introduce Rx queue based limit watermark
  net/mlx5: add LWM event handling support
  net/mlx5: support Rx queue based limit watermark
  net/mlx5: add private API to config host port shaper
  app/testpmd: add LWM and Host Shaper command

 app/test-pmd/cmdline.c                       |  74 +++++
 app/test-pmd/config.c                        |  21 ++
 app/test-pmd/meson.build                     |   4 +
 app/test-pmd/testpmd.c                       |  24 ++
 app/test-pmd/testpmd.h                       |   1 +
 doc/guides/nics/mlx5.rst                     |  84 ++++++
 doc/guides/rel_notes/release_22_07.rst       |   2 +
 drivers/common/mlx5/linux/meson.build        |  13 +
 drivers/common/mlx5/linux/mlx5_common_os.c   | 131 +++++++++
 drivers/common/mlx5/linux/mlx5_common_os.h   |  11 +
 drivers/common/mlx5/mlx5_prm.h               |  26 ++
 drivers/common/mlx5/version.map              |   2 +
 drivers/common/mlx5/windows/mlx5_common_os.h |  24 ++
 drivers/net/mlx5/linux/mlx5_ethdev_os.c      |  71 -----
 drivers/net/mlx5/linux/mlx5_os.c             | 132 ++-------
 drivers/net/mlx5/linux/mlx5_socket.c         |  53 +---
 drivers/net/mlx5/mlx5.c                      |  68 +++++
 drivers/net/mlx5/mlx5.h                      |  12 +-
 drivers/net/mlx5/mlx5_devx.c                 |  60 +++-
 drivers/net/mlx5/mlx5_devx.h                 |   1 +
 drivers/net/mlx5/mlx5_rx.c                   | 292 +++++++++++++++++++
 drivers/net/mlx5/mlx5_rx.h                   |  13 +
 drivers/net/mlx5/mlx5_testpmd.c              | 184 ++++++++++++
 drivers/net/mlx5/mlx5_testpmd.h              |  27 ++
 drivers/net/mlx5/mlx5_txpp.c                 |  28 +-
 drivers/net/mlx5/rte_pmd_mlx5.h              |  30 ++
 drivers/net/mlx5/version.map                 |   2 +
 drivers/net/mlx5/windows/mlx5_ethdev_os.c    |  22 --
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c          |  48 +--
 lib/ethdev/ethdev_driver.h                   |  22 ++
 lib/ethdev/rte_ethdev.c                      |  52 ++++
 lib/ethdev/rte_ethdev.h                      |  71 +++++
 lib/ethdev/version.map                       |   2 +
 33 files changed, 1299 insertions(+), 308 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_testpmd.c
 create mode 100644 drivers/net/mlx5/mlx5_testpmd.h
  

Comments

Thomas Monjalon May 24, 2022, 3:59 p.m. UTC | #1
+Cc people involved in previous versions

24/05/2022 17:20, Spike Du:
> LWM(limit watermark) is per RX queue attribute, when RX queue fullness reach the LWM limit, HW sends an event to dpdk application.
> Host shaper can configure shaper rate and lwm-triggered for a host port.
> The shaper limits the rate of traffic from host port to wire port.
> If lwm-triggered is enabled, a 100Mbps shaper is enabled automatically when one of the host port's Rx queues receives LWM event.
> 
> These two features can combine to control traffic from host port to wire port.
> The work flow is configure LWM to RX queue and enable lwm-triggered flag in host shaper, after receiving LWM event, delay a while until RX queue is empty , then disable the shaper. We recycle this work flow to reduce RX queue drops.
> 
> Add new libethdev API to set LWM, add rte event RTE_ETH_EVENT_RXQ_LIMIT_REACHED to handle LWM event. For host shaper, because it doesn't align to existing DPDK framework and is specific to Nvidia NIC, use PMD private API.
> 
> For integration with testpmd, put the private cmdline function and LWM event handler in mlx5 PMD directory by adding a new file mlx5_test.c. Only add minimal code in testpmd to invoke interfaces from mlx5_test.c.
> 
> Spike Du (7):
>   net/mlx5: add LWM support for Rxq
>   common/mlx5: share interrupt management
>   ethdev: introduce Rx queue based limit watermark
>   net/mlx5: add LWM event handling support
>   net/mlx5: support Rx queue based limit watermark
>   net/mlx5: add private API to config host port shaper
>   app/testpmd: add LWM and Host Shaper command
> 
>  app/test-pmd/cmdline.c                       |  74 +++++
>  app/test-pmd/config.c                        |  21 ++
>  app/test-pmd/meson.build                     |   4 +
>  app/test-pmd/testpmd.c                       |  24 ++
>  app/test-pmd/testpmd.h                       |   1 +
>  doc/guides/nics/mlx5.rst                     |  84 ++++++
>  doc/guides/rel_notes/release_22_07.rst       |   2 +
>  drivers/common/mlx5/linux/meson.build        |  13 +
>  drivers/common/mlx5/linux/mlx5_common_os.c   | 131 +++++++++
>  drivers/common/mlx5/linux/mlx5_common_os.h   |  11 +
>  drivers/common/mlx5/mlx5_prm.h               |  26 ++
>  drivers/common/mlx5/version.map              |   2 +
>  drivers/common/mlx5/windows/mlx5_common_os.h |  24 ++
>  drivers/net/mlx5/linux/mlx5_ethdev_os.c      |  71 -----
>  drivers/net/mlx5/linux/mlx5_os.c             | 132 ++-------
>  drivers/net/mlx5/linux/mlx5_socket.c         |  53 +---
>  drivers/net/mlx5/mlx5.c                      |  68 +++++
>  drivers/net/mlx5/mlx5.h                      |  12 +-
>  drivers/net/mlx5/mlx5_devx.c                 |  60 +++-
>  drivers/net/mlx5/mlx5_devx.h                 |   1 +
>  drivers/net/mlx5/mlx5_rx.c                   | 292 +++++++++++++++++++
>  drivers/net/mlx5/mlx5_rx.h                   |  13 +
>  drivers/net/mlx5/mlx5_testpmd.c              | 184 ++++++++++++
>  drivers/net/mlx5/mlx5_testpmd.h              |  27 ++
>  drivers/net/mlx5/mlx5_txpp.c                 |  28 +-
>  drivers/net/mlx5/rte_pmd_mlx5.h              |  30 ++
>  drivers/net/mlx5/version.map                 |   2 +
>  drivers/net/mlx5/windows/mlx5_ethdev_os.c    |  22 --
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c          |  48 +--
>  lib/ethdev/ethdev_driver.h                   |  22 ++
>  lib/ethdev/rte_ethdev.c                      |  52 ++++
>  lib/ethdev/rte_ethdev.h                      |  71 +++++
>  lib/ethdev/version.map                       |   2 +
>  33 files changed, 1299 insertions(+), 308 deletions(-)
>  create mode 100644 drivers/net/mlx5/mlx5_testpmd.c
>  create mode 100644 drivers/net/mlx5/mlx5_testpmd.h
  
Morten Brørup May 24, 2022, 7 p.m. UTC | #2
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Tuesday, 24 May 2022 17.59
> 
> +Cc people involved in previous versions
> 
> 24/05/2022 17:20, Spike Du:
> > LWM(limit watermark) is per RX queue attribute, when RX queue
> fullness reach the LWM limit, HW sends an event to dpdk application.
> > Host shaper can configure shaper rate and lwm-triggered for a host
> port.

Please ignore this comment, it is not important, but I had to get it out of my system: I assume that the "LWM" name is from the NIC datasheet; otherwise I would probably prefer something with "threshold"... LWM is easily confused with "low water mark", which is the opposite of what the LWM does. Names are always open for discussion, so I won't object to it.

> > The shaper limits the rate of traffic from host port to wire port.

From host to wire? It is RX, so you must mean from wire to host.

> > If lwm-triggered is enabled, a 100Mbps shaper is enabled
> automatically when one of the host port's Rx queues receives LWM event.
> >
> > These two features can combine to control traffic from host port to
> wire port.

Again, you mean from wire to host?

> > The work flow is configure LWM to RX queue and enable lwm-triggered
> flag in host shaper, after receiving LWM event, delay a while until RX
> queue is empty , then disable the shaper. We recycle this work flow to
> reduce RX queue drops.

You delay while RX queue gets drained by some other threads, I assume.

Surely, the excess packets must be dropped somewhere, e.g. by the shaper?

> >
> > Add new libethdev API to set LWM, add rte event
> RTE_ETH_EVENT_RXQ_LIMIT_REACHED to handle LWM event.

Makes sense to make it public; could be usable for other purposes, similar to interrupt coalescing, as mentioned by Stephen.

> > For host shaper,
> because it doesn't align to existing DPDK framework and is specific to
> Nvidia NIC, use PMD private API.

Makes sense to keep it private.

> >
> > For integration with testpmd, put the private cmdline function and
> LWM event handler in mlx5 PMD directory by adding a new file
> mlx5_test.c. Only add minimal code in testpmd to invoke interfaces from
> mlx5_test.c.
> >
> > Spike Du (7):
> >   net/mlx5: add LWM support for Rxq
> >   common/mlx5: share interrupt management
> >   ethdev: introduce Rx queue based limit watermark
> >   net/mlx5: add LWM event handling support
> >   net/mlx5: support Rx queue based limit watermark
> >   net/mlx5: add private API to config host port shaper
> >   app/testpmd: add LWM and Host Shaper command
> >
  
Thomas Monjalon May 24, 2022, 7:22 p.m. UTC | #3
24/05/2022 21:00, Morten Brørup:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 24/05/2022 17:20, Spike Du:
> > > LWM(limit watermark) is per RX queue attribute, when RX queue
> > fullness reach the LWM limit, HW sends an event to dpdk application.
> 
> Please ignore this comment, it is not important, but I had to get it out of my system: I assume that the "LWM" name is from the NIC datasheet; otherwise I would probably prefer something with "threshold"... LWM is easily confused with "low water mark", which is the opposite of what the LWM does. Names are always open for discussion, so I won't object to it.

Yes it is a threshold, and yes it is often called a watermark.
I think we can get more ideas and votes about the naming.
Please let's conclude on a short name which can be inserted
easily in function names.
  
Spike Du May 25, 2022, 1:14 p.m. UTC | #4
> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, May 25, 2022 3:00 AM
> To: NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> Spike Du <spiked@nvidia.com>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; dev@dpdk.org;
> Raslan Darawsheh <rasland@nvidia.com>; stephen@networkplumber.org;
> andrew.rybchenko@oktetlabs.ru; ferruh.yigit@amd.com;
> david.marchand@redhat.com
> Subject: RE: [PATCH v3 0/7] introduce per-queue limit watermark and host
> shaper
> 
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Tuesday, 24 May 2022 17.59
> >
> > +Cc people involved in previous versions
> >
> > 24/05/2022 17:20, Spike Du:
> > > LWM(limit watermark) is per RX queue attribute, when RX queue
> > fullness reach the LWM limit, HW sends an event to dpdk application.
> > > Host shaper can configure shaper rate and lwm-triggered for a host
> > port.
> 
> Please ignore this comment, it is not important, but I had to get it out of my
> system: I assume that the "LWM" name is from the NIC datasheet; otherwise
> I would probably prefer something with "threshold"... LWM is easily
> confused with "low water mark", which is the opposite of what the LWM
> does. Names are always open for discussion, so I won't object to it.
> 
> > > The shaper limits the rate of traffic from host port to wire port.
> 
> From host to wire? It is RX, so you must mean from wire to host.

The host shaper is quite specific to NVIDIA's BlueField-2 NIC. The NIC is inserted
in a server which we call the host-system, and the NIC has an embedded Arm-system
which does the forwarding.
The traffic flows from the host-system to the wire like this:
the host-system generates traffic and sends it to the Arm-system, and the Arm-system sends it to the physical/wire port.
So the Rx happens between the host-system and the Arm-system, and the traffic direction is host to wire.
The shaper also works in a special way: you configure it on the Arm-system, but it takes effect
on the host-system's Tx side.

> 
> > > If lwm-triggered is enabled, a 100Mbps shaper is enabled
> > automatically when one of the host port's Rx queues receives LWM event.
> > >
> > > These two features can combine to control traffic from host port to
> > wire port.
> 
> Again, you mean from wire to host?

Pls see above.

> 
> > > The work flow is configure LWM to RX queue and enable lwm-triggered
> > flag in host shaper, after receiving LWM event, delay a while until RX
> > queue is empty , then disable the shaper. We recycle this work flow to
> > reduce RX queue drops.
> 
> You delay while RX queue gets drained by some other threads, I assume.

The PMD thread drains the Rx queue, with the PMD receiving as normal; the PMD
implementation uses the rte interrupt thread to handle the LWM event.

> 
> Surely, the excess packets must be dropped somewhere, e.g. by the shaper?
> 
> > >
> > > Add new libethdev API to set LWM, add rte event
> > RTE_ETH_EVENT_RXQ_LIMIT_REACHED to handle LWM event.
> 
> Makes sense to make it public; could be usable for other purposes, similar to
> interrupt coalescing, as mentioned by Stephen.
> 
> > > For host shaper,
> > because it doesn't align to existing DPDK framework and is specific to
> > Nvidia NIC, use PMD private API.
> 
> Makes sense to keep it private.
> 
> > >
> > > For integration with testpmd, put the private cmdline function and
> > LWM event handler in mlx5 PMD directory by adding a new file
> > mlx5_test.c. Only add minimal code in testpmd to invoke interfaces
> > from mlx5_test.c.
> > >
> > > Spike Du (7):
> > >   net/mlx5: add LWM support for Rxq
> > >   common/mlx5: share interrupt management
> > >   ethdev: introduce Rx queue based limit watermark
> > >   net/mlx5: add LWM event handling support
> > >   net/mlx5: support Rx queue based limit watermark
> > >   net/mlx5: add private API to config host port shaper
> > >   app/testpmd: add LWM and Host Shaper command
> > >
  
Morten Brørup May 25, 2022, 1:40 p.m. UTC | #5
> From: Spike Du [mailto:spiked@nvidia.com]
> Sent: Wednesday, 25 May 2022 15.15
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Wednesday, May 25, 2022 3:00 AM
> >
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Tuesday, 24 May 2022 17.59
> > >
> > > +Cc people involved in previous versions
> > >
> > > 24/05/2022 17:20, Spike Du:
> > > > LWM(limit watermark) is per RX queue attribute, when RX queue
> > > fullness reach the LWM limit, HW sends an event to dpdk
> application.
> > > > Host shaper can configure shaper rate and lwm-triggered for a
> host
> > > port.
> >
> > Please ignore this comment, it is not important, but I had to get it
> out of my
> > system: I assume that the "LWM" name is from the NIC datasheet;
> otherwise
> > I would probably prefer something with "threshold"... LWM is easily
> > confused with "low water mark", which is the opposite of what the LWM
> > does. Names are always open for discussion, so I won't object to it.
> >
> > > > The shaper limits the rate of traffic from host port to wire
> port.
> >
> > From host to wire? It is RX, so you must mean from wire to host.
> 
> The host shaper is quite private to Nvidia's BlueField 2 NIC. The NIC
> is inserted
> In a server which we call it host-system, and the NIC has an embedded
> Arm-system
> Which does the forwarding.
> The traffic flows from host-system to wire like this:
> Host-system generates traffic, send it to Arm-system, Arm sends it to
> physical/wire port.
> So the RX happens between host-system and Arm-system, and the traffic
> is host to wire.
> The shaper also works in a special way: you configure it on Arm-system,
> but it takes effect
> On host-sysmem's TX side.
> 
> >
> > > > If lwm-triggered is enabled, a 100Mbps shaper is enabled
> > > automatically when one of the host port's Rx queues receives LWM
> event.
> > > >
> > > > These two features can combine to control traffic from host port
> to
> > > wire port.
> >
> > Again, you mean from wire to host?
> 
> Pls see above.
> 
> >
> > > > The work flow is configure LWM to RX queue and enable lwm-
> triggered
> > > flag in host shaper, after receiving LWM event, delay a while until
> RX
> > > queue is empty , then disable the shaper. We recycle this work flow
> to
> > > reduce RX queue drops.
> >
> > You delay while RX queue gets drained by some other threads, I
> assume.
> 
> The PMD thread drains the Rx queue, the PMD receiving  as normal, as
> the PMD
> Implementation uses rte interrupt thread to handle LWM event.
> 

Thank you for the explanation, Spike. It really clarifies a lot!

If this patch is intended for DPDK running on the host-system, then the LWM attribute is associated with a TX queue, not an RX queue. The packets are egressing from the host-system, so TX from the host-system's perspective.

Otherwise, if this patch is for DPDK running on the embedded ARM-system, it should be highlighted somewhere.

> >
> > Surely, the excess packets must be dropped somewhere, e.g. by the
> shaper?

I guess the shaper doesn't have to drop any packets, but the host-system will simply be unable to put more packets into the queue if it runs full.
  
Spike Du May 25, 2022, 1:59 p.m. UTC | #6
> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, May 25, 2022 9:40 PM
> To: Spike Du <spiked@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>
> Cc: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; dev@dpdk.org;
> Raslan Darawsheh <rasland@nvidia.com>; stephen@networkplumber.org;
> andrew.rybchenko@oktetlabs.ru; ferruh.yigit@amd.com;
> david.marchand@redhat.com
> Subject: RE: [PATCH v3 0/7] introduce per-queue limit watermark and host
> shaper
> 
> > From: Spike Du [mailto:spiked@nvidia.com]
> > Sent: Wednesday, 25 May 2022 15.15
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Wednesday, May 25, 2022 3:00 AM
> > >
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > Sent: Tuesday, 24 May 2022 17.59
> > > >
> > > > +Cc people involved in previous versions
> > > >
> > > > 24/05/2022 17:20, Spike Du:
> > > > > LWM(limit watermark) is per RX queue attribute, when RX queue
> > > > fullness reach the LWM limit, HW sends an event to dpdk
> > application.
> > > > > Host shaper can configure shaper rate and lwm-triggered for a
> > host
> > > > port.
> > >
> > > Please ignore this comment, it is not important, but I had to get it
> > out of my
> > > system: I assume that the "LWM" name is from the NIC datasheet;
> > otherwise
> > > I would probably prefer something with "threshold"... LWM is easily
> > > confused with "low water mark", which is the opposite of what the
> > > LWM does. Names are always open for discussion, so I won't object to it.
> > >
> > > > > The shaper limits the rate of traffic from host port to wire
> > port.
> > >
> > > From host to wire? It is RX, so you must mean from wire to host.
> >
> > The host shaper is quite private to Nvidia's BlueField 2 NIC. The NIC
> > is inserted In a server which we call it host-system, and the NIC has
> > an embedded Arm-system Which does the forwarding.
> > The traffic flows from host-system to wire like this:
> > Host-system generates traffic, send it to Arm-system, Arm sends it to
> > physical/wire port.
> > So the RX happens between host-system and Arm-system, and the traffic
> > is host to wire.
> > The shaper also works in a special way: you configure it on
> > Arm-system, but it takes effect On host-sysmem's TX side.
> >
> > >
> > > > > If lwm-triggered is enabled, a 100Mbps shaper is enabled
> > > > automatically when one of the host port's Rx queues receives LWM
> > event.
> > > > >
> > > > > These two features can combine to control traffic from host port
> > to
> > > > wire port.
> > >
> > > Again, you mean from wire to host?
> >
> > Pls see above.
> >
> > >
> > > > > The work flow is configure LWM to RX queue and enable lwm-
> > triggered
> > > > flag in host shaper, after receiving LWM event, delay a while
> > > > until
> > RX
> > > > queue is empty , then disable the shaper. We recycle this work
> > > > flow
> > to
> > > > reduce RX queue drops.
> > >
> > > You delay while RX queue gets drained by some other threads, I
> > assume.
> >
> > The PMD thread drains the Rx queue, the PMD receiving  as normal, as
> > the PMD Implementation uses rte interrupt thread to handle LWM event.
> >
> 
> Thank you for the explanation, Spike. It really clarifies a lot!
> 
> If this patch is intended for DPDK running on the host-system, then the LWM
> attribute is associated with a TX queue, not an RX queue. The packets are
> egressing from the host-system, so TX from the host-system's perspective.
> 
> Otherwise, if this patch is for DPDK running on the embedded ARM-system,
> it should be highlighted somewhere.

The host-shaper patch runs on the Arm-system; I think that patch has some explanation in mlx5.rst.
The LWM patch is common and should work on any Rx queue (right now mlx5 doesn't support it on hairpin Rx queues or shared Rx queues).
On the Arm-system, we can use it to monitor traffic from the host (representor port) or from the wire (physical port).
LWM can also work on the host-system if DPDK is running there, for example to monitor traffic from the Arm-system to the host-system.

> 
> > >
> > > Surely, the excess packets must be dropped somewhere, e.g. by the
> > shaper?
> 
> I guess the shaper doesn't have to drop any packets, but the host-system will
> simply be unable to put more packets into the queue if it runs full.
> 

When the LWM event happens, the host shaper throttles traffic from the host-system to the Arm-system. Yes, the shaper doesn't drop packets.
Normally the shaper rate is small, and if the PMD thread on Arm keeps working, the Rx queue is dropless.
But if the PMD thread doesn't receive fast enough, or if the host-system sends a burst even with a small shaper in place, the Rx queue may still drop on Arm.
Anyway, even if drops still happen sometimes, the cooperation of the host shaper and the LWM greatly reduces Rx drops on Arm.
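
To make the cooperation above concrete, a rough sketch of what the Arm-side event handling could look like is shown below (similar in spirit to the testpmd integration mentioned in the cover letter). The private API name rte_pmd_mlx5_host_shaper_config(), the LWM_TRIGGERED flag name, and the delay value are assumptions; refer to patches 6 and 7 for the real interfaces.

    /*
     * Rough sketch of the Arm-side handling, not taken from the patches.
     * rte_pmd_mlx5_host_shaper_config(), the LWM_TRIGGERED flag name and
     * the delay value are assumptions; see patches 6 and 7 for the real API.
     */
    #include <stdint.h>
    #include <rte_alarm.h>
    #include <rte_common.h>
    #include <rte_ethdev.h>
    #include <rte_pmd_mlx5.h>    /* mlx5 private API added by this series */

    #define SHAPER_DISABLE_DELAY_US 100000    /* ~100 ms for the queue to drain */

    static void
    host_shaper_disable(void *arg)
    {
        uint16_t port_id = (uint16_t)(uintptr_t)arg;

        /* Rate 0 lifts the 100 Mbps shaper again, keeping the lwm-triggered
         * mode armed for the next event (assumed semantics).
         */
        rte_pmd_mlx5_host_shaper_config(port_id, 0,
                        RTE_PMD_MLX5_HOST_SHAPER_FLAG_LWM_TRIGGERED);
    }

    static int
    host_port_lwm_event_cb(uint16_t port_id, enum rte_eth_event_type type,
                           void *cb_arg, void *ret_param)
    {
        RTE_SET_USED(type);
        RTE_SET_USED(cb_arg);
        RTE_SET_USED(ret_param);
        /* HW has already enabled the 100 Mbps shaper on the host port; give
         * the PMD thread time to drain the Rx queue, then disable the shaper.
         */
        rte_eal_alarm_set(SHAPER_DISABLE_DELAY_US, host_shaper_disable,
                          (void *)(uintptr_t)port_id);
        return 0;
    }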
  
Andrew Rybchenko May 25, 2022, 2:11 p.m. UTC | #7
On 5/24/22 22:22, Thomas Monjalon wrote:
> 24/05/2022 21:00, Morten Brørup:
>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
>>> 24/05/2022 17:20, Spike Du:
>>>> LWM(limit watermark) is per RX queue attribute, when RX queue
>>> fullness reach the LWM limit, HW sends an event to dpdk application.
>>
>> Please ignore this comment, it is not important, but I had to get it out of my system: I assume that the "LWM" name is from the NIC datasheet; otherwise I would probably prefer something with "threshold"... LWM is easily confused with "low water mark", which is the opposite of what the LWM does. Names are always open for discussion, so I won't object to it.
> 
> Yes it is a threshold, and yes it is often called a watermark.
> I think we can get more ideas and votes about the naming.
> Please let's conclude on a short name which can be inserted
> easily in function names.

As I understand it, it is an Rx queue fill (level) threshold.
"fill_thresh", or "flt" if the first one is too long.
  
Morten Brørup May 25, 2022, 2:16 p.m. UTC | #8
> From: Spike Du [mailto:spiked@nvidia.com]
> Sent: Wednesday, 25 May 2022 15.59
> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Wednesday, May 25, 2022 9:40 PM
> >
> > > From: Spike Du [mailto:spiked@nvidia.com]
> > > Sent: Wednesday, 25 May 2022 15.15
> > >
> > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > Sent: Wednesday, May 25, 2022 3:00 AM
> > > >
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Tuesday, 24 May 2022 17.59
> > > > >
> > > > > +Cc people involved in previous versions
> > > > >
> > > > > 24/05/2022 17:20, Spike Du:
> > > > > > LWM(limit watermark) is per RX queue attribute, when RX queue
> > > > > fullness reach the LWM limit, HW sends an event to dpdk
> > > application.
> > > > > > Host shaper can configure shaper rate and lwm-triggered for a
> > > host
> > > > > port.
> > > >
> > > >
> > > > > > The shaper limits the rate of traffic from host port to wire
> > > port.
> > > >
> > > > From host to wire? It is RX, so you must mean from wire to host.
> > >
> > > The host shaper is quite private to Nvidia's BlueField 2 NIC. The
> NIC
> > > is inserted In a server which we call it host-system, and the NIC
> has
> > > an embedded Arm-system Which does the forwarding.
> > > The traffic flows from host-system to wire like this:
> > > Host-system generates traffic, send it to Arm-system, Arm sends it
> to
> > > physical/wire port.
> > > So the RX happens between host-system and Arm-system, and the
> traffic
> > > is host to wire.
> > > The shaper also works in a special way: you configure it on
> > > Arm-system, but it takes effect On host-sysmem's TX side.
> > >
> > > >
> > > > > > If lwm-triggered is enabled, a 100Mbps shaper is enabled
> > > > > automatically when one of the host port's Rx queues receives
> LWM
> > > event.
> > > > > >
> > > > > > These two features can combine to control traffic from host
> port
> > > to
> > > > > wire port.
> > > >
> > > > Again, you mean from wire to host?
> > >
> > > Pls see above.
> > >
> > > >
> > > > > > The work flow is configure LWM to RX queue and enable lwm-
> > > triggered
> > > > > flag in host shaper, after receiving LWM event, delay a while
> > > > > until
> > > RX
> > > > > queue is empty , then disable the shaper. We recycle this work
> > > > > flow
> > > to
> > > > > reduce RX queue drops.
> > > >
> > > > You delay while RX queue gets drained by some other threads, I
> > > assume.
> > >
> > > The PMD thread drains the Rx queue, the PMD receiving  as normal,
> as
> > > the PMD Implementation uses rte interrupt thread to handle LWM
> event.
> > >
> >
> > Thank you for the explanation, Spike. It really clarifies a lot!
> >
> > If this patch is intended for DPDK running on the host-system, then
> the LWM
> > attribute is associated with a TX queue, not an RX queue. The packets
> are
> > egressing from the host-system, so TX from the host-system's
> perspective.
> >
> > Otherwise, if this patch is for DPDK running on the embedded ARM-
> system,
> > it should be highlighted somewhere.
> 
> The host-shaper patch is running on ARM-system, I think in that patch I
> have some explanation in mlx5.rst.
> The LWM patch is common and should work on any Rx queue(right now mlx5
> doesn't support Hairpin Rx queue and shared Rx queue).
> On ARM-system, we can use it to monitor traffic from host(representor
> port) or from wire(physical port).
> LWM can also work on host-system if there is DPDK running, for example
> it can monitor traffic from Arm-system to host-system.

OK. Then I get it! I was reading the patch description wearing my host-system glasses, and thus got very confused. :-)

> 
> >
> > > >
> > > > Surely, the excess packets must be dropped somewhere, e.g. by the
> > > shaper?
> >
> > I guess the shaper doesn't have to drop any packets, but the host-
> system will
> > simply be unable to put more packets into the queue if it runs full.
> >
> 
> When LWM event happens, the host-shaper throttles traffic from host-
> system to Arm-system. Yes, the shaper doesn't drop pkts.
> Normally the shaper is small and if PMD thread on Arm keeps working, Rx
> queue is dropless.
> But if PMD thread doesn't receive fast enough, or even with a small
> shaper but host-system is sending some burst,  Rx queue may still drop
> on Arm.
> Anyway even sometimes drop still happens, the cooperation of host-
> shaper and LWM greatly reduce the Rx drop on Arm.

Thanks for elaborating. And yes, shapers are excellent for many scenarios.
  
Andrew Rybchenko May 25, 2022, 2:30 p.m. UTC | #9
On 5/25/22 17:16, Morten Brørup wrote:
>> From: Spike Du [mailto:spiked@nvidia.com]
>> Sent: Wednesday, 25 May 2022 15.59
>>
>>> From: Morten Brørup <mb@smartsharesystems.com>
>>> Sent: Wednesday, May 25, 2022 9:40 PM
>>>
>>>> From: Spike Du [mailto:spiked@nvidia.com]
>>>> Sent: Wednesday, 25 May 2022 15.15
>>>>
>>>>> From: Morten Brørup <mb@smartsharesystems.com>
>>>>> Sent: Wednesday, May 25, 2022 3:00 AM
>>>>>
>>>>>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
>>>>>> Sent: Tuesday, 24 May 2022 17.59
>>>>>>
>>>>>> +Cc people involved in previous versions
>>>>>>
>>>>>> 24/05/2022 17:20, Spike Du:
>>>>>>> LWM(limit watermark) is per RX queue attribute, when RX queue
>>>>>> fullness reach the LWM limit, HW sends an event to dpdk
>>>> application.
>>>>>>> Host shaper can configure shaper rate and lwm-triggered for a
>>>> host
>>>>>> port.
>>>>>
>>>>>
>>>>>>> The shaper limits the rate of traffic from host port to wire
>>>> port.
>>>>>
>>>>>  From host to wire? It is RX, so you must mean from wire to host.
>>>>
>>>> The host shaper is quite private to Nvidia's BlueField 2 NIC. The
>> NIC
>>>> is inserted In a server which we call it host-system, and the NIC
>> has
>>>> an embedded Arm-system Which does the forwarding.
>>>> The traffic flows from host-system to wire like this:
>>>> Host-system generates traffic, send it to Arm-system, Arm sends it
>> to
>>>> physical/wire port.
>>>> So the RX happens between host-system and Arm-system, and the
>> traffic
>>>> is host to wire.
>>>> The shaper also works in a special way: you configure it on
>>>> Arm-system, but it takes effect On host-sysmem's TX side.
>>>>
>>>>>
>>>>>>> If lwm-triggered is enabled, a 100Mbps shaper is enabled
>>>>>> automatically when one of the host port's Rx queues receives
>> LWM
>>>> event.
>>>>>>>
>>>>>>> These two features can combine to control traffic from host
>> port
>>>> to
>>>>>> wire port.
>>>>>
>>>>> Again, you mean from wire to host?
>>>>
>>>> Pls see above.
>>>>
>>>>>
>>>>>>> The work flow is configure LWM to RX queue and enable lwm-
>>>> triggered
>>>>>> flag in host shaper, after receiving LWM event, delay a while
>>>>>> until
>>>> RX
>>>>>> queue is empty , then disable the shaper. We recycle this work
>>>>>> flow
>>>> to
>>>>>> reduce RX queue drops.
>>>>>
>>>>> You delay while RX queue gets drained by some other threads, I
>>>> assume.
>>>>
>>>> The PMD thread drains the Rx queue, the PMD receiving  as normal,
>> as
>>>> the PMD Implementation uses rte interrupt thread to handle LWM
>> event.
>>>>
>>>
>>> Thank you for the explanation, Spike. It really clarifies a lot!
>>>
>>> If this patch is intended for DPDK running on the host-system, then
>> the LWM
>>> attribute is associated with a TX queue, not an RX queue. The packets
>> are
>>> egressing from the host-system, so TX from the host-system's
>> perspective.
>>>
>>> Otherwise, if this patch is for DPDK running on the embedded ARM-
>> system,
>>> it should be highlighted somewhere.
>>
>> The host-shaper patch is running on ARM-system, I think in that patch I
>> have some explanation in mlx5.rst.
>> The LWM patch is common and should work on any Rx queue(right now mlx5
>> doesn't support Hairpin Rx queue and shared Rx queue).
>> On ARM-system, we can use it to monitor traffic from host(representor
>> port) or from wire(physical port).
>> LWM can also work on host-system if there is DPDK running, for example
>> it can monitor traffic from Arm-system to host-system.
> 
> OK. Then I get it! I was reading the patch description wearing my host-system glasses, and thus got very confused. :-)

The description in the cover letter was very misleading for me as
well. It is not a problem right now, after the long, detailed
explanations. Hopefully there is no such problem in the suggested
ethdev documentation. I'll reread it carefully before applying
when the time comes.

> 
>>
>>>
>>>>>
>>>>> Surely, the excess packets must be dropped somewhere, e.g. by the
>>>> shaper?
>>>
>>> I guess the shaper doesn't have to drop any packets, but the host-
>> system will
>>> simply be unable to put more packets into the queue if it runs full.
>>>
>>
>> When LWM event happens, the host-shaper throttles traffic from host-
>> system to Arm-system. Yes, the shaper doesn't drop pkts.
>> Normally the shaper is small and if PMD thread on Arm keeps working, Rx
>> queue is dropless.
>> But if PMD thread doesn't receive fast enough, or even with a small
>> shaper but host-system is sending some burst,  Rx queue may still drop
>> on Arm.
>> Anyway even sometimes drop still happens, the cooperation of host-
>> shaper and LWM greatly reduce the Rx drop on Arm.
> 
> Thanks for elaborating. And yes, shapers are excellent for many scenarios.
>