[RFC] ethdev: support congestion management

Message ID 20220530131526.3598658-1-jerinj@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [RFC] ethdev: support congestion management |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS

Commit Message

Jerin Jacob Kollanukkaran May 30, 2022, 1:15 p.m. UTC
From: Jerin Jacob <jerinj@marvell.com>

NIC HW controllers often come with congestion management support on
various HW objects such as Rx queue depth or mempool queue depth.

Also, it can support various modes of operation such as RED
(Random early discard), WRED etc on those HW objects.

This patch adds a framework to express such modes(enum rte_cman_mode)
and introduce (enum rte_eth_cman_obj) to enumerate the different
objects where the modes can operate on.

This patch adds RTE_CMAN_RED mode of operation and
RTE_ETH_CMAN_OBJ_RX_QUEUE, RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL object.

Introduced reserved fields in configuration structure
backed by rte_eth_cman_config_init() to add new configuration
parameters without ABI breakage.

Added rte_eth_cman_info_get() API to get the information such as
supported modes and objects.

Added rte_eth_cman_config_init(), rte_eth_cman_config_set() APIs
to configure congestion management on those object with associated mode.

Finally, Added rte_eth_cman_config_get() API to retrieve the
applied configuration.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 doc/guides/nics/features.rst         |  12 +++
 doc/guides/nics/features/default.ini |   1 +
 lib/eal/include/meson.build          |   1 +
 lib/eal/include/rte_cman.h           |  51 ++++++++++
 lib/ethdev/rte_ethdev.h              | 145 +++++++++++++++++++++++++++
 5 files changed, 210 insertions(+)
 create mode 100644 lib/eal/include/rte_cman.h
  

Comments

Ajit Khaparde May 30, 2022, 7:43 p.m. UTC | #1
On Mon, May 30, 2022 at 6:16 AM <jerinj@marvell.com> wrote:
>
> From: Jerin Jacob <jerinj@marvell.com>
>
> NIC HW controllers often come with congestion management support on
> various HW objects such as Rx queue depth or mempool queue depth.
>
> Also, it can support various modes of operation such as RED
> (Random early discard), WRED etc on those HW objects.
>
> This patch adds a framework to express such modes(enum rte_cman_mode)
> and introduce (enum rte_eth_cman_obj) to enumerate the different
> objects where the modes can operate on.
>
> This patch adds RTE_CMAN_RED mode of operation and
> RTE_ETH_CMAN_OBJ_RX_QUEUE, RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL object.
>
> Introduced reserved fields in configuration structure
> backed by rte_eth_cman_config_init() to add new configuration
> parameters without ABI breakage.
>
> Added rte_eth_cman_info_get() API to get the information such as
> supported modes and objects.
>
> Added rte_eth_cman_config_init(), rte_eth_cman_config_set() APIs
> to configure congestion management on those object with associated mode.
>
> Finally, Added rte_eth_cman_config_get() API to retrieve the
> applied configuration.

Can you also add How all this helps an application? Thanks

>
>
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> ---
>  doc/guides/nics/features.rst         |  12 +++
>  doc/guides/nics/features/default.ini |   1 +
>  lib/eal/include/meson.build          |   1 +
>  lib/eal/include/rte_cman.h           |  51 ++++++++++
>  lib/ethdev/rte_ethdev.h              | 145 +++++++++++++++++++++++++++
>  5 files changed, 210 insertions(+)
>  create mode 100644 lib/eal/include/rte_cman.h
>
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 21bedb743f..ffcc2bcc92 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -727,6 +727,18 @@ Supports configuring per-queue stat counter mapping.
>    ``rte_eth_dev_set_tx_queue_stats_mapping()``.
>
>
> +.. _nic_features_congestion_management
> +
> +Congestion management
> +---------------------
> +
> +Supports congestion management.
> +
> +* **[implements] eth_dev_ops**: ``cman_info_get``, ``cman_config_set``, ``cman_config_get``.
> +* **[related]    API**: ``rte_eth_cman_info_get()``, ``rte_eth_cman_config_init()``,
> +  ``rte_eth_cman_config_set()``, ``rte_eth_cman_config_get()``.
> +
> +
>  .. _nic_features_fw_version:
>
>  FW version
> diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
> index b1d18ac62c..060497e41f 100644
> --- a/doc/guides/nics/features/default.ini
> +++ b/doc/guides/nics/features/default.ini
> @@ -60,6 +60,7 @@ Tx descriptor status =
>  Basic stats          =
>  Extended stats       =
>  Stats per queue      =
> +Congestion management =
>  FW version           =
>  EEPROM dump          =
>  Module EEPROM dump   =
> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
> index 9700494816..40e31e0e6d 100644
> --- a/lib/eal/include/meson.build
> +++ b/lib/eal/include/meson.build
> @@ -10,6 +10,7 @@ headers += files(
>          'rte_branch_prediction.h',
>          'rte_bus.h',
>          'rte_class.h',
> +        'rte_cman.h',
>          'rte_common.h',
>          'rte_compat.h',
>          'rte_debug.h',
> diff --git a/lib/eal/include/rte_cman.h b/lib/eal/include/rte_cman.h
> new file mode 100644
> index 0000000000..e6dad2ebf7
> --- /dev/null
> +++ b/lib/eal/include/rte_cman.h
> @@ -0,0 +1,51 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Marvell International Ltd.
> + */
> +
> +#ifndef RTE_CMAN_H
> +#define RTE_CMAN_H
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_bitops.h>
> +
> +/**
> + * @file
> + * Congestion management related parameters for DPDK.
> + */
> +
> +/** Congestion management modes */
> +enum rte_cman_mode {
> +       /** Congestion based on Random Early Detection.
> +        *
> +        * https://en.wikipedia.org/wiki/Random_early_detection
> +        * @see struct rte_cman_red_params
> +        */
> +       RTE_CMAN_RED = RTE_BIT64(0),
> +};
> +
> +/**
> + * RED based congestion management configuration parameters.
> + */
> +struct rte_cman_red_params {
> +       /**< Minimum threshold (min_th) value
> +        *
> +        * Value expressed as percentage. Value must be in 0 to 100(inclusive).
> +        */
> +       uint8_t min_th;
> +       /**< Maximum threshold (max_th) value
> +        *
> +        * Value expressed as percentage. Value must be in 0 to 100(inclusive).
> +        */
> +       uint8_t max_th;
> +       /** Inverse of packet marking probability maximum value (maxp = 1 / maxp_inv) */
> +       uint16_t maxp_inv;
> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_CMAN_H */
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 04cff8ee10..7c8718bb2e 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -160,6 +160,7 @@ extern "C" {
>  #define RTE_ETHDEV_DEBUG_TX
>  #endif
>
> +#include <rte_cman.h>
>  #include <rte_compat.h>
>  #include <rte_log.h>
>  #include <rte_interrupts.h>
> @@ -5452,6 +5453,150 @@ typedef struct {
>  __rte_experimental
>  int rte_eth_dev_priv_dump(uint16_t port_id, FILE *file);
>
> +/* Congestion management */
> +
> +/** Enumerate list of ethdev congestion management objects */
> +enum rte_eth_cman_obj {
> +       /** Congestion management based on Rx queue depth */
> +       RTE_ETH_CMAN_OBJ_RX_QUEUE = RTE_BIT64(0),
> +       /** Congestion management based on mempool depth associated with Rx queue
> +        * @see rte_eth_rx_queue_setup()
> +        */
> +       RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL = RTE_BIT64(1),
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to retrieve information of ethdev congestion management.
> + */
> +struct rte_eth_cman_info {
> +       /** Set of supported congestion management modes
> +        * @see enum rte_cman_mode
> +        */
> +       uint64_t modes_supported;
> +       /** Set of supported congestion management objects
> +        * @see enum rte_eth_cman_obj
> +        */
> +       uint64_t objs_supported;
> +       /** Reserved for future fields */
> +       uint8_t rsvd[64];
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to configure the ethdev congestion management.
> + */
> +struct rte_eth_cman_config {
> +       /** Congestion management object */
> +       enum rte_eth_cman_obj obj;
> +       /** Congestion management mode */
> +       enum rte_cman_mode mode;
> +       union {
> +               /** Rx queue to configure congestion management.
> +                *
> +                * Valid when object is RTE_ETH_CMAN_OBJ_RX_QUEUE or
> +                * RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL.
> +                */
> +               uint16_t rx_queue;
> +               /** Reserved for future fields */
> +               uint8_t rsvd_obj_params[16];
> +       } obj_param;
> +       union {
> +               /** RED configuration parameters.
> +                *
> +                * Valid when mode is RTE_CMAN_RED.
> +                */
> +               struct rte_cman_red_params red;
> +               /** Reserved for future fields */
> +               uint8_t rsvd_mode_params[64];
> +       } mode_param;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Retrieve the information for ethdev congestion management
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param info
> + *   A pointer to a structure of type *rte_eth_cman_info* to be filled with
> + *   the information about congestion management.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if support for cman_info_get does not exist.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_cman_info_get(uint16_t port_id, struct rte_eth_cman_info *info);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Initialize the ethdev congestion management configuration structure with default values.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param config
> + *   A pointer to a structure of type *rte_eth_cman_config* to be initialized
> + *   with default value.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_cman_config_init(uint16_t port_id, struct rte_eth_cman_config *config);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Configure ethdev congestion management
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param config
> + *   A pointer to a structure of type *rte_eth_cman_config* to be configured.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if support for cman_config_set does not exist.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_cman_config_set(uint16_t port_id, struct rte_eth_cman_config *config);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Retrieve the applied ethdev congestion management parameters for the given port.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param config
> + *   A pointer to a structure of type *rte_eth_cman_config* to retrieve
> + *   congestion management parameters for the given object.
> + *   Application must fill all parameters except mode_param parameter in
> + *   struct rte_eth_cman_config.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if support for cman_config_get does not exist.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config);
> +
>  #include <rte_ethdev_core.h>
>
>  /**
> --
> 2.36.1
>
  
humin (Q) May 31, 2022, 1:09 a.m. UTC | #2
Hi, jerinj,

在 2022/5/30 21:15, jerinj@marvell.com 写道:
> From: Jerin Jacob <jerinj@marvell.com>
> 
> NIC HW controllers often come with congestion management support on
> various HW objects such as Rx queue depth or mempool queue depth.
> 
> Also, it can support various modes of operation such as RED
> (Random early discard), WRED etc on those HW objects.
> 
> This patch adds a framework to express such modes(enum rte_cman_mode)
> and introduce (enum rte_eth_cman_obj) to enumerate the different
> objects where the modes can operate on.
> 
> This patch adds RTE_CMAN_RED mode of operation and
> RTE_ETH_CMAN_OBJ_RX_QUEUE, RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL object.
> 
> Introduced reserved fields in configuration structure
> backed by rte_eth_cman_config_init() to add new configuration
> parameters without ABI breakage.
> 
> Added rte_eth_cman_info_get() API to get the information such as
> supported modes and objects.
> 
> Added rte_eth_cman_config_init(), rte_eth_cman_config_set() APIs
> to configure congestion management on those object with associated mode.
> 
> Finally, Added rte_eth_cman_config_get() API to retrieve the
> applied configuration.
> 
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> ---
>   doc/guides/nics/features.rst         |  12 +++
>   doc/guides/nics/features/default.ini |   1 +
>   lib/eal/include/meson.build          |   1 +
>   lib/eal/include/rte_cman.h           |  51 ++++++++++
>   lib/ethdev/rte_ethdev.h              | 145 +++++++++++++++++++++++++++
>   5 files changed, 210 insertions(+)
>   create mode 100644 lib/eal/include/rte_cman.h
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 21bedb743f..ffcc2bcc92 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -727,6 +727,18 @@ Supports configuring per-queue stat counter mapping.
>     ``rte_eth_dev_set_tx_queue_stats_mapping()``.
>   
>   
> +.. _nic_features_congestion_management
> +
> +Congestion management
> +---------------------
> +
> +Supports congestion management.
> +
> +* **[implements] eth_dev_ops**: ``cman_info_get``, ``cman_config_set``, ``cman_config_get``.
> +* **[related]    API**: ``rte_eth_cman_info_get()``, ``rte_eth_cman_config_init()``,
> +  ``rte_eth_cman_config_set()``, ``rte_eth_cman_config_get()``.
> +
> +
>   .. _nic_features_fw_version:
>   
>   FW version
> diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
> index b1d18ac62c..060497e41f 100644
> --- a/doc/guides/nics/features/default.ini
> +++ b/doc/guides/nics/features/default.ini
> @@ -60,6 +60,7 @@ Tx descriptor status =
>   Basic stats          =
>   Extended stats       =
>   Stats per queue      =
> +Congestion management =
>   FW version           =
>   EEPROM dump          =
>   Module EEPROM dump   =
> diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
> index 9700494816..40e31e0e6d 100644
> --- a/lib/eal/include/meson.build
> +++ b/lib/eal/include/meson.build
> @@ -10,6 +10,7 @@ headers += files(
>           'rte_branch_prediction.h',
>           'rte_bus.h',
>           'rte_class.h',
> +        'rte_cman.h',
>           'rte_common.h',
>           'rte_compat.h',
>           'rte_debug.h',
> diff --git a/lib/eal/include/rte_cman.h b/lib/eal/include/rte_cman.h
> new file mode 100644
> index 0000000000..e6dad2ebf7
> --- /dev/null
> +++ b/lib/eal/include/rte_cman.h
> @@ -0,0 +1,51 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Marvell International Ltd.
> + */
> +
> +#ifndef RTE_CMAN_H
> +#define RTE_CMAN_H
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_bitops.h>
> +
> +/**
> + * @file
> + * Congestion management related parameters for DPDK.
> + */
> +
> +/** Congestion management modes */
> +enum rte_cman_mode {
> +	/** Congestion based on Random Early Detection.
> +	 *
> +	 * https://en.wikipedia.org/wiki/Random_early_detection
> +	 * @see struct rte_cman_red_params
> +	 */
> +	RTE_CMAN_RED = RTE_BIT64(0),
> +};
> +
> +/**
> + * RED based congestion management configuration parameters.
> + */
> +struct rte_cman_red_params {
> +	/**< Minimum threshold (min_th) value
> +	 *
> +	 * Value expressed as percentage. Value must be in 0 to 100(inclusive).
> +	 */
> +	uint8_t min_th;
> +	/**< Maximum threshold (max_th) value
> +	 *
> +	 * Value expressed as percentage. Value must be in 0 to 100(inclusive).
> +	 */
> +	uint8_t max_th;
> +	/** Inverse of packet marking probability maximum value (maxp = 1 / maxp_inv) */
> +	uint16_t maxp_inv;
> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_CMAN_H */
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 04cff8ee10..7c8718bb2e 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -160,6 +160,7 @@ extern "C" {
>   #define RTE_ETHDEV_DEBUG_TX
>   #endif
>   
> +#include <rte_cman.h>
>   #include <rte_compat.h>
>   #include <rte_log.h>
>   #include <rte_interrupts.h>
> @@ -5452,6 +5453,150 @@ typedef struct {
>   __rte_experimental
>   int rte_eth_dev_priv_dump(uint16_t port_id, FILE *file);
>   
> +/* Congestion management */
> +
> +/** Enumerate list of ethdev congestion management objects */
> +enum rte_eth_cman_obj {
> +	/** Congestion management based on Rx queue depth */
> +	RTE_ETH_CMAN_OBJ_RX_QUEUE = RTE_BIT64(0),
> +	/** Congestion management based on mempool depth associated with Rx queue
> +	 * @see rte_eth_rx_queue_setup()
> +	 */
> +	RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL = RTE_BIT64(1),
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
'API'  --> 'structure'
> + *
> + * A structure used to retrieve information of ethdev congestion management.
> + */
> +struct rte_eth_cman_info {
> +	/** Set of supported congestion management modes
> +	 * @see enum rte_cman_mode
> +	 */
> +	uint64_t modes_supported;
> +	/** Set of supported congestion management objects
> +	 * @see enum rte_eth_cman_obj
> +	 */
> +	uint64_t objs_supported;
> +	/** Reserved for future fields */
> +	uint8_t rsvd[64];
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
'API'  --> 'structure'
> + *
> + * A structure used to configure the ethdev congestion management.
> + */
> +struct rte_eth_cman_config {
> +	/** Congestion management object */
> +	enum rte_eth_cman_obj obj;
> +	/** Congestion management mode */
> +	enum rte_cman_mode mode;
> +	union {
> +		/** Rx queue to configure congestion management.
> +		 *
> +		 * Valid when object is RTE_ETH_CMAN_OBJ_RX_QUEUE or
> +		 * RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL.
> +		 */
> +		uint16_t rx_queue;
> +		/** Reserved for future fields */
> +		uint8_t rsvd_obj_params[16];
> +	} obj_param;
> +	union {
> +		/** RED configuration parameters.
> +		 *
> +		 * Valid when mode is RTE_CMAN_RED.
> +		 */
> +		struct rte_cman_red_params red;
> +		/** Reserved for future fields */
> +		uint8_t rsvd_mode_params[64];
> +	} mode_param;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Retrieve the information for ethdev congestion management
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param info
> + *   A pointer to a structure of type *rte_eth_cman_info* to be filled with
> + *   the information about congestion management.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if support for cman_info_get does not exist.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_cman_info_get(uint16_t port_id, struct rte_eth_cman_info *info);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Initialize the ethdev congestion management configuration structure with default values.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param config
> + *   A pointer to a structure of type *rte_eth_cman_config* to be initialized
> + *   with default value.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_cman_config_init(uint16_t port_id, struct rte_eth_cman_config *config);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Configure ethdev congestion management
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param config
> + *   A pointer to a structure of type *rte_eth_cman_config* to be configured.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if support for cman_config_set does not exist.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_cman_config_set(uint16_t port_id, struct rte_eth_cman_config *config);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Retrieve the applied ethdev congestion management parameters for the given port.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param config
> + *   A pointer to a structure of type *rte_eth_cman_config* to retrieve
> + *   congestion management parameters for the given object.
> + *   Application must fill all parameters except mode_param parameter in
> + *   struct rte_eth_cman_config.
> + *
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if support for cman_config_get does not exist.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config);
> +
>   #include <rte_ethdev_core.h>
>   
>   /**
>
  
Stephen Hemminger May 31, 2022, 4:09 p.m. UTC | #3
On Mon, 30 May 2022 18:45:26 +0530
<jerinj@marvell.com> wrote:

> From: Jerin Jacob <jerinj@marvell.com>
> 
> NIC HW controllers often come with congestion management support on
> various HW objects such as Rx queue depth or mempool queue depth.
> 
> Also, it can support various modes of operation such as RED
> (Random early discard), WRED etc on those HW objects.
> 
> This patch adds a framework to express such modes(enum rte_cman_mode)
> and introduce (enum rte_eth_cman_obj) to enumerate the different
> objects where the modes can operate on.
> 
> This patch adds RTE_CMAN_RED mode of operation and
> RTE_ETH_CMAN_OBJ_RX_QUEUE, RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL object.
> 
> Introduced reserved fields in configuration structure
> backed by rte_eth_cman_config_init() to add new configuration
> parameters without ABI breakage.
> 
> Added rte_eth_cman_info_get() API to get the information such as
> supported modes and objects.
> 
> Added rte_eth_cman_config_init(), rte_eth_cman_config_set() APIs
> to configure congestion management on those object with associated mode.
> 
> Finally, Added rte_eth_cman_config_get() API to retrieve the
> applied configuration.
> 
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>

The concept of supporting hardware queue management is good, but would like
to make some suggestions:

The hardware support of QOS should ideally be part of the existing DPDK
traffic management. For example, in Linux there are devices that can offload
traffic control (TC) to hardware and this is done via flags to existing
software infrastructure.

Your implementation makes HW and SW QoS diverge. This will require each application
to get even more dependent on a particular hardware device.

Which brings up the bigger picture problem. Don't want to repeat the problems
with rte_flow here. Any hardware support needs to have a matching set of software
implementation, and test cases.  The problem with rte_flow is the semantics are
poorly defined and each vendor does it differently; and the SW flow classifier
is incomplete and does not match what the HW does. 

In an ideal world, there would be:
- an abstract API for defining ingress and egress queue management
- a software implementation of that API
- multiple hardware implementations
- ability to transparently use both SW and HW queue management from application
- complete test suite for that API.

Yes, that is asking a lot. But as trailblazer in this area, you have to do the
hard work.
  
Jerin Jacob July 13, 2022, 12:11 p.m. UTC | #4
On Tue, May 31, 2022 at 1:13 AM Ajit Khaparde
<ajit.khaparde@broadcom.com> wrote:
>
> On Mon, May 30, 2022 at 6:16 AM <jerinj@marvell.com> wrote:
> >
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > NIC HW controllers often come with congestion management support on
> > various HW objects such as Rx queue depth or mempool queue depth.
> >
> > Also, it can support various modes of operation such as RED
> > (Random early discard), WRED etc on those HW objects.
> >
> > This patch adds a framework to express such modes(enum rte_cman_mode)
> > and introduce (enum rte_eth_cman_obj) to enumerate the different
> > objects where the modes can operate on.
> >
> > This patch adds RTE_CMAN_RED mode of operation and
> > RTE_ETH_CMAN_OBJ_RX_QUEUE, RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL object.
> >
> > Introduced reserved fields in configuration structure
> > backed by rte_eth_cman_config_init() to add new configuration
> > parameters without ABI breakage.
> >
> > Added rte_eth_cman_info_get() API to get the information such as
> > supported modes and objects.
> >
> > Added rte_eth_cman_config_init(), rte_eth_cman_config_set() APIs
> > to configure congestion management on those object with associated mode.
> >
> > Finally, Added rte_eth_cman_config_get() API to retrieve the
> > applied configuration.
>
> Can you also add How all this helps an application? Thanks


Random early detection (RED), also known as random early discard or
random early drop is a queuing discipline for a network scheduler
suited for congestion avoidance.

In the conventional tail drop algorithm, a router or other network
component buffers as many packets as it can, and simply drops the ones
it cannot buffer. If buffers are constantly full, the network is
congested. Tail drop distributes buffer space unfairly among traffic
flows. Tail drop can also lead to TCP global synchronization as all
TCP connections "hold back" simultaneously, and then step forward
simultaneously. Networks become under-utilized and
flooded—alternately, in waves.

RED addresses these issues by pre-emptively dropping packets before
the buffer becomes completely full. It uses predictive models to
decide which packets to drop.

it is copied from https://en.wikipedia.org/wiki/Random_early_detection
page. I have kept this URL under enum rte_cman_mode:: RTE_CMAN_RED
too.
Also I will add original specification URL too,
http://www.aciri.org/floyd/papers/red/red.html.
  
Jerin Jacob July 13, 2022, 12:16 p.m. UTC | #5
On Tue, May 31, 2022 at 6:39 AM Min Hu (Connor) <humin29@huawei.com> wrote:
>
> Hi, jerinj,

HI Min Hu,

> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> 'API'  --> 'structure'


Will fix it in v1. Thanks

> > + *
> > + * A structure used to retrieve information of ethdev congestion management.
> > + */
> > +struct rte_eth_cman_info {
> > +     /** Set of supported congestion management modes
> > +      * @see enum rte_cman_mode
> > +      */
> > +     uint64_t modes_supported;
> > +     /** Set of supported congestion management objects
> > +      * @see enum rte_eth_cman_obj
> > +      */
> > +     uint64_t objs_supported;
> > +     /** Reserved for future fields */
> > +     uint8_t rsvd[64];
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> 'API'  --> 'structure'


Will fix it in v1. Thanks
  
Jerin Jacob July 13, 2022, 12:25 p.m. UTC | #6
On Tue, May 31, 2022 at 9:39 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 30 May 2022 18:45:26 +0530
> <jerinj@marvell.com> wrote:
>
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > NIC HW controllers often come with congestion management support on
> > various HW objects such as Rx queue depth or mempool queue depth.
> >
> > Also, it can support various modes of operation such as RED
> > (Random early discard), WRED etc on those HW objects.
> >
> > This patch adds a framework to express such modes(enum rte_cman_mode)
> > and introduce (enum rte_eth_cman_obj) to enumerate the different
> > objects where the modes can operate on.
> >
> > This patch adds RTE_CMAN_RED mode of operation and
> > RTE_ETH_CMAN_OBJ_RX_QUEUE, RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL object.
> >
> > Introduced reserved fields in configuration structure
> > backed by rte_eth_cman_config_init() to add new configuration
> > parameters without ABI breakage.
> >
> > Added rte_eth_cman_info_get() API to get the information such as
> > supported modes and objects.
> >
> > Added rte_eth_cman_config_init(), rte_eth_cman_config_set() APIs
> > to configure congestion management on those object with associated mode.
> >
> > Finally, Added rte_eth_cman_config_get() API to retrieve the
> > applied configuration.
> >
> > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
>
> The concept of supporting hardware queue management is good, but would like
> to make some suggestions:
>
> The hardware support of QOS should ideally be part of the existing DPDK
> traffic management. For example, in Linux there are devices that can offload
> traffic control (TC) to hardware and this is done via flags to existing
> software infrastructure.
>
> Your implementation makes HW and SW QoS diverge. This will require each application
> to get even more dependent on a particular hardware device.
>
> Which brings up the bigger picture problem. Don't want to repeat the problems
> with rte_flow here. Any hardware support needs to have a matching set of software
> implementation, and test cases.  The problem with rte_flow is the semantics are
> poorly defined and each vendor does it differently; and the SW flow classifier
> is incomplete and does not match what the HW does.
>
> In an ideal world, there would be:
> - an abstract API for defining ingress and egress queue management

It is already available as rte_tm and rte_mtr API. Only congestion
management is missing. That is added in the patch.


> - a software implementation of that API

It is already available. See lib/sched/rte_red.h

> - multiple hardware implementations
> - ability to transparently use both SW and HW queue management from application
> - complete test suite for that API.
>
> Yes, that is asking a lot. But as trailblazer in this area, you have to do the
> hard work.

None of the new features in rte_flow, rte_ethdev or rte_tm etc NOT
going in this path.
If we think, we need to enforce this path(SW driver is a must) for any
feature. Let's discuss here and TB meeting and decide and
we need to make that policy for any new contribution.



>
  

Patch

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 21bedb743f..ffcc2bcc92 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -727,6 +727,18 @@  Supports configuring per-queue stat counter mapping.
   ``rte_eth_dev_set_tx_queue_stats_mapping()``.
 
 
+.. _nic_features_congestion_management
+
+Congestion management
+---------------------
+
+Supports congestion management.
+
+* **[implements] eth_dev_ops**: ``cman_info_get``, ``cman_config_set``, ``cman_config_get``.
+* **[related]    API**: ``rte_eth_cman_info_get()``, ``rte_eth_cman_config_init()``,
+  ``rte_eth_cman_config_set()``, ``rte_eth_cman_config_get()``.
+
+
 .. _nic_features_fw_version:
 
 FW version
diff --git a/doc/guides/nics/features/default.ini b/doc/guides/nics/features/default.ini
index b1d18ac62c..060497e41f 100644
--- a/doc/guides/nics/features/default.ini
+++ b/doc/guides/nics/features/default.ini
@@ -60,6 +60,7 @@  Tx descriptor status =
 Basic stats          =
 Extended stats       =
 Stats per queue      =
+Congestion management =
 FW version           =
 EEPROM dump          =
 Module EEPROM dump   =
diff --git a/lib/eal/include/meson.build b/lib/eal/include/meson.build
index 9700494816..40e31e0e6d 100644
--- a/lib/eal/include/meson.build
+++ b/lib/eal/include/meson.build
@@ -10,6 +10,7 @@  headers += files(
         'rte_branch_prediction.h',
         'rte_bus.h',
         'rte_class.h',
+        'rte_cman.h',
         'rte_common.h',
         'rte_compat.h',
         'rte_debug.h',
diff --git a/lib/eal/include/rte_cman.h b/lib/eal/include/rte_cman.h
new file mode 100644
index 0000000000..e6dad2ebf7
--- /dev/null
+++ b/lib/eal/include/rte_cman.h
@@ -0,0 +1,51 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2022 Marvell International Ltd.
+ */
+
+#ifndef RTE_CMAN_H
+#define RTE_CMAN_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_bitops.h>
+
+/**
+ * @file
+ * Congestion management related parameters for DPDK.
+ */
+
+/** Congestion management modes */
+enum rte_cman_mode {
+	/** Congestion based on Random Early Detection.
+	 *
+	 * https://en.wikipedia.org/wiki/Random_early_detection
+	 * @see struct rte_cman_red_params
+	 */
+	RTE_CMAN_RED = RTE_BIT64(0),
+};
+
+/**
+ * RED based congestion management configuration parameters.
+ */
+struct rte_cman_red_params {
+	/**< Minimum threshold (min_th) value
+	 *
+	 * Value expressed as percentage. Value must be in 0 to 100(inclusive).
+	 */
+	uint8_t min_th;
+	/**< Maximum threshold (max_th) value
+	 *
+	 * Value expressed as percentage. Value must be in 0 to 100(inclusive).
+	 */
+	uint8_t max_th;
+	/** Inverse of packet marking probability maximum value (maxp = 1 / maxp_inv) */
+	uint16_t maxp_inv;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_CMAN_H */
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 04cff8ee10..7c8718bb2e 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -160,6 +160,7 @@  extern "C" {
 #define RTE_ETHDEV_DEBUG_TX
 #endif
 
+#include <rte_cman.h>
 #include <rte_compat.h>
 #include <rte_log.h>
 #include <rte_interrupts.h>
@@ -5452,6 +5453,150 @@  typedef struct {
 __rte_experimental
 int rte_eth_dev_priv_dump(uint16_t port_id, FILE *file);
 
+/* Congestion management */
+
+/** Enumerate list of ethdev congestion management objects */
+enum rte_eth_cman_obj {
+	/** Congestion management based on Rx queue depth */
+	RTE_ETH_CMAN_OBJ_RX_QUEUE = RTE_BIT64(0),
+	/** Congestion management based on mempool depth associated with Rx queue
+	 * @see rte_eth_rx_queue_setup()
+	 */
+	RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL = RTE_BIT64(1),
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to retrieve information of ethdev congestion management.
+ */
+struct rte_eth_cman_info {
+	/** Set of supported congestion management modes
+	 * @see enum rte_cman_mode
+	 */
+	uint64_t modes_supported;
+	/** Set of supported congestion management objects
+	 * @see enum rte_eth_cman_obj
+	 */
+	uint64_t objs_supported;
+	/** Reserved for future fields */
+	uint8_t rsvd[64];
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to configure the ethdev congestion management.
+ */
+struct rte_eth_cman_config {
+	/** Congestion management object */
+	enum rte_eth_cman_obj obj;
+	/** Congestion management mode */
+	enum rte_cman_mode mode;
+	union {
+		/** Rx queue to configure congestion management.
+		 *
+		 * Valid when object is RTE_ETH_CMAN_OBJ_RX_QUEUE or
+		 * RTE_ETH_CMAN_OBJ_RX_QUEUE_MEMPOOL.
+		 */
+		uint16_t rx_queue;
+		/** Reserved for future fields */
+		uint8_t rsvd_obj_params[16];
+	} obj_param;
+	union {
+		/** RED configuration parameters.
+		 *
+		 * Valid when mode is RTE_CMAN_RED.
+		 */
+		struct rte_cman_red_params red;
+		/** Reserved for future fields */
+		uint8_t rsvd_mode_params[64];
+	} mode_param;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Retrieve the information for ethdev congestion management
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param info
+ *   A pointer to a structure of type *rte_eth_cman_info* to be filled with
+ *   the information about congestion management.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if support for cman_info_get does not exist.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_cman_info_get(uint16_t port_id, struct rte_eth_cman_info *info);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Initialize the ethdev congestion management configuration structure with default values.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param config
+ *   A pointer to a structure of type *rte_eth_cman_config* to be initialized
+ *   with default value.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_cman_config_init(uint16_t port_id, struct rte_eth_cman_config *config);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Configure ethdev congestion management
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param config
+ *   A pointer to a structure of type *rte_eth_cman_config* to be configured.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if support for cman_config_set does not exist.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_cman_config_set(uint16_t port_id, struct rte_eth_cman_config *config);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Retrieve the applied ethdev congestion management parameters for the given port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param config
+ *   A pointer to a structure of type *rte_eth_cman_config* to retrieve
+ *   congestion management parameters for the given object.
+ *   Application must fill all parameters except mode_param parameter in
+ *   struct rte_eth_cman_config.
+ *
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if support for cman_config_get does not exist.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config);
+
 #include <rte_ethdev_core.h>
 
 /**