[v3,1/2] ethdev: support queue-based priority flow control

Message ID 20220131180859.2662034-1-jerinj@marvell.com (mailing list archive)
State Changes Requested, archived
Delegated to: Ferruh Yigit
Headers
Series [v3,1/2] ethdev: support queue-based priority flow control |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Jerin Jacob Kollanukkaran Jan. 31, 2022, 6:08 p.m. UTC
  From: Jerin Jacob <jerinj@marvell.com>

Based on device support and use-case need, there are two different ways
to enable PFC. The first case is the port level PFC configuration, in
this case, rte_eth_dev_priority_flow_ctrl_set() API shall be used to
configure the PFC, and PFC frames will be generated using based on VLAN
TC value.

The second case is the queue level PFC configuration, in this
case, Any packet field content can be used to steer the packet to the
specific queue using rte_flow or RSS and then use
rte_eth_dev_priority_flow_ctrl_queue_configure() to configure the
TC mapping on each queue.
Based on congestion selected on the specific queue, configured TC
shall be used to generate PFC frames.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
---

v2..v1:
- Introduce rte_eth_dev_priority_flow_ctrl_queue_info_get() to
avoid updates to rte_eth_dev_info
- Removed devtools/libabigail.abignore changes
- Address the comment from Ferruh in
http://patches.dpdk.org/project/dpdk/patch/20220113102718.3167282-1-jerinj@marvell.com/

 doc/guides/nics/features.rst           |   7 +-
 doc/guides/rel_notes/release_22_03.rst |   6 ++
 lib/ethdev/ethdev_driver.h             |  12 ++-
 lib/ethdev/rte_ethdev.c                | 132 +++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                |  89 +++++++++++++++++
 lib/ethdev/version.map                 |   4 +
 6 files changed, 247 insertions(+), 3 deletions(-)
  

Comments

Ferruh Yigit Feb. 3, 2022, 4 p.m. UTC | #1
On 1/31/2022 6:08 PM, jerinj@marvell.com wrote:
> From: Jerin Jacob <jerinj@marvell.com>
> 
> Based on device support and use-case need, there are two different ways
> to enable PFC. The first case is the port level PFC configuration, in
> this case, rte_eth_dev_priority_flow_ctrl_set() API shall be used to
> configure the PFC, and PFC frames will be generated using based on VLAN
> TC value.
> 
> The second case is the queue level PFC configuration, in this
> case, Any packet field content can be used to steer the packet to the
> specific queue using rte_flow or RSS and then use
> rte_eth_dev_priority_flow_ctrl_queue_configure() to configure the
> TC mapping on each queue.
> Based on congestion selected on the specific queue, configured TC
> shall be used to generate PFC frames.
> 

Hi Jerin, Sunil,

Please find below minor comments, mostly syntax issues.

> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
> ---
> 
> v2..v1:
> - Introduce rte_eth_dev_priority_flow_ctrl_queue_info_get() to
> avoid updates to rte_eth_dev_info
> - Removed devtools/libabigail.abignore changes
> - Address the comment from Ferruh in
> http://patches.dpdk.org/project/dpdk/patch/20220113102718.3167282-1-jerinj@marvell.com/
> 
>   doc/guides/nics/features.rst           |   7 +-
>   doc/guides/rel_notes/release_22_03.rst |   6 ++
>   lib/ethdev/ethdev_driver.h             |  12 ++-
>   lib/ethdev/rte_ethdev.c                | 132 +++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev.h                |  89 +++++++++++++++++
>   lib/ethdev/version.map                 |   4 +
>   6 files changed, 247 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 27be2d2576..1cacdc883a 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -379,9 +379,12 @@ Flow control
>   Supports configuring link flow control.
>   
>   * **[implements] eth_dev_ops**: ``flow_ctrl_get``, ``flow_ctrl_set``,
> -  ``priority_flow_ctrl_set``.
> +  ``priority_flow_ctrl_set``, ``priority_flow_ctrl_queue_info_get``,
> +  ``priority_flow_ctrl_queue_configure``
>   * **[related]    API**: ``rte_eth_dev_flow_ctrl_get()``, ``rte_eth_dev_flow_ctrl_set()``,
> -  ``rte_eth_dev_priority_flow_ctrl_set()``.
> +  ``rte_eth_dev_priority_flow_ctrl_set()``,
> +  ``rte_eth_dev_priority_flow_ctrl_queue_info_get()``,
> +  ``rte_eth_dev_priority_flow_ctrl_queue_configure()``.
>   
>   
>   .. _nic_features_rate_limitation:
> diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> index 3bc0630c7c..e988c104e8 100644
> --- a/doc/guides/rel_notes/release_22_03.rst
> +++ b/doc/guides/rel_notes/release_22_03.rst
> @@ -69,6 +69,12 @@ New Features
>   
>     The new API ``rte_event_eth_rx_adapter_event_port_get()`` was added.
>   
> +* **Added an API to enable queue based priority flow ctrl(PFC).**
> +
> +  New APIs, ``rte_eth_dev_priority_flow_ctrl_queue_info_get()`` and
> +  ``rte_eth_dev_priority_flow_ctrl_queue_configure()``, was added.
> +
> +
>  


Can you please move this update before ethdev driver updates.
And no need double empty lines.
  
>   Removed Items
>   -------------
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index d95605a355..320a364766 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -533,6 +533,13 @@ typedef int (*flow_ctrl_set_t)(struct rte_eth_dev *dev,
>   typedef int (*priority_flow_ctrl_set_t)(struct rte_eth_dev *dev,
>   				struct rte_eth_pfc_conf *pfc_conf);
>   
> +/** @internal Get info for queue based PFC on an Ethernet device. */
> +typedef int (*priority_flow_ctrl_queue_info_get_t)(
> +	struct rte_eth_dev *dev, struct rte_eth_pfc_queue_info *pfc_queue_info);
> +/** @internal Configure queue based PFC parameter on an Ethernet device. */
> +typedef int (*priority_flow_ctrl_queue_config_t)(
> +	struct rte_eth_dev *dev, struct rte_eth_pfc_queue_conf *pfc_queue_conf);
> +

Instead of ending line with opening parantesis '(', can you break the line after
first argument, like:

typedef int (*priority_flow_ctrl_queue_config_t)(struct rte_eth_dev *dev,
				struct rte_eth_pfc_queue_conf *pfc_queue_conf);

Same for all instances.

>   /** @internal Update RSS redirection table on an Ethernet device. */
>   typedef int (*reta_update_t)(struct rte_eth_dev *dev,
>   			     struct rte_eth_rss_reta_entry64 *reta_conf,
> @@ -1080,7 +1087,10 @@ struct eth_dev_ops {
>   	flow_ctrl_set_t            flow_ctrl_set; /**< Setup flow control */
>   	/** Setup priority flow control */
>   	priority_flow_ctrl_set_t   priority_flow_ctrl_set;
> -
> +	/** Priority flow control queue info get */
> +	priority_flow_ctrl_queue_info_get_t priority_flow_ctrl_queue_info_get;
> +	/** Priority flow control queue configure */
> +	priority_flow_ctrl_queue_config_t priority_flow_ctrl_queue_config;

Can you please keep empty line before next (hash) group?

>   	/** Set Unicast Table Array */
>   	eth_uc_hash_table_set_t    uc_hash_table_set;
>   	/** Set Unicast hash bitmap */
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index a1d475a292..2ce38cd2c5 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -4022,6 +4022,138 @@ rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
>   	return -ENOTSUP;
>   }
>   
> +static inline int

Not sure if there is a value to ask function to be 'inline', this is in control
path, only static can be enough.

> +validate_rx_pause_config(struct rte_eth_dev_info *dev_info, uint8_t tc_max,
> +			 struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> +{
> +	if ((pfc_queue_conf->mode == RTE_ETH_FC_RX_PAUSE) ||
> +	    (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {

We don't allign to paranthesis in dpdk coding convenion [1], it should be as:

if ((pfc_queue_conf->mode == RTE_ETH_FC_RX_PAUSE) ||
		(pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
	if (pfc_queue_conf->rx_pause.tx_qid >= dev_info->nb_tx_queues) {
		...
         }
}



[1]
Altough I am aware many instances sneaked in, still I think better to follow
the convention.

> +		if (pfc_queue_conf->rx_pause.tx_qid >= dev_info->nb_tx_queues) {
> +			RTE_ETHDEV_LOG(ERR, "Tx queue not in range for Rx pause"
> +				       " (requested: %d configured: %d)\n",
> +				       pfc_queue_conf->rx_pause.tx_qid,
> +				       dev_info->nb_tx_queues);


Should log mention that this is related to the "priority flow Rx queue control"?

> +			return -EINVAL;
> +		}
> +
> +		if (pfc_queue_conf->rx_pause.tc >= tc_max) {

Should we document somewhere that 'tc_max' value itself is an invalid value?

> +			RTE_ETHDEV_LOG(ERR, "TC not in range for Rx pause"
> +				       " (requested: %d max: %d)\n",
> +				       pfc_queue_conf->rx_pause.tc, tc_max);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int
> +validate_tx_pause_config(struct rte_eth_dev_info *dev_info, uint8_t tc_max,
> +			 struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> +{
> +	if ((pfc_queue_conf->mode == RTE_ETH_FC_TX_PAUSE) ||
> +	     (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
> +		if (pfc_queue_conf->tx_pause.rx_qid >= dev_info->nb_rx_queues) {
> +			RTE_ETHDEV_LOG(ERR, "Rx queue not in range for Tx pause"
> +				       "(requested: %d configured: %d)\n",
> +				       pfc_queue_conf->tx_pause.rx_qid,
> +				       dev_info->nb_rx_queues);
> +			return -EINVAL;
> +		}
> +
> +		if (pfc_queue_conf->tx_pause.tc >= tc_max) {
> +			RTE_ETHDEV_LOG(ERR, "TC not in range for Tx pause"
> +				       "(requested: %d max: %d)\n",
> +				       pfc_queue_conf->tx_pause.tc, tc_max);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int
> +rte_eth_dev_priority_flow_ctrl_queue_info_get(
> +	uint16_t port_id, struct rte_eth_pfc_queue_info *pfc_queue_info)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (pfc_queue_info == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "PFC info param is NULL for port (%u)\n",
> +			       port_id);
> +		return -EINVAL;
> +	}
> +
> +	if (*dev->dev_ops->priority_flow_ctrl_queue_info_get)
> +		return eth_err(port_id,
> +			       (*dev->dev_ops->priority_flow_ctrl_queue_info_get)(
> +				       dev, pfc_queue_info));
> +	return -ENOTSUP;
> +}
> +
> +int
> +rte_eth_dev_priority_flow_ctrl_queue_configure(
> +	uint16_t port_id, struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> +{
> +	struct rte_eth_pfc_queue_info pfc_info;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_dev *dev;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (pfc_queue_conf == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "PFC parameters are NULL for port (%u)\n",
> +			       port_id);
> +		return -EINVAL;
> +	}
> +
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	ret = rte_eth_dev_priority_flow_ctrl_queue_info_get(port_id, &pfc_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	if (pfc_info.capa == 0) {
> +		RTE_ETHDEV_LOG(ERR, "Ethdev port %u does not support PFC\n",
> +			port_id);
> +		return -ENOTSUP;
> +	}
> +
> +	if (pfc_info.tc_max == 0) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Ethdev port %u does not support PFC TC values\n",
> +			port_id);
> +		return -ENOTSUP;
> +	}
> +
> +	if (pfc_info.capa & RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE) {
> +		ret = validate_rx_pause_config(&dev_info, pfc_info.tc_max,
> +					       pfc_queue_conf);

There is capablilty flags for RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE and RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE
also there is config flags RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_TX_PAUSE and RTE_ETH_FC_FULL


What should happen if driver only support RX_PAUSE but app config request only
TX_PAUSE?
As far as can see with current code it pass the validation, but should it?


> +		if (ret != 0)
> +			return ret;
> +	}
> +
> +	if (pfc_info.capa & RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE) {
> +		ret = validate_tx_pause_config(&dev_info, pfc_info.tc_max,
> +					       pfc_queue_conf);

syntax, please don't align to paranthesis

> +		if (ret != 0)
> +			return ret;
> +	}
> +
> +	if (*dev->dev_ops->priority_flow_ctrl_queue_config)
> +		return eth_err(port_id,
> +			       (*dev->dev_ops->priority_flow_ctrl_queue_config)(
> +				       dev, pfc_queue_conf));
> +	return -ENOTSUP;
> +}
> +
>   static int
>   eth_check_reta_mask(struct rte_eth_rss_reta_entry64 *reta_conf,
>   			uint16_t reta_size)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index fa299c8ad7..383ad1cdd7 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1395,6 +1395,48 @@ struct rte_eth_pfc_conf {
>   	uint8_t priority;          /**< VLAN User Priority. */
>   };
>   
> +/** Device supports Rx pause for queue based PFC. */
> +#define RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE RTE_BIT64(0)
> +/** Device supports Tx pause for queue based PFC. */
> +#define RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE RTE_BIT64(1)
> +

There is already control flow mode enum 'enum rte_eth_fc_mode', those
enum items use FC as abrivation (RTE_ETH_FC_RX_PAUSE), above ones use PFC,
should they use same prefix 'RTE_ETH_FC_' for consistency?

And overall, what for struct and functins too, is the correct abreviation
'pfc' or 'fc', since old code has 'fc' as far as I see?

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to retrieve information of queue based PFC.
> + */
> +struct rte_eth_pfc_queue_info {
> +	/**
> +	 * Maximum supported traffic class as per PFC (802.1Qbb) specification.

Will it be redundant to say valid value should be bigger than 0?

> +	 */
> +	uint8_t tc_max;
> +	/** PFC queue capabilities (RTE_ETH_PFC_QUEUE_CAPA_). */

Can move doxygen comments to same line if they fit to 80 char limit:
	uint64_t capa; /**< PFC queue capabilities (RTE_ETH_PFC_QUEUE_CAPA_). */

> +	uint64_t capa;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * A structure used to configure Ethernet priority flow control parameter for
> + * ethdev queues.
> + */
> +struct rte_eth_pfc_queue_conf {
> +	enum rte_eth_fc_mode mode; /**< Link flow control mode */
> +
> +	struct {
> +		uint16_t tx_qid; /**< Tx queue ID */

'tx_qid' within 'rx_pause' struct, this seems done intentionally but just to double
check, can you please describe here the intendent usage?

> +		uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
> +	} rx_pause; /* Valid when (mode == FC_RX_PAUSE || mode == FC_FULL) */
> +
> +	struct {
> +		uint16_t pause_time; /**< Pause quota in the Pause frame */
> +		uint16_t rx_qid;     /**< Rx queue ID */
> +		uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
> +	} tx_pause; /* Valid when (mode == FC_TX_PAUSE || mode == FC_FULL) */
> +};
> +
>   /**
>    * Tunnel type for device-specific classifier configuration.
>    * @see rte_eth_udp_tunnel
> @@ -4144,6 +4186,53 @@ int rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
>   int rte_eth_dev_mac_addr_add(uint16_t port_id, struct rte_ether_addr *mac_addr,
>   				uint32_t pool);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve the information for queue based PFC.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param pfc_queue_info
> + *   A pointer to a structure of type *rte_eth_pfc_queue_info* to be filled with
> + *   the information about queue based PFC.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if support for priority_flow_ctrl_queue_info_get does not exist.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_dev_priority_flow_ctrl_queue_info_get(uint16_t port_id,
> +				   struct rte_eth_pfc_queue_info *pfc_queue_info);
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Configure the queue based priority flow control for a given queue
> + * for Ethernet device.
> + *
> + * @note When an ethdev port switches to queue based PFC mode, the
> + * unconfigured queues shall be configured by the driver with
> + * default values such as lower priority value for TC etc.
> + *

May be good to document it has dependency to 'rte_eth_dev_info_get()' API?

> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param pfc_queue_conf
> + *   The pointer to the structure of the priority flow control parameters
> + *   for the queue.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support queue based PFC mode.
> + *   - (-ENODEV)  if *port_id* invalid.
> + *   - (-EINVAL)  if bad parameter
> + *   - (-EIO)     if flow control setup queue failure
> + */
> +__rte_experimental
> +int rte_eth_dev_priority_flow_ctrl_queue_configure(uint16_t port_id,
> +			      struct rte_eth_pfc_queue_conf *pfc_queue_conf);
> +
>   /**
>    * Remove a MAC address from the internal array of addresses.
>    *
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index c2fb0669a4..49523ebc45 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -256,6 +256,10 @@ EXPERIMENTAL {
>   	rte_flow_flex_item_create;
>   	rte_flow_flex_item_release;
>   	rte_flow_pick_transfer_proxy;
> +
> +	# added in 22.03
> +	rte_eth_dev_priority_flow_ctrl_queue_configure;
> +	rte_eth_dev_priority_flow_ctrl_queue_info_get;
>   };
>   
>   INTERNAL {
  
Jerin Jacob Feb. 7, 2022, 7:23 a.m. UTC | #2
)


On Thu, Feb 3, 2022 at 9:31 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>
> On 1/31/2022 6:08 PM, jerinj@marvell.com wrote:
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > Based on device support and use-case need, there are two different ways
> > to enable PFC. The first case is the port level PFC configuration, in
> > this case, rte_eth_dev_priority_flow_ctrl_set() API shall be used to
> > configure the PFC, and PFC frames will be generated using based on VLAN
> > TC value.
> >
> > The second case is the queue level PFC configuration, in this
> > case, Any packet field content can be used to steer the packet to the
> > specific queue using rte_flow or RSS and then use
> > rte_eth_dev_priority_flow_ctrl_queue_configure() to configure the
> > TC mapping on each queue.
> > Based on congestion selected on the specific queue, configured TC
> > shall be used to generate PFC frames.
> >
>
> Hi Jerin, Sunil,
>
> Please find below minor comments, mostly syntax issues.
>
> > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
> > ---
> >
> > v2..v1:
> > - Introduce rte_eth_dev_priority_flow_ctrl_queue_info_get() to
> > avoid updates to rte_eth_dev_info
> > - Removed devtools/libabigail.abignore changes
> > - Address the comment from Ferruh in
> > http://patches.dpdk.org/project/dpdk/patch/20220113102718.3167282-1-jerinj@marvell.com/
> >
> >   doc/guides/nics/features.rst           |   7 +-
> >   doc/guides/rel_notes/release_22_03.rst |   6 ++
> >   lib/ethdev/ethdev_driver.h             |  12 ++-
> >   lib/ethdev/rte_ethdev.c                | 132 +++++++++++++++++++++++++
> >   lib/ethdev/rte_ethdev.h                |  89 +++++++++++++++++
> >   lib/ethdev/version.map                 |   4 +
> >   6 files changed, 247 insertions(+), 3 deletions(-)
> >
> > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > index 27be2d2576..1cacdc883a 100644
> > --- a/doc/guides/nics/features.rst
> > +++ b/doc/guides/nics/features.rst
> > @@ -379,9 +379,12 @@ Flow control
> >   Supports configuring link flow control.
> >
> >   * **[implements] eth_dev_ops**: ``flow_ctrl_get``, ``flow_ctrl_set``,
> > -  ``priority_flow_ctrl_set``.
> > +  ``priority_flow_ctrl_set``, ``priority_flow_ctrl_queue_info_get``,
> > +  ``priority_flow_ctrl_queue_configure``
> >   * **[related]    API**: ``rte_eth_dev_flow_ctrl_get()``, ``rte_eth_dev_flow_ctrl_set()``,
> > -  ``rte_eth_dev_priority_flow_ctrl_set()``.
> > +  ``rte_eth_dev_priority_flow_ctrl_set()``,
> > +  ``rte_eth_dev_priority_flow_ctrl_queue_info_get()``,
> > +  ``rte_eth_dev_priority_flow_ctrl_queue_configure()``.
> >
> >
> >   .. _nic_features_rate_limitation:
> > diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> > index 3bc0630c7c..e988c104e8 100644
> > --- a/doc/guides/rel_notes/release_22_03.rst
> > +++ b/doc/guides/rel_notes/release_22_03.rst
> > @@ -69,6 +69,12 @@ New Features
> >
> >     The new API ``rte_event_eth_rx_adapter_event_port_get()`` was added.
> >
> > +* **Added an API to enable queue based priority flow ctrl(PFC).**
> > +
> > +  New APIs, ``rte_eth_dev_priority_flow_ctrl_queue_info_get()`` and
> > +  ``rte_eth_dev_priority_flow_ctrl_queue_configure()``, was added.
> > +
> > +
> >
>
>
> Can you please move this update before ethdev driver updates.
> And no need double empty lines.


Ack


>
> >   Removed Items
> >   -------------
> > diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > index d95605a355..320a364766 100644
> > --- a/lib/ethdev/ethdev_driver.h
> > +++ b/lib/ethdev/ethdev_driver.h
> > @@ -533,6 +533,13 @@ typedef int (*flow_ctrl_set_t)(struct rte_eth_dev *dev,
> >   typedef int (*priority_flow_ctrl_set_t)(struct rte_eth_dev *dev,
> >                               struct rte_eth_pfc_conf *pfc_conf);
> >
> > +/** @internal Get info for queue based PFC on an Ethernet device. */
> > +typedef int (*priority_flow_ctrl_queue_info_get_t)(
> > +     struct rte_eth_dev *dev, struct rte_eth_pfc_queue_info *pfc_queue_info);
> > +/** @internal Configure queue based PFC parameter on an Ethernet device. */
> > +typedef int (*priority_flow_ctrl_queue_config_t)(
> > +     struct rte_eth_dev *dev, struct rte_eth_pfc_queue_conf *pfc_queue_conf);
> > +
>
> Instead of ending line with opening parantesis '(', can you break the line after
> first argument, like:
>
> typedef int (*priority_flow_ctrl_queue_config_t)(struct rte_eth_dev *dev,
>                                 struct rte_eth_pfc_queue_conf *pfc_queue_conf);


Ack

>
> Same for all instances.
>
> >   /** @internal Update RSS redirection table on an Ethernet device. */
> >   typedef int (*reta_update_t)(struct rte_eth_dev *dev,
> >                            struct rte_eth_rss_reta_entry64 *reta_conf,
> > @@ -1080,7 +1087,10 @@ struct eth_dev_ops {
> >       flow_ctrl_set_t            flow_ctrl_set; /**< Setup flow control */
> >       /** Setup priority flow control */
> >       priority_flow_ctrl_set_t   priority_flow_ctrl_set;
> > -
> > +     /** Priority flow control queue info get */
> > +     priority_flow_ctrl_queue_info_get_t priority_flow_ctrl_queue_info_get;
> > +     /** Priority flow control queue configure */
> > +     priority_flow_ctrl_queue_config_t priority_flow_ctrl_queue_config;
>
> Can you please keep empty line before next (hash) group?


Ack

>
> >       /** Set Unicast Table Array */
> >       eth_uc_hash_table_set_t    uc_hash_table_set;
> >       /** Set Unicast hash bitmap */
> > diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> > index a1d475a292..2ce38cd2c5 100644
> > --- a/lib/ethdev/rte_ethdev.c
> > +++ b/lib/ethdev/rte_ethdev.c
> > @@ -4022,6 +4022,138 @@ rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
> >       return -ENOTSUP;
> >   }
> >
> > +static inline int
>
> Not sure if there is a value to ask function to be 'inline', this is in control
> path, only static can be enough.

Ack

>
> > +validate_rx_pause_config(struct rte_eth_dev_info *dev_info, uint8_t tc_max,
> > +                      struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> > +{
> > +     if ((pfc_queue_conf->mode == RTE_ETH_FC_RX_PAUSE) ||
> > +         (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
>
> We don't allign to paranthesis in dpdk coding convenion [1], it should be as:
>
> if ((pfc_queue_conf->mode == RTE_ETH_FC_RX_PAUSE) ||
>                 (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
>         if (pfc_queue_conf->rx_pause.tx_qid >= dev_info->nb_tx_queues) {
>                 ...
>          }
> }

Ack

>
>
> [1]
> Altough I am aware many instances sneaked in, still I think better to follow
> the convention.
>
> > +             if (pfc_queue_conf->rx_pause.tx_qid >= dev_info->nb_tx_queues) {
> > +                     RTE_ETHDEV_LOG(ERR, "Tx queue not in range for Rx pause"
> > +                                    " (requested: %d configured: %d)\n",
> > +                                    pfc_queue_conf->rx_pause.tx_qid,
> > +                                    dev_info->nb_tx_queues);
>
>
> Should log mention that this is related to the "priority flow Rx queue control"?

Prepended "PFC" in the log message


>
> > +                     return -EINVAL;
> > +             }
> > +
> > +             if (pfc_queue_conf->rx_pause.tc >= tc_max) {
>
> Should we document somewhere that 'tc_max' value itself is an invalid value?

Updated the documentation to following

        struct {
                uint16_t tx_qid; /**< Tx queue ID */
-               uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
+               uint8_t tc;
+               /**< Traffic class as per PFC (802.1Qbb) spec. The value must be
+                * in the range [0, rte_eth_pfc_queue_info::tx_max - 1]
+                */


>
> > +                     RTE_ETHDEV_LOG(ERR, "TC not in range for Rx pause"
> > +                                    " (requested: %d max: %d)\n",
> > +                                    pfc_queue_conf->rx_pause.tc, tc_max);
> > +                     return -EINVAL;
> > +             }
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static inline int
> > +validate_tx_pause_config(struct rte_eth_dev_info *dev_info, uint8_t tc_max,
> > +                      struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> > +{
> > +     if ((pfc_queue_conf->mode == RTE_ETH_FC_TX_PAUSE) ||
> > +          (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
> > +             if (pfc_queue_conf->tx_pause.rx_qid >= dev_info->nb_rx_queues) {
> > +                     RTE_ETHDEV_LOG(ERR, "Rx queue not in range for Tx pause"
> > +                                    "(requested: %d configured: %d)\n",
> > +                                    pfc_queue_conf->tx_pause.rx_qid,
> > +                                    dev_info->nb_rx_queues);
> > +                     return -EINVAL;
> > +             }
> > +
> > +             if (pfc_queue_conf->tx_pause.tc >= tc_max) {
> > +                     RTE_ETHDEV_LOG(ERR, "TC not in range for Tx pause"
> > +                                    "(requested: %d max: %d)\n",
> > +                                    pfc_queue_conf->tx_pause.tc, tc_max);
> > +                     return -EINVAL;
> > +             }
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +int
> > +rte_eth_dev_priority_flow_ctrl_queue_info_get(
> > +     uint16_t port_id, struct rte_eth_pfc_queue_info *pfc_queue_info)
> > +{
> > +     struct rte_eth_dev *dev;
> > +
> > +     RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +     dev = &rte_eth_devices[port_id];
> > +
> > +     if (pfc_queue_info == NULL) {
> > +             RTE_ETHDEV_LOG(ERR, "PFC info param is NULL for port (%u)\n",
> > +                            port_id);
> > +             return -EINVAL;
> > +     }
> > +
> > +     if (*dev->dev_ops->priority_flow_ctrl_queue_info_get)
> > +             return eth_err(port_id,
> > +                            (*dev->dev_ops->priority_flow_ctrl_queue_info_get)(
> > +                                    dev, pfc_queue_info));
> > +     return -ENOTSUP;
> > +}
> > +
> > +int
> > +rte_eth_dev_priority_flow_ctrl_queue_configure(
> > +     uint16_t port_id, struct rte_eth_pfc_queue_conf *pfc_queue_conf)
> > +{
> > +     struct rte_eth_pfc_queue_info pfc_info;
> > +     struct rte_eth_dev_info dev_info;
> > +     struct rte_eth_dev *dev;
> > +     int ret;
> > +
> > +     RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> > +     dev = &rte_eth_devices[port_id];
> > +
> > +     if (pfc_queue_conf == NULL) {
> > +             RTE_ETHDEV_LOG(ERR, "PFC parameters are NULL for port (%u)\n",
> > +                            port_id);
> > +             return -EINVAL;
> > +     }
> > +
> > +     ret = rte_eth_dev_info_get(port_id, &dev_info);
> > +     if (ret != 0)
> > +             return ret;
> > +
> > +     ret = rte_eth_dev_priority_flow_ctrl_queue_info_get(port_id, &pfc_info);
> > +     if (ret != 0)
> > +             return ret;
> > +
> > +     if (pfc_info.capa == 0) {
> > +             RTE_ETHDEV_LOG(ERR, "Ethdev port %u does not support PFC\n",
> > +                     port_id);
> > +             return -ENOTSUP;
> > +     }
> > +
> > +     if (pfc_info.tc_max == 0) {
> > +             RTE_ETHDEV_LOG(ERR,
> > +                     "Ethdev port %u does not support PFC TC values\n",
> > +                     port_id);
> > +             return -ENOTSUP;
> > +     }
> > +
> > +     if (pfc_info.capa & RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE) {
> > +             ret = validate_rx_pause_config(&dev_info, pfc_info.tc_max,
> > +                                            pfc_queue_conf);
>
> There is capablilty flags for RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE and RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE
> also there is config flags RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_TX_PAUSE and RTE_ETH_FC_FULL

Removed that and reused rte_eth_fc_mode

-/** Device supports Rx pause for queue based PFC. */
-#define RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE RTE_BIT64(0)
-/** Device supports Tx pause for queue based PFC. */
-#define RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE RTE_BIT64(1)
-
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
@@ -1424,8 +1419,8 @@ struct rte_eth_pfc_queue_info {
         * Maximum supported traffic class as per PFC (802.1Qbb) specification.
         */
        uint8_t tc_max;
-       /** PFC queue capabilities (RTE_ETH_PFC_QUEUE_CAPA_). */
-       uint64_t capa;
+       /** PFC queue mode capabilities. */
+       enum rte_eth_fc_mode mode_capa;
 };



>
>
> What should happen if driver only support RX_PAUSE but app config request only
> TX_PAUSE?
> As far as can see with current code it pass the validation, but should it?

Yes. Good catch. Added

        /* Check requested mode supported or not */
        if (pfc_info.mode_capa == RTE_ETH_FC_RX_PAUSE &&
                        pfc_queue_conf->mode == RTE_ETH_FC_TX_PAUSE) {
                RTE_ETHDEV_LOG(ERR, "PFC Tx pause unsupported for port (%d)\n",
                               port_id);
                return -EINVAL;
        }

        if (pfc_info.mode_capa == RTE_ETH_FC_TX_PAUSE &&
                        pfc_queue_conf->mode == RTE_ETH_FC_RX_PAUSE) {
                RTE_ETHDEV_LOG(ERR, "PFC Rx pause unsupported for port (%d)\n",
                               port_id);
                return -EINVAL;
        }


>
>
> > +             if (ret != 0)
> > +                     return ret;
> > +     }
> > +
> > +     if (pfc_info.capa & RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE) {
> > +             ret = validate_tx_pause_config(&dev_info, pfc_info.tc_max,
> > +                                            pfc_queue_conf);
>
> syntax, please don't align to paranthesis
>
> > +             if (ret != 0)
> > +                     return ret;
> > +     }
> > +
> > +     if (*dev->dev_ops->priority_flow_ctrl_queue_config)
> > +             return eth_err(port_id,
> > +                            (*dev->dev_ops->priority_flow_ctrl_queue_config)(
> > +                                    dev, pfc_queue_conf));
> > +     return -ENOTSUP;
> > +}
> > +
> >   static int
> >   eth_check_reta_mask(struct rte_eth_rss_reta_entry64 *reta_conf,
> >                       uint16_t reta_size)
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index fa299c8ad7..383ad1cdd7 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1395,6 +1395,48 @@ struct rte_eth_pfc_conf {
> >       uint8_t priority;          /**< VLAN User Priority. */
> >   };
> >
> > +/** Device supports Rx pause for queue based PFC. */
> > +#define RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE RTE_BIT64(0)
> > +/** Device supports Tx pause for queue based PFC. */
> > +#define RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE RTE_BIT64(1)
> > +
>
> There is already control flow mode enum 'enum rte_eth_fc_mode', those
> enum items use FC as abrivation (RTE_ETH_FC_RX_PAUSE), above ones use PFC,
> should they use same prefix 'RTE_ETH_FC_' for consistency?
>
> And overall, what for struct and functins too, is the correct abreviation
> 'pfc' or 'fc', since old code has 'fc' as far as I see?

Removed RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE and reused enum rte_eth_fc_mode
>
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * A structure used to retrieve information of queue based PFC.
> > + */
> > +struct rte_eth_pfc_queue_info {
> > +     /**
> > +      * Maximum supported traffic class as per PFC (802.1Qbb) specification.
>
> Will it be redundant to say valid value should be bigger than 0?

Updated the document as:
@@ -1440,13 +1435,19 @@ struct rte_eth_pfc_queue_conf {

        struct {
                uint16_t tx_qid; /**< Tx queue ID */
-               uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
+               uint8_t tc;
+               /**< Traffic class as per PFC (802.1Qbb) spec. The value must be
+                * in the range [0, rte_eth_pfc_queue_info::tx_max - 1]
+                */



>
> > +      */
> > +     uint8_t tc_max;
> > +     /** PFC queue capabilities (RTE_ETH_PFC_QUEUE_CAPA_). */
>
> Can move doxygen comments to same line if they fit to 80 char limit:
>         uint64_t capa; /**< PFC queue capabilities (RTE_ETH_PFC_QUEUE_CAPA_). */
>
> > +     uint64_t capa;
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * A structure used to configure Ethernet priority flow control parameter for
> > + * ethdev queues.
> > + */
> > +struct rte_eth_pfc_queue_conf {
> > +     enum rte_eth_fc_mode mode; /**< Link flow control mode */
> > +
> > +     struct {
> > +             uint16_t tx_qid; /**< Tx queue ID */
>
> 'tx_qid' within 'rx_pause' struct, this seems done intentionally but just to double
> check, can you please describe here the intendent usage?


Clarified the doc as

@@ -1429,6 +1429,16 @@ struct rte_eth_pfc_queue_info {
  *
  * A structure used to configure Ethernet priority flow control parameter for
  * ethdev queues.
+ *
+ * rte_eth_pfc_queue_conf::rx_pause structure shall used to configure given
+ * tx_qid with corresponding tc. When ethdev device receives PFC frame with
+ * rte_eth_pfc_queue_conf::rx_pause::tc, traffic will be paused on
+ * rte_eth_pfc_queue_conf::rx_pause::tx_qid for that tc.
+ *
+ * rte_eth_pfc_queue_conf::tx_pause structure shall used to configure given
+ * rx_qid. When rx_qid is congested, PFC frames are generated with
+ * rte_eth_pfc_queue_conf::rx_pause::tc and
+ * rte_eth_pfc_queue_conf::rx_pause::pause_time to the peer.

>
> > +             uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
> > +     } rx_pause; /* Valid when (mode == FC_RX_PAUSE || mode == FC_FULL) */
> > +
> > +     struct {
> > +             uint16_t pause_time; /**< Pause quota in the Pause frame */
> > +             uint16_t rx_qid;     /**< Rx queue ID */
> > +             uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
> > +     } tx_pause; /* Valid when (mode == FC_TX_PAUSE || mode == FC_FULL) */
> > +};
> > +
> >   /**
> >    * Tunnel type for device-specific classifier configuration.
> >    * @see rte_eth_udp_tunnel
> > @@ -4144,6 +4186,53 @@ int rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
> >   int rte_eth_dev_mac_addr_add(uint16_t port_id, struct rte_ether_addr *mac_addr,
> >                               uint32_t pool);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Retrieve the information for queue based PFC.
> > + *
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param pfc_queue_info
> > + *   A pointer to a structure of type *rte_eth_pfc_queue_info* to be filled with
> > + *   the information about queue based PFC.
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if support for priority_flow_ctrl_queue_info_get does not exist.
> > + *   - (-ENODEV) if *port_id* invalid.
> > + *   - (-EINVAL) if bad parameter.
> > + */
> > +__rte_experimental
> > +int rte_eth_dev_priority_flow_ctrl_queue_info_get(uint16_t port_id,
> > +                                struct rte_eth_pfc_queue_info *pfc_queue_info);
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Configure the queue based priority flow control for a given queue
> > + * for Ethernet device.
> > + *
> > + * @note When an ethdev port switches to queue based PFC mode, the
> > + * unconfigured queues shall be configured by the driver with
> > + * default values such as lower priority value for TC etc.
> > + *
>
> May be good to document it has dependency to 'rte_eth_dev_info_get()' API?

There is no dependency to rte_eth_dev_info_get() in the spec.

>
> > + * @param port_id
> > + *   The port identifier of the Ethernet device.
> > + * @param pfc_queue_conf
> > + *   The pointer to the structure of the priority flow control parameters
> > + *   for the queue.
> > + * @return
> > + *   - (0) if successful.
> > + *   - (-ENOTSUP) if hardware doesn't support queue based PFC mode.
> > + *   - (-ENODEV)  if *port_id* invalid.
> > + *   - (-EINVAL)  if bad parameter
> > + *   - (-EIO)     if flow control setup queue failure
> > + */
> > +__rte_experimental
> > +int rte_eth_dev_priority_flow_ctrl_queue_configure(uint16_t port_id,
> > +                           struct rte_eth_pfc_queue_conf *pfc_queue_conf);
> > +
> >   /**
> >    * Remove a MAC address from the internal array of addresses.
> >    *
> > diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> > index c2fb0669a4..49523ebc45 100644
> > --- a/lib/ethdev/version.map
> > +++ b/lib/ethdev/version.map
> > @@ -256,6 +256,10 @@ EXPERIMENTAL {
> >       rte_flow_flex_item_create;
> >       rte_flow_flex_item_release;
> >       rte_flow_pick_transfer_proxy;
> > +
> > +     # added in 22.03
> > +     rte_eth_dev_priority_flow_ctrl_queue_configure;
> > +     rte_eth_dev_priority_flow_ctrl_queue_info_get;
> >   };
> >
> >   INTERNAL {
>
  

Patch

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 27be2d2576..1cacdc883a 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -379,9 +379,12 @@  Flow control
 Supports configuring link flow control.
 
 * **[implements] eth_dev_ops**: ``flow_ctrl_get``, ``flow_ctrl_set``,
-  ``priority_flow_ctrl_set``.
+  ``priority_flow_ctrl_set``, ``priority_flow_ctrl_queue_info_get``,
+  ``priority_flow_ctrl_queue_configure``
 * **[related]    API**: ``rte_eth_dev_flow_ctrl_get()``, ``rte_eth_dev_flow_ctrl_set()``,
-  ``rte_eth_dev_priority_flow_ctrl_set()``.
+  ``rte_eth_dev_priority_flow_ctrl_set()``,
+  ``rte_eth_dev_priority_flow_ctrl_queue_info_get()``,
+  ``rte_eth_dev_priority_flow_ctrl_queue_configure()``.
 
 
 .. _nic_features_rate_limitation:
diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
index 3bc0630c7c..e988c104e8 100644
--- a/doc/guides/rel_notes/release_22_03.rst
+++ b/doc/guides/rel_notes/release_22_03.rst
@@ -69,6 +69,12 @@  New Features
 
   The new API ``rte_event_eth_rx_adapter_event_port_get()`` was added.
 
+* **Added an API to enable queue based priority flow ctrl(PFC).**
+
+  New APIs, ``rte_eth_dev_priority_flow_ctrl_queue_info_get()`` and
+  ``rte_eth_dev_priority_flow_ctrl_queue_configure()``, was added.
+
+
 
 Removed Items
 -------------
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index d95605a355..320a364766 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -533,6 +533,13 @@  typedef int (*flow_ctrl_set_t)(struct rte_eth_dev *dev,
 typedef int (*priority_flow_ctrl_set_t)(struct rte_eth_dev *dev,
 				struct rte_eth_pfc_conf *pfc_conf);
 
+/** @internal Get info for queue based PFC on an Ethernet device. */
+typedef int (*priority_flow_ctrl_queue_info_get_t)(
+	struct rte_eth_dev *dev, struct rte_eth_pfc_queue_info *pfc_queue_info);
+/** @internal Configure queue based PFC parameter on an Ethernet device. */
+typedef int (*priority_flow_ctrl_queue_config_t)(
+	struct rte_eth_dev *dev, struct rte_eth_pfc_queue_conf *pfc_queue_conf);
+
 /** @internal Update RSS redirection table on an Ethernet device. */
 typedef int (*reta_update_t)(struct rte_eth_dev *dev,
 			     struct rte_eth_rss_reta_entry64 *reta_conf,
@@ -1080,7 +1087,10 @@  struct eth_dev_ops {
 	flow_ctrl_set_t            flow_ctrl_set; /**< Setup flow control */
 	/** Setup priority flow control */
 	priority_flow_ctrl_set_t   priority_flow_ctrl_set;
-
+	/** Priority flow control queue info get */
+	priority_flow_ctrl_queue_info_get_t priority_flow_ctrl_queue_info_get;
+	/** Priority flow control queue configure */
+	priority_flow_ctrl_queue_config_t priority_flow_ctrl_queue_config;
 	/** Set Unicast Table Array */
 	eth_uc_hash_table_set_t    uc_hash_table_set;
 	/** Set Unicast hash bitmap */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index a1d475a292..2ce38cd2c5 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -4022,6 +4022,138 @@  rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
 	return -ENOTSUP;
 }
 
+static inline int
+validate_rx_pause_config(struct rte_eth_dev_info *dev_info, uint8_t tc_max,
+			 struct rte_eth_pfc_queue_conf *pfc_queue_conf)
+{
+	if ((pfc_queue_conf->mode == RTE_ETH_FC_RX_PAUSE) ||
+	    (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
+		if (pfc_queue_conf->rx_pause.tx_qid >= dev_info->nb_tx_queues) {
+			RTE_ETHDEV_LOG(ERR, "Tx queue not in range for Rx pause"
+				       " (requested: %d configured: %d)\n",
+				       pfc_queue_conf->rx_pause.tx_qid,
+				       dev_info->nb_tx_queues);
+			return -EINVAL;
+		}
+
+		if (pfc_queue_conf->rx_pause.tc >= tc_max) {
+			RTE_ETHDEV_LOG(ERR, "TC not in range for Rx pause"
+				       " (requested: %d max: %d)\n",
+				       pfc_queue_conf->rx_pause.tc, tc_max);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static inline int
+validate_tx_pause_config(struct rte_eth_dev_info *dev_info, uint8_t tc_max,
+			 struct rte_eth_pfc_queue_conf *pfc_queue_conf)
+{
+	if ((pfc_queue_conf->mode == RTE_ETH_FC_TX_PAUSE) ||
+	     (pfc_queue_conf->mode == RTE_ETH_FC_FULL)) {
+		if (pfc_queue_conf->tx_pause.rx_qid >= dev_info->nb_rx_queues) {
+			RTE_ETHDEV_LOG(ERR, "Rx queue not in range for Tx pause"
+				       "(requested: %d configured: %d)\n",
+				       pfc_queue_conf->tx_pause.rx_qid,
+				       dev_info->nb_rx_queues);
+			return -EINVAL;
+		}
+
+		if (pfc_queue_conf->tx_pause.tc >= tc_max) {
+			RTE_ETHDEV_LOG(ERR, "TC not in range for Tx pause"
+				       "(requested: %d max: %d)\n",
+				       pfc_queue_conf->tx_pause.tc, tc_max);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+int
+rte_eth_dev_priority_flow_ctrl_queue_info_get(
+	uint16_t port_id, struct rte_eth_pfc_queue_info *pfc_queue_info)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (pfc_queue_info == NULL) {
+		RTE_ETHDEV_LOG(ERR, "PFC info param is NULL for port (%u)\n",
+			       port_id);
+		return -EINVAL;
+	}
+
+	if (*dev->dev_ops->priority_flow_ctrl_queue_info_get)
+		return eth_err(port_id,
+			       (*dev->dev_ops->priority_flow_ctrl_queue_info_get)(
+				       dev, pfc_queue_info));
+	return -ENOTSUP;
+}
+
+int
+rte_eth_dev_priority_flow_ctrl_queue_configure(
+	uint16_t port_id, struct rte_eth_pfc_queue_conf *pfc_queue_conf)
+{
+	struct rte_eth_pfc_queue_info pfc_info;
+	struct rte_eth_dev_info dev_info;
+	struct rte_eth_dev *dev;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (pfc_queue_conf == NULL) {
+		RTE_ETHDEV_LOG(ERR, "PFC parameters are NULL for port (%u)\n",
+			       port_id);
+		return -EINVAL;
+	}
+
+	ret = rte_eth_dev_info_get(port_id, &dev_info);
+	if (ret != 0)
+		return ret;
+
+	ret = rte_eth_dev_priority_flow_ctrl_queue_info_get(port_id, &pfc_info);
+	if (ret != 0)
+		return ret;
+
+	if (pfc_info.capa == 0) {
+		RTE_ETHDEV_LOG(ERR, "Ethdev port %u does not support PFC\n",
+			port_id);
+		return -ENOTSUP;
+	}
+
+	if (pfc_info.tc_max == 0) {
+		RTE_ETHDEV_LOG(ERR,
+			"Ethdev port %u does not support PFC TC values\n",
+			port_id);
+		return -ENOTSUP;
+	}
+
+	if (pfc_info.capa & RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE) {
+		ret = validate_rx_pause_config(&dev_info, pfc_info.tc_max,
+					       pfc_queue_conf);
+		if (ret != 0)
+			return ret;
+	}
+
+	if (pfc_info.capa & RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE) {
+		ret = validate_tx_pause_config(&dev_info, pfc_info.tc_max,
+					       pfc_queue_conf);
+		if (ret != 0)
+			return ret;
+	}
+
+	if (*dev->dev_ops->priority_flow_ctrl_queue_config)
+		return eth_err(port_id,
+			       (*dev->dev_ops->priority_flow_ctrl_queue_config)(
+				       dev, pfc_queue_conf));
+	return -ENOTSUP;
+}
+
 static int
 eth_check_reta_mask(struct rte_eth_rss_reta_entry64 *reta_conf,
 			uint16_t reta_size)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index fa299c8ad7..383ad1cdd7 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1395,6 +1395,48 @@  struct rte_eth_pfc_conf {
 	uint8_t priority;          /**< VLAN User Priority. */
 };
 
+/** Device supports Rx pause for queue based PFC. */
+#define RTE_ETH_PFC_QUEUE_CAPA_RX_PAUSE RTE_BIT64(0)
+/** Device supports Tx pause for queue based PFC. */
+#define RTE_ETH_PFC_QUEUE_CAPA_TX_PAUSE RTE_BIT64(1)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to retrieve information of queue based PFC.
+ */
+struct rte_eth_pfc_queue_info {
+	/**
+	 * Maximum supported traffic class as per PFC (802.1Qbb) specification.
+	 */
+	uint8_t tc_max;
+	/** PFC queue capabilities (RTE_ETH_PFC_QUEUE_CAPA_). */
+	uint64_t capa;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * A structure used to configure Ethernet priority flow control parameter for
+ * ethdev queues.
+ */
+struct rte_eth_pfc_queue_conf {
+	enum rte_eth_fc_mode mode; /**< Link flow control mode */
+
+	struct {
+		uint16_t tx_qid; /**< Tx queue ID */
+		uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
+	} rx_pause; /* Valid when (mode == FC_RX_PAUSE || mode == FC_FULL) */
+
+	struct {
+		uint16_t pause_time; /**< Pause quota in the Pause frame */
+		uint16_t rx_qid;     /**< Rx queue ID */
+		uint8_t tc; /**< Traffic class as per PFC (802.1Qbb) spec */
+	} tx_pause; /* Valid when (mode == FC_TX_PAUSE || mode == FC_FULL) */
+};
+
 /**
  * Tunnel type for device-specific classifier configuration.
  * @see rte_eth_udp_tunnel
@@ -4144,6 +4186,53 @@  int rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
 int rte_eth_dev_mac_addr_add(uint16_t port_id, struct rte_ether_addr *mac_addr,
 				uint32_t pool);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Retrieve the information for queue based PFC.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param pfc_queue_info
+ *   A pointer to a structure of type *rte_eth_pfc_queue_info* to be filled with
+ *   the information about queue based PFC.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if support for priority_flow_ctrl_queue_info_get does not exist.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_dev_priority_flow_ctrl_queue_info_get(uint16_t port_id,
+				   struct rte_eth_pfc_queue_info *pfc_queue_info);
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Configure the queue based priority flow control for a given queue
+ * for Ethernet device.
+ *
+ * @note When an ethdev port switches to queue based PFC mode, the
+ * unconfigured queues shall be configured by the driver with
+ * default values such as lower priority value for TC etc.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param pfc_queue_conf
+ *   The pointer to the structure of the priority flow control parameters
+ *   for the queue.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support queue based PFC mode.
+ *   - (-ENODEV)  if *port_id* invalid.
+ *   - (-EINVAL)  if bad parameter
+ *   - (-EIO)     if flow control setup queue failure
+ */
+__rte_experimental
+int rte_eth_dev_priority_flow_ctrl_queue_configure(uint16_t port_id,
+			      struct rte_eth_pfc_queue_conf *pfc_queue_conf);
+
 /**
  * Remove a MAC address from the internal array of addresses.
  *
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index c2fb0669a4..49523ebc45 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -256,6 +256,10 @@  EXPERIMENTAL {
 	rte_flow_flex_item_create;
 	rte_flow_flex_item_release;
 	rte_flow_pick_transfer_proxy;
+
+	# added in 22.03
+	rte_eth_dev_priority_flow_ctrl_queue_configure;
+	rte_eth_dev_priority_flow_ctrl_queue_info_get;
 };
 
 INTERNAL {