[dpdk-dev,v2,2/2] ethdev: introduce Tx queue offloads API

Message ID ce7981e06173fc681d5fdf165eef22c602211a5d.1505044395.git.shahafs@mellanox.com (mailing list archive)
State Superseded, archived
Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Shahaf Shuler Sept. 10, 2017, 12:07 p.m. UTC
  Introduce a new API to configure Tx offloads.

In the new API, offloads are divided into per-port and per-queue
offloads. The PMD reports its capabilities for each of them.
Offloads are enabled using the existing DEV_TX_OFFLOAD_* flags.
To enable a per-port offload, it must be set in both the device
configuration and the queue configuration. To enable a per-queue
offload, it is enough to set it in the queue configuration.

In addition, Tx offloads will be disabled by default and enabled
according to application needs. This will greatly simplify PMD management
of the different offloads.

The new API does not have an equivalent for the following
benchmark-specific flags:

	- ETH_TXQ_FLAGS_NOREFCOUNT
	- ETH_TXQ_FLAGS_NOMULTMEMP

Applications should set the ETH_TXQ_FLAGS_IGNORE flag in the txq_flags
field in order to move to the new API.

The old Tx offloads API is kept for the time being, in order to enable a
smooth transition for PMDs and applications to the new API.
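
For illustration, a minimal usage sketch of the proposed semantics (the
helper, port/queue numbers and offload choices here are illustrative, not
part of the patch):

#include <string.h>
#include <rte_ethdev.h>

static int
setup_tx_new_api(uint8_t port_id, uint16_t nb_txd, unsigned int socket_id)
{
	struct rte_eth_conf port_conf;
	struct rte_eth_dev_info dev_info;
	struct rte_eth_txconf txq_conf;
	int ret;

	memset(&port_conf, 0, sizeof(port_conf));
	rte_eth_dev_info_get(port_id, &dev_info);

	/* A per-port offload: requested in the device configuration... */
	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_CKSUM)
		port_conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_CKSUM;
	ret = rte_eth_dev_configure(port_id, 1, 1, &port_conf);
	if (ret < 0)
		return ret;

	txq_conf = dev_info.default_txconf;
	txq_conf.txq_flags = ETH_TXQ_FLAGS_IGNORE; /* opt in to the new API */
	/* ...and repeated in the queue configuration. */
	txq_conf.offloads = port_conf.txmode.offloads;
	/* A per-queue offload: set in the queue configuration only. */
	if (dev_info.tx_queue_offload_capa & DEV_TX_OFFLOAD_VLAN_INSERT)
		txq_conf.offloads |= DEV_TX_OFFLOAD_VLAN_INSERT;

	return rte_eth_tx_queue_setup(port_id, 0, nb_txd, socket_id,
				      &txq_conf);
}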

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---

Note:

In the special case where the application is using the old API and the PMD
has already converted to the new one, queue setup may fail if there are
Tx offloads which can be set only per-port.

I chose to treat this case as an exception, considering that all Tx offloads
are currently defined to be per-queue. New ones to be added should require
the application to move to the new API as well.

---
 doc/guides/nics/features.rst  |  8 ++++++
 lib/librte_ether/rte_ethdev.c | 59 +++++++++++++++++++++++++++++++++++++-
 lib/librte_ether/rte_ethdev.h | 32 ++++++++++++++++++++-
 3 files changed, 97 insertions(+), 2 deletions(-)
  

Comments

Stephen Hemminger Sept. 10, 2017, 5:48 p.m. UTC | #1
On Sun, 10 Sep 2017 15:07:49 +0300
Shahaf Shuler <shahafs@mellanox.com> wrote:

> Introduce a new API to configure Tx offloads.
> 
> In the new API, offloads are divided into per-port and per-queue
> offloads. The PMD reports capability for each of them.
> Offloads are enabled using the existing DEV_TX_OFFLOAD_* flags.
> To enable per-port offload, the offload should be set on both device
> configuration and queue configuration. To enable per-queue offload, the
> offloads can be set only on queue configuration.
> 
> In addition, Tx offloads will be disabled by default and enabled
> according to application needs. This will greatly simplify PMD management of
> the different offloads.
> 
> The new API does not have an equivalent for the below, benchmark
> specific, flags:
> 
> 	- ETH_TXQ_FLAGS_NOREFCOUNT
> 	- ETH_TXQ_FLAGS_NOMULTMEMP
> 
> Applications should set the ETH_TXQ_FLAGS_IGNORE flag on txq_flags
> field in order to move to the new API.
> 
> The old Tx offloads API is kept for the time being, in order to enable a
> smooth transition for PMDs and applications to the new API.
> 
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---

Agree on a conceptual and hardware level that this is a property that
could be per queue. But is there really an application that would want
to have refcounting on one queue and not another?  If an application is cloning
mbufs it needs refcounting.  One could even argue that for safety
these should be library wide.  That way, if an application tried to manipulate
the ref count on an mbuf while refcounting was disabled, it could be panic'd.
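
To make the cloning point concrete, a minimal sketch using the standard
rte_mbuf API (the helper name and mempool argument are illustrative):

#include <rte_mbuf.h>

/* Cloning shares the packet data: the direct mbuf's refcnt is
 * incremented, and the data is returned to its pool only when the last
 * reference is freed. A queue configured with NOREFCOUNT would free
 * the data while the clone still points at it.
 */
static void
clone_and_free(struct rte_mbuf *m, struct rte_mempool *indirect_pool)
{
	struct rte_mbuf *clone = rte_pktmbuf_clone(m, indirect_pool);

	if (clone == NULL)
		return;
	rte_pktmbuf_free(m);     /* data refcnt drops from 2 to 1 */
	rte_pktmbuf_free(clone); /* last reference: data returns to its pool */
}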
  
Shahaf Shuler Sept. 11, 2017, 5:52 a.m. UTC | #2
Sunday, September 10, 2017 8:48 PM, Stephen Hemminger:
> 
> On Sun, 10 Sep 2017 15:07:49 +0300
> Shahaf Shuler <shahafs@mellanox.com> wrote:
> 
> > Introduce a new API to configure Tx offloads.
> >
> > In the new API, offloads are divided into per-port and per-queue
> > offloads. The PMD reports capability for each of them.
> > Offloads are enabled using the existing DEV_TX_OFFLOAD_* flags.
> > To enable per-port offload, the offload should be set on both device
> > configuration and queue configuration. To enable per-queue offload,
> > the offloads can be set only on queue configuration.
> >
> > In addition the Tx offloads will be disabled by default and be enabled
> > per application needs. This will much simplify PMD management of the
> > different offloads.
> >
> > The new API does not have an equivalent for the below, benchmark
> > specific, flags:
> >
> > 	- ETH_TXQ_FLAGS_NOREFCOUNT
> > 	- ETH_TXQ_FLAGS_NOMULTMEMP
> >
> > Applications should set the ETH_TXQ_FLAGS_IGNORE flag on txq_flags
> > field in order to move to the new API.
> >
> > The old Tx offloads API is kept for the meanwhile, in order to enable
> > a smooth transition for PMDs and application to the new API.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > ---
> 
> Agree on a conceptual and hardware level that this is a property that could
> be per queue. But is there really an application that would want to have
> refcounting on one queue and not another?  If an application is cloning mbufs it
> needs refcounting.  One could even argue that for safety these should be
> library wide.  That way, if an application tried to manipulate the ref count on an
> mbuf while refcounting was disabled, it could be panic'd.

Actually the refcount and multi mempool flags have no equivalent in this new API. They are not counted as offloads, rather some guarantees from the application side, which I agree probably need to be library wide.
In the current API you cannot set those per queue nor per port. I think there is an intention to move those flags to some other location following this series [1]

[1]
http://dpdk.org/ml/archives/dev/2017-September/074475.html
  
Jerin Jacob Sept. 11, 2017, 6:21 a.m. UTC | #3
-----Original Message-----
> Date: Mon, 11 Sep 2017 05:52:19 +0000
> From: Shahaf Shuler <shahafs@mellanox.com>
> To: Stephen Hemminger <stephen@networkplumber.org>
> CC: Thomas Monjalon <thomas@monjalon.net>, "dev@dpdk.org" <dev@dpdk.org>
> Subject: Re: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads
>  API
> 
> Sunday, September 10, 2017 8:48 PM, Stephen Hemminger:
> > 
> > On Sun, 10 Sep 2017 15:07:49 +0300
> > Shahaf Shuler <shahafs@mellanox.com> wrote:
> > 
> > > Introduce a new API to configure Tx offloads.
> > >
> > > In the new API, offloads are divided into per-port and per-queue
> > > offloads. The PMD reports capability for each of them.
> > > Offloads are enabled using the existing DEV_TX_OFFLOAD_* flags.
> > > To enable per-port offload, the offload should be set on both device
> > > configuration and queue configuration. To enable per-queue offload,
> > > the offloads can be set only on queue configuration.
> > >
> > > In addition the Tx offloads will be disabled by default and be enabled
> > > per application needs. This will much simplify PMD management of the
> > > different offloads.
> > >
> > > The new API does not have an equivalent for the below, benchmark
> > > specific, flags:
> > >
> > > 	- ETH_TXQ_FLAGS_NOREFCOUNT
> > > 	- ETH_TXQ_FLAGS_NOMULTMEMP
> > >
> > > Applications should set the ETH_TXQ_FLAGS_IGNORE flag on txq_flags
> > > field in order to move to the new API.
> > >
> > > The old Tx offloads API is kept for the meanwhile, in order to enable
> > > a smooth transition for PMDs and application to the new API.
> > >
> > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > ---
> > 
> > Agree on a conceptual and hardware level that this is a property that could
> > be per queue. But is there really an application that would want to have
> > refcounting on one queue and not another?  If an application is cloning mbufs it
> > needs refcounting.  One could even argue that for safety these should be
> > library wide.  That way, if an application tried to manipulate the ref count on an
> > mbuf while refcounting was disabled, it could be panic'd.
> 
> Actually the refcount and multi mempool flags have no equivalent in this new API. They are not counted as offloads, rather some guarantees from the application side, which I agree probably need to be library wide.
> In the current API you cannot set those per queue nor per port. I think there is an intention to move those flags to some other location following this series [1]

I don't think that can be deferred to a follow-up of this series. It should be in
this series: if we are removing a feature then we should find a way to fit it in
some location, as there is a use case for it [1]. Without an alternative,
this patch is a NACK from me.

[1]
http://dpdk.org/ml/archives/dev/2017-September/074475.html

> 
> [1]
> http://dpdk.org/ml/archives/dev/2017-September/074475.html
> 
> 
>
  
Shahaf Shuler Sept. 11, 2017, 7:56 a.m. UTC | #4
Monday, September 11, 2017 9:21 AM, Jerin Jacob:
> 
> I don't think that can be deferred to a follow-up of this series. It should be in this series:
> if we are removing a feature then we should find a way to fit it in some location, as
> there is a use case for it [1]. Without an alternative, this patch is a NACK from
> me.
> 
> [1]
> http://dpdk.org/ml/archives/dev/2017-September/074475.html

I don't understand.
From the exact link above, you explicitly say that *you* will move these flags once the series is integrated. Quoting:

" 
> Please Jerin, could you work on moving these settings in a new API?

Sure. Once the generic code is in place. We are committed to fix the
PMDs by 18.02.
"

What has changed?
  
Andrew Rybchenko Sept. 11, 2017, 8:03 a.m. UTC | #5
On 09/10/2017 03:07 PM, Shahaf Shuler wrote:
> Introduce a new API to configure Tx offloads.
>
> In the new API, offloads are divided into per-port and per-queue
> offloads. The PMD reports capability for each of them.
> Offloads are enabled using the existing DEV_TX_OFFLOAD_* flags.
> To enable per-port offload, the offload should be set on both device
> configuration and queue configuration. To enable per-queue offload, the
> offloads can be set only on queue configuration.
>
> In addition, Tx offloads will be disabled by default and enabled
> according to application needs. This will greatly simplify PMD management of
> the different offloads.
>
> The new API does not have an equivalent for the below, benchmark
> specific, flags:
>
> 	- ETH_TXQ_FLAGS_NOREFCOUNT
> 	- ETH_TXQ_FLAGS_NOMULTMEMP
>
> Applications should set the ETH_TXQ_FLAGS_IGNORE flag on txq_flags
> field in order to move to the new API.
>
> The old Tx offloads API is kept for the time being, in order to enable a
> smooth transition for PMDs and applications to the new API.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
>
> Note:
>
> In the special case where the application is using the old API and the PMD
> has already converted to the new one, queue setup may fail if there are
> Tx offloads which can be set only per-port.
>
> I chose to treat this case as an exception, considering that all Tx offloads are
> currently defined to be per-queue. New ones to be added should require
> the application to move to the new API as well.
>
> ---
>   doc/guides/nics/features.rst  |  8 ++++++
>   lib/librte_ether/rte_ethdev.c | 59 +++++++++++++++++++++++++++++++++++++-
>   lib/librte_ether/rte_ethdev.h | 32 ++++++++++++++++++++-
>   3 files changed, 97 insertions(+), 2 deletions(-)
>
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index f2c8497c2..bb25a1cee 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -131,6 +131,7 @@ Lock-free Tx queue
>   If a PMD advertises DEV_TX_OFFLOAD_MT_LOCKFREE capable, multiple threads can
>   invoke rte_eth_tx_burst() concurrently on the same Tx queue without SW lock.
>   
> +* **[uses]    rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_MT_LOCKFREE``.

It should be rte_eth_txconf here and below since renaming is postponed.

>   * **[provides] rte_eth_dev_info**: ``tx_offload_capa:DEV_TX_OFFLOAD_MT_LOCKFREE``.
>   * **[related]  API**: ``rte_eth_tx_burst()``.
>   
> @@ -220,6 +221,7 @@ TSO
>   
>   Supports TCP Segmentation Offloading.
>   
> +* **[uses]       rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_TCP_TSO``.
>   * **[uses]       rte_eth_desc_lim**: ``nb_seg_max``, ``nb_mtu_seg_max``.
>   * **[uses]       mbuf**: ``mbuf.ol_flags:PKT_TX_TCP_SEG``.
>   * **[uses]       mbuf**: ``mbuf.tso_segsz``, ``mbuf.l2_len``, ``mbuf.l3_len``, ``mbuf.l4_len``.
> @@ -510,6 +512,7 @@ VLAN offload
>   Supports VLAN offload to hardware.
>   
>   * **[uses]       rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_VLAN_STRIP,DEV_RX_OFFLOAD_VLAN_FILTER,DEV_RX_OFFLOAD_VLAN_EXTEND``.
> +* **[uses]       rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_VLAN_INSERT``.
>   * **[implements] eth_dev_ops**: ``vlan_offload_set``.
>   * **[provides]   mbuf**: ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.vlan_tci``.
>   * **[provides]   rte_eth_dev_info**: ``rx_offload_capa:DEV_RX_OFFLOAD_VLAN_STRIP``,
> @@ -526,6 +529,7 @@ QinQ offload
>   Supports QinQ (queue in queue) offload.
>   
>   * **[uses]     rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_QINQ_STRIP``.
> +* **[uses]     rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_QINQ_INSERT``.
>   * **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_QINQ_PKT``.
>   * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_QINQ_STRIPPED``, ``mbuf.vlan_tci``,
>      ``mbuf.vlan_tci_outer``.
> @@ -541,6 +545,7 @@ L3 checksum offload
>   Supports L3 checksum offload.
>   
>   * **[uses]     rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_IPV4_CKSUM``.
> +* **[uses]     rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_IPV4_CKSUM``.
>   * **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_IP_CKSUM``,
>     ``mbuf.ol_flags:PKT_TX_IPV4`` | ``PKT_TX_IPV6``.
>   * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_IP_CKSUM_UNKNOWN`` |
> @@ -558,6 +563,7 @@ L4 checksum offload
>   Supports L4 checksum offload.
>   
>   * **[uses]     rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM``.
> +* **[uses]     rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_UDP_CKSUM,DEV_TX_OFFLOAD_TCP_CKSUM,DEV_TX_OFFLOAD_SCTP_CKSUM``.
>   * **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_IPV4`` | ``PKT_TX_IPV6``,
>     ``mbuf.ol_flags:PKT_TX_L4_NO_CKSUM`` | ``PKT_TX_TCP_CKSUM`` |
>     ``PKT_TX_SCTP_CKSUM`` | ``PKT_TX_UDP_CKSUM``.
> @@ -576,6 +582,7 @@ MACsec offload
>   Supports MACsec.
>   
>   * **[uses]     rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_MACSEC_STRIP``.
> +* **[uses]     rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_MACSEC_INSERT``.
>   * **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_MACSEC``.
>   * **[provides] rte_eth_dev_info**: ``rx_offload_capa:DEV_RX_OFFLOAD_MACSEC_STRIP``,
>     ``tx_offload_capa:DEV_TX_OFFLOAD_MACSEC_INSERT``.
> @@ -589,6 +596,7 @@ Inner L3 checksum
>   Supports inner packet L3 checksum.
>   
>   * **[uses]     rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM``.
> +* **[uses]     rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM``.
>   * **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_IP_CKSUM``,
>     ``mbuf.ol_flags:PKT_TX_IPV4`` | ``PKT_TX_IPV6``,
>     ``mbuf.ol_flags:PKT_TX_OUTER_IP_CKSUM``,
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index b3c10701e..cd79cb1c9 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -1186,6 +1186,50 @@ rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
>   	return ret;
>   }
>   
> +/**
> + * A conversion function from txq_flags API.
> + */
> +static void
> +rte_eth_convert_txq_flags(const uint32_t txq_flags, uint64_t *tx_offloads)

Maybe tx_offloads should simply be the return value of the function instead
of void.
A similar comment is applicable to rte_eth_convert_txq_offloads().

> +{
> +	uint64_t offloads = 0;
> +
> +	if (!(txq_flags & ETH_TXQ_FLAGS_NOMULTSEGS))
> +		offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
> +	if (!(txq_flags & ETH_TXQ_FLAGS_NOVLANOFFL))
> +		offloads |= DEV_TX_OFFLOAD_VLAN_INSERT;
> +	if (!(txq_flags & ETH_TXQ_FLAGS_NOXSUMSCTP))
> +		offloads |= DEV_TX_OFFLOAD_SCTP_CKSUM;
> +	if (!(txq_flags & ETH_TXQ_FLAGS_NOXSUMUDP))
> +		offloads |= DEV_TX_OFFLOAD_UDP_CKSUM;
> +	if (!(txq_flags & ETH_TXQ_FLAGS_NOXSUMTCP))
> +		offloads |= DEV_TX_OFFLOAD_TCP_CKSUM;
> +
> +	*tx_offloads = offloads;
> +}
> +
> +/**
> + * A conversion function from offloads API.
> + */
> +static void
> +rte_eth_convert_txq_offloads(const uint64_t tx_offloads, uint32_t *txq_flags)
> +{
> +	uint32_t flags = 0;
> +
> +	if (!(tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS))
> +		flags |= ETH_TXQ_FLAGS_NOMULTSEGS;
> +	if (!(tx_offloads & DEV_TX_OFFLOAD_VLAN_INSERT))
> +		flags |= ETH_TXQ_FLAGS_NOVLANOFFL;
> +	if (!(tx_offloads & DEV_TX_OFFLOAD_SCTP_CKSUM))
> +		flags |= ETH_TXQ_FLAGS_NOXSUMSCTP;
> +	if (!(tx_offloads & DEV_TX_OFFLOAD_UDP_CKSUM))
> +		flags |= ETH_TXQ_FLAGS_NOXSUMUDP;
> +	if (!(tx_offloads & DEV_TX_OFFLOAD_TCP_CKSUM))
> +		flags |= ETH_TXQ_FLAGS_NOXSUMTCP;
> +
> +	*txq_flags = flags;
> +}
> +
>   int
>   rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
>   		       uint16_t nb_tx_desc, unsigned int socket_id,
> @@ -1193,6 +1237,7 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
>   {
>   	struct rte_eth_dev *dev;
>   	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_txconf local_conf;
>   	void **txq;
>   
>   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> @@ -1237,8 +1282,20 @@ rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
>   	if (tx_conf == NULL)
>   		tx_conf = &dev_info.default_txconf;
>   
> +	/*
> +	 * Convert between the offloads API to enable PMDs to support
> +	 * only one of them.
> +	 */
> +	local_conf = *tx_conf;
> +	if (tx_conf->txq_flags & ETH_TXQ_FLAGS_IGNORE)
> +		rte_eth_convert_txq_offloads(tx_conf->offloads,
> +					     &local_conf.txq_flags);
> +	else
> +		rte_eth_convert_txq_flags(tx_conf->txq_flags,
> +					  &local_conf.offloads);
> +
>   	return (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
> -					       socket_id, tx_conf);
> +					       socket_id, &local_conf);
>   }
>   
>   void
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index f424cba04..4ad7dd059 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -692,6 +692,12 @@ struct rte_eth_vmdq_rx_conf {
>    */
>   struct rte_eth_txmode {
>   	enum rte_eth_tx_mq_mode mq_mode; /**< TX multi-queues mode. */
> +	uint64_t offloads;
> +	/**

It should be /**< to tell Doxygen that it is a comment for the previous line.
However, I'd prefer to see the comment before uint64_t offloads; (and
keep /**).
Not sure, since it highly depends on what is used in other similar
places in the file.
Similar comments are applicable to a number of lines below.

> +	 * Per-port Tx offloads to be set using DEV_TX_OFFLOAD_* flags.
> +	 * Only offloads set on tx_offload_capa field on rte_eth_dev_info
> +	 * structure are allowed to be set.
> +	 */
>   
>   	/* For i40e specifically */
>   	uint16_t pvid;
> @@ -733,6 +739,14 @@ struct rte_eth_rxconf {
>   #define ETH_TXQ_FLAGS_NOXSUMS \
>   		(ETH_TXQ_FLAGS_NOXSUMSCTP | ETH_TXQ_FLAGS_NOXSUMUDP | \
>   		 ETH_TXQ_FLAGS_NOXSUMTCP)
> +#define ETH_TXQ_FLAGS_IGNORE	0x8000
> +	/**
> +	 * When set the txq_flags should be ignored,
> +	 * instead per-queue Tx offloads will be set on offloads field
> +	 * located on rte_eth_txq_conf struct.
> +	 * This flag is temporary till the rte_eth_txq_conf.txq_flags
> +	 * API will be deprecated.
> +	 */
>   
>   /**
>    * A structure used to configure a TX ring of an Ethernet port.
> @@ -745,6 +759,12 @@ struct rte_eth_txconf {
>   
>   	uint32_t txq_flags; /**< Set flags for the Tx queue */
>   	uint8_t tx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> +	uint64_t offloads;
> +	/**
> +	 * Per-queue Tx offloads to be set  using DEV_TX_OFFLOAD_* flags.
> +	 * Only offloads set on tx_queue_offload_capa field on rte_eth_dev_info
> +	 * structure are allowed to be set.
> +	 */
>   };
>   
>   /**
> @@ -969,6 +989,8 @@ struct rte_eth_conf {
>   /**< Multiple threads can invoke rte_eth_tx_burst() concurrently on the same
>    * tx queue without SW lock.
>    */
> +#define DEV_TX_OFFLOAD_MULTI_SEGS	0x00008000
> +/**< multi segment send is supported. */

The comment should start with a capital letter, as everywhere else in the
file (as far as I can see).

>   
>   struct rte_pci_device;
>   
> @@ -991,9 +1013,12 @@ struct rte_eth_dev_info {
>   	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
>   	uint64_t rx_offload_capa;
>   	/**< Device per port RX offload capabilities. */
> -	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
> +	uint64_t tx_offload_capa;
> +	/**< Device per port TX offload capabilities. */
>   	uint64_t rx_queue_offload_capa;
>   	/**< Device per queue RX offload capabilities. */
> +	uint64_t tx_queue_offload_capa;
> +	/**< Device per queue TX offload capabilities. */
>   	uint16_t reta_size;
>   	/**< Device redirection table size, the total number of entries. */
>   	uint8_t hash_key_size; /**< Hash key size in bytes */
> @@ -2024,6 +2049,11 @@ int rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
>    *   - The *txq_flags* member contains flags to pass to the TX queue setup
>    *     function to configure the behavior of the TX queue. This should be set
>    *     to 0 if no special configuration is required.
> + *     This API is obsolete and will be deprecated. Applications
> + *     should set it to ETH_TXQ_FLAGS_IGNORE and use
> + *     the offloads field below.
> + *   - The *offloads* member contains Tx offloads to be enabled.
> + *     Offloads which are not set cannot be used on the datapath.
>    *
>    *     Note that setting *tx_free_thresh* or *tx_rs_thresh* value to 0 forces
>    *     the transmit function to use default values.
  
Jerin Jacob Sept. 11, 2017, 8:06 a.m. UTC | #6
-----Original Message-----
> Date: Mon, 11 Sep 2017 07:56:05 +0000
> From: Shahaf Shuler <shahafs@mellanox.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: Stephen Hemminger <stephen@networkplumber.org>, Thomas Monjalon
>  <thomas@monjalon.net>, "dev@dpdk.org" <dev@dpdk.org>
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads
>  API
> 
> Monday, September 11, 2017 9:21 AM, Jerin Jacob:
> > 
> > I don't think that is in following this series. It should be in this series, if we are
> > removing a feature then we should find a way to fit that in some location as
> > there is a use case for it[1]. Without an alternative, this patch is NACK from
> > me.
> > 
> > [1]
> > http://dpdk.org/ml/archives/dev/2017-September/074475.html
> 
> I don't understand.
> From the exact link above, you explicitly say that *you* will move these flags once the series is integrated. Quoting:
> 
> " 
> > Please Jerin, could you work on moving these settings in a new API?
> 
> Sure. Once the generic code is in place. We are committed to fix the
> PMDs by 18.02.

Yes. I will take care of the PMD (nicvf) side of the changes. Not in ethdev or
mempool. Meaning, you need to decide how you are going to expose the
equivalent of these flags and enable the generic code for those flags in
ethdev or mempool. The driver side of the changes I can take care of.

> "
> 
> What has changed?
> 
>
  
Shahaf Shuler Sept. 11, 2017, 8:46 a.m. UTC | #7
Monday, September 11, 2017 11:06 AM, Jerin Jacob:
> >
> > I don't understand.
> > From the exact link above, you explicitly say that *you* will move this flags
> once the series is integrated. Quoting:
> >
> > "
> > > Please Jerin, could you work on moving these settings in a new API?
> >
> > Sure. Once the generic code is in place. We are committed to fix the
> > PMDs by 18.02.
> 
> Yes. I will take care of the PMD(nicvf) side of the changes. Not in ethdev or
> mempool. Meaning, you need to decide how you are going to expose the
> equivalent of these flags and enable the generic code for those flags in
> ethdev or mempool. The driver side of the changes I can take care of.
> 

How about doing it as a PMD option?
It seems like nicvf is the only PMD which cares about them.

If there are more PMDs later, we can think about which API is needed.
  
Jerin Jacob Sept. 11, 2017, 9:05 a.m. UTC | #8
-----Original Message-----
> Date: Mon, 11 Sep 2017 08:46:50 +0000
> From: Shahaf Shuler <shahafs@mellanox.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: Stephen Hemminger <stephen@networkplumber.org>, Thomas Monjalon
>  <thomas@monjalon.net>, "dev@dpdk.org" <dev@dpdk.org>
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads
>  API
> 
> Monday, September 11, 2017 11:06 AM, Jerin Jacob:
> > >
> > > I don't understand.
> > > From the exact link above, you explicitly say that *you* will move this flags
> > once the series is integrated. Quoting:
> > >
> > > "
> > > > Please Jerin, could you work on moving these settings in a new API?
> > >
> > > Sure. Once the generic code is in place. We are committed to fix the
> > > PMDs by 18.02.
> > 
> > Yes. I will take care of the PMD(nicvf) side of the changes. Not in ethdev or
> > mempool. Meaning, you need to decide how you are going to expose the
> > equivalent of these flags and enable the generic code for those flags in
> > ethdev or mempool. The drivers side of changes I can take care.
> > 
> 
> How about doing it as a PMD option?
> It seems like nicvf is the only PMD which cares about them.

Let's take it flag by flag:
ETH_TXQ_FLAGS_NOMULTMEMP - I think this should be removed. But we can have
common code in the ethdev PMD layer to detect that all queues are configured
with the same pool, as the application passes the mempool on rx_configure().

ETH_TXQ_FLAGS_NOREFCOUNT: This one has i40e and nicvf consumers.
And it is driven by the use case too. So it should be available in some
form.

> 
> If there will be more PMDs later, we can think about which API is needed.  
>
  
Ananyev, Konstantin Sept. 11, 2017, 11:02 a.m. UTC | #9
> > > >
> > > > I don't understand.
> > > > From the exact link above, you explicitly say that *you* will move this flags
> > > once the series is integrated. Quoting:
> > > >
> > > > "
> > > > > Please Jerin, could you work on moving these settings in a new API?
> > > >
> > > > Sure. Once the generic code is in place. We are committed to fix the
> > > > PMDs by 18.02.
> > >
> > > Yes. I will take care of the PMD(nicvf) side of the changes. Not in ethdev or
> > > mempool. Meaning, you need to decide how you are going to expose the
> > > equivalent of these flags and enable the generic code for those flags in
> > > ethdev or mempool. The drivers side of changes I can take care.
> > >
> >
> > How about doing it a PMD option?
> > Seems like nicvf is the only PMD which care about them.
> 
> Let's take it flag by flag:
> ETH_TXQ_FLAGS_NOMULTMEMP - I think this should be removed. But we can have
> common code in the ethdev PMD layer to detect that all queues are configured with the
> same pool, as the application passes the mempool on rx_configure().


This is about TX offloads, not RX.
At tx_queue_setup() the user doesn't have to provide the mempool pointer,
and can pass mbufs from any mempool to the TX routine.
BTW, how do you know which particular mempool to use?
Still read it from the transmitted mbuf (at least the first one), I presume?

> 
> ETH_TXQ_FLAGS_NOREFCOUNT: This one has i40e and nicvf consumers.

About i40e - as far as I know, no-one uses the i40e PMD with this flag.
As far as I remember, it was added purely for benchmarking purposes at some early stage.
So my vote would be to remove it from i40e.
Helin, Jingjing - what are your thoughts here?
About nicvf - as far as I can see it is used only in conjunction with ETH_TXQ_FLAGS_NOMULTMEMP,
never alone.
My understanding is that the current meaning of these flags
is a promise to the PMD that for that particular TX queue the user would submit only mbufs that:
- all belong to the same mempool
- would always have refcount==1
- would always be direct ones (no indirect mbufs)

So literally, yes, it is not a TX HW offload, though I understand your intention to
have such a possibility - it might help to save some cycles.
I wonder, would some new driver-specific function help in that case?
nicvf_txq_pool_setup(portid, queueid, struct rte_mempool *txpool, uint32_t flags);
or so?
So the user could call it just before rte_eth_tx_queue_setup()?
Konstantin

> And it is driven by the use case too. So it should be available in some
> form.
> 
> >
> > If there will be more PMDs later, we can think about which API is needed.
> >
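
For concreteness, a sketch of how the driver-specific hook suggested above
might be used; nicvf_txq_pool_setup() and the flag names are hypothetical
and do not exist in DPDK:

/* Hypothetical driver-specific promise, made before queue setup:
 * only direct mbufs with refcnt == 1, all taken from tx_pool, will be
 * transmitted on this queue.
 */
#define NICVF_TXQ_POOL_NOREFCOUNT (1u << 0)
#define NICVF_TXQ_POOL_NOMULTMEMP (1u << 1)

nicvf_txq_pool_setup(port_id, queue_id, tx_pool,
		     NICVF_TXQ_POOL_NOREFCOUNT | NICVF_TXQ_POOL_NOMULTMEMP);
rte_eth_tx_queue_setup(port_id, queue_id, nb_txd, socket_id, &txq_conf);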
  
Shahaf Shuler Sept. 11, 2017, 12:27 p.m. UTC | #10
September 11, 2017 11:03 AM, Andrew Rybchenko:




>     +/**
>     + * A conversion function from txq_flags API.
>     + */
>     +static void
>     +rte_eth_convert_txq_flags(const uint32_t txq_flags, uint64_t *tx_offloads)
>
> Maybe tx_offloads should simply be the return value of the function instead of void.
> A similar comment is applicable to rte_eth_convert_txq_offloads().


Can you elaborate why it would be better?
  
Andrew Rybchenko Sept. 11, 2017, 1:10 p.m. UTC | #11
On 09/11/2017 03:27 PM, Shahaf Shuler wrote:
>
> September 11, 2017 11:03 AM, Andrew Rybchenko:
>
>     +/**
>
>     + * A conversion function from txq_flags API.
>
>     + */
>
>     +static void
>
>     +rte_eth_convert_txq_flags(const uint32_t txq_flags, uint64_t *tx_offloads)
>
>
> Maybe tx_offloads should simply be the return value of the function
> instead of void.
> Similar comment is applicable to rte_eth_convert_txq_offloads().
>
> Can you elaborate why it would be better?
>

It is a pure converter function, and it would avoid questions like:
is tx_offloads an output-only or an input/output parameter?
Yes, the function is tiny and it is easy to find the answer, but still.

Also, it would make it possible to pass the result to another function
call without extra variables, etc.
Yes, right now usage of the function is very limited and hopefully
it will not live long, so it is not that important. Definitely up to you.
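
For reference, a sketch of the converter reshaped as suggested, returning
the offloads instead of writing through an output parameter (same mapping
as in the patch):

static uint64_t
rte_eth_convert_txq_flags(const uint32_t txq_flags)
{
	uint64_t offloads = 0;

	if (!(txq_flags & ETH_TXQ_FLAGS_NOMULTSEGS))
		offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
	if (!(txq_flags & ETH_TXQ_FLAGS_NOVLANOFFL))
		offloads |= DEV_TX_OFFLOAD_VLAN_INSERT;
	if (!(txq_flags & ETH_TXQ_FLAGS_NOXSUMSCTP))
		offloads |= DEV_TX_OFFLOAD_SCTP_CKSUM;
	if (!(txq_flags & ETH_TXQ_FLAGS_NOXSUMUDP))
		offloads |= DEV_TX_OFFLOAD_UDP_CKSUM;
	if (!(txq_flags & ETH_TXQ_FLAGS_NOXSUMTCP))
		offloads |= DEV_TX_OFFLOAD_TCP_CKSUM;

	return offloads;
}

/* The call site then becomes a plain assignment:
 *   local_conf.offloads = rte_eth_convert_txq_flags(tx_conf->txq_flags);
 */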
  
Jerin Jacob Sept. 12, 2017, 4:01 a.m. UTC | #12
-----Original Message-----
> Date: Mon, 11 Sep 2017 11:02:07 +0000
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Shahaf Shuler
>  <shahafs@mellanox.com>
> CC: Stephen Hemminger <stephen@networkplumber.org>, Thomas Monjalon
>  <thomas@monjalon.net>, "dev@dpdk.org" <dev@dpdk.org>, "Zhang, Helin"
>  <helin.zhang@intel.com>, "Wu, Jingjing" <jingjing.wu@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads
>  API
> 
> 
> > > > >
> > > > > I don't understand.
> > > > > From the exact link above, you explicitly say that *you* will move this flags
> > > > once the series is integrated. Quoting:
> > > > >
> > > > > "
> > > > > > Please Jerin, could you work on moving these settings in a new API?
> > > > >
> > > > > Sure. Once the generic code is in place. We are committed to fix the
> > > > > PMDs by 18.02.
> > > >
> > > > Yes. I will take care of the PMD(nicvf) side of the changes. Not in ethdev or
> > > > mempool. Meaning, you need to decide how you are going to expose the
> > > > equivalent of these flags and enable the generic code for those flags in
> > > > ethdev or mempool. The drivers side of changes I can take care.
> > > >
> > >
> > > How about doing it a PMD option?
> > > Seems like nicvf is the only PMD which care about them.
> > 
> > Lets take flag by flag:
> > ETH_TXQ_FLAGS_NOMULTMEMP - I think, this should be removed. But we can have
> > common code in ethdev pmd to detect all pool being configured from on the same pool
> > as on the rx_configure() application passes the mempool.
> 
> 
> This is about TX offloads, not RX.
> At tx_queue_setup() the user doesn't have to provide the mempool pointer,
> and can pass mbufs from any mempool to the TX routine.
> BTW, how do you know which particular mempool to use?
> Still read it from the transmitted mbuf (at least the first one), I presume?

Yes. It still reads it from the transmitted mbuf, for the first one.

> 
> > 
> > ETH_TXQ_FLAGS_NOREFCOUNT: This one has i40e and nicvf consumers.
> 
> About i40e - as far as I know, no-one uses the i40e PMD with this flag.
> As far as I remember, it was added purely for benchmarking purposes at some early stage.
> So my vote would be to remove it from i40e.
> Helin, Jingjing - what are your thoughts here?
> About nicvf - as far as I can see it is used only in conjunction with ETH_TXQ_FLAGS_NOMULTMEMP,
> never alone.
> My understanding is that the current meaning of these flags
> is a promise to the PMD that for that particular TX queue the user would submit only mbufs that:
> - all belong to the same mempool
> - would always have refcount==1
> - would always be direct ones (no indirect mbufs)

Yes, only when ETH_TXQ_FLAGS_NOMULTMEMP and ETH_TXQ_FLAGS_NOREFCOUNT
are selected at tx queue configuration.

> 
> So literally, yes, it is not a TX HW offload, though I understand your intention to
> have such a possibility - it might help to save some cycles.

It is not a few cycles. We could see a ~24% drop per core (with 64B) with
testpmd and l3fwd on some SoCs. It is not very specific to nicvf HW; the
problem is the limited cache hierarchy in very low-end arm64 machines.
For the TX buffer recycling case, it needs to touch the mbuf again to find out the
associated mempool to free it to. It is fine if an application demands it, but not
all applications demand it.

We have two categories of arm64 machines: the high-end machines, where the
cache hierarchy is similar to x86 server machines, and the low-end ones with
very limited cache resources. Unfortunately, we need to have the same binary
on both machines.
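
As a rough illustration of the two completion paths (a simplified sketch,
not the actual nicvf code; buffer management details differ per driver):

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* With NOMULTMEMP | NOREFCOUNT the driver can return transmitted
 * buffers in bulk to the one known mempool without reading any mbuf.
 */
static void
txq_free_bufs_fast(struct rte_mempool *pool, struct rte_mbuf **bufs,
		   unsigned int n)
{
	rte_mempool_put_bulk(pool, (void **)bufs, n);
}

/* The generic path touches every mbuf to read its refcnt and mempool
 * pointer, which is costly on CPUs with small caches.
 */
static void
txq_free_bufs_generic(struct rte_mbuf **bufs, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		rte_pktmbuf_free_seg(bufs[i]);
}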


> I wonder, would some new driver-specific function help in that case?
> nicvf_txq_pool_setup(portid, queueid, struct rte_mempool *txpool, uint32_t flags);
> or so?

It is possible, but how do we make such a change in testpmd, l3fwd or
ipsec-gw, in-tree applications which do need only NOMULTIMEMP &
NOREFCOUNT?

If there is a concern about making it Tx queue level, that is fine. We can
move from queue level to port level or global level.
IMO, the application should express in some form that it wants only
NOMULTIMEMP & NOREFCOUNT, and that is the case for l3fwd and ipsec-gw.


> So the user can call it just before rte_eth_tx_queue_setup()?
> Konstantin
> 
> > And it is driven by the use case too. So it should available in some
> > form.	
> > 
> > >
> > > If there will be more PMDs later, we can think about which API is needed.
> > >
  
Shahaf Shuler Sept. 12, 2017, 5:25 a.m. UTC | #13
Tuesday, September 12, 2017 7:01 AM, Jerin Jacob:
> Yes, only when ETH_TXQ_FLAGS_NOMULTMEMP and
> ETH_TXQ_FLAGS_NOREFCOUNT are selected at tx queue configuration.
> 
> >
> > So literally, yes it is not a TX HW offload, though I understand your
> > intention to have such possibility - it might help to save some cycles.
> 
> It is not a few cycles. We could see a ~24% drop per core (with 64B) with
> testpmd and l3fwd on some SoCs. It is not very specific to nicvf HW; the
> problem is the limited cache hierarchy in very low-end arm64 machines.
> For the TX buffer recycling case, it needs to touch the mbuf again to find out the
> associated mempool to free it to. It is fine if an application demands it but not all
> applications demand it.
> 
> We have two categories of arm64 machines: the high-end machines, where the
> cache hierarchy is similar to x86 server machines, and the low-end ones with
> very limited cache resources. Unfortunately, we need to have the same binary
> on both machines.
> 
> 
> > Wonder would some new driver specific function would help in that case?
> > nicvf_txq_pool_setup(portid, queueid, struct rte_mempool *txpool,
> > uint32_t flags); or so?
> 
> It is possible, but how do we make such a change in testpmd, l3fwd or ipsec-
> gw, in-tree applications which do need only NOMULTIMEMP &
> NOREFCOUNT?
> 
> If there is a concern about making it Tx queue level, that is fine. We can move
> from queue level to port level or global level.
> IMO, the application should express in some form that it wants only
> NOMULTIMEMP & NOREFCOUNT, and that is the case for l3fwd and ipsec-
> gw.
> 

I understand the use case, and the fact that those flags improve performance on low-end ARM CPUs.
IMO those flags cannot be on queue/port level. They must be global.

Even though the use-case is generic, the nicvf PMD is the only one which does such optimization.
So I am suggesting again - why not expose it as a PMD-specific parameter?

- The application can express it wants such optimization. 
- It is global

Currently it does not seem there is high demand for such flags from other PMDs. If such demand arises, we can discuss again how to expose it properly.
  
Jerin Jacob Sept. 12, 2017, 5:51 a.m. UTC | #14
-----Original Message-----
> Date: Tue, 12 Sep 2017 05:25:42 +0000
> From: Shahaf Shuler <shahafs@mellanox.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, "Ananyev, Konstantin"
>  <konstantin.ananyev@intel.com>
> CC: Stephen Hemminger <stephen@networkplumber.org>, Thomas Monjalon
>  <thomas@monjalon.net>, "dev@dpdk.org" <dev@dpdk.org>, "Zhang, Helin"
>  <helin.zhang@intel.com>, "Wu, Jingjing" <jingjing.wu@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads
>  API
> 
> Tuesday, September 12, 2017 7:01 AM, Jerin Jacob:
> > Yes, only when ETH_TXQ_FLAGS_NOMULTMEMP and
> > ETH_TXQ_FLAGS_NOREFCOUNT selected at tx queue configuration.
> > 
> > >
> > > So literally, yes it is not a TX HW offload, though I understand your
> > > intention to have such possibility - it might help to save some cycles.
> > 
> > It not a few cycles. We could see ~24% drop on per core(with 64B) with
> > testpmd and l3fwd on some SoCs. It is not very specific to nicvf HW, The
> > problem is with limited cache hierarchy in very low end arm64 machines.
> > For TX buffer recycling case, it need to touch the mbuf again to find out the
> > associated mempool to free. It is fine if application demands it but not all the
> > application demands it.
> > 
> > We have two category of arm64 machines, The high end machine where
> > cache hierarchy similar x86 server machine. The low end ones with very
> > limited cache resources. Unfortunately, we need to have the same binary on
> > both machines.
> > 
> > 
> > > Wonder would some new driver specific function would help in that case?
> > > nicvf_txq_pool_setup(portid, queueid, struct rte_mempool *txpool,
> > > uint32_t flags); or so?
> > 
> > It is possible, but how do we make such change in testpmd, l3fwd or ipsec-
> > gw in tree application which does need only NOMULTIMEMP &
> > NOREFCOUNT.
> > 
> > If there is concern about making it Tx queue level it is fine. We can move
> > from queue level to port level or global level.
> > IMO, Application should express in some form that it wants only
> > NOMULTIMEMP & NOREFCOUNT and thats is the case for l3fwd and ipsec-
> > gw
> > 
> 
> I understand the use case, and the fact that those flags improve performance on low-end ARM CPUs.
> IMO those flags cannot be on queue/port level. They must be global.

Where should we have it as global (in terms of API)?
And why can it not be at port level?

> 
> Even though the use-case is generic, the nicvf PMD is the only one which does such optimization.
> So I am suggesting again - why not expose it as a PMD-specific parameter?

Why make it PMD-specific, if the application can express it through
normative DPDK APIs?

> 
> - The application can express it wants such optimization. 
> - It is global
> 
> Currently it does not seem there is high demand for such flags from other PMDs. If such demand arises, we can discuss again how to expose it properly.

It is not PMD-specific. It is all about where it runs: it will be
applicable for any PMD that runs on low-end hardware where it needs
SW-based Tx buffer recycling (the NPU is a different story as it has
HW-assisted mempool management).
What are we losing by running DPDK effectively on low-end hardware
with such "on demand" runtime configuration through a normative DPDK API?


> 
> 
> 
> 
>
  
Shahaf Shuler Sept. 12, 2017, 6:35 a.m. UTC | #15
Tuesday, September 12, 2017 8:52 AM, Jerin Jacob:
> > I understand the use case, and the fact those flags improve the
> performance on low-end ARM CPUs.
> > IMO those flags cannot be on queue/port level. They must be global.
> 
> Where should we have it as global (in terms of API)?
> And why can it not be at port level?

Because I don't think there is a use-case where an application would want to have refcounting on one port and not on the other. Either the application clones mbufs or it doesn't.
Same about the multi mempool - either the application has it or not.

If there is a strong use-case for an application to say that on port X it clones mbufs and on port Y it doesn't, then maybe this is enough to have it per-port.
We can go even further - why not have the guarantee per queue? It is possible if the application is willing to manage it.

Again, those are not offloads; therefore if we expose them it should be in a different location than the offloads field in the eth conf.

> 
> >
> > Even though the use-case is generic the nicvf PMD is the only one which do
> such optimization.
> > So am suggesting again - why not expose it as a PMD specific parameter?
> 
> Why make it PMD-specific, if the application can express it through
> normative DPDK APIs?
> 
> >
> > - The application can express it wants such optimization.
> > - It is global
> >
> > Currently it does not seems there is high demand for such flags from other
> PMDs. If such demand will raise, we can discuss again on how to expose it
> properly.
> 
> It is not PMD-specific. It is all about where it runs: it will be applicable for any
> PMD that runs on low-end hardware where it needs SW-based Tx buffer
> recycling (the NPU is a different story as it has HW-assisted mempool
> management).

Maybe, but I don't see other PMDs which use those flags. Are you aware of any plans to add such optimizations?
You are pushing for a generic API which is currently used only by a single entity.

> What are we losing by running DPDK effectively on low-end hardware with
> such "on demand" runtime configuration through a normative DPDK API?

Complexity of APIs for applications. More structs in ethdev, more API definitions, more fields to be configured by the application, all valid for a single PMD.
For the rest of the PMDs, those fields are currently don't-care.

> 
> 
> >
> >
> >
> >
> >
  
Andrew Rybchenko Sept. 12, 2017, 6:43 a.m. UTC | #16
On 09/12/2017 08:51 AM, Jerin Jacob wrote:
>> Tuesday, September 12, 2017 7:01 AM, Jerin Jacob:
>>> Yes, only when ETH_TXQ_FLAGS_NOMULTMEMP and
>>> ETH_TXQ_FLAGS_NOREFCOUNT selected at tx queue configuration.
>>>
>>>> So literally, yes it is not a TX HW offload, though I understand your
>>>> intention to have such possibility - it might help to save some cycles.
>>> It is not a few cycles. We could see a ~24% drop per core (with 64B) with
>>> testpmd and l3fwd on some SoCs. It is not very specific to nicvf HW; the
>>> problem is the limited cache hierarchy in very low-end arm64 machines.
>>> For the TX buffer recycling case, it needs to touch the mbuf again to find out the
>>> associated mempool to free it to. It is fine if an application demands it but not
>>> all applications demand it.
>>>
>>> We have two categories of arm64 machines: the high-end machines, where the
>>> cache hierarchy is similar to x86 server machines, and the low-end ones with
>>> very limited cache resources. Unfortunately, we need to have the same binary
>>> on both machines.
>>>
>>>
>>>> Wonder would some new driver specific function would help in that case?
>>>> nicvf_txq_pool_setup(portid, queueid, struct rte_mempool *txpool,
>>>> uint32_t flags); or so?
>>> It is possible, but how do we make such change in testpmd, l3fwd or ipsec-
>>> gw in tree application which does need only NOMULTIMEMP &
>>> NOREFCOUNT.
>>>
>>> If there is concern about making it Tx queue level it is fine. We can move
>>> from queue level to port level or global level.
>>> IMO, Application should express in some form that it wants only
>>> NOMULTIMEMP & NOREFCOUNT and thats is the case for l3fwd and ipsec-
>>> gw
>>>
>> I understand the use case, and the fact those flags improve the performance on low-end ARM CPUs.
>> IMO those flags cannot be on queue/port level. They must be global.
> Where should we have it as global (in terms of API)?
> And why can it not be at port level?

I think port level is the right place for these flags. These flags define
which transmit and transmit cleanup callbacks could be used. These functions
are specified on port level now. However, I see no good reasons to change it.
It will complicate the possibility to make the transmit and transmit cleanup
callbacks per queue (not per port as now).
All three (no-multi-seg, no-multi-mempool, no-reference-counter) are from
one group and should go together.

>> Even though the use-case is generic the nicvf PMD is the only one which do such optimization.
>> So am suggesting again - why not expose it as a PMD specific parameter?
> Why make it PMD-specific, if the application can express it through
> normative DPDK APIs?
>
>> - The application can express it wants such optimization.
>> - It is global
>>
>> Currently it does not seems there is high demand for such flags from other PMDs. If such demand will raise, we can discuss again on how to expose it properly.
> It is not PMD-specific. It is all about where it runs: it will be
> applicable for any PMD that runs on low-end hardware where it needs
> SW-based Tx buffer recycling (the NPU is a different story as it has
> HW-assisted mempool management).
> What are we losing by running DPDK effectively on low-end hardware
> with such "on demand" runtime configuration through a normative DPDK API?

+1, and it improves performance on amd64 as well - definitely less than 24%,
but noticeable. If the application architecture meets these conditions, why
not allow it to use the advantage and run faster?
  
Andrew Rybchenko Sept. 12, 2017, 6:46 a.m. UTC | #17
On 09/12/2017 09:35 AM, Shahaf Shuler wrote:
> Tuesday, September 12, 2017 8:52 AM, Jerin Jacob:
>>> - The application can express it wants such optimization.
>>> - It is global
>>>
>>> Currently it does not seems there is high demand for such flags from other
>> PMDs. If such demand will raise, we can discuss again on how to expose it
>> properly.
>>
>> It is not PMD specific. It is all about where it runs? it will applicable for any
>> PMD that runs low end hardwares where it need SW based Tx buffer
>> recycling(The NPU is different story as it has HW assisted mempool
>> manager).
> Maybe, but I don't see other PMDs which use those flags. Are you aware of any plans to add such optimizations?
> You are pushing for a generic API which is currently used only by a single entity.

http://dpdk.org/ml/archives/dev/2017-September/074907.html
  
Shahaf Shuler Sept. 12, 2017, 6:59 a.m. UTC | #18
September 12, 2017 9:43 AM, Andrew Rybchenko:

> I think port level is the right place for these flags. These flags
> define which transmit and transmit cleanup callbacks could be used.
> These functions are specified on port level now. However, I see no
> good reasons to change it.

The Tx queue flags are currently not per-port but rather per-queue. The flags are provided as an input to tx_queue_setup.
Even though the applications and examples in the dpdk tree use identical flags for all queues, it doesn't mean an application is not allowed to do otherwise.

> It will complicate the possibility to make the transmit and transmit
> cleanup callbacks per queue (not per port as now).
> All three (no-multi-seg, no-multi-mempool, no-reference-counter) are
> from one group and should go together.
>
>>> Even though the use-case is generic, the nicvf PMD is the only one
>>> which does such optimization.
>>> So I am suggesting again - why not expose it as a PMD-specific
>>> parameter?
>>
>> Why make it PMD-specific, if the application can express it through
>> normative DPDK APIs?
>>
>>> - The application can express it wants such optimization.
>>> - It is global
>>>
>>> Currently it does not seem there is high demand for such flags from
>>> other PMDs. If such demand arises, we can discuss again how to
>>> expose it properly.
>>
>> It is not PMD-specific. It is all about where it runs: it will be
>> applicable for any PMD that runs on low-end hardware where it needs
>> SW-based Tx buffer recycling (the NPU is a different story as it has
>> HW-assisted mempool management).
>> What are we losing by running DPDK effectively on low-end hardware
>> with such "on demand" runtime configuration through a normative DPDK API?
>
> +1, and it improves performance on amd64 as well - definitely less than
> 24%, but noticeable. If the application architecture meets these
> conditions, why not allow it to use the advantage and run faster?
  
Jerin Jacob Sept. 12, 2017, 7:17 a.m. UTC | #19
-----Original Message-----
> Date: Tue, 12 Sep 2017 06:35:16 +0000
> From: Shahaf Shuler <shahafs@mellanox.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, Stephen Hemminger
>  <stephen@networkplumber.org>, Thomas Monjalon <thomas@monjalon.net>,
>  "dev@dpdk.org" <dev@dpdk.org>, "Zhang, Helin" <helin.zhang@intel.com>,
>  "Wu, Jingjing" <jingjing.wu@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads
>  API
> 
> Tuesday, September 12, 2017 8:52 AM, Jerin Jacob:
> > > I understand the use case, and the fact those flags improve the
> > performance on low-end ARM CPUs.
> > > IMO those flags cannot be on queue/port level. They must be global.
> > 
> > Where should we have it as global(in terms of API)?
> > And why it can not be at port level?
> 
> Because I don't think there is a use-case where an application would want to have refcounting on one port and not on the other. Either the application clones mbufs or it doesn't.
> Same about the multi mempool - either the application has it or not.

Why not? If a port is given to the data plane and another port to the control
plane, they can have different characteristics.

Making it port level, we can achieve the global use case as well, but not
the other way around.

The MULTISEG flag also has the same attribute, but for some reason you are OK
to include that in the flags.

>
> If there is a strong use-case for an application to say that on port X it clones mbufs and on port Y it doesn't, then maybe this is enough to have it per-port.
> We can go even further - why not have the guarantee per queue? It is possible if the application is willing to manage it.
>
> Again, those are not offloads; therefore if we expose them it should be in a different location than the offloads field in the eth conf.

What is the definition of an offload? It is something we can offload to HW.
If so, then reference counting we can offload to HW with an external HW pool
manager, which DPDK supports now.

> 
> > 
> > >
> > > Even though the use-case is generic the nicvf PMD is the only one which do
> > such optimization.
> > > So am suggesting again - why not expose it as a PMD specific parameter?
> > 
> > Why to make it as PMD specific? if application can express it though
> > normative DPDK APIs.
> > 
> > >
> > > - The application can express it wants such optimization.
> > > - It is global
> > >
> > > Currently it does not seems there is high demand for such flags from other
> > PMDs. If such demand will raise, we can discuss again on how to expose it
> > properly.
> > 
> > It is not PMD specific. It is all about where it runs? it will applicable for any
> > PMD that runs low end hardwares where it need SW based Tx buffer
> > recycling(The NPU is different story as it has HW assisted mempool
> > manager).
> 
> Maybe, but I don't see other PMDs which use those flags. Are you aware of any plans to add such optimizations?

Sorry. I can't comment on another vendor's PMD roadmap.

> You are pushing for a generic API which is currently used only by a single entity.

You are removing an existing generic flag.

> 
> > What we are loosing by running DPDK effectively on low end hardware with
> > such "on demand" runtime configuration though DPDK normative API.
> 
> Complexity of APIs for applications. More structs in ethdev, more API definitions, more fields to be configured by the application, all valid for a single PMD.
> For the rest of the PMDs, those fields are currently don't-care.

I don't understand the application complexity part. It is just configuration at
port level. And it is at the application's will; it can choose to run in any mode.
BTW, it all boils down to features and performance/watt.
IMO, everything should be runtime configurable.

> 
> > 
> > 
> > >
> > >
> > >
> > >
> > >
  
Shahaf Shuler Sept. 12, 2017, 8:03 a.m. UTC | #20
Tuesday, September 12, 2017 10:18 AM, Jerin Jacob:
> > Tuesday, September 12, 2017 8:52 AM, Jerin Jacob:
> > > > I understand the use case, and the fact those flags improve the
> > > performance on low-end ARM CPUs.
> > > > IMO those flags cannot be on queue/port level. They must be global.
> > >
> > > Where should we have it as global(in terms of API)?
> > > And why it can not be at port level?
> >
> > Because I don't think there is a use-case where an application would want to
> > have refcounting on one port and not on the other. Either the application
> > clones mbufs or it doesn't.
> > Same about the multi mempool - either the application has it or not.
> 
> Why not? If a port is given to the data plane and another port to the control
> plane, they can have different characteristics.
> 
> Making it port level, we can achieve the global use case as well, but not
> the other way around.
> 
> The MULTISEG flag also has the same attribute, but for some reason you are OK
> to include that in the flags.
> 
> >
> > If there is a strong use-case for an application to say that on port X it
> > clones mbufs and on port Y it doesn't, then maybe this is enough to have it
> > per-port.
> > We can go even further - why not have the guarantee per queue? It is
> > possible if the application is willing to manage it.
> >
> > Again, those are not offloads; therefore if we expose them it should be in
> > a different location than the offloads field in the eth conf.
> 
> What is the definition of an offload? It is something we can offload to HW.
> If so, then reference counting we can offload to HW with an external HW pool
> manager, which DPDK supports now.

OK, well understood the requirement for such flags. Thanks for your replies.

I think that for simplicity I will add two more flags to the Tx offloads capabilities:

DEV_TX_OFFLOADS_MULTI_MEMPOOL /**< Device supports transmission of mbufs from multiple mempools. */
DEV_TX_OFFLOADS_INDIRECT_MBUFS /**< Device supports transmission of indirect mbufs. */

Those caps can be reported by the PMD as per-port/per-queue offloads. The application will choose how to set those. When not set, the PMD can assume all mbufs have ref_cnt = 1 and come from the same mempool.

Any objection?
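
A sketch of how this proposal could look from the application side (flag
values are placeholders; these flags are only proposed here, not merged):

/* Proposed capability flags (values illustrative). */
#define DEV_TX_OFFLOADS_MULTI_MEMPOOL  0x00010000
/**< Device supports transmission of mbufs from multiple mempools. */
#define DEV_TX_OFFLOADS_INDIRECT_MBUFS 0x00020000
/**< Device supports transmission of indirect mbufs. */

/* An application that clones mbufs or mixes mempools would request the
 * matching capability; otherwise the PMD may assume ref_cnt == 1 and a
 * single mempool on the queue.
 */
if (dev_info.tx_queue_offload_capa & DEV_TX_OFFLOADS_INDIRECT_MBUFS)
	txq_conf.offloads |= DEV_TX_OFFLOADS_INDIRECT_MBUFS;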
  
Andrew Rybchenko Sept. 12, 2017, 10:27 a.m. UTC | #21
On 09/12/2017 11:03 AM, Shahaf Shuler wrote:
> OK, well understood the requirement for such flags. Thanks for your replies.
>
> I think that for simplicity I will add two more flags on the Tx offloads capabilities:
>
> DEV_TX_OFFLOADS_MULTI_MEMPOOL /**< Device supports transmission of mbufs from multiple mempools. */
> DEV_TX_OFFLOADS_INDIRECT_MBUFS /**< Device supports transmission of indirect mbufs. */

Indirect mbufs are just an example of when reference counters are required.
Direct mbufs may use reference counters as well.

> Those caps can be reported by the PMD as per-port/per-queue offloads. The application will choose how to set those. When not set, the PMD can assume all mbufs have ref_cnt = 1 and come from the same mempool.
>
> Any objection?
  
Ananyev, Konstantin Sept. 12, 2017, 2:26 p.m. UTC | #22
> -----Original Message-----
> From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> Sent: Tuesday, September 12, 2017 11:28 AM
> To: Shahaf Shuler <shahafs@mellanox.com>; Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; Thomas Monjalon
> <thomas@monjalon.net>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads API
> 
> On 09/12/2017 11:03 AM, Shahaf Shuler wrote:
> > OK, the requirement for such flags is well understood. Thanks for your replies.
> >
> > I think that for simplicity I will add two more flags to the Tx offloads capabilities:
> >
> > DEV_TX_OFFLOADS_MULTI_MEMPOOL /** Device supports transmission of mbufs from multiple mempools. */
> > DEV_TX_OFFLOADS_INDIRECT_MBUFS /** Device supports transmission of indirect mbufs. */
> 
> Indirect mbufs are just one example where reference counters are required.
> Direct mbufs may use reference counters as well.

Personally, I am still in favor of moving these 2 flags away from TX_OFFLOADS.
But if people think it would be really helpful to keep them, should we then have
DEV_TX_OFFLOADS_FAST_FREE (or whatever the name will be)?
It would mean the same as what (NOMULTIMEMP | NOREFCOUNT) means now.
Konstantin

> 
> > Those caps can be reported by the PMD as per-port/per-queue offloads. The application will choose how to set them. When they are not
> > set, the PMD can assume all mbufs have ref_cnt == 1 and come from the same mempool.
> >
> > Any objection?
> 
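(For illustration, here is a minimal sketch of the Tx-completion fast path that such a combined fast-free guarantee - today's NOMULTIMEMP | NOREFCOUNT - would let a PMD take. The function name and the fast_free parameter are hypothetical and not taken from any existing PMD.)

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Illustrative only: what a PMD could do at Tx completion when the
 * application guarantees NOREFCOUNT | NOMULTIMEMP (the proposed
 * FAST_FREE semantics). */
static inline void
txq_free_completed_mbufs(struct rte_mbuf **pkts, uint16_t n, int fast_free)
{
	uint16_t i;

	if (n == 0)
		return;

	if (fast_free) {
		/* Every mbuf is direct, has ref_cnt == 1 and comes from
		 * the same mempool: bulk-return without per-mbuf checks. */
		rte_mempool_put_bulk(pkts[0]->pool, (void **)pkts, n);
		return;
	}

	/* Generic path: per-mbuf reference counting, any mempool. */
	for (i = 0; i < n; i++)
		rte_pktmbuf_free_seg(pkts[i]);
}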
  
Jerin Jacob Sept. 12, 2017, 2:36 p.m. UTC | #23
-----Original Message-----
> Date: Tue, 12 Sep 2017 14:26:38 +0000
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> To: Andrew Rybchenko <arybchenko@solarflare.com>, Shahaf Shuler
>  <shahafs@mellanox.com>, Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: Stephen Hemminger <stephen@networkplumber.org>, Thomas Monjalon
>  <thomas@monjalon.net>, "dev@dpdk.org" <dev@dpdk.org>, "Zhang, Helin"
>  <helin.zhang@intel.com>, "Wu, Jingjing" <jingjing.wu@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads
>  API
> 
> 
> 
> > -----Original Message-----
> > From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> > Sent: Tuesday, September 12, 2017 11:28 AM
> > To: Shahaf Shuler <shahafs@mellanox.com>; Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; Thomas Monjalon
> > <thomas@monjalon.net>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads API
> > 
> > On 09/12/2017 11:03 AM, Shahaf Shuler wrote:
> > > OK, the requirement for such flags is well understood. Thanks for your replies.
> > >
> > > I think that for simplicity I will add two more flags to the Tx offloads capabilities:
> > >
> > > DEV_TX_OFFLOADS_MULTI_MEMPOOL /** Device supports transmission of mbufs from multiple mempools. */
> > > DEV_TX_OFFLOADS_INDIRECT_MBUFS /** Device supports transmission of indirect mbufs. */
> > 
> > Indirect mbufs are just one example where reference counters are required.
> > Direct mbufs may use reference counters as well.
> 
> Personally, I am still in favor of moving these 2 flags away from TX_OFFLOADS.
> But if people think it would be really helpful to keep them, should we then have
> DEV_TX_OFFLOADS_FAST_FREE (or whatever the name will be)?
> It would mean the same as what (NOMULTIMEMP | NOREFCOUNT) means now.

I am not too concerned about the name. Yes, it should mean the existing
(NOMULTIMEMP | NOREFCOUNT).


> Konstantin
> 
> > 
> > > Those caps can be reported by the PMD as per-port/per-queue offloads. The application will choose how to set them. When they are not
> > > set, the PMD can assume all mbufs have ref_cnt == 1 and come from the same mempool.
> > >
> > > Any objection?
> > 
>
  
Andrew Rybchenko Sept. 12, 2017, 2:43 p.m. UTC | #24
On 09/12/2017 05:36 PM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Tue, 12 Sep 2017 14:26:38 +0000
>> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
>> To: Andrew Rybchenko <arybchenko@solarflare.com>, Shahaf Shuler
>>   <shahafs@mellanox.com>, Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> CC: Stephen Hemminger <stephen@networkplumber.org>, Thomas Monjalon
>>   <thomas@monjalon.net>, "dev@dpdk.org" <dev@dpdk.org>, "Zhang, Helin"
>>   <helin.zhang@intel.com>, "Wu, Jingjing" <jingjing.wu@intel.com>
>> Subject: RE: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads
>>   API
>>
>>
>>
>>> -----Original Message-----
>>> From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
>>> Sent: Tuesday, September 12, 2017 11:28 AM
>>> To: Shahaf Shuler <shahafs@mellanox.com>; Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Stephen Hemminger <stephen@networkplumber.org>; Thomas Monjalon
>>> <thomas@monjalon.net>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
>>> Subject: Re: [dpdk-dev] [PATCH v2 2/2] ethdev: introduce Tx queue offloads API
>>>
>>> On 09/12/2017 11:03 AM, Shahaf Shuler wrote:
>>>> OK, well understood the requirement for such flags. Thanks for your replies.
>>>>
>>>> I think that for simplicity I will add two more flags on the Tx offloads capabilities:
>>>>
>>>> DEV_TX_OFFLOADS _MULTI_MEMPOOL <** Device supports transmission of mbufs from multiple mempools. */
>>>> DEV_TX_OFFLOADS_INDIRECT_MBUFS <** Device support transmission of indirect mbufs. */
>>> Indirect mbufs is just an example when reference counters are required.
>>> Direct mbufs may use reference counters as well.
>> Personally, I still in favor to move these 2 flags away from TX_OFFLOADS.
>> But if people think it would be really helpfull to keep them, should we have then:
>> DEV_TX_OFFLOADS_FAST_FREE (or whatever then name will be) -
>> it would mean the same what (NOMULTIMEMP | NOREFCOUNT) means now.
> I am not too concerned about name. Yes. it should mean exiting (NOMULTIMEMP |
> NOREFCOUNT)

Merging these two flags together is OK for me as well.
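(For reference, with this patch applied an application opts into the new Tx offloads API roughly as sketched below. The offload selection is illustrative and error handling is omitted.)

#include <rte_ethdev.h>
#include <rte_lcore.h>

/* Minimal sketch: a Tx queue set up through the new offloads API added
 * by this patch. */
static int
setup_txq_new_api(uint8_t port_id, uint16_t queue_id, uint16_t nb_txd)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_txconf txconf;

	rte_eth_dev_info_get(port_id, &dev_info);
	txconf = dev_info.default_txconf;

	/* ETH_TXQ_FLAGS_IGNORE tells ethdev to take 'offloads' instead of
	 * the legacy txq_flags bitfield (see the conversion helpers in the
	 * patch). Offloads are off by default; request only what the port
	 * reports as supported. */
	txconf.txq_flags = ETH_TXQ_FLAGS_IGNORE;
	txconf.offloads = dev_info.tx_offload_capa &
	    (DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM);

	return rte_eth_tx_queue_setup(port_id, queue_id, nb_txd,
				      rte_socket_id(), &txconf);
}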
  

Patch

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index f2c8497c2..bb25a1cee 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -131,6 +131,7 @@  Lock-free Tx queue
 If a PMD advertises DEV_TX_OFFLOAD_MT_LOCKFREE capable, multiple threads can
 invoke rte_eth_tx_burst() concurrently on the same Tx queue without SW lock.
 
+* **[uses]    rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_MT_LOCKFREE``.
 * **[provides] rte_eth_dev_info**: ``tx_offload_capa:DEV_TX_OFFLOAD_MT_LOCKFREE``.
 * **[related]  API**: ``rte_eth_tx_burst()``.
 
@@ -220,6 +221,7 @@  TSO
 
 Supports TCP Segmentation Offloading.
 
+* **[uses]       rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_TCP_TSO``.
 * **[uses]       rte_eth_desc_lim**: ``nb_seg_max``, ``nb_mtu_seg_max``.
 * **[uses]       mbuf**: ``mbuf.ol_flags:PKT_TX_TCP_SEG``.
 * **[uses]       mbuf**: ``mbuf.tso_segsz``, ``mbuf.l2_len``, ``mbuf.l3_len``, ``mbuf.l4_len``.
@@ -510,6 +512,7 @@  VLAN offload
 Supports VLAN offload to hardware.
 
 * **[uses]       rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_VLAN_STRIP,DEV_RX_OFFLOAD_VLAN_FILTER,DEV_RX_OFFLOAD_VLAN_EXTEND``.
+* **[uses]       rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_VLAN_INSERT``.
 * **[implements] eth_dev_ops**: ``vlan_offload_set``.
 * **[provides]   mbuf**: ``mbuf.ol_flags:PKT_RX_VLAN_STRIPPED``, ``mbuf.vlan_tci``.
 * **[provides]   rte_eth_dev_info**: ``rx_offload_capa:DEV_RX_OFFLOAD_VLAN_STRIP``,
@@ -526,6 +529,7 @@  QinQ offload
 Supports QinQ (queue in queue) offload.
 
 * **[uses]     rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_QINQ_STRIP``.
+* **[uses]     rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_QINQ_INSERT``.
 * **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_QINQ_PKT``.
 * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_QINQ_STRIPPED``, ``mbuf.vlan_tci``,
    ``mbuf.vlan_tci_outer``.
@@ -541,6 +545,7 @@  L3 checksum offload
 Supports L3 checksum offload.
 
 * **[uses]     rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_IPV4_CKSUM``.
+* **[uses]     rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_IPV4_CKSUM``.
 * **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_IP_CKSUM``,
   ``mbuf.ol_flags:PKT_TX_IPV4`` | ``PKT_TX_IPV6``.
 * **[provides] mbuf**: ``mbuf.ol_flags:PKT_RX_IP_CKSUM_UNKNOWN`` |
@@ -558,6 +563,7 @@  L4 checksum offload
 Supports L4 checksum offload.
 
 * **[uses]     rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_UDP_CKSUM,DEV_RX_OFFLOAD_TCP_CKSUM``.
+* **[uses]     rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_UDP_CKSUM,DEV_TX_OFFLOAD_TCP_CKSUM,DEV_TX_OFFLOAD_SCTP_CKSUM``.
 * **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_IPV4`` | ``PKT_TX_IPV6``,
   ``mbuf.ol_flags:PKT_TX_L4_NO_CKSUM`` | ``PKT_TX_TCP_CKSUM`` |
   ``PKT_TX_SCTP_CKSUM`` | ``PKT_TX_UDP_CKSUM``.
@@ -576,6 +582,7 @@  MACsec offload
 Supports MACsec.
 
 * **[uses]     rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_MACSEC_STRIP``.
+* **[uses]     rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_MACSEC_INSERT``.
 * **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_MACSEC``.
 * **[provides] rte_eth_dev_info**: ``rx_offload_capa:DEV_RX_OFFLOAD_MACSEC_STRIP``,
   ``tx_offload_capa:DEV_TX_OFFLOAD_MACSEC_INSERT``.
@@ -589,6 +596,7 @@  Inner L3 checksum
 Supports inner packet L3 checksum.
 
 * **[uses]     rte_eth_rxq_conf**: ``offloads:DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM``.
+* **[uses]     rte_eth_txq_conf**: ``offloads:DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM``.
 * **[uses]     mbuf**: ``mbuf.ol_flags:PKT_TX_IP_CKSUM``,
   ``mbuf.ol_flags:PKT_TX_IPV4`` | ``PKT_TX_IPV6``,
   ``mbuf.ol_flags:PKT_TX_OUTER_IP_CKSUM``,
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index b3c10701e..cd79cb1c9 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1186,6 +1186,50 @@  rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
 	return ret;
 }
 
+/**
+ * A conversion function from txq_flags API.
+ */
+static void
+rte_eth_convert_txq_flags(const uint32_t txq_flags, uint64_t *tx_offloads)
+{
+	uint64_t offloads = 0;
+
+	if (!(txq_flags & ETH_TXQ_FLAGS_NOMULTSEGS))
+		offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
+	if (!(txq_flags & ETH_TXQ_FLAGS_NOVLANOFFL))
+		offloads |= DEV_TX_OFFLOAD_VLAN_INSERT;
+	if (!(txq_flags & ETH_TXQ_FLAGS_NOXSUMSCTP))
+		offloads |= DEV_TX_OFFLOAD_SCTP_CKSUM;
+	if (!(txq_flags & ETH_TXQ_FLAGS_NOXSUMUDP))
+		offloads |= DEV_TX_OFFLOAD_UDP_CKSUM;
+	if (!(txq_flags & ETH_TXQ_FLAGS_NOXSUMTCP))
+		offloads |= DEV_TX_OFFLOAD_TCP_CKSUM;
+
+	*tx_offloads = offloads;
+}
+
+/**
+ * A conversion function from offloads API.
+ */
+static void
+rte_eth_convert_txq_offloads(const uint64_t tx_offloads, uint32_t *txq_flags)
+{
+	uint32_t flags = 0;
+
+	if (!(tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS))
+		flags |= ETH_TXQ_FLAGS_NOMULTSEGS;
+	if (!(tx_offloads & DEV_TX_OFFLOAD_VLAN_INSERT))
+		flags |= ETH_TXQ_FLAGS_NOVLANOFFL;
+	if (!(tx_offloads & DEV_TX_OFFLOAD_SCTP_CKSUM))
+		flags |= ETH_TXQ_FLAGS_NOXSUMSCTP;
+	if (!(tx_offloads & DEV_TX_OFFLOAD_UDP_CKSUM))
+		flags |= ETH_TXQ_FLAGS_NOXSUMUDP;
+	if (!(tx_offloads & DEV_TX_OFFLOAD_TCP_CKSUM))
+		flags |= ETH_TXQ_FLAGS_NOXSUMTCP;
+
+	*txq_flags = flags;
+}
+
 int
 rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
 		       uint16_t nb_tx_desc, unsigned int socket_id,
@@ -1193,6 +1237,7 @@  rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
 {
 	struct rte_eth_dev *dev;
 	struct rte_eth_dev_info dev_info;
+	struct rte_eth_txconf local_conf;
 	void **txq;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -1237,8 +1282,20 @@  rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
 	if (tx_conf == NULL)
 		tx_conf = &dev_info.default_txconf;
 
+	/*
+	 * Convert between the offloads API to enable PMDs to support
+	 * only one of them.
+	 */
+	local_conf = *tx_conf;
+	if (tx_conf->txq_flags & ETH_TXQ_FLAGS_IGNORE)
+		rte_eth_convert_txq_offloads(tx_conf->offloads,
+					     &local_conf.txq_flags);
+	else
+		rte_eth_convert_txq_flags(tx_conf->txq_flags,
+					  &local_conf.offloads);
+
 	return (*dev->dev_ops->tx_queue_setup)(dev, tx_queue_id, nb_tx_desc,
-					       socket_id, tx_conf);
+					       socket_id, &local_conf);
 }
 
 void
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index f424cba04..4ad7dd059 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -692,6 +692,12 @@  struct rte_eth_vmdq_rx_conf {
  */
 struct rte_eth_txmode {
 	enum rte_eth_tx_mq_mode mq_mode; /**< TX multi-queues mode. */
+	uint64_t offloads;
+	/**
+	 * Per-port Tx offloads to be set using DEV_TX_OFFLOAD_* flags.
+	 * Only offloads set on tx_offload_capa field on rte_eth_dev_info
+	 * structure are allowed to be set.
+	 */
 
 	/* For i40e specifically */
 	uint16_t pvid;
@@ -733,6 +739,14 @@  struct rte_eth_rxconf {
 #define ETH_TXQ_FLAGS_NOXSUMS \
 		(ETH_TXQ_FLAGS_NOXSUMSCTP | ETH_TXQ_FLAGS_NOXSUMUDP | \
 		 ETH_TXQ_FLAGS_NOXSUMTCP)
+#define ETH_TXQ_FLAGS_IGNORE	0x8000
+	/**
+	 * When set the txq_flags should be ignored,
+	 * instead per-queue Tx offloads will be set on offloads field
+	 * located on rte_eth_txq_conf struct.
+	 * This flag is temporary till the rte_eth_txq_conf.txq_flags
+	 * API will be deprecated.
+	 */
 
 /**
  * A structure used to configure a TX ring of an Ethernet port.
@@ -745,6 +759,12 @@  struct rte_eth_txconf {
 
 	uint32_t txq_flags; /**< Set flags for the Tx queue */
 	uint8_t tx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
+	uint64_t offloads;
+	/**
+	 * Per-queue Tx offloads to be set  using DEV_TX_OFFLOAD_* flags.
+	 * Only offloads set on tx_queue_offload_capa field on rte_eth_dev_info
+	 * structure are allowed to be set.
+	 */
 };
 
 /**
@@ -969,6 +989,8 @@  struct rte_eth_conf {
 /**< Multiple threads can invoke rte_eth_tx_burst() concurrently on the same
  * tx queue without SW lock.
  */
+#define DEV_TX_OFFLOAD_MULTI_SEGS	0x00008000
+/**< multi segment send is supported. */
 
 struct rte_pci_device;
 
@@ -991,9 +1013,12 @@  struct rte_eth_dev_info {
 	uint16_t max_vmdq_pools; /**< Maximum number of VMDq pools. */
 	uint64_t rx_offload_capa;
 	/**< Device per port RX offload capabilities. */
-	uint32_t tx_offload_capa; /**< Device TX offload capabilities. */
+	uint64_t tx_offload_capa;
+	/**< Device per port TX offload capabilities. */
 	uint64_t rx_queue_offload_capa;
 	/**< Device per queue RX offload capabilities. */
+	uint64_t tx_queue_offload_capa;
+	/**< Device per queue TX offload capabilities. */
 	uint16_t reta_size;
 	/**< Device redirection table size, the total number of entries. */
 	uint8_t hash_key_size; /**< Hash key size in bytes */
@@ -2024,6 +2049,11 @@  int rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
  *   - The *txq_flags* member contains flags to pass to the TX queue setup
  *     function to configure the behavior of the TX queue. This should be set
  *     to 0 if no special configuration is required.
+ *     This API is obsolete and will be deprecated. Applications
+ *     should set it to ETH_TXQ_FLAGS_IGNORE and use
+ *     the offloads field below.
+ *   - The *offloads* member contains Tx offloads to be enabled.
+ *     Offloads which are not set cannot be used on the datapath.
  *
  *     Note that setting *tx_free_thresh* or *tx_rs_thresh* value to 0 forces
  *     the transmit function to use default values.