[dpdk-dev,v3] ethdev: introduce lock-free txq capability flag

Message ID 20170710165946.31080-1-jerin.jacob@caviumnetworks.com (mailing list archive)
State Accepted, archived
Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Jerin Jacob July 10, 2017, 4:59 p.m. UTC
  Introducing the DEV_TX_OFFLOAD_MT_LOCKFREE TX capability flag.
If a PMD advertises the DEV_TX_OFFLOAD_MT_LOCKFREE capability, multiple
threads can invoke rte_eth_tx_burst() concurrently on the same TX queue
without an SW lock. This PMD feature, found in the OCTEON family of NPUs,
is useful in the following use cases:

1) Remove the explicit spinlock in applications where lcores are not
mapped 1:1 to TX queues (as sketched below).
example: OVS has such an instance
https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L299
https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L1859
See the usage of the tx_lock spinlock.

2) In the eventdev use case, avoid dedicating a separate TX core for
transmitting, which enables better scaling since all workers can
send packets.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v3:
- Addressed Thomas's documentation review comments
http://dpdk.org/ml/archives/dev/2017-July/070672.html

v2:
- Changed the flag name to DEV_TX_OFFLOAD_MT_LOCKFREE (Thomas)
- Updated the documentation in doc/guides/prog_guide/poll_mode_drv.rst
and the rte_eth_tx_burst() Doxygen comments (Thomas)
---
 doc/guides/prog_guide/poll_mode_drv.rst | 15 +++++++++++++--
 lib/librte_ether/rte_ethdev.h           |  8 ++++++++
 2 files changed, 21 insertions(+), 2 deletions(-)
  

Comments

Hemant Agrawal July 13, 2017, 12:02 p.m. UTC | #1
On 7/10/2017 10:29 PM, Jerin Jacob wrote:
> Introducing the DEV_TX_OFFLOAD_MT_LOCKFREE TX capability flag.
> [...]
You may also like to add this capability in "doc/guides/nics/features/*.ini"

Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
  
Santosh Shukla July 13, 2017, 6:42 p.m. UTC | #2
On Monday 10 July 2017 10:29 PM, Jerin Jacob wrote:

> Introducing the DEV_TX_OFFLOAD_MT_LOCKFREE TX capability flag.
> [...]

Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
  
Thomas Monjalon July 18, 2017, 1:43 p.m. UTC | #3
13/07/2017 15:02, Hemant Agrawal:
> On 7/10/2017 10:29 PM, Jerin Jacob wrote:
> > Introducing the DEV_TX_OFFLOAD_MT_LOCKFREE TX capability flag.
> > [...]
> 
> You may also like to add this capability in "doc/guides/nics/features/*.ini"

I've just added the feature "Lock-free Tx queue" in features/default.ini.
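
For reference, the per-driver feature matrix uses simple INI entries; the
new line in doc/guides/nics/features/default.ini would look roughly like
this (a sketch of the convention, not the exact applied hunk):

    [Features]
    Lock-free Tx queue   =

and a PMD advertising DEV_TX_OFFLOAD_MT_LOCKFREE would set it to "Y" in
its own <driver>.ini.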

> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

Applied, thanks
  

Patch

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index 4987f70a1..1ac8f7ebf 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -84,7 +84,7 @@  Whenever needed and appropriate, asynchronous communication should be introduced
 
 Avoiding lock contention is a key issue in a multi-core environment.
 To address this issue, PMDs are designed to work with per-core private resources as much as possible.
-For example, a PMD maintains a separate transmit queue per-core, per-port.
+For example, a PMD maintains a separate transmit queue per-core, per-port, if the PMD is not ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capable.
 In the same way, every receive queue of a port is assigned to and polled by a single logical core (lcore).
 
 To comply with Non-Uniform Memory Access (NUMA), memory management is designed to assign to each logical core
@@ -146,6 +146,16 @@  This is also true for the pipe-line model provided all logical cores used are lo
 
 Multiple logical cores should never share receive or transmit queues for interfaces since this would require global locks and hinder performance.
 
+If the PMD is ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capable, multiple threads can invoke ``rte_eth_tx_burst()``
+concurrently on the same tx queue without SW lock. This PMD feature found in some NICs and useful in the following use cases:
+
+*  Remove explicit spinlock in some applications where lcores are not mapped to Tx queues with 1:1 relation.
+
+*  In the eventdev use case, avoid dedicating a separate TX core for transmitting and thus
+   enables more scaling as all workers can send the packets.
+
+See `Hardware Offload`_ for ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capability probing details.
+
 Device Identification and Configuration
 ---------------------------------------
 
@@ -290,7 +300,8 @@  Hardware Offload
 
 Depending on driver capabilities advertised by
 ``rte_eth_dev_info_get()``, the PMD may support hardware offloading
-feature like checksumming, TCP segmentation or VLAN insertion.
+feature like checksumming, TCP segmentation, VLAN insertion or
+lockfree multithreaded TX burst on the same TX queue.
 
 The support of these offload features implies the addition of dedicated
 status bit(s) and value field(s) into the rte_mbuf data structure, along
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index fd6baf37a..11fe13a07 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -927,6 +927,10 @@  struct rte_eth_conf {
 #define DEV_TX_OFFLOAD_IPIP_TNL_TSO     0x00000800    /**< Used for tunneling packet. */
 #define DEV_TX_OFFLOAD_GENEVE_TNL_TSO   0x00001000    /**< Used for tunneling packet. */
 #define DEV_TX_OFFLOAD_MACSEC_INSERT    0x00002000
+#define DEV_TX_OFFLOAD_MT_LOCKFREE      0x00004000
+/**< Multiple threads can invoke rte_eth_tx_burst() concurrently on the same
+ * tx queue without SW lock.
+ */
 
 struct rte_pci_device;
 
@@ -2961,6 +2965,10 @@  static inline int rte_eth_tx_descriptor_status(uint8_t port_id,
  * rte_eth_tx_burst() function must [attempt to] free the *rte_mbuf*  buffers
  * of those packets whose transmission was effectively completed.
  *
+ * If the PMD is DEV_TX_OFFLOAD_MT_LOCKFREE capable, multiple threads can
+ * invoke this function concurrently on the same tx queue without SW lock.
+ * @see rte_eth_dev_info_get, struct rte_eth_txconf::txq_flags
+ *
  * @param port_id
  *   The port identifier of the Ethernet device.
  * @param queue_id
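
To illustrate the intended multi-worker usage (an editor's sketch under the
same assumptions as the patch; get_burst() is a hypothetical application
source of packets, e.g. an eventdev dequeue):

    #include <rte_ethdev.h>

    #define BURST_SZ 32

    /* hypothetical helper that fills pkts[] with up to n mbufs */
    extern uint16_t get_burst(struct rte_mbuf **pkts, uint16_t n);

    static int
    worker_main(void *arg)
    {
            uint8_t port_id = *(uint8_t *)arg;
            struct rte_mbuf *pkts[BURST_SZ];
            uint16_t nb, sent;

            for (;;) {
                    nb = get_burst(pkts, BURST_SZ);
                    /* every worker lcore transmits on queue 0; no SW lock
                     * is needed because the PMD advertised
                     * DEV_TX_OFFLOAD_MT_LOCKFREE */
                    sent = rte_eth_tx_burst(port_id, 0, pkts, nb);
                    while (sent < nb)
                            rte_pktmbuf_free(pkts[sent++]); /* drop unsent */
            }
            return 0;
    }

Each worker would be launched with rte_eal_remote_launch(worker_main,
&port_id, lcore_id), all sharing the single TX queue.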