
[v2,0/2] net/mlx5: add vectorized mprq

Message ID 20201021203030.19042-1-akozyrev@nvidia.com (mailing list archive)

Message

Alexander Kozyrev Oct. 21, 2020, 8:30 p.m. UTC
  The vectorized Rx burst function accelerates Rx processing by using
SIMD (single instruction, multiple data) extensions for multi-buffer
packet processing. Pre-allocating multiple mbufs and filling them in
batches of four greatly improves the throughput of the Rx burst
routine.
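As an illustration of the batch-of-four idea (not the PMD's actual code), the sketch below uses GCC/Clang vector extensions, which the compiler lowers to SSE on x86, NEON on Arm, or AltiVec on POWER. The function name `rx_burst_batch4` and the per-lane 4-byte FCS subtraction are hypothetical stand-ins for the real descriptor-to-mbuf translation:

```c
#include <stdint.h>
#include <string.h>

/* Four 32-bit lanes in one 128-bit register; GCC/Clang map this to the
 * platform's SIMD unit (SSE, NEON, AltiVec). */
typedef uint32_t v4u32 __attribute__((vector_size(16)));

/* Hypothetical sketch: translate four completion byte counts into four
 * mbuf packet lengths with a single load, subtract, and store instead
 * of four scalar round trips. */
static void
rx_burst_batch4(const uint32_t byte_cnt[4], uint32_t pkt_len[4])
{
	v4u32 v;

	memcpy(&v, byte_cnt, sizeof(v)); /* one 128-bit load */
	v -= 4;                          /* strip 4-byte FCS in all lanes */
	memcpy(pkt_len, &v, sizeof(v));  /* one 128-bit store */
}
```

The real routine does far more per lane (flags, RSS hash, packet type), but the structure is the same: one SIMD operation covers four packets at a time.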

MPRQ (Multi-Packet Rx Queue) currently lacks a vectorized version.
It works by posting a single large buffer (consisting of multiple
fixed-size strides) in order to receive multiple packets at once on
this buffer. An Rx packet is then either copied to a user-provided
mbuf, or the PMD attaches the Rx packet to the mbuf via a pointer to
an external buffer.
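The copy-vs-attach choice can be sketched as below. This is a simplified model, not the PMD's code: `mprq_deliver` and `MPRQ_COPY_THRESHOLD` are hypothetical names, and the pointer assignment stands in for a real external-buffer attachment such as rte_pktmbuf_attach_extbuf():

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold: small packets are cheaper to memcpy than to
 * keep the whole MPRQ buffer pinned by an external-buffer reference. */
#define MPRQ_COPY_THRESHOLD 128

/* Deliver one stride's packet into an mbuf. Returns true if the packet
 * was copied into the mbuf's own data room, false if it was "attached"
 * (recorded here as a pointer, standing in for an extbuf attach). */
static bool
mprq_deliver(const char *stride, size_t len,
	     char *mbuf_data, size_t mbuf_room, const char **ext_ptr)
{
	if (len <= MPRQ_COPY_THRESHOLD && len <= mbuf_room) {
		memcpy(mbuf_data, stride, len);	/* copy path */
		return true;
	}
	*ext_ptr = stride;			/* attach path */
	return false;
}
```

The attach path avoids the copy but holds a reference on the MPRQ buffer until the application frees the mbuf, which is why small packets take the copy path.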

It is proposed to add a vectorized MPRQ Rx routine to speed up the MPRQ
buffer handling as well. It requires pre-allocating multiple mbufs
every time we exhaust all the strides of the current MPRQ buffer and
switch to a new one. The new mlx5_rx_burst_mprq_vec() routine takes
care of this, as well as of deciding whether to copy a packet or attach
it as an external buffer. The batch processing logic is the same as in
the simple vectorized Rx routine.
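The replenish-on-exhaustion behavior can be modeled as follows. Everything here is a simplified, hypothetical sketch (`mprq_next_stride`, the counter fields, and the replenish counter stand in for the real WQE posting and an rte_mempool bulk allocation):

```c
/* Hypothetical, minimal model of an MPRQ buffer's stride bookkeeping. */
struct mprq_buf {
	unsigned int strides_total;	/* strides per MPRQ buffer */
	unsigned int strides_used;	/* strides consumed so far */
};

/* Return the next stride index; when the current buffer is exhausted,
 * "switch" to a new one and count a replenish event (standing in for a
 * bulk mbuf pre-allocation for the new buffer). */
static int
mprq_next_stride(struct mprq_buf *buf, unsigned int *replenished)
{
	if (buf->strides_used == buf->strides_total) {
		buf->strides_used = 0;	/* new MPRQ buffer */
		*replenished += 1;	/* pre-allocate a fresh mbuf batch */
	}
	return (int)buf->strides_used++;
}
```

Batching the mbuf allocation at buffer-switch time is what lets the hot per-packet path stay allocation-free.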

The new vectorized MPRQ burst function is selected automatically
whenever the mprq_en devarg is specified. If SIMD is not available on
the platform, we fall back to the simple MPRQ Rx burst function. LRO is
not supported by the vectorized MPRQ version, and a fallback to the
regular MPRQ burst is performed in that case.
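The selection and fallback rules above reduce to a small decision function. This is an illustrative sketch only; the enum values and `select_rx_burst` are hypothetical names, not the driver's API:

```c
#include <stdbool.h>

/* Hypothetical labels for the four Rx burst variants discussed above. */
enum rx_burst {
	RX_BURST_SIMPLE,	/* scalar Rx */
	RX_BURST_VEC,		/* vectorized Rx */
	RX_BURST_MPRQ,		/* scalar MPRQ Rx */
	RX_BURST_MPRQ_VEC	/* vectorized MPRQ Rx */
};

/* Pick a burst function: vectorized MPRQ only when mprq_en is set,
 * SIMD is available, and LRO is off; otherwise fall back as the cover
 * letter describes. */
static enum rx_burst
select_rx_burst(bool mprq_en, bool simd_ok, bool lro_en)
{
	if (!mprq_en)
		return simd_ok ? RX_BURST_VEC : RX_BURST_SIMPLE;
	if (!simd_ok || lro_en)
		return RX_BURST_MPRQ;	/* regular MPRQ fallback */
	return RX_BURST_MPRQ_VEC;
}
```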


Alexander Kozyrev (2):
  net/mlx5: refactor vectorized Rx routine
  net/mlx5: implement vectorized MPRQ burst

 drivers/net/mlx5/mlx5_devx.c             |  15 +-
 drivers/net/mlx5/mlx5_ethdev.c           |  20 +-
 drivers/net/mlx5/mlx5_rxq.c              |  96 +++---
 drivers/net/mlx5/mlx5_rxtx.c             | 237 ++++---------
 drivers/net/mlx5/mlx5_rxtx.h             | 200 ++++++++++-
 drivers/net/mlx5/mlx5_rxtx_vec.c         | 416 ++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_rxtx_vec.h         |  55 ---
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 106 ++----
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h    | 103 ++----
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h     | 121 ++-----
 10 files changed, 813 insertions(+), 556 deletions(-)
  

Comments

Raslan Darawsheh Oct. 22, 2020, 3:01 p.m. UTC | #1
Hi,

> -----Original Message-----
> From: Alexander Kozyrev <akozyrev@nvidia.com>
> Sent: Wednesday, October 21, 2020 11:30 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Subject: [PATCH v2 0/2] net/mlx5: add vectorized mprq
> 
> [...]


Series applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh