From patchwork Wed Oct 21 20:30:28 2020
X-Patchwork-Submitter: Alexander Kozyrev
X-Patchwork-Id: 81709
From: Alexander Kozyrev <akozyrev@nvidia.com>
To: dev@dpdk.org
Cc: rasland@nvidia.com, matan@nvidia.com, viacheslavo@nvidia.com
Date: Wed, 21 Oct 2020 20:30:28 +0000
Message-Id: <20201021203030.19042-1-akozyrev@nvidia.com>
In-Reply-To: <20200719041142.14485-1-akozyrev@mellanox.com>
References: <20200719041142.14485-1-akozyrev@mellanox.com>
Subject: [dpdk-dev] [PATCH v2 0/2] net/mlx5: add vectorized mprq

The vectorized Rx burst function accelerates Rx processing by using SIMD (single instruction, multiple data) extensions for multi-buffer packet processing. Pre-allocating multiple mbufs and filling them in batches of four greatly improves the throughput of the Rx burst routine.

MPRQ (Multi-Packet Rx Queue) currently lacks a vectorized version. It works by posting a single large buffer (consisting of multiple fixed-size strides) in order to receive multiple packets at once into this buffer. An Rx packet is then either copied into a user-provided mbuf, or the PMD attaches the Rx packet to the mbuf by pointing to it as an external buffer.

It is proposed to add a vectorized MPRQ Rx routine to speed up the MPRQ buffer handling as well. This requires pre-allocating multiple mbufs every time all the strides of the current MPRQ buffer are exhausted and a switch to a new buffer is made. The new mlx5_rx_burst_mprq_vec() routine takes care of this, as well as of the decision whether to copy a packet or to attach it as an external buffer. The batch processing logic is no different from the simple vectorized Rx routine.

The new vectorized MPRQ burst function is selected automatically whenever the mprq_en devarg is specified. If SIMD is not available on the platform, the simple MPRQ Rx burst function is used instead. LRO is not supported by the vectorized MPRQ version, and a fallback to the regular MPRQ routine is performed in that case.
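
To make the copy-vs-attach step above concrete, here is a minimal C sketch of how a packet landing in an MPRQ stride could be delivered to an mbuf. The helper name mprq_deliver_pkt() and the max_memcpy_len threshold parameter are illustrative and are not taken from the mlx5 PMD; only the rte_mbuf APIs used are existing DPDK functions.

    #include <stdint.h>
    #include <rte_mbuf.h>
    #include <rte_memcpy.h>
    #include <rte_memory.h>

    /* Illustrative helper: deliver one packet from an MPRQ stride. */
    static void
    mprq_deliver_pkt(struct rte_mbuf *pkt, void *stride_addr, uint16_t pkt_len,
                     struct rte_mbuf_ext_shared_info *shinfo,
                     uint16_t max_memcpy_len)
    {
        if (pkt_len <= max_memcpy_len) {
            /* Short packet: copy the stride data into the mbuf data room. */
            rte_memcpy(rte_pktmbuf_mtod(pkt, void *), stride_addr, pkt_len);
        } else {
            /*
             * Long packet: attach the stride as an external buffer.
             * The MPRQ buffer can be released only after every attached
             * mbuf is freed (tracked through the shinfo refcount).
             */
            rte_pktmbuf_attach_extbuf(pkt, stride_addr,
                                      rte_mem_virt2iova(stride_addr),
                                      pkt_len, shinfo);
        }
        pkt->data_len = pkt_len;
        pkt->pkt_len = pkt_len;
    }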
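
Similarly, the bulk pre-allocation on an MPRQ buffer switch could look roughly like the sketch below. rte_pktmbuf_alloc_bulk() is an existing DPDK API; the mprq_switch_buf() wrapper and its parameters are made up for illustration only.

    #include <rte_mbuf.h>

    /*
     * Illustrative helper: replenish the elts[] array with fresh mbufs
     * once every stride of the current MPRQ buffer has been consumed.
     */
    static int
    mprq_switch_buf(struct rte_mempool *mp, struct rte_mbuf **elts,
                    uint16_t n_elts, uint16_t strd_idx, uint16_t strd_n)
    {
        if (strd_idx < strd_n)
            return 0; /* Strides left in the current buffer: nothing to do. */
        /*
         * One bulk call instead of per-packet allocations; the packets are
         * then filled in batches of four as in the existing vectorized path.
         */
        return rte_pktmbuf_alloc_bulk(mp, elts, n_elts);
    }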
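
For completeness, MPRQ itself is enabled through the mprq_en device argument; an illustrative testpmd invocation (the PCI address is a placeholder, and the exact option spelling may differ between DPDK releases) is:

    dpdk-testpmd -a 0000:03:00.0,mprq_en=1 -- --rxq=4 --txq=4

With this series applied, the same devarg selects the vectorized MPRQ burst whenever SIMD support is detected, with the fallbacks described above.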
Alexander Kozyrev (2):
  net/mlx5: refactor vectorized Rx routine
  net/mlx5: implement vectorized MPRQ burst

 drivers/net/mlx5/mlx5_devx.c             |  15 +-
 drivers/net/mlx5/mlx5_ethdev.c           |  20 +-
 drivers/net/mlx5/mlx5_rxq.c              |  96 +++---
 drivers/net/mlx5/mlx5_rxtx.c             | 237 ++++---------
 drivers/net/mlx5/mlx5_rxtx.h             | 200 ++++++++++-
 drivers/net/mlx5/mlx5_rxtx_vec.c         | 416 ++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_rxtx_vec.h         |  55 ---
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 106 ++----
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h    | 103 ++----
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h     | 121 ++-----
 10 files changed, 813 insertions(+), 556 deletions(-)