From patchwork Thu Aug 24 15:54:06 2017
X-Patchwork-Submitter: Moti Haimovsky
X-Patchwork-Id: 27881
X-Patchwork-Delegate: ferruh.yigit@amd.com
From: Moti Haimovsky
To: adrien.mazarguil@6wind.com
Cc: dev@dpdk.org, Moti Haimovsky
Date: Thu, 24 Aug 2017 18:54:06 +0300
Message-Id: <1503590050-196143-2-git-send-email-motih@mellanox.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1503590050-196143-1-git-send-email-motih@mellanox.com>
References: <1503590050-196143-1-git-send-email-motih@mellanox.com>
Subject: [dpdk-dev] [PATCH 1/5] net/mlx4: add simple Tx bypassing ibverbs

The PMD now sends single-buffer packets directly to the device, bypassing
the ibverbs Tx post and poll routines.
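
The direct Tx path replaces ibv_post_send()/ibv_poll_cq() with index arithmetic on the SQ and CQ rings. Below is a minimal user-space model of that arithmetic. It assumes a power-of-two ring addressed by a free-running counter n:
- the slot is n & (cnt - 1);
- the wrap parity is !!(n & cnt);
- an entry is software-owned when its owner bit equals that parity, which is the test in mlx4_get_sw_cqe();
- the producer sets MLX4_EN_BIT_WQE_OWN from the parity of the SQ head, as in mlx4_post_send().

RING_CNT, OWNER_BIT and all helper names are hypothetical. This is not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model values, not taken from the driver. */
#define RING_CNT 8u     /* ring size, must be a power of two */
#define OWNER_BIT 0x80u /* stands in for MLX4_CQE_OWNER_MASK */

/* Slot addressed by free-running counter n (like n & (cqe_cnt - 1)). */
static uint32_t ring_slot(uint32_t n)
{
	assert((RING_CNT & (RING_CNT - 1)) == 0);
	return n & (RING_CNT - 1);
}

/* Wrap parity: flips every RING_CNT increments (like !!(n & cqe_cnt)). */
static int ring_parity(uint32_t n)
{
	return !!(n & RING_CNT);
}

/* Same test as mlx4_get_sw_cqe(): SW owns the entry when bit == parity. */
static int entry_is_sw_owned(uint8_t owner_byte, uint32_t n)
{
	return (!!(owner_byte & OWNER_BIT) ^ ring_parity(n)) == 0;
}

/* Owner bit a producer would write for a WQE posted at counter head. */
static uint32_t wqe_owner_bit(uint32_t head)
{
	return ring_parity(head) ? 0x80000000u : 0u;
}
```

For example, with RING_CNT = 8, counter value 10 addresses slot 2 on the second pass (parity 1). An entry there that still carries owner bit 0 from the first pass is therefore reported as not yet completed.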

Signed-off-by: Moti Haimovsky
---
 drivers/net/mlx4/mlx4_prm.h  | 253 +++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx4/mlx4_rxtx.c | 260 +++++++++++++++++++++++++++++++++++--------
 drivers/net/mlx4/mlx4_rxtx.h |  30 ++++-
 drivers/net/mlx4/mlx4_txq.c  |  52 ++++++++-
 mk/rte.app.mk                |   2 +-
 5 files changed, 546 insertions(+), 51 deletions(-)
 create mode 100644 drivers/net/mlx4/mlx4_prm.h

diff --git a/drivers/net/mlx4/mlx4_prm.h b/drivers/net/mlx4/mlx4_prm.h
new file mode 100644
index 0000000..c5ce33b
--- /dev/null
+++ b/drivers/net/mlx4/mlx4_prm.h
@@ -0,0 +1,253 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX4_MLX4_PRM_H_
+#define RTE_PMD_MLX4_MLX4_PRM_H_
+
+/* Verbs headers do not support -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+#include 
+#include 
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+/* Basic TxQ building block */
+#define TXBB_SHIFT 6
+#define TXBB_SIZE (1 << TXBB_SHIFT)
+
+/* Typical TSO descriptor with 16 gather entries is 352 bytes... */
+#define MAX_WQE_SIZE 512
+#define MAX_WQE_TXBBS (MAX_WQE_SIZE / TXBB_SIZE)
+
+/* Send Queue Stamping/Invalidating info */
+#define SQ_STAMP_STRIDE 64
+#define SQ_STAMP_DWORDS (SQ_STAMP_STRIDE / 4)
+#define SQ_STAMP_SHIFT 31
+#define SQ_STAMP_VAL 0x7fffffff
+
+/* WQE flags */
+#define MLX4_OPCODE_SEND 0x0a
+#define MLX4_EN_BIT_WQE_OWN 0x80000000
+
+#define SIZE_TO_TXBBS(size) (RTE_ALIGN((size), (TXBB_SIZE)) / (TXBB_SIZE))
+
+/**
+ * Update the HW with the new CQ consumer value.
+ *
+ * @param cq
+ *   Pointer to the CQ structure.
+ */
+static inline void
+mlx4_cq_set_ci(struct mlx4_cq *cq)
+{
+	*cq->set_ci_db = rte_cpu_to_be_32(cq->cons_index & 0xffffff);
+}
+
+/**
+ * Returns a pointer to the CQE at position n.
+ *
+ * @param cq
+ *   Pointer to the CQ structure.
+ * @param n
+ *   Index of the entry whose address is requested.
+ *
+ * @return
+ *   Pointer to the CQE.
+ */
+static inline struct mlx4_cqe
+*mlx4_get_cqe(struct mlx4_cq *cq, int n)
+{
+	return (struct mlx4_cqe *)(cq->buf + n * cq->cqe_size);
+}
+
+/**
+ * Returns a pointer to the CQE at position n if it is owned by SW.
+ *
+ * @param cq
+ *   Pointer to the CQ structure.
+ * @param n
+ *   Index of the entry whose address is requested.
+ *
+ * @return
+ *   Pointer to the CQE if owned by SW, NULL otherwise.
+ */
+static inline void
+*mlx4_get_sw_cqe(struct mlx4_cq *cq, int n)
+{
+	struct mlx4_cqe *cqe = mlx4_get_cqe(cq, n & (cq->cqe_cnt - 1));
+	struct mlx4_cqe *tcqe = cq->cqe_size == 64 ? cqe + 1 : cqe;
+
+	return (!!(tcqe->owner_sr_opcode & MLX4_CQE_OWNER_MASK) ^
+		!!(n & cq->cqe_cnt)) ? NULL : cqe;
+}
+
+/**
+ * Returns a pointer to the WQE at position n.
+ *
+ * @param sq
+ *   Pointer to the SQ.
+ * @param n
+ *   Index of the entry in the queue.
+ *
+ * @return
+ *   A pointer to the requested entry.
+ */
+static inline void
+*mlx4_get_send_wqe(struct mlx4_sq *sq, unsigned int n)
+{
+	return sq->buf + n * TXBB_SIZE;
+}
+
+/**
+ * Returns the size of this WQE in bytes.
+ *
+ * @param wqe
+ *   Pointer to the WQE to inspect.
+ *
+ * @return
+ *   WQE size in bytes.
+ */
+static inline int
+mlx4_wqe_get_real_size(void *wqe)
+{
+	struct mlx4_wqe_ctrl_seg *ctrl = (struct mlx4_wqe_ctrl_seg *)wqe;
+	return ((ctrl->fence_size & 0x3f) << 4);
+}
+
+/**
+ * Fills the ctrl segment of a WQE with info needed for transmitting the packet.
+ *
+ * @param seg
+ *   Pointer to the control structure in the WQE.
+ * @param owner
+ *   The value for the owner field.
+ * @param fence_size
+ *   Fence bit and WQE size in octowords.
+ * @param srcrb_flags
+ *   High 24 bits are SRC remote buffer; low 8 bits are flags.
+ * @param imm
+ *   Immediate data/invalidation key.
+ */
+static inline void
+mlx4_set_ctrl_seg(struct mlx4_wqe_ctrl_seg *seg, uint32_t owner,
+		  uint8_t fence_size, uint32_t srcrb_flags, uint32_t imm)
+{
+	seg->fence_size = fence_size;
+	seg->srcrb_flags = rte_cpu_to_be_32(srcrb_flags);
+	/*
+	 * The caller should prepare "imm" in advance based on WR opcode.
+	 * For IBV_WR_SEND_WITH_IMM and IBV_WR_RDMA_WRITE_WITH_IMM,
+	 * the "imm" should be assigned as is.
+	 * For the IBV_WR_SEND_WITH_INV, it should be htobe32(imm).
+	 */
+	seg->imm = imm;
+	/*
+	 * Make sure descriptor is fully written before
+	 * setting ownership bit (because HW can start
+	 * executing as soon as we do).
+	 */
+	rte_wmb();
+	seg->owner_opcode = rte_cpu_to_be_32(owner);
+}
+
+/**
+ * Fills a data segment of a WQE with info needed for transmitting
+ * a data fragment.
+ *
+ * @param dseg
+ *   Pointer to a data segment structure in the WQE.
+ *   (WQE may contain several data segments).
+ * @param sg
+ *   Fragment info (addr, length, lkey).
+ */
+static inline void
+mlx4_set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ibv_sge *sg)
+{
+	dseg->lkey = rte_cpu_to_be_32(sg->lkey);
+	dseg->addr = rte_cpu_to_be_64(sg->addr);
+
+	/*
+	 * Need a barrier here before writing the byte_count field to
+	 * make sure that all the data is visible before the
+	 * byte_count field is set. Otherwise, if the segment begins
+	 * a new cacheline, the HCA prefetcher could grab the 64-byte
+	 * chunk and get a valid (!= 0xffffffff) byte count but
+	 * stale data, and end up sending the wrong data.
+	 */
+	rte_io_wmb();
+
+	if (likely(sg->length))
+		dseg->byte_count = rte_cpu_to_be_32(sg->length);
+	else
+		/* Zero len seg is treated as inline segment with zero data */
+		dseg->byte_count = rte_cpu_to_be_32(0x80000000);
+}
+
+/**
+ * Checks whether a WQE of ntxbb TXBBs can be inserted into the SQ
+ * without overflowing it, and that the WQE is not oversized.
+ *
+ * @param sq
+ *   Pointer to the SQ the WQE is to be posted to.
+ * @param ntxbb
+ *   Number of TXBBs the WQE occupies.
+ */
+static inline int
+mlx4_wq_overflow(struct mlx4_sq *sq, int ntxbb)
+{
+	unsigned int cur;
+
+	cur = sq->head - sq->tail;
+	return ((cur + ntxbb + sq->headroom_txbbs >= sq->txbb_cnt) ||
+		(ntxbb > MAX_WQE_TXBBS));
+}
+
+/**
+ * Calculates the WQE size (in bytes) needed to post this packet.
+ *
+ * @param count
+ *   Number of data segments the WQE contains.
+ *
+ * @return
+ *   WQE size in bytes.
+ */
+static inline int
+mlx4_wqe_calc_real_size(unsigned int count)
+{
+	return sizeof(struct mlx4_wqe_ctrl_seg) +
+		(count * sizeof(struct mlx4_wqe_data_seg));
+}
+
+#endif /* RTE_PMD_MLX4_MLX4_PRM_H_ */
diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
index b5e7777..0720e34 100644
--- a/drivers/net/mlx4/mlx4_rxtx.c
+++ b/drivers/net/mlx4/mlx4_rxtx.c
@@ -55,10 +55,125 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mlx4.h"
 #include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
+#include "mlx4_prm.h"
+
+
+typedef int bool;
+#define TRUE 1
+#define FALSE 0
+
+/**
+ * Stamps a freed WQE so that it won't be reused by the HW.
+ *
+ * @param sq
+ *   Pointer to the SQ structure.
+ * @param index
+ *   Index of the freed WQE.
+ * @param owner
+ *   The value of the WQE owner bit to use in the stamp.
+ *
+ * @return
+ *   Number of TXBBs the WQE contained.
+ */
+static int
+mlx4_txq_stamp_freed_wqe(struct mlx4_sq *sq, uint16_t index, uint8_t owner)
+{
+	uint32_t stamp =
+		rte_cpu_to_be_32(SQ_STAMP_VAL | (!!owner << SQ_STAMP_SHIFT));
+	void *end = sq->buf + sq->size;
+	void *wqe = mlx4_get_send_wqe(sq, index & sq->txbb_cnt_mask);
+	uint32_t *ptr = wqe;
+	int num_txbbs = SIZE_TO_TXBBS(mlx4_wqe_get_real_size(wqe));
+	int i;
+
+	/* Optimize the common case when there are no wraparounds */
+	if (likely((char *)wqe + num_txbbs * TXBB_SIZE <= (char *)end)) {
+		/* Stamp the freed descriptor */
+		for (i = 0; i < num_txbbs * TXBB_SIZE; i += SQ_STAMP_STRIDE) {
+			*ptr = stamp;
+			ptr += SQ_STAMP_DWORDS;
+		}
+	} else {
+		/* Stamp the freed descriptor */
+		for (i = 0; i < num_txbbs * TXBB_SIZE; i += SQ_STAMP_STRIDE) {
+			*ptr = stamp;
+			ptr += SQ_STAMP_DWORDS;
+			if ((void *)ptr >= end) {
+				ptr = (uint32_t *)sq->buf;
+				stamp ^= rte_cpu_to_be_32(0x80000000);
+			}
+		}
+	}
+	return num_txbbs;
+}
+
+/**
+ * Polls a CQ for work completions.
+ *
+ * @param txq
+ *   Tx queue whose CQ should be polled.
+ * @param max_cqes
+ *   Maximum number of CQEs to handle in this call.
+ *
+ * @return
+ *   Number of packets that were handled.
+ */
+static int
+mlx4_tx_poll_cq(struct txq *txq, int max_cqes)
+{
+	struct mlx4_cq *cq = &txq->mcq;
+	struct mlx4_sq *sq = &txq->msq;
+	struct mlx4_cqe *cqe;
+	uint32_t cons_index = cq->cons_index;
+	uint16_t new_index;
+	uint16_t nr_txbbs = 0;
+	int pkts = 0;
+
+	while ((max_cqes-- > 0) &&
+	       ((cqe = mlx4_get_sw_cqe(cq, cons_index)) != NULL)) {
+		cqe += cq->factor; /* Handle CQEs larger than 32 bytes. */
+		/*
+		 * Make sure we read the CQE after we read the
+		 * ownership bit.
+		 */
+		rte_rmb();
+		if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
+			     MLX4_CQE_OPCODE_ERROR)) {
+			struct mlx4_err_cqe *cqe_err =
+				(struct mlx4_err_cqe *)cqe;
+			ERROR("%p CQE error - vendor syndrome: 0x%x"
+			      " syndrome: 0x%x\n",
+			      txq, cqe_err->vendor_err, cqe_err->syndrome);
+		}
+		/* Get the WQE index reported in the CQE. */
+		new_index =
+			rte_be_to_cpu_16(cqe->wqe_index) & sq->txbb_cnt_mask;
+		do {
+			/* Free the next descriptor. */
+			nr_txbbs +=
+				mlx4_txq_stamp_freed_wqe(sq,
+				 (sq->tail + nr_txbbs) & sq->txbb_cnt_mask,
+				 !!((sq->tail + nr_txbbs) & sq->txbb_cnt));
+			pkts++;
+		} while (((sq->tail + nr_txbbs) & sq->txbb_cnt_mask) !=
+			 new_index);
+		++cons_index;
+	}
+	/*
+	 * To prevent CQ overflow we first update CQ consumer and only then
+	 * the ring consumer.
+	 */
+	cq->cons_index = cons_index;
+	mlx4_cq_set_ci(cq);
+	rte_wmb();
+	sq->tail = sq->tail + nr_txbbs;
+	return pkts;
+}
 
 /**
  * Manage Tx completions.
@@ -80,16 +195,15 @@
 	unsigned int elts_comp = txq->elts_comp;
 	unsigned int elts_tail = txq->elts_tail;
 	const unsigned int elts_n = txq->elts_n;
-	struct ibv_wc wcs[elts_comp];
 	int wcs_n;
 
 	if (unlikely(elts_comp == 0))
 		return 0;
-	wcs_n = ibv_poll_cq(txq->cq, elts_comp, wcs);
+	wcs_n = mlx4_tx_poll_cq(txq, 1);
 	if (unlikely(wcs_n == 0))
 		return 0;
 	if (unlikely(wcs_n < 0)) {
-		DEBUG("%p: ibv_poll_cq() failed (wcs_n=%d)",
+		DEBUG("%p: mlx4_tx_poll_cq() failed (wcs_n=%d)",
 		      (void *)txq, wcs_n);
 		return -1;
 	}
@@ -99,7 +213,7 @@
 	 * Assume WC status is successful as nothing can be done about it
 	 * anyway.
 	 */
-	elts_tail += wcs_n * txq->elts_comp_cd_init;
+	elts_tail += wcs_n;
 	if (elts_tail >= elts_n)
 		elts_tail -= elts_n;
 	txq->elts_tail = elts_tail;
@@ -183,6 +297,80 @@
 }
 
 /**
+ * Notify mlx4 that work is pending on the Tx queue.
+ *
+ * @param txq
+ *   Pointer to mlx4 Tx queue structure.
+ */
+static inline void
+mlx4_send_flush(struct txq *txq)
+{
+	rte_write32(txq->msq.doorbell_qpn, txq->msq.db);
+}
+
+/**
+ * Posts a single work request to a send queue.
+ *
+ * @param txq
+ *   The Tx queue to post to.
+ * @param wr
+ *   The work request to handle.
+ * @param bad_wr
+ *   Set to the failing work request if posting fails.
+ *
+ * @return
+ *   0 on success, -1 on error.
+ */
+static int
+mlx4_post_send(struct txq *txq,
+	       struct ibv_send_wr *wr,
+	       struct ibv_send_wr **bad_wr)
+{
+	struct mlx4_wqe_ctrl_seg *ctrl;
+	struct mlx4_wqe_data_seg *dseg;
+	struct mlx4_sq *sq = &txq->msq;
+	uint32_t srcrb_flags;
+	uint8_t fence_size;
+	uint32_t head_idx = sq->head & sq->txbb_cnt_mask;
+	uint32_t owner_opcode;
+	int wqe_real_size, nr_txbbs;
+
+	/* For now, only single-buffer packets are supported. */
+	if (wr->num_sge != 1)
+		goto err;
+	/* Calculate the WQE size needed for this packet. */
+	wqe_real_size = mlx4_wqe_calc_real_size(wr->num_sge);
+	if (unlikely(!wqe_real_size))
+		goto err;
+	nr_txbbs = SIZE_TO_TXBBS(wqe_real_size);
+	/* Fail if the SQ is full or the WQE is oversized. */
+	if (unlikely(mlx4_wq_overflow(sq, nr_txbbs)))
+		goto err;
+	/* Get the ctrl segment and the single data segment of the WQE. */
+	ctrl = mlx4_get_send_wqe(sq, head_idx);
+	dseg = (struct mlx4_wqe_data_seg *)(((char *)ctrl) +
+		sizeof(struct mlx4_wqe_ctrl_seg));
+	mlx4_set_data_seg(dseg, wr->sg_list);
+	/* For raw Ethernet, the SOLICIT flag is used
+	 * to indicate that no ICRC should be calculated.
+	 */
+	srcrb_flags = MLX4_WQE_CTRL_SOLICIT |
+		      ((wr->send_flags & IBV_SEND_SIGNALED) ?
+		       MLX4_WQE_CTRL_CQ_UPDATE : 0);
+	fence_size = (wr->send_flags & IBV_SEND_FENCE ?
+		      MLX4_WQE_CTRL_FENCE : 0) | ((wqe_real_size / 16) & 0x3f);
+	owner_opcode = MLX4_OPCODE_SEND |
+		       ((sq->head & sq->txbb_cnt) ? MLX4_EN_BIT_WQE_OWN : 0);
+	mlx4_set_ctrl_seg(ctrl, owner_opcode, fence_size, srcrb_flags, 0);
+	sq->head += nr_txbbs;
+	rte_wmb();
+	return 0;
+err:
+	*bad_wr = wr;
+	return -1;
+}
+
+/**
  * DPDK callback for Tx.
  *
  * @param dpdk_txq
@@ -199,8 +387,6 @@
 mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
 {
 	struct txq *txq = (struct txq *)dpdk_txq;
-	struct ibv_send_wr *wr_head = NULL;
-	struct ibv_send_wr **wr_next = &wr_head;
 	struct ibv_send_wr *wr_bad = NULL;
 	unsigned int elts_head = txq->elts_head;
 	const unsigned int elts_n = txq->elts_n;
@@ -275,6 +461,8 @@
 				elt->buf = NULL;
 				goto stop;
 			}
+			if (buf->pkt_len <= txq->max_inline)
+				send_flags |= IBV_SEND_INLINE;
 			/* Update element. */
 			elt->buf = buf;
 			if (txq->priv->vf)
@@ -285,22 +473,31 @@
 			sge->length = length;
 			sge->lkey = lkey;
 			sent_size += length;
+			/* Set up WR. */
+			wr->sg_list = sge;
+			wr->num_sge = segs;
+			wr->opcode = IBV_WR_SEND;
+			wr->send_flags = send_flags;
+			wr->next = NULL;
+			/* Post the packet for sending. */
+			err = mlx4_post_send(txq, wr, &wr_bad);
+			if (unlikely(err)) {
+				if (unlikely(wr_bad->send_flags &
+					     IBV_SEND_SIGNALED)) {
+					elts_comp_cd = 1;
+					--elts_comp;
+				}
+				elt->buf = NULL;
+				goto stop;
+			}
+			sent_size += length;
 		} else {
 			err = -1;
 			goto stop;
 		}
-		if (sent_size <= txq->max_inline)
-			send_flags |= IBV_SEND_INLINE;
 		elts_head = elts_head_next;
 		/* Increment sent bytes counter. */
 		txq->stats.obytes += sent_size;
-		/* Set up WR. */
-		wr->sg_list = &elt->sge;
-		wr->num_sge = segs;
-		wr->opcode = IBV_WR_SEND;
-		wr->send_flags = send_flags;
-		*wr_next = wr;
-		wr_next = &wr->next;
 	}
 stop:
 	/* Take a shortcut if nothing must be sent. */
@@ -309,38 +506,7 @@
 	/* Increment sent packets counter. */
 	txq->stats.opackets += i;
 	/* Ring QP doorbell. */
-	*wr_next = NULL;
-	assert(wr_head);
-	err = ibv_post_send(txq->qp, wr_head, &wr_bad);
-	if (unlikely(err)) {
-		uint64_t obytes = 0;
-		uint64_t opackets = 0;
-
-		/* Rewind bad WRs. */
-		while (wr_bad != NULL) {
-			int j;
-
-			/* Force completion request if one was lost. */
-			if (wr_bad->send_flags & IBV_SEND_SIGNALED) {
-				elts_comp_cd = 1;
-				--elts_comp;
-			}
-			++opackets;
-			for (j = 0; j < wr_bad->num_sge; ++j)
-				obytes += wr_bad->sg_list[j].length;
-			elts_head = (elts_head ? elts_head : elts_n) - 1;
-			wr_bad = wr_bad->next;
-		}
-		txq->stats.opackets -= opackets;
-		txq->stats.obytes -= obytes;
-		i -= opackets;
-		DEBUG("%p: ibv_post_send() failed, %" PRIu64 " packets"
-		      " (%" PRIu64 " bytes) rejected: %s",
-		      (void *)txq,
-		      opackets,
-		      obytes,
-		      (err <= -1) ? "Internal error" : strerror(err));
-	}
+	mlx4_send_flush(txq);
 	txq->elts_head = elts_head;
 	txq->elts_comp += elts_comp;
 	txq->elts_comp_cd = elts_comp_cd;
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index fec998a..e442730 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -41,6 +41,7 @@
 #pragma GCC diagnostic ignored "-Wpedantic"
 #endif
 #include 
+#include 
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
@@ -90,7 +91,7 @@ struct txq_elt {
 	struct rte_mbuf *buf; /**< Buffer. */
 };
 
-/** Rx queue counters. */
+/** Tx queue counters. */
 struct mlx4_txq_stats {
 	unsigned int idx; /**< Mapping index. */
 	uint64_t opackets; /**< Total of successfully sent packets. */
@@ -98,6 +99,31 @@ struct mlx4_txq_stats {
 	uint64_t odropped; /**< Total of packets not sent when Tx ring full. */
 };
 
+/** Tx send queue (SQ) info. */
+struct mlx4_sq {
+	char *buf; /**< SQ buffer. */
+	uint32_t size; /**< SQ size in bytes. */
+	uint32_t head; /**< SQ head counter in units of TXBBs. */
+	uint32_t tail; /**< SQ tail counter in units of TXBBs. */
+	uint32_t txbb_cnt; /**< Number of TXBBs in the queue (a power of two). */
+	uint32_t txbb_shift; /**< log2 of the basic block size. */
+	uint32_t txbb_cnt_mask; /**< Index mask (txbb_cnt - 1). */
+	uint32_t headroom_txbbs; /**< Number of TXBBs to keep free. */
+	uint32_t *db; /**< Pointer to the doorbell. */
+	uint32_t doorbell_qpn; /**< QP number to write to the doorbell. */
+};
+
+struct mlx4_cq {
+	char *buf; /**< CQ buffer. */
+	uint32_t size; /**< CQ size in bytes. */
+	uint32_t cqe_cnt; /**< Number of entries in the CQ. */
+	int cqe_size; /**< Size of a CQE in bytes. */
+	uint32_t *set_ci_db; /**< Pointer to the consumer-index doorbell. */
+	uint32_t cons_index; /**< Consumer index (CQEs handled so far). */
+	uint32_t factor; /**< CQ data location in a CQE. */
+	int cqn; /**< CQ number. */
+};
+
 /** Tx queue descriptor. */
 struct txq {
 	struct priv *priv; /**< Back pointer to private data. */
@@ -118,6 +144,8 @@ struct txq {
 	unsigned int elts_comp_cd_init; /**< Initial value for countdown. */
 	struct mlx4_txq_stats stats; /**< Tx queue counters. */
 	unsigned int socket; /**< CPU socket ID for allocations. */
+	struct mlx4_sq msq; /**< Info for directly manipulating the SQ. */
+	struct mlx4_cq mcq; /**< Info for directly manipulating the CQ. */
 };
 
 /* mlx4_rxq.c */
diff --git a/drivers/net/mlx4/mlx4_txq.c b/drivers/net/mlx4/mlx4_txq.c
index e0245b0..1273738 100644
--- a/drivers/net/mlx4/mlx4_txq.c
+++ b/drivers/net/mlx4/mlx4_txq.c
@@ -62,6 +62,7 @@
 #include "mlx4_autoconf.h"
 #include "mlx4_rxtx.h"
 #include "mlx4_utils.h"
+#include "mlx4_prm.h"
 
 /**
  * Allocate Tx queue elements.
@@ -109,7 +110,8 @@
 	assert(ret == 0);
 	return 0;
 error:
-	rte_free(elts);
+	if (elts != NULL)
+		rte_free(elts);
 	DEBUG("%p: failed, freed everything", (void *)txq);
 	assert(ret > 0);
 	rte_errno = ret;
@@ -241,6 +243,36 @@ struct txq_mp2mr_mbuf_check_data {
 	mlx4_txq_mp2mr(txq, mp);
 }
 
+static void
+mlx4_txq_fill_dv_obj_info(struct txq *txq, struct mlx4dv_obj *mlxdv)
+{
+	struct mlx4_sq *sq = &txq->msq;
+	struct mlx4_cq *cq = &txq->mcq;
+	struct mlx4dv_qp *dqp = mlxdv->qp.out;
+	struct mlx4dv_cq *dcq = mlxdv->cq.out;
+
+	sq->buf = ((char *)dqp->buf.buf) + dqp->sq.offset;
+	/* Total length, including headroom and spare WQEs. */
+	sq->size = (uint32_t)dqp->rq.offset - (uint32_t)dqp->sq.offset;
+	sq->head = 0;
+	sq->tail = 0;
+	sq->txbb_shift = TXBB_SHIFT;
+	sq->txbb_cnt = (dqp->sq.wqe_cnt << dqp->sq.wqe_shift) >> TXBB_SHIFT;
+	sq->txbb_cnt_mask = sq->txbb_cnt - 1;
+	sq->db = dqp->sdb;
+	sq->doorbell_qpn = dqp->doorbell_qpn;
+	sq->headroom_txbbs = (2048 + (1 << dqp->sq.wqe_shift)) >> TXBB_SHIFT;
+
+	/* Save CQ parameters. */
+	cq->buf = dcq->buf.buf;
+	cq->size = dcq->buf.length;
+	cq->cqe_cnt = dcq->cqe_cnt;
+	cq->cqn = dcq->cqn;
+	cq->set_ci_db = dcq->set_ci_db;
+	cq->cqe_size = dcq->cqe_size;
+	cq->factor = cq->cqe_size > 32 ? 1 : 0;
+}
+
 /**
  * Configure a Tx queue.
  *
@@ -263,7 +295,10 @@ struct txq_mp2mr_mbuf_check_data {
 		    unsigned int socket, const struct rte_eth_txconf *conf)
 {
 	struct priv *priv = dev->data->dev_private;
-	struct txq tmpl = {
+	struct mlx4dv_obj mlxdv;
+	struct mlx4dv_qp dv_qp;
+	struct mlx4dv_cq dv_cq;
+	struct txq tmpl = {
 		.priv = priv,
 		.socket = socket
 	};
@@ -308,6 +343,8 @@ struct txq_mp2mr_mbuf_check_data {
 			/* Max number of scatter/gather elements in a WR. */
 			.max_send_sge = 1,
 			.max_inline_data = MLX4_PMD_MAX_INLINE,
+			.max_recv_wr = 0,
+			.max_recv_sge = 0,
 		},
 		.qp_type = IBV_QPT_RAW_PACKET,
 		/*
@@ -370,6 +407,17 @@ struct txq_mp2mr_mbuf_check_data {
 	DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
 	/* Pre-register known mempools. */
 	rte_mempool_walk(mlx4_txq_mp2mr_iter, txq);
+	/* Retrieve device queue info. */
+	mlxdv.cq.in = txq->cq;
+	mlxdv.cq.out = &dv_cq;
+	mlxdv.qp.in = txq->qp;
+	mlxdv.qp.out = &dv_qp;
+	ret = mlx4dv_init_obj(&mlxdv, MLX4DV_OBJ_QP | MLX4DV_OBJ_CQ);
+	if (ret) {
+		ERROR("%p: Failed to retrieve device obj info", (void *)dev);
+		goto error;
+	}
+	mlx4_txq_fill_dv_obj_info(txq, &mlxdv);
 	return 0;
 error:
 	ret = rte_errno;
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index c25fdd9..2f1286e 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -128,7 +128,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
 endif
 _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
-_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4 -libverbs
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4 -libverbs -lmlx4
 _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -libverbs
 _LDLIBS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += -lrte_pmd_nfp
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null
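
The patch uses several WQE sizing and overflow rules: mlx4_wqe_calc_real_size(), SIZE_TO_TXBBS(), mlx4_wq_overflow(), and the headroom set in mlx4_txq_fill_dv_obj_info(). These rules can be checked in isolation. The sketch below is a hypothetical user-space model, not driver code. struct sq_model and the helper names are invented for illustration. The 16-byte control and data segment sizes are an assumption based on the mlx4 WQE layout.

```c
#include <assert.h>
#include <stdint.h>

/* Constants mirrored from mlx4_prm.h in the patch. */
#define TXBB_SHIFT 6
#define TXBB_SIZE (1u << TXBB_SHIFT)        /* 64-byte Tx basic block */
#define MAX_WQE_TXBBS (512u / TXBB_SIZE)    /* MAX_WQE_SIZE / TXBB_SIZE */

/* Assumption: 16-byte ctrl and data segments (mlx4 WQE layout). */
#define CTRL_SEG_SIZE 16u
#define DATA_SEG_SIZE 16u

/* Hypothetical stand-in for struct mlx4_sq (only the fields used here). */
struct sq_model {
	uint32_t head;     /* free-running producer counter, in TXBBs */
	uint32_t tail;     /* free-running consumer counter, in TXBBs */
	uint32_t txbb_cnt; /* ring size in TXBBs, a power of two */
	uint32_t headroom; /* TXBBs that must stay free */
};

/* Round a byte count up to whole TXBBs, like SIZE_TO_TXBBS(). */
static uint32_t size_to_txbbs(uint32_t size)
{
	return (size + TXBB_SIZE - 1) / TXBB_SIZE;
}

/* One ctrl segment followed by count data segments. */
static uint32_t wqe_real_size(uint32_t count)
{
	return CTRL_SEG_SIZE + count * DATA_SEG_SIZE;
}

/* Headroom formula used in mlx4_txq_fill_dv_obj_info(). */
static uint32_t headroom_txbbs(uint32_t wqe_shift)
{
	return (2048u + (1u << wqe_shift)) >> TXBB_SHIFT;
}

/* Same condition as mlx4_wq_overflow(); unsigned math handles wraparound. */
static int wq_overflow(const struct sq_model *sq, uint32_t ntxbb)
{
	uint32_t used = sq->head - sq->tail;

	assert(sq->txbb_cnt != 0 && (sq->txbb_cnt & (sq->txbb_cnt - 1)) == 0);
	return used + ntxbb + sq->headroom >= sq->txbb_cnt ||
	       ntxbb > MAX_WQE_TXBBS;
}
```

Under these assumptions, a single-segment packet needs a 32-byte WQE. That fits in one TXBB and corresponds to fence_size = 32 / 16 = 2 octowords. With a 64-byte WQE stride (wqe_shift = 6), the SQ keeps (2048 + 64) / 64 = 33 TXBBs in reserve. Because head and tail are free-running unsigned counters, head - tail stays correct across 32-bit wraparound.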