From patchwork Wed May 18 13:57:43 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jerin Jacob
X-Patchwork-Id: 12878
X-Patchwork-Delegate: thomas@monjalon.net
From: Jerin Jacob
Date: Wed, 18 May 2016 19:27:43 +0530
Message-ID: <1463579863-32053-1-git-send-email-jerin.jacob@caviumnetworks.com>
X-Mailer: git-send-email 2.5.5
Subject: [dpdk-dev] [PATCH] mbuf: make rearm_data address naturally aligned
List-Id: patches and discussions about DPDK

To avoid multiple stores on fast path, Ethernet drivers aggregate the
writes to data_off, refcnt, nb_segs and port to an uint64_t
data and write the data in one shot with uint64_t* at the
&mbuf->rearm_data address.

Some of the non-IA platforms have store operation overhead
if the store address is not naturally aligned. This patch
fixes the performance issue on those targets.

Signed-off-by: Jerin Jacob
---
Tested this patch on IA and non-IA (ThunderX) platforms.
This patch shows a 400 Kpps/core improvement in a ThunderX + ixgbe + vector
environment, and it does not add any overhead on the IA platform.

I also tried a similar approach that replaces "buf_len" with "pad"
(in this patch context). Since that approach adds the overhead of a read
followed by a mask to keep "buf_len" intact, it did not show much
improvement.
ref: http://dpdk.org/ml/archives/dev/2016-May/038914.html

---
 drivers/net/fm10k/fm10k_rxtx_vec.c                            | 3 ---
 drivers/net/i40e/i40e_rxtx_vec.c                              | 5 +----
 drivers/net/ixgbe/ixgbe_rxtx_vec.c                            | 3 ---
 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++--
 lib/librte_mbuf/rte_mbuf.h                                    | 6 +++---
 5 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 03e4a5c..f3ef1a1 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -314,9 +314,6 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)

 		/* Flush mbuf with pkt template.
 		 * Data to be rearmed is 6 bytes long.
-		 * Though, RX will overwrite ol_flags that are coming next
-		 * anyway. So overwrite whole 8 bytes with one load:
-		 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
 		 */
 		p0 = (uintptr_t)&mb0->rearm_data;
 		*(uint64_t *)p0 = rxq->mbuf_initializer;
diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index f7a62a8..162ce4e 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -86,11 +86,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 		mb0 = rxep[0].mbuf;
 		mb1 = rxep[1].mbuf;

-		/* Flush mbuf with pkt template.
+		/* Flush mbuf with pkt template.
 		 * Data to be rearmed is 6 bytes long.
-		 * Though, RX will overwrite ol_flags that are coming next
-		 * anyway. So overwrite whole 8 bytes with one load:
-		 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
 		 */
 		p0 = (uintptr_t)&mb0->rearm_data;
 		*(uint64_t *)p0 = rxq->mbuf_initializer;
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index c4d709b..33b378d 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -89,9 +89,6 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq)
 		/*
 		 * Flush mbuf with pkt template.
 		 * Data to be rearmed is 6 bytes long.
-		 * Though, RX will overwrite ol_flags that are coming next
-		 * anyway. So overwrite whole 8 bytes with one load:
-		 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
 		 */
 		p0 = (uintptr_t)&mb0->rearm_data;
 		*(uint64_t *)p0 = rxq->mbuf_initializer;
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index 2acdfd9..26f61f8 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -111,11 +111,11 @@ struct rte_kni_fifo {
  */
 struct rte_kni_mbuf {
 	void *buf_addr __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
-	char pad0[10];
+	char pad0[8];
 	uint16_t data_off;      /**< Start address of data in segment buffer. */
 	char pad1[2];
 	uint8_t nb_segs;        /**< Number of segments. */
-	char pad4[1];
+	char pad4[3];
 	uint64_t ol_flags;      /**< Offload features. */
 	char pad2[4];
 	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 7b92b88..6bc47ed 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -733,10 +733,8 @@ struct rte_mbuf {
 	void *buf_addr;           /**< Virtual address of segment buffer. */
 	phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */

-	uint16_t buf_len;         /**< Length of segment buffer. */
-
 	/* next 6 bytes are initialised on RX descriptor rearm */
-	MARKER8 rearm_data;
+	MARKER64 rearm_data;
 	uint16_t data_off;

 	/**
@@ -753,6 +751,7 @@ struct rte_mbuf {
 	};
 	uint8_t nb_segs;          /**< Number of segments. */
 	uint8_t port;             /**< Input port. */
+	uint16_t pad;             /**< 2B pad for naturally aligned ol_flags */

 	uint64_t ol_flags;        /**< Offload features. */

@@ -806,6 +805,7 @@ struct rte_mbuf {

 	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */

+	uint16_t buf_len;         /**< Length of segment buffer. */
 	/* second cache line - fields only used in slow path or on TX */
 	MARKER cacheline1 __rte_cache_min_aligned;
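
The effect of the layout change above can be checked outside DPDK with a
small sketch. The structs below are hypothetical, simplified stand-ins for
the first 32 bytes of struct rte_mbuf before and after this patch. A 64-bit
target is assumed, so both buf_addr and phys_addr_t are modelled as uint64_t.
The names old_layout, new_layout, make_initializer and rearm are illustrative
only and are not part of DPDK. The sketch uses memcpy() instead of the
drivers' pointer cast to stay within strict-aliasing rules; compilers
typically lower it to a single 8-byte store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the layout BEFORE the patch. */
struct old_layout {
	uint64_t buf_addr;     /* stands in for void *buf_addr */
	uint64_t buf_physaddr; /* stands in for phys_addr_t */
	uint16_t buf_len;
	uint16_t data_off;     /* rearm_data marker pointed here */
	uint16_t refcnt;
	uint8_t nb_segs;
	uint8_t port;
	uint64_t ol_flags;
};

/* Hypothetical model of the layout AFTER the patch. */
struct new_layout {
	uint64_t buf_addr;
	uint64_t buf_physaddr;
	uint16_t data_off;     /* rearm_data marker points here */
	uint16_t refcnt;
	uint8_t nb_segs;
	uint8_t port;
	uint16_t pad;          /* keeps ol_flags naturally aligned */
	uint64_t ol_flags;
};

/* Old rearm area starts at offset 18, which is not a multiple of 8. */
static_assert(offsetof(struct old_layout, data_off) % sizeof(uint64_t) != 0,
	      "old rearm area is expected to be misaligned");
/* New rearm area starts at offset 16 and spans exactly 8 bytes. */
static_assert(offsetof(struct new_layout, data_off) % sizeof(uint64_t) == 0,
	      "new rearm area must be naturally aligned");
static_assert(offsetof(struct new_layout, ol_flags) -
	      offsetof(struct new_layout, data_off) == sizeof(uint64_t),
	      "rearm area must be exactly 8 bytes, ending before ol_flags");

/* Pack the rearm fields into one 64-bit template, in memory order. */
static uint64_t
make_initializer(uint16_t data_off, uint16_t refcnt,
		 uint8_t nb_segs, uint8_t port)
{
	struct new_layout tmpl;
	uint64_t init;

	memset(&tmpl, 0, sizeof(tmpl));
	tmpl.data_off = data_off;
	tmpl.refcnt = refcnt;
	tmpl.nb_segs = nb_segs;
	tmpl.port = port;
	memcpy(&init, (const char *)&tmpl +
	       offsetof(struct new_layout, data_off), sizeof(init));
	return init;
}

/* Rearm one buffer: a single aligned 8-byte write over the rearm area. */
static void
rearm(struct new_layout *mb, uint64_t initializer)
{
	memcpy((char *)mb + offsetof(struct new_layout, data_off),
	       &initializer, sizeof(initializer));
}
```

With the old layout the same 8-byte write would start at offset 18 and spill
into the first 2 bytes of ol_flags, which is what the removed driver comments
described. With the new layout it covers data_off, refcnt, nb_segs, port and
pad, and leaves ol_flags untouched.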