Message ID | 20160512091349.GA10395@localhost.localdomain (mailing list archive) |
---|---|
State | Not Applicable, archived |
Delegated to: | Thomas Monjalon |
Headers | From: Jerin Jacob <jerin.jacob@caviumnetworks.com> To: <dev@dpdk.org> CC: <bruce.richardson@intel.com>, <thomas.monjalon@6wind.com> Date: Thu, 12 May 2016 14:44:09 +0530 Message-ID: <20160512091349.GA10395@localhost.localdomain> Subject: [dpdk-dev] mbuff rearm_data aligmenet issue on non x86 List-Id: patches and discussions about DPDK <dev.dpdk.org> |
Commit Message
Jerin Jacob
May 12, 2016, 9:14 a.m. UTC
Hi All,

I would like to align the mbuf rearm_data field to 8 bytes so that writes
to mbuf->rearm_data through a uint64_t * are naturally aligned. I am not
sure about IA, but some other architectures/implementations have overhead
on non-naturally-aligned stores.

The proposed patch is something like the below, but I am open to any change
that makes it fit all other architectures/platforms.

Any thoughts?

➜ [master] [dpdk-master] $ git diff
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 529debb..5a917d0 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -733,10 +733,8 @@ struct rte_mbuf {
 	void *buf_addr;           /**< Virtual address of segment buffer. */
 	phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */

-	uint16_t buf_len;         /**< Length of segment buffer. */
-
 	/* next 6 bytes are initialised on RX descriptor rearm */
-	MARKER8 rearm_data;
+	MARKER64 rearm_data;
 	uint16_t data_off;

 	/**
@@ -754,6 +752,7 @@ struct rte_mbuf {
 	uint8_t nb_segs;          /**< Number of segments. */
 	uint8_t port;             /**< Input port. */

+	uint16_t buf_len;         /**< Length of segment buffer. */
 	uint64_t ol_flags;        /**< Offload features. */

 	/* remaining bytes are set on RX when pulling packet from
 	 * descriptor

/Jerin
Comments
Hi Jerin,

> Hi All,
>
> I would like to align the mbuf rearm_data field to 8 bytes so that
> writes to mbuf->rearm_data through a uint64_t * are naturally aligned.
> I am not sure about IA, but some other architectures/implementations
> have overhead on non-naturally-aligned stores.
>
> The proposed patch is something like the below, but I am open to any
> change that makes it fit all other architectures/platforms.
>
> Any thoughts?
>
> ➜ [master] [dpdk-master] $ git diff
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 529debb..5a917d0 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -733,10 +733,8 @@ struct rte_mbuf {
> 	void *buf_addr;           /**< Virtual address of segment buffer. */
> 	phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
>
> -	uint16_t buf_len;         /**< Length of segment buffer. */
> -

There is no need to move buf_len itself, I think.
Just moving the rearm_data marker prior to buf_len is enough.
Though how do you suggest we deal with the fact that right now we blindly
update the whole 64 bits pointed to by rearm_data:

drivers/net/ixgbe/ixgbe_rxtx_vec.c:
	/*
	 * Flush mbuf with pkt template.
	 * Data to be rearmed is 6 bytes long.
	 * Though, RX will overwrite ol_flags that are coming next
	 * anyway. So overwrite whole 8 bytes with one load:
	 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
	 */
	p0 = (uintptr_t)&mb0->rearm_data;
	*(uint64_t *)p0 = rxq->mbuf_initializer;

?

If buf_len will be inside these 64 bits, we can't do that anymore.

Are you suggesting something like:

	uint64_t *p0, v0;

	p0 = &mb0->rearm_data;
	v0 = *p0 & REARM_MASK;
	*p0 = v0 | rxq->mbuf_initializer;
?

If so, I wonder what the performance impact of that change would be.

Konstantin

> 	/* next 6 bytes are initialised on RX descriptor rearm */
> -	MARKER8 rearm_data;
> +	MARKER64 rearm_data;
> 	uint16_t data_off;
>
> [rest of diff snipped]
>
> /Jerin
On Thu, May 12, 2016 at 10:07:09AM +0000, Ananyev, Konstantin wrote:
> Hi Jerin,
>
> [original proposal and diff snipped]
>
> There is no need to move buf_len itself, I think.
> Just moving the rearm_data marker prior to buf_len is enough.
> Though how do you suggest we deal with the fact that right now we
> blindly update the whole 64 bits pointed to by rearm_data:
>
> drivers/net/ixgbe/ixgbe_rxtx_vec.c:
> 	/*
> 	 * Flush mbuf with pkt template.
> 	 * Data to be rearmed is 6 bytes long.
> 	 * Though, RX will overwrite ol_flags that are coming next
> 	 * anyway. So overwrite whole 8 bytes with one load:
> 	 * 6 bytes of rearm_data plus first 2 bytes of ol_flags.
> 	 */
> 	p0 = (uintptr_t)&mb0->rearm_data;
> 	*(uint64_t *)p0 = rxq->mbuf_initializer;
>
> ?
>
> If buf_len will be inside these 64 bits, we can't do that anymore.
>
> Are you suggesting something like:
>
> 	uint64_t *p0, v0;
>
> 	p0 = &mb0->rearm_data;
> 	v0 = *p0 & REARM_MASK;
> 	*p0 = v0 | rxq->mbuf_initializer;
> ?

Due to the unaligned rearm_data issue, on the ThunderX platform we need to
write multiple aligned half-word stores (so masking would be better for us).

But I think if we can put a 16-bit hole between port and ol_flags, then
we may not need the masking stuff in ixgbe. Right?

OR

Even better, if we can fill in a uint16_t variable which will be replaced
later in the flow, like "data_len", and move buf_len to the end of the
first cache line? Or any other thoughts on fixing the unaligned
rearm_data issue?

Jerin

> If so, I wonder what the performance impact of that change would be.
> Konstantin
>
> [quoted diff snipped]
> On Thu, May 12, 2016 at 10:07:09AM +0000, Ananyev, Konstantin wrote:
> > [earlier discussion snipped]
>
> Due to the unaligned rearm_data issue, on the ThunderX platform we need to
> write multiple aligned half-word stores (so masking would be better for us).

Ok, so what would be the gain on ARM if you make that change?
Again, what would be the drop (if any) on IA?

> But I think if we can put a 16-bit hole between port and ol_flags, then
> we may not need the masking stuff in ixgbe. Right?

You mean move buf_len somewhere else (end of cacheline0) and introduce a
2B hole between port and ol_flags, right?
Yep, that probably wouldn't have any performance impact.

> OR
>
> Even better, if we can fill in a uint16_t variable which will be replaced
> later in the flow, like "data_len"?

data_len is grouped with rx_descriptor_fields1 on purpose - so we can
update packet_type, pkt_len, data_len, vlan_tci and rss with one 16B write.

Konstantin

> and move buf_len to the end of the first cache line?
> Or any other thoughts on fixing the unaligned rearm_data issue?
>
> Jerin
>
> [quoted diff snipped]
On Thu, May 12, 2016 at 01:14:34PM +0000, Ananyev, Konstantin wrote:
> > [earlier discussion snipped]
> >
> > Due to the unaligned rearm_data issue, on the ThunderX platform we need to
> > write multiple aligned half-word stores (so masking would be better for us).
>
> Ok, so what would be the gain on ARM if you make that change?

~4 CPU cycles per packet. Again, it may not be ARM-architecture specific,
as the ARM architecture does not define instruction latency, so it is more
of an implementation-specific figure.

> Again, what would be the drop (if any) on IA?
>
> > But I think if we can put a 16-bit hole between port and ol_flags, then
> > we may not need the masking stuff in ixgbe. Right?
>
> You mean move buf_len somewhere else (end of cacheline0) and introduce a
> 2B hole between port and ol_flags, right?

Yes

> Yep, that probably wouldn't have any performance impact.

I will try the two options and send a patch which doesn't have any
performance impact on IA.

> > OR
> >
> > Even better, if we can fill in a uint16_t variable which will be replaced
> > later in the flow, like "data_len"?
>
> data_len is grouped with rx_descriptor_fields1 on purpose - so we can
> update packet_type, pkt_len, data_len, vlan_tci and rss with one 16B write.

OK

> Konstantin
>
> [quoted diff snipped]
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 529debb..5a917d0 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -733,10 +733,8 @@ struct rte_mbuf {
 	void *buf_addr;           /**< Virtual address of segment buffer. */
 	phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */

-	uint16_t buf_len;         /**< Length of segment buffer. */
-
 	/* next 6 bytes are initialised on RX descriptor rearm */
-	MARKER8 rearm_data;
+	MARKER64 rearm_data;
 	uint16_t data_off;