Message ID: 1464101442-10501-1-git-send-email-jerin.jacob@caviumnetworks.com (mailing list archive)
State: Superseded, archived
Delegated to: Thomas Monjalon
Headers:
From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To: dev@dpdk.org
Cc: thomas.monjalon@6wind.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com
Date: Tue, 24 May 2016 20:20:42 +0530
Message-ID: <1464101442-10501-1-git-send-email-jerin.jacob@caviumnetworks.com>
Subject: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with optimized rte_memcpy
List-Id: patches and discussions about DPDK <dev.dpdk.org>
Commit Message
Jerin Jacob
May 24, 2016, 2:50 p.m. UTC
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
lib/librte_mempool/rte_mempool.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
Comments
Hi Jerin,

On 05/24/2016 04:50 PM, Jerin Jacob wrote:
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_mempool/rte_mempool.h | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index ed2c110..ebe399a 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -74,6 +74,7 @@
>  #include <rte_memory.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_ring.h>
> +#include <rte_memcpy.h>
>
>  #ifdef __cplusplus
>  extern "C" {
> @@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>  		   unsigned n, __rte_unused int is_mp)
>  {
>  	struct rte_mempool_cache *cache;
> -	uint32_t index;
>  	void **cache_objs;
>  	unsigned lcore_id = rte_lcore_id();
>  	uint32_t cache_size = mp->cache_size;
> @@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>  	 */
>
>  	/* Add elements back into the cache */
> -	for (index = 0; index < n; ++index, obj_table++)
> -		cache_objs[index] = *obj_table;
> +	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
>
>  	cache->len += n;

The commit title should be "mempool" instead of "mbuf".

Are you seeing some performance improvement by using rte_memcpy()?

Regards,
Olivier
On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote:
> The commit title should be "mempool" instead of "mbuf".

I will fix it.

> Are you seeing some performance improvement by using rte_memcpy()?

Yes, in some cases. In the default case it was replaced with memcpy by the
compiler itself (gcc 5.3). But when I tried the external mempool manager
patch, performance dropped by almost 800Kpps. Debugging further, it turned
out that an unrelated change in the external mempool manager was knocking
out the memcpy; an explicit rte_memcpy brought back 500Kpps. The remaining
300Kpps drop is still unexplained (in my test setup, packets are in the
local cache, so it must be something to do with a __mempool_put_bulk text
alignment change or similar).

Has anyone else observed a performance drop with the external pool manager?

Jerin
On 5/24/2016 4:17 PM, Jerin Jacob wrote:
> Anyone else observed performance drop with external poolmanager?

Jerin,
I'm seeing a 300Kpps drop in throughput when I apply this on top of the
external mempool manager patch. If you see an increase when this patch is
applied first, and then a drop once the mempool manager patch is applied,
the two patches must be conflicting in some way. We probably need to
investigate further.

Regards,
Dave.
On Fri, May 27, 2016 at 11:24:57AM +0100, Hunt, David wrote:
> I'm seeing a 300Kpps drop in throughput when I apply this on top of the
> external mempool manager patch. We probably need to investigate further.

In general, my concern is that this patch too will most probably get
dropped on the floor due to conflicts between different architectures, and
some architectures/platforms will then need to maintain it out of tree.
Unlike other projects, DPDK modules are hand-optimized, so some changes
depend on register allocation, compiler version, text alignment, and so on.
IMHO, we should have a means to abstract such _logical_ changes under
conditional compilation flags, so that any arch/platform can select
whatever suits it best.

We may NOT need frequent patches to select a specific configuration, but
logical patches under compilation flags could be accepted, and each
arch/platform could choose its specific configuration when we make the
final release candidate for the release.

Any thoughts?

Jerin
On 5/27/2016 11:24 AM, Hunt, David wrote:
> I'm seeing a 300Kpps drop in throughput when I apply this on top of the
> external mempool manager patch. We probably need to investigate further.

On further investigation, I now have a setup with no performance
degradation. My previous tests were accessing the NICs on a different NUMA
node. Once I started testpmd with the correct coremask, the difference
between pre- and post-rte_memcpy patch is negligible (maybe a 0.1% drop).

Regards,
Dave.
2016-05-27 17:12, Jerin Jacob:
> IMHO, I think we should have means to abstract this _logical_ changes
> under conditional compilation flags and any arch/platform can choose
> to select what it suites better for that arch/platform.
>
> Any thoughts?

Yes, having some #ifdefs for arch configuration may be reasonable,
but other methods should be preferred first:
1/ try implementing the function in arch-specific files
2/ or check at runtime with rte_cpu_get_flag_enabled(RTE_CPUFLAG_X)
3/ or check #ifdef RTE_MACHINE_CPUFLAG_X
4/ or check #ifdef RTE_ARCH_Y
5/ or check a specific #ifdef RTE_FEATURE_NAME chosen in the config files

Option 2 is a nice-to-have which implies the other options.

Maybe doc/guides/contributing/design.rst needs to be updated.
On 05/27/2016 05:05 PM, Thomas Monjalon wrote:
> Yes, having some #ifdefs for arch configuration may be reasonable,
> but other methods should be preferred first:
> 1/ try implementing the function in arch-specific files

I agree with Thomas. This option should be preferred, and I think we should
avoid as much as possible having:

  #if ARCH1
  /* stuff optimized for arch1 */
  #elif ARCH2
  /* the same stuff optimized for arch2 */
  #else
  ...
  #endif

In this particular case, rte_memcpy() seems to be the appropriate function,
because it should already be arch-optimized.

> 2/ or check at runtime with rte_cpu_get_flag_enabled(RTE_CPUFLAG_X)
> 3/ or check #ifdef RTE_MACHINE_CPUFLAG_X
> 4/ or check #ifdef RTE_ARCH_Y
> 5/ or check a specific #ifdef RTE_FEATURE_NAME chosen in the config files
>
> Option 2 is a nice-to-have which implies the other options.
>
> Maybe doc/guides/contributing/design.rst needs to be updated.

Regards,
Olivier
Hi Jerin,

I just ran a couple of tests with this patch on the latest master head on a
couple of machines: an older quad-socket E5-4650 and a quad-socket
E5-2699 v3.

E5-4650:
I'm seeing a gain of 2% on the un-cached tests and a gain of 9% on the
cached tests.

E5-2699 v3:
I'm seeing a loss of 0.1% on the un-cached tests and a gain of 11% on the
cached tests.

This is purely the autotest comparison; I don't have traffic generator
results. But based on the above, I don't think there are any performance
issues with the patch.

Regards,
Dave.

On 24/5/2016 4:17 PM, Jerin Jacob wrote:
> Anyone else observed performance drop with external poolmanager?
Hi Dave,

On 06/24/2016 05:56 PM, Hunt, David wrote:
> This is purely the autotest comparison; I don't have traffic generator
> results. But based on the above, I don't think there are any performance
> issues with the patch.

Thanks for running the tests on your side. I think that's enough to
integrate Jerin's patch.

About using rte_memcpy() in mempool_get(): I don't think I'll have the time
to do a more exhaustive test before 16.07, so I'll come back to it later.
I'm sending an ack on the v2 thread.
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index ed2c110..ebe399a 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -74,6 +74,7 @@
 #include <rte_memory.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
+#include <rte_memcpy.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
 		   unsigned n, __rte_unused int is_mp)
 {
 	struct rte_mempool_cache *cache;
-	uint32_t index;
 	void **cache_objs;
 	unsigned lcore_id = rte_lcore_id();
 	uint32_t cache_size = mp->cache_size;
@@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
 	 */
 
 	/* Add elements back into the cache */
-	for (index = 0; index < n; ++index, obj_table++)
-		cache_objs[index] = *obj_table;
+	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
 
 	cache->len += n;