Message ID | 1464250025-9191-1-git-send-email-jerin.jacob@caviumnetworks.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Thomas Monjalon |
Headers |
Return-Path: <dev-bounces@dpdk.org> From: Jerin Jacob <jerin.jacob@caviumnetworks.com> To: <dev@dpdk.org> CC: <thomas.monjalon@6wind.com>, <bruce.richardson@intel.com>, <konstantin.ananyev@intel.com>, <olivier.matz@6wind.com>, Jerin Jacob <jerin.jacob@caviumnetworks.com> Date: Thu, 26 May 2016 13:37:05 +0530 Message-ID: <1464250025-9191-1-git-send-email-jerin.jacob@caviumnetworks.com> X-Mailer: git-send-email 2.5.5 In-Reply-To: <1464101442-10501-1-git-send-email-jerin.jacob@caviumnetworks.com> References: <1464101442-10501-1-git-send-email-jerin.jacob@caviumnetworks.com> Subject: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy List-Id: patches and discussions about DPDK <dev.dpdk.org> List-Archive: <http://dpdk.org/ml/archives/dev/> Sender: "dev" <dev-bounces@dpdk.org> |
Commit Message
Jerin Jacob
May 26, 2016, 8:07 a.m. UTC
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v1..v2
Corrected the git commit message (s/mbuf/mempool/g)
---
lib/librte_mempool/rte_mempool.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
Comments
Hi Jerin,

On 05/26/2016 10:07 AM, Jerin Jacob wrote:
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
> v1..v2
> Corrected the git commit message (s/mbuf/mempool/g)
> ---
>  lib/librte_mempool/rte_mempool.h | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 60339bd..24876a2 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -73,6 +73,7 @@
>  #include <rte_memory.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_ring.h>
> +#include <rte_memcpy.h>
>
>  #ifdef __cplusplus
>  extern "C" {
> @@ -739,7 +740,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>  		   unsigned n, int is_mp)
>  {
>  	struct rte_mempool_cache *cache;
> -	uint32_t index;
>  	void **cache_objs;
>  	unsigned lcore_id = rte_lcore_id();
>  	uint32_t cache_size = mp->cache_size;
> @@ -768,8 +768,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>  	 */
>
>  	/* Add elements back into the cache */
> -	for (index = 0; index < n; ++index, obj_table++)
> -		cache_objs[index] = *obj_table;
> +	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
>
>  	cache->len += n;

I also checked in the get_bulk() function, which looks like that:

	/* Now fill in the response ... */
	for (index = 0, len = cache->len - 1;
	     index < n;
	     ++index, len--, obj_table++)
		*obj_table = cache_objs[len];

I think we could replace it by something like:

	rte_memcpy(obj_table, &cache_objs[len - n], sizeof(void *) * n);

The only difference is that it won't reverse the pointers in the
table, but I don't see any problem with that.

What do you think?

Regards,
Olivier
On Mon, May 30, 2016 at 10:45:11AM +0200, Olivier Matz wrote:
> Hi Jerin,
>
> [...]
>
> I also checked in the get_bulk() function, which looks like that:
>
> 	/* Now fill in the response ... */
> 	for (index = 0, len = cache->len - 1;
> 	     index < n;
> 	     ++index, len--, obj_table++)
> 		*obj_table = cache_objs[len];
>
> I think we could replace it by something like:
>
> 	rte_memcpy(obj_table, &cache_objs[len - n], sizeof(void *) * n);
>
> The only difference is that it won't reverse the pointers in the
> table, but I don't see any problem with that.
>
> What do you think?

In a true sense, it will _not_ be LIFO. Not sure about the cache usage
implications on the specific use cases.

Jerin

> Regards,
> Olivier
Hi Jerin,

>>> 	/* Add elements back into the cache */
>>> -	for (index = 0; index < n; ++index, obj_table++)
>>> -		cache_objs[index] = *obj_table;
>>> +	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
>>>
>>> 	cache->len += n;
>>
>> I also checked in the get_bulk() function, which looks like that:
>>
>> 	/* Now fill in the response ... */
>> 	for (index = 0, len = cache->len - 1;
>> 	     index < n;
>> 	     ++index, len--, obj_table++)
>> 		*obj_table = cache_objs[len];
>>
>> I think we could replace it by something like:
>>
>> 	rte_memcpy(obj_table, &cache_objs[len - n], sizeof(void *) * n);
>>
>> The only difference is that it won't reverse the pointers in the
>> table, but I don't see any problem with that.
>>
>> What do you think?
>
> In true sense, it will _not_ be LIFO. Not sure about cache usage implications
> on the specific use cases.

Today, the object pointers are reversed only in the get(). It means
that this code:

	rte_mempool_get_bulk(mp, table, 4);
	for (i = 0; i < 4; i++)
		printf("obj = %p\n", t[i]);
	rte_mempool_put_bulk(mp, table, 4);

	printf("-----\n");
	rte_mempool_get_bulk(mp, table, 4);
	for (i = 0; i < 4; i++)
		printf("obj = %p\n", t[i]);
	rte_mempool_put_bulk(mp, table, 4);

prints:

	addr1
	addr2
	addr3
	addr4
	-----
	addr4
	addr3
	addr2
	addr1

Which is quite strange.

I don't think it would be an issue to replace the loop by a
rte_memcpy(); it may increase the copy speed and it will be
more coherent with the put().

Olivier
On Tue, May 31, 2016 at 11:05:30PM +0200, Olivier MATZ wrote:
> Hi Jerin,

Hi Olivier,

> Today, the objects pointers are reversed only in the get(). It means
> that this code:
>
> [...]
>
> prints:
>
> 	addr1
> 	addr2
> 	addr3
> 	addr4
> 	-----
> 	addr4
> 	addr3
> 	addr2
> 	addr1
>
> Which is quite strange.

IMO, it is the expected LIFO behavior. Right?

What is not expected is the following, which is the case after the change.
Or am I missing something here?

	addr1
	addr2
	addr3
	addr4
	-----
	addr1
	addr2
	addr3
	addr4

> I don't think it would be an issue to replace the loop by a
> rte_memcpy(), it may increase the copy speed and it will be
> more coherent with the put().
>
> Olivier
Hi Jerin,

On 06/01/2016 09:00 AM, Jerin Jacob wrote:
> On Tue, May 31, 2016 at 11:05:30PM +0200, Olivier MATZ wrote:
>> [...]
>> Which is quite strange.
>
> IMO, it is the expected LIFO behavior. Right?
>
> What is not expected is the following, which is the case after the change.
> Or am I missing something here?
>
> [...]

I think the LIFO behavior should occur on a per-bulk basis. I mean,
it should behave like in the examples below:

	// pool cache is in state X
	obj1 = mempool_get(mp)
	obj2 = mempool_get(mp)
	mempool_put(mp, obj2)
	mempool_put(mp, obj1)
	// pool cache is back in state X

	// pool cache is in state X
	bulk1 = mempool_get_bulk(mp, 16)
	bulk2 = mempool_get_bulk(mp, 16)
	mempool_put_bulk(mp, bulk2, 16)
	mempool_put_bulk(mp, bulk1, 16)
	// pool cache is back in state X

Note that today it's not the case for bulks: since object addresses
are reversed only in get(), we are not back in the original state.
I don't really see the advantage of this.

Removing the reversing may accelerate the cache in case of bulk get,
I think.

Regards,
Olivier
On Thu, Jun 02, 2016 at 09:36:34AM +0200, Olivier MATZ wrote:
> Hi Jerin,
>
> I think the LIFO behavior should occur on a per-bulk basis. I mean,
> it should behave like in the examples below:
>
> 	// pool cache is in state X
> 	obj1 = mempool_get(mp)
> 	obj2 = mempool_get(mp)
> 	mempool_put(mp, obj2)
> 	mempool_put(mp, obj1)
> 	// pool cache is back in state X
>
> 	// pool cache is in state X
> 	bulk1 = mempool_get_bulk(mp, 16)
> 	bulk2 = mempool_get_bulk(mp, 16)
> 	mempool_put_bulk(mp, bulk2, 16)
> 	mempool_put_bulk(mp, bulk1, 16)
> 	// pool cache is back in state X

Per-entry LIFO behavior makes more sense in the _bulk_ case too, as a
recently enqueued buffer comes out on the next "get", which makes it more
likely that the buffer is still in the last-level cache.

> Note that today it's not the case for bulks, since object addresses
> are reversed only in get(), we are not back in the original state.
> I don't really see the advantage of this.
>
> Removing the reversing may accelerate the cache in case of bulk get,
> I think.

I tried it in my setup and it was dropping the performance. Have you got
an improvement in any setup?

Jerin

> Regards,
> Olivier
Hi Jerin,

On 06/02/2016 11:39 AM, Jerin Jacob wrote:
> On Thu, Jun 02, 2016 at 09:36:34AM +0200, Olivier MATZ wrote:
>> [...]
>
> Per-entry LIFO behavior makes more sense in the _bulk_ case too, as a
> recently enqueued buffer comes out on the next "get", which makes it more
> likely that the buffer is still in the last-level cache.

Yes, from a memory cache perspective, I think you are right.

In practice, I'm not sure it's so important, because on many hw drivers
a packet buffer returns to the pool only after a round of the tx ring.
So I'd say it won't make a big difference here.

>> Note that today it's not the case for bulks, since object addresses
>> are reversed only in get(), we are not back in the original state.
>> I don't really see the advantage of this.
>>
>> Removing the reversing may accelerate the cache in case of bulk get,
>> I think.
>
> I tried it in my setup and it was dropping the performance. Have you got
> an improvement in any setup?

I know that the mempool_perf autotest is not representative of a real
use case, but it gives a trend.

I did a quick test with:
- the legacy code,
- the rte_memcpy in put(),
- the rte_memcpy in both put() and get() (no reverse).

It seems that removing the reversing brings ~50% of enhancement with
bulks of 32 (on a Westmere):

legacy
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=839922483
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=849792204
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=1617022156
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=1675087052
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=3202914713
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=3268725963

rte_memcpy in put() (your patch proposal)
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=762157465
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=744593817
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=1500276326
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=1461347942
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=2974076107
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=2928122264

rte_memcpy in put() and get()
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=974834892
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=1129329459
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=2147798220
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=2232457625
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=4510816664
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=4582421298

This is probably more a measure of the pure CPU cost of the mempool
function, without considering the memory cache aspect. So, of course,
a real use-case test should be done to confirm or not that it increases
the performance. I'll manage to do a test and let you know the result.

By the way, not all drivers are allocating or freeing the mbufs by
bulk, so this modification would only affect these ones. What driver
are you using for your test?

Regards,
Olivier
On Thu, Jun 02, 2016 at 11:16:16PM +0200, Olivier MATZ wrote:

Hi Olivier,

> This is probably more a measure of the pure CPU cost of the mempool
> function, without considering the memory cache aspect. So, of course,
> a real use-case test should be done to confirm or not that it increases
> the performance. I'll manage to do a test and let you know the result.

OK

IMO, the rte_memcpy in put() makes sense (this patch), as there is no
behavior change. However, if the rte_memcpy in get(), with its behavioral
change, makes sense on some platform, then we can enable it on a
conditional basis (I am OK with that).

> By the way, not all drivers are allocating or freeing the mbufs by
> bulk, so this modification would only affect these ones. What driver
> are you using for your test?

I have tested with the ThunderX nicvf pmd (uses the bulk mode).
The driver was recently sent out on the ml for review.

Jerin

> Regards,
> Olivier
Hi Jerin,

On 06/03/2016 09:02 AM, Jerin Jacob wrote:
> On Thu, Jun 02, 2016 at 11:16:16PM +0200, Olivier MATZ wrote:
>
> [...]
>
> I have tested with the ThunderX nicvf pmd (uses the bulk mode).
> The driver was recently sent out on the ml for review.

Just to let you know I do not forget this. I still need to find some
time to do a performance test.

Regards,
Olivier
On 06/17/2016 12:40 PM, Olivier Matz wrote:
> Hi Jerin,
>
> [...]
>
> Just to let you know I do not forget this. I still need to
> find some time to do a performance test.

Quoting from the other thread [1] too, to save this in patchwork:

[1] http://dpdk.org/ml/archives/dev/2016-June/042701.html

> On 06/24/2016 05:56 PM, Hunt, David wrote:
>> Hi Jerin,
>>
>> I just ran a couple of tests on this patch on the latest master head on
>> a couple of machines: an older quad socket E5-4650 and a quad socket
>> E5-2699 v3.
>>
>> E5-4650:
>> I'm seeing a gain of 2% for un-cached tests and a gain of 9% on the
>> cached tests.
>>
>> E5-2699 v3:
>> I'm seeing a loss of 0.1% for un-cached tests and a gain of 11% on the
>> cached tests.
>>
>> This is purely the autotest comparison, I don't have traffic generator
>> results. But based on the above, I don't think there are any performance
>> issues with the patch.
>
> Thanks for doing the test on your side. I think it's probably enough
> to integrate Jerin's patch.
>
> About using a rte_memcpy() in the mempool_get(), I don't think I'll have
> the time to do a more exhaustive test before the 16.07, so I'll come
> back with it later.
>
> I'm sending an ack on the v2 thread.

Acked-by: Olivier Matz <olivier.matz@6wind.com>
2016-05-26 13:37, Jerin Jacob:
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Please Jerin (or anyone else), could you rebase this patch?
Thanks
On Thu, Jun 30, 2016 at 11:41:59AM +0200, Thomas Monjalon wrote:
> 2016-05-26 13:37, Jerin Jacob:
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>
> Please Jerin (or anyone else), could you rebase this patch?

OK. I will send the rebased version.

> Thanks
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 60339bd..24876a2 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -73,6 +73,7 @@
 #include <rte_memory.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
+#include <rte_memcpy.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -739,7 +740,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
 		   unsigned n, int is_mp)
 {
 	struct rte_mempool_cache *cache;
-	uint32_t index;
 	void **cache_objs;
 	unsigned lcore_id = rte_lcore_id();
 	uint32_t cache_size = mp->cache_size;
@@ -768,8 +768,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
 	 */
 
 	/* Add elements back into the cache */
-	for (index = 0; index < n; ++index, obj_table++)
-		cache_objs[index] = *obj_table;
+	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
 
 	cache->len += n;