| Message ID | 7F861DC0615E0C47A872E6F3C5FCDDBD011A9934@BPXM14GP.gisp.nec.co.jp (mailing list archive) |
|---|---|
| State | Superseded, archived |
Commit Message
Hiroshi Shimamoto
Sept. 11, 2014, 7:48 a.m. UTC
From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

x86 can keep store ordering with standard operations. Using a memory
barrier is very expensive in the main packet processing loop. Removing
it improves xmit/recv packet performance.

We can see performance improvements with memnic-tester.
Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size | before   | after
   64 | 4.18Mpps | 4.59Mpps
  128 | 3.85Mpps | 4.87Mpps
  256 | 4.01Mpps | 4.72Mpps
  512 | 3.52Mpps | 4.41Mpps
 1024 | 3.18Mpps | 3.64Mpps
 1280 | 2.86Mpps | 3.15Mpps
 1518 | 2.59Mpps | 2.87Mpps

Note: we have to take care if we use temporal cache.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Reviewed-by: Hayato Momma <h-momma@ce.jp.nec.com>
---
 pmd/pmd_memnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Comments
2014-09-11 07:48, Hiroshi Shimamoto:
> x86 can keep store ordering with standard operations.

Are we sure it's always the case (including old 32-bit CPUs)?
I would prefer to have a reference here. I know we already discussed
this kind of thing, but having a reference in the commit log could help
future discussions.

> Using a memory barrier is very expensive in the main packet processing loop.
> Removing it improves xmit/recv packet performance.
>
> We can see performance improvements with memnic-tester.
> Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
>  size | before   | after
>    64 | 4.18Mpps | 4.59Mpps
>   128 | 3.85Mpps | 4.87Mpps
>   256 | 4.01Mpps | 4.72Mpps
>   512 | 3.52Mpps | 4.41Mpps
>  1024 | 3.18Mpps | 3.64Mpps
>  1280 | 2.86Mpps | 3.15Mpps
>  1518 | 2.59Mpps | 2.87Mpps
>
> Note: we have to take care if we use temporal cache.

Please, could you explain this last sentence?

Thanks
> Subject: Re: [dpdk-dev] [memnic PATCH 4/7] pmd: use compiler barrier
>
> 2014-09-11 07:48, Hiroshi Shimamoto:
> > x86 can keep store ordering with standard operations.
>
> Are we sure it's always the case (including old 32-bit CPUs)?
> I would prefer to have a reference here. I know we already discussed
> this kind of thing, but having a reference in the commit log could help
> future discussions.
>
> > Using a memory barrier is very expensive in the main packet processing loop.
> > Removing it improves xmit/recv packet performance.
> >
> > We can see performance improvements with memnic-tester.
> > Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> >  size | before   | after
> >    64 | 4.18Mpps | 4.59Mpps
> >   128 | 3.85Mpps | 4.87Mpps
> >   256 | 4.01Mpps | 4.72Mpps
> >   512 | 3.52Mpps | 4.41Mpps
> >  1024 | 3.18Mpps | 3.64Mpps
> >  1280 | 2.86Mpps | 3.15Mpps
> >  1518 | 2.59Mpps | 2.87Mpps
> >
> > Note: we have to take care if we use temporal cache.
>
> Please, could you explain this last sentence?

Oops, I used the wrong word: "temporal" should be "non-temporal".

By the way, there are some instructions which use the non-temporal
cache hint, like the MOVNTx series. The store ordering of these
instructions is not kept.

Ref. Intel Software Developer's Manual
  Vol.1 10.4.6.2 Caching of Temporal vs. Non-Temporal Data
  Vol.3 8.2 Memory Ordering

thanks,
Hiroshi

>
> Thanks
> --
> Thomas
diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 8341da7..c22a14d 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -316,7 +316,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
 		bytes += p->len;
 
 drop:
-	rte_mb();
+	rte_compiler_barrier();
 	p->status = MEMNIC_PKT_ST_FREE;
 
 	if (++idx >= MEMNIC_NR_PACKET)
@@ -403,7 +403,7 @@ retry:
 	pkts++;
 	bytes += pkt_len;
 
-	rte_mb();
+	rte_compiler_barrier();
 	p->status = MEMNIC_PKT_ST_FILLED;
 
 	rte_pktmbuf_free(tx_pkts[nr]);