From patchwork Wed Aug 14 08:54:30 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feifei Wang
X-Patchwork-Id: 57705
X-Patchwork-Delegate: thomas@monjalon.net
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id EF9C31BEA7;
 Fri, 16 Aug 2019 00:11:04 +0200 (CEST)
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by dpdk.org (Postfix) with ESMTP id 5F1BF1BDAC;
 Wed, 14 Aug 2019 10:54:58 +0200 (CEST)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D52FB344;
 Wed, 14 Aug 2019 01:54:57 -0700 (PDT)
Received: from net-arm-c2400.shanghai.arm.com
 (net-arm-c2400.shanghai.arm.com [10.169.40.36])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 4BD023F694;
 Wed, 14 Aug 2019 01:54:56 -0700 (PDT)
From: Feifei Wang
To: dev@dpdk.org
Cc: gavin.hu@arm.com, ruifeng.wang@arm.com, phil.yang@arm.com,
 Honnappa.Nagarahalli@arm.com, nd@arm.com
Date: Wed, 14 Aug 2019 16:54:30 +0800
Message-Id: <1565772870-24903-1-git-send-email-feifei.wang@arm.com>
X-Mailer: git-send-email 2.7.4
X-Mailman-Approved-At: Fri, 16 Aug 2019 00:11:03 +0200
Subject: [dpdk-dev] [PATCH] examples/l3fwd: prefetch the content of the next
 packet
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Cache misses are a serious problem when the function lpm_cb_parse_ptype
is called to read the content of packets. This is because the packet
contents previously stored in the cache are overwritten by the
following instructions or variables.
Thus, a prefetch instruction can be used to bring the next packet into
the cache ahead of time, so that the CPU does not spend too much time
waiting on it.

On the Octeon TX platform with a built-in NIC, a 12% performance gain
was measured by running the RFC2544 NDR test with l3fwd. Furthermore,
the cache-miss events of the function lpm_cb_parse_ptype were reduced
by 20%, and its CPU task-clock share dropped from 16.49% to 11.3%,
based on a one-minute forwarding test with 64B packets.

On the dpaa2 platform, neither a performance improvement nor a drop was
seen with this patch when running the RFC2544 NDR test with l3fwd.

On the x86 platform, a 15.7% performance gain was measured by running
the RFC2544 NDR test with l3fwd.

Signed-off-by: Feifei Wang
Reviewed-by: Gavin Hu
Reviewed-by: Ruifeng Wang
Reviewed-by: Phil Yang
---
 examples/l3fwd/l3fwd_lpm.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index 4143683..a3a65f7 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -400,10 +400,17 @@ lpm_cb_parse_ptype(uint16_t port __rte_unused, uint16_t queue __rte_unused,
 		uint16_t max_pkts __rte_unused,
 		void *user_param __rte_unused)
 {
-	unsigned i;
-
-	for (i = 0; i < nb_pkts; ++i)
+	unsigned int i;
+
+	if (unlikely(nb_pkts == 0))
+		return nb_pkts;
+	rte_prefetch0(rte_pktmbuf_mtod(pkts[0], struct ether_hdr *));
+	for (i = 0; i < (unsigned int) (nb_pkts - 1); ++i) {
+		rte_prefetch0(rte_pktmbuf_mtod(pkts[i+1],
+			struct ether_hdr *));
 		lpm_parse_ptype(pkts[i]);
+	}
+	lpm_parse_ptype(pkts[i]);
 
 	return nb_pkts;
 }
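
For readers without a DPDK tree at hand, the loop structure above
(prefetch packet i+1, then parse packet i, with the last packet handled
after the loop) can be sketched in plain C. This is only an illustrative
sketch under stated assumptions, not the l3fwd code: struct pkt,
prefetch0(), parse_one() and parse_burst() are hypothetical names, and
the GCC/Clang builtin __builtin_prefetch() stands in for rte_prefetch0().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf: only what this sketch needs. */
struct pkt {
	uint8_t data[64];	/* start of packet data (Ethernet header) */
	uint32_t ptype;		/* filled in by parse_one() */
};

/* Assumption: GCC/Clang builtin used in place of rte_prefetch0(). */
static inline void prefetch0(const void *p)
{
	__builtin_prefetch(p, 0, 3);	/* read access, keep in all cache levels */
}

/* Hypothetical parser: classify a packet by its EtherType bytes. */
static void parse_one(struct pkt *p)
{
	uint16_t ether_type = (uint16_t)((p->data[12] << 8) | p->data[13]);

	p->ptype = (ether_type == 0x0800) ? 4 :
		   (ether_type == 0x86DD) ? 6 : 0;
}

/* Parse a burst, prefetching packet i+1 while packet i is parsed. */
static uint16_t parse_burst(struct pkt *pkts[], uint16_t nb_pkts)
{
	uint16_t i;

	assert(pkts != NULL || nb_pkts == 0);
	if (nb_pkts == 0)
		return 0;

	prefetch0(pkts[0]->data);
	for (i = 0; i < nb_pkts - 1; ++i) {
		prefetch0(pkts[i + 1]->data);
		parse_one(pkts[i]);
	}
	/* Last packet: nothing left to prefetch. */
	parse_one(pkts[i]);

	return nb_pkts;
}
```

Peeling the last iteration out of the loop keeps the pkts[i + 1] access
in bounds without adding a per-packet bounds check. Whether the prefetch
helps is platform dependent, as the measurements above indicate.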