From patchwork Fri Jun 17 07:42:40 2022
X-Patchwork-Submitter: Rahul Bhansali
X-Patchwork-Id: 112970
X-Patchwork-Delegate: gakhil@marvell.com
From: Rahul Bhansali
To: , Ruifeng Wang
CC: , Rahul Bhansali
Subject: [PATCH v2 1/2] examples/l3fwd: common packet group functionality
Date: Fri, 17 Jun 2022 13:12:40 +0530
Message-ID: <20220617074241.3260496-1-rbhansali@marvell.com>
In-Reply-To: <20220524095717.3875284-1-rbhansali@marvell.com>
References: <20220524095717.3875284-1-rbhansali@marvell.com>
List-Id: DPDK patches and discussions

This makes the packet grouping function common, so that other examples can use it as needed.

Signed-off-by: Rahul Bhansali
Acked-by: Akhil Goyal
---
Changes in v2: New patch to address a review comment.
examples/common/neon_common.h | 50 ++++++++++++ examples/common/pkt_group.h | 139 ++++++++++++++++++++++++++++++++++ examples/l3fwd/Makefile | 5 +- examples/l3fwd/l3fwd.h | 2 - examples/l3fwd/l3fwd_common.h | 129 +------------------------------ examples/l3fwd/l3fwd_neon.h | 43 +---------- examples/meson.build | 2 +- 7 files changed, 198 insertions(+), 172 deletions(-) create mode 100644 examples/common/neon_common.h create mode 100644 examples/common/pkt_group.h -- 2.25.1 diff --git a/examples/common/neon_common.h b/examples/common/neon_common.h new file mode 100644 index 0000000000..f01b5ab6bc --- /dev/null +++ b/examples/common/neon_common.h @@ -0,0 +1,50 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2016-2018 Intel Corporation. + * Copyright(c) 2017-2018 Linaro Limited. + * Copyright(C) 2022 Marvell. + */ + +#ifndef _NEON_COMMON_H_ +#define _NEON_COMMON_H_ + +#include "pkt_group.h" + +/* + * Group consecutive packets with the same destination port in bursts of 4. + * Suppose we have array of destination ports: + * dst_port[] = {a, b, c, d, e, ... } + * dp1 should contain: <a, b, c, d>, dp2: <b, c, d, e>. + * We do 4 comparisons at once and the result is a 4-bit mask. + * This mask is used as an index into prebuilt array of pnum values. + */ +static inline uint16_t * +neon_port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1, + uint16x8_t dp2) +{ + union { + uint16_t u16[FWDSTEP + 1]; + uint64_t u64; + } *pnum = (void *)pn; + + uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0}; + int32_t v; + + dp1 = vceqq_u16(dp1, dp2); + dp1 = vandq_u16(dp1, mask); + v = vaddvq_u16(dp1); + + /* update last port counter. */ + lp[0] += gptbl[v].lpv; + rte_compiler_barrier(); + + /* if dest port value has changed. 
*/ + if (v != GRPMSK) { + pnum->u64 = gptbl[v].pnum; + pnum->u16[FWDSTEP] = 1; + lp = pnum->u16 + gptbl[v].idx; + } + + return lp; +} + +#endif /* _NEON_COMMON_H_ */ diff --git a/examples/common/pkt_group.h b/examples/common/pkt_group.h new file mode 100644 index 0000000000..8b26d9380f --- /dev/null +++ b/examples/common/pkt_group.h @@ -0,0 +1,139 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2016-2018 Intel Corporation. + * Copyright(c) 2017-2018 Linaro Limited. + * Copyright(C) 2022 Marvell. + */ + +#ifndef _PKT_GROUP_H_ +#define _PKT_GROUP_H_ + +#define FWDSTEP 4 + +/* + * Group consecutive packets with the same destination port into one burst. + * To avoid extra latency this is done together with some other packet + * processing, but after we have made a final decision about the packet's + * destination. + * To do this we maintain: + * pnum - array of number of consecutive packets with the same dest port for + * each packet in the input burst. + * lp - pointer to the last updated element in the pnum. + * dlp - dest port value lp corresponds to. + */ + +#define GRPSZ (1 << FWDSTEP) +#define GRPMSK (GRPSZ - 1) + +#define GROUP_PORT_STEP(dlp, dcp, lp, pn, idx) do { \ + if (likely((dlp) == (dcp)[(idx)])) { \ + (lp)[0]++; \ + } else { \ + (dlp) = (dcp)[idx]; \ + (lp) = (pn) + (idx); \ + (lp)[0] = 1; \ + } \ +} while (0) + +static const struct { + uint64_t pnum; /* prebuilt 4 values for pnum[]. */ + int32_t idx; /* index for new last updated element. */ + uint16_t lpv; /* add value to the last updated element. 
*/ +} gptbl[GRPSZ] = { + { + /* 0: a != b, b != c, c != d, d != e */ + .pnum = UINT64_C(0x0001000100010001), + .idx = 4, + .lpv = 0, + }, + { + /* 1: a == b, b != c, c != d, d != e */ + .pnum = UINT64_C(0x0001000100010002), + .idx = 4, + .lpv = 1, + }, + { + /* 2: a != b, b == c, c != d, d != e */ + .pnum = UINT64_C(0x0001000100020001), + .idx = 4, + .lpv = 0, + }, + { + /* 3: a == b, b == c, c != d, d != e */ + .pnum = UINT64_C(0x0001000100020003), + .idx = 4, + .lpv = 2, + }, + { + /* 4: a != b, b != c, c == d, d != e */ + .pnum = UINT64_C(0x0001000200010001), + .idx = 4, + .lpv = 0, + }, + { + /* 5: a == b, b != c, c == d, d != e */ + .pnum = UINT64_C(0x0001000200010002), + .idx = 4, + .lpv = 1, + }, + { + /* 6: a != b, b == c, c == d, d != e */ + .pnum = UINT64_C(0x0001000200030001), + .idx = 4, + .lpv = 0, + }, + { + /* 7: a == b, b == c, c == d, d != e */ + .pnum = UINT64_C(0x0001000200030004), + .idx = 4, + .lpv = 3, + }, + { + /* 8: a != b, b != c, c != d, d == e */ + .pnum = UINT64_C(0x0002000100010001), + .idx = 3, + .lpv = 0, + }, + { + /* 9: a == b, b != c, c != d, d == e */ + .pnum = UINT64_C(0x0002000100010002), + .idx = 3, + .lpv = 1, + }, + { + /* 0xa: a != b, b == c, c != d, d == e */ + .pnum = UINT64_C(0x0002000100020001), + .idx = 3, + .lpv = 0, + }, + { + /* 0xb: a == b, b == c, c != d, d == e */ + .pnum = UINT64_C(0x0002000100020003), + .idx = 3, + .lpv = 2, + }, + { + /* 0xc: a != b, b != c, c == d, d == e */ + .pnum = UINT64_C(0x0002000300010001), + .idx = 2, + .lpv = 0, + }, + { + /* 0xd: a == b, b != c, c == d, d == e */ + .pnum = UINT64_C(0x0002000300010002), + .idx = 2, + .lpv = 1, + }, + { + /* 0xe: a != b, b == c, c == d, d == e */ + .pnum = UINT64_C(0x0002000300040001), + .idx = 1, + .lpv = 0, + }, + { + /* 0xf: a == b, b == c, c == d, d == e */ + .pnum = UINT64_C(0x0002000300040005), + .idx = 0, + .lpv = 4, + }, +}; + +#endif /* _PKT_GROUP_H_ */ diff --git a/examples/l3fwd/Makefile b/examples/l3fwd/Makefile index 
8efe6378e2..8dbe85c2e6 100644 --- a/examples/l3fwd/Makefile +++ b/examples/l3fwd/Makefile @@ -22,6 +22,7 @@ shared: build/$(APP)-shared static: build/$(APP)-static ln -sf $(APP)-static build/$(APP) +INCLUDES =-I../common PC_FILE := $(shell $(PKGCONF) --path libdpdk 2>/dev/null) CFLAGS += -O3 $(shell $(PKGCONF) --cflags libdpdk) # Added for 'rte_eth_link_to_str()' @@ -38,10 +39,10 @@ endif endif build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build - $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED) + $(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED) build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build - $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC) + $(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC) build: @mkdir -p $@ diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h index 8a52c90755..40b5f32a9e 100644 --- a/examples/l3fwd/l3fwd.h +++ b/examples/l3fwd/l3fwd.h @@ -44,8 +44,6 @@ /* Used to mark destination port as 'invalid'. */ #define BAD_PORT ((uint16_t)-1) -#define FWDSTEP 4 - /* replace first 12B of the ethernet header. */ #define MASK_ETH 0x3f diff --git a/examples/l3fwd/l3fwd_common.h b/examples/l3fwd/l3fwd_common.h index 8e4c27218f..224b1c08e8 100644 --- a/examples/l3fwd/l3fwd_common.h +++ b/examples/l3fwd/l3fwd_common.h @@ -7,6 +7,8 @@ #ifndef _L3FWD_COMMON_H_ #define _L3FWD_COMMON_H_ +#include "pkt_group.h" + #ifdef DO_RFC_1812_CHECKS #define IPV4_MIN_VER_IHL 0x45 @@ -50,133 +52,6 @@ rfc1812_process(struct rte_ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype) #define rfc1812_process(mb, dp, ptype) do { } while (0) #endif /* DO_RFC_1812_CHECKS */ -/* - * We group consecutive packets with the same destination port into one burst. - * To avoid extra latency this is done together with some other packet - * processing, but after we made a final decision about packet's destination. 
- * To do this we maintain: - * pnum - array of number of consecutive packets with the same dest port for - * each packet in the input burst. - * lp - pointer to the last updated element in the pnum. - * dlp - dest port value lp corresponds to. - */ - -#define GRPSZ (1 << FWDSTEP) -#define GRPMSK (GRPSZ - 1) - -#define GROUP_PORT_STEP(dlp, dcp, lp, pn, idx) do { \ - if (likely((dlp) == (dcp)[(idx)])) { \ - (lp)[0]++; \ - } else { \ - (dlp) = (dcp)[idx]; \ - (lp) = (pn) + (idx); \ - (lp)[0] = 1; \ - } \ -} while (0) - -static const struct { - uint64_t pnum; /* prebuild 4 values for pnum[]. */ - int32_t idx; /* index for new last updated element. */ - uint16_t lpv; /* add value to the last updated element. */ -} gptbl[GRPSZ] = { - { - /* 0: a != b, b != c, c != d, d != e */ - .pnum = UINT64_C(0x0001000100010001), - .idx = 4, - .lpv = 0, - }, - { - /* 1: a == b, b != c, c != d, d != e */ - .pnum = UINT64_C(0x0001000100010002), - .idx = 4, - .lpv = 1, - }, - { - /* 2: a != b, b == c, c != d, d != e */ - .pnum = UINT64_C(0x0001000100020001), - .idx = 4, - .lpv = 0, - }, - { - /* 3: a == b, b == c, c != d, d != e */ - .pnum = UINT64_C(0x0001000100020003), - .idx = 4, - .lpv = 2, - }, - { - /* 4: a != b, b != c, c == d, d != e */ - .pnum = UINT64_C(0x0001000200010001), - .idx = 4, - .lpv = 0, - }, - { - /* 5: a == b, b != c, c == d, d != e */ - .pnum = UINT64_C(0x0001000200010002), - .idx = 4, - .lpv = 1, - }, - { - /* 6: a != b, b == c, c == d, d != e */ - .pnum = UINT64_C(0x0001000200030001), - .idx = 4, - .lpv = 0, - }, - { - /* 7: a == b, b == c, c == d, d != e */ - .pnum = UINT64_C(0x0001000200030004), - .idx = 4, - .lpv = 3, - }, - { - /* 8: a != b, b != c, c != d, d == e */ - .pnum = UINT64_C(0x0002000100010001), - .idx = 3, - .lpv = 0, - }, - { - /* 9: a == b, b != c, c != d, d == e */ - .pnum = UINT64_C(0x0002000100010002), - .idx = 3, - .lpv = 1, - }, - { - /* 0xa: a != b, b == c, c != d, d == e */ - .pnum = UINT64_C(0x0002000100020001), - .idx = 3, - .lpv = 0, 
- }, - { - /* 0xb: a == b, b == c, c != d, d == e */ - .pnum = UINT64_C(0x0002000100020003), - .idx = 3, - .lpv = 2, - }, - { - /* 0xc: a != b, b != c, c == d, d == e */ - .pnum = UINT64_C(0x0002000300010001), - .idx = 2, - .lpv = 0, - }, - { - /* 0xd: a == b, b != c, c == d, d == e */ - .pnum = UINT64_C(0x0002000300010002), - .idx = 2, - .lpv = 1, - }, - { - /* 0xe: a != b, b == c, c == d, d == e */ - .pnum = UINT64_C(0x0002000300040001), - .idx = 1, - .lpv = 0, - }, - { - /* 0xf: a == b, b == c, c == d, d == e */ - .pnum = UINT64_C(0x0002000300040005), - .idx = 0, - .lpv = 4, - }, -}; - static __rte_always_inline void send_packetsx4(struct lcore_conf *qconf, uint16_t port, struct rte_mbuf *m[], uint32_t num) diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h index e3d33a5229..5fa765b640 100644 --- a/examples/l3fwd/l3fwd_neon.h +++ b/examples/l3fwd/l3fwd_neon.h @@ -7,6 +7,7 @@ #define _L3FWD_NEON_H_ #include "l3fwd.h" +#include "neon_common.h" #include "l3fwd_common.h" /* @@ -62,44 +63,6 @@ processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP]) &dst_port[3], pkt[3]->packet_type); } -/* - * Group consecutive packets with the same destination port in bursts of 4. - * Suppose we have array of destination ports: - * dst_port[] = {a, b, c, d,, e, ... } - * dp1 should contain: , dp2: . - * We doing 4 comparisons at once and the result is 4 bit mask. - * This mask is used as an index into prebuild array of pnum values. - */ -static inline uint16_t * -port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1, - uint16x8_t dp2) -{ - union { - uint16_t u16[FWDSTEP + 1]; - uint64_t u64; - } *pnum = (void *)pn; - - int32_t v; - uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0}; - - dp1 = vceqq_u16(dp1, dp2); - dp1 = vandq_u16(dp1, mask); - v = vaddvq_u16(dp1); - - /* update last port counter. */ - lp[0] += gptbl[v].lpv; - rte_compiler_barrier(); - - /* if dest port value has changed. 
*/ - if (v != GRPMSK) { - pnum->u64 = gptbl[v].pnum; - pnum->u16[FWDSTEP] = 1; - lp = pnum->u16 + gptbl[v].idx; - } - - return lp; -} - /** * Process one packet: * Update source and destination MAC addresses in the ethernet header. @@ -161,7 +124,7 @@ send_packets_multi(struct lcore_conf *qconf, struct rte_mbuf **pkts_burst, * */ dp2 = vld1q_u16(&dst_port[j - FWDSTEP + 1]); - lp = port_groupx4(&pnum[j - FWDSTEP], lp, dp1, dp2); + lp = neon_port_groupx4(&pnum[j - FWDSTEP], lp, dp1, dp2); /* * dp1: @@ -175,7 +138,7 @@ send_packets_multi(struct lcore_conf *qconf, struct rte_mbuf **pkts_burst, */ dp2 = vextq_u16(dp1, dp1, 1); dp2 = vsetq_lane_u16(vgetq_lane_u16(dp2, 2), dp2, 3); - lp = port_groupx4(&pnum[j - FWDSTEP], lp, dp1, dp2); + lp = neon_port_groupx4(&pnum[j - FWDSTEP], lp, dp1, dp2); /* * remove values added by the last repeated diff --git a/examples/meson.build b/examples/meson.build index 78de0e1f37..81e93799f2 100644 --- a/examples/meson.build +++ b/examples/meson.build @@ -97,7 +97,7 @@ foreach example: examples ldflags = default_ldflags ext_deps = [] - includes = [include_directories(example)] + includes = [include_directories(example, 'common')] deps = ['eal', 'mempool', 'net', 'mbuf', 'ethdev', 'cmdline'] subdir(example)

From patchwork Fri Jun 17 07:42:41 2022
X-Patchwork-Submitter: Rahul Bhansali
X-Patchwork-Id: 112971
X-Patchwork-Delegate: gakhil@marvell.com
From: Rahul Bhansali
To: , Radu Nicolau , Akhil Goyal , Ruifeng Wang
CC: , Rahul Bhansali
Subject: [PATCH v2 2/2] examples/ipsec-secgw: add support of NEON with poll mode
Date: Fri, 17 Jun 2022 13:12:41 +0530
Message-ID: <20220617074241.3260496-2-rbhansali@marvell.com>
In-Reply-To: <20220617074241.3260496-1-rbhansali@marvell.com>
References:
<20220524095717.3875284-1-rbhansali@marvell.com> <20220617074241.3260496-1-rbhansali@marvell.com>
List-Id: DPDK patches and discussions

This adds support for NEON-based LPM lookup along with multi-packet processing for burst send in packet routing.

Performance impact: on cn10k, with poll-mode inline protocol, outbound performance increased by up to ~8% and inbound performance increased by up to ~6%.

Signed-off-by: Rahul Bhansali
Acked-by: Akhil Goyal
---
Changes in v2: Removed the NEON packet grouping function and used the common one.
examples/ipsec-secgw/Makefile | 5 +- examples/ipsec-secgw/ipsec-secgw.c | 25 ++ examples/ipsec-secgw/ipsec_lpm_neon.h | 213 +++++++++++++++++ examples/ipsec-secgw/ipsec_neon.h | 321 ++++++++++++++++++++++++++ examples/ipsec-secgw/ipsec_worker.c | 9 + 5 files changed, 571 insertions(+), 2 deletions(-) create mode 100644 examples/ipsec-secgw/ipsec_lpm_neon.h create mode 100644 examples/ipsec-secgw/ipsec_neon.h -- 2.25.1 diff --git a/examples/ipsec-secgw/Makefile b/examples/ipsec-secgw/Makefile index 89af54bd37..ffe232774d 100644 --- a/examples/ipsec-secgw/Makefile +++ b/examples/ipsec-secgw/Makefile @@ -36,6 +36,7 @@ shared: build/$(APP)-shared static: build/$(APP)-static ln -sf $(APP)-static build/$(APP) +INCLUDES =-I../common PC_FILE := $(shell $(PKGCONF) --path libdpdk 2>/dev/null) CFLAGS += -O3 $(shell $(PKGCONF) --cflags libdpdk) LDFLAGS_SHARED = $(shell $(PKGCONF) --libs libdpdk) @@ -53,10 +54,10 @@ CFLAGS += -DALLOW_EXPERIMENTAL_API CFLAGS += -Wno-address-of-packed-member build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build - $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED) + $(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED) build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build - $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC) + $(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC) build: @mkdir -p $@ diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c index 4d8a4a71b8..b650668305 100644 --- a/examples/ipsec-secgw/ipsec-secgw.c +++ b/examples/ipsec-secgw/ipsec-secgw.c @@ -56,6 +56,10 @@ #include "parser.h" #include "sad.h" +#if defined(__ARM_NEON) +#include "ipsec_lpm_neon.h" +#endif + volatile bool force_quit; #define MAX_JUMBO_PKT_LEN 9600 @@ -100,6 +104,12 @@ struct ethaddr_info ethaddr_tbl[RTE_MAX_ETHPORTS] = { { 0, ETHADDR(0x00, 0x16, 0x3e, 0x49, 0x9e, 0xdd) } }; +/* + * To hold ethernet header per port, which will be applied + * to outgoing 
packets. + */ +xmm_t val_eth[RTE_MAX_ETHPORTS]; + struct flow_info flow_info_tbl[RTE_MAX_ETHPORTS]; #define CMD_LINE_OPT_CONFIG "config" @@ -568,9 +578,16 @@ process_pkts(struct lcore_conf *qconf, struct rte_mbuf **pkts, process_pkts_outbound(&qconf->outbound, &traffic); } +#if defined __ARM_NEON + /* Neon optimized packet routing */ + route4_pkts_neon(qconf->rt4_ctx, traffic.ip4.pkts, traffic.ip4.num, + qconf->outbound.ipv4_offloads, true); + route6_pkts_neon(qconf->rt6_ctx, traffic.ip6.pkts, traffic.ip6.num); +#else route4_pkts(qconf->rt4_ctx, traffic.ip4.pkts, traffic.ip4.num, qconf->outbound.ipv4_offloads, true); route6_pkts(qconf->rt6_ctx, traffic.ip6.pkts, traffic.ip6.num); +#endif } static inline void @@ -1403,6 +1420,8 @@ add_dst_ethaddr(uint16_t port, const struct rte_ether_addr *addr) return -EINVAL; ethaddr_tbl[port].dst = ETHADDR_TO_UINT64(addr); + rte_ether_addr_copy((struct rte_ether_addr *)ðaddr_tbl[port].dst, + (struct rte_ether_addr *)(val_eth + port)); return 0; } @@ -1865,6 +1884,12 @@ port_init(uint16_t portid, uint64_t req_rx_offloads, uint64_t req_tx_offloads) portid, rte_strerror(-ret)); ethaddr_tbl[portid].src = ETHADDR_TO_UINT64(ðaddr); + + rte_ether_addr_copy((struct rte_ether_addr *)ðaddr_tbl[portid].dst, + (struct rte_ether_addr *)(val_eth + portid)); + rte_ether_addr_copy((struct rte_ether_addr *)ðaddr_tbl[portid].src, + (struct rte_ether_addr *)(val_eth + portid) + 1); + print_ethaddr("Address: ", ðaddr); printf("\n"); diff --git a/examples/ipsec-secgw/ipsec_lpm_neon.h b/examples/ipsec-secgw/ipsec_lpm_neon.h new file mode 100644 index 0000000000..959a5a8666 --- /dev/null +++ b/examples/ipsec-secgw/ipsec_lpm_neon.h @@ -0,0 +1,213 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2022 Marvell. + */ + +#ifndef __IPSEC_LPM_NEON_H__ +#define __IPSEC_LPM_NEON_H__ + +#include +#include "ipsec_neon.h" + +/* + * Append ethernet header and read destination IPV4 addresses from 4 mbufs. 
+ */ +static inline void +processx4_step1(struct rte_mbuf *pkt[FWDSTEP], int32x4_t *dip, + uint64_t *inline_flag) +{ + struct rte_ipv4_hdr *ipv4_hdr; + struct rte_ether_hdr *eth_hdr; + int32_t dst[FWDSTEP]; + int i; + + for (i = 0; i < FWDSTEP; i++) { + eth_hdr = (struct rte_ether_hdr *)rte_pktmbuf_prepend(pkt[i], + RTE_ETHER_HDR_LEN); + pkt[i]->ol_flags |= RTE_MBUF_F_TX_IPV4; + pkt[i]->l2_len = RTE_ETHER_HDR_LEN; + + ipv4_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1); + + /* Fetch destination IPv4 address */ + dst[i] = ipv4_hdr->dst_addr; + *inline_flag |= pkt[i]->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD; + } + + dip[0] = vld1q_s32(dst); +} + +/* + * Lookup into LPM for destination port. + */ +static inline void +processx4_step2(struct rt_ctx *rt_ctx, int32x4_t dip, uint64_t inline_flag, + struct rte_mbuf *pkt[FWDSTEP], uint16_t dprt[FWDSTEP]) +{ + uint32_t next_hop; + rte_xmm_t dst; + uint8_t i; + + dip = vreinterpretq_s32_u8(vrev32q_u8(vreinterpretq_u8_s32(dip))); + + /* If all 4 packets are non-inline */ + if (!inline_flag) { + rte_lpm_lookupx4((struct rte_lpm *)rt_ctx, dip, dst.u32, + BAD_PORT); + /* get rid of unused upper 16 bits for each dport. */ + vst1_s16((int16_t *)dprt, vqmovn_s32(dst.x)); + return; + } + + /* Inline and non-inline packets */ + dst.x = dip; + for (i = 0; i < FWDSTEP; i++) { + if (pkt[i]->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) { + next_hop = get_hop_for_offload_pkt(pkt[i], 0); + dprt[i] = (uint16_t) (((next_hop & + RTE_LPM_LOOKUP_SUCCESS) != 0) + ? next_hop : BAD_PORT); + + } else { + dprt[i] = (uint16_t) ((rte_lpm_lookup( + (struct rte_lpm *)rt_ctx, + dst.u32[i], &next_hop) == 0) + ? next_hop : BAD_PORT); + } + } +} + +/* + * Process a single packet for destination port. 
+ */ +static inline void +process_single_pkt(struct rt_ctx *rt_ctx, struct rte_mbuf *pkt, + uint16_t *dst_port) +{ + struct rte_ether_hdr *eth_hdr; + struct rte_ipv4_hdr *ipv4_hdr; + uint32_t next_hop; + uint32_t dst_ip; + + eth_hdr = (struct rte_ether_hdr *)rte_pktmbuf_prepend(pkt, + RTE_ETHER_HDR_LEN); + pkt->ol_flags |= RTE_MBUF_F_TX_IPV4; + pkt->l2_len = RTE_ETHER_HDR_LEN; + + if (pkt->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) { + next_hop = get_hop_for_offload_pkt(pkt, 0); + *dst_port = (uint16_t) (((next_hop & + RTE_LPM_LOOKUP_SUCCESS) != 0) + ? next_hop : BAD_PORT); + } else { + ipv4_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1); + dst_ip = rte_be_to_cpu_32(ipv4_hdr->dst_addr); + *dst_port = (uint16_t) ((rte_lpm_lookup( + (struct rte_lpm *)rt_ctx, + dst_ip, &next_hop) == 0) + ? next_hop : BAD_PORT); + } +} + +/* + * Buffer optimized handling of IPv6 packets. + */ +static inline void +route6_pkts_neon(struct rt_ctx *rt_ctx, struct rte_mbuf **pkts, int nb_rx) +{ + uint8_t dst_ip6[MAX_PKT_BURST][16]; + int32_t dst_port[MAX_PKT_BURST]; + struct rte_ether_hdr *eth_hdr; + struct rte_ipv6_hdr *ipv6_hdr; + int32_t hop[MAX_PKT_BURST]; + struct rte_mbuf *pkt; + uint8_t lpm_pkts = 0; + int32_t i; + + if (nb_rx == 0) + return; + + /* Need to do an LPM lookup for non-inline packets. Inline packets will + * have port ID in the SA + */ + + for (i = 0; i < nb_rx; i++) { + pkt = pkts[i]; + eth_hdr = (struct rte_ether_hdr *)rte_pktmbuf_prepend(pkt, + RTE_ETHER_HDR_LEN); + pkt->l2_len = RTE_ETHER_HDR_LEN; + pkt->ol_flags |= RTE_MBUF_F_TX_IPV6; + + if (!(pkt->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)) { + /* Security offload not enabled. 
So an LPM lookup is + * required to get the hop + */ + ipv6_hdr = (struct rte_ipv6_hdr *)(eth_hdr + 1); + memcpy(&dst_ip6[lpm_pkts][0], + ipv6_hdr->dst_addr, 16); + lpm_pkts++; + } + } + + rte_lpm6_lookup_bulk_func((struct rte_lpm6 *)rt_ctx, dst_ip6, + hop, lpm_pkts); + + lpm_pkts = 0; + + for (i = 0; i < nb_rx; i++) { + pkt = pkts[i]; + if (pkt->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) { + /* Read hop from the SA */ + dst_port[i] = get_hop_for_offload_pkt(pkt, 1); + } else { + /* Need to use hop returned by lookup */ + dst_port[i] = hop[lpm_pkts++]; + } + if (dst_port[i] == -1) + dst_port[i] = BAD_PORT; + } + + /* Send packets */ + send_multi_pkts(pkts, (uint16_t *)dst_port, nb_rx, 0, 0, false); +} + +/* + * Buffer optimized handling of IPv4 packets. + */ +static inline void +route4_pkts_neon(struct rt_ctx *rt_ctx, struct rte_mbuf **pkts, int nb_rx, + uint64_t tx_offloads, bool ip_cksum) +{ + const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP); + const int32_t m = nb_rx % FWDSTEP; + uint16_t dst_port[MAX_PKT_BURST]; + uint64_t inline_flag = 0; + int32x4_t dip; + int32_t i; + + if (nb_rx == 0) + return; + + for (i = 0; i != k; i += FWDSTEP) { + processx4_step1(&pkts[i], &dip, &inline_flag); + processx4_step2(rt_ctx, dip, inline_flag, &pkts[i], + &dst_port[i]); + } + + /* Classify last up to 3 packets one by one */ + switch (m) { + case 3: + process_single_pkt(rt_ctx, pkts[i], &dst_port[i]); + i++; + /* fallthrough */ + case 2: + process_single_pkt(rt_ctx, pkts[i], &dst_port[i]); + i++; + /* fallthrough */ + case 1: + process_single_pkt(rt_ctx, pkts[i], &dst_port[i]); + } + + send_multi_pkts(pkts, dst_port, nb_rx, tx_offloads, ip_cksum, true); +} + +#endif /* __IPSEC_LPM_NEON_H__ */ diff --git a/examples/ipsec-secgw/ipsec_neon.h b/examples/ipsec-secgw/ipsec_neon.h new file mode 100644 index 0000000000..0f72219ed0 --- /dev/null +++ b/examples/ipsec-secgw/ipsec_neon.h @@ -0,0 +1,321 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2022 Marvell. 
+ */ + +#ifndef _IPSEC_NEON_H_ +#define _IPSEC_NEON_H_ + +#include "ipsec.h" +#include "neon_common.h" + +#define MAX_TX_BURST (MAX_PKT_BURST / 2) +#define BAD_PORT ((uint16_t)-1) + +extern xmm_t val_eth[RTE_MAX_ETHPORTS]; + +/* + * Update source and destination MAC addresses in the ethernet header. + */ +static inline void +processx4_step3(struct rte_mbuf *pkts[FWDSTEP], uint16_t dst_port[FWDSTEP], + uint64_t tx_offloads, bool ip_cksum, uint8_t *l_pkt) +{ + uint32x4_t te[FWDSTEP]; + uint32x4_t ve[FWDSTEP]; + uint32_t *p[FWDSTEP]; + struct rte_mbuf *pkt; + uint8_t i; + + for (i = 0; i < FWDSTEP; i++) { + pkt = pkts[i]; + + /* Check if it is a large packet */ + if (pkt->pkt_len - RTE_ETHER_HDR_LEN > mtu_size) + *l_pkt |= 1; + + p[i] = rte_pktmbuf_mtod(pkt, uint32_t *); + ve[i] = vreinterpretq_u32_s32(val_eth[dst_port[i]]); + te[i] = vld1q_u32(p[i]); + + /* Update last 4 bytes */ + ve[i] = vsetq_lane_u32(vgetq_lane_u32(te[i], 3), ve[i], 3); + vst1q_u32(p[i], ve[i]); + + if (ip_cksum) { + struct rte_ipv4_hdr *ip; + + pkt->ol_flags |= tx_offloads; + + ip = (struct rte_ipv4_hdr *) + (p[i] + RTE_ETHER_HDR_LEN + 1); + ip->hdr_checksum = 0; + + /* calculate IPv4 cksum in SW */ + if ((pkt->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) == 0) + ip->hdr_checksum = rte_ipv4_cksum(ip); + } + + } +} + +/** + * Process single packet: + * Update source and destination MAC addresses in the ethernet header. 
+ */ +static inline void +process_packet(struct rte_mbuf *pkt, uint16_t *dst_port, uint64_t tx_offloads, + bool ip_cksum, uint8_t *l_pkt) +{ + struct rte_ether_hdr *eth_hdr; + uint32x4_t te, ve; + + /* Check if it is a large packet */ + if (pkt->pkt_len - RTE_ETHER_HDR_LEN > mtu_size) + *l_pkt |= 1; + + eth_hdr = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *); + + te = vld1q_u32((uint32_t *)eth_hdr); + ve = vreinterpretq_u32_s32(val_eth[dst_port[0]]); + + ve = vcopyq_laneq_u32(ve, 3, te, 3); + vst1q_u32((uint32_t *)eth_hdr, ve); + + if (ip_cksum) { + struct rte_ipv4_hdr *ip; + + pkt->ol_flags |= tx_offloads; + + ip = (struct rte_ipv4_hdr *)(eth_hdr + 1); + ip->hdr_checksum = 0; + + /* calculate IPv4 cksum in SW */ + if ((pkt->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) == 0) + ip->hdr_checksum = rte_ipv4_cksum(ip); + } +} + +static inline void +send_packets(struct rte_mbuf *m[], uint16_t port, uint32_t num, bool is_ipv4) +{ + uint8_t proto; + uint32_t i; + + proto = is_ipv4 ? IPPROTO_IP : IPPROTO_IPV6; + for (i = 0; i < num; i++) + send_single_packet(m[i], port, proto); +} + +static inline void +send_packetsx4(struct rte_mbuf *m[], uint16_t port, uint32_t num) +{ + unsigned int lcoreid = rte_lcore_id(); + struct lcore_conf *qconf; + uint32_t len, j, n; + + qconf = &lcore_conf[lcoreid]; + + len = qconf->tx_mbufs[port].len; + + /* + * If TX buffer for that queue is empty, and we have enough packets, + * then send them straightway. + */ + if (num >= MAX_TX_BURST && len == 0) { + n = rte_eth_tx_burst(port, qconf->tx_queue_id[port], m, num); + core_stats_update_tx(n); + if (unlikely(n < num)) { + do { + rte_pktmbuf_free(m[n]); + } while (++n < num); + } + return; + } + + /* + * Put packets into TX buffer for that queue. + */ + + n = len + num; + n = (n > MAX_PKT_BURST) ? 
+		MAX_PKT_BURST - len : num;
+
+	j = 0;
+	switch (n % FWDSTEP) {
+	while (j < n) {
+	case 0:
+		qconf->tx_mbufs[port].m_table[len + j] = m[j];
+		j++;
+		/* fallthrough */
+	case 3:
+		qconf->tx_mbufs[port].m_table[len + j] = m[j];
+		j++;
+		/* fallthrough */
+	case 2:
+		qconf->tx_mbufs[port].m_table[len + j] = m[j];
+		j++;
+		/* fallthrough */
+	case 1:
+		qconf->tx_mbufs[port].m_table[len + j] = m[j];
+		j++;
+	}
+	}
+
+	len += n;
+
+	/* enough pkts to be sent */
+	if (unlikely(len == MAX_PKT_BURST)) {
+
+		send_burst(qconf, MAX_PKT_BURST, port);
+
+		/* copy rest of the packets into the TX buffer. */
+		len = num - n;
+		if (len == 0)
+			goto exit;
+
+		j = 0;
+		switch (len % FWDSTEP) {
+		while (j < len) {
+		case 0:
+			qconf->tx_mbufs[port].m_table[j] = m[n + j];
+			j++;
+			/* fallthrough */
+		case 3:
+			qconf->tx_mbufs[port].m_table[j] = m[n + j];
+			j++;
+			/* fallthrough */
+		case 2:
+			qconf->tx_mbufs[port].m_table[j] = m[n + j];
+			j++;
+			/* fallthrough */
+		case 1:
+			qconf->tx_mbufs[port].m_table[j] = m[n + j];
+			j++;
+		}
+		}
+	}
+
+exit:
+	qconf->tx_mbufs[port].len = len;
+}
+
+/**
+ * Send packets burst to the ports in dst_port array
+ */
+static __rte_always_inline void
+send_multi_pkts(struct rte_mbuf **pkts, uint16_t dst_port[MAX_PKT_BURST],
+		int nb_rx, uint64_t tx_offloads, bool ip_cksum, bool is_ipv4)
+{
+	unsigned int lcoreid = rte_lcore_id();
+	uint16_t pnum[MAX_PKT_BURST + 1];
+	uint8_t l_pkt = 0;
+	uint16_t dlp, *lp;
+	int i = 0, k;
+
+	/*
+	 * Finish packet processing and group consecutive
+	 * packets with the same destination port.
+ */
+	k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
+
+	if (k != 0) {
+		uint16x8_t dp1, dp2;
+
+		lp = pnum;
+		lp[0] = 1;
+
+		processx4_step3(pkts, dst_port, tx_offloads, ip_cksum, &l_pkt);
+
+		/* dp1: <d[0], d[1], d[2], d[3], ... > */
+		dp1 = vld1q_u16(dst_port);
+
+		for (i = FWDSTEP; i != k; i += FWDSTEP) {
+			processx4_step3(&pkts[i], &dst_port[i], tx_offloads,
+					ip_cksum, &l_pkt);
+
+			/*
+			 * dp2:
+			 * <d[j-3], d[j-2], d[j-1], d[j], ... >
+			 */
+			dp2 = vld1q_u16(&dst_port[i - FWDSTEP + 1]);
+			lp = neon_port_groupx4(&pnum[i - FWDSTEP], lp, dp1, dp2);
+
+			/*
+			 * dp1:
+			 * <d[j], d[j+1], d[j+2], d[j+3], ... >
+			 */
+			dp1 = vextq_u16(dp2, dp1, FWDSTEP - 1);
+		}
+
+		/*
+		 * dp2: <d[j-3], d[j-2], d[j-1], d[j-1], ... >
+		 */
+		dp2 = vextq_u16(dp1, dp1, 1);
+		dp2 = vsetq_lane_u16(vgetq_lane_u16(dp2, 2), dp2, 3);
+		lp = neon_port_groupx4(&pnum[i - FWDSTEP], lp, dp1, dp2);
+
+		/*
+		 * remove values added by the last repeated
+		 * dst port.
+		 */
+		lp[0]--;
+		dlp = dst_port[i - 1];
+	} else {
+		/* set dlp and lp to the never used values. */
+		dlp = BAD_PORT - 1;
+		lp = pnum + MAX_PKT_BURST;
+	}
+
+	/* Process up to last 3 packets one by one. */
+	switch (nb_rx % FWDSTEP) {
+	case 3:
+		process_packet(pkts[i], dst_port + i, tx_offloads, ip_cksum,
+			       &l_pkt);
+		GROUP_PORT_STEP(dlp, dst_port, lp, pnum, i);
+		i++;
+		/* fallthrough */
+	case 2:
+		process_packet(pkts[i], dst_port + i, tx_offloads, ip_cksum,
+			       &l_pkt);
+		GROUP_PORT_STEP(dlp, dst_port, lp, pnum, i);
+		i++;
+		/* fallthrough */
+	case 1:
+		process_packet(pkts[i], dst_port + i, tx_offloads, ip_cksum,
+			       &l_pkt);
+		GROUP_PORT_STEP(dlp, dst_port, lp, pnum, i);
+	}
+
+	/*
+	 * Send packets out, through destination port.
+	 * Consecutive packets with the same destination port
+	 * are already grouped together.
+	 * If destination port for the packet equals BAD_PORT,
+	 * then free the packet without sending it out.
+ */
+	for (i = 0; i < nb_rx; i += k) {
+
+		uint16_t pn;
+
+		pn = dst_port[i];
+		k = pnum[i];
+
+		if (likely(pn != BAD_PORT)) {
+			if (l_pkt)
+				/* Large packet is present, need to send
+				 * individual packets with fragment
+				 */
+				send_packets(pkts + i, pn, k, is_ipv4);
+			else
+				send_packetsx4(pkts + i, pn, k);
+
+		} else {
+			free_pkts(&pkts[i], k);
+			if (is_ipv4)
+				core_statistics[lcoreid].lpm4.miss++;
+			else
+				core_statistics[lcoreid].lpm6.miss++;
+		}
+	}
+}
+
+#endif /* _IPSEC_NEON_H_ */
diff --git a/examples/ipsec-secgw/ipsec_worker.c b/examples/ipsec-secgw/ipsec_worker.c
index e1d4e3d864..803157d8ee 100644
--- a/examples/ipsec-secgw/ipsec_worker.c
+++ b/examples/ipsec-secgw/ipsec_worker.c
@@ -12,6 +12,10 @@
 #include "ipsec-secgw.h"
 #include "ipsec_worker.h"
 
+#if defined(__ARM_NEON)
+#include "ipsec_lpm_neon.h"
+#endif
+
 struct port_drv_mode_data {
 	struct rte_security_session *sess;
 	struct rte_security_ctx *ctx;
@@ -1248,8 +1252,13 @@ ipsec_poll_mode_wrkr_inl_pr(void)
 			v6_num = ip6.num;
 		}
 
+#if defined __ARM_NEON
+		route4_pkts_neon(rt4_ctx, v4, v4_num, 0, false);
+		route6_pkts_neon(rt6_ctx, v6, v6_num);
+#else
 		route4_pkts(rt4_ctx, v4, v4_num, 0, false);
 		route6_pkts(rt6_ctx, v6, v6_num);
+#endif
 	}
 }