[v3,1/2] examples/l3fwd: common packet group functionality

Message ID 20220623093816.254830-1-rbhansali@marvell.com (mailing list archive)
State Accepted, archived
Delegated to: akhil goyal
Series [v3,1/2] examples/l3fwd: common packet group functionality

Checks

Context        Check    Description
ci/checkpatch  success  coding style OK

Commit Message

Rahul Bhansali June 23, 2022, 9:38 a.m. UTC
This makes the packet grouping function common, so that
other examples can use it as needed.

For each architecture sse/neon/altivec, port group
headers will be created under examples/common/<arch>.

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
---
Changes in v3: Created common port-group headers for
architectures sse/neon/altivec as suggested by Konstantin.

Changes in v2: New patch to address review comment.

 examples/common/altivec/port_group.h |  48 +++++++++
 examples/common/neon/port_group.h    |  50 ++++++++++
 examples/common/pkt_group.h          | 139 +++++++++++++++++++++++++++
 examples/common/sse/port_group.h     |  47 +++++++++
 examples/l3fwd/Makefile              |   5 +-
 examples/l3fwd/l3fwd.h               |   2 -
 examples/l3fwd/l3fwd_altivec.h       |  37 +------
 examples/l3fwd/l3fwd_common.h        | 129 +------------------------
 examples/l3fwd/l3fwd_neon.h          |  39 +-------
 examples/l3fwd/l3fwd_sse.h           |  36 +------
 examples/meson.build                 |   2 +-
 11 files changed, 293 insertions(+), 241 deletions(-)
 create mode 100644 examples/common/altivec/port_group.h
 create mode 100644 examples/common/neon/port_group.h
 create mode 100644 examples/common/pkt_group.h
 create mode 100644 examples/common/sse/port_group.h

--
2.25.1
  

Comments

Konstantin Ananyev June 26, 2022, 7 p.m. UTC | #1
23/06/2022 10:38, Rahul Bhansali wrote:
> This will make the packet grouping function common, so
> that other examples can utilize as per need.
> 
> For each architecture sse/neon/altivec, port group
> headers will be created under examples/common/<arch>.
> 
> Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> ---
> Changes in v3: Created common port-group headers for
> architectures sse/neon/altivec as suggested by Konstantin.
> 
> Changes in v2: New patch to address review comment.
> 


Tested-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>



> 
> --
> 2.25.1
>
  
Akhil Goyal June 28, 2022, 8:54 a.m. UTC | #2
> 23/06/2022 10:38, Rahul Bhansali wrote:
> > This will make the packet grouping function common, so
> > that other examples can utilize as per need.
> >
> > For each architecture sse/neon/altivec, port group
> > headers will be created under examples/common/<arch>.
> >
> > Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> > ---
> > Changes in v3: Created common port-group headers for
> > architectures sse/neon/altivec as suggested by Konstantin.
> >
> > Changes in v2: New patch to address review comment.
> >
> 
> 
> Tested-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
Series Applied to dpdk-next-crypto

Thanks.
  
Thomas Monjalon July 3, 2022, 9:40 p.m. UTC | #3
23/06/2022 11:38, Rahul Bhansali:
> This will make the packet grouping function common, so
> that other examples can utilize as per need.
> 
> For each architecture sse/neon/altivec, port group
> headers will be created under examples/common/<arch>.
> 
> Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> ---
> Changes in v3: Created common port-group headers for
> architectures sse/neon/altivec as suggested by Konstantin.
> 
> Changes in v2: New patch to address review comment.
> 
>  examples/common/altivec/port_group.h |  48 +++++++++
>  examples/common/neon/port_group.h    |  50 ++++++++++
>  examples/common/pkt_group.h          | 139 +++++++++++++++++++++++++++
>  examples/common/sse/port_group.h     |  47 +++++++++
>  examples/l3fwd/Makefile              |   5 +-
>  examples/l3fwd/l3fwd.h               |   2 -
>  examples/l3fwd/l3fwd_altivec.h       |  37 +------
>  examples/l3fwd/l3fwd_common.h        | 129 +------------------------
>  examples/l3fwd/l3fwd_neon.h          |  39 +-------
>  examples/l3fwd/l3fwd_sse.h           |  36 +------
>  examples/meson.build                 |   2 +-

OK you move code from l3fwd to another place.
That's probably a step in the right direction.
What about taking the extra step of making it an EAL API?
  
Rahul Bhansali July 4, 2022, 12:49 p.m. UTC | #4
Hi,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, July 4, 2022 3:10 AM
> 23/06/2022 11:38, Rahul Bhansali:
> > This will make the packet grouping function common, so that other
> > examples can utilize as per need.
> >
> > For each architecture sse/neon/altivec, port group headers will be
> > created under examples/common/<arch>.
> >
> > Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> > ---
> > Changes in v3: Created common port-group headers for architectures
> > sse/neon/altivec as suggested by Konstantin.
> >
> > Changes in v2: New patch to address review comment.
> >
> >  examples/common/altivec/port_group.h |  48 +++++++++
> >  examples/common/neon/port_group.h    |  50 ++++++++++
> >  examples/common/pkt_group.h          | 139 +++++++++++++++++++++++++++
> >  examples/common/sse/port_group.h     |  47 +++++++++
> >  examples/l3fwd/Makefile              |   5 +-
> >  examples/l3fwd/l3fwd.h               |   2 -
> >  examples/l3fwd/l3fwd_altivec.h       |  37 +------
> >  examples/l3fwd/l3fwd_common.h        | 129 +------------------------
> >  examples/l3fwd/l3fwd_neon.h          |  39 +-------
> >  examples/l3fwd/l3fwd_sse.h           |  36 +------
> >  examples/meson.build                 |   2 +-
> 
> OK you move code from l3fwd to another place.
> That's probably a step in the right direction.
> What about taking the extra step of making it an EAL API?

Thanks for the suggestion. These changes are specific to the fast path, and
I think EAL is more focused on the control path (correct me if I am wrong).
Instead of an EAL API, we could put this in a library, but currently there is
too little code here to form one.
If we later identify more such common APIs, we can form a library around them
so that more examples, apps, and libraries can use it.
Please let me know if this makes sense.

> 
> 
>
  
Thomas Monjalon July 4, 2022, 2:04 p.m. UTC | #5
04/07/2022 14:49, Rahul Bhansali:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 23/06/2022 11:38, Rahul Bhansali:
> > > This will make the packet grouping function common, so that other
> > > examples can utilize as per need.
> > >
> > > For each architecture sse/neon/altivec, port group headers will be
> > > created under examples/common/<arch>.
> > >
> > > Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
> > > ---
> > > Changes in v3: Created common port-group headers for architectures
> > > sse/neon/altivec as suggested by Konstantin.
> > >
> > > Changes in v2: New patch to address review comment.
> > >
> > >  examples/common/altivec/port_group.h |  48 +++++++++
> > >  examples/common/neon/port_group.h    |  50 ++++++++++
> > >  examples/common/pkt_group.h          | 139 +++++++++++++++++++++++++++
> > >  examples/common/sse/port_group.h     |  47 +++++++++
> > >  examples/l3fwd/Makefile              |   5 +-
> > >  examples/l3fwd/l3fwd.h               |   2 -
> > >  examples/l3fwd/l3fwd_altivec.h       |  37 +------
> > >  examples/l3fwd/l3fwd_common.h        | 129 +------------------------
> > >  examples/l3fwd/l3fwd_neon.h          |  39 +-------
> > >  examples/l3fwd/l3fwd_sse.h           |  36 +------
> > >  examples/meson.build                 |   2 +-
> > 
> > OK you move code from l3fwd to another place.
> > That's probably a step in the right direction.
> > What about taking the extra step of making it an EAL API?
> 
> Thanks for the suggestion. These changes are specific to fast path and I think EAL is more focused towards control path (Correct me if I am wrong).

No, EAL is just a set of basic functions.
Locks, time counters, bit ops are examples of EAL functions
which can be used in data path.

> Instead of EAL API, we can have it in library, but currently these are very few changes to form a library.
> Later in future if we can identify more such common APIs then we can form a library around these specific things, so that more examples/app/library can use it.
> Please suggest if this makes sense.

These are just computations, it can be a file in EAL.
  
Thomas Monjalon July 4, 2022, 2:48 p.m. UTC | #6
23/06/2022 11:38, Rahul Bhansali:
> +#ifndef _PORT_GROUP_H_
> +#define _PORT_GROUP_H_

No need of underscores at begin and end.

> +
> +#include "pkt_group.h"
> +
> +/*
> + * Group consecutive packets with the same destination port in bursts of 4.
> + * Suppose we have array of destination ports:
> + * dst_port[] = {a, b, c, d,, e, ... }
> + * dp1 should contain: <a, b, c, d>, dp2: <b, c, d, e>.
> + * We doing 4 comparisons at once and the result is 4 bit mask.
> + * This mask is used as an index into prebuild array of pnum values.
> + */

This explanation is not clear to me.
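
For readers who share this concern, the quoted comment can be read as follows: four adjacent pairs (a,b), (b,c), (c,d), (d,e) are compared, and bit i of the result is set when pair i matches. A minimal scalar sketch of that mask, assuming the bit order used by the SSE and NEON variants in this patch; `port_group_mask_scalar` is a hypothetical name and is not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FWDSTEP 4

/*
 * Scalar stand-in for the vector compare (hypothetical helper).
 * dst_port must point at FWDSTEP + 1 entries: {a, b, c, d, e}.
 * Bit i of the result is set when dst_port[i] == dst_port[i + 1],
 * so bit 0 means a == b and bit 3 means d == e.
 */
static unsigned int
port_group_mask_scalar(const uint16_t *dst_port)
{
	unsigned int mask = 0;
	int i;

	assert(dst_port != NULL);

	for (i = 0; i < FWDSTEP; i++) {
		if (dst_port[i] == dst_port[i + 1])
			mask |= 1u << i;
	}

	return mask;
}
```

With dst_port = {1, 1, 2, 2, 2} the pairs a==b, c==d and d==e match while b!=c does not, so the mask is 0xd, which is the index of the "0xd" entry in gptbl[].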

> +static inline uint16_t *
> +port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp,

array parameter is not standard, you should make it a simple pointer
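
As background for this remark: in C, a parameter declared with an array type is adjusted to a pointer, so the bound in `pn[FWDSTEP + 1]` documents intent but is not part of the function's type. The sketch below illustrates that with two hypothetical functions (`first_count_array` and `first_count_pointer` are not in the patch) that share a single function pointer type:

```c
#include <assert.h>
#include <stdint.h>

#define FWDSTEP 4

/* Declared with an array bound, in the style used by the patch. */
static uint16_t
first_count_array(uint16_t pn[FWDSTEP + 1])
{
	return pn[0];
}

/* Declared with a plain pointer, in the style the review asks for. */
static uint16_t
first_count_pointer(uint16_t *pn)
{
	return pn[0];
}

/*
 * After parameter adjustment both functions have the same type,
 * so they can be stored in one table without any cast.
 */
typedef uint16_t (*count_fn)(uint16_t *);

static const count_fn count_fns[] = {
	first_count_array,
	first_count_pointer,
};
```

Switching to a pointer therefore changes nothing for callers; the expected length then has to be stated in the function's comment instead of in the signature.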

> +	     __vector unsigned short dp1,
> +	     __vector unsigned short dp2)


longer parameter names would help

[...]
> --- a/examples/l3fwd/Makefile
> +++ b/examples/l3fwd/Makefile
> +INCLUDES =-I../common
>  PC_FILE := $(shell $(PKGCONF) --path libdpdk 2>/dev/null)
>  CFLAGS += -O3 $(shell $(PKGCONF) --cflags libdpdk)
>  # Added for 'rte_eth_link_to_str()'
> @@ -38,10 +39,10 @@ endif
>  endif
> 
>  build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
> -	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
> +	$(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
> 
>  build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
> -	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
> +	$(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)

No need to introduce INCLUDES, you can expand CFLAGS.

I will fix this last one while pulling.
Please work on better names and explanations for an EAL integration.
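
One possible way to explain the gptbl[] entries is to derive them from the 4-bit mask (bit 0: a == b, up to bit 3: d == e). The sketch below is an illustration under that reading of the table, not code from the patch; `struct group_entry`, `derive_group_entry` and the longer field names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FWDSTEP 4

/* Hypothetical mirror of one gptbl[] entry with longer field names. */
struct group_entry {
	uint64_t pnum;        /* four 16-bit run lengths, pnum[0] in the low bits */
	int32_t new_last_idx; /* 'idx' in the patch */
	uint16_t last_add;    /* 'lpv' in the patch */
};

static struct group_entry
derive_group_entry(unsigned int mask)
{
	struct group_entry e = { 0, FWDSTEP, 0 };
	int i;

	/* pnum[i] = 1 + number of consecutive matches starting at pair i. */
	for (i = 0; i < FWDSTEP; i++) {
		uint64_t run = 1;
		int k = i;

		while (k < FWDSTEP && (mask & (1u << k))) {
			run++;
			k++;
		}
		e.pnum |= run << (16 * i);
	}

	/* lpv: how many of b, c, d, e extend the run that is already open at a. */
	while (e.last_add < FWDSTEP && (mask & (1u << e.last_add)))
		e.last_add++;

	/* idx: position where the run containing e starts (4 means e itself). */
	while (e.new_last_idx > 0 && (mask & (1u << (e.new_last_idx - 1))))
		e.new_last_idx--;

	return e;
}
```

For mask 0xd (a == b, b != c, c == d, d == e) this yields pnum 0x0002000300010002, idx 2 and lpv 1, which matches the corresponding entry in the patch.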
  
Rahul Bhansali July 5, 2022, 4:11 p.m. UTC | #7
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, July 4, 2022 8:18 PM
> 23/06/2022 11:38, Rahul Bhansali:
> > +#ifndef _PORT_GROUP_H_
> > +#define _PORT_GROUP_H_
> 
> No need of underscores at begin and end.
> 
> > +
> > +#include "pkt_group.h"
> > +
> > +/*
> > + * Group consecutive packets with the same destination port in bursts of 4.
> > + * Suppose we have array of destination ports:
> > + * dst_port[] = {a, b, c, d,, e, ... }
> > + * dp1 should contain: <a, b, c, d>, dp2: <b, c, d, e>.
> > + * We doing 4 comparisons at once and the result is 4 bit mask.
> > + * This mask is used as an index into prebuild array of pnum values.
> > + */
> 
> This explanation is not clear to me.
> 
> > +static inline uint16_t *
> > +port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp,
> 
> array parameter is not standard, you should make it a simple pointer
> 
> > +	     __vector unsigned short dp1,
> > +	     __vector unsigned short dp2)
> 
> 
> longer parameter names would help
> 
> [...]
> > --- a/examples/l3fwd/Makefile
> > +++ b/examples/l3fwd/Makefile
> > +INCLUDES =-I../common
> >  PC_FILE := $(shell $(PKGCONF) --path libdpdk 2>/dev/null)
> >  CFLAGS += -O3 $(shell $(PKGCONF) --cflags libdpdk)
> >  # Added for 'rte_eth_link_to_str()'
> > @@ -38,10 +39,10 @@ endif
> >  endif
> >
> >  build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
> > -	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
> > +	$(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
> >
> >  build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
> > -	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
> > +	$(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
> 
> No need to introduce INCLUDES, you can expand CFLAGS.
> 
> I will fix this last one while pulling.
> Please work on better names and explanations for an EAL integration.
> 
Ack, will make changes for an EAL integration.

>
  

Patch

diff --git a/examples/common/altivec/port_group.h b/examples/common/altivec/port_group.h
new file mode 100644
index 0000000000..d96d14ca94
--- /dev/null
+++ b/examples/common/altivec/port_group.h
@@ -0,0 +1,48 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2017 IBM Corporation.
+ * Copyright(C) 2022 Marvell.
+ */
+
+#ifndef _PORT_GROUP_H_
+#define _PORT_GROUP_H_
+
+#include "pkt_group.h"
+
+/*
+ * Group consecutive packets with the same destination port in bursts of 4.
+ * Suppose we have array of destination ports:
+ * dst_port[] = {a, b, c, d,, e, ... }
+ * dp1 should contain: <a, b, c, d>, dp2: <b, c, d, e>.
+ * We doing 4 comparisons at once and the result is 4 bit mask.
+ * This mask is used as an index into prebuild array of pnum values.
+ */
+static inline uint16_t *
+port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp,
+	     __vector unsigned short dp1,
+	     __vector unsigned short dp2)
+{
+	union {
+		uint16_t u16[FWDSTEP + 1];
+		uint64_t u64;
+	} *pnum = (void *)pn;
+
+	int32_t v;
+
+	v = vec_any_eq(dp1, dp2);
+
+
+	/* update last port counter. */
+	lp[0] += gptbl[v].lpv;
+
+	/* if dest port value has changed. */
+	if (v != GRPMSK) {
+		pnum->u64 = gptbl[v].pnum;
+		pnum->u16[FWDSTEP] = 1;
+		lp = pnum->u16 + gptbl[v].idx;
+	}
+
+	return lp;
+}
+
+#endif /* _PORT_GROUP_H_ */
diff --git a/examples/common/neon/port_group.h b/examples/common/neon/port_group.h
new file mode 100644
index 0000000000..82c6ed6d73
--- /dev/null
+++ b/examples/common/neon/port_group.h
@@ -0,0 +1,50 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2018 Intel Corporation.
+ * Copyright(c) 2017-2018 Linaro Limited.
+ * Copyright(C) 2022 Marvell.
+ */
+
+#ifndef _PORT_GROUP_H_
+#define _PORT_GROUP_H_
+
+#include "pkt_group.h"
+
+/*
+ * Group consecutive packets with the same destination port in bursts of 4.
+ * Suppose we have array of destination ports:
+ * dst_port[] = {a, b, c, d,, e, ... }
+ * dp1 should contain: <a, b, c, d>, dp2: <b, c, d, e>.
+ * We doing 4 comparisons at once and the result is 4 bit mask.
+ * This mask is used as an index into prebuild array of pnum values.
+ */
+static inline uint16_t *
+port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
+		  uint16x8_t dp2)
+{
+	union {
+		uint16_t u16[FWDSTEP + 1];
+		uint64_t u64;
+	} *pnum = (void *)pn;
+
+	uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0};
+	int32_t v;
+
+	dp1 = vceqq_u16(dp1, dp2);
+	dp1 = vandq_u16(dp1, mask);
+	v = vaddvq_u16(dp1);
+
+	/* update last port counter. */
+	lp[0] += gptbl[v].lpv;
+	rte_compiler_barrier();
+
+	/* if dest port value has changed. */
+	if (v != GRPMSK) {
+		pnum->u64 = gptbl[v].pnum;
+		pnum->u16[FWDSTEP] = 1;
+		lp = pnum->u16 + gptbl[v].idx;
+	}
+
+	return lp;
+}
+
+#endif /* _PORT_GROUP_H_ */
diff --git a/examples/common/pkt_group.h b/examples/common/pkt_group.h
new file mode 100644
index 0000000000..8b26d9380f
--- /dev/null
+++ b/examples/common/pkt_group.h
@@ -0,0 +1,139 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016-2018 Intel Corporation.
+ * Copyright(c) 2017-2018 Linaro Limited.
+ * Copyright(C) 2022 Marvell.
+ */
+
+#ifndef _PKT_GROUP_H_
+#define _PKT_GROUP_H_
+
+#define FWDSTEP	4
+
+/*
+ * Group consecutive packets with the same destination port into one burst.
+ * To avoid extra latency this is done together with some other packet
+ * processing, but after we made a final decision about packet's destination.
+ * To do this we maintain:
+ * pnum - array of number of consecutive packets with the same dest port for
+ * each packet in the input burst.
+ * lp - pointer to the last updated element in the pnum.
+ * dlp - dest port value lp corresponds to.
+ */
+
+#define	GRPSZ	(1 << FWDSTEP)
+#define	GRPMSK	(GRPSZ - 1)
+
+#define GROUP_PORT_STEP(dlp, dcp, lp, pn, idx)	do { \
+	if (likely((dlp) == (dcp)[(idx)])) {         \
+		(lp)[0]++;                           \
+	} else {                                     \
+		(dlp) = (dcp)[idx];                  \
+		(lp) = (pn) + (idx);                 \
+		(lp)[0] = 1;                         \
+	}                                            \
+} while (0)
+
+static const struct {
+	uint64_t pnum; /* prebuild 4 values for pnum[]. */
+	int32_t  idx;  /* index for new last updated elemnet. */
+	uint16_t lpv;  /* add value to the last updated element. */
+} gptbl[GRPSZ] = {
+	{
+		/* 0: a != b, b != c, c != d, d != e */
+		.pnum = UINT64_C(0x0001000100010001),
+		.idx = 4,
+		.lpv = 0,
+	},
+	{
+		/* 1: a == b, b != c, c != d, d != e */
+		.pnum = UINT64_C(0x0001000100010002),
+		.idx = 4,
+		.lpv = 1,
+	},
+	{
+		/* 2: a != b, b == c, c != d, d != e */
+		.pnum = UINT64_C(0x0001000100020001),
+		.idx = 4,
+		.lpv = 0,
+	},
+	{
+		/* 3: a == b, b == c, c != d, d != e */
+		.pnum = UINT64_C(0x0001000100020003),
+		.idx = 4,
+		.lpv = 2,
+	},
+	{
+		/* 4: a != b, b != c, c == d, d != e */
+		.pnum = UINT64_C(0x0001000200010001),
+		.idx = 4,
+		.lpv = 0,
+	},
+	{
+		/* 5: a == b, b != c, c == d, d != e */
+		.pnum = UINT64_C(0x0001000200010002),
+		.idx = 4,
+		.lpv = 1,
+	},
+	{
+		/* 6: a != b, b == c, c == d, d != e */
+		.pnum = UINT64_C(0x0001000200030001),
+		.idx = 4,
+		.lpv = 0,
+	},
+	{
+		/* 7: a == b, b == c, c == d, d != e */
+		.pnum = UINT64_C(0x0001000200030004),
+		.idx = 4,
+		.lpv = 3,
+	},
+	{
+		/* 8: a != b, b != c, c != d, d == e */
+		.pnum = UINT64_C(0x0002000100010001),
+		.idx = 3,
+		.lpv = 0,
+	},
+	{
+		/* 9: a == b, b != c, c != d, d == e */
+		.pnum = UINT64_C(0x0002000100010002),
+		.idx = 3,
+		.lpv = 1,
+	},
+	{
+		/* 0xa: a != b, b == c, c != d, d == e */
+		.pnum = UINT64_C(0x0002000100020001),
+		.idx = 3,
+		.lpv = 0,
+	},
+	{
+		/* 0xb: a == b, b == c, c != d, d == e */
+		.pnum = UINT64_C(0x0002000100020003),
+		.idx = 3,
+		.lpv = 2,
+	},
+	{
+		/* 0xc: a != b, b != c, c == d, d == e */
+		.pnum = UINT64_C(0x0002000300010001),
+		.idx = 2,
+		.lpv = 0,
+	},
+	{
+		/* 0xd: a == b, b != c, c == d, d == e */
+		.pnum = UINT64_C(0x0002000300010002),
+		.idx = 2,
+		.lpv = 1,
+	},
+	{
+		/* 0xe: a != b, b == c, c == d, d == e */
+		.pnum = UINT64_C(0x0002000300040001),
+		.idx = 1,
+		.lpv = 0,
+	},
+	{
+		/* 0xf: a == b, b == c, c == d, d == e */
+		.pnum = UINT64_C(0x0002000300040005),
+		.idx = 0,
+		.lpv = 4,
+	},
+};
+
+#endif /* _PKT_GROUP_H_ */
diff --git a/examples/common/sse/port_group.h b/examples/common/sse/port_group.h
new file mode 100644
index 0000000000..1ec09f8e4e
--- /dev/null
+++ b/examples/common/sse/port_group.h
@@ -0,0 +1,47 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 Intel Corporation.
+ * Copyright(C) 2022 Marvell.
+ */
+
+#ifndef _PORT_GROUP_H_
+#define _PORT_GROUP_H_
+
+#include "pkt_group.h"
+
+/*
+ * Group consecutive packets with the same destination port in bursts of 4.
+ * Suppose we have array of destination ports:
+ * dst_port[] = {a, b, c, d,, e, ... }
+ * dp1 should contain: <a, b, c, d>, dp2: <b, c, d, e>.
+ * We doing 4 comparisons at once and the result is 4 bit mask.
+ * This mask is used as an index into prebuild array of pnum values.
+ */
+static inline uint16_t *
+port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1,
+		 __m128i dp2)
+{
+	union {
+		uint16_t u16[FWDSTEP + 1];
+		uint64_t u64;
+	} *pnum = (void *)pn;
+
+	int32_t v;
+
+	dp1 = _mm_cmpeq_epi16(dp1, dp2);
+	dp1 = _mm_unpacklo_epi16(dp1, dp1);
+	v = _mm_movemask_ps((__m128)dp1);
+
+	/* update last port counter. */
+	lp[0] += gptbl[v].lpv;
+
+	/* if dest port value has changed. */
+	if (v != GRPMSK) {
+		pnum->u64 = gptbl[v].pnum;
+		pnum->u16[FWDSTEP] = 1;
+		lp = pnum->u16 + gptbl[v].idx;
+	}
+
+	return lp;
+}
+
+#endif /* _PORT_GROUP_H_ */
diff --git a/examples/l3fwd/Makefile b/examples/l3fwd/Makefile
index 8efe6378e2..8dbe85c2e6 100644
--- a/examples/l3fwd/Makefile
+++ b/examples/l3fwd/Makefile
@@ -22,6 +22,7 @@  shared: build/$(APP)-shared
 static: build/$(APP)-static
 	ln -sf $(APP)-static build/$(APP)

+INCLUDES =-I../common
 PC_FILE := $(shell $(PKGCONF) --path libdpdk 2>/dev/null)
 CFLAGS += -O3 $(shell $(PKGCONF) --cflags libdpdk)
 # Added for 'rte_eth_link_to_str()'
@@ -38,10 +39,10 @@  endif
 endif

 build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
-	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
+	$(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)

 build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
-	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
+	$(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)

 build:
 	@mkdir -p $@
diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 8a52c90755..40b5f32a9e 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -44,8 +44,6 @@ 
 /* Used to mark destination port as 'invalid'. */
 #define	BAD_PORT ((uint16_t)-1)

-#define FWDSTEP	4
-
 /* replace first 12B of the ethernet header. */
 #define	MASK_ETH 0x3f

diff --git a/examples/l3fwd/l3fwd_altivec.h b/examples/l3fwd/l3fwd_altivec.h
index 88fb41843b..87018f5dbe 100644
--- a/examples/l3fwd/l3fwd_altivec.h
+++ b/examples/l3fwd/l3fwd_altivec.h
@@ -8,6 +8,7 @@ 
 #define _L3FWD_ALTIVEC_H_

 #include "l3fwd.h"
+#include "altivec/port_group.h"
 #include "l3fwd_common.h"

 /*
@@ -82,42 +83,6 @@  processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
 			&dst_port[3], pkt[3]->packet_type);
 }

-/*
- * Group consecutive packets with the same destination port in bursts of 4.
- * Suppose we have array of destination ports:
- * dst_port[] = {a, b, c, d,, e, ... }
- * dp1 should contain: <a, b, c, d>, dp2: <b, c, d, e>.
- * We doing 4 comparisons at once and the result is 4 bit mask.
- * This mask is used as an index into prebuild array of pnum values.
- */
-static inline uint16_t *
-port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp,
-		__vector unsigned short dp1,
-		__vector unsigned short dp2)
-{
-	union {
-		uint16_t u16[FWDSTEP + 1];
-		uint64_t u64;
-	} *pnum = (void *)pn;
-
-	int32_t v;
-
-	v = vec_any_eq(dp1, dp2);
-
-
-	/* update last port counter. */
-	lp[0] += gptbl[v].lpv;
-
-	/* if dest port value has changed. */
-	if (v != GRPMSK) {
-		pnum->u64 = gptbl[v].pnum;
-		pnum->u16[FWDSTEP] = 1;
-		lp = pnum->u16 + gptbl[v].idx;
-	}
-
-	return lp;
-}
-
 /**
  * Process one packet:
  * Update source and destination MAC addresses in the ethernet header.
diff --git a/examples/l3fwd/l3fwd_common.h b/examples/l3fwd/l3fwd_common.h
index 8e4c27218f..224b1c08e8 100644
--- a/examples/l3fwd/l3fwd_common.h
+++ b/examples/l3fwd/l3fwd_common.h
@@ -7,6 +7,8 @@ 
 #ifndef _L3FWD_COMMON_H_
 #define _L3FWD_COMMON_H_

+#include "pkt_group.h"
+
 #ifdef DO_RFC_1812_CHECKS

 #define	IPV4_MIN_VER_IHL	0x45
@@ -50,133 +52,6 @@  rfc1812_process(struct rte_ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
 #define	rfc1812_process(mb, dp, ptype)	do { } while (0)
 #endif /* DO_RFC_1812_CHECKS */

-/*
- * We group consecutive packets with the same destination port into one burst.
- * To avoid extra latency this is done together with some other packet
- * processing, but after we made a final decision about packet's destination.
- * To do this we maintain:
- * pnum - array of number of consecutive packets with the same dest port for
- * each packet in the input burst.
- * lp - pointer to the last updated element in the pnum.
- * dlp - dest port value lp corresponds to.
- */
-
-#define	GRPSZ	(1 << FWDSTEP)
-#define	GRPMSK	(GRPSZ - 1)
-
-#define GROUP_PORT_STEP(dlp, dcp, lp, pn, idx)	do { \
-	if (likely((dlp) == (dcp)[(idx)])) {             \
-		(lp)[0]++;                                   \
-	} else {                                         \
-		(dlp) = (dcp)[idx];                          \
-		(lp) = (pn) + (idx);                         \
-		(lp)[0] = 1;                                 \
-	}                                                \
-} while (0)
-
-static const struct {
-	uint64_t pnum; /* prebuild 4 values for pnum[]. */
-	int32_t  idx;  /* index for new last updated element. */
-	uint16_t lpv;  /* add value to the last updated element. */
-} gptbl[GRPSZ] = {
-	{
-		/* 0: a != b, b != c, c != d, d != e */
-		.pnum = UINT64_C(0x0001000100010001),
-		.idx = 4,
-		.lpv = 0,
-	},
-	{
-		/* 1: a == b, b != c, c != d, d != e */
-		.pnum = UINT64_C(0x0001000100010002),
-		.idx = 4,
-		.lpv = 1,
-	},
-	{
-		/* 2: a != b, b == c, c != d, d != e */
-		.pnum = UINT64_C(0x0001000100020001),
-		.idx = 4,
-		.lpv = 0,
-	},
-	{
-		/* 3: a == b, b == c, c != d, d != e */
-		.pnum = UINT64_C(0x0001000100020003),
-		.idx = 4,
-		.lpv = 2,
-	},
-	{
-		/* 4: a != b, b != c, c == d, d != e */
-		.pnum = UINT64_C(0x0001000200010001),
-		.idx = 4,
-		.lpv = 0,
-	},
-	{
-		/* 5: a == b, b != c, c == d, d != e */
-		.pnum = UINT64_C(0x0001000200010002),
-		.idx = 4,
-		.lpv = 1,
-	},
-	{
-		/* 6: a != b, b == c, c == d, d != e */
-		.pnum = UINT64_C(0x0001000200030001),
-		.idx = 4,
-		.lpv = 0,
-	},
-	{
-		/* 7: a == b, b == c, c == d, d != e */
-		.pnum = UINT64_C(0x0001000200030004),
-		.idx = 4,
-		.lpv = 3,
-	},
-	{
-		/* 8: a != b, b != c, c != d, d == e */
-		.pnum = UINT64_C(0x0002000100010001),
-		.idx = 3,
-		.lpv = 0,
-	},
-	{
-		/* 9: a == b, b != c, c != d, d == e */
-		.pnum = UINT64_C(0x0002000100010002),
-		.idx = 3,
-		.lpv = 1,
-	},
-	{
-		/* 0xa: a != b, b == c, c != d, d == e */
-		.pnum = UINT64_C(0x0002000100020001),
-		.idx = 3,
-		.lpv = 0,
-	},
-	{
-		/* 0xb: a == b, b == c, c != d, d == e */
-		.pnum = UINT64_C(0x0002000100020003),
-		.idx = 3,
-		.lpv = 2,
-	},
-	{
-		/* 0xc: a != b, b != c, c == d, d == e */
-		.pnum = UINT64_C(0x0002000300010001),
-		.idx = 2,
-		.lpv = 0,
-	},
-	{
-		/* 0xd: a == b, b != c, c == d, d == e */
-		.pnum = UINT64_C(0x0002000300010002),
-		.idx = 2,
-		.lpv = 1,
-	},
-	{
-		/* 0xe: a != b, b == c, c == d, d == e */
-		.pnum = UINT64_C(0x0002000300040001),
-		.idx = 1,
-		.lpv = 0,
-	},
-	{
-		/* 0xf: a == b, b == c, c == d, d == e */
-		.pnum = UINT64_C(0x0002000300040005),
-		.idx = 0,
-		.lpv = 4,
-	},
-};
-
 static __rte_always_inline void
 send_packetsx4(struct lcore_conf *qconf, uint16_t port, struct rte_mbuf *m[],
 		uint32_t num)
diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
index e3d33a5229..ce515e0bc4 100644
--- a/examples/l3fwd/l3fwd_neon.h
+++ b/examples/l3fwd/l3fwd_neon.h
@@ -7,6 +7,7 @@ 
 #define _L3FWD_NEON_H_

 #include "l3fwd.h"
+#include "neon/port_group.h"
 #include "l3fwd_common.h"

 /*
@@ -62,44 +63,6 @@  processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
 			&dst_port[3], pkt[3]->packet_type);
 }

-/*
- * Group consecutive packets with the same destination port in bursts of 4.
- * Suppose we have array of destination ports:
- * dst_port[] = {a, b, c, d,, e, ... }
- * dp1 should contain: <a, b, c, d>, dp2: <b, c, d, e>.
- * We doing 4 comparisons at once and the result is 4 bit mask.
- * This mask is used as an index into prebuild array of pnum values.
- */
-static inline uint16_t *
-port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, uint16x8_t dp1,
-	     uint16x8_t dp2)
-{
-	union {
-		uint16_t u16[FWDSTEP + 1];
-		uint64_t u64;
-	} *pnum = (void *)pn;
-
-	int32_t v;
-	uint16x8_t mask = {1, 2, 4, 8, 0, 0, 0, 0};
-
-	dp1 = vceqq_u16(dp1, dp2);
-	dp1 = vandq_u16(dp1, mask);
-	v = vaddvq_u16(dp1);
-
-	/* update last port counter. */
-	lp[0] += gptbl[v].lpv;
-	rte_compiler_barrier();
-
-	/* if dest port value has changed. */
-	if (v != GRPMSK) {
-		pnum->u64 = gptbl[v].pnum;
-		pnum->u16[FWDSTEP] = 1;
-		lp = pnum->u16 + gptbl[v].idx;
-	}
-
-	return lp;
-}
-
 /**
  * Process one packet:
  * Update source and destination MAC addresses in the ethernet header.
diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
index d5a717e18c..0f0d0323a2 100644
--- a/examples/l3fwd/l3fwd_sse.h
+++ b/examples/l3fwd/l3fwd_sse.h
@@ -7,6 +7,7 @@ 
 #define _L3FWD_SSE_H_

 #include "l3fwd.h"
+#include "sse/port_group.h"
 #include "l3fwd_common.h"

 /*
@@ -62,41 +63,6 @@  processx4_step3(struct rte_mbuf *pkt[FWDSTEP], uint16_t dst_port[FWDSTEP])
 			&dst_port[3], pkt[3]->packet_type);
 }

-/*
- * Group consecutive packets with the same destination port in bursts of 4.
- * Suppose we have array of destination ports:
- * dst_port[] = {a, b, c, d,, e, ... }
- * dp1 should contain: <a, b, c, d>, dp2: <b, c, d, e>.
- * We doing 4 comparisons at once and the result is 4 bit mask.
- * This mask is used as an index into prebuild array of pnum values.
- */
-static inline uint16_t *
-port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
-{
-	union {
-		uint16_t u16[FWDSTEP + 1];
-		uint64_t u64;
-	} *pnum = (void *)pn;
-
-	int32_t v;
-
-	dp1 = _mm_cmpeq_epi16(dp1, dp2);
-	dp1 = _mm_unpacklo_epi16(dp1, dp1);
-	v = _mm_movemask_ps((__m128)dp1);
-
-	/* update last port counter. */
-	lp[0] += gptbl[v].lpv;
-
-	/* if dest port value has changed. */
-	if (v != GRPMSK) {
-		pnum->u64 = gptbl[v].pnum;
-		pnum->u16[FWDSTEP] = 1;
-		lp = pnum->u16 + gptbl[v].idx;
-	}
-
-	return lp;
-}
-
 /**
  * Process one packet:
  * Update source and destination MAC addresses in the ethernet header.
diff --git a/examples/meson.build b/examples/meson.build
index 78de0e1f37..81e93799f2 100644
--- a/examples/meson.build
+++ b/examples/meson.build
@@ -97,7 +97,7 @@  foreach example: examples
     ldflags = default_ldflags

     ext_deps = []
-    includes = [include_directories(example)]
+    includes = [include_directories(example, 'common')]
     deps = ['eal', 'mempool', 'net', 'mbuf', 'ethdev', 'cmdline']
     subdir(example)
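
The GROUP_PORT_STEP macro moved by this patch does the same bookkeeping as port_groupx4(), one packet at a time. The sketch below shows how pnum[] could be filled with it and then walked run by run. It is a simplified illustration: `group_ports_scalar` and `count_bursts` are hypothetical helpers, and `likely()` is replaced by a plain stand-in so the block compiles without DPDK headers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for DPDK's branch hint, so the sketch has no DPDK dependency. */
#define likely(x) (x)

/* Same logic as the macro in pkt_group.h. */
#define GROUP_PORT_STEP(dlp, dcp, lp, pn, idx)	do { \
	if (likely((dlp) == (dcp)[(idx)])) {         \
		(lp)[0]++;                           \
	} else {                                     \
		(dlp) = (dcp)[idx];                  \
		(lp) = (pn) + (idx);                 \
		(lp)[0] = 1;                         \
	}                                            \
} while (0)

/*
 * Fill pnum[] so that the entry at the start of each run of equal
 * destination ports holds the run length. Other entries are not written.
 */
static void
group_ports_scalar(const uint16_t *dst_port, uint16_t *pnum, int n)
{
	uint16_t dlp;  /* destination port of the run currently open */
	uint16_t *lp;  /* counter of the run currently open */
	int j;

	if (n <= 0)
		return;

	dlp = dst_port[0];
	lp = pnum;
	lp[0] = 1;

	for (j = 1; j < n; j++)
		GROUP_PORT_STEP(dlp, dst_port, lp, pnum, j);
}

/* Walk pnum[] run by run; each step corresponds to one burst to send. */
static int
count_bursts(const uint16_t *pnum, int n)
{
	int j;
	int bursts = 0;

	for (j = 0; j < n; j += pnum[j])
		bursts++;

	return bursts;
}
```

For dst_port = {1, 1, 2, 2, 2, 3} the run starts are at indexes 0, 2 and 5 with lengths 2, 3 and 1, so a forwarder walking pnum[] this way would issue three bursts instead of six single-packet sends.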