gro: fix reordering of packets in GRO library

Message ID 20221013101854.95244-1-kumaraparmesh92@gmail.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series gro: fix reordering of packets in GRO library

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-aarch64-compile-testing success Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/github-robot: build success github build: passed

Commit Message

Kumara Parameshwaran Oct. 13, 2022, 10:18 a.m. UTC
  From: Kumara Parameshwaran <kumaraparamesh92@gmail.com>

When a TCP packet contains flags like PSH, it is returned
immediately to the application even though there might be packets of
the same flow in the GRO table. If the PSH flag is set on a segment,
all packets up to that segment should be delivered immediately. But the
current implementation delivers the last-arrived packet with the PSH flag
set ahead of the queued segments, causing re-ordering.

With this patch, if a packet carries flags other than ACK and there are
no previous packets for the flow, the packet is returned
immediately; otherwise it is merged with the previous segments, and the
flags of the last segment are set on the entire merged segment.
This is the behaviour of the Linux stack as well.

Signed-off-by: Kumara Parameshwaran <kumaraparamesh92@gmail.com>
---
v1:
    If the received packet is not a pure ACK packet, we check whether
    there are any previous packets in the flow; if present, we include
    the received packet in the coalescing logic as well and apply the flags
    of the last received packet to the entire segment, which avoids
    re-ordering.

    Consider a case where P1(PSH), P2(ACK), P3(ACK) are received in burst mode.
    P1 carries the PSH flag and, since there are no prior packets in the flow,
    it is copied to unprocess_packets, while P2(ACK) and P3(ACK) are merged together.
    In the existing code, P2+P3 would be delivered first as a single segment and the
    unprocess_packets copied later, which causes re-ordering. With the patch, the
    unprocessed packets are copied first and then the packets from the GRO table.

    Testing done
    The csum test-pmd was modified to support the following:
    a GET request of 10MB from client to server via test-pmd (static ARP
    entries added in client and server). GRO and TSO were enabled in test-pmd,
    where the packets received from the client MAC are sent to the server MAC
    and vice versa.
    In the above test, without the patch the client observed re-ordering of
    25 packets, and with the patch no packet re-ordering was observed.
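
    For reference, a GRO+TSO forwarding setup of this kind can be configured
    in testpmd roughly as follows. This is an illustrative sketch only: the
    port numbers, TSO segment size, and flush cycle are assumptions, not
    taken from the posted test.

```
testpmd> port stop all
testpmd> csum set tcp hw 0
testpmd> csum set tcp hw 1
testpmd> tso set 1448 0
testpmd> tso set 1448 1
testpmd> set port 0 gro on
testpmd> set port 1 gro on
testpmd> set gro flush 1
testpmd> set fwd csum
testpmd> port start all
testpmd> start
```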

 lib/gro/gro_tcp4.c | 35 ++++++++++++++++++++++++++++-------
 lib/gro/rte_gro.c  | 18 +++++++++---------
 2 files changed, 37 insertions(+), 16 deletions(-)
  

Comments

Kumara Parameshwaran Oct. 13, 2022, 10:20 a.m. UTC | #1
Please find the attached pcap files for the testing done.

Thanks,
Kumara.


Patch

diff --git a/lib/gro/gro_tcp4.c b/lib/gro/gro_tcp4.c
index 8f5e800250..9ed891c253 100644
--- a/lib/gro/gro_tcp4.c
+++ b/lib/gro/gro_tcp4.c
@@ -188,6 +188,19 @@  update_header(struct gro_tcp4_item *item)
 			pkt->l2_len);
 }
 
+static inline void
+update_tcp_hdr_flags(struct rte_tcp_hdr *tcp_hdr, struct rte_mbuf *pkt)
+{
+	struct rte_ether_hdr *eth_hdr;
+	struct rte_ipv4_hdr *ipv4_hdr;
+	struct rte_tcp_hdr *merged_tcp_hdr;
+
+	eth_hdr = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *);
+	ipv4_hdr = (struct rte_ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
+	merged_tcp_hdr = (struct rte_tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
+	merged_tcp_hdr->tcp_flags |= tcp_hdr->tcp_flags;
+}
+
 int32_t
 gro_tcp4_reassemble(struct rte_mbuf *pkt,
 		struct gro_tcp4_tbl *tbl,
@@ -206,6 +219,7 @@  gro_tcp4_reassemble(struct rte_mbuf *pkt,
 	uint32_t i, max_flow_num, remaining_flow_num;
 	int cmp;
 	uint8_t find;
+	uint32_t start_idx;
 
 	/*
 	 * Don't process the packet whose TCP header length is greater
@@ -219,12 +233,6 @@  gro_tcp4_reassemble(struct rte_mbuf *pkt,
 	tcp_hdr = (struct rte_tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
 	hdr_len = pkt->l2_len + pkt->l3_len + pkt->l4_len;
 
-	/*
-	 * Don't process the packet which has FIN, SYN, RST, PSH, URG, ECE
-	 * or CWR set.
-	 */
-	if (tcp_hdr->tcp_flags != RTE_TCP_ACK_FLAG)
-		return -1;
 	/*
 	 * Don't process the packet whose payload length is less than or
 	 * equal to 0.
@@ -263,12 +271,21 @@  gro_tcp4_reassemble(struct rte_mbuf *pkt,
 		if (tbl->flows[i].start_index != INVALID_ARRAY_INDEX) {
 			if (is_same_tcp4_flow(tbl->flows[i].key, key)) {
 				find = 1;
+				start_idx = tbl->flows[i].start_index;
 				break;
 			}
 			remaining_flow_num--;
 		}
 	}
 
+	if (tcp_hdr->tcp_flags != RTE_TCP_ACK_FLAG) {
+		if (find)
+			/* Since PSH flag is set, start time will be set to 0 so it will be flushed immediately */
+			tbl->items[start_idx].start_time = 0;
+		else
+			return -1;
+	}
+
 	/*
 	 * Fail to find a matched flow. Insert a new flow and store the
 	 * packet into the flow.
@@ -303,8 +320,12 @@  gro_tcp4_reassemble(struct rte_mbuf *pkt,
 				is_atomic);
 		if (cmp) {
 			if (merge_two_tcp4_packets(&(tbl->items[cur_idx]),
-						pkt, cmp, sent_seq, ip_id, 0))
+						pkt, cmp, sent_seq, ip_id, 0)) {
+				if (tbl->items[cur_idx].start_time == 0)
+					update_tcp_hdr_flags(tcp_hdr, tbl->items[cur_idx].firstseg);
 				return 1;
+			}
+
 			/*
 			 * Fail to merge the two packets, as the packet
 			 * length is greater than the max value. Store
diff --git a/lib/gro/rte_gro.c b/lib/gro/rte_gro.c
index e35399fd42..87c5502dce 100644
--- a/lib/gro/rte_gro.c
+++ b/lib/gro/rte_gro.c
@@ -283,10 +283,17 @@  rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 	if ((nb_after_gro < nb_pkts)
 		 || (unprocess_num < nb_pkts)) {
 		i = 0;
+		/* Copy unprocessed packets */
+		if (unprocess_num > 0) {
+			memcpy(&pkts[i], unprocess_pkts,
+					sizeof(struct rte_mbuf *) *
+					unprocess_num);
+			i = unprocess_num;
+		}
 		/* Flush all packets from the tables */
 		if (do_vxlan_tcp_gro) {
-			i = gro_vxlan_tcp4_tbl_timeout_flush(&vxlan_tcp_tbl,
-					0, pkts, nb_pkts);
+			i += gro_vxlan_tcp4_tbl_timeout_flush(&vxlan_tcp_tbl,
+					0, &pkts[i], nb_pkts - i);
 		}
 
 		if (do_vxlan_udp_gro) {
@@ -304,13 +311,6 @@  rte_gro_reassemble_burst(struct rte_mbuf **pkts,
 			i += gro_udp4_tbl_timeout_flush(&udp_tbl, 0,
 					&pkts[i], nb_pkts - i);
 		}
-		/* Copy unprocessed packets */
-		if (unprocess_num > 0) {
-			memcpy(&pkts[i], unprocess_pkts,
-					sizeof(struct rte_mbuf *) *
-					unprocess_num);
-		}
-		nb_after_gro = i + unprocess_num;
 	}
 
 	return nb_after_gro;