Message ID | 1423041925-26956-4-git-send-email-olivier.matz@6wind.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [IPv6:::1]) by dpdk.org (Postfix) with ESMTP id 8A31FADCE; Wed, 4 Feb 2015 10:25:46 +0100 (CET) Received: from mail-wg0-f54.google.com (mail-wg0-f54.google.com [74.125.82.54]) by dpdk.org (Postfix) with ESMTP id 7B04CADBA for <dev@dpdk.org>; Wed, 4 Feb 2015 10:25:43 +0100 (CET) Received: by mail-wg0-f54.google.com with SMTP id b13so494262wgh.13 for <dev@dpdk.org>; Wed, 04 Feb 2015 01:25:43 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=0Cxnv5wdfhplHhIbTqbYbf7ag2v7uhwF0/L8l4ctjp8=; b=gCF3xXLJ6+nM0aDp2JKXtWY3UiY8UDtY5yw76oaZhGLWBirZ0orwI/ccUOo/d1PG+A oKwM56NyheiORprDWa7nD1FAGvzPoEQIi5dVgjlhCwr89Qix3Z9YWc1mzchlPXL9SXKC VOod4VsE7sljpCy6Uczte9ZRhgTHWJOCZlY0UH3zVid+ZUXcFgUcKJThQG7QiufXWw+O pUqnn7anjIkcTPcEwyJg32SW9SNu2YYT+qnKnjuzXYjbfO4ZVxKFDzUACHKq/01at0m9 NGgy8m5+3szhtEOohQIODrzd+8LpV8S62HcmkgyfNXKG8VOnamPyNldcfBfZAnCj+ut8 DlZQ== X-Gm-Message-State: ALoCoQk+NNo9YDLOoJG0ZItF9y0HM6HtwHBuxi6n2Pn7BFu+L0ZE3QVGH3tWoty7Tl7ZeeJ8xF3r X-Received: by 10.194.223.68 with SMTP id qs4mr26596579wjc.58.1423041943369; Wed, 04 Feb 2015 01:25:43 -0800 (PST) Received: from glumotte.dev.6wind.com (6wind.net2.nerim.net. [213.41.180.237]) by mx.google.com with ESMTPSA id a13sm2346491wic.3.2015.02.04.01.25.41 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 04 Feb 2015 01:25:42 -0800 (PST) From: Olivier Matz <olivier.matz@6wind.com> To: dev@dpdk.org Date: Wed, 4 Feb 2015 10:25:08 +0100 Message-Id: <1423041925-26956-4-git-send-email-olivier.matz@6wind.com> X-Mailer: git-send-email 2.1.4 In-Reply-To: <1423041925-26956-1-git-send-email-olivier.matz@6wind.com> References: <1422623775-8050-1-git-send-email-olivier.matz@6wind.com> <1423041925-26956-1-git-send-email-olivier.matz@6wind.com> Subject: [dpdk-dev] [PATCH v2 03/20] i40e: call i40e_txd_enable_checksum only for offloaded packets X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK <dev.dpdk.org> List-Unsubscribe: <http://dpdk.org/ml/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://dpdk.org/ml/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <http://dpdk.org/ml/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Commit Message
Olivier Matz
Feb. 4, 2015, 9:25 a.m. UTC
From i40e datasheet:
The IP header type and its offload. In case of tunneling, the IIPT
relates to the inner IP header. See also EIPT field for the outer
(External) IP header offload.
00 - non IP packet or packet type is not defined by software
01 - IPv6 packet
10 - IPv4 packet with no IP checksum offload
11 - IPv4 packet with IP checksum offload
Therefore it is not needed to fill the IIPT field if no offload is
requested (we can keep the value to 00). For instance, the linux driver
code does not set it when (skb->ip_summed != CHECKSUM_PARTIAL). We can
do the same in the dpdk driver.
The function i40e_txd_enable_checksum() that fills the offload registers
can only be called for packets requiring an offload.
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
lib/librte_pmd_i40e/i40e_rxtx.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
Comments
> -----Original Message----- > From: Olivier Matz [mailto:olivier.matz@6wind.com] > Sent: Wednesday, February 4, 2015 5:25 PM > To: dev@dpdk.org > Cc: Ananyev, Konstantin; Liu, Jijiang; Zhang, Helin; olivier.matz@6wind.com > Subject: [PATCH v2 03/20] i40e: call i40e_txd_enable_checksum only for > offloaded packets > > From i40e datasheet: > > The IP header type and its offload. In case of tunneling, the IIPT > relates to the inner IP header. See also EIPT field for the outer > (External) IP header offload. > > 00 - non IP packet or packet type is not defined by software > 01 - IPv6 packet > 10 - IPv4 packet with no IP checksum offload > 11 - IPv4 packet with IP checksum offload > > Therefore it is not needed to fill the IIPT field if no offload is requested (we can > keep the value to 00). For instance, the linux driver code does not set it when > (skb->ip_summed != CHECKSUM_PARTIAL). We can do the same in the dpdk > driver. > > The function i40e_txd_enable_checksum() that fills the offload registers can > only be called for packets requiring an offload. > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com> > --- > lib/librte_pmd_i40e/i40e_rxtx.c | 15 +++++++++++---- > 1 file changed, 11 insertions(+), 4 deletions(-) > > diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c > index 8e9df96..9acdeee 100644 > --- a/lib/librte_pmd_i40e/i40e_rxtx.c > +++ b/lib/librte_pmd_i40e/i40e_rxtx.c > @@ -74,6 +74,11 @@ > > #define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | > I40E_TX_DESC_CMD_RS) > > +#define I40E_TX_CKSUM_OFFLOAD_MASK ( \ > + PKT_TX_IP_CKSUM | \ > + PKT_TX_L4_MASK | \ > + PKT_TX_OUTER_IP_CKSUM) > + > #define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \ > (uint64_t) ((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM) > > @@ -1272,10 +1277,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf > **tx_pkts, uint16_t nb_pkts) > > /* Enable checksum offloading */ > cd_tunneling_params = 0; > - i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, > - l2_len, l3_len, outer_l2_len, > - outer_l3_len, > - &cd_tunneling_params); > + if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) { likely should be added. > + i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, > + l2_len, l3_len, outer_l2_len, > + outer_l3_len, > + &cd_tunneling_params); > + } As this code changes are in fast path, performance regression test is needed. I would like to see the performance difference with or without this patch set. Hopefully nothing different. If you need any helps, just let me know. Regards, Helin > > if (unlikely(nb_ctx)) { > /* Setup TX context descriptor if required */ > -- > 2.1.4
Hi Helin, On 02/10/2015 07:03 AM, Zhang, Helin wrote: >> /* Enable checksum offloading */ >> cd_tunneling_params = 0; >> - i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, >> - l2_len, l3_len, outer_l2_len, >> - outer_l3_len, >> - &cd_tunneling_params); >> + if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) { > likely should be added. I would say unlikely() instead. I think the non-offload case should be the default one. What do you think? >> + i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, >> + l2_len, l3_len, outer_l2_len, >> + outer_l3_len, >> + &cd_tunneling_params); >> + } > As this code changes are in fast path, performance regression test is needed. I would > like to see the performance difference with or without this patch set. Hopefully nothing > different. If you need any helps, just let me know. I'm sorry, I won't have the needed resources to bench this as I would have to setup a performance platform with i40e devices. But I'm pretty sure that the code in non-offload case would be faster with this patch as it will avoid many operations in i40e_txd_enable_checksum(). For the offload case, as we also removed the if (l2_len == 0) and if (l3_len == 0), I think there are also less tests than before my patch series. So in my opinion, adding this test does not really justify to check the performance. Regards, Olivier
> -----Original Message----- > From: Olivier MATZ [mailto:olivier.matz@6wind.com] > Sent: Wednesday, February 11, 2015 1:07 AM > To: Zhang, Helin; dev@dpdk.org > Cc: Ananyev, Konstantin; Liu, Jijiang > Subject: Re: [PATCH v2 03/20] i40e: call i40e_txd_enable_checksum only for > offloaded packets > > Hi Helin, > > On 02/10/2015 07:03 AM, Zhang, Helin wrote: > >> /* Enable checksum offloading */ > >> cd_tunneling_params = 0; > >> - i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, > >> - l2_len, l3_len, outer_l2_len, > >> - outer_l3_len, > >> - &cd_tunneling_params); > >> + if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) { > > likely should be added. > > I would say unlikely() instead. I think the non-offload case should be the default > one. What do you think? > > >> + i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, > >> + l2_len, l3_len, outer_l2_len, > >> + outer_l3_len, > >> + &cd_tunneling_params); > >> + } > > As this code changes are in fast path, performance regression test is > > needed. I would like to see the performance difference with or without > > this patch set. Hopefully nothing different. If you need any helps, just let me > know. > > I'm sorry, I won't have the needed resources to bench this as I would have to > setup a performance platform with i40e devices. > > But I'm pretty sure that the code in non-offload case would be faster with this > patch as it will avoid many operations in i40e_txd_enable_checksum(). > > For the offload case, as we also removed the if (l2_len == 0) and if (l3_len == 0), > I think there are also less tests than before my patch series. > > So in my opinion, adding this test does not really justify to check the > performance. As 40G is quite sensitive on cpu cycles, we'd better to avoid any performance drop during our modifying the code for fast path. Performance is what we care about too much. Based on my experiences, even minor code changes may result in big performance impact. It seems that we may need to help you on performance measurement. Regards, Helin > > Regards, > Olivier
Hi Helin, On 02/11/2015 06:32 AM, Zhang, Helin wrote: >> On 02/10/2015 07:03 AM, Zhang, Helin wrote: >>>> /* Enable checksum offloading */ >>>> cd_tunneling_params = 0; >>>> - i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, >>>> - l2_len, l3_len, outer_l2_len, >>>> - outer_l3_len, >>>> - &cd_tunneling_params); >>>> + if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) { >>> likely should be added. >> >> I would say unlikely() instead. I think the non-offload case should be the default >> one. What do you think? Maybe you missed this comment. Any thoughts? >>>> + i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, >>>> + l2_len, l3_len, outer_l2_len, >>>> + outer_l3_len, >>>> + &cd_tunneling_params); >>>> + } >>> As this code changes are in fast path, performance regression test is >>> needed. I would like to see the performance difference with or without >>> this patch set. Hopefully nothing different. If you need any helps, just let me >> know. >> >> I'm sorry, I won't have the needed resources to bench this as I would have to >> setup a performance platform with i40e devices. >> >> But I'm pretty sure that the code in non-offload case would be faster with this >> patch as it will avoid many operations in i40e_txd_enable_checksum(). >> >> For the offload case, as we also removed the if (l2_len == 0) and if (l3_len == 0), >> I think there are also less tests than before my patch series. >> >> So in my opinion, adding this test does not really justify to check the >> performance. > As 40G is quite sensitive on cpu cycles, we'd better to avoid any performance drop > during our modifying the code for fast path. Performance is what we care about too > much. Based on my experiences, even minor code changes may result in big > performance impact. > It seems that we may need to help you on performance measurement. Thanks, indeed it's helpful if you can check performance non-regression. Regards, Olivier
> -----Original Message----- > From: Olivier MATZ [mailto:olivier.matz@6wind.com] > Sent: Thursday, February 12, 2015 1:14 AM > To: Zhang, Helin; dev@dpdk.org > Cc: Ananyev, Konstantin; Liu, Jijiang > Subject: Re: [PATCH v2 03/20] i40e: call i40e_txd_enable_checksum only for > offloaded packets > > Hi Helin, > > On 02/11/2015 06:32 AM, Zhang, Helin wrote: > >> On 02/10/2015 07:03 AM, Zhang, Helin wrote: > >>>> /* Enable checksum offloading */ > >>>> cd_tunneling_params = 0; > >>>> - i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, > >>>> - l2_len, l3_len, outer_l2_len, > >>>> - outer_l3_len, > >>>> - &cd_tunneling_params); > >>>> + if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) { > >>> likely should be added. > >> > >> I would say unlikely() instead. I think the non-offload case should > >> be the default one. What do you think? > > Maybe you missed this comment. Any thoughts? Ohh, sorry for the missing! I'd prefer to have likely, as hardware offload is preferred if there is. If you don't think so, how about to keep nothing (no likely/unlikely) as it is. > > > >>>> + i40e_txd_enable_checksum(ol_flags, &td_cmd, > &td_offset, > >>>> + l2_len, l3_len, outer_l2_len, > >>>> + outer_l3_len, > >>>> + &cd_tunneling_params); > >>>> + } > >>> As this code changes are in fast path, performance regression test > >>> is needed. I would like to see the performance difference with or > >>> without this patch set. Hopefully nothing different. If you need any > >>> helps, just let me > >> know. > >> > >> I'm sorry, I won't have the needed resources to bench this as I would > >> have to setup a performance platform with i40e devices. > >> > >> But I'm pretty sure that the code in non-offload case would be faster > >> with this patch as it will avoid many operations in > i40e_txd_enable_checksum(). > >> > >> For the offload case, as we also removed the if (l2_len == 0) and if > >> (l3_len == 0), I think there are also less tests than before my patch series. > >> > >> So in my opinion, adding this test does not really justify to check > >> the performance. > > As 40G is quite sensitive on cpu cycles, we'd better to avoid any > > performance drop during our modifying the code for fast path. > > Performance is what we care about too much. Based on my experiences, > > even minor code changes may result in big performance impact. > > It seems that we may need to help you on performance measurement. > > Thanks, indeed it's helpful if you can check performance non-regression. I have asked our validation guys here to help you on that, but might not in high priority. In addition, we all will take vocation for the coming Chinese new year. Regards, Helin > > Regards, > Olivier
Hi, On 02/13/2015 03:25 AM, Zhang, Helin wrote: >> On 02/11/2015 06:32 AM, Zhang, Helin wrote: >>>> On 02/10/2015 07:03 AM, Zhang, Helin wrote: >>>>>> /* Enable checksum offloading */ >>>>>> cd_tunneling_params = 0; >>>>>> - i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, >>>>>> - l2_len, l3_len, outer_l2_len, >>>>>> - outer_l3_len, >>>>>> - &cd_tunneling_params); >>>>>> + if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) { >>>>> likely should be added. >>>> >>>> I would say unlikely() instead. I think the non-offload case should >>>> be the default one. What do you think? >> >> Maybe you missed this comment. Any thoughts? > Ohh, sorry for the missing! > I'd prefer to have likely, as hardware offload is preferred if there is. If you > don't think so, how about to keep nothing (no likely/unlikely) as it is. OK, I'll use likely() as you initially suggested. >>> As 40G is quite sensitive on cpu cycles, we'd better to avoid any >>> performance drop during our modifying the code for fast path. >>> Performance is what we care about too much. Based on my experiences, >>> even minor code changes may result in big performance impact. >>> It seems that we may need to help you on performance measurement. >> >> Thanks, indeed it's helpful if you can check performance non-regression. > I have asked our validation guys here to help you on that, but might not in > high priority. In addition, we all will take vocation for the coming Chinese new year. OK, it's noted Thanks, Olivier
diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c index 8e9df96..9acdeee 100644 --- a/lib/librte_pmd_i40e/i40e_rxtx.c +++ b/lib/librte_pmd_i40e/i40e_rxtx.c @@ -74,6 +74,11 @@ #define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS) +#define I40E_TX_CKSUM_OFFLOAD_MASK ( \ + PKT_TX_IP_CKSUM | \ + PKT_TX_L4_MASK | \ + PKT_TX_OUTER_IP_CKSUM) + #define RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb) \ (uint64_t) ((mb)->buf_physaddr + RTE_PKTMBUF_HEADROOM) @@ -1272,10 +1277,12 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts) /* Enable checksum offloading */ cd_tunneling_params = 0; - i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, - l2_len, l3_len, outer_l2_len, - outer_l3_len, - &cd_tunneling_params); + if (ol_flags & I40E_TX_CKSUM_OFFLOAD_MASK) { + i40e_txd_enable_checksum(ol_flags, &td_cmd, &td_offset, + l2_len, l3_len, outer_l2_len, + outer_l3_len, + &cd_tunneling_params); + } if (unlikely(nb_ctx)) { /* Setup TX context descriptor if required */