[dpdk-dev] app/testpmd: add packet data prefetch in macswap loop

Message ID 1462190377-26865-1-git-send-email-jerin.jacob@caviumnetworks.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers

Commit Message

Jerin Jacob May 2, 2016, 11:59 a.m. UTC
  prefetch the next packet data address in advance in macswap loop
for performance improvement.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 app/test-pmd/macswap.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

De Lara Guarch, Pablo May 2, 2016, 5:48 p.m. UTC | #1
Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Monday, May 02, 2016 1:00 PM
> To: dev@dpdk.org
> Cc: De Lara Guarch, Pablo; Jerin Jacob
> Subject: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in
> macswap loop
> 
> prefetch the next packet data address in advance in macswap loop
> for performance improvement.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  app/test-pmd/macswap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> index 154889d..c10f4b5 100644
> --- a/app/test-pmd/macswap.c
> +++ b/app/test-pmd/macswap.c
> @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>  	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ)
>  		ol_flags |= PKT_TX_QINQ_PKT;
>  	for (i = 0; i < nb_rx; i++) {
> +		if (likely(i < nb_rx - 1))
> +			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
> +						       void *));
>  		mb = pkts_burst[i];
>  		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> 
> --
> 2.1.0

This looks good. Could you also add it in the other forwarding modes (the ones that make changes in the packets)?

Thanks,
Pablo
  
Bruce Richardson May 3, 2016, 9:45 a.m. UTC | #2
On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote:
> prefetch the next packet data address in advance in macswap loop
> for performance improvement.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  app/test-pmd/macswap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> index 154889d..c10f4b5 100644
> --- a/app/test-pmd/macswap.c
> +++ b/app/test-pmd/macswap.c
> @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>  	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ)
>  		ol_flags |= PKT_TX_QINQ_PKT;
>  	for (i = 0; i < nb_rx; i++) {
> +		if (likely(i < nb_rx - 1))
> +			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
> +						       void *));

At least on IA platforms, there is no issue with prefetching beyond the end of
the array, since it's only a hint to the cpu. If this is true for other platforms,
then I suggest we just drop the conditional and just always prefetch.

/Bruce

>  		mb = pkts_burst[i];
>  		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
>  
> -- 
> 2.1.0
>
  
De Lara Guarch, Pablo May 3, 2016, 9:48 a.m. UTC | #3
> -----Original Message-----
> From: Richardson, Bruce
> Sent: Tuesday, May 03, 2016 10:45 AM
> To: Jerin Jacob
> Cc: dev@dpdk.org; De Lara Guarch, Pablo
> Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in
> macswap loop
> 
> On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote:
> > prefetch the next packet data address in advance in macswap loop
> > for performance improvement.
> >
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> >  app/test-pmd/macswap.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> > index 154889d..c10f4b5 100644
> > --- a/app/test-pmd/macswap.c
> > +++ b/app/test-pmd/macswap.c
> > @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> >  	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ)
> >  		ol_flags |= PKT_TX_QINQ_PKT;
> >  	for (i = 0; i < nb_rx; i++) {
> > +		if (likely(i < nb_rx - 1))
> > +			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
> > +						       void *));
> 
> At least on IA platforms, there is no issue with prefetching beyond the end of
> the array, since it's only a hint to the cpu. If this is true for other platforms,
> then I suggest we just drop the conditional and just always prefetch.

That's what I thought when I saw this patch, but the problem is that the prefetch is not for pkts_burst,
but for rte_pktmbuf_mtod(pkts_burst...), so there must be a limit in nb_rx - 2, or there will be a seg fault.

Pablo
> 
> /Bruce
> 
> >  		mb = pkts_burst[i];
> >  		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> >
> > --
> > 2.1.0
> >
  
Ivan Boule May 3, 2016, 9:50 a.m. UTC | #4
On 05/03/2016 11:45 AM, Bruce Richardson wrote:
> On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote:
>> prefetch the next packet data address in advance in macswap loop
>> for performance improvement.
>>
>> ...
>>   	for (i = 0; i < nb_rx; i++) {
>> +		if (likely(i < nb_rx - 1))
>> +			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
>> +						       void *));
>
> At least on IA platforms, there is no issue with prefetching beyond the end of
> the array, since it's only a hint to the cpu. If this is true for other platforms,
> then I suggest we just drop the conditional and just always prefetch.

This is an interesting point.
Bruce, are you suggesting that prefetching at an invalid [virtual] 
address won't trigger a CPU exception?

Ivan

>
> /Bruce
>
>>   		mb = pkts_burst[i];
>>   		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
  
Bruce Richardson May 3, 2016, 10:16 a.m. UTC | #5
On Tue, May 03, 2016 at 10:48:34AM +0100, De Lara Guarch, Pablo wrote:
> 
> 
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Tuesday, May 03, 2016 10:45 AM
> > To: Jerin Jacob
> > Cc: dev@dpdk.org; De Lara Guarch, Pablo
> > Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in
> > macswap loop
> > 
> > On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote:
> > > prefetch the next packet data address in advance in macswap loop
> > > for performance improvement.
> > >
> > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > ---
> > >  app/test-pmd/macswap.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> > > index 154889d..c10f4b5 100644
> > > --- a/app/test-pmd/macswap.c
> > > +++ b/app/test-pmd/macswap.c
> > > @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> > >  	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ)
> > >  		ol_flags |= PKT_TX_QINQ_PKT;
> > >  	for (i = 0; i < nb_rx; i++) {
> > > +		if (likely(i < nb_rx - 1))
> > > +			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
> > > +						       void *));
> > 
> > At least on IA platforms, there is no issue with prefetching beyond the end of
> > the array, since it's only a hint to the cpu. If this is true for other platforms,
> > then I suggest we just drop the conditional and just always prefetch.
> 
> That's what I thought when I saw this patch, but the problem is that the prefetch is not for pkts_burst,
> but for rte_pktmbuf_mtod(pkts_burst...), so there must be a limit in nb_rx - 2, or there will be a seg fault.
> 
> Pablo
Good point, Pablo, I missed that subtlety here.

/Bruce
> > 
> > /Bruce
> > 
> > >  		mb = pkts_burst[i];
> > >  		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> > >
> > > --
> > > 2.1.0
> > >
  
Bruce Richardson May 3, 2016, 10:20 a.m. UTC | #6
On Tue, May 03, 2016 at 11:50:31AM +0200, Ivan Boule wrote:
> On 05/03/2016 11:45 AM, Bruce Richardson wrote:
> >On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote:
> >>prefetch the next packet data address in advance in macswap loop
> >>for performance improvement.
> >>
> >>...
> >>  	for (i = 0; i < nb_rx; i++) {
> >>+		if (likely(i < nb_rx - 1))
> >>+			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
> >>+						       void *));
> >
> >At least on IA platforms, there is no issue with prefetching beyond the end of
> >the array, since it's only a hint to the cpu. If this is true for other platforms,
> >then I suggest we just drop the conditional and just always prefetch.
> 
> This is an interesting point.
> Bruce, are you suggesting that prefetching at an invalid [virtual] address
> won't trigger a CPU exception?
> 

Yep. For example, adding "rte_prefetch0(NULL)" at the start of main in testpmd
causes no ill effects when running the app.

/Bruce
  
Jerin Jacob May 3, 2016, 12:46 p.m. UTC | #7
On Mon, May 02, 2016 at 05:48:02PM +0000, De Lara Guarch, Pablo wrote:
> Hi Jerin,
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Monday, May 02, 2016 1:00 PM
> > To: dev@dpdk.org
> > Cc: De Lara Guarch, Pablo; Jerin Jacob
> > Subject: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in
> > macswap loop
> > 
> > prefetch the next packet data address in advance in macswap loop
> > for performance improvement.
> > 
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> >  app/test-pmd/macswap.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> > index 154889d..c10f4b5 100644
> > --- a/app/test-pmd/macswap.c
> > +++ b/app/test-pmd/macswap.c
> > @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> >  	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ)
> >  		ol_flags |= PKT_TX_QINQ_PKT;
> >  	for (i = 0; i < nb_rx; i++) {
> > +		if (likely(i < nb_rx - 1))
> > +			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
> > +						       void *));
> >  		mb = pkts_burst[i];
> >  		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> > 
> > --
> > 2.1.0
> 
> This looks good. Could you also add it in the other forwarding modes (the ones that make changes in the packets)?

OK Pablo.

I will add the similar logic in the following forwarding modes in
testpmd.

macswap
macfwd
macfwd-retry
csumonly
icmpecho

/Jerin

> 
> Thanks,
> Pablo
  
Ananyev, Konstantin May 10, 2016, 12:26 p.m. UTC | #8
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Tuesday, May 03, 2016 11:20 AM
> To: Ivan Boule
> Cc: Jerin Jacob; dev@dpdk.org; De Lara Guarch, Pablo
> Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop
> 
> On Tue, May 03, 2016 at 11:50:31AM +0200, Ivan Boule wrote:
> > On 05/03/2016 11:45 AM, Bruce Richardson wrote:
> > >On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote:
> > >>prefetch the next packet data address in advance in macswap loop
> > >>for performance improvement.
> > >>
> > >>...
> > >>  	for (i = 0; i < nb_rx; i++) {
> > >>+		if (likely(i < nb_rx - 1))
> > >>+			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
> > >>+						       void *));
> > >
> > >At least on IA platforms, there is no issue with prefetching beyond the end of
> > >the array, since it's only a hint to the cpu. If this is true for other platforms,
> > >then I suggest we just drop the conditional and just always prefetch.
> >
> > This is an interesting point.
> > Bruce, are you suggesting that prefetching at an invalid [virtual] address
> > won't trigger a CPU exception?
> >
> 
> Yep. For example, adding "rte_prefetch0(NULL)" at the start of main in testpmd
> causes no ill effects when running the app.
> 

One correction - while on IA prefetch(inval_addr) wouldn't cause any functional problems,
it still might cause DTLB miss and can be a source of noticeable performance degradation.
So it is better to avoid such constructions for performance critical code.
Konstantin
  

Patch

diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 154889d..c10f4b5 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -113,6 +113,9 @@  pkt_burst_mac_swap(struct fwd_stream *fs)
 	if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ)
 		ol_flags |= PKT_TX_QINQ_PKT;
 	for (i = 0; i < nb_rx; i++) {
+		if (likely(i < nb_rx - 1))
+			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1],
+						       void *));
 		mb = pkts_burst[i];
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);