[dpdk-dev,v2] ethdev: make struct rte_eth_dev cache aligned

Message ID 1462279327-9876-1-git-send-email-jerin.jacob@caviumnetworks.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers

Commit Message

Jerin Jacob May 3, 2016, 12:42 p.m. UTC
  Elements of struct rte_eth_dev used in the fast path.
Make struct rte_eth_dev cache aligned to avoid the cases where
rte_eth_dev elements share the same cache line with other structures.

Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v2:
Remove __rte_cache_aligned from rte_eth_devices and keep
it only at struct rte_eth_dev definition as suggested by Bruce
http://dpdk.org/dev/patchwork/patch/12328/
---
 lib/librte_ether/rte_ethdev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Bruce Richardson May 4, 2016, 11:09 a.m. UTC | #1
On Tue, May 03, 2016 at 06:12:07PM +0530, Jerin Jacob wrote:
> Elements of struct rte_eth_dev used in the fast path.
> Make struct rte_eth_dev cache aligned to avoid the cases where
> rte_eth_dev elements share the same cache line with other structures.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
> v2:
> Remove __rte_cache_aligned from rte_eth_devices and keep
> it only at struct rte_eth_dev definition as suggested by Bruce
> http://dpdk.org/dev/patchwork/patch/12328/
> ---
>  lib/librte_ether/rte_ethdev.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 2757510..48f14d5 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1615,7 +1615,7 @@ struct rte_eth_dev {
>  	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
>  	uint8_t attached; /**< Flag indicating the port is attached */
>  	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
> -};
> +} __rte_cache_aligned;
>  
>  struct rte_eth_dev_sriov {
>  	uint8_t active;               /**< SRIOV is active with 16, 32 or 64 pools */
> -- 

Hi Jerin,

have you seen a performance degradation due to ethdev elements sharing a cache
line? I ask because, surprisingly for me, I actually see a performance regression
when I apply the above patch. It's not a big change - perf reduction of <1% - but
still noticable across multiple runs using testpmd. I'm using two 1x40G NICs
using i40e driver, and I see ~100kpps less traffic per port after applying the
patch. [CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz]

Testpmd cmd and output shown below.

Regards,
/Bruce

$ sudo ./x86_64-native-linuxapp-gcc/app/testpmd -c C0000 -n 4 -- --rxd=512 --txd=512 --numa
EAL: Detected 36 lcore(s)
EAL: Probing VFIO support...
PMD: nfp_net_pmd_init(): librte_pmd_nfp_net version 0.1

EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL: PCI device 0000:05:00.0 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device 0000:05:00.1 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device 0000:08:00.0 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device 0000:08:00.1 on NUMA socket 0
EAL:   probe driver: 8086:154a rte_ixgbe_pmd
EAL: PCI device 0000:81:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1584 rte_i40e_pmd
PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
EAL: PCI device 0000:88:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1584 rte_i40e_pmd
PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
Configuring Port 0 (socket 1)
Port 0: 68:05:CA:27:D4:4E
Configuring Port 1 (socket 1)
Port 1: 68:05:CA:27:D2:0A
Checking link statuses...
Port 0 Link Up - speed 40000 Mbps - full-duplex
Port 1 Link Up - speed 40000 Mbps - full-duplex
Done
No commandline core given, start packet forwarding
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=2
  RX queues=1 - RX desc=512 - RX free threshold=32
  RX threshold registers: pthresh=8 hthresh=8 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=32
  TX threshold registers: pthresh=32 hthresh=0 wthresh=0
  TX RS bit threshold=32 - TXQ flags=0xf01
Press enter to exit

Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 1940564672     RX-dropped: 1456035742    RX-total: 3396600414
  TX-packets: 1940564736     TX-dropped: 0             TX-total: 1940564736
  ----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1  ----------------------
  RX-packets: 1940564671     RX-dropped: 1456036082    RX-total: 3396600753
  TX-packets: 1940564736     TX-dropped: 0             TX-total: 1940564736
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
  RX-packets: 3881129343     RX-dropped: 2912071824    RX-total: 6793201167
  TX-packets: 3881129472     TX-dropped: 0             TX-total: 3881129472
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.

Shutting down port 0...
Stopping ports...
Done
Closing ports...
Done

Shutting down port 1...
Stopping ports...
Done
Closing ports...
Done

Bye...
  
Jerin Jacob May 4, 2016, 1:42 p.m. UTC | #2
On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:
> On Tue, May 03, 2016 at 06:12:07PM +0530, Jerin Jacob wrote:
> > Elements of struct rte_eth_dev used in the fast path.
> > Make struct rte_eth_dev cache aligned to avoid the cases where
> > rte_eth_dev elements share the same cache line with other structures.
> > 
> > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> > v2:
> > Remove __rte_cache_aligned from rte_eth_devices and keep
> > it only at struct rte_eth_dev definition as suggested by Bruce
> > http://dpdk.org/dev/patchwork/patch/12328/
> > ---
> >  lib/librte_ether/rte_ethdev.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> > index 2757510..48f14d5 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -1615,7 +1615,7 @@ struct rte_eth_dev {
> >  	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> >  	uint8_t attached; /**< Flag indicating the port is attached */
> >  	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
> > -};
> > +} __rte_cache_aligned;
> >  
> >  struct rte_eth_dev_sriov {
> >  	uint8_t active;               /**< SRIOV is active with 16, 32 or 64 pools */
> > -- 
> 
> Hi Jerin,

Hi Bruce,

> 
> have you seen a performance degradation due to ethdev elements sharing a cache

No. Not because of sharing the cache line.

> line? I ask because, surprisingly for me, I actually see a performance regression

I see performance degradation in PMD in my setup where independent
changes are causing the performance issue in PMD(~<100k). That's the reason
I thought making aligned cache line stuff where ever it makes sense so that
independent change shouldn't impact the PMD performance and this patch
was an initiative for the same.

> when I apply the above patch. It's not a big change - perf reduction of <1% - but
> still noticable across multiple runs using testpmd. I'm using two 1x40G NICs
> using i40e driver, and I see ~100kpps less traffic per port after applying the
> patch. [CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz]

This particular patch does not have any performance degradation in my
setup.
CPU: ThunderX

> 
> Testpmd cmd and output shown below.
> 
> Regards,
> /Bruce
> 
> $ sudo ./x86_64-native-linuxapp-gcc/app/testpmd -c C0000 -n 4 -- --rxd=512 --txd=512 --numa
> EAL: Detected 36 lcore(s)
> EAL: Probing VFIO support...
> PMD: nfp_net_pmd_init(): librte_pmd_nfp_net version 0.1
> 
> EAL: PCI device 0000:01:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL: PCI device 0000:01:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:1521 rte_igb_pmd
> EAL: PCI device 0000:05:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device 0000:05:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device 0000:08:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device 0000:08:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:154a rte_ixgbe_pmd
> EAL: PCI device 0000:81:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:1584 rte_i40e_pmd
> PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
> EAL: PCI device 0000:88:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:1584 rte_i40e_pmd
> PMD: eth_i40e_dev_init(): FW 5.0 API 1.5 NVM 05.00.02 eetrack 80002281
> Configuring Port 0 (socket 1)
> Port 0: 68:05:CA:27:D4:4E
> Configuring Port 1 (socket 1)
> Port 1: 68:05:CA:27:D2:0A
> Checking link statuses...
> Port 0 Link Up - speed 40000 Mbps - full-duplex
> Port 1 Link Up - speed 40000 Mbps - full-duplex
> Done
> No commandline core given, start packet forwarding
>   io packet forwarding - CRC stripping disabled - packets/burst=32
>   nb forwarding cores=1 - nb forwarding ports=2
>   RX queues=1 - RX desc=512 - RX free threshold=32
>   RX threshold registers: pthresh=8 hthresh=8 wthresh=0
>   TX queues=1 - TX desc=512 - TX free threshold=32
>   TX threshold registers: pthresh=32 hthresh=0 wthresh=0
>   TX RS bit threshold=32 - TXQ flags=0xf01
> Press enter to exit
> 
> Telling cores to stop...
> Waiting for lcores to finish...
> 
>   ---------------------- Forward statistics for port 0  ----------------------
>   RX-packets: 1940564672     RX-dropped: 1456035742    RX-total: 3396600414
>   TX-packets: 1940564736     TX-dropped: 0             TX-total: 1940564736
>   ----------------------------------------------------------------------------
> 
>   ---------------------- Forward statistics for port 1  ----------------------
>   RX-packets: 1940564671     RX-dropped: 1456036082    RX-total: 3396600753
>   TX-packets: 1940564736     TX-dropped: 0             TX-total: 1940564736
>   ----------------------------------------------------------------------------
> 
>   +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
>   RX-packets: 3881129343     RX-dropped: 2912071824    RX-total: 6793201167
>   TX-packets: 3881129472     TX-dropped: 0             TX-total: 3881129472
>   ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> Done.
> 
> Shutting down port 0...
> Stopping ports...
> Done
> Closing ports...
> Done
> 
> Shutting down port 1...
> Stopping ports...
> Done
> Closing ports...
> Done
> 
> Bye...
> 
>
  
Bruce Richardson May 4, 2016, 1:53 p.m. UTC | #3
> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, May 4, 2016 2:43 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: dev@dpdk.org; thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache
> aligned
> 
> On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:
> > On Tue, May 03, 2016 at 06:12:07PM +0530, Jerin Jacob wrote:
> > > Elements of struct rte_eth_dev used in the fast path.
> > > Make struct rte_eth_dev cache aligned to avoid the cases where
> > > rte_eth_dev elements share the same cache line with other structures.
> > >
> > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > ---
> > > v2:
> > > Remove __rte_cache_aligned from rte_eth_devices and keep it only at
> > > struct rte_eth_dev definition as suggested by Bruce
> > > http://dpdk.org/dev/patchwork/patch/12328/
> > > ---
> > >  lib/librte_ether/rte_ethdev.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > b/lib/librte_ether/rte_ethdev.h index 2757510..48f14d5 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -1615,7 +1615,7 @@ struct rte_eth_dev {
> > >  	struct rte_eth_rxtx_callback
> *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> > >  	uint8_t attached; /**< Flag indicating the port is attached */
> > >  	enum rte_eth_dev_type dev_type; /**< Flag indicating the device
> > > type */ -};
> > > +} __rte_cache_aligned;
> > >
> > >  struct rte_eth_dev_sriov {
> > >  	uint8_t active;               /**< SRIOV is active with 16, 32 or 64
> pools */
> > > --
> >
> > Hi Jerin,
> 
> Hi Bruce,
> 
> >
> > have you seen a performance degradation due to ethdev elements sharing
> > a cache
> 
> No. Not because of sharing the cache line.
> 
> > line? I ask because, surprisingly for me, I actually see a performance
> > regression
> 
> I see performance degradation in PMD in my setup where independent changes
> are causing the performance issue in PMD(~<100k). That's the reason I
> thought making aligned cache line stuff where ever it makes sense so that
> independent change shouldn't impact the PMD performance and this patch was
> an initiative for the same.
> 
> > when I apply the above patch. It's not a big change - perf reduction
> > of <1% - but still noticable across multiple runs using testpmd. I'm
> > using two 1x40G NICs using i40e driver, and I see ~100kpps less
> > traffic per port after applying the patch. [CPU: Intel(R) Xeon(R) CPU
> > E5-2699 v3 @ 2.30GHz]
> 
> This particular patch does not have any performance degradation in my
> setup.
> CPU: ThunderX

Ok, so I take it that this patch is performance neutral on your setup, then?
If that's the case, can we hold off on merging it on the basis that it's not needed and does cause a slight regression.

Thanks,
/Bruce
  
Jerin Jacob May 4, 2016, 3:19 p.m. UTC | #4
On Wed, May 04, 2016 at 01:53:39PM +0000, Richardson, Bruce wrote:
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Wednesday, May 4, 2016 2:43 PM
> > To: Richardson, Bruce <bruce.richardson@intel.com>
> > Cc: dev@dpdk.org; thomas.monjalon@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache
> > aligned
> > 
> > On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:

Snip

> > >
> > > Hi Jerin,
> > 
> > Hi Bruce,
> > 
> > >
> > > have you seen a performance degradation due to ethdev elements sharing
> > > a cache
> > 
> > No. Not because of sharing the cache line.
> > 
> > > line? I ask because, surprisingly for me, I actually see a performance
> > > regression
> > 
> > I see performance degradation in PMD in my setup where independent changes
> > are causing the performance issue in PMD(~<100k). That's the reason I
> > thought making aligned cache line stuff where ever it makes sense so that
> > independent change shouldn't impact the PMD performance and this patch was
> > an initiative for the same.
> > 
> > > when I apply the above patch. It's not a big change - perf reduction
> > > of <1% - but still noticable across multiple runs using testpmd. I'm
> > > using two 1x40G NICs using i40e driver, and I see ~100kpps less
> > > traffic per port after applying the patch. [CPU: Intel(R) Xeon(R) CPU
> > > E5-2699 v3 @ 2.30GHz]
> > 
> > This particular patch does not have any performance degradation in my
> > setup.
> > CPU: ThunderX
> 
> Ok, so I take it that this patch is performance neutral on your setup, then?
> If that's the case, can we hold off on merging it on the basis that it's not needed and does cause a slight regression.

OK

Can you share the base dpdk.org upstream change set where this patch
creates the slight regression? I will test it the same on IA and
ThunderX platform.

In my testpmd build, rte_eth_devices(0x0000000000751ef8) share the cache
line with inactive "notify_ops" that the reason for its not creating any benefit.
I guess the case would have been different if its active write location.

 COMMON         0x0000000000751ef0        0x8
/home/jerin/dpdk-thunderx/build/lib/librte_vhost.a(virtio-net.o)
                0x0000000000751ef0                notify_ops
 COMMON         0x0000000000751ef8    0x80900
/home/jerin/dpdk-thunderx/build/lib/libethdev.a(rte_ethdev.o)
                0x0000000000751ef8                rte_eth_devices

Thanks,
Jerin

> 
> Thanks,
> /Bruce
  
Bruce Richardson May 4, 2016, 3:48 p.m. UTC | #5
On Wed, May 04, 2016 at 08:49:38PM +0530, Jerin Jacob wrote:
> On Wed, May 04, 2016 at 01:53:39PM +0000, Richardson, Bruce wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > Sent: Wednesday, May 4, 2016 2:43 PM
> > > To: Richardson, Bruce <bruce.richardson@intel.com>
> > > Cc: dev@dpdk.org; thomas.monjalon@6wind.com
> > > Subject: Re: [dpdk-dev] [PATCH v2] ethdev: make struct rte_eth_dev cache
> > > aligned
> > > 
> > > On Wed, May 04, 2016 at 12:09:50PM +0100, Bruce Richardson wrote:
> 
> Snip
> 
> > > >
> > > > Hi Jerin,
> > > 
> > > Hi Bruce,
> > > 
> > > >
> > > > have you seen a performance degradation due to ethdev elements sharing
> > > > a cache
> > > 
> > > No. Not because of sharing the cache line.
> > > 
> > > > line? I ask because, surprisingly for me, I actually see a performance
> > > > regression
> > > 
> > > I see performance degradation in PMD in my setup where independent changes
> > > are causing the performance issue in PMD(~<100k). That's the reason I
> > > thought making aligned cache line stuff where ever it makes sense so that
> > > independent change shouldn't impact the PMD performance and this patch was
> > > an initiative for the same.
> > > 
> > > > when I apply the above patch. It's not a big change - perf reduction
> > > > of <1% - but still noticable across multiple runs using testpmd. I'm
> > > > using two 1x40G NICs using i40e driver, and I see ~100kpps less
> > > > traffic per port after applying the patch. [CPU: Intel(R) Xeon(R) CPU
> > > > E5-2699 v3 @ 2.30GHz]
> > > 
> > > This particular patch does not have any performance degradation in my
> > > setup.
> > > CPU: ThunderX
> > 
> > Ok, so I take it that this patch is performance neutral on your setup, then?
> > If that's the case, can we hold off on merging it on the basis that it's not needed and does cause a slight regression.
> 
> OK
> 
> Can you share the base dpdk.org upstream change set where this patch
> creates the slight regression? I will test it the same on IA and
> ThunderX platform.
> 
> In my testpmd build, rte_eth_devices(0x0000000000751ef8) share the cache
> line with inactive "notify_ops" that the reason for its not creating any benefit.
> I guess the case would have been different if its active write location.
> 
>  COMMON         0x0000000000751ef0        0x8
> /home/jerin/dpdk-thunderx/build/lib/librte_vhost.a(virtio-net.o)
>                 0x0000000000751ef0                notify_ops
>  COMMON         0x0000000000751ef8    0x80900
> /home/jerin/dpdk-thunderx/build/lib/libethdev.a(rte_ethdev.o)
>                 0x0000000000751ef8                rte_eth_devices
> 
> Thanks,
> Jerin
> 
> > 
Hi Jerin,

I see no change with the latest code on dpdk.org mainline. I do see a change though,
with the latest code on the dpdk-next-net/rel_16_07 branch, which has additional
i40e driver improvements on it.
However, on re-running the tests, I realise I may have got my measurement results
backwards and that this patch may actually cause a slight increase instead. :-(

Therefore, I withdraw my current objections to this patch. But I would like to
see someone else verify what performance difference, if any, this causes on
different platforms, so your planned tests on ThunderX and IA would still be
great to see! :-)

Thanks,
/Bruce
  
Thomas Monjalon June 22, 2016, 9:20 p.m. UTC | #6
2016-05-03 18:12, Jerin Jacob:
> Elements of struct rte_eth_dev used in the fast path.
> Make struct rte_eth_dev cache aligned to avoid the cases where
> rte_eth_dev elements share the same cache line with other structures.
> 
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Let's try it in real tests.
Applied, thanks.
  

Patch

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2757510..48f14d5 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1615,7 +1615,7 @@  struct rte_eth_dev {
 	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 	uint8_t attached; /**< Flag indicating the port is attached */
 	enum rte_eth_dev_type dev_type; /**< Flag indicating the device type */
-};
+} __rte_cache_aligned;
 
 struct rte_eth_dev_sriov {
 	uint8_t active;               /**< SRIOV is active with 16, 32 or 64 pools */