[dpdk-dev] vhost: fix virtio_net cache sharing of broadcast_rarp

Message ID 1489605049-18686-1-git-send-email-ktraynor@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Yuanhan Liu

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Kevin Traynor March 15, 2017, 7:10 p.m. UTC
  The virtio_net structure is used in both enqueue and dequeue datapaths.
broadcast_rarp is checked with cmpset in the dequeue datapath regardless
of whether descriptors are available or not.

It is observed that in some cases, where dequeue and enqueue are performed by
different cores and no packets are available on the dequeue datapath
(i.e. uni-directional traffic), the frequent checking of broadcast_rarp
in dequeue causes performance degradation for the enqueue datapath.

In OVS the issue can cause a uni-directional performance drop of up to 15%.

Fix that by moving broadcast_rarp to a different cache line in the
virtio_net struct.

Fixes: a66bcad32240 ("vhost: arrange struct fields for better cache sharing")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 lib/librte_vhost/vhost.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
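
For readers without the source to hand, the dequeue-side check the commit
message refers to looks roughly like the sketch below (an editor's paraphrase
of rte_vhost_dequeue_burst(), not verbatim code; the check_pending_rarp()
wrapper is invented for illustration). The point is that the compare-and-set
runs on every dequeue call, whether or not a RARP is actually pending, so the
cache line holding broadcast_rarp is touched by the dequeue core on every poll:

#include <stdint.h>
#include <rte_atomic.h>
#include <rte_branch_prediction.h>	/* unlikely() */

/* Paraphrase: the dequeue path tests the RARP flag with a compare-and-set
 * on every call, even when no descriptors are available, and clears it
 * only when it was actually set. */
static inline int
check_pending_rarp(rte_atomic16_t *broadcast_rarp)
{
	return unlikely(rte_atomic16_cmpset(
			(volatile uint16_t *)&broadcast_rarp->cnt, 1, 0));
}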
  

Comments

Yuanhan Liu March 16, 2017, 6:21 a.m. UTC | #1
On Wed, Mar 15, 2017 at 07:10:49PM +0000, Kevin Traynor wrote:
> The virtio_net structure is used in both enqueue and dequeue datapaths.
> broadcast_rarp is checked with cmpset in the dequeue datapath regardless
> of whether descriptors are available or not.
> 
> It is observed that in some cases, where dequeue and enqueue are performed by
> different cores and no packets are available on the dequeue datapath
> (i.e. uni-directional traffic), the frequent checking of broadcast_rarp
> in dequeue causes performance degradation for the enqueue datapath.
> 
> In OVS the issue can cause a uni-directional performance drop of up to 15%.
> 
> Fix that by moving broadcast_rarp to a different cache line in the
> virtio_net struct.

Thanks, but I'm a bit confused. The drop looks like it is caused by
cache false sharing, but I don't see anything that would lead to false
sharing. I mean, there is no write in the cache line that
broadcast_rarp belongs to. Or is the "volatile" type the culprit here?

Talking about that, I had actually considered turning "broadcast_rarp"
into a simple "int" or "uint16_t" type, to make it more lightweight.
The reason I used an atomic type is to send exactly one broadcast RARP
packet once a SEND_RARP request is received. Otherwise, we may send more
than one RARP packet when MQ is involved. But I think we don't have
to be that accurate: it's tolerable if more RARPs are sent. I saw 4
SEND_RARP requests (aka 4 RARP packets) the last time I tried
vhost-user live migration, after all. I don't quite remember why
it was 4, though.

That said, I think it would also resolve the performance issue if you
changed "rte_atomic16_t" to "uint16_t", without moving the field?

	--yliu
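
As a rough illustration of the alternative floated above (an editor's sketch,
not the applied patch): with a plain uint16_t, the idle dequeue path only
reads the flag and clearing it is an ordinary store, at the cost that two
queues racing here could each emit a RARP, which the reply considers
tolerable. The vhost_dev_sketch type and the make_rarp_packet() helper below
are stand-ins, not real DPDK definitions:

#include <stdint.h>

struct rte_mbuf;
struct rte_mempool;
struct ether_addr;

/* hypothetical stand-in for the RARP construction done in virtio_net.c */
struct rte_mbuf *make_rarp_packet(struct rte_mempool *pool,
				  const struct ether_addr *mac);

struct vhost_dev_sketch {
	const struct ether_addr *mac;
	uint16_t broadcast_rarp;	/* plain field instead of rte_atomic16_t */
};

static struct rte_mbuf *
maybe_send_rarp(struct vhost_dev_sketch *dev, struct rte_mempool *pool)
{
	if (dev->broadcast_rarp == 0)
		return NULL;		/* common case: a read, no write */
	dev->broadcast_rarp = 0;	/* plain store, no locked cmpxchg */
	return make_rarp_packet(pool, dev->mac);
}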
  

Patch

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 22564f1..a254328 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -156,6 +156,4 @@  struct virtio_net {
 	uint32_t		flags;
 	uint16_t		vhost_hlen;
-	/* to tell if we need broadcast rarp packet */
-	rte_atomic16_t		broadcast_rarp;
 	uint32_t		virt_qp_nb;
 	int			dequeue_zero_copy;
@@ -167,4 +165,6 @@  struct virtio_net {
 	uint64_t		log_addr;
 	struct ether_addr	mac;
+	/* to tell if we need broadcast rarp packet */
+	rte_atomic16_t		broadcast_rarp;
 
 	uint32_t		nr_guest_pages;
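
A quick way to sanity-check the new placement (an editor's sketch, assuming
the post-patch internal vhost.h is on the include path together with the DPDK
headers it depends on, and that struct virtio_net stays cache-line aligned,
which is what makes the offset arithmetic meaningful):

#include <stddef.h>
#include <rte_memory.h>		/* RTE_CACHE_LINE_SIZE */
#include "vhost.h"		/* internal header defining struct virtio_net */

/* Fails the build if broadcast_rarp, written from the dequeue path, still
 * lands on the cache line holding vhost_hlen used on the enqueue path. */
_Static_assert(offsetof(struct virtio_net, broadcast_rarp) / RTE_CACHE_LINE_SIZE !=
	       offsetof(struct virtio_net, vhost_hlen) / RTE_CACHE_LINE_SIZE,
	       "broadcast_rarp still shares a cache line with vhost_hlen");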