Message ID | 20161010073721-mutt-send-email-mst@kernel.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers |
Return-Path: <dev-bounces@dpdk.org>
Date: Mon, 10 Oct 2016 07:39:59 +0300
From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Wang, Zhihong" <zhihong.wang@intel.com>
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>, Maxime Coquelin <maxime.coquelin@redhat.com>, Stephen Hemminger <stephen@networkplumber.org>, "dev@dpdk.org" <dev@dpdk.org>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Message-ID: <20161010073721-mutt-send-email-mst@kernel.org>
In-Reply-To: <8F6C2BD409508844A0EFC19955BE09414E7BC050@SHSMSX103.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
List-Id: patches and discussions about DPDK <dev.dpdk.org> |
Commit Message
Michael S. Tsirkin
Oct. 10, 2016, 4:39 a.m. UTC
On Mon, Oct 10, 2016 at 04:16:19AM +0000, Wang, Zhihong wrote:
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> > Sent: Monday, October 10, 2016 11:59 AM
> > To: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Stephen Hemminger
> > <stephen@networkplumber.org>; dev@dpdk.org; qemu-devel@nongnu.org;
> > Wang, Zhihong <zhihong.wang@intel.com>
> > Subject: Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
> >
> > On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > > > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > > > Yes but two points.
> > > > >
> > > > > 1. why is this memset expensive?
> > > >
> > > > I don't have the exact answer, but just some rough thoughts:
> > > >
> > > > It's an external C library function: there is a call stack and the
> > > > IP register will bounce back and forth.
> > >
> > > for memset 0? gcc 5.3.1 on fedora happily inlines it.
> >
> > Good to know!
> >
> > > > > overkill to use that for resetting a 14-byte structure.
> > > >
> > > > Some trick like
> > > >     *(struct virtio_net_hdr *)hdr = {0, };
> > > >
> > > > Or even
> > > >     hdr->xxx = 0;
> > > >     hdr->yyy = 0;
> > > >
> > > > should behave better.
> > > >
> > > > There was an example: the vhost enqueue optimization patchset from
> > > > Zhihong [0] uses memset, and it introduces more than a 15% drop (IIRC)
> > > > on my Ivy Bridge server; it has no such issue on his server though.
> > > >
> > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > >
> > > > --yliu
> > >
> > > I'd say that's weird. What's your config? Any chance you
> > > are using an old compiler?
> >
> > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > he said the memset is not well optimized for the Ivy Bridge server.
>
> The dst is remote in that case. It's fine on Haswell but has a complication
> on Ivy Bridge which (wasn't supposed to but) causes a serious frontend issue.
>
> I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.

So try something like this then:

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Generally, pointer chasing through vq->hw->vtnet_hdr_size can't be good
for performance. Move the fields used on the data path into vq and use
them from there, to avoid the indirection?
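The zeroing alternatives discussed in the thread can be written out concretely. Below is a minimal, compilable sketch; `struct net_hdr` is a hypothetical stand-in for the real `struct virtio_net_hdr` (defined in the virtio headers), and note that the quoted `*(struct virtio_net_hdr *)hdr = {0, };` needs C99 compound-literal syntax on the right-hand side to actually compile:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct virtio_net_hdr; 10 bytes, no padding. */
struct net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Variant 1: the memset under discussion. */
static inline void zero_memset(struct net_hdr *h)
{
	memset(h, 0, sizeof(*h));
}

/* Variant 2: compound-literal assignment; this is the valid C99
 * spelling of the "*(struct virtio_net_hdr *)hdr = {0, };" trick. */
static inline void zero_assign(struct net_hdr *h)
{
	*h = (struct net_hdr){ 0 };
}

/* Variant 3: explicit per-field stores. */
static inline void zero_fields(struct net_hdr *h)
{
	h->flags = 0;      h->gso_type = 0;
	h->hdr_len = 0;    h->gso_size = 0;
	h->csum_start = 0; h->csum_offset = 0;
}
```

All three produce identical results; whether the compiler emits a call or inline stores for variant 1 depends on the compiler version and target, which is the crux of the thread.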
Comments
On Mon, Oct 10, 2016 at 07:39:59AM +0300, Michael S. Tsirkin wrote:
> > > > > > 1. why is this memset expensive?
> > > > >
> > > > > I don't have the exact answer, but just some rough thoughts:
> > > > >
> > > > > It's an external C library function: there is a call stack and the
> > > > > IP register will bounce back and forth.
> > > >
> > > > for memset 0? gcc 5.3.1 on fedora happily inlines it.
> > >
> > > Good to know!
> > >
> > > > > > overkill to use that for resetting a 14-byte structure.
> > > > >
> > > > > Some trick like
> > > > >     *(struct virtio_net_hdr *)hdr = {0, };
> > > > >
> > > > > Or even
> > > > >     hdr->xxx = 0;
> > > > >     hdr->yyy = 0;
> > > > >
> > > > > should behave better.
> > > > >
> > > > > There was an example: the vhost enqueue optimization patchset from
> > > > > Zhihong [0] uses memset, and it introduces more than a 15% drop (IIRC)
> > > > > on my Ivy Bridge server; it has no such issue on his server though.
> > > > >
> > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > >
> > > > > --yliu
> > > >
> > > > I'd say that's weird. What's your config? Any chance you
> > > > are using an old compiler?
> > >
> > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > > he said the memset is not well optimized for the Ivy Bridge server.
> >
> > The dst is remote in that case. It's fine on Haswell but has a complication
> > on Ivy Bridge which (wasn't supposed to but) causes a serious frontend issue.
> >
> > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
>
> So try something like this then:

Yes, I saw memset is inlined when this diff is applied. So, mind sending
a formal patch? You might want to try building it at least: it doesn't build.

> Generally pointer chasing in vq->hw->vtnet_hdr_size can't be good
> for performance. Move fields used on data path into vq
> and use from there to avoid indirections?

Good suggestion!

	--yliu
On Tue, Oct 11, 2016 at 02:57:49PM +0800, Yuanhan Liu wrote:
> > > > > > There was an example: the vhost enqueue optimization patchset from
> > > > > > Zhihong [0] uses memset, and it introduces more than a 15% drop (IIRC)

Though it doesn't matter now, I verified it yesterday (with and
without memset); the drop could be up to 30+%.

This is to let you know that it could behave badly if memset is not inlined.

> > > > > > on my Ivy Bridge server; it has no such issue on his server though.
> > > > > >
> > > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > > >
> > > > > > --yliu
> > > > >
> > > > > I'd say that's weird. What's your config? Any chance you
> > > > > are using an old compiler?
> > > >
> > > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > > > he said the memset is not well optimized for the Ivy Bridge server.
> > >
> > > The dst is remote in that case. It's fine on Haswell but has a complication
> > > on Ivy Bridge which (wasn't supposed to but) causes a serious frontend issue.
> > >
> > > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> >
> > So try something like this then:
>
> Yes, I saw memset is inlined when this diff is applied.

I have another concern though: it's a trick that could let gcc do the
inlining, but I am not quite sure whether that's true with other
compilers (i.e. clang, icc, or even older gcc).

For this case, I think I still prefer some trick like

    *(struct ..*) = {0, }

Or even, we could introduce rte_memset(). IIRC, that has been
proposed somehow before?

	--yliu
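An rte_memset() along the lines proposed here might look like the following. This is purely an illustrative sketch (rte_memset_sketch is a made-up name, not an actual DPDK API): the byte loop is trivially inlinable, so for small compile-time-constant sizes the compiler can unroll it into direct stores regardless of how it treats libc memset, while larger fills still defer to memset:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a small-size-friendly memset wrapper. */
static inline void *
rte_memset_sketch(void *dst, int c, size_t n)
{
	uint8_t *d = dst;

	/* Small fills: a plain byte loop the compiler can unroll
	 * into direct stores when n is a compile-time constant. */
	if (n <= 16) {
		while (n--)
			*d++ = (uint8_t)c;
		return dst;
	}
	/* Large fills: fall back to libc memset. */
	return memset(dst, c, n);
}
```

Whether this actually behaves better than memset on a given compiler and microarchitecture (e.g. the Ivy Bridge case above) would of course need benchmarking.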
Hi, Yuanhan:

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yuanhan Liu
> Sent: Wednesday, October 12, 2016 11:22 AM
> To: Michael S. Tsirkin <mst@redhat.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: Wang, Zhihong <zhihong.wang@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Stephen Hemminger
> <stephen@networkplumber.org>; dev@dpdk.org; qemu-devel@nongnu.org
> Subject: Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout
> feature
>
> On Tue, Oct 11, 2016 at 02:57:49PM +0800, Yuanhan Liu wrote:
> > > > > > > There was an example: the vhost enqueue optimization patchset
> > > > > > > from Zhihong [0] uses memset, and it introduces more than
> > > > > > > a 15% drop (IIRC)
>
> Though it doesn't matter now, I verified it yesterday (with and
> without memset); the drop could be up to 30+%.
>
> This is to let you know that it could behave badly if memset is not inlined.
>
> > > > > > > on my Ivy Bridge server; it has no such issue on his server though.
> > > > > > >
> > > > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > > > >
> > > > > > > --yliu
> > > > > >
> > > > > > I'd say that's weird. What's your config? Any chance you are
> > > > > > using an old compiler?
> > > > >
> > > > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more.
> > > > > IIRC, he said the memset is not well optimized for the Ivy Bridge server.
> > > >
> > > > The dst is remote in that case. It's fine on Haswell but has a
> > > > complication on Ivy Bridge which (wasn't supposed to but) causes a
> > > > serious frontend issue.
> > > >
> > > > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> > >
> > > So try something like this then:
> >
> > Yes, I saw memset is inlined when this diff is applied.
>
> I have another concern though: it's a trick that could let gcc do the
> inlining, but I am not quite sure whether that's true with other
> compilers (i.e. clang, icc, or even older gcc).
>
> For this case, I think I still prefer some trick like
>
>     *(struct ..*) = {0, }
>
> Or even, we could introduce rte_memset(). IIRC, that has been
> proposed somehow before?

I'm trying to introduce rte_memset and have a prototype. It has gotten
some performance enhancement for small sizes; I'm optimizing it further.

	--Zhiyong

> --yliu
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index dd7693f..7a3f88e 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -292,6 +292,16 @@ vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
 	return (hw->guest_features & (1ULL << bit)) != 0;
 }
 
+static inline int
+vtnet_hdr_size(struct virtio_hw *hw)
+{
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
+	    vtpci_with_feature(hw, VIRTIO_F_VERSION_1))
+		return sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		return sizeof(struct virtio_net_hdr);
+}
+
 /*
  * Function declaration from virtio_pci.c
  */
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index a27208e..21a45e1 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -216,7 +216,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	struct vring_desc *start_dp;
 	uint16_t seg_num = cookie->nb_segs;
 	uint16_t head_idx, idx;
-	uint16_t head_size = vq->hw->vtnet_hdr_size;
+	uint16_t head_size = vtnet_hdr_size(vq->hw);
 	unsigned long offs;
 
 	head_idx = vq->vq_desc_head_idx;
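The other suggestion in the thread, caching data-path fields in the virtqueue at setup time instead of chasing vq->hw->vtnet_hdr_size on every enqueue, could be sketched as follows. The structure and function names here are illustrative stand-ins, not the actual DPDK definitions:

```c
#include <stdint.h>

/* Illustrative stand-ins for virtio_hw / virtqueue. */
struct virtio_hw_s {
	uint16_t vtnet_hdr_size;
	/* ... other device-global state ... */
};

struct virtqueue_s {
	struct virtio_hw_s *hw;
	uint16_t hdr_size;	/* cached copy of hw->vtnet_hdr_size */
};

/* Setup path: snapshot the field once, off the data path. */
static void
vq_setup(struct virtqueue_s *vq, struct virtio_hw_s *hw)
{
	vq->hw = hw;
	vq->hdr_size = hw->vtnet_hdr_size;
}

/* Data path: a single load from vq, instead of the double
 * indirection vq->hw->vtnet_hdr_size. */
static uint16_t
enqueue_head_size(const struct virtqueue_s *vq)
{
	return vq->hdr_size;
}
```

The trade-off is one extra field per queue against one fewer dependent load per packet; since the header size is fixed after feature negotiation, the cached copy cannot go stale.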