Message ID | CAATJJ0+i0+Qyv4sUwqPJAgPF3QdAS-RP3jZr+_OvGZLjoMEPPA@mail.gmail.com (mailing list archive)
---|---
State | Superseded, archived
Delegated to: | Yuanhan Liu
Headers |
From: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Date: Tue, 19 Apr 2016 18:33:50 +0200
Message-ID: <CAATJJ0+i0+Qyv4sUwqPJAgPF3QdAS-RP3jZr+_OvGZLjoMEPPA@mail.gmail.com>
In-Reply-To: <20160418181422.GE2576@yliu-dev.sh.intel.com>
References: <CAATJJ0JLd9iqNBvD=kyfpzk5DjGF_pGz+hBNeionZpuctNBymQ@mail.gmail.com> <20160418174650.GD2576@yliu-dev.sh.intel.com> <20160418181422.GE2576@yliu-dev.sh.intel.com>
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: dev <dev@dpdk.org>, Daniele Di Proietto <diproiettod@vmware.com>
Subject: Re: [dpdk-dev] Memory leak when adding/removing vhost_user ports
List-Id: patches and discussions about DPDK <dev.dpdk.org>
Commit Message
Christian Ehrhardt
April 19, 2016, 4:33 p.m. UTC
Hi,
thanks for the patch. I backported it this way to my DPDK 2.2 based
environment for now (see below):

With that applied, one (and only one) of my two guests loses connectivity
after removing the ports the first time. No traffic seems to pass, and
setting the device in the guest down/up doesn't get anything. But it isn't
totally broken - stopping/starting the guest gets it working again. So
openvswitch/dpdk is still somewhat working - it just seems the guest lost
something; after tapping on that vhost_user interface again it works.

I will check tomorrow and let you know:
- if I'm more lucky with that patch on top of 16.04
- if it loses connectivity after the first or a certain amount of port removes

If you find issues with my backport adaptation let me know.

---

Backport and reasoning:

The new fix relies on a lot of new code; vhost_destroy_device looks totally
different from the former destroy_device.

History of today's function content:
4796ad63 - original code moved from examples to lib
a90ca1a1 - this replaces ops->destroy_device with vhost_destroy_device
71dc571e - simple check against null pointers
45ca9c6f - this changed the code from linked list to arrays

New code cleans up with:
  notify_ops->destroy_device (callback into the parent)
  cleanup_device - existed before, even in 2.2 code
  free_device - existed before as well, even in 2.2 code
Old code cleans up with:
  notify_ops->destroy_device - still there
  rm_config_ll_entry -> eventually calls cleanup_device and free_device
  (just in the more complex linked-list way)

So the "only" adaptation needed for backporting is to replace
vhost_destroy_device with ops->destroy_device(ctx).

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Mon, Apr 18, 2016 at 8:14 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> On Mon, Apr 18, 2016 at 10:46:50AM -0700, Yuanhan Liu wrote:
> > On Mon, Apr 18, 2016 at 07:18:05PM +0200, Christian Ehrhardt wrote:
> > > I assume there is a leak somewhere on adding/removing vhost_user ports.
> > > Although it could also be "only" a fragmentation issue.
> > >
> > > Reproduction is easy:
> > > I set up a pair of nicely working OVS-DPDK connected KVM guests.
> > > Then in a loop I
> > > - add up to 512 ports
> > > - test connectivity between the two guests
> > > - remove up to 512 ports
> > >
> > > Depending on memory and the amount of multiqueue/rxq I use, it seems to
> > > slightly change when exactly it breaks. But for my default setup of 4
> > > queues and 5G hugepages initialized by DPDK, it always breaks at the
> > > sixth iteration.
> > > Here a link to the stack trace indicating a memory shortage (TBC):
> > > https://launchpadlibrarian.net/253916410/apport-retrace.log
> > >
> > > Known todos:
> > > - I want to track it down more, and will try to come up with a
> > > non-openvswitch based looping testcase that might show it as well,
> > > to simplify debugging.
> > > - In use were Openvswitch-dpdk 2.5 and DPDK 2.2; a retest with DPDK
> > > 16.04 and Openvswitch master is planned.
> > >
> > > I will go on debugging this and let you know, but I wanted to give a
> > > heads up to everyone.
> >
> > Thanks for the report.
> >
> > > In case this is a known issue for some of you please let me know.
> >
> > Yeah, it might be. I'm wondering whether the virtio_net struct is not
> > freed. It will be freed only (if I'm not mistaken) when the guest
> > quits, by far.
>
> Would you try the following diff and see if it fixes your issue?
>
> 	--yliu
>
> ---
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 6 ++++++
>  lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
>  2 files changed, 7 insertions(+)
>
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
> index df2bd64..8f7ebd7 100644
> --- a/lib/librte_vhost/vhost_user/vhost-net-user.c
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -309,6 +309,7 @@ vserver_new_vq_conn(int fd, void *dat, __rte_unused int *remove)
>  	}
>
>  	vdev_ctx.fh = fh;
> +	vserver->fh = fh;
>  	size = strnlen(vserver->path, PATH_MAX);
>  	vhost_set_ifname(vdev_ctx, vserver->path,
>  		size);
> @@ -501,6 +502,11 @@ rte_vhost_driver_unregister(const char *path)
>
>  	for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
>  		if (!strcmp(g_vhost_server.server[i]->path, path)) {
> +			struct vhost_device_ctx ctx;
> +
> +			ctx.fh = g_vhost_server.server[i]->fh;
> +			vhost_destroy_device(ctx);
> +
>  			fdset_del(&g_vhost_server.fdset,
>  				g_vhost_server.server[i]->listenfd);
>
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
> index e3bb413..7cf21db 100644
> --- a/lib/librte_vhost/vhost_user/vhost-net-user.h
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
> @@ -43,6 +43,7 @@
>  struct vhost_server {
>  	char *path; /**< The path the uds is bind to. */
>  	int listenfd; /**< The listener sockfd. */
> +	uint32_t fh;
>  };
>
>  /* refer to hw/virtio/vhost-user.c */
> --
> 1.9.3
Comments
On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> Hi,
> thanks for the patch.
> I backported it this way to my DPDK 2.2 based environment for now (see below):
>
> With that applied one (and only one) of my two guests loses connectivity after
> removing the ports the first time.

Yeah, that should be because I invoked the "->destroy_device()" callback.

BTW, I'm curious how you do the test? I saw you added 256 ports, but
with 2 guests only? So, 254 of them are idle, just for testing the
memory leak bug?

And then you remove all of them, without stopping the guest. How is
that going to work? I mean, the vhost-user connection would be broken,
and data flow would not work.

	--yliu

> [...]
On Wed, Apr 20, 2016 at 7:04 AM, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> [...]
> > With that applied one (and only one) of my two guests loses connectivity
> > after removing the ports the first time.
>
> Yeah, that should be because I invoked the "->destroy_device()" callback.

Shouldn't that not only destroy the particular vhost_user device I remove?
See below for some better details on the test to clarify that.

> BTW, I'm curious how you do the test? I saw you added 256 ports, but
> with 2 guests only? So, 254 of them are idle, just for testing the
> memory leak bug?

Maybe I should describe it better:
1. Spawn some vhost-user ports (40 in my case)
2. Spawn a pair of guests that connect via four of those ports per guest
3. Guests only initialize one of those vhost_user based NICs
4. Check connectivity between guests via the vhost_user based connection
   (working at this stage)
LOOP 5-7:
5. Add ports 41-512
6. Remove ports 41-512
7. Check connectivity between guests via the vhost_user based connection

So the vhost_user ports the guests are using are never deleted.
Only some extra (not even used) ports are added and removed in the loop to
search for potential leaks over a longer lifetime of an openvswitch-dpdk
based solution.
On Wed, Apr 20, 2016 at 08:18:49AM +0200, Christian Ehrhardt wrote:
> On Wed, Apr 20, 2016 at 7:04 AM, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> > On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> > [...]
> > > With that applied one (and only one) of my two guests loses connectivity
> > > after removing the ports the first time.
> >
> > Yeah, that should be because I invoked the "->destroy_device()" callback.
>
> Shouldn't that not only destroy the particular vhost_user device I remove?

I assume the "not" is typed wrong here; then yes. Well, it turned out
that I accidentally destroyed the first guest (with id 0) with the
following code:

	ctx.fh = g_vhost_server.server[i]->fh;
	vhost_destroy_device(ctx);

server[i]->fh is initialized with 0 when no connection is established
(check below for more info), and the first id starts with 0. Anyway,
this could be fixed easily.

> See below for some better details on the test to clarify that.
>
> > BTW, I'm curious how you do the test? I saw you added 256 ports, but
> > with 2 guests only? So, 254 of them are idle, just for testing the
> > memory leak bug?
>
> Maybe I should describe it better:
> 1. Spawn some vhost-user ports (40 in my case)
> 2. Spawn a pair of guests that connect via four of those ports per guest
> 3. Guests only initialize one of those vhost_user based NICs
> 4. Check connectivity between guests via the vhost_user based connection
>    (working at this stage)
> LOOP 5-7:
> 5. Add ports 41-512
> 6. Remove ports 41-512
> 7. Check connectivity between guests via the vhost_user based connection

Yes, it's much clearer now. Thanks.

I then don't see it as a leak from DPDK vhost-user, at least not the leak
on "struct virtio_net" I mentioned before. "struct virtio_net" will not
even be allocated for those ports never used (ports 41-512 in your case),
as it will be allocated only when there is a connection established, aka,
a guest is connected.

BTW, will you be able to reproduce it without any connections? Say, all
512 ports are added, and then deleted.

Thanks.

	--yliu

> So the vhost_user ports the guests are using are never deleted.
> Only some extra (not even used) ports are added and removed in the loop
> to search for potential leaks over a longer lifetime of an
> openvswitch-dpdk based solution.
Hi,
I can follow your argument and agree that in this case the leak can't be
solved by your patch. Still I found it useful to revise it along our
discussion, as eventually it will still be a good patch to have.

I followed your suggestion and found:
- rte_vhost_driver_register callocs vserver (implies fh=0)
- later on init, when getting the callback to vserver_new_vq_conn, it would
  get set by ops->new_device(vdev_ctx);
- but as you pointed out, that could be fh = 0 for the first device
- so I initialized vserver->fh with -1 in rte_vhost_driver_register - that
  won't ever be a real fh
- later on, get_config_ll_entry won't find a device with that fh on the
  call by destroy_device
- so the revised patch currently in use (still for DPDK 2.2) can be found
  here: http://paste.ubuntu.com/15961394/

Also, as you requested, I tried with no guest attached at all - that way I
can still reproduce it. Here is a new stack trace, but to me it looks the
same: http://paste.ubuntu.com/15961185/

Also, as you asked before, a log of the vswitch - but it is 895MB since a
lot of messages repeat on port add/remove. Even compressed it is still
27MB - I need to do something about verbosity there. Also the system
journal of the same time. Therefore I only added links to bz2 files. The
crash is at "2016-04-21T07:54:47.782Z" in the logs.
=> http://people.canonical.com/~paelzer/ovs-dpdk-vhost-add-remove-leak/mem-leak-addremove.journal.bz2
=> http://people.canonical.com/~paelzer/ovs-dpdk-vhost-add-remove-leak/ovs-vswitchd.log.bz2

Kind Regards,
Christian Ehrhardt

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Thu, Apr 21, 2016 at 7:54 AM, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> [...]
Hi,
while checking for DPDK 16.07 which backports were accepted in the meantime
(so I can drop them), I found that this particular discussion has been
silently forgotten by all of us.

Back then we had the patch and discussion first in
http://dpdk.org/dev/patchwork/patch/12103/
and then
http://dpdk.org/dev/patchwork/patch/12118/

Things worked fine as I reported, and I integrated the patch in our
packaging as it fixed a severe issue. Since it was reported by someone else
as well, I do not seem to be the only one :-)

So today I rebased the patch, including the updates I made based on our
discussion, and I think it would make as much sense as it made back then to
fix this.

Christian Ehrhardt (1):
  vhost_user: avoid crash when exceeding file descriptors

 lib/librte_vhost/vhost_user/fd_man.c         | 11 ++++++-----
 lib/librte_vhost/vhost_user/vhost-net-user.c | 19 +++++++++++++++++--
 2 files changed, 23 insertions(+), 7 deletions(-)
Sorry,
please ignore the two links the cover letter has - they belong to a
different issue I have to bring up again.
Everything else still applies.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Wed, Jul 6, 2016 at 2:24 PM, Christian Ehrhardt <christian.ehrhardt@canonical.com> wrote:
> [...]
This is the series this really belongs to:
http://dpdk.org/dev/patchwork/patch/11581/
Message ID <1458292380-9258-1-git-send-email-patrik.r.andersson@ericsson.com>

Should I wait for a v2 to point the patch at the right ID, or do you prefer
a fixed resubmit right away?

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Wed, Jul 6, 2016 at 2:26 PM, Christian Ehrhardt <christian.ehrhardt@canonical.com> wrote:
> [...]
Now let us really get to this old discussion.
Once again - sorry for mixing this up :-/ Found too much at once today.

## Ok - refocus ##

Back then in
http://dpdk.org/dev/patchwork/patch/12103/
http://dpdk.org/dev/patchwork/patch/12118/
we came up with a nice solution for the leak. But it was never picked up
upstream.

There were a lot of changes to all of that code, especially vhost
client/server. Lacking an openvswitch for DPDK 16.07, I can't test if the
issue still shows up, but looking at the code suggests it still does.
Unfortunately the internal structures and scopes are slightly different, so
that I couldn't come up with a working forward port of the old patch.
Attached is a patch as close as I got (obviously not working).

I'm convinced that Yuanhan Liu and Xie Huawei, with their deep context
knowledge of DPDK's vhost_user, will quickly know:
- if the root cause still applies
- what the best new way of fixing this would be

As this makes long-term usage of DPDK abort due to a leak, I hope we have a
chance to get this into 16.07 still.

Kind Regards,
Christian

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Wed, Jul 6, 2016 at 2:30 PM, Christian Ehrhardt <christian.ehrhardt@canonical.com> wrote:
> [...]
On Wed, Jul 06, 2016 at 02:24:57PM +0200, Christian Ehrhardt wrote:
> Hi,
> while checking for DPDK 16.07 which backports were accepted in the meantime
> (so I can drop them), I found that this particular discussion has been
> silently forgotten by all of us.

Not really. As later I found that my patch was actually wrong (besides the
issue you already found). I will revisit this issue thoroughly when I get
time, hopefully next week.

	--yliu
On Wed, Jul 06, 2016 at 09:08:12PM +0800, Yuanhan Liu wrote:
> On Wed, Jul 06, 2016 at 02:24:57PM +0200, Christian Ehrhardt wrote:
> [...]
>
> Not really. As later I found that my patch was actually wrong (besides the
> issue you already found). I will revisit this issue thoroughly when I get
> time, hopefully next week.

Here it is: vhost_destroy() will be invoked when:

- QEMU quits gracefully
- QEMU terminates unexpectedly

Meaning, there should be no memory leak. I think we are fine here (I may be
wrong though, feeling a bit dizzy now :(

	--yliu
Hi,
thanks for evaluating. I'll need a few days until I'm ready for OVS again
and until OVS is ready for DPDK 16.07. Then I can run an adapted version of
my old testcase to be sure.

In the worst case it will be in 16.11 and I'll backport to 16.07 as
distributed. I'll let you know then.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Tue, Jul 12, 2016 at 2:08 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> On Wed, Jul 06, 2016 at 09:08:12PM +0800, Yuanhan Liu wrote:
> [...]
>
> Here it is: vhost_destroy() will be invoked when:
>
> - QEMU quits gracefully
> - QEMU terminates unexpectedly
>
> Meaning, there should be no memory leak. I think we are fine here (I may
> be wrong though, feeling a bit dizzy now :(
>
> --yliu
Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
===================================================================
--- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -310,6 +310,7 @@ vserver_new_vq_conn(int fd, void *dat, _
 	}
 
 	vdev_ctx.fh = fh;
+	vserver->fh = fh;
 	size = strnlen(vserver->path, PATH_MAX);
 	ops->set_ifname(vdev_ctx, vserver->path,
 		size);
@@ -516,6 +517,11 @@ rte_vhost_driver_unregister(const char *
 
 	for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
 		if (!strcmp(g_vhost_server.server[i]->path, path)) {
+			struct vhost_device_ctx ctx;
+
+			ctx.fh = g_vhost_server.server[i]->fh;
+			ops->destroy_device(ctx);
+
 			fdset_del(&g_vhost_server.fdset,
 				g_vhost_server.server[i]->listenfd);
 
Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
===================================================================
--- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -43,6 +43,7 @@
 struct vhost_server {
 	char *path; /**< The path the uds is bind to. */
 	int listenfd; /**< The listener sockfd. */
+	uint32_t fh;
 };
 
 /* refer to hw/virtio/vhost-user.c */