[dpdk-dev] vhost: fix segfault as handle set_mem_table message

Message ID 1510746068-143223-1-git-send-email-jianfeng.tan@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Yuanhan Liu
Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Jianfeng Tan Nov. 15, 2017, 11:41 a.m. UTC
  In a running VM, operations (like device attach/detach) will
trigger QEMU to resend the set_mem_table message to the vhost-user
backend.

DPDK vhost-user handles this message crudely by unmapping all existing
regions and mapping new ones. This can lead to a segfault if a PMD
thread happens to be touching the memory regions that were just
unmapped.

In most cases (VM memory hotplug being the exception), QEMU still sends
the set_mem_table message even though the memory regions have not
changed, because QEMU vhost-user filters out regions not backed by a
file (fd > 0).

To fix this, add a check in the handler to see whether the memory
regions have really changed; if not, keep the old memory regions.

Fixes: 8f972312b8f4 ("vhost: support vhost-user")

CC: stable@dpdk.org

CC: Yuanhan Liu <yliu@fridaylinux.org>
CC: Maxime Coquelin <maxime.coquelin@redhat.com>

Reported-by: Yang Zhang <zy107165@alibaba-inc.com>
Reported-by: Xin Long <longxin.xl@alibaba-inc.com>
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 lib/librte_vhost/vhost_user.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
  

Comments

Maxime Coquelin Nov. 28, 2017, 12:09 p.m. UTC | #1
On 11/15/2017 12:41 PM, Jianfeng Tan wrote:
> In a running VM, operations (like device attach/detach) will
> trigger the QEMU to resend set_mem_table to vhost-user backend.
> 
> DPDK vhost-user handles this message rudely by unmap all existing
> regions and map new ones. This might lead to segfault if there
> is pmd thread just trying to touch those unmapped memory regions.
> 
> But for most cases, except VM memory hotplug, QEMU still sends the
> set_mem_table message even the memory regions are not changed as
> QEMU vhost-user filters out those not backed by file (fd > 0).
> 
> To fix this case, we add a check in the handler to see if the
> memory regions are really changed; if not, we just keep old memory
> regions.
> 
> Fixes: 8f972312b8f4 ("vhost: support vhost-user")
> 
> CC: stable@dpdk.org
> 
> CC: Yuanhan Liu <yliu@fridaylinux.org>
> CC: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Reported-by: Yang Zhang <zy107165@alibaba-inc.com>
> Reported-by: Xin Long <longxin.xl@alibaba-inc.com>
> Signed-off-by: Yi Yang <yi.y.yang@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>   lib/librte_vhost/vhost_user.c | 33 +++++++++++++++++++++++++++++++++
>   1 file changed, 33 insertions(+)

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime
  
Yuanhan Liu Dec. 5, 2017, 2:19 p.m. UTC | #2
On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote:
> 
> 
> On 11/15/2017 12:41 PM, Jianfeng Tan wrote:
> >In a running VM, operations (like device attach/detach) will
> >trigger the QEMU to resend set_mem_table to vhost-user backend.
> >
> >DPDK vhost-user handles this message rudely by unmap all existing
> >regions and map new ones. This might lead to segfault if there
> >is pmd thread just trying to touch those unmapped memory regions.
> >
> >But for most cases, except VM memory hotplug, QEMU still sends the
> >set_mem_table message even the memory regions are not changed as
> >QEMU vhost-user filters out those not backed by file (fd > 0).
> >
> >To fix this case, we add a check in the handler to see if the
> >memory regions are really changed; if not, we just keep old memory
> >regions.
> >
> >Fixes: 8f972312b8f4 ("vhost: support vhost-user")
> >
> >CC: stable@dpdk.org
> >
> >CC: Yuanhan Liu <yliu@fridaylinux.org>
> >CC: Maxime Coquelin <maxime.coquelin@redhat.com>
> >
> >Reported-by: Yang Zhang <zy107165@alibaba-inc.com>
> >Reported-by: Xin Long <longxin.xl@alibaba-inc.com>
> >Signed-off-by: Yi Yang <yi.y.yang@intel.com>
> >Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> >---
> >  lib/librte_vhost/vhost_user.c | 33 +++++++++++++++++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> 
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Applied to dpdk-next-virtio.

Thanks.

	--yliu
  
Maxime Coquelin Dec. 5, 2017, 2:28 p.m. UTC | #3
On 12/05/2017 03:19 PM, Yuanhan Liu wrote:
> On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote:
>>
>>
>> On 11/15/2017 12:41 PM, Jianfeng Tan wrote:
>>> In a running VM, operations (like device attach/detach) will
>>> trigger the QEMU to resend set_mem_table to vhost-user backend.
>>>
>>> DPDK vhost-user handles this message rudely by unmap all existing
>>> regions and map new ones. This might lead to segfault if there
>>> is pmd thread just trying to touch those unmapped memory regions.
>>>
>>> But for most cases, except VM memory hotplug, 

FYI, Victor is working on implementing a lock-less protection mechanism
to prevent crashes in such cases. It is intended first to protect
log_base in the case of multiqueue + live-migration, but it would solve
this issue too.

>>>> QEMU still sends the
>>> set_mem_table message even the memory regions are not changed as
>>> QEMU vhost-user filters out those not backed by file (fd > 0).
>>>
>>> To fix this case, we add a check in the handler to see if the
>>> memory regions are really changed; if not, we just keep old memory
>>> regions.
>>>
>>> Fixes: 8f972312b8f4 ("vhost: support vhost-user")
>>>
>>> CC: stable@dpdk.org
>>>
>>> CC: Yuanhan Liu <yliu@fridaylinux.org>
>>> CC: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>
>>> Reported-by: Yang Zhang <zy107165@alibaba-inc.com>
>>> Reported-by: Xin Long <longxin.xl@alibaba-inc.com>
>>> Signed-off-by: Yi Yang <yi.y.yang@intel.com>
>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>> ---
>>>   lib/librte_vhost/vhost_user.c | 33 +++++++++++++++++++++++++++++++++
>>>   1 file changed, 33 insertions(+)
>>
>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> 
> Applied to dpdk-next-virtio.
> 
> Thanks.
> 
> 	--yliu
> 

Maxime
  
Jianfeng Tan March 29, 2018, 7:01 a.m. UTC | #4
Hi Maxime and Victor,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Tuesday, December 5, 2017 10:28 PM
> To: Yuanhan Liu; Tan, Jianfeng; Victor Kaplansky
> Cc: dev@dpdk.org; stable@dpdk.org; Yang, Yi Y
> Subject: Re: [PATCH] vhost: fix segfault as handle set_mem_table message
> 
> 
> On 12/05/2017 03:19 PM, Yuanhan Liu wrote:
> > On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote:
> >>
> >> On 11/15/2017 12:41 PM, Jianfeng Tan wrote:
> >>> In a running VM, operations (like device attach/detach) will
> >>> trigger the QEMU to resend set_mem_table to vhost-user backend.
> >>>
> >>> DPDK vhost-user handles this message rudely by unmap all existing
> >>> regions and map new ones. This might lead to segfault if there
> >>> is pmd thread just trying to touch those unmapped memory regions.
> >>>
> >>> But for most cases, except VM memory hotplug,
> 
> FYI, Victor is working on implementing a lock-less protection mechanism
> to prevent crashes in such cases. It is intended first to protect
> log_base in case of multiqueue + live-migration, but would solve thi
> issue too.

Bringing this issue back for discussion.

As reported by the SPDK guys, even with the per-queue lock, they can still run into a crash on memory hotplug or unplug.
In detail, you add the lock in an internal struct, vhost_queue, which is unfortunately not visible to external datapaths, like vhost-scsi in SPDK.

Memory hotplug and unplug is the main issue from the SPDK side so far. For this specific issue, I think we can continue with this patch to filter out the changed regions and keep the unchanged regions from being remapped.

But I know the per-vq lock is not only meant to resolve the memory table issue; some other vhost-user messages could also lead to problems? If so, shall we take a step back and think about how to solve this kind of issue for backends like vhost-scsi?

Thoughts?

Thanks,
Jianfeng

> 
> >>>> QEMU still sends the
> >>> set_mem_table message even the memory regions are not changed as
> >>> QEMU vhost-user filters out those not backed by file (fd > 0).
> >>>
> >>> To fix this case, we add a check in the handler to see if the
> >>> memory regions are really changed; if not, we just keep old memory
> >>> regions.
> >>>
> >>> Fixes: 8f972312b8f4 ("vhost: support vhost-user")
> >>>
> >>> CC: stable@dpdk.org
> >>>
> >>> CC: Yuanhan Liu <yliu@fridaylinux.org>
> >>> CC: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>
> >>> Reported-by: Yang Zhang <zy107165@alibaba-inc.com>
> >>> Reported-by: Xin Long <longxin.xl@alibaba-inc.com>
> >>> Signed-off-by: Yi Yang <yi.y.yang@intel.com>
> >>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> >>> ---
> >>>   lib/librte_vhost/vhost_user.c | 33 +++++++++++++++++++++++++++++++++
> >>>   1 file changed, 33 insertions(+)
> >>
> >> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >
> > Applied to dpdk-next-virtio.
> >
> > Thanks.
> >
> > 	--yliu
> >
> 
> Maxime
  
Maxime Coquelin March 29, 2018, 7:35 a.m. UTC | #5
On 03/29/2018 09:01 AM, Tan, Jianfeng wrote:
> Hi Maxime and Victor,
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Tuesday, December 5, 2017 10:28 PM
>> To: Yuanhan Liu; Tan, Jianfeng; Victor Kaplansky
>> Cc: dev@dpdk.org; stable@dpdk.org; Yang, Yi Y
>> Subject: Re: [PATCH] vhost: fix segfault as handle set_mem_table message
>>
>>
>>
>> On 12/05/2017 03:19 PM, Yuanhan Liu wrote:
>>> On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote:
>>>>
>>>>
>>>> On 11/15/2017 12:41 PM, Jianfeng Tan wrote:
>>>>> In a running VM, operations (like device attach/detach) will
>>>>> trigger the QEMU to resend set_mem_table to vhost-user backend.
>>>>>
>>>>> DPDK vhost-user handles this message rudely by unmap all existing
>>>>> regions and map new ones. This might lead to segfault if there
>>>>> is pmd thread just trying to touch those unmapped memory regions.
>>>>>
>>>>> But for most cases, except VM memory hotplug,
>>
>> FYI, Victor is working on implementing a lock-less protection mechanism
>> to prevent crashes in such cases. It is intended first to protect
>> log_base in case of multiqueue + live-migration, but would solve thi
>> issue too.
> 
> Bring this issue back for discussion.
> 
> Reported by SPDK guys, even with per-queue lock, they could still run into crash as of memory hot plug or unplug.
> In detail, you add the lock in an internal struct, vhost_queue which is, unfortunately, not visible to the external datapath, like vhost-scsi in SPDK.

Yes, I agree the current solution is not enough.

> The memory hot plug and unplug would be the main issue from SPDK side so far. For this specific issue, I think we can continue this patch to filter out the changed regions, and keep unchanged regions not remapped.
> 
> But I know that the per-vq lock is not only to resolve the memory table issue, some other vhost-user messages could also lead to problems? If yes, shall we take a step back, to think about how to solve this kind of issues for backends, like vhost-scsi.

Right, any message that can change the device or virtqueue states can be
problematic.

> Thoughts?

In another thread, SPDK people proposed to destroy and re-create the
device for every message. I think this is not acceptable.

I proposed an alternative that I think would work:
- External backends & applications implement the .vring_state_changed()
   callback. On disable, they stop processing the rings and only return
   once all descriptor buffers are processed. On enable, they resume
   ring processing.
- In the vhost lib, we implement vhost_vring_pause and vhost_vring_resume
   functions. In the pause function, we save the ring state (enabled or
   disabled) in a variable, and if the ring is enabled at device level we
   call .vring_state_changed() with the disabled state. In resume, we check
   the saved vring state and call .vring_state_changed() with the enabled
   state if the ring was enabled before the pause. (A rough sketch of these
   helpers follows below.)
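
To make the proposal concrete, here is a rough sketch of what the two
helpers could look like inside lib/librte_vhost. It is only an
illustration of the idea above, not an existing API:
vhost_vring_pause()/vhost_vring_resume() and the enabled_before_pause
field are made-up names; the only piece taken from the real code is the
.vring_state_changed() callback in the application's notify ops.

/*
 * Sketch only: pause/resume a single virtqueue around a state-changing
 * vhost-user message. enabled_before_pause is a hypothetical field that
 * would need to be added to struct vhost_virtqueue.
 */
static void
vhost_vring_pause(struct virtio_net *dev, uint16_t qid)
{
	struct vhost_virtqueue *vq = dev->virtqueue[qid];

	/* Remember the ring state so resume knows what to restore. */
	vq->enabled_before_pause = vq->enabled;

	/*
	 * Ask the backend to stop processing this ring; the callback is
	 * expected to return only once all in-flight descriptor buffers
	 * have been processed.
	 */
	if (vq->enabled && dev->notify_ops->vring_state_changed)
		dev->notify_ops->vring_state_changed(dev->vid, qid, 0);
}

static void
vhost_vring_resume(struct virtio_net *dev, uint16_t qid)
{
	struct vhost_virtqueue *vq = dev->virtqueue[qid];

	/* Re-enable processing only if the ring was enabled before pause. */
	if (vq->enabled_before_pause && dev->notify_ops->vring_state_changed)
		dev->notify_ops->vring_state_changed(dev->vid, qid, 1);
}

A handler like vhost_user_set_mem_table() would then pause every ring
before unmapping the regions and resume them once the new table is
mapped.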

So I think that would work, but I haven't had a clear reply from SPDK
people proving it wouldn't.

They asked us to revert Victor's patch, but I don't see the need, as it
does not hurt SPDK (though I agree it doesn't protect anything for them),
while it really fixes real issues with the internal net backend.

What do you think of my proposal? Do you see another alternative?

Thanks,
Maxime

> Thanks,
> Jianfeng
> 
>>
>>>>>> QEMU still sends the
>>>>> set_mem_table message even the memory regions are not changed as
>>>>> QEMU vhost-user filters out those not backed by file (fd > 0).
>>>>>
>>>>> To fix this case, we add a check in the handler to see if the
>>>>> memory regions are really changed; if not, we just keep old memory
>>>>> regions.
>>>>>
>>>>> Fixes: 8f972312b8f4 ("vhost: support vhost-user")
>>>>>
>>>>> CC: stable@dpdk.org
>>>>>
>>>>> CC: Yuanhan Liu <yliu@fridaylinux.org>
>>>>> CC: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>
>>>>> Reported-by: Yang Zhang <zy107165@alibaba-inc.com>
>>>>> Reported-by: Xin Long <longxin.xl@alibaba-inc.com>
>>>>> Signed-off-by: Yi Yang <yi.y.yang@intel.com>
>>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>>>> ---
>>>>>    lib/librte_vhost/vhost_user.c | 33
>> +++++++++++++++++++++++++++++++++
>>>>>    1 file changed, 33 insertions(+)
>>>>
>>>> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>
>>> Applied to dpdk-next-virtio.
>>>
>>> Thanks.
>>>
>>> 	--yliu
>>>
>>
>> Maxime
  
Jianfeng Tan March 29, 2018, 12:59 p.m. UTC | #6
On 3/29/2018 3:35 PM, Maxime Coquelin wrote:
>
>
> On 03/29/2018 09:01 AM, Tan, Jianfeng wrote:
>> Hi Maxime and Victor,
>>
>>> -----Original Message-----
>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>> Sent: Tuesday, December 5, 2017 10:28 PM
>>> To: Yuanhan Liu; Tan, Jianfeng; Victor Kaplansky
>>> Cc: dev@dpdk.org; stable@dpdk.org; Yang, Yi Y
>>> Subject: Re: [PATCH] vhost: fix segfault as handle set_mem_table 
>>> message
>>>
>>>
>>>
>>> On 12/05/2017 03:19 PM, Yuanhan Liu wrote:
>>>> On Tue, Nov 28, 2017 at 01:09:29PM +0100, Maxime Coquelin wrote:
>>>>>
>>>>>
>>>>> On 11/15/2017 12:41 PM, Jianfeng Tan wrote:
>>>>>> In a running VM, operations (like device attach/detach) will
>>>>>> trigger the QEMU to resend set_mem_table to vhost-user backend.
>>>>>>
>>>>>> DPDK vhost-user handles this message rudely by unmap all existing
>>>>>> regions and map new ones. This might lead to segfault if there
>>>>>> is pmd thread just trying to touch those unmapped memory regions.
>>>>>>
>>>>>> But for most cases, except VM memory hotplug,
>>>
>>> FYI, Victor is working on implementing a lock-less protection mechanism
>>> to prevent crashes in such cases. It is intended first to protect
>>> log_base in case of multiqueue + live-migration, but would solve thi
>>> issue too.
>>
>> Bring this issue back for discussion.
>>
>> Reported by SPDK guys, even with per-queue lock, they could still run 
>> into crash as of memory hot plug or unplug.
>> In detail, you add the lock in an internal struct, vhost_queue which 
>> is, unfortunately, not visible to the external datapath, like 
>> vhost-scsi in SPDK.
>
> Yes, I agree current solution is not enough
>
>> The memory hot plug and unplug would be the main issue from SPDK side 
>> so far. For this specific issue, I think we can continue this patch 
>> to filter out the changed regions, and keep unchanged regions not 
>> remapped.

How do you think we should move forward on this specific memory issue? I 
think it can proceed in parallel with the general mechanism below.

>>
>> But I know that the per-vq lock is not only to resolve the memory 
>> table issue, some other vhost-user messages could also lead to 
>> problems? If yes, shall we take a step back, to think about how to 
>> solve this kind of issues for backends, like vhost-scsi.
>
> Right, any message that can change the device or virtqueue states can be
> problematic.
>
>> Thoughts?
>
> In another thread, 

Apologies for starting another thread.

> SPDK people proposed to destroy and re-create the
> device for every message. I think this is not acceptable.

It sounds a little overkill and error-prone, at first impression.

>
> I proposed an alternative that I think would work:
> - external backends & applications implements the .vring_state_changed()
>   callback. On disable it stops processing the rings and only return
>   once all descriptor buffers are processed. On enable, they resume the
>   rings processing.

OK, this gives external backends & applications a chance to sync (via a 
lock or an event) with their polling threads, just like destroy_device() 
works today; a rough sketch of such a callback is below.
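
For illustration, an external backend's callback could look roughly like
the sketch below. This is only an assumption about how a backend might
synchronize, not code from SPDK or DPDK; backend_queue, allow_access,
inflight and backend_find_queue are placeholder names.

#include <stdint.h>
#include <unistd.h>

/* Placeholder per-ring state owned by the external backend. */
struct backend_queue {
	volatile int allow_access;	/* checked by the polling thread */
	volatile int inflight;		/* requests still using guest memory */
};

/* Placeholder lookup from (vid, queue_id) to the backend's own state. */
struct backend_queue *backend_find_queue(int vid, uint16_t queue_id);

static int
backend_vring_state_changed(int vid, uint16_t queue_id, int enable)
{
	struct backend_queue *q = backend_find_queue(vid, queue_id);

	if (q == NULL)
		return 0;

	if (enable) {
		/* Resume: the polling thread may touch guest memory again. */
		q->allow_access = 1;
	} else {
		/* Pause: stop new accesses to guest memory ... */
		q->allow_access = 0;
		/*
		 * ... and return only once in-flight requests are drained,
		 * so the set_mem_table handler can safely remap regions.
		 */
		while (q->inflight != 0)
			usleep(100);
	}

	return 0;
}

The callback would be registered through the usual
rte_vhost_driver_callback_register() ops, the same way destroy_device()
is hooked today.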

> - In vhost lib, we implement vhost_vring_pause and vhost_vring_resume
>   functions. In pause function, we save device state (enable or
>   disable) in a variable, and if the ring is enabled at device level it
>   calls .vring_state_changed() with disabled state. In resume, it checks
>   the vring state in the device and call .vring_state_changed() with
>   enable state if it was enabled in device state.

>
> So, I think that would work but I hadn't had a clear reply from SPDK
> people proving it wouldn't.
>
> They asked we revert Victor's patch, but I don't see the need as it does
> not hurt SPDK (but doesn't protect anything for them I agree), while it
> really fixes real issues with internal Net backend.

I agree with you. As SPDK has no way to take the lock, it cannot be 
affected. I think what people (including myself) are really against is 
adding a lock in the DPDK PMD. But we have not found a better way so far.

>
> What do you think of my proposal? Do you see other alternative?


Sounds like a feasible way. Let's wait and see what the SPDK guys think.

Thanks,
Jianfeng

>
> Thanks,
> Maxime
>
>> Thanks,
>> Jianfeng
  

Patch

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index f4c7ce4..2291929 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -573,6 +573,29 @@  dump_guest_pages(struct virtio_net *dev)
 #define dump_guest_pages(dev)
 #endif
 
+static bool vhost_memory_changed(struct VhostUserMemory *new,
+				 struct rte_vhost_memory *old)
+{
+	uint32_t i;
+
+	if (new->nregions != old->nregions)
+		return true;
+
+	for (i = 0; i < new->nregions; ++i) {
+		VhostUserMemoryRegion *new_r = &new->regions[i];
+		struct rte_vhost_mem_region *old_r = &old->regions[i];
+
+		if (new_r->guest_phys_addr != old_r->guest_phys_addr)
+			return true;
+		if (new_r->memory_size != old_r->size)
+			return true;
+		if (new_r->userspace_addr != old_r->guest_user_addr)
+			return true;
+	}
+
+	return false;
+}
+
 static int
 vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 {
@@ -585,6 +608,16 @@  vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 	uint32_t i;
 	int fd;
 
+	if (dev->mem && !vhost_memory_changed(&memory, dev->mem)) {
+		RTE_LOG(INFO, VHOST_CONFIG,
+			"(%d) memory regions not changed\n", dev->vid);
+
+		for (i = 0; i < memory.nregions; i++)
+			close(pmsg->fds[i]);
+
+		return 0;
+	}
+
 	if (dev->mem) {
 		free_mem_region(dev);
 		rte_free(dev->mem);