[dpdk-dev,2/3] net/virtio_user: fix wrong sequence of messages

Message ID 1470397003-5782-3-git-send-email-jianfeng.tan@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Yuanhan Liu
Headers

Commit Message

Jianfeng Tan Aug. 5, 2016, 11:36 a.m. UTC
  When virtio_user is used with VPP's native vhost user, it cannot
send/receive any packets.

The root cause is that vpp-vhost-user treats the message
VHOST_USER_SET_FEATURES as putting the device into its init state,
i.e., it zeroes all related structures. However, the previous code
sent this message last in the initialization process,
which caused all previously configured state to be zeroed.

To fix this issue, we rearrange the sequence of those messages.
  - step 0, send VHOST_USER_SET_VRING_CALL so that vhost allocates
    virtqueue structures;
  - step 1, send VHOST_USER_SET_FEATURES to confirm the features;
  - step 2, send VHOST_USER_SET_MEM_TABLE to share mem regions;
  - step 3, send VHOST_USER_SET_VRING_NUM, VHOST_USER_SET_VRING_BASE,
    VHOST_USER_SET_VRING_ADDR, VHOST_USER_SET_VRING_KICK for each
    queue;
  - ...

Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")

Reported-by: Zhihong Wang <zhihong.wang@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 120 ++++++++++++++---------
 1 file changed, 72 insertions(+), 48 deletions(-)
  

Comments

Stephen Hemminger Aug. 5, 2016, 4:36 p.m. UTC | #1
On Fri,  5 Aug 2016 11:36:42 +0000
Jianfeng Tan <jianfeng.tan@intel.com> wrote:

> When virtio_user is used with VPP's native vhost user, it cannot
> send/receive any packets.
> 
> The root cause is that vpp-vhost-user translates the message
> VHOST_USER_SET_FEATURES as puting this device into init state,
> aka, zero all related structures. However, previous code
> puts this message at last in the whole initialization process,
> which leads to all previous information are zeroed.

Not sure what correct behavior is here.  It could be that VPP native
vhost user is broken.  What does QEMU/KVM vhost do in this case?
I would take that as the authoritative source for semantics.

> To fix this issue, we rearrange the sequence of those messages.
>   - step 0, send VHOST_USER_SET_VRING_CALL so that vhost allocates
>     virtqueue structures;
>   - step 1, send VHOST_USER_SET_FEATURES to confirm the features;
>   - step 2, send VHOST_USER_SET_MEM_TABLE to share mem regions;
>   - step 3, send VHOST_USER_SET_VRING_NUM, VHOST_USER_SET_VRING_BASE,
>     VHOST_USER_SET_VRING_ADDR, VHOST_USER_SET_VRING_KICK for each
>     queue;
>   - ...
> 
> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
  
Jianfeng Tan Aug. 8, 2016, 1:19 a.m. UTC | #2
Hi Stephen,

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, August 6, 2016 12:36 AM
> To: Tan, Jianfeng
> Cc: dev@dpdk.org; yuanhan.liu@linux.intel.com; Wang, Zhihong;
> lining18@jd.com
> Subject: Re: [dpdk-dev] [PATCH 2/3] net/virtio_user: fix wrong sequence of
> messages
> 
> On Fri,  5 Aug 2016 11:36:42 +0000
> Jianfeng Tan <jianfeng.tan@intel.com> wrote:
> 
> > When virtio_user is used with VPP's native vhost user, it cannot
> > send/receive any packets.
> >
> > The root cause is that vpp-vhost-user translates the message
> > VHOST_USER_SET_FEATURES as puting this device into init state,
> > aka, zero all related structures. However, previous code
> > puts this message at last in the whole initialization process,
> > which leads to all previous information are zeroed.
> 
> Not sure what correct behavior is here.  It could be that VPP native
> vhost user is broken.  What does QEMU/KVM vhost do in this case?
> I would take that as the authoritative source for semantics.

The corrected message sequence below follows QEMU's behavior. One more thing: QEMU does not document this sequence anywhere; it was worked out by observing how vhost receives messages from QEMU.

Thanks,
Jianfeng

> 
> > To fix this issue, we rearrange the sequence of those messages.
> >   - step 0, send VHOST_USER_SET_VRING_CALL so that vhost allocates
> >     virtqueue structures;
> >   - step 1, send VHOST_USER_SET_FEATURES to confirm the features;
> >   - step 2, send VHOST_USER_SET_MEM_TABLE to share mem regions;
> >   - step 3, send VHOST_USER_SET_VRING_NUM,
> VHOST_USER_SET_VRING_BASE,
> >     VHOST_USER_SET_VRING_ADDR, VHOST_USER_SET_VRING_KICK for
> each
> >     queue;
> >   - ...
> >
> > Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
  
Yuanhan Liu Sept. 6, 2016, 6:42 a.m. UTC | #3
On Fri, Aug 05, 2016 at 11:36:42AM +0000, Jianfeng Tan wrote:
> When virtio_user is used with VPP's native vhost user, it cannot
> send/receive any packets.
> 
> The root cause is that vpp-vhost-user translates the message
> VHOST_USER_SET_FEATURES as puting this device into init state,
> aka, zero all related structures. However, previous code
> puts this message at last in the whole initialization process,
> which leads to all previous information are zeroed.
> 
> To fix this issue, we rearrange the sequence of those messages.
>   - step 0, send VHOST_USER_SET_VRING_CALL so that vhost allocates
>     virtqueue structures;

Yes, it is. However, it's not quite right to do that (you can see there
is a FIXME in vhost_user_set_vring_call()).

That means it needs to be fixed: we should not rely on the fact that it's
the first per-vring message we get in the current QEMU implementation
as the truth.

That also means naming a function like virtio_user_create_queue() based
on the above behaviour is wrong.

>   - step 1, send VHOST_USER_SET_FEATURES to confirm the features;
>   - step 2, send VHOST_USER_SET_MEM_TABLE to share mem regions;
>   - step 3, send VHOST_USER_SET_VRING_NUM, VHOST_USER_SET_VRING_BASE,
>     VHOST_USER_SET_VRING_ADDR, VHOST_USER_SET_VRING_KICK for each
>     queue;
>   - ...
> 
> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
> 
> Reported-by: Zhihong Wang <zhihong.wang@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  drivers/net/virtio/virtio_user/virtio_user_dev.c | 120 ++++++++++++++---------
>  1 file changed, 72 insertions(+), 48 deletions(-)

That's too much code for a bug fix. I'm wondering: how about just
moving VHOST_USER_GET_PROTOCOL_FEATURES ahead, to the beginning of
virtio_user_start_device()? It should fix this issue.

	--yliu
  
Jianfeng Tan Sept. 6, 2016, 7:54 a.m. UTC | #4
Hi Yuanhan,


On 9/6/2016 2:42 PM, Yuanhan Liu wrote:
> On Fri, Aug 05, 2016 at 11:36:42AM +0000, Jianfeng Tan wrote:
>> When virtio_user is used with VPP's native vhost user, it cannot
>> send/receive any packets.
>>
>> The root cause is that vpp-vhost-user translates the message
>> VHOST_USER_SET_FEATURES as puting this device into init state,
>> aka, zero all related structures. However, previous code
>> puts this message at last in the whole initialization process,
>> which leads to all previous information are zeroed.
>>
>> To fix this issue, we rearrange the sequence of those messages.
>>    - step 0, send VHOST_USER_SET_VRING_CALL so that vhost allocates
>>      virtqueue structures;
> Yes, it is. However, it's not that right to do that (you see there is
> a FIXME in vhost_user_set_vring_call()).

I suppose you are referring to vhost_set_vring_call().

>
> That means it need be fixed: we should not rely on fact that it's the
> first per-vring message we will get in the current QEMU implementation
> as the truth.
>
> That also means, naming a function like virtio_user_create_queue() based
> on above behaviour is wrong.

It's actually a good catch. After a little thought, I think that in DPDK 
vhost we may need to create those virtqueues once the unix socket gets 
connected, just like in vhost-net, where virtqueues are created on char 
file open. Right?

>
>>    - step 1, send VHOST_USER_SET_FEATURES to confirm the features;
>>    - step 2, send VHOST_USER_SET_MEM_TABLE to share mem regions;
>>    - step 3, send VHOST_USER_SET_VRING_NUM, VHOST_USER_SET_VRING_BASE,
>>      VHOST_USER_SET_VRING_ADDR, VHOST_USER_SET_VRING_KICK for each
>>      queue;
>>    - ...
>>
>> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
>>
>> Reported-by: Zhihong Wang <zhihong.wang@intel.com>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>> ---
>>   drivers/net/virtio/virtio_user/virtio_user_dev.c | 120 ++++++++++++++---------
>>   1 file changed, 72 insertions(+), 48 deletions(-)
> That's too much of code for a bug fix. I'm wondering how about just
> moving VHOST_USER_GET_PROTOCOL_FEATURES ahead, to the begining of
> virtio_user_start_device()? It should fix this issue.

Why does VHOST_USER_GET_PROTOCOL_FEATURES matter here? Do you mean shifting 
VHOST_USER_SET_FEATURES earlier?

Thanks,
Jianfeng

>
> 	--yliu
  
Yuanhan Liu Sept. 6, 2016, 8:20 a.m. UTC | #5
On Tue, Sep 06, 2016 at 03:54:30PM +0800, Tan, Jianfeng wrote:
> Hi Yuanhan,
> 
> 
> On 9/6/2016 2:42 PM, Yuanhan Liu wrote:
> >On Fri, Aug 05, 2016 at 11:36:42AM +0000, Jianfeng Tan wrote:
> >>When virtio_user is used with VPP's native vhost user, it cannot
> >>send/receive any packets.
> >>
> >>The root cause is that vpp-vhost-user translates the message
> >>VHOST_USER_SET_FEATURES as puting this device into init state,
> >>aka, zero all related structures. However, previous code
> >>puts this message at last in the whole initialization process,
> >>which leads to all previous information are zeroed.
> >>
> >>To fix this issue, we rearrange the sequence of those messages.
> >>   - step 0, send VHOST_USER_SET_VRING_CALL so that vhost allocates
> >>     virtqueue structures;
> >Yes, it is. However, it's not that right to do that (you see there is
> >a FIXME in vhost_user_set_vring_call()).
> 
> I suppose you are specifying vhost_set_vring_call().

Oh, I was talking about the new code: I have renamed it to
vhost_user_set_vring_call :)

> >
> >That means it need be fixed: we should not rely on fact that it's the
> >first per-vring message we will get in the current QEMU implementation
> >as the truth.
> >
> >That also means, naming a function like virtio_user_create_queue() based
> >on above behaviour is wrong.
> 
> It's actually a good catch. After a light thought, I think in DPDK vhost, we
> may need to create those virtqueues once unix socket gets connected, just
> like in vhost-net, virtqueues are created on char file open. Right?

There is a difference: for vhost-net and tap mode, IIRC, it knows how
many queues before doing setup, while for vhost-user it doesn't. That
means we have to allocate and set up virtqueues reactively, just like
what we have done in vhost_set_vring_call(). What doesn't look perfect
is that it assumes SET_VRING_CALL is the first per-vring message we will get.

> 
> >
> >>   - step 1, send VHOST_USER_SET_FEATURES to confirm the features;
> >>   - step 2, send VHOST_USER_SET_MEM_TABLE to share mem regions;
> >>   - step 3, send VHOST_USER_SET_VRING_NUM, VHOST_USER_SET_VRING_BASE,
> >>     VHOST_USER_SET_VRING_ADDR, VHOST_USER_SET_VRING_KICK for each
> >>     queue;
> >>   - ...
> >>
> >>Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
> >>
> >>Reported-by: Zhihong Wang <zhihong.wang@intel.com>
> >>Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> >>---
> >>  drivers/net/virtio/virtio_user/virtio_user_dev.c | 120 ++++++++++++++---------
> >>  1 file changed, 72 insertions(+), 48 deletions(-)
> >That's too much of code for a bug fix. I'm wondering how about just
> >moving VHOST_USER_GET_PROTOCOL_FEATURES ahead, to the begining of
> >virtio_user_start_device()? It should fix this issue.
> 
> Why does VHOST_USER_GET_PROTOCOL_FEATURES care? Do you mean shifting
> VHOST_USER_SET_FEATURES earlier?

Oops, right, I meant SET_FEATURES. Sorry about the confusion introduced by
the silly auto-completion.

	--yliu
  
Jianfeng Tan Sept. 8, 2016, 8:53 a.m. UTC | #6
On 9/6/2016 4:20 PM, Yuanhan Liu wrote:
> On Tue, Sep 06, 2016 at 03:54:30PM +0800, Tan, Jianfeng wrote:
>> Hi Yuanhan,
>>
>>
>> On 9/6/2016 2:42 PM, Yuanhan Liu wrote:
>>> On Fri, Aug 05, 2016 at 11:36:42AM +0000, Jianfeng Tan wrote:
>>>> When virtio_user is used with VPP's native vhost user, it cannot
>>>> send/receive any packets.
>>>>
>>>> The root cause is that vpp-vhost-user translates the message
>>>> VHOST_USER_SET_FEATURES as puting this device into init state,
>>>> aka, zero all related structures. However, previous code
>>>> puts this message at last in the whole initialization process,
>>>> which leads to all previous information are zeroed.
>>>>
>>>> To fix this issue, we rearrange the sequence of those messages.
>>>>    - step 0, send VHOST_USER_SET_VRING_CALL so that vhost allocates
>>>>      virtqueue structures;
>>> Yes, it is. However, it's not that right to do that (you see there is
>>> a FIXME in vhost_user_set_vring_call()).
>> I suppose you are specifying vhost_set_vring_call().
> Oh, I was talking about the new code: I have renamed it to
> vhost_user_set_vring_call :)
>
>>> That means it need be fixed: we should not rely on fact that it's the
>>> first per-vring message we will get in the current QEMU implementation
>>> as the truth.
>>>
>>> That also means, naming a function like virtio_user_create_queue() based
>>> on above behaviour is wrong.
>> It's actually a good catch. After a light thought, I think in DPDK vhost, we
>> may need to create those virtqueues once unix socket gets connected, just
>> like in vhost-net, virtqueues are created on char file open. Right?
> There is a difference: for vhost-net and tap mode, IIRC, it knows how
> many queues before doing setup.

No, from linux/drivers/vhost/net.c:vhost_net_open(), we can see that 
virtqueues are allocated according to VHOST_NET_VQ_MAX.
How about reconsidering the previous suggestion to allocate vqs once the 
connection is established?
Never mind; the above fix on the vhost side would not take effect on existing 
vpp-vhost implementations. We still need to fix it on the virtio side.

>   While for vhost-user, it doesn't. That
> means, we have to allocate and setup virtqueues reactively: just like
> what we have done in vhost_set_vring_call(). What doesn't look perfect
> is it assume SET_VRING_CALL is the first per-vring message we will get.

Yes, depending on the assumption that SET_VRING_CALL is the first 
per-vring message looks like a bad implementation. As Stephen has 
suggested, it's more like a bug in VPP. But if we treat it that way, we 
will fix nothing here.


>>>>    - step 1, send VHOST_USER_SET_FEATURES to confirm the features;
>>>>    - step 2, send VHOST_USER_SET_MEM_TABLE to share mem regions;
>>>>    - step 3, send VHOST_USER_SET_VRING_NUM, VHOST_USER_SET_VRING_BASE,
>>>>      VHOST_USER_SET_VRING_ADDR, VHOST_USER_SET_VRING_KICK for each
>>>>      queue;
>>>>    - ...
>>>>
>>>> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
>>>>
>>>> Reported-by: Zhihong Wang <zhihong.wang@intel.com>
>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>>> ---
>>>>   drivers/net/virtio/virtio_user/virtio_user_dev.c | 120 ++++++++++++++---------
>>>>   1 file changed, 72 insertions(+), 48 deletions(-)
>>> That's too much of code for a bug fix. I'm wondering how about just
>>> moving VHOST_USER_GET_PROTOCOL_FEATURES ahead, to the begining of
>>> virtio_user_start_device()? It should fix this issue.
>> Why does VHOST_USER_GET_PROTOCOL_FEATURES care? Do you mean shifting
>> VHOST_USER_SET_FEATURES earlier?
> Oops, right, I meant SET_FEATURES. Sorry about confusion introduced by
> the silly auto-completion.

Still not working. VPP needs SET_VRING_CALL to create the vq first.

Thanks,
Jianfeng

>
> 	--yliu
  
Yuanhan Liu Sept. 8, 2016, 12:18 p.m. UTC | #7
On Thu, Sep 08, 2016 at 04:53:22PM +0800, Tan, Jianfeng wrote:
> 
> 
> On 9/6/2016 4:20 PM, Yuanhan Liu wrote:
> >On Tue, Sep 06, 2016 at 03:54:30PM +0800, Tan, Jianfeng wrote:
> >>Hi Yuanhan,
> >>
> >>
> >>On 9/6/2016 2:42 PM, Yuanhan Liu wrote:
> >>>On Fri, Aug 05, 2016 at 11:36:42AM +0000, Jianfeng Tan wrote:
> >>>>When virtio_user is used with VPP's native vhost user, it cannot
> >>>>send/receive any packets.
> >>>>
> >>>>The root cause is that vpp-vhost-user translates the message
> >>>>VHOST_USER_SET_FEATURES as puting this device into init state,
> >>>>aka, zero all related structures. However, previous code
> >>>>puts this message at last in the whole initialization process,
> >>>>which leads to all previous information are zeroed.
> >>>>
> >>>>To fix this issue, we rearrange the sequence of those messages.
> >>>>   - step 0, send VHOST_USER_SET_VRING_CALL so that vhost allocates
> >>>>     virtqueue structures;
> >>>Yes, it is. However, it's not that right to do that (you see there is
> >>>a FIXME in vhost_user_set_vring_call()).
> >>I suppose you are specifying vhost_set_vring_call().
> >Oh, I was talking about the new code: I have renamed it to
> >vhost_user_set_vring_call :)
> >
> >>>That means it need be fixed: we should not rely on fact that it's the
> >>>first per-vring message we will get in the current QEMU implementation
> >>>as the truth.
> >>>
> >>>That also means, naming a function like virtio_user_create_queue() based
> >>>on above behaviour is wrong.
> >>It's actually a good catch. After a light thought, I think in DPDK vhost, we
> >>may need to create those virtqueues once unix socket gets connected, just
> >>like in vhost-net, virtqueues are created on char file open. Right?
> >There is a difference: for vhost-net and tap mode, IIRC, it knows how
> >many queues before doing setup.
> 
> No, from linux/drivers/vhost/net.c:vhost_net_open(), we can see that
> virtqueues are allocated according to VHOST_NET_VQ_MAX.

Well, if you take a closer look, you will find that VHOST_NET_VQ_MAX is
defined as 2. That means it allocates one queue-pair only.

And FYI, the MQ support for vhost-net is actually done in the tap
driver, not in the vhost-net driver. That results in the MQ
implementation being a bit different between vhost-net and vhost-user.

Put simply, in vhost-net, each queue-pair has a backend fd associated
with it. Vhost requests for different queue-pairs are sent over different
fds. That also means vhost-net doesn't even need to be aware of the
MQ stuff.

For the vhost-user implementation, however, all queue-pairs share one
socket fd. All requests are also sent over that single socket fd,
so the backend (DPDK vhost) has to figure out how many queue
pairs are actually enabled: we detect it by reading the
vring index of the SET_VRING_CALL message; it's not quite right, though.

> How about reconsidering previous suggestion to allocate vq once connection
> is established?

That's too much, because DPDK claims to support up to 0x8000
queue-pairs. Not to mention that the vhost_virtqueue struct
was way too big before: it even held an array of buf_vec with
size 256.

> Never mind, above fix on the vhost side will not take effect on existing
> vpp-vhost implementations.

Actually, I was talking about the DPDK vhost implementation :)

> We still need to fix it in the virtio side.

Yes, we could fix it on our side, even though VPP is broken.

> >  While for vhost-user, it doesn't. That
> >means, we have to allocate and setup virtqueues reactively: just like
> >what we have done in vhost_set_vring_call(). What doesn't look perfect
> >is it assume SET_VRING_CALL is the first per-vring message we will get.
> 
> Yes, depending on the assumption that SET_VRING_CALL is the first per-vring
> message, looks like a bad implementation. As Stephen has suggested, it's
> more like a bug in vpp. If we treat it like that way, we will fix nothing
> here.
> 
> 
> >>>>   - step 1, send VHOST_USER_SET_FEATURES to confirm the features;
> >>>>   - step 2, send VHOST_USER_SET_MEM_TABLE to share mem regions;
> >>>>   - step 3, send VHOST_USER_SET_VRING_NUM, VHOST_USER_SET_VRING_BASE,
> >>>>     VHOST_USER_SET_VRING_ADDR, VHOST_USER_SET_VRING_KICK for each
> >>>>     queue;
> >>>>   - ...
> >>>>
> >>>>Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
> >>>>
> >>>>Reported-by: Zhihong Wang <zhihong.wang@intel.com>
> >>>>Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> >>>>---
> >>>>  drivers/net/virtio/virtio_user/virtio_user_dev.c | 120 ++++++++++++++---------
> >>>>  1 file changed, 72 insertions(+), 48 deletions(-)
> >>>That's too much of code for a bug fix. I'm wondering how about just
> >>>moving VHOST_USER_GET_PROTOCOL_FEATURES ahead, to the begining of
> >>>virtio_user_start_device()? It should fix this issue.
> >>Why does VHOST_USER_GET_PROTOCOL_FEATURES care? Do you mean shifting
> >>VHOST_USER_SET_FEATURES earlier?
> >Oops, right, I meant SET_FEATURES. Sorry about confusion introduced by
> >the silly auto-completion.
> 
> Still not working. VPP needs SET_VRING_CALL to create vq firstly.

I didn't get it. In the proposal, SET_FEATURES is sent before all other
messages, so it should not cause the issue you described in this patch.
Besides, haven't we already sent SET_VRING_CALL before the other messages
(well, except the SET_FEATURES and SET_MEM_TABLE messages)? Is that still
not enough for vpp's native vhost-user implementation?

	--yliu
  
Jianfeng Tan Sept. 9, 2016, 3:59 a.m. UTC | #8
On 9/8/2016 8:18 PM, Yuanhan Liu wrote:
> On Thu, Sep 08, 2016 at 04:53:22PM +0800, Tan, Jianfeng wrote:
>>
>> On 9/6/2016 4:20 PM, Yuanhan Liu wrote:
>>> On Tue, Sep 06, 2016 at 03:54:30PM +0800, Tan, Jianfeng wrote:
>>>> Hi Yuanhan,
>>>>
>>>>
>>>> On 9/6/2016 2:42 PM, Yuanhan Liu wrote:
>>>>> On Fri, Aug 05, 2016 at 11:36:42AM +0000, Jianfeng Tan wrote:
>>>>>> When virtio_user is used with VPP's native vhost user, it cannot
>>>>>> send/receive any packets.
>>>>>>
>>>>>> The root cause is that vpp-vhost-user translates the message
>>>>>> VHOST_USER_SET_FEATURES as puting this device into init state,
>>>>>> aka, zero all related structures. However, previous code
>>>>>> puts this message at last in the whole initialization process,
>>>>>> which leads to all previous information are zeroed.
>>>>>>
>>>>>> To fix this issue, we rearrange the sequence of those messages.
>>>>>>    - step 0, send VHOST_USER_SET_VRING_CALL so that vhost allocates
>>>>>>      virtqueue structures;
>>>>> Yes, it is. However, it's not that right to do that (you see there is
>>>>> a FIXME in vhost_user_set_vring_call()).
>>>> I suppose you are specifying vhost_set_vring_call().
>>> Oh, I was talking about the new code: I have renamed it to
>>> vhost_user_set_vring_call :)
>>>
>>>>> That means it need be fixed: we should not rely on fact that it's the
>>>>> first per-vring message we will get in the current QEMU implementation
>>>>> as the truth.
>>>>>
>>>>> That also means, naming a function like virtio_user_create_queue() based
>>>>> on above behaviour is wrong.
>>>> It's actually a good catch. After a light thought, I think in DPDK vhost, we
>>>> may need to create those virtqueues once unix socket gets connected, just
>>>> like in vhost-net, virtqueues are created on char file open. Right?
>>> There is a difference: for vhost-net and tap mode, IIRC, it knows how
>>> many queues before doing setup.
>> No, from linux/drivers/vhost/net.c:vhost_net_open(), we can see that
>> virtqueues are allocated according to VHOST_NET_VQ_MAX.
> Well, if you took a closer look, you will find VHOST_NET_VQ_MAX is
> defined to 2. That means it allocates a queue-pair only.
>
> And FYI, the MQ support for vhost-net is actually done in the tap
> driver, but not in vhost-net driver. That results to the MQ
> implementation is a bit different between vhost-net and vhost-user.
>
> Put simply, in vhost-net, one queue-pair has a backend fd associated
> with it. Vhost requests for different queue-pair are sent by different
> fd. That also means the vhost-net doesn't even need be aware of the
> MQ stuff.
>
> However, for vhost-user implementation, all queue-pairs share one
> socket fd. All requests all also sent over the single socket fd,
> thus the backend (DPDK vhost) has to figure out how many queue
> pairs are actually enabled: and we detect it by reading the
> vring index of SET_VRING_CALL message; it's not quite right though.

Aha, right, nice analysis.

>
>> How about reconsidering previous suggestion to allocate vq once connection
>> is established?
> That's too much, because DPDK claims to support up to 0x8000
> queue-pairs. Don't even to say that the vhost_virtqueue struct
> was way too big before: it even holds an array of buf_vec with
> size 256.

Another trick of my memory; I wrongly remembered it as only 8 VQs 
being supported.
One unrelated thing: given that VHOST_MAX_QUEUE_PAIRS equals 
0x8000, struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2] 
spends 4MB for each virtio device, which could be refined.

>
>> Never mind, above fix on the vhost side will not take effect on existing
>> vpp-vhost implementations.
> Actually, I was talking about the DPDK vhost implementation :)

This patch is talking about vpp's native vhost implementation, not 
dpdk-vhost, and not the way vpp uses dpdk-vhost.

>
>> We still need to fix it in the virtio side.
> Yes, we could fix it in our side, even though VPP is broken.

OK, let's back to this patch.

>
>>>   While for vhost-user, it doesn't. That
>>> means, we have to allocate and setup virtqueues reactively: just like
>>> what we have done in vhost_set_vring_call(). What doesn't look perfect
>>> is it assume SET_VRING_CALL is the first per-vring message we will get.
>> Yes, depending on the assumption that SET_VRING_CALL is the first per-vring
>> message, looks like a bad implementation. As Stephen has suggested, it's
>> more like a bug in vpp. If we treat it like that way, we will fix nothing
>> here.
>>
>>
>>>>>>    - step 1, send VHOST_USER_SET_FEATURES to confirm the features;
>>>>>>    - step 2, send VHOST_USER_SET_MEM_TABLE to share mem regions;
>>>>>>    - step 3, send VHOST_USER_SET_VRING_NUM, VHOST_USER_SET_VRING_BASE,
>>>>>>      VHOST_USER_SET_VRING_ADDR, VHOST_USER_SET_VRING_KICK for each
>>>>>>      queue;
>>>>>>    - ...
>>>>>>
>>>>>> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
>>>>>>
>>>>>> Reported-by: Zhihong Wang <zhihong.wang@intel.com>
>>>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>>>>> ---
>>>>>>   drivers/net/virtio/virtio_user/virtio_user_dev.c | 120 ++++++++++++++---------
>>>>>>   1 file changed, 72 insertions(+), 48 deletions(-)
>>>>> That's too much of code for a bug fix. I'm wondering how about just
>>>>> moving VHOST_USER_GET_PROTOCOL_FEATURES ahead, to the begining of
>>>>> virtio_user_start_device()? It should fix this issue.
>>>> Why does VHOST_USER_GET_PROTOCOL_FEATURES care? Do you mean shifting
>>>> VHOST_USER_SET_FEATURES earlier?
>>> Oops, right, I meant SET_FEATURES. Sorry about confusion introduced by
>>> the silly auto-completion.
>> Still not working. VPP needs SET_VRING_CALL to create vq firstly.
> Didn't get it. In the proposal, SET_FEATURES is sent before every other
> messages, thus it should not cause the issue you described in this patch.

OK, let me try to explain. We take three vhost implementations into 
consideration: dpdk-2.2-vhost, dpdk-master-vhost, and vpp-native-vhost.

If set_features is sent before set_vring_call, dpdk-2.2-vhost will fail: 
the set_features handler assigns the header length to VQs that are only 
created in the set_vring_call handler.
So we need to keep set_vring_call first. Then set_features needs to be 
sent before any other msgs; this is what vpp-native-vhost requires. In 
all, the sequence is like this:
1. set_vring_call,
2. set_features,
3. other msgs

> Besides, haven't we already sent SET_VRING_CALL before other messages
> (well, execept the SET_FEATURES and SET_MEM_TABLE message)?

Yes, set_vring_call is already in first place, but we need to insert 
set_features between set_vring_call and the other msgs. Previously, 
set_vring_call and the other msgs were sent together.

Thanks,
Jianfeng

> That's still
> not enough for vpp's native vhost-user implementation?

> 	--yliu
  
Yuanhan Liu Sept. 9, 2016, 4:19 a.m. UTC | #9
On Fri, Sep 09, 2016 at 11:59:18AM +0800, Tan, Jianfeng wrote:
>                 It's actually a good catch. After a light thought, I think in DPDK vhost, we
>                 may need to create those virtqueues once unix socket gets connected, just
>                 like in vhost-net, virtqueues are created on char file open. Right?
> 
>             There is a difference: for vhost-net and tap mode, IIRC, it knows how
>             many queues before doing setup.
> 
>         No, from linux/drivers/vhost/net.c:vhost_net_open(), we can see that
>         virtqueues are allocated according to VHOST_NET_VQ_MAX.
> 
>     Well, if you took a closer look, you will find VHOST_NET_VQ_MAX is
>     defined to 2. That means it allocates a queue-pair only.
> 
>     And FYI, the MQ support for vhost-net is actually done in the tap
>     driver, but not in vhost-net driver. That results to the MQ
>     implementation is a bit different between vhost-net and vhost-user.
> 
>     Put simply, in vhost-net, one queue-pair has a backend fd associated
>     with it. Vhost requests for different queue-pair are sent by different
>     fd. That also means the vhost-net doesn't even need be aware of the
>     MQ stuff.
> 
>     However, for vhost-user implementation, all queue-pairs share one
>     socket fd. All requests all also sent over the single socket fd,
>     thus the backend (DPDK vhost) has to figure out how many queue
>     pairs are actually enabled: and we detect it by reading the
>     vring index of SET_VRING_CALL message; it's not quite right though.
> 
> 
> Aha, right, nice analysis.

Just some rough memory in my mind ;-)
> 
> 
> 
>         How about reconsidering previous suggestion to allocate vq once connection
>         is established?
> 
>     That's too much, because DPDK claims to support up to 0x8000
>     queue-pairs. Don't even to say that the vhost_virtqueue struct
>     was way too big before: it even holds an array of buf_vec with
>     size 256.
> 
> 
> Another mistake of my memory, I was remember it wrongly as only 8 VQs are
> supported.
> One thing not related, provided that VHOST_MAX_QUEUE_PAIRS equals to 0x8000,
> struct vhost_virtqueue  *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2] spends 4MB for
> each virtio device, which could be a refined.

Yes, we could allocate a small array first, and then reallocate it when
necessary.

> 
> 
> 
>         Never mind, above fix on the vhost side will not take effect on existing
>         vpp-vhost implementations.
> 
>     Actually, I was talking about the DPDK vhost implementation :)
> 
> 
> This patch is talking about vpp's native vhost implementation, not dpdk-vhost,
> and not the way vpp uses dpdk-vhost.

Yes, I know. What I meant is that there was a "workaround" in the DPDK
vhost implementation, and since you bring this issue to the table again,
it's a chance to think about how we can fix it.

A rough idea that comes to mind: we could check for per-vring messages
at the beginning of vhost_user_msg_handler() and allocate the related vq
when necessary (i.e., when it's the first vring message we get).

Yeah, I know it's a bit ugly, but it at least gets rid of that "not-that-true"
assumption.

> 
>         Still not working. VPP needs SET_VRING_CALL to create vq firstly.
> 
>     Didn't get it. In the proposal, SET_FEATURES is sent before every other
>     messages, thus it should not cause the issue you described in this patch.
> 
> 
> OK. Let me try to explain. We take three vhost implementations into
> consideration: dpdk-2.2-vhost, dpdk-master-vhost, vpp-native-vhost.
> 
> If set_feature before set_vring_call, dpdk-2.2-vhost will fail: inside
> set_feature handler, assigning header length to VQs which will be created in
> set_vring_call handler.

Oh, right. That was an incorrect implementation.

> So we need to keep set_vring_call first.
> Then set_feature needs to be sent
> before any other msgs; this is what vpp-native-vhost requires. All in all,
> the sequence is like this:
> 1. set_vring_call,
> 2. set_feature,
> 3. other msgs
> 
> 
>     Besides, haven't we already sent SET_VRING_CALL before other messages
>     (well, except the SET_FEATURES and SET_MEM_TABLE messages)?
> 
> 
> Yes, set_vring_call is already in the first place, but we need to plug in
> set_feature between set_vring_call and the other msgs. Previously,
> set_vring_call and the other msgs were sent together.

Okay. Another thing I noticed is that virtio-user lacks some feature
negotiations, like GET_FEATURES and GET_PROTOCOL_FEATURES. I think you
might need to add them back at some point?

	--yliu
  
Jianfeng Tan Sept. 9, 2016, 5:50 a.m. UTC | #10
On 9/9/2016 12:19 PM, Yuanhan Liu wrote:

>>
>>
>>          Never mind, above fix on the vhost side will not take effect on existing
>>          vpp-vhost implementations.
>>
>>      Actually, I was talking about the DPDK vhost implementation :)
>>
>>
>> This patch is talking about vpp's native vhost implementation, not dpdk-vhost,
>> and not the way vpp uses dpdk-vhost.
> Yes, I know. What I meant is that there was a "workaround" in the DPDK vhost
> implementation, and since you bring this issue to the table again,
> it's a chance to think about how we can fix it.
>
> A rough idea that comes to my mind is that we could check all the per-vring
> messages at the beginning of vhost_user_msg_handler() and allocate the
> related vq when necessary (i.e., on the first vring message we get).
>
> Yeah, I know it's a bit ugly, but it at least gets rid of that "not-that-true"
> assumption.

Sounds workable. So we'd define those vq-specific msgs, like:
VHOST_USER_SET_VRING_NUM,
VHOST_USER_SET_VRING_ADDR,
VHOST_USER_SET_VRING_BASE,
VHOST_USER_GET_VRING_BASE(?),
VHOST_USER_SET_VRING_KICK,
VHOST_USER_SET_VRING_CALL,
VHOST_USER_SET_VRING_ENABLE,
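The check could hinge on a small classifier over the request type. A rough sketch, with the message IDs defined locally for illustration (the values follow the vhost-user request numbering, but verify against the actual `vhost_user.h` enum before relying on them):

```c
#include <assert.h>
#include <stdbool.h>

/* Per-vring request IDs, as in the vhost-user request numbering
 * (defined locally here for illustration). */
enum vhost_user_request {
	VHOST_USER_SET_VRING_NUM    = 8,
	VHOST_USER_SET_VRING_ADDR   = 9,
	VHOST_USER_SET_VRING_BASE   = 10,
	VHOST_USER_GET_VRING_BASE   = 11,
	VHOST_USER_SET_VRING_KICK   = 12,
	VHOST_USER_SET_VRING_CALL   = 13,
	VHOST_USER_SET_VRING_ENABLE = 18,
};

/* True for the per-vring messages listed above: on the first such message
 * for a given ring, the handler would allocate the vq before dispatching. */
static bool
vhost_msg_is_per_vring(int request)
{
	switch (request) {
	case VHOST_USER_SET_VRING_NUM:
	case VHOST_USER_SET_VRING_ADDR:
	case VHOST_USER_SET_VRING_BASE:
	case VHOST_USER_GET_VRING_BASE:
	case VHOST_USER_SET_VRING_KICK:
	case VHOST_USER_SET_VRING_CALL:
	case VHOST_USER_SET_VRING_ENABLE:
		return true;
	default:
		return false;
	}
}
```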


>
>>          Still not working. VPP needs SET_VRING_CALL to create vq firstly.
>>
>>      Didn't get it. In the proposal, SET_FEATURES is sent before all other
>>      messages, thus it should not cause the issue you described in this patch.
>>
>>
>> OK. Let me try to explain. We take three vhost implementations into
>> consideration: dpdk-2.2-vhost, dpdk-master-vhost, vpp-native-vhost.
>>
>> If set_feature comes before set_vring_call, dpdk-2.2-vhost will fail: the
>> set_feature handler assigns the header length to VQs, which are only created
>> in the set_vring_call handler.
>Oh, right. That was an incorrect implementation.
>
>> So we need to keep set_vring_call first.
>> Then set_feature needs to be sent
>> before any other msgs; this is what vpp-native-vhost requires. All in all,
>> the sequence is like this:
>> 1. set_vring_call,
>> 2. set_feature,
>> 3. other msgs
>>
>>
>>      Besides, haven't we already sent SET_VRING_CALL before other messages
>>      (well, except the SET_FEATURES and SET_MEM_TABLE messages)?
>>
>>
>> Yes, set_vring_call is already in the first place, but we need to plug in
>> set_feature between set_vring_call and the other msgs. Previously,
>> set_vring_call and the other msgs were sent together.
> Okay. Another thing I noticed is that virtio-user lacks some feature
> negotiations, like GET_FEATURES and GET_PROTOCOL_FEATURES. I think you
> might need to add them back at some point?

GET_FEATURES has been done in virtio_user_dev_init().
GET_PROTOCOL_FEATURES is not supported yet. I see that those features in
PROTOCOL_FEATURES are for live migration (right?). Assuming that, is anyone
using containers/processes enabling live migration so far? I don't think so.

Thanks,
Jianfeng


>
> 	--yliu
  
Yuanhan Liu Sept. 9, 2016, 6:03 a.m. UTC | #11
On Fri, Sep 09, 2016 at 01:50:16PM +0800, Tan, Jianfeng wrote:
> On 9/9/2016 12:19 PM, Yuanhan Liu wrote:
> 
> >>
> >>
> >>         Never mind, above fix on the vhost side will not take effect on existing
> >>         vpp-vhost implementations.
> >>
> >>     Actually, I was talking about the DPDK vhost implementation :)
> >>
> >>
> >>This patch is talking about vpp's native vhost implementation, not dpdk-vhost,
> >>and not the way vpp uses dpdk-vhost.
> >Yes, I know. What I meant is there was a "workaround" in DPDK vhost
> >implementation, and since you bring this issue on the table again,
> >it's a chance to think about how can we fix it.
> >
> >A rough idea come to my mind is we could check all the per-vring message
> >at the beginning of vhost_user_msg_handler() and allocate related vq when
> >necessary (when it's the first vring message we got).
> >
> >Yeah, I know it's a bit ugly, but it at least gets rid of that "not-that-true"
> >assumption.
> 
> Sounds workable. So we'd define those vq-specific msgs, like:
> VHOST_USER_SET_VRING_NUM,
> VHOST_USER_SET_VRING_ADDR,
> VHOST_USER_SET_VRING_BASE,
> VHOST_USER_GET_VRING_BASE(?),
> VHOST_USER_SET_VRING_KICK,
> VHOST_USER_SET_VRING_CALL,
> VHOST_USER_SET_VRING_ENABLE,

Yes.

> >>         Still not working. VPP needs SET_VRING_CALL to create vq firstly.
> >>
> >>     Didn't get it. In the proposal, SET_FEATURES is sent before all other
> >>     messages, thus it should not cause the issue you described in this patch.
> >>
> >>
> >>OK. Let me try to explain. We take three vhost implementations into
> >>consideration: dpdk-2.2-vhost, dpdk-master-vhost, vpp-native-vhost.
> >>
> >>If set_feature before set_vring_call, dpdk-2.2-vhost will fail: inside
> >>set_feature handler, assigning header length to VQs which will be created in
> >>set_vring_call handler.
> >Oh, right. That was an incorrect implementation.
> >
> >>So we need to keep set_vring_call firstly.
> >>Then set_feature needs to be sent
> >>before any other msgs, this is what vpp-native-vhost requires. In all, the
> >>sequence is like this:
> >>1. set_vring_call,
> >>2. set_feature,
> >>3. other msgs
> >>
> >>
> >>     Besides, haven't we already sent SET_VRING_CALL before other messages
> >>     (well, except the SET_FEATURES and SET_MEM_TABLE message)?
> >>
> >>
> >>Yes, set_vring_call is already in the first place, but we need to plugin
> >>set_feature between set_vring_call and other msgs. Previously, set_vring_call
> >>and other msgs are together.
> >Okay. Another thing I noticed is that virtio-user lacks some feature
> >negotiations, like GET_FEATURES and GET_PROTOCOL_FEATURES. I think you
> >might need to add them back at some point?
> 
> GET_FEATURES has been done in virtio_user_dev_init().

Oh, sorry, I missed that.

> GET_PROTOCOL_FEATURES
> is not supported yet. I see those features in PROTOCOL_FEATURES is for live
> migration (right?).

Not exactly. PROTOCOL_FEATURES was first introduced when MQ was
enabled. Thus it's no wonder MQ is the first protocol feature vhost-user
supports:

    [yliu@yliu-dev ~/dpdk]$ gg PROTOCOL_F_ lib/librte_vhost/
    lib/librte_vhost/vhost_user.h:46:#define VHOST_USER_PROTOCOL_F_MQ 0
    lib/librte_vhost/vhost_user.h:47:#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
    lib/librte_vhost/vhost_user.h:48:#define VHOST_USER_PROTOCOL_F_RARP 2

	--yliu

> Assuming that, is anyone using containers/processes
> enabling live migration so far? I don't think so.
> 
> Thanks,
> Jianfeng
> 
> 
> >
> >	--yliu
  
Jianfeng Tan Sept. 9, 2016, 6:24 a.m. UTC | #12
On 9/9/2016 2:03 PM, Yuanhan Liu wrote:
>> GET_PROTOCOL_FEATURES
>> is not supported yet. I see those features in PROTOCOL_FEATURES is for live
>> migration (right?).
> Not exactly. PROTOCOL_FEATURES was first introduced when MQ was
> enabled. Thus it's no wonder MQ is the first protocol feature vhost-user
> supports:
>
>      [yliu@yliu-dev ~/dpdk]$ gg PROTOCOL_F_ lib/librte_vhost/
>      lib/librte_vhost/vhost_user.h:46:#define VHOST_USER_PROTOCOL_F_MQ 0
>      lib/librte_vhost/vhost_user.h:47:#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
>      lib/librte_vhost/vhost_user.h:48:#define VHOST_USER_PROTOCOL_F_RARP 2
>
> 	--yliu

OK, I got it. The maximum queue pair number is now a parameter of 
virtio_user, but we need to depend on PROTOCOL_FEATURES (and further, 
VHOST_USER_GET_QUEUE_NUM) to learn the maximum queue pair number that 
vhost can support.
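That dependency could look roughly like this. A minimal sketch, assuming the flow described above; `clamp_queue_pairs` is a hypothetical helper, and only the `VHOST_USER_PROTOCOL_F_MQ` value (bit 0, as quoted from lib/librte_vhost earlier in the thread) is taken from the source:

```c
#include <assert.h>
#include <stdint.h>

#define VHOST_USER_PROTOCOL_F_MQ 0 /* vhost-user protocol feature bit */

/* Clamp the queue pairs requested on the virtio_user command line to what
 * the backend reports via VHOST_USER_GET_QUEUE_NUM; without the MQ protocol
 * feature, only a single queue pair can be assumed. */
static uint32_t
clamp_queue_pairs(uint32_t requested, uint64_t protocol_features,
		  uint32_t backend_max)
{
	if (!(protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ)))
		return 1;
	return requested < backend_max ? requested : backend_max;
}
```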

Just wondering: why doesn't QEMU depend on (1ULL << VIRTIO_NET_F_MQ) inside 
features to do that?

Thanks,
Jianfeng

>
>> Assuming that, anyone using container/process now
>> enables live migration so far? I don't think so.
>>
>> Thanks,
>> Jianfeng
>>
>>
>>> 	--yliu
  
Yuanhan Liu Sept. 9, 2016, 6:31 a.m. UTC | #13
On Fri, Sep 09, 2016 at 02:24:20PM +0800, Tan, Jianfeng wrote:
> 
> 
> On 9/9/2016 2:03 PM, Yuanhan Liu wrote:
> >>GET_PROTOCOL_FEATURES
> >>is not supported yet. I see those features in PROTOCOL_FEATURES is for live
> >>migration (right?).
> >Not exactly. PROTOCOL_FEATURES was first introduced when MQ was
> >enabled. Thus it's no wonder MQ is the first protocol feature vhost-user
> >supports:
> >
> >     [yliu@yliu-dev ~/dpdk]$ gg PROTOCOL_F_ lib/librte_vhost/
> >     lib/librte_vhost/vhost_user.h:46:#define VHOST_USER_PROTOCOL_F_MQ 0
> >     lib/librte_vhost/vhost_user.h:47:#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
> >     lib/librte_vhost/vhost_user.h:48:#define VHOST_USER_PROTOCOL_F_RARP 2
> >
> >	--yliu
> 
> OK, I got it. The maximum queue pair number is now a parameter of
> virtio_user, but we need to depend on PROTOCOL_FEATURES (and further,
> VHOST_USER_GET_QUEUE_NUM) to learn the maximum queue pair number that vhost
> can support.
> 
> Just wondering: why doesn't QEMU depend on (1ULL << VIRTIO_NET_F_MQ) inside
> features to do that?

VIRTIO_NET_F_MQ belongs to the virtio spec, while VHOST_USER_PROTOCOL_F_MQ
belongs to the vhost-user spec.

	--yliu
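The distinction can be made concrete: the two flags live in independent bit spaces, negotiated through different message pairs. A small sketch, assuming bit 22 for VIRTIO_NET_F_MQ (per the virtio spec) and bit 0 for VHOST_USER_PROTOCOL_F_MQ (per the vhost-user spec); the helper names are ours:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two independent feature spaces: VIRTIO_NET_F_MQ is a virtio feature bit
 * (negotiated with the guest driver via GET/SET_FEATURES), while
 * VHOST_USER_PROTOCOL_F_MQ is a vhost-user protocol feature bit
 * (negotiated via GET/SET_PROTOCOL_FEATURES). */
#define VIRTIO_NET_F_MQ           22
#define VHOST_USER_PROTOCOL_F_MQ   0

/* Does the guest-visible device advertise multiqueue? */
static bool driver_sees_mq(uint64_t virtio_features)
{
	return (virtio_features & (1ULL << VIRTIO_NET_F_MQ)) != 0;
}

/* Does the vhost-user backend handle the MQ protocol messages? */
static bool backend_handles_mq(uint64_t protocol_features)
{
	return (protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ)) != 0;
}
```

Testing the same bit position against the wrong feature word is exactly the confusion the exchange above resolves.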
  

Patch

diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 2c4e999..afdf721 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -45,20 +45,14 @@ 
 #include "../virtio_ethdev.h"
 
 static int
-virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
+virtio_user_create_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 {
-	int callfd, kickfd;
+	/* Of all per virtqueue MSGs, make sure VHOST_SET_VRING_CALL come
+	 * firstly because vhost depends on this msg to allocate virtqueue
+	 * pair.
+	 */
+	int callfd;
 	struct vhost_vring_file file;
-	struct vhost_vring_state state;
-	struct vring *vring = &dev->vrings[queue_sel];
-	struct vhost_vring_addr addr = {
-		.index = queue_sel,
-		.desc_user_addr = (uint64_t)(uintptr_t)vring->desc,
-		.avail_user_addr = (uint64_t)(uintptr_t)vring->avail,
-		.used_user_addr = (uint64_t)(uintptr_t)vring->used,
-		.log_guest_addr = 0,
-		.flags = 0, /* disable log */
-	};
 
 	/* May use invalid flag, but some backend leverages kickfd and callfd as
 	 * criteria to judge if dev is alive. so finally we use real event_fd.
@@ -68,22 +62,30 @@  virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 		PMD_DRV_LOG(ERR, "callfd error, %s\n", strerror(errno));
 		return -1;
 	}
-	kickfd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
-	if (kickfd < 0) {
-		close(callfd);
-		PMD_DRV_LOG(ERR, "kickfd error, %s\n", strerror(errno));
-		return -1;
-	}
-
-	/* Of all per virtqueue MSGs, make sure VHOST_SET_VRING_CALL come
-	 * firstly because vhost depends on this msg to allocate virtqueue
-	 * pair.
-	 */
 	file.index = queue_sel;
 	file.fd = callfd;
 	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_CALL, &file);
 	dev->callfds[queue_sel] = callfd;
 
+	return 0;
+}
+
+static int
+virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
+{
+	int kickfd;
+	struct vhost_vring_file file;
+	struct vhost_vring_state state;
+	struct vring *vring = &dev->vrings[queue_sel];
+	struct vhost_vring_addr addr = {
+		.index = queue_sel,
+		.desc_user_addr = (uint64_t)(uintptr_t)vring->desc,
+		.avail_user_addr = (uint64_t)(uintptr_t)vring->avail,
+		.used_user_addr = (uint64_t)(uintptr_t)vring->used,
+		.log_guest_addr = 0,
+		.flags = 0, /* disable log */
+	};
+
 	state.index = queue_sel;
 	state.num = vring->num;
 	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_NUM, &state);
@@ -97,6 +99,12 @@  virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 	 * lastly because vhost depends on this msg to judge if
 	 * virtio is ready.
 	 */
+	kickfd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
+	if (kickfd < 0) {
+		PMD_DRV_LOG(ERR, "kickfd error, %s\n", strerror(errno));
+		return -1;
+	}
+	file.index = queue_sel;
 	file.fd = kickfd;
 	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_KICK, &file);
 	dev->kickfds[queue_sel] = kickfd;
@@ -104,44 +112,43 @@  virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 	return 0;
 }
 
-int
-virtio_user_start_device(struct virtio_user_dev *dev)
+static int
+virtio_user_queue_setup(struct virtio_user_dev *dev,
+			int (*fn)(struct virtio_user_dev *, uint32_t))
 {
-	uint64_t features;
 	uint32_t i, queue_sel;
-	int ret;
-
-	/* construct memory region inside each implementation */
-	ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_MEM_TABLE, NULL);
-	if (ret < 0)
-		goto error;
 
 	for (i = 0; i < dev->max_queue_pairs; ++i) {
 		queue_sel = 2 * i + VTNET_SQ_RQ_QUEUE_IDX;
-		if (virtio_user_kick_queue(dev, queue_sel) < 0) {
-			PMD_DRV_LOG(INFO, "kick rx vq fails: %u", i);
-			goto error;
+		if (fn(dev, queue_sel) < 0) {
+			PMD_DRV_LOG(INFO, "setup rx vq fails: %u", i);
+			return -1;
 		}
 	}
 	for (i = 0; i < dev->max_queue_pairs; ++i) {
 		queue_sel = 2 * i + VTNET_SQ_TQ_QUEUE_IDX;
-		if (virtio_user_kick_queue(dev, queue_sel) < 0) {
-			PMD_DRV_LOG(INFO, "kick tx vq fails: %u", i);
-			goto error;
+		if (fn(dev, queue_sel) < 0) {
+			PMD_DRV_LOG(INFO, "setup tx vq fails: %u", i);
+			return -1;
 		}
 	}
 
-	/* As this feature is negotiated from the vhost, all queues are
-	 * initialized in the disabled state. For non-mq case, we enable
-	 * the 1st queue pair by default.
-	 */
-	if (dev->features & (1ull << VHOST_USER_GET_PROTOCOL_FEATURES))
-		vhost_user_enable_queue_pair(dev->vhostfd, 0, 1);
+	return 0;
+}
 
-	/* After setup all virtqueues, we need to set_features so that these
-	 * features can be set into each virtqueue in vhost side. And before
-	 * that, make sure VHOST_USER_F_PROTOCOL_FEATURES is added if mq is
-	 * enabled, and VIRTIO_NET_F_MAC is stripped.
+int
+virtio_user_start_device(struct virtio_user_dev *dev)
+{
+	uint64_t features;
+	int ret;
+
+	/* Step 0: tell vhost to create queues */
+	if (virtio_user_queue_setup(dev, virtio_user_create_queue) < 0)
+		goto error;
+
+	/* Step 1: set features
+	 * Make sure VHOST_USER_F_PROTOCOL_FEATURES is added if mq is enabled,
+	 * and VIRTIO_NET_F_MAC is stripped.
 	 */
 	features = dev->features;
 	if (dev->max_queue_pairs > 1)
@@ -152,6 +159,23 @@  virtio_user_start_device(struct virtio_user_dev *dev)
 		goto error;
 	PMD_DRV_LOG(INFO, "set features: %" PRIx64, features);
 
+	/* Step 2: share memory regions */
+	ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_MEM_TABLE, NULL);
+	if (ret < 0)
+		goto error;
+
+	/* Step 3: kick queues */
+	if (virtio_user_queue_setup(dev, virtio_user_kick_queue) < 0)
+		goto error;
+
+	/* Step 4: enable queues
+	 * As this feature is negotiated from the vhost, all queues are
+	 * initialized in the disabled state. For non-mq case, we enable
+	 * the 1st queue pair by default.
+	 */
+	if (dev->features & (1ull << VHOST_USER_GET_PROTOCOL_FEATURES))
+		vhost_user_enable_queue_pair(dev->vhostfd, 0, 1);
+
 	return 0;
 error:
 	/* TODO: free resource here or caller to check */