[dpdk-dev,v4,6/8] virtio-user: add new virtual pci driver for virtio

Message ID 1461892716-19122-7-git-send-email-jianfeng.tan@intel.com (mailing list archive)
State Changes Requested, archived
Delegated to: Yuanhan Liu
Headers

Commit Message

Jianfeng Tan April 29, 2016, 1:18 a.m. UTC
  This patch implements another new instance of struct virtio_pci_ops to
drive the virtio-user virtual device. Instead of rd/wr ioport or PCI
configuration space, this virtual pci driver will rd/wr the virtual
device struct virtio_user_hw, and when necessary, invokes APIs provided
by device emulation later to start/stop the device.

  ----------------------
  | ------------------ |
  | | virtio driver  | |----> (virtio_user_pci.c)
  | ------------------ |
  |         |          |
  | ------------------ | ------>  virtio-user PMD
  | | device emulate | |
  | |                | |
  | | vhost adapter  | |
  | ------------------ |
  ----------------------
            |
            |
            |
   ------------------
   | vhost backend  |
   ------------------

Signed-off-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-By: Neil Horman <nhorman@tuxdriver.com>
---
 drivers/net/virtio/Makefile                      |   1 +
 drivers/net/virtio/virtio_user/virtio_user_dev.h |   2 +
 drivers/net/virtio/virtio_user/virtio_user_pci.c | 209 +++++++++++++++++++++++
 3 files changed, 212 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_user/virtio_user_pci.c
  

Comments

Yuanhan Liu May 12, 2016, 2:12 a.m. UTC | #1
On Fri, Apr 29, 2016 at 01:18:34AM +0000, Jianfeng Tan wrote:
> +static void
> +vdev_read_dev_config(struct virtio_hw *hw, uint64_t offset,
> +		     void *dst, int length)
> +{
> +	int i;
> +	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;

Unnecessary cast.

> +static int
> +vdev_setup_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
> +{
> +	/* Changed to use virtual addr */
> +	vq->vq_ring_mem = (phys_addr_t)vq->mz->addr;
> +	if (vq->virtio_net_hdr_mz) {
> +		vq->virtio_net_hdr_mem =
> +			(phys_addr_t)vq->virtio_net_hdr_mz->addr;
> +		/* Do it one more time after we reset virtio_net_hdr_mem */
> +		vring_hdr_desc_init(vq);
> +	}
> +	vq->offset = offsetof(struct rte_mbuf, buf_addr);
> +	return 0;

Here as last email said, you should not mix vq stuff. What's more,
why do you invoke vring_hdr_desc_init() here? If it needs a special
handling, do it in driver.

The "setup_queue" method is actually for telling the device where desc,
avail and used vring are located. Hence, the implementation could be simple:
just log them.

> +
> +const struct virtio_pci_ops vdev_ops = {

Note that this is the interface for the driver to talk to the device,
we should put this file into upper layer then, in the driver.

And let me make a summary, trying to make it clear:

- We should not use any structures/functions from the virtio driver
  here, unless it's really a must.

- It's allowed for driver to make *few* special handling for the virtio
  user device. And that's what the driver supposed to do: to handle
  different device variants.

  So, I think it's okay to export the virtio_user_device struct to 
  driver and do all those kind of "fake pci" configration there.

	--yliu

> +	.read_dev_cfg	= vdev_read_dev_config,
> +	.write_dev_cfg	= vdev_write_dev_config,
> +	.reset		= vdev_reset,
> +	.get_status	= vdev_get_status,
> +	.set_status	= vdev_set_status,
> +	.get_features	= vdev_get_features,
> +	.set_features	= vdev_set_features,
> +	.get_isr	= vdev_get_isr,
> +	.set_config_irq	= vdev_set_config_irq,
> +	.get_queue_num	= vdev_get_queue_num,
> +	.setup_queue	= vdev_setup_queue,
> +	.del_queue	= vdev_del_queue,
> +	.notify_queue	= vdev_notify_queue,
> +};
> -- 
> 2.1.4
  
Jianfeng Tan May 12, 2016, 7:08 a.m. UTC | #2
Hi yuanhan,


On 5/12/2016 10:12 AM, Yuanhan Liu wrote:
> On Fri, Apr 29, 2016 at 01:18:34AM +0000, Jianfeng Tan wrote:
>> +static void
>> +vdev_read_dev_config(struct virtio_hw *hw, uint64_t offset,
>> +		     void *dst, int length)
>> +{
>> +	int i;
>> +	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
> Unnecessary cast.

Yes.

>
>> +static int
>> +vdev_setup_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
>> +{
>> +	/* Changed to use virtual addr */
>> +	vq->vq_ring_mem = (phys_addr_t)vq->mz->addr;
>> +	if (vq->virtio_net_hdr_mz) {
>> +		vq->virtio_net_hdr_mem =
>> +			(phys_addr_t)vq->virtio_net_hdr_mz->addr;
>> +		/* Do it one more time after we reset virtio_net_hdr_mem */
>> +		vring_hdr_desc_init(vq);
>> +	}
>> +	vq->offset = offsetof(struct rte_mbuf, buf_addr);
>> +	return 0;
> Here as last email said, you should not mix vq stuff. What's more,
> why do you invoke vring_hdr_desc_init() here?

vring_hdr_desc_init() is to init header desc according to 
vq->virtio_net_hdr_mem, and here we change to use virtual address, so we 
need to invoke this after vq->virtio_net_hdr_mem is decided.

But for this case, you remind me that we can achieve that by: inside 
virtio_dev_queue_setup(), move vring_hdr_desc_init() after setup_queue().

> If it needs a special
> handling, do it in driver.

As discussed in previous mail with David, we should hide special 
handling inside pci ops, such as real virtio device needs to check 
address (patch 1). See below comments for more detail.

>
> The "setup_queue" method is actually for telling the device where desc,
> avail and used vring are located. Hence, the implementation could be simple:
> just log them.

>
>> +
>> +const struct virtio_pci_ops vdev_ops = {
> Note that this is the interface for the driver to talk to the device,
> we should put this file into upper layer then, in the driver.
>
> And let me make a summary, trying to make it clear:
>
> - We should not use any structures/functions from the virtio driver
>    here, unless it's really a must.

Firstly I agree this point (although I see a difference in how we take 
"a must"). My original principle is to maximize the use of existing 
structures instead of maintain any new ones. And I already give up that 
principle when I accept your previous suggestion to use struct 
virtio_user_device to store virtio-user specific fields. So I agree to 
add the new struct virtqueue to avoid use of driver-layer virtqueues.

>
> - It's allowed for driver to make *few* special handling for the virtio
>    user device. And that's what the driver supposed to do: to handle
>    different device variants.

So here are two contradictory ways. Compared to the way you suggest,  
another way is to keep a unified driver and maintain all special 
handling inside struct virtio_pci_ops.

I prefer the latter because:
(1) Special handling for each kind of device will be gather together 
instead of scattered everywhere of driver code.
(2) It's more aligned to previous logic to hide the detail to 
differentiate modern/legacy device.


Thanks,
Jianfeng


>
>    So, I think it's okay to export the virtio_user_device struct to
>    driver and do all those kind of "fake pci" configration there.
>
> 	--yliu
>
>> +	.read_dev_cfg	= vdev_read_dev_config,
>> +	.write_dev_cfg	= vdev_write_dev_config,
>> +	.reset		= vdev_reset,
>> +	.get_status	= vdev_get_status,
>> +	.set_status	= vdev_set_status,
>> +	.get_features	= vdev_get_features,
>> +	.set_features	= vdev_set_features,
>> +	.get_isr	= vdev_get_isr,
>> +	.set_config_irq	= vdev_set_config_irq,
>> +	.get_queue_num	= vdev_get_queue_num,
>> +	.setup_queue	= vdev_setup_queue,
>> +	.del_queue	= vdev_del_queue,
>> +	.notify_queue	= vdev_notify_queue,
>> +};
>> -- 
>> 2.1.4
  
Yuanhan Liu May 12, 2016, 4:40 p.m. UTC | #3
On Thu, May 12, 2016 at 03:08:05PM +0800, Tan, Jianfeng wrote:
> >>+static int
> >>+vdev_setup_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
> >>+{
> >>+	/* Changed to use virtual addr */
> >>+	vq->vq_ring_mem = (phys_addr_t)vq->mz->addr;
> >>+	if (vq->virtio_net_hdr_mz) {
> >>+		vq->virtio_net_hdr_mem =
> >>+			(phys_addr_t)vq->virtio_net_hdr_mz->addr;
> >>+		/* Do it one more time after we reset virtio_net_hdr_mem */
> >>+		vring_hdr_desc_init(vq);
> >>+	}
> >>+	vq->offset = offsetof(struct rte_mbuf, buf_addr);
> >>+	return 0;
> >Here as last email said, you should not mix vq stuff. What's more,
> >why do you invoke vring_hdr_desc_init() here?
> 
> vring_hdr_desc_init() is to init header desc according to
> vq->virtio_net_hdr_mem, and here we change to use virtual address, so we
> need to invoke this after vq->virtio_net_hdr_mem is decided.
> 
> But for this case, you remind me that we can achieve that by: inside
> virtio_dev_queue_setup(), move vring_hdr_desc_init() after setup_queue().
> 
> >If it needs a special
> >handling, do it in driver.
> 
> As discussed in previous mail with David, we should hide special handling
> inside pci ops,

Generally speaking, yes.

> such as real virtio device needs to check address (patch 1).

And that's a good one: I've already acked. But here, I doubt it introduces 
any benefits to do that. Firstly, it's a Tx queue specific setting, moving
it to common code path means you have to add a check like the way you did
in this patch. BTW, it's an implicit check, which hurts readability a bit.

Secondly, you have to do this kind of check/settings in 3 different places: 

- legacy queue_setup() method
- modern queue_setup() method
- your vdev queue_setup() method

And another remind is that Huawei planned to split Rx/Tx queue settings,
here you mixed them again, and I don't think Huawei would like it. Don't
even to say that after the split, the Tx specific stuff will be no longer
in the common vq structure.

So, I'd suggest something like following:

	if (is_vdev(..)) {
		/* comment here that we use VA for vdev */
		vq->vq_ring_mem = (phys_addr_t)vq->mz->addr;
		vq->virtio_net_hdr_mem = ...;
		vq->offset = ...;
	} else {
		vq->vq_ring_mem = ...;
		...
	}
	vring_hdr_desc_init(vq);

> See below comments for more detail.
> 
> >
> >The "setup_queue" method is actually for telling the device where desc,
> >avail and used vring are located. Hence, the implementation could be simple:
> >just log them.
> 
> >
> >>+
> >>+const struct virtio_pci_ops vdev_ops = {
> >Note that this is the interface for the driver to talk to the device,
> >we should put this file into upper layer then, in the driver.
> >
> >And let me make a summary, trying to make it clear:
> >
> >- We should not use any structures/functions from the virtio driver
> >   here, unless it's really a must.
> 
> Firstly I agree this point (although I see a difference in how we take "a
> must"). My original principle is to maximize the use of existing structures
> instead of maintain any new ones.

If that could save you a lot of efforts and make the design clean, I
might would say, yes, go for it. But it's obviously NO in this case.

> And I already give up that principle when
> I accept your previous suggestion to use struct virtio_user_device to store
> virtio-user specific fields. So I agree to add the new struct virtqueue to
> avoid use of driver-layer virtqueues.
> 
> >
> >- It's allowed for driver to make *few* special handling for the virtio
> >   user device. And that's what the driver supposed to do: to handle
> >   different device variants.
> 
> So here are two contradictory ways. Compared to the way you suggest,
> another way is to keep a unified driver and maintain all special handling
> inside struct virtio_pci_ops.
> 
> I prefer the latter because:
> (1) Special handling for each kind of device will be gather together instead
> of scattered everywhere of driver code.
> (2) It's more aligned to previous logic to hide the detail to differentiate
> modern/legacy device.

May I ask how many more such handling are needed, excluding the tx queue
header desc setup? And as stated, in generic, yes, we should try that.

	--yliu
  
Michael S. Tsirkin May 12, 2016, 5:02 p.m. UTC | #4
On Thu, May 12, 2016 at 03:08:05PM +0800, Tan, Jianfeng wrote:
> (2) It's more aligned to previous logic to hide the detail to differentiate
> modern/legacy device.

Why is there a need to support legacy interfaces at all?  It's a container
so if it's in use one can be reasonably sure you have a new kernel.
  
Jianfeng Tan May 13, 2016, 1:54 a.m. UTC | #5
On 5/13/2016 12:40 AM, Yuanhan Liu wrote:
> On Thu, May 12, 2016 at 03:08:05PM +0800, Tan, Jianfeng wrote:
>>>> +static int
>>>> +vdev_setup_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
>>>> +{
>>>> +	/* Changed to use virtual addr */
>>>> +	vq->vq_ring_mem = (phys_addr_t)vq->mz->addr;
>>>> +	if (vq->virtio_net_hdr_mz) {
>>>> +		vq->virtio_net_hdr_mem =
>>>> +			(phys_addr_t)vq->virtio_net_hdr_mz->addr;
>>>> +		/* Do it one more time after we reset virtio_net_hdr_mem */
>>>> +		vring_hdr_desc_init(vq);
>>>> +	}
>>>> +	vq->offset = offsetof(struct rte_mbuf, buf_addr);
>>>> +	return 0;
>>> Here as last email said, you should not mix vq stuff. What's more,
>>> why do you invoke vring_hdr_desc_init() here?
>> vring_hdr_desc_init() is to init header desc according to
>> vq->virtio_net_hdr_mem, and here we change to use virtual address, so we
>> need to invoke this after vq->virtio_net_hdr_mem is decided.
>>
>> But for this case, you remind me that we can achieve that by: inside
>> virtio_dev_queue_setup(), move vring_hdr_desc_init() after setup_queue().
>>
>>> If it needs a special
>>> handling, do it in driver.
>> As discussed in previous mail with David, we should hide special handling
>> inside pci ops,
> Generally speaking, yes.
>
>> such as real virtio device needs to check address (patch 1).
> And that's a good one: I've already acked. But here, I doubt it introduces
> any benefits to do that. Firstly, it's a Tx queue specific setting, moving
> it to common code path means you have to add a check like the way you did
> in this patch.

I agree about this point. But is it really big deal? Please go on with 
remaining comments before make decision.

> BTW, it's an implicit check, which hurts readability a bit.

At the point of readability, I think driver does not need to care about 
devices use physical addresses or virtual addresses, and this is why we 
do that inside pci ops to hide those details.

Another side, pci ops is introduced to cover the details of talking to a 
device as I understand, do you think it offends such semantics? (this 
one could be your argument :-))

>
> Secondly, you have to do this kind of check/settings in 3 different places:
>
> - legacy queue_setup() method
> - modern queue_setup() method
> - your vdev queue_setup() method

Each kind of device does such check/settings on its own requirement. So 
only indispensable check/settings are added into these three different 
devices.

>
> And another remind is that Huawei planned to split Rx/Tx queue settings,
> here you mixed them again, and I don't think Huawei would like it. Don't
> even to say that after the split, the Tx specific stuff will be no longer
> in the common vq structure.

Yes, this is a blocker issue of my current implementation. Let's see you 
suggestion firstly.

>
> So, I'd suggest something like following:
>
> 	if (is_vdev(..)) {

The blocker issue of your suggestion is that we have no such condition.

Previously, I use dev_type, but as David's comment said:

    "The reason of those comments is that dev_type in ethdev is going to
    disappear, see [1] and [2].
    Drivers are called through their own specific ethdev/crypto ops and
    so, those drivers know implicitely that their are either pci or vdev
    (or whatever in the future) drivers."

Another consideration is comparing vtpci_ops pointer, I don't think it's 
elegant.

> 		/* comment here that we use VA for vdev */
> 		vq->vq_ring_mem = (phys_addr_t)vq->mz->addr;
> 		vq->virtio_net_hdr_mem = ...;
> 		vq->offset = ...;
> 	} else {
> 		vq->vq_ring_mem = ...;
> 		...
> 	}
> 	vring_hdr_desc_init(vq);
>
>> See below comments for more detail.
>>
>>> The "setup_queue" method is actually for telling the device where desc,
>>> avail and used vring are located. Hence, the implementation could be simple:
>>> just log them.
>>>> +
>>>> +const struct virtio_pci_ops vdev_ops = {
>>> Note that this is the interface for the driver to talk to the device,
>>> we should put this file into upper layer then, in the driver.
>>>
>>> And let me make a summary, trying to make it clear:
>>>
>>> - We should not use any structures/functions from the virtio driver
>>>    here, unless it's really a must.
>> Firstly I agree this point (although I see a difference in how we take "a
>> must"). My original principle is to maximize the use of existing structures
>> instead of maintain any new ones.
> If that could save you a lot of efforts and make the design clean, I
> might would say, yes, go for it. But it's obviously NO in this case.
>
>> And I already give up that principle when
>> I accept your previous suggestion to use struct virtio_user_device to store
>> virtio-user specific fields. So I agree to add the new struct virtqueue to
>> avoid use of driver-layer virtqueues.
>>
>>> - It's allowed for driver to make *few* special handling for the virtio
>>>    user device. And that's what the driver supposed to do: to handle
>>>    different device variants.
>> So here are two contradictory ways. Compared to the way you suggest,
>> another way is to keep a unified driver and maintain all special handling
>> inside struct virtio_pci_ops.
>>
>> I prefer the latter because:
>> (1) Special handling for each kind of device will be gather together instead
>> of scattered everywhere of driver code.
>> (2) It's more aligned to previous logic to hide the detail to differentiate
>> modern/legacy device.
> May I ask how many more such handling are needed, excluding the tx queue
> header desc setup? And as stated, in generic, yes, we should try that.

Those which need special handling:
(1) vq->vq_ring_mem: it is set but never used, so it's out of question.
(2) vq->virtio_net_hdr_mem and vring_hdr_desc_init
(3) vq->offset

Just (2) and (3) so far. And the question is quite clear: where to put 
these two special handling.

Thanks,
Jianfeng

>
> 	--yliu
  
Jianfeng Tan May 13, 2016, 2 a.m. UTC | #6
Hi Michael,

On 5/13/2016 1:02 AM, Michael S. Tsirkin wrote:
> On Thu, May 12, 2016 at 03:08:05PM +0800, Tan, Jianfeng wrote:
>> (2) It's more aligned to previous logic to hide the detail to differentiate
>> modern/legacy device.
> Why is there a need to support legacy interfaces at all?  It's a container
> so if it's in use one can be reasonably sure you have a new kernel.
>

No, there's no need. The added device, virtio-user, is parallel to 
legacy and modern device. But there's a feature bit inside vhost user 
protocol for vhost user to decide the length of header, current 
implementation by default set this feature bit.

Thanks,
Jianfeng
  
Yuanhan Liu May 13, 2016, 4:45 a.m. UTC | #7
On Fri, May 13, 2016 at 09:54:33AM +0800, Tan, Jianfeng wrote:
> 
>     So, I'd suggest something like following:
> 
>             if (is_vdev(..)) {
> 
> 
> The blocker issue of your suggestion is that we have no such condition.
> 
> Previously, I use dev_type, but as David's comment said:

That's not the only option. There should be others, for example,
checking the existence of virtio_user_device. Or even, you could
add a new flag inside virtio hw while initiating your vdev.

>     May I ask how many more such handling are needed, excluding the tx queue
>     header desc setup? And as stated, in generic, yes, we should try that.
> 
> 
> Those which need special handling:
> (1) vq->vq_ring_mem: it is set but never used, so it's out of question.
> (2) vq->virtio_net_hdr_mem and vring_hdr_desc_init

vring_hdr_desc_init is common.

> (3) vq->offset
> 
> Just (2) and (3) so far. And the question is quite clear: where to put these
> two special handling.

Apparently, you can't put it into the queue_setup(). And I still think
my proposal works great here.

	--yliu
  
Jianfeng Tan May 16, 2016, 1:48 a.m. UTC | #8
Hi Yuanhan,

> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> Sent: Friday, May 13, 2016 12:45 PM
> To: Tan, Jianfeng
> Cc: dev@dpdk.org; Xie, Huawei; rich.lane@bigswitch.com; mst@redhat.com;
> nakajima.yoshihiro@lab.ntt.co.jp; p.fedin@samsung.com;
> ann.zhuangyanying@huawei.com; mukawa@igel.co.jp;
> nhorman@tuxdriver.com
> Subject: Re: [PATCH v4 6/8] virtio-user: add new virtual pci driver for virtio
> 
> On Fri, May 13, 2016 at 09:54:33AM +0800, Tan, Jianfeng wrote:
> >
> >     So, I'd suggest something like following:
> >
> >             if (is_vdev(..)) {
> >
> >
> > The blocker issue of your suggestion is that we have no such condition.
> >
> > Previously, I use dev_type, but as David's comment said:
> 
> That's not the only option. There should be others, for example,
> checking the existence of virtio_user_device. Or even, you could
> add a new flag inside virtio hw while initiating your vdev.
> 
> >     May I ask how many more such handling are needed, excluding the tx
> queue
> >     header desc setup? And as stated, in generic, yes, we should try that.
> >
> >
> > Those which need special handling:
> > (1) vq->vq_ring_mem: it is set but never used, so it's out of question.
> > (2) vq->virtio_net_hdr_mem and vring_hdr_desc_init
> 
> vring_hdr_desc_init is common.
> 
> > (3) vq->offset
> >
> > Just (2) and (3) so far. And the question is quite clear: where to put these
> > two special handling.
> 
> Apparently, you can't put it into the queue_setup(). And I still think
> my proposal works great here.

OK, since it's indeed inappropriate to put these special handlings inside queue_setup() from semantic perspective, I'll add them according to if hw->vdev_private == NULL in the driver. Thanks for suggestion.

Thanks,
Jianfeng

> 
> 	--yliu
  
Yuanhan Liu May 16, 2016, 2:51 a.m. UTC | #9
On Mon, May 16, 2016 at 01:48:01AM +0000, Tan, Jianfeng wrote:
> > On Fri, May 13, 2016 at 09:54:33AM +0800, Tan, Jianfeng wrote:
> > >
> > >     So, I'd suggest something like following:
> > >
> > >             if (is_vdev(..)) {
> > >
> > >
> > > The blocker issue of your suggestion is that we have no such condition.
> > >
> > > Previously, I use dev_type, but as David's comment said:
> > 
> > That's not the only option. There should be others, for example,
> > checking the existence of virtio_user_device. Or even, you could
> > add a new flag inside virtio hw while initiating your vdev.
> > 
> > >     May I ask how many more such handling are needed, excluding the tx
> > queue
> > >     header desc setup? And as stated, in generic, yes, we should try that.
> > >
> > >
> > > Those which need special handling:
> > > (1) vq->vq_ring_mem: it is set but never used, so it's out of question.
> > > (2) vq->virtio_net_hdr_mem and vring_hdr_desc_init
> > 
> > vring_hdr_desc_init is common.
> > 
> > > (3) vq->offset
> > >
> > > Just (2) and (3) so far. And the question is quite clear: where to put these
> > > two special handling.
> > 
> > Apparently, you can't put it into the queue_setup(). And I still think
> > my proposal works great here.
> 
> OK, since it's indeed inappropriate to put these special handlings inside queue_setup() from semantic perspective, I'll add them according to if hw->vdev_private == NULL in the driver.

I'm thinking maybe we could rename "vdev_private" to "virtio_user_dev"
(or something like that), to make it explicit that it's for virtio user
device. I mean, there should be no other vdevs after all. OTOH, using
"hw->vdev_private != NULL" to say it's a virtio-user device is a bit
weird; it doesn't even make too much sense.

	--yliu
  

Patch

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 68068bd..13b2b75 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -60,6 +60,7 @@  SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_pci.c
 endif
 
 # this lib depends upon:
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 76250f0..bc4dc1a 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -56,4 +56,6 @@  struct virtio_user_hw {
 int virtio_user_start_device(struct virtio_user_hw *hw);
 int virtio_user_stop_device(struct virtio_user_hw *hw);
 
+const struct virtio_pci_ops vdev_ops;
+
 #endif
diff --git a/drivers/net/virtio/virtio_user/virtio_user_pci.c b/drivers/net/virtio/virtio_user/virtio_user_pci.c
new file mode 100644
index 0000000..60351d9
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/virtio_user_pci.c
@@ -0,0 +1,209 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "../virtio_logs.h"
+#include "../virtio_pci.h"
+#include "../virtqueue.h"
+#include "virtio_user_dev.h"
+
+static void
+vdev_read_dev_config(struct virtio_hw *hw, uint64_t offset,
+		     void *dst, int length)
+{
+	int i;
+	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+	if (offset == offsetof(struct virtio_net_config, mac) &&
+	    length == ETHER_ADDR_LEN) {
+		for (i = 0; i < ETHER_ADDR_LEN; ++i)
+			((uint8_t *)dst)[i] = uhw->mac_addr[i];
+		return;
+	}
+
+	if (offset == offsetof(struct virtio_net_config, status))
+		*(uint16_t *)dst = uhw->status;
+
+	if (offset == offsetof(struct virtio_net_config, max_virtqueue_pairs))
+		*(uint16_t *)dst = uhw->max_queue_pairs;
+}
+
+static void
+vdev_write_dev_config(struct virtio_hw *hw, uint64_t offset,
+		      const void *src, int length)
+{
+	int i;
+	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+	if ((offset == offsetof(struct virtio_net_config, mac)) &&
+	    (length == ETHER_ADDR_LEN))
+		for (i = 0; i < ETHER_ADDR_LEN; ++i)
+			uhw->mac_addr[i] = ((const uint8_t *)src)[i];
+	else
+		PMD_DRV_LOG(ERR, "not supported offset=%" PRIu64 ", length=%d",
+			    offset, length);
+}
+
+static void
+vdev_set_status(struct virtio_hw *hw, uint8_t status)
+{
+	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+	if (status & VIRTIO_CONFIG_STATUS_DRIVER_OK)
+		virtio_user_start_device(uhw);
+	uhw->status = status;
+}
+
+static void
+vdev_reset(struct virtio_hw *hw)
+{
+	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+	virtio_user_stop_device(uhw);
+}
+
+static uint8_t
+vdev_get_status(struct virtio_hw *hw)
+{
+	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+	return uhw->status;
+}
+
+static uint64_t
+vdev_get_features(struct virtio_hw *hw)
+{
+	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+	return uhw->features;
+}
+
+static void
+vdev_set_features(struct virtio_hw *hw, uint64_t features)
+{
+	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+	uhw->features = features;
+}
+
+static uint8_t
+vdev_get_isr(struct virtio_hw *hw __rte_unused)
+{
+	/* When config interrupt happens, driver calls this function to query
+	 * what kinds of change happen. Interrupt mode not supported for now.
+	 */
+	return 0;
+}
+
+static uint16_t
+vdev_set_config_irq(struct virtio_hw *hw __rte_unused,
+		    uint16_t vec __rte_unused)
+{
+	return VIRTIO_MSI_NO_VECTOR;
+}
+
+/* This function is to get the queue size, aka, number of descs, of a specified
+ * queue. Different with the VHOST_USER_GET_QUEUE_NUM, which is used to get the
+ * max supported queues.
+ */
+static uint16_t
+vdev_get_queue_num(struct virtio_hw *hw,
+		   uint16_t queue_id __rte_unused)
+{
+	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+	/* Currently, each queue has same queue size */
+	return uhw->queue_size;
+}
+
+static int
+vdev_setup_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
+{
+	/* Changed to use virtual addr */
+	vq->vq_ring_mem = (phys_addr_t)vq->mz->addr;
+	if (vq->virtio_net_hdr_mz) {
+		vq->virtio_net_hdr_mem =
+			(phys_addr_t)vq->virtio_net_hdr_mz->addr;
+		/* Do it one more time after we reset virtio_net_hdr_mem */
+		vring_hdr_desc_init(vq);
+	}
+	vq->offset = offsetof(struct rte_mbuf, buf_addr);
+	return 0;
+}
+
+static void
+vdev_del_queue(struct virtio_hw *hw, struct virtqueue *vq)
+{
+	/* For legacy devices, write 0 to VIRTIO_PCI_QUEUE_PFN port, QEMU
+	 * correspondingly stops the ioeventfds, and reset the status of
+	 * the device.
+	 * For modern devices, set queue desc, avail, used in PCI bar to 0,
+	 * not see any more behavior in QEMU.
+	 *
+	 * Here we just care about what information to deliver to vhost-user
+	 * or vhost-kernel. And we just close ioeventfd for now.
+	 */
+	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+	close(uhw->callfds[vq->vq_queue_index]);
+	close(uhw->kickfds[vq->vq_queue_index]);
+}
+
+static void
+vdev_notify_queue(struct virtio_hw *hw, struct virtqueue *vq)
+{
+	uint64_t buf = 1;
+	struct virtio_user_hw *uhw = (struct virtio_user_hw *)hw->vdev_private;
+
+	if (write(uhw->kickfds[vq->vq_queue_index], &buf, sizeof(buf)) < 0)
+		PMD_DRV_LOG(ERR, "failed to kick backend: %d", strerror(errno));
+}
+
+const struct virtio_pci_ops vdev_ops = {
+	.read_dev_cfg	= vdev_read_dev_config,
+	.write_dev_cfg	= vdev_write_dev_config,
+	.reset		= vdev_reset,
+	.get_status	= vdev_get_status,
+	.set_status	= vdev_set_status,
+	.get_features	= vdev_get_features,
+	.set_features	= vdev_set_features,
+	.get_isr	= vdev_get_isr,
+	.set_config_irq	= vdev_set_config_irq,
+	.get_queue_num	= vdev_get_queue_num,
+	.setup_queue	= vdev_setup_queue,
+	.del_queue	= vdev_del_queue,
+	.notify_queue	= vdev_notify_queue,
+};