[dpdk-dev,v4] drivers/net:new PMD using tun/tap host interface

Message ID 1475592311-25749-1-git-send-email-keith.wiles@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Bruce Richardson
Headers

Commit Message

Wiles, Keith Oct. 4, 2016, 2:45 p.m. UTC
  The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
on the local host. The PMD allows for DPDK and the host to
communicate using a raw device interface on the host and in
the DPDK application. The device created is a Tap device with
a L2 packet header.

v4 - merge with latest driver changes
v3 - fix includes by removing ifdef for other type besides Linux.
     Fix the copyright notice in the Makefile
v2 - merge all of the patches into one patch.
     Fix a typo on naming the tap device.
     Update the maintainers list

Signed-off-by: Keith Wiles <keith.wiles@intel.com>
---
 MAINTAINERS                             |   5 +
 config/common_linuxapp                  |   2 +
 doc/guides/nics/tap.rst                 |  84 ++++
 drivers/net/Makefile                    |   1 +
 drivers/net/tap/Makefile                |  57 +++
 drivers/net/tap/rte_eth_tap.c           | 866 ++++++++++++++++++++++++++++++++
 drivers/net/tap/rte_pmd_tap_version.map |   4 +
 mk/rte.app.mk                           |   1 +
 8 files changed, 1020 insertions(+)
 create mode 100644 doc/guides/nics/tap.rst
 create mode 100644 drivers/net/tap/Makefile
 create mode 100644 drivers/net/tap/rte_eth_tap.c
 create mode 100644 drivers/net/tap/rte_pmd_tap_version.map
  

Comments

Ferruh Yigit Oct. 11, 2016, 9:40 a.m. UTC | #1
Hi Keith,

On 10/4/2016 3:45 PM, Keith Wiles wrote:
> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
> on the local host. The PMD allows for DPDK and the host to
> communicate using a raw device interface on the host and in
> the DPDK application. The device created is a Tap device with
> a L2 packet header.
> 
> v4 - merge with latest driver changes
> v3 - fix includes by removing ifdef for other type besides Linux.
>      Fix the copyright notice in the Makefile
> v2 - merge all of the patches into one patch.
>      Fix a typo on naming the tap device.
>      Update the maintainers list
> 
> Signed-off-by: Keith Wiles <keith.wiles@intel.com>
> ---
>  MAINTAINERS                             |   5 +
>  config/common_linuxapp                  |   2 +
>  doc/guides/nics/tap.rst                 |  84 ++++
>  drivers/net/Makefile                    |   1 +
>  drivers/net/tap/Makefile                |  57 +++
>  drivers/net/tap/rte_eth_tap.c           | 866 ++++++++++++++++++++++++++++++++
>  drivers/net/tap/rte_pmd_tap_version.map |   4 +
>  mk/rte.app.mk                           |   1 +
>  8 files changed, 1020 insertions(+)
>  create mode 100644 doc/guides/nics/tap.rst
>  create mode 100644 drivers/net/tap/Makefile
>  create mode 100644 drivers/net/tap/rte_eth_tap.c
>  create mode 100644 drivers/net/tap/rte_pmd_tap_version.map

This patch needs to be rebased on top of latest next-net, .init &
.uninit are no more used.

Also patch gives a set of checkpatch warnings, fyi.

Thanks,
ferruh
  
Michał Mirosław Oct. 11, 2016, 11:30 a.m. UTC | #2
2016-10-04 16:45 GMT+02:00, Keith Wiles <keith.wiles@intel.com>:
> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
> on the local host. The PMD allows for DPDK and the host to
> communicate using a raw device interface on the host and in
> the DPDK application. The device created is a Tap device with
> a L2 packet header.
[...]
> +static uint16_t
> +pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
> +{
> +	int len, n;
> +	struct rte_mbuf *mbuf;
> +	struct rx_queue *rxq = queue;
> +	struct pollfd pfd;
> +	uint16_t num_rx;
> +	unsigned long num_rx_bytes = 0;
> +
> +	pfd.events = POLLIN;
> +	pfd.fd = rxq->fd;
> +	for (num_rx = 0; num_rx < nb_pkts; ) {
> +		n = poll(&pfd, 1, 0);
> +
> +		if (n <= 0)
> +			break;
> +

Considering that syscalls are rather expensive, it would be cheaper to
allocate an mbuf here and free it when read() returns -1 instead of
calling poll() to check whether a packet is waiting. This way you
save a syscall per packet and replace one syscall with one mbuf free
per poll.

Best Regards,
Michał Mirosław
  
Ferruh Yigit Oct. 11, 2016, 11:49 a.m. UTC | #3
On 10/4/2016 3:45 PM, Keith Wiles wrote:
> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
> on the local host. The PMD allows for DPDK and the host to
> communicate using a raw device interface on the host and in
> the DPDK application. The device created is a Tap device with
> a L2 packet header.
> 
> v4 - merge with latest driver changes
> v3 - fix includes by removing ifdef for other type besides Linux.
>      Fix the copyright notice in the Makefile
> v2 - merge all of the patches into one patch.
>      Fix a typo on naming the tap device.
>      Update the maintainers list
> 
> Signed-off-by: Keith Wiles <keith.wiles@intel.com>
> ---
>  MAINTAINERS                             |   5 +
>  config/common_linuxapp                  |   2 +
>  doc/guides/nics/tap.rst                 |  84 ++++
>  drivers/net/Makefile                    |   1 +
>  drivers/net/tap/Makefile                |  57 +++
>  drivers/net/tap/rte_eth_tap.c           | 866 ++++++++++++++++++++++++++++++++
>  drivers/net/tap/rte_pmd_tap_version.map |   4 +
>  mk/rte.app.mk                           |   1 +
>  8 files changed, 1020 insertions(+)
>  create mode 100644 doc/guides/nics/tap.rst
>  create mode 100644 drivers/net/tap/Makefile
>  create mode 100644 drivers/net/tap/rte_eth_tap.c
>  create mode 100644 drivers/net/tap/rte_pmd_tap_version.map
> 
<>
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 2483dfa..59a2053 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -44,3 +44,5 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y
>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>  CONFIG_RTE_LIBRTE_POWER=y
>  CONFIG_RTE_VIRTIO_USER=y
> +CONFIG_RTE_LIBRTE_PMD_TAP=y

According existing config items, a default value of a config option
should go to config/common_base, and environment specific config file
overwrites it if required.
So this option needs to be added into config/common_base too as disabled
by default.

> +CONFIG_RTE_PMD_TAP_MAX_QUEUES=32

Is the number of max queues really needs to be a config option, I assume
in normal use case user won't need to update this and will use single
queue, if that is true what about pushing this into source code to not
make config file more complex?

> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst

<...>

> +.. code-block:: console
> +
> +   The interfaced name can be changed by adding the iface=foo0
> +   e.g. --vedv=eth_tap,iface=foo0 --vdev=eth_tap,iface=foo1, ...

s/vedv/vdev
eth_tap needs to be net_tap as part of unifying device names work

<...>

> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index bc93230..b4afa98 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -55,6 +55,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx
>  DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
>  DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap

Rest of the PMDs sorted alphabetically, please do same.

>  
>  ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost

<...>

> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c

<...>

> +
> +static const char *drivername = "Tap PMD";
> +static int tap_unit = 0;

No need to initialize to zero.

<...>

> +
> +struct pmd_internals {
> +	char name[RTE_ETH_NAME_MAX_LEN];	/* Internal Tap device name */
> +	uint16_t nb_queues;			/* Number of queues supported */
> +	uint16_t pad0;

Why this padding? Is it reserved?

> +	struct ether_addr eth_addr;	/* Mac address of the device port */
> +
> +	int if_index;			/* IF_INDEX for the port */
> +	int fds[RTE_PMD_TAP_MAX_QUEUES]; /* List of all file descriptors */
> +
> +	struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES];	/* List of RX queues */
> +	struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];	/* List of TX queues */
> +};
> +
> +/*
> + * Tun/Tap allocation routine
> + *
> + * name is the number of the interface to use, unless NULL to take the host
> + * supplied name.
> + */
> +static int
> +tun_alloc(char * name)

char *name

<...>

> +
> +	/* Always set the fiile descriptor to non-blocking */

s/fiile/file

> +	if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0) {
> +		RTE_LOG(ERR, PMD, "Unable to set to nonblocking\n");
> +		perror("F_SETFL, NONBLOCK");
> +		goto error;
> +	}
> +
> +	/* If the name is different that new name as default */
> +	if (name && strcmp(name, ifr.ifr_name))
> +		strcpy(name, ifr.ifr_name);
What about more secure copy?

> +
> +	return fd;
> +
> +error:
> +	if (fd > 0)
> +		close(fd);
> +	return -1;
> +}
> +

<...>

> +
> +static void
> +tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
> +{
> +	struct pmd_internals *internals = dev->data->dev_private;
> +
> +	dev_info->driver_name = drivername;
> +	dev_info->if_index = internals->if_index;
> +	dev_info->max_mac_addrs = 1;
> +	dev_info->max_rx_pktlen = (uint32_t)ETHER_MAX_VLAN_FRAME_LEN;
> +	dev_info->max_rx_queues = (uint16_t)internals->nb_queues;
> +	dev_info->max_tx_queues = (uint16_t)internals->nb_queues;
casting to uint16_t is not requires, it is already uint16_t.

> +	dev_info->min_rx_bufsize = 0;
> +	dev_info->pci_dev = NULL;
> +}
> +
> +static void
> +tap_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
igb_stats?

> +{
> +	unsigned i, imax;
> +	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
> +	unsigned long rx_bytes_total = 0, tx_bytes_total = 0;
> +	const struct pmd_internals *internal = dev->data->dev_private;
> +
> +	imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS) ?
> +		internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS;
> +
> +	for (i = 0; i < imax; i++) {
> +		igb_stats->q_ipackets[i] = internal->rxq[i].stats.ipackets;
> +		igb_stats->q_ibytes[i] = internal->rxq[i].stats.ibytes;
> +		rx_total += igb_stats->q_ipackets[i];
> +		rx_bytes_total += igb_stats->q_ibytes[i];
> +	}
> +
> +	imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS) ?
> +		internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS;
Do we need to duplicate imax calculation?


> +
> +	for (i = 0; i < imax; i++) {
> +		igb_stats->q_opackets[i] = internal->txq[i].stats.opackets;
> +		igb_stats->q_errors[i] = internal->txq[i].stats.errs;
> +		igb_stats->q_obytes[i] = internal->txq[i].stats.obytes;
> +		tx_total += igb_stats->q_opackets[i];
> +		tx_err_total += igb_stats->q_errors[i];
> +		tx_bytes_total += igb_stats->q_obytes[i];
> +	}
> +
> +	igb_stats->ipackets = rx_total;
> +	igb_stats->ibytes = rx_bytes_total;
> +	igb_stats->opackets = tx_total;
> +	igb_stats->oerrors = tx_err_total;
> +	igb_stats->obytes = tx_bytes_total;
> +}
> +

<...>

> +
> +static int
> +rte_eth_dev_create(const char *name,
> +		   struct rte_eth_dev **eth_dev,
> +		   const struct eth_dev_ops *dev_ops,
> +		   void **internals, size_t internal_size,
> +		   uint16_t flag)
> +{
> +	char buff[RTE_ETH_NAME_MAX_LEN];
> +	int numa_node = rte_socket_id();
> +	struct rte_eth_dev *dev = NULL;
> +	struct rte_eth_dev_data *data = NULL;
> +	void *priv = NULL;
> +
> +	if ((name == NULL) || (eth_dev == NULL) || (dev_ops == NULL) ||
> +	    (internals == NULL) || (internal_size == 0)) {
> +		RTE_PMD_DEBUG_TRACE("Paramters are invalid\n");
> +		return -1;
> +	}
> +
> +	dev = rte_eth_dev_allocate(name);
> +	if (dev == NULL) {
> +		RTE_PMD_DEBUG_TRACE("%s: rte_eth_dev_allocate failed for %s\n",
> +				    name, buff);
> +		goto error;
> +	}
> +
> +	if (flag & RTE_USE_PRIVATE_DATA) {

You may need to save this flag value somewhere in internals, to decide
how to free data later.

> +		/*
> +		 * now do all data allocation - for eth_dev structure, dummy
> +		 * pci driver and internal (private) data
> +		 */
> +		snprintf(buff, sizeof(buff), "D-%s-%d", name, numa_node);
> +		data = rte_zmalloc_socket(buff, sizeof(struct rte_eth_dev_data),
> +					  0, numa_node);
> +		if (data == NULL) {
> +			RTE_PMD_DEBUG_TRACE("%s: Unable to allocate memory\n",
> +					    name);
> +			goto error;
> +		}
> +		/* move the current state of the structure to the new one */
> +		rte_memcpy(data, dev->data, sizeof(struct rte_eth_dev_data));
Why do we need to copy, trying to preserve which data?

> +		dev->data = data;	/* Override the current data pointer */
> +	} else
> +		data = dev->data;
> +
> +	snprintf(buff, sizeof(buff), "I-%s-%d", name, numa_node);
> +	priv = rte_zmalloc_socket(buff, internal_size, 0, numa_node);
> +	if (priv == NULL) {
> +		RTE_PMD_DEBUG_TRACE("Unable to allocate internal memory %lu\n",
> +				    internal_size);
> +		goto error;
> +	}
> +
> +	/* Setup some default values */
> +	dev->dev_ops = dev_ops;
> +	data->dev_private = priv;

> +	data->port_id = dev->data->port_id;
> +	memmove(data->name, dev->data->name, strlen(dev->data->name));
These two assignments are useless, needs to be done before "dev->data =
data" assignment.

> +
> +	dev->driver = NULL;
> +	data->dev_flags = RTE_ETH_DEV_DETACHABLE;
> +	data->kdrv = RTE_KDRV_NONE;
> +	data->numa_node = numa_node;
> +
> +	*eth_dev = dev;
> +	*internals = priv;
> +
> +	return 0;
> +error:
> +	rte_free(priv);
> +
> +	if (flag & RTE_USE_PRIVATE_DATA)
> +		rte_free(data);
> +
> +	rte_eth_dev_release_port(dev);
> +
> +	return -1;
> +}
> +
> +static int
> +pmd_init_internals(const char *name, struct tap_info *tap,
> +		   struct pmd_internals **internals,
> +		   struct rte_eth_dev **eth_dev)
> +{
> +	struct rte_eth_dev *dev = NULL;
> +	struct pmd_internals *internal = NULL;
> +	struct rte_eth_dev_data *data = NULL;
> +	int ret, i, fd = -1;
> +
> +	RTE_LOG(INFO, PMD,
> +		"%s: Create TUN/TAP Ethernet device with %d queues on numa %u\n",
> +		name, RTE_PMD_TAP_MAX_QUEUES, rte_socket_id());
> +
> +	pmd_link.link_speed = tap->speed;
> +
> +	ret = rte_eth_dev_create(tap->name, &dev, &ops,
> +				 (void **)&internal, sizeof(struct pmd_internals),
Why rte_eth_dev_create() get "void **internals" which requires casting,
but not "struct pmd_internals **internals" ?

> +				 RTE_USE_PRIVATE_DATA);
> +	if (ret < 0)
> +		return -1;
> +
> +	strncpy(internal->name, tap->name, sizeof(internal->name));
> +
> +	internal->nb_queues = RTE_PMD_TAP_MAX_QUEUES;
> +
> +	/* Create the first Tap device */
> +	if ((fd = tun_alloc(dev->data->name)) < 0) {
> +		RTE_LOG(ERR, PMD, "tun_alloc(%s) failed\n", dev->data->name);
> +		rte_free(internal);
rte_free(dev->data); ?
But needs to check RTE_USE_PRIVATE_DATA ..

> +		rte_eth_dev_release_port(dev);
> +		return -1;
> +	}
> +
> +	/* Presetup the fds to -1 as being not working */
> +	for(i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
> +		internal->fds[i] = -1;
> +		internal->rxq[i].fd = -1;
> +		internal->txq[i].fd = -1;
> +	}
> +
> +	/* Take the TUN/TAP fd and place in the first location */
> +	internal->rxq[0].fd = fd;
> +	internal->txq[0].fd = fd;
> +	internal->fds[0] = fd;
> +
> +	if (pmd_mac_address(fd, dev, &internal->eth_addr) < 0) {
> +		rte_free(internal);
rte_free(dev->data); ?

> +		rte_eth_dev_release_port(dev);
> +		return -1;
> +	}
> +
> +	data = dev->data;
> +
> +	data->dev_link = pmd_link;
> +	data->mac_addrs = &internal->eth_addr;
> +
> +	data->nb_rx_queues = (uint16_t)internal->nb_queues;
> +	data->nb_tx_queues = (uint16_t)internal->nb_queues;
no cast required.

> +	data->drv_name = drivername;
> +
> +	*eth_dev = dev;
> +	*internals = internal;
> +
> +	return 0;
> +}
> +

<...>

> +
> +static int
> +set_interface_speed(const char *key __rte_unused,
> +		    const char *value,
> +		    void *extra_args __rte_unused)
need to drop  __rte_unused for extra_args

> +{
> +	struct tap_info *tap = (struct tap_info *)extra_args;
> +
> +	pmd_link.link_speed = (value) ? atoi(value) : ETH_SPEED_NUM_10G;
> +	tap->speed = pmd_link.link_speed;
> +
> +	return 0;
> +}
> +
> +/*
> + * Open a TAP interface device.
> + */
> +static int
> +rte_pmd_tap_devinit(const char *name, const char *params)
> +{
> +	int ret = 0;
> +	struct rte_kvargs *kvlist;
> +	struct tap_info tap_info;
> +
> +	/* Setup default values */
> +	memset(&tap_info, 0, sizeof(tap_info));
> +
> +	tap_info.speed = ETH_SPEED_NUM_10G;
> +	snprintf(tap_info.name, sizeof(tap_info.name), "dtap%d", tap_unit++);
What about extracting iface name "dtap" into a macro to make it more
visible.

> +
> +	if ((params == NULL) || (params[0] == '\0')) {
> +		RTE_LOG(INFO, PMD, "Initializing pmd_tap for %s\n", name);
> +
> +		ret = eth_dev_tap_create(name, &tap_info);
This "name" is not used at all (except from RTE_LOG), instead tap->name
is used for interface name, so why carying this variable around?

> +		goto leave;
> +	}
> +
> +	RTE_LOG(INFO, PMD, "Initialize %s with params (%s)\n", name, params);
> +
> +	kvlist = rte_kvargs_parse(params, valid_arguments);
> +	if (!kvlist) {
> +		ret = eth_dev_tap_create(name, &tap_info);
> +		goto leave;
> +	}
> +
> +	if (rte_kvargs_count(kvlist, ETH_TAP_SPEED_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_TAP_SPEED_ARG,
> +					 &set_interface_speed, &tap_info);
> +		if (ret < 0)
> +			goto leave;
> +	} else
> +		set_interface_speed(NULL, NULL, &tap_info);
This call is redundant, tap_info already has default speed value set.

> +
> +	if (rte_kvargs_count(kvlist, ETH_TAP_IFACE_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_TAP_IFACE_ARG,
> +					 &set_interface_name, &tap_info);
> +		if (ret < 0)
> +			goto leave;
> +	} else
> +		set_interface_name(NULL, NULL, (void *)&tap_info);
tap_info->name already set to default value (dtap%d), this call is not
required.

> +
> +	rte_kvargs_free(kvlist);
> +
> +leave:
> +	if (ret == -1)
> +		RTE_LOG(INFO, PMD, "Failed to create pmd_tap for %s\n", name);
> +
> +	return ret;
> +}
> +
> +/*
> + * detach a TAP device.
> + */
> +static int
> +rte_pmd_tap_devuninit(const char *name)
> +{
> +	struct rte_eth_dev *eth_dev = NULL;
> +	struct pmd_internals *internals;
> +	int i;
> +
> +	RTE_LOG(INFO, PMD, "Closing TUN/TAP Ethernet device on numa %u\n",
> +		rte_socket_id());
> +
> +	if (name == NULL)
This check is redundant, eal layer won't call this function with "name
== NULL"

> +		return 0;
> +
> +	/* find the ethdev entry */
> +	eth_dev = rte_eth_dev_allocated(name);
> +	if (eth_dev == NULL)
> +		return 0;
> +
> +	internals = eth_dev->data->dev_private;
> +	for (i = 0; i < internals->nb_queues; i++)
> +		if (internals->fds[i] != -1)
> +			close(internals->fds[i]);
> +
> +	rte_free(eth_dev->data->dev_private);
> +	rte_free(eth_dev->data);
data can be shared?
Don't we need a RTE_USE_PRIVATE_DATA flag check?

> +
> +	rte_eth_dev_release_port(eth_dev);
> +
> +	return 0;
> +}
> +
> +static struct rte_vdev_driver pmd_tap_drv = {
> +	.init = rte_pmd_tap_devinit,
> +	.uninit = rte_pmd_tap_devuninit,
> +};
> +
> +DRIVER_REGISTER_VDEV(eth_tap, pmd_tap_drv);
name convention is now: "net_tap"

> +DRIVER_REGISTER_PARAM_STRING(eth_tap,
> +			     "iface=<string>,speed=N");
> diff --git a/drivers/net/tap/rte_pmd_tap_version.map b/drivers/net/tap/rte_pmd_tap_version.map
> new file mode 100644
> index 0000000..61463bf
> --- /dev/null
> +++ b/drivers/net/tap/rte_pmd_tap_version.map
> @@ -0,0 +1,4 @@
> +DPDK_16.11 {
> +
> +	local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 1a0095b..bd1d10f 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -129,6 +129,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
>  endif # $(CONFIG_RTE_LIBRTE_VHOST)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)    += -lrte_pmd_vmxnet3_uio
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_TAP)        += -lrte_pmd_tap
please put in alphebetical order

>  
>  ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
>
  
Ferruh Yigit Oct. 11, 2016, 12:28 p.m. UTC | #4
On 10/4/2016 3:45 PM, Keith Wiles wrote:
> +/*
> + * Open a TAP interface device.
> + */
> +static int
> +rte_pmd_tap_devinit(const char *name, const char *params)
> +{
> +	int ret = 0;
> +	struct rte_kvargs *kvlist;
> +	struct tap_info tap_info;
> +
> +	/* Setup default values */
> +	memset(&tap_info, 0, sizeof(tap_info));
> +
> +	tap_info.speed = ETH_SPEED_NUM_10G;
> +	snprintf(tap_info.name, sizeof(tap_info.name), "dtap%d", tap_unit++);
> +
> +	if ((params == NULL) || (params[0] == '\0')) {
> +		RTE_LOG(INFO, PMD, "Initializing pmd_tap for %s\n", name);
> +
> +		ret = eth_dev_tap_create(name, &tap_info);
> +		goto leave;
> +	}
> +
> +	RTE_LOG(INFO, PMD, "Initialize %s with params (%s)\n", name, params);
> +
> +	kvlist = rte_kvargs_parse(params, valid_arguments);
> +	if (!kvlist) {
> +		ret = eth_dev_tap_create(name, &tap_info);
> +		goto leave;
> +	}
> +
> +	if (rte_kvargs_count(kvlist, ETH_TAP_SPEED_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_TAP_SPEED_ARG,
> +					 &set_interface_speed, &tap_info);
> +		if (ret < 0)
> +			goto leave;
> +	} else
> +		set_interface_speed(NULL, NULL, &tap_info);
> +
> +	if (rte_kvargs_count(kvlist, ETH_TAP_IFACE_ARG) == 1) {
> +		ret = rte_kvargs_process(kvlist, ETH_TAP_IFACE_ARG,
> +					 &set_interface_name, &tap_info);
> +		if (ret < 0)
> +			goto leave;
> +	} else
> +		set_interface_name(NULL, NULL, (void *)&tap_info);

Also there must be a eth_dev_tap_create() call after this point to use
tap_info struct with custom values, right?
"--vdev eth_tap0,iface=foo0" parameter shouldn't be working with this
code, right?

> +
> +	rte_kvargs_free(kvlist);
> +
> +leave:
> +	if (ret == -1)
> +		RTE_LOG(INFO, PMD, "Failed to create pmd_tap for %s\n", name);
> +
> +	return ret;
> +}
  
Wiles, Keith Oct. 11, 2016, 8:56 p.m. UTC | #5
Regards,
Keith

> On Oct 11, 2016, at 6:30 AM, Michał Mirosław <mirqus@gmail.com> wrote:

> 

> 2016-10-04 16:45 GMT+02:00, Keith Wiles <keith.wiles@intel.com>:

>> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces

>> on the local host. The PMD allows for DPDK and the host to

>> communicate using a raw device interface on the host and in

>> the DPDK application. The device created is a Tap device with

>> a L2 packet header.

> [...]

>> +static uint16_t

>> +pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)

>> +{

>> +	int len, n;

>> +	struct rte_mbuf *mbuf;

>> +	struct rx_queue *rxq = queue;

>> +	struct pollfd pfd;

>> +	uint16_t num_rx;

>> +	unsigned long num_rx_bytes = 0;

>> +

>> +	pfd.events = POLLIN;

>> +	pfd.fd = rxq->fd;

>> +	for (num_rx = 0; num_rx < nb_pkts; ) {

>> +		n = poll(&pfd, 1, 0);

>> +

>> +		if (n <= 0)

>> +			break;

>> +

> 

> Considering that syscalls are rather expensive, it would be cheaper to

> allocate an mbuf here and free it when read() returns -1 instead of

> calling poll() to check whether a packet is waiting. This way you

> save a syscall per packet and replace one syscall with one mbuf free

> per poll.


I made this change, but saw no performance difference in the two methods. Removing poll seems reasonable as it is simpler. TAP is already so slow is why the performance did not change is my guess. Anyone wanting to use TAP as a high performance interface is going to be surprised. I believe the best use case for the TAP interface is for control or exception path.

> 

> Best Regards,

> Michał Mirosław
  
Wiles, Keith Oct. 11, 2016, 8:57 p.m. UTC | #6
Regards,
Keith

> On Oct 11, 2016, at 7:28 AM, Yigit, Ferruh <ferruh.yigit@intel.com> wrote:
> 
> On 10/4/2016 3:45 PM, Keith Wiles wrote:
>> +/*
>> + * Open a TAP interface device.
>> + */
>> +static int
>> +rte_pmd_tap_devinit(const char *name, const char *params)
>> +{
>> +	int ret = 0;
>> +	struct rte_kvargs *kvlist;
>> +	struct tap_info tap_info;
>> +
>> +	/* Setup default values */
>> +	memset(&tap_info, 0, sizeof(tap_info));
>> +
>> +	tap_info.speed = ETH_SPEED_NUM_10G;
>> +	snprintf(tap_info.name, sizeof(tap_info.name), "dtap%d", tap_unit++);
>> +
>> +	if ((params == NULL) || (params[0] == '\0')) {
>> +		RTE_LOG(INFO, PMD, "Initializing pmd_tap for %s\n", name);
>> +
>> +		ret = eth_dev_tap_create(name, &tap_info);
>> +		goto leave;
>> +	}
>> +
>> +	RTE_LOG(INFO, PMD, "Initialize %s with params (%s)\n", name, params);
>> +
>> +	kvlist = rte_kvargs_parse(params, valid_arguments);
>> +	if (!kvlist) {
>> +		ret = eth_dev_tap_create(name, &tap_info);
>> +		goto leave;
>> +	}
>> +
>> +	if (rte_kvargs_count(kvlist, ETH_TAP_SPEED_ARG) == 1) {
>> +		ret = rte_kvargs_process(kvlist, ETH_TAP_SPEED_ARG,
>> +					 &set_interface_speed, &tap_info);
>> +		if (ret < 0)
>> +			goto leave;
>> +	} else
>> +		set_interface_speed(NULL, NULL, &tap_info);
>> +
>> +	if (rte_kvargs_count(kvlist, ETH_TAP_IFACE_ARG) == 1) {
>> +		ret = rte_kvargs_process(kvlist, ETH_TAP_IFACE_ARG,
>> +					 &set_interface_name, &tap_info);
>> +		if (ret < 0)
>> +			goto leave;
>> +	} else
>> +		set_interface_name(NULL, NULL, (void *)&tap_info);
> 
> Also there must be a eth_dev_tap_create() call after this point to use
> tap_info struct with custom values, right?
> "--vdev eth_tap0,iface=foo0" parameter shouldn't be working with this
> code, right?

Removed the extra code.

> 
>> +
>> +	rte_kvargs_free(kvlist);
>> +
>> +leave:
>> +	if (ret == -1)
>> +		RTE_LOG(INFO, PMD, "Failed to create pmd_tap for %s\n", name);
>> +
>> +	return ret;
>> +}
>
  
Wiles, Keith Oct. 11, 2016, 9:07 p.m. UTC | #7
Regards,
Keith

> On Oct 11, 2016, at 6:49 AM, Yigit, Ferruh <ferruh.yigit@intel.com> wrote:

> 

> On 10/4/2016 3:45 PM, Keith Wiles wrote:

>> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces

>> on the local host. The PMD allows for DPDK and the host to

>> communicate using a raw device interface on the host and in

>> the DPDK application. The device created is a Tap device with

>> a L2 packet header.

>> 


Will try to ship out a v5 soon.
>> v4 - merge with latest driver changes

>> v3 - fix includes by removing ifdef for other type besides Linux.

>>     Fix the copyright notice in the Makefile

>> v2 - merge all of the patches into one patch.

>>     Fix a typo on naming the tap device.

>>     Update the maintainers list

>> 

>> Signed-off-by: Keith Wiles <keith.wiles@intel.com>

>> ---

>> MAINTAINERS                             |   5 +

>> config/common_linuxapp                  |   2 +

>> doc/guides/nics/tap.rst                 |  84 ++++

>> drivers/net/Makefile                    |   1 +

>> drivers/net/tap/Makefile                |  57 +++

>> drivers/net/tap/rte_eth_tap.c           | 866 ++++++++++++++++++++++++++++++++

>> drivers/net/tap/rte_pmd_tap_version.map |   4 +

>> mk/rte.app.mk                           |   1 +

>> 8 files changed, 1020 insertions(+)

>> create mode 100644 doc/guides/nics/tap.rst

>> create mode 100644 drivers/net/tap/Makefile

>> create mode 100644 drivers/net/tap/rte_eth_tap.c

>> create mode 100644 drivers/net/tap/rte_pmd_tap_version.map

>> 

> <>

>> diff --git a/config/common_linuxapp b/config/common_linuxapp

>> index 2483dfa..59a2053 100644

>> --- a/config/common_linuxapp

>> +++ b/config/common_linuxapp

>> @@ -44,3 +44,5 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y

>> CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y

>> CONFIG_RTE_LIBRTE_POWER=y

>> CONFIG_RTE_VIRTIO_USER=y

>> +CONFIG_RTE_LIBRTE_PMD_TAP=y

> 

> According existing config items, a default value of a config option

> should go to config/common_base, and environment specific config file

> overwrites it if required.

> So this option needs to be added into config/common_base too as disabled

> by default.


Add the define to common_base as no, plus a comment for Linux only.

> 

>> +CONFIG_RTE_PMD_TAP_MAX_QUEUES=32


Moved this to the .c file as a define.
> 

> Is the number of max queues really needs to be a config option, I assume

> in normal use case user won't need to update this and will use single

> queue, if that is true what about pushing this into source code to not

> make config file more complex?

> 

>> diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst

> 

> <...>

> 

>> +.. code-block:: console

>> +

>> +   The interfaced name can be changed by adding the iface=foo0

>> +   e.g. --vedv=eth_tap,iface=foo0 --vdev=eth_tap,iface=foo1, ...

> 

> s/vedv/vdev

> eth_tap needs to be net_tap as part of unifying device names work


Fixed.
> 

> <...>

> 

>> diff --git a/drivers/net/Makefile b/drivers/net/Makefile

>> index bc93230..b4afa98 100644

>> --- a/drivers/net/Makefile

>> +++ b/drivers/net/Makefile

>> @@ -55,6 +55,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx

>> DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio

>> DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3

>> DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt

>> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap

> 

> Rest of the PMDs sorted alphabetically, please do same.


Done.
> 

>> 

>> ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)

>> DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost

> 

> <...>

> 

>> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c

> 

> <...>

> 

>> +

>> +static const char *drivername = "Tap PMD";

>> +static int tap_unit = 0;

> 

> No need to initialize to zero.


Fixed
> 

> <...>

> 

>> +

>> +struct pmd_internals {

>> +	char name[RTE_ETH_NAME_MAX_LEN];	/* Internal Tap device name */

>> +	uint16_t nb_queues;			/* Number of queues supported */

>> +	uint16_t pad0;

> 

> Why this padding? Is it reserved?


Removed pad0. I just like to know about gaps in the structures is the reason.
> 

>> +	struct ether_addr eth_addr;	/* Mac address of the device port */

>> +

>> +	int if_index;			/* IF_INDEX for the port */

>> +	int fds[RTE_PMD_TAP_MAX_QUEUES]; /* List of all file descriptors */

>> +

>> +	struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES];	/* List of RX queues */

>> +	struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];	/* List of TX queues */

>> +};

>> +

>> +/*

>> + * Tun/Tap allocation routine

>> + *

>> + * name is the number of the interface to use, unless NULL to take the host

>> + * supplied name.

>> + */

>> +static int

>> +tun_alloc(char * name)

> 

> char *name


Fixed.
> 

> <...>

> 

>> +

>> +	/* Always set the fiile descriptor to non-blocking */

> 

> s/fiile/file

> 

>> +	if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0) {

>> +		RTE_LOG(ERR, PMD, "Unable to set to nonblocking\n");

>> +		perror("F_SETFL, NONBLOCK");

>> +		goto error;

>> +	}

>> +

>> +	/* If the name is different that new name as default */

>> +	if (name && strcmp(name, ifr.ifr_name))

>> +		strcpy(name, ifr.ifr_name);

> What about more secure copy?


Changed to be more secure.
> 

>> +

>> +	return fd;

>> +

>> +error:

>> +	if (fd > 0)

>> +		close(fd);

>> +	return -1;

>> +}

>> +

> 

> <...>

> 

>> +

>> +static void

>> +tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)

>> +{

>> +	struct pmd_internals *internals = dev->data->dev_private;

>> +

>> +	dev_info->driver_name = drivername;

>> +	dev_info->if_index = internals->if_index;

>> +	dev_info->max_mac_addrs = 1;

>> +	dev_info->max_rx_pktlen = (uint32_t)ETHER_MAX_VLAN_FRAME_LEN;

>> +	dev_info->max_rx_queues = (uint16_t)internals->nb_queues;

>> +	dev_info->max_tx_queues = (uint16_t)internals->nb_queues;

> casting to uint16_t is not requires, it is already uint16_t.


Removed
> 

>> +	dev_info->min_rx_bufsize = 0;

>> +	dev_info->pci_dev = NULL;

>> +}

>> +

>> +static void

>> +tap_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)

> igb_stats?

> 

>> +{

>> +	unsigned i, imax;

>> +	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;

>> +	unsigned long rx_bytes_total = 0, tx_bytes_total = 0;

>> +	const struct pmd_internals *internal = dev->data->dev_private;

>> +

>> +	imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS) ?

>> +		internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS;

>> +

>> +	for (i = 0; i < imax; i++) {

>> +		igb_stats->q_ipackets[i] = internal->rxq[i].stats.ipackets;

>> +		igb_stats->q_ibytes[i] = internal->rxq[i].stats.ibytes;

>> +		rx_total += igb_stats->q_ipackets[i];

>> +		rx_bytes_total += igb_stats->q_ibytes[i];

>> +	}

>> +

>> +	imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS) ?

>> +		internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS;

> Do we need to duplicate imax calculation?


Removed
> 

> 

>> +

>> +	for (i = 0; i < imax; i++) {

>> +		igb_stats->q_opackets[i] = internal->txq[i].stats.opackets;

>> +		igb_stats->q_errors[i] = internal->txq[i].stats.errs;

>> +		igb_stats->q_obytes[i] = internal->txq[i].stats.obytes;

>> +		tx_total += igb_stats->q_opackets[i];

>> +		tx_err_total += igb_stats->q_errors[i];

>> +		tx_bytes_total += igb_stats->q_obytes[i];

>> +	}

>> +

>> +	igb_stats->ipackets = rx_total;

>> +	igb_stats->ibytes = rx_bytes_total;

>> +	igb_stats->opackets = tx_total;

>> +	igb_stats->oerrors = tx_err_total;

>> +	igb_stats->obytes = tx_bytes_total;

>> +}

>> +

> 

> <...>

> 

>> +

>> +static int

>> +rte_eth_dev_create(const char *name,

>> +		   struct rte_eth_dev **eth_dev,

>> +		   const struct eth_dev_ops *dev_ops,

>> +		   void **internals, size_t internal_size,

>> +		   uint16_t flag)

>> +{

>> +	char buff[RTE_ETH_NAME_MAX_LEN];

>> +	int numa_node = rte_socket_id();

>> +	struct rte_eth_dev *dev = NULL;

>> +	struct rte_eth_dev_data *data = NULL;

>> +	void *priv = NULL;

>> +

>> +	if ((name == NULL) || (eth_dev == NULL) || (dev_ops == NULL) ||

>> +	    (internals == NULL) || (internal_size == 0)) {

>> +		RTE_PMD_DEBUG_TRACE("Paramters are invalid\n");

>> +		return -1;

>> +	}

>> +

>> +	dev = rte_eth_dev_allocate(name);

>> +	if (dev == NULL) {

>> +		RTE_PMD_DEBUG_TRACE("%s: rte_eth_dev_allocate failed for %s\n",

>> +				    name, buff);

>> +		goto error;

>> +	}

>> +

>> +	if (flag & RTE_USE_PRIVATE_DATA) {

> 

> You may need to save this flag value somewhere in internals, to decide

> how to free data later.


Let me look into this one more and see if it is required at all.
> 

>> +		/*

>> +		 * now do all data allocation - for eth_dev structure, dummy

>> +		 * pci driver and internal (private) data

>> +		 */

>> +		snprintf(buff, sizeof(buff), "D-%s-%d", name, numa_node);

>> +		data = rte_zmalloc_socket(buff, sizeof(struct rte_eth_dev_data),

>> +					  0, numa_node);

>> +		if (data == NULL) {

>> +			RTE_PMD_DEBUG_TRACE("%s: Unable to allocate memory\n",

>> +					    name);

>> +			goto error;

>> +		}

>> +		/* move the current state of the structure to the new one */

>> +		rte_memcpy(data, dev->data, sizeof(struct rte_eth_dev_data));

> Why do we need to copy, trying to preserve which data?

> 

>> +		dev->data = data;	/* Override the current data pointer */

>> +	} else

>> +		data = dev->data;

>> +

>> +	snprintf(buff, sizeof(buff), "I-%s-%d", name, numa_node);

>> +	priv = rte_zmalloc_socket(buff, internal_size, 0, numa_node);

>> +	if (priv == NULL) {

>> +		RTE_PMD_DEBUG_TRACE("Unable to allocate internal memory %lu\n",

>> +				    internal_size);

>> +		goto error;

>> +	}

>> +

>> +	/* Setup some default values */

>> +	dev->dev_ops = dev_ops;

>> +	data->dev_private = priv;

> 

>> +	data->port_id = dev->data->port_id;

>> +	memmove(data->name, dev->data->name, strlen(dev->data->name));

> These two assignments are useless, needs to be done before "dev->data =

> data" assignment.


Reworked this code area to remove it.
> 

>> +

>> +	dev->driver = NULL;

>> +	data->dev_flags = RTE_ETH_DEV_DETACHABLE;

>> +	data->kdrv = RTE_KDRV_NONE;

>> +	data->numa_node = numa_node;

>> +

>> +	*eth_dev = dev;

>> +	*internals = priv;

>> +

>> +	return 0;

>> +error:

>> +	rte_free(priv);

>> +

>> +	if (flag & RTE_USE_PRIVATE_DATA)

>> +		rte_free(data);

>> +

>> +	rte_eth_dev_release_port(dev);

>> +

>> +	return -1;

>> +}

>> +

>> +static int

>> +pmd_init_internals(const char *name, struct tap_info *tap,

>> +		   struct pmd_internals **internals,

>> +		   struct rte_eth_dev **eth_dev)

>> +{

>> +	struct rte_eth_dev *dev = NULL;

>> +	struct pmd_internals *internal = NULL;

>> +	struct rte_eth_dev_data *data = NULL;

>> +	int ret, i, fd = -1;

>> +

>> +	RTE_LOG(INFO, PMD,

>> +		"%s: Create TUN/TAP Ethernet device with %d queues on numa %u\n",

>> +		name, RTE_PMD_TAP_MAX_QUEUES, rte_socket_id());

>> +

>> +	pmd_link.link_speed = tap->speed;

>> +

>> +	ret = rte_eth_dev_create(tap->name, &dev, &ops,

>> +				 (void **)&internal, sizeof(struct pmd_internals),

> Why rte_eth_dev_create() get "void **internals" which requires casting,

> but not "struct pmd_internals **internals” ?


Fixed.
> 

>> +				 RTE_USE_PRIVATE_DATA);

>> +	if (ret < 0)

>> +		return -1;

>> +

>> +	strncpy(internal->name, tap->name, sizeof(internal->name));

>> +

>> +	internal->nb_queues = RTE_PMD_TAP_MAX_QUEUES;

>> +

>> +	/* Create the first Tap device */

>> +	if ((fd = tun_alloc(dev->data->name)) < 0) {

>> +		RTE_LOG(ERR, PMD, "tun_alloc(%s) failed\n", dev->data->name);

>> +		rte_free(internal);

> rte_free(dev->data); ?

> But needs to check RTE_USE_PRIVATE_DATA ..


See above
> 

>> +		rte_eth_dev_release_port(dev);

>> +		return -1;

>> +	}

>> +

>> +	/* Presetup the fds to -1 as being not working */

>> +	for(i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {

>> +		internal->fds[i] = -1;

>> +		internal->rxq[i].fd = -1;

>> +		internal->txq[i].fd = -1;

>> +	}

>> +

>> +	/* Take the TUN/TAP fd and place in the first location */

>> +	internal->rxq[0].fd = fd;

>> +	internal->txq[0].fd = fd;

>> +	internal->fds[0] = fd;

>> +

>> +	if (pmd_mac_address(fd, dev, &internal->eth_addr) < 0) {

>> +		rte_free(internal);

> rte_free(dev->data); ?


Yes Added.
> 

>> +		rte_eth_dev_release_port(dev);

>> +		return -1;

>> +	}

>> +

>> +	data = dev->data;

>> +

>> +	data->dev_link = pmd_link;

>> +	data->mac_addrs = &internal->eth_addr;

>> +

>> +	data->nb_rx_queues = (uint16_t)internal->nb_queues;

>> +	data->nb_tx_queues = (uint16_t)internal->nb_queues;

> no cast required.


Removed
> 

>> +	data->drv_name = drivername;

>> +

>> +	*eth_dev = dev;

>> +	*internals = internal;

>> +

>> +	return 0;

>> +}

>> +

> 

> <...>

> 

>> +

>> +static int

>> +set_interface_speed(const char *key __rte_unused,

>> +		    const char *value,

>> +		    void *extra_args __rte_unused)

> need to drop  __rte_unused for extra_args

> 

>> +{

>> +	struct tap_info *tap = (struct tap_info *)extra_args;

>> +

>> +	pmd_link.link_speed = (value) ? atoi(value) : ETH_SPEED_NUM_10G;

>> +	tap->speed = pmd_link.link_speed;

>> +

>> +	return 0;

>> +}

>> +

>> +/*

>> + * Open a TAP interface device.

>> + */

>> +static int

>> +rte_pmd_tap_devinit(const char *name, const char *params)

>> +{

>> +	int ret = 0;

>> +	struct rte_kvargs *kvlist;

>> +	struct tap_info tap_info;

>> +

>> +	/* Setup default values */

>> +	memset(&tap_info, 0, sizeof(tap_info));

>> +

>> +	tap_info.speed = ETH_SPEED_NUM_10G;

>> +	snprintf(tap_info.name, sizeof(tap_info.name), "dtap%d", tap_unit++);

> What about extracting iface name "dtap" into a macro to make it more

> visible.


Created macro for the default name.
> 

>> +

>> +	if ((params == NULL) || (params[0] == '\0')) {

>> +		RTE_LOG(INFO, PMD, "Initializing pmd_tap for %s\n", name);

>> +

>> +		ret = eth_dev_tap_create(name, &tap_info);

> This "name" is not used at all (except from RTE_LOG), instead tap->name

> is used for interface name, so why carying this variable around?


Fixed and changed the API.
> 

>> +		goto leave;

>> +	}

>> +

>> +	RTE_LOG(INFO, PMD, "Initialize %s with params (%s)\n", name, params);

>> +

>> +	kvlist = rte_kvargs_parse(params, valid_arguments);

>> +	if (!kvlist) {

>> +		ret = eth_dev_tap_create(name, &tap_info);

>> +		goto leave;

>> +	}

>> +

>> +	if (rte_kvargs_count(kvlist, ETH_TAP_SPEED_ARG) == 1) {

>> +		ret = rte_kvargs_process(kvlist, ETH_TAP_SPEED_ARG,

>> +					 &set_interface_speed, &tap_info);

>> +		if (ret < 0)

>> +			goto leave;

>> +	} else

>> +		set_interface_speed(NULL, NULL, &tap_info);

> This call is redundant, tap_info already has default speed value set.


Removed
> 

>> +

>> +	if (rte_kvargs_count(kvlist, ETH_TAP_IFACE_ARG) == 1) {

>> +		ret = rte_kvargs_process(kvlist, ETH_TAP_IFACE_ARG,

>> +					 &set_interface_name, &tap_info);

>> +		if (ret < 0)

>> +			goto leave;

>> +	} else

>> +		set_interface_name(NULL, NULL, (void *)&tap_info);

> tap_info->name already set to default value (dtap%d), this call is not

> required.


Removed
> 

>> +

>> +	rte_kvargs_free(kvlist);

>> +

>> +leave:

>> +	if (ret == -1)

>> +		RTE_LOG(INFO, PMD, "Failed to create pmd_tap for %s\n", name);

>> +

>> +	return ret;

>> +}

>> +

>> +/*

>> + * detach a TAP device.

>> + */

>> +static int

>> +rte_pmd_tap_devuninit(const char *name)

>> +{

>> +	struct rte_eth_dev *eth_dev = NULL;

>> +	struct pmd_internals *internals;

>> +	int i;

>> +

>> +	RTE_LOG(INFO, PMD, "Closing TUN/TAP Ethernet device on numa %u\n",

>> +		rte_socket_id());

>> +

>> +	if (name == NULL)

> This check is redundant, eal layer won't call this function with "name

> == NULL”


Removed
> 

>> +		return 0;

>> +

>> +	/* find the ethdev entry */

>> +	eth_dev = rte_eth_dev_allocated(name);

>> +	if (eth_dev == NULL)

>> +		return 0;

>> +

>> +	internals = eth_dev->data->dev_private;

>> +	for (i = 0; i < internals->nb_queues; i++)

>> +		if (internals->fds[i] != -1)

>> +			close(internals->fds[i]);

>> +

>> +	rte_free(eth_dev->data->dev_private);

>> +	rte_free(eth_dev->data);

> data can be shared?

> Don't we need a RTE_USE_PRIVATE_DATA flag check?

> 

>> +

>> +	rte_eth_dev_release_port(eth_dev);

>> +

>> +	return 0;

>> +}

>> +

>> +static struct rte_vdev_driver pmd_tap_drv = {

>> +	.init = rte_pmd_tap_devinit,

>> +	.uninit = rte_pmd_tap_devuninit,

>> +};

>> +

>> +DRIVER_REGISTER_VDEV(eth_tap, pmd_tap_drv);

> name convention is now: “net_tap"


Fixed
> 

>> +DRIVER_REGISTER_PARAM_STRING(eth_tap,

>> +			     "iface=<string>,speed=N");

>> diff --git a/drivers/net/tap/rte_pmd_tap_version.map b/drivers/net/tap/rte_pmd_tap_version.map

>> new file mode 100644

>> index 0000000..61463bf

>> --- /dev/null

>> +++ b/drivers/net/tap/rte_pmd_tap_version.map

>> @@ -0,0 +1,4 @@

>> +DPDK_16.11 {

>> +

>> +	local: *;

>> +};

>> diff --git a/mk/rte.app.mk b/mk/rte.app.mk

>> index 1a0095b..bd1d10f 100644

>> --- a/mk/rte.app.mk

>> +++ b/mk/rte.app.mk

>> @@ -129,6 +129,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)

>> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost

>> endif # $(CONFIG_RTE_LIBRTE_VHOST)

>> _LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)    += -lrte_pmd_vmxnet3_uio

>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_TAP)        += -lrte_pmd_tap

> please put in alphebetical order


Done
> 

>> 

>> ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)

>> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
  
Michał Mirosław Oct. 12, 2016, 8:14 a.m. UTC | #8
2016-10-11 22:56 GMT+02:00 Wiles, Keith <keith.wiles@intel.com>:
>> On Oct 11, 2016, at 6:30 AM, Michał Mirosław <mirqus@gmail.com> wrote:
>>
>> 2016-10-04 16:45 GMT+02:00, Keith Wiles <keith.wiles@intel.com>:
>>> The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces
>>> on the local host. The PMD allows for DPDK and the host to
>>> communicate using a raw device interface on the host and in
>>> the DPDK application. The device created is a Tap device with
>>> a L2 packet header.
>> [...]
>>> +static uint16_t
>>> +pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
>>> +{
>>> +    int len, n;
>>> +    struct rte_mbuf *mbuf;
>>> +    struct rx_queue *rxq = queue;
>>> +    struct pollfd pfd;
>>> +    uint16_t num_rx;
>>> +    unsigned long num_rx_bytes = 0;
>>> +
>>> +    pfd.events = POLLIN;
>>> +    pfd.fd = rxq->fd;
>>> +    for (num_rx = 0; num_rx < nb_pkts; ) {
>>> +            n = poll(&pfd, 1, 0);
>>> +
>>> +            if (n <= 0)
>>> +                    break;
>>> +
>>
>> Considering that syscalls are rather expensive, it would be cheaper to
>> allocate an mbuf here and free it when read() returns -1 instead of
>> calling poll() to check whether a packet is waiting. This way you
>> save a syscall per packet and replace one syscall with one mbuf free
>> per poll.
>
> I made this change, but saw no performance difference in the two methods. Removing poll seems reasonable as it is simpler. TAP is already so slow is why the performance did not change is my guess. Anyone wanting to use TAP as a high performance interface is going to be surprised. I believe the best use case for the TAP interface is for control or exception path.

Agreed, TAP does not look like designed for performance as a first goal.

You could do the same simplification for TX path, BTW.

Best Regards,
Michał Mirosław
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 7c33ad4..fad74e4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -392,6 +392,11 @@  F: doc/guides/nics/pcap_ring.rst
 F: app/test/test_pmd_ring.c
 F: app/test/test_pmd_ring_perf.c
 
+Tap PMD
+M: Keith Wiles <keith.wiles@intel.com>
+F: drivers/net/tap
+F: doc/guides/nics/tap.rst
+
 Null Networking PMD
 M: Tetsuya Mukawa <mtetsuyah@gmail.com>
 F: drivers/net/null/
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 2483dfa..59a2053 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -44,3 +44,5 @@  CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_POWER=y
 CONFIG_RTE_VIRTIO_USER=y
+CONFIG_RTE_LIBRTE_PMD_TAP=y
+CONFIG_RTE_PMD_TAP_MAX_QUEUES=32
diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
new file mode 100644
index 0000000..072def8
--- /dev/null
+++ b/doc/guides/nics/tap.rst
@@ -0,0 +1,84 @@ 
+..  BSD LICENSE
+    Copyright(c) 2016 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Intel Corporation nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Tun/Tap Poll Mode Driver
+========================================
+
+The rte_eth_tap.c PMD creates a device using TUN/TAP interfaces on the local
+host. The PMD allows for DPDK and the host to communicate using a raw device
+interface on the host and in the DPDK application.
+
+The device created is a TAP device, which sends/receives packet in a raw format
+with a L2 header. The usage for a TAP PMD is for connectivity to the local host
+using a TAP interface. When the TAP PMD is initialized it will create a number
+of tap devices in the host accessed via 'ifconfig -a' or 'ip' command. The
+commands can be used to assign and query the virtual like device.
+
+These TAP interfaces can be used with wireshark or tcpdump or Pktgen-DPDK along
+with being able to be used as a network connection to the DPDK application. The
+method enable one or more interfaces is to use the --vdev=eth_tap option on the
+DPDK application  command line. Each --vdev=eth_tap option give will create an
+interface named dtap0, dtap1, ... and so forth.
+
+.. code-block:: console
+
+   The interfaced name can be changed by adding the iface=foo0
+   e.g. --vedv=eth_tap,iface=foo0 --vdev=eth_tap,iface=foo1, ...
+
+.. code-block:: console
+
+   Also the speed of the interface can be changed from 10G to whatever number
+   needed, but the interface does not enforce that speed.
+   e.g. --vdev=eth_tap,iface=foo0,speed=25000
+
+After the DPDK application is started you can send and receive packets on the
+interface using the standard rx_burst/tx_burst APIs in DPDK. From the host point
+of view you can use any host tool like tcpdump, wireshark, ping, Pktgen and
+others to communicate with the DPDK application. The DPDK application may not
+understand network protocols like IPv4/6, UDP or TCP unless the application has
+been written to understand these protocols.
+
+If you need the interface as a real network interface meaning running and has
+a valid IP address then you can do this with the following commands:
+
+.. code-block:: console
+
+   sudo ip link set dtap0 up; sudo ip addr add 192.168.0.250/24 dev dtap0
+   sudo ip link set dtap1 up; sudo ip addr add 192.168.1.250/24 dev dtap1
+
+Please change the IP addresses as you see fit.
+
+If routing is enabled on the host you can also communicate with the DPDK App
+over the internet via a standard socket layer application as long as you account
+for the protocol handing in the application.
+
+If you have a Network Stack in your DPDK application or something like it you
+can utilize that stack to handle the network protocols. Plus you would be able
+to address the interface using an IP address assigned to the internal interface.
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index bc93230..b4afa98 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -55,6 +55,7 @@  DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx
 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio
 DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap
 
 ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += vhost
diff --git a/drivers/net/tap/Makefile b/drivers/net/tap/Makefile
new file mode 100644
index 0000000..e18f30c
--- /dev/null
+++ b/drivers/net/tap/Makefile
@@ -0,0 +1,57 @@ 
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_tap.a
+
+EXPORT_MAP := rte_pmd_tap_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += rte_eth_tap.c
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_mempool
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += lib/librte_kvargs
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
new file mode 100644
index 0000000..680edd1
--- /dev/null
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -0,0 +1,866 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_vdev.h>
+#include <rte_kvargs.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/socket.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <poll.h>
+#include <arpa/inet.h>
+#include <linux/if.h>
+#include <linux/if_tun.h>
+#include <linux/if_ether.h>
+#include <fcntl.h>
+
+#include <poll.h>
+
+/* Linux based path to the TUN device */
+#define TUN_TAP_DEV_PATH        "/dev/net/tun"
+
+#define ETH_TAP_IFACE_ARG       "iface"
+#define ETH_TAP_SPEED_ARG       "speed"
+
+static const char *valid_arguments[] = {
+	ETH_TAP_IFACE_ARG,
+	ETH_TAP_SPEED_ARG,
+	NULL
+};
+
+static const char *drivername = "Tap PMD";
+static int tap_unit = 0;
+
+static struct rte_eth_link pmd_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_DOWN,
+	.link_autoneg = ETH_LINK_SPEED_AUTONEG
+};
+
+struct tap_info {
+	char name[RTE_ETH_NAME_MAX_LEN]; /* Interface name supplied/given */
+	int speed;			 /* Speed of interface */
+};
+
+struct pkt_stats {
+	uint64_t opackets;		/* Number of output packets */
+	uint64_t ipackets;		/* Number of input packets */
+	uint64_t obytes;		/* Number of bytes on output */
+	uint64_t ibytes;		/* Number of bytes on input */
+	uint64_t errs;			/* Number of error packets */
+};
+
+struct rx_queue {
+	struct rte_mempool *mp;		/* Mempool for RX packets */
+	uint16_t in_port;		/* Port ID */
+	int fd;
+
+	struct pkt_stats stats;		/* Stats for this RX queue */
+};
+
+struct tx_queue {
+	int fd;
+	struct pkt_stats stats;		/* Stats for this TX queue */
+};
+
+struct pmd_internals {
+	char name[RTE_ETH_NAME_MAX_LEN];	/* Internal Tap device name */
+	uint16_t nb_queues;			/* Number of queues supported */
+	uint16_t pad0;
+	struct ether_addr eth_addr;	/* Mac address of the device port */
+
+	int if_index;			/* IF_INDEX for the port */
+	int fds[RTE_PMD_TAP_MAX_QUEUES]; /* List of all file descriptors */
+
+	struct rx_queue rxq[RTE_PMD_TAP_MAX_QUEUES];	/* List of RX queues */
+	struct tx_queue txq[RTE_PMD_TAP_MAX_QUEUES];	/* List of TX queues */
+};
+
+/*
+ * Tun/Tap allocation routine
+ *
+ * name is the number of the interface to use, unless NULL to take the host
+ * supplied name.
+ */
+static int
+tun_alloc(char * name)
+{
+	struct ifreq ifr;
+	unsigned int features;
+	int fd;
+
+	memset(&ifr, 0, sizeof(struct ifreq));
+
+	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
+	if (name && name[0])
+		strncpy(ifr.ifr_name, name, IFNAMSIZ);
+
+	fd = open(TUN_TAP_DEV_PATH, O_RDWR);
+	if (fd < 0) {
+		RTE_LOG(ERR, PMD, "Unable to create TAP interface");
+		goto error;
+	}
+
+	/* Grab the TUN features to verify we can work */
+	if (ioctl(fd, TUNGETFEATURES, &features) < 0) {
+		RTE_LOG(ERR, PMD, "Unable to get TUN/TAP features\n");
+		goto error;
+	}
+	RTE_LOG(DEBUG, PMD, "TUN/TAP Features %08x\n", features);
+
+	if (!(features & IFF_MULTI_QUEUE) && (RTE_PMD_TAP_MAX_QUEUES > 1)) {
+		RTE_LOG(DEBUG, PMD, "TUN/TAP device only one queue\n");
+		goto error;
+	} else if ((features & IFF_ONE_QUEUE) && (RTE_PMD_TAP_MAX_QUEUES == 1)) {
+		ifr.ifr_flags |= IFF_ONE_QUEUE;
+		RTE_LOG(DEBUG, PMD, "Single queue only support\n");
+	} else {
+		ifr.ifr_flags |= IFF_MULTI_QUEUE;
+		RTE_LOG(DEBUG, PMD, "Multi-queue support for %d queues\n",
+			RTE_PMD_TAP_MAX_QUEUES);
+	}
+
+	/* Set the TUN/TAP configuration and get the name if needed */
+	if (ioctl(fd, TUNSETIFF, (void *)&ifr) < 0) {
+		RTE_LOG(ERR, PMD, "Unable to set TUNSETIFF for %s\n", ifr.ifr_name);
+		perror("TUNSETIFF");
+		goto error;
+	}
+
+	/* Always set the fiile descriptor to non-blocking */
+	if (fcntl(fd, F_SETFL, O_NONBLOCK) < 0) {
+		RTE_LOG(ERR, PMD, "Unable to set to nonblocking\n");
+		perror("F_SETFL, NONBLOCK");
+		goto error;
+	}
+
+	/* If the name is different that new name as default */
+	if (name && strcmp(name, ifr.ifr_name))
+		strcpy(name, ifr.ifr_name);
+
+	return fd;
+
+error:
+	if (fd > 0)
+		close(fd);
+	return -1;
+}
+
+/*
+ * Callback to handle the rx burst of packets to the correct interface and file
+ * descriptor(s) in a multi-queue setup.
+ */
+static uint16_t
+pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	int len, n;
+	struct rte_mbuf *mbuf;
+	struct rx_queue *rxq = queue;
+	struct pollfd pfd;
+	uint16_t num_rx;
+	unsigned long num_rx_bytes = 0;
+
+	pfd.events = POLLIN;
+	pfd.fd = rxq->fd;
+	for (num_rx = 0; num_rx < nb_pkts; ) {
+		n = poll(&pfd, 1, 0);
+
+		if (n <= 0)
+			break;
+
+		if (pfd.revents == 0)
+			continue;
+
+		if (pfd.revents & POLLERR) {
+			rxq->stats.errs++;
+			RTE_LOG(ERR, PMD, "Packet Error\n");
+			break;
+		}
+		if (pfd.revents & POLLHUP)
+			RTE_LOG(ERR, PMD, "Peer closed connection\n");
+
+		/* allocate the next mbuf */
+		mbuf = rte_pktmbuf_alloc(rxq->mp);
+		if (unlikely(mbuf == NULL)) {
+			RTE_LOG(ERR, PMD, "Unable to allocate mbuf\n");
+			break;
+		}
+
+		len = read(pfd.fd, rte_pktmbuf_mtod(mbuf, char *),
+			   rte_pktmbuf_tailroom(mbuf));
+		if (len <= 0) {
+			RTE_LOG(ERR, PMD, "len %d\n", len);
+			rte_pktmbuf_free(mbuf);
+			break;
+		}
+
+		mbuf->data_len = len;
+		mbuf->pkt_len = len;
+		mbuf->port = rxq->in_port;
+
+		/* account for the receive frame */
+		bufs[num_rx++] = mbuf;
+		num_rx_bytes += mbuf->pkt_len;
+	}
+	rxq->stats.ipackets += num_rx;
+	rxq->stats.ibytes += num_rx_bytes;
+
+	return num_rx;
+}
+
+/*
+ * Callback to handle sending packets from the tap interface
+ */
+static uint16_t
+pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct rte_mbuf *mbuf;
+	struct tx_queue *txq = queue;
+	struct pollfd pfd;
+	uint16_t num_tx = 0;
+	unsigned long num_tx_bytes = 0;
+	int i, n;
+
+	if (unlikely(nb_pkts == 0))
+		return 0;
+
+	pfd.events = POLLOUT;
+	pfd.fd = txq->fd;
+	for (i = 0; i < nb_pkts; i++) {
+		n = poll(&pfd, 1, 0);
+
+		if (n <= 0)
+			break;
+
+		if (pfd.revents & POLLOUT) {
+			/* copy the tx frame data */
+			mbuf = bufs[num_tx];
+			n = write(pfd.fd, rte_pktmbuf_mtod(mbuf, void*),
+				  rte_pktmbuf_pkt_len(mbuf));
+			if (n <= 0)
+				break;
+
+			num_tx++;
+			num_tx_bytes += mbuf->pkt_len;
+			rte_pktmbuf_free(mbuf);
+		}
+	}
+
+	txq->stats.opackets += num_tx;
+	txq->stats.errs += nb_pkts - num_tx;
+	txq->stats.obytes += num_tx_bytes;
+
+	return num_tx;
+}
+
+static int
+tap_dev_start(struct rte_eth_dev *dev)
+{
+	/* Force the Link up */
+	dev->data->dev_link.link_status = ETH_LINK_UP;
+
+	return 0;
+}
+
+/*
+ * This function gets called when the current port gets stopped.
+ */
+static void
+tap_dev_stop(struct rte_eth_dev *dev)
+{
+	int i;
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	for (i = 0; i < internals->nb_queues; i++)
+		if (internals->fds[i] != -1)
+			close(internals->fds[i]);
+
+	dev->data->dev_link.link_status = ETH_LINK_DOWN;
+}
+
+static int
+tap_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	return 0;
+}
+
+static void
+tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+
+	dev_info->driver_name = drivername;
+	dev_info->if_index = internals->if_index;
+	dev_info->max_mac_addrs = 1;
+	dev_info->max_rx_pktlen = (uint32_t)ETHER_MAX_VLAN_FRAME_LEN;
+	dev_info->max_rx_queues = (uint16_t)internals->nb_queues;
+	dev_info->max_tx_queues = (uint16_t)internals->nb_queues;
+	dev_info->min_rx_bufsize = 0;
+	dev_info->pci_dev = NULL;
+}
+
+static void
+tap_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
+{
+	unsigned i, imax;
+	unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
+	unsigned long rx_bytes_total = 0, tx_bytes_total = 0;
+	const struct pmd_internals *internal = dev->data->dev_private;
+
+	imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS) ?
+		internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+	for (i = 0; i < imax; i++) {
+		igb_stats->q_ipackets[i] = internal->rxq[i].stats.ipackets;
+		igb_stats->q_ibytes[i] = internal->rxq[i].stats.ibytes;
+		rx_total += igb_stats->q_ipackets[i];
+		rx_bytes_total += igb_stats->q_ibytes[i];
+	}
+
+	imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS) ?
+		internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+	for (i = 0; i < imax; i++) {
+		igb_stats->q_opackets[i] = internal->txq[i].stats.opackets;
+		igb_stats->q_errors[i] = internal->txq[i].stats.errs;
+		igb_stats->q_obytes[i] = internal->txq[i].stats.obytes;
+		tx_total += igb_stats->q_opackets[i];
+		tx_err_total += igb_stats->q_errors[i];
+		tx_bytes_total += igb_stats->q_obytes[i];
+	}
+
+	igb_stats->ipackets = rx_total;
+	igb_stats->ibytes = rx_bytes_total;
+	igb_stats->opackets = tx_total;
+	igb_stats->oerrors = tx_err_total;
+	igb_stats->obytes = tx_bytes_total;
+}
+
+static void
+tap_stats_reset(struct rte_eth_dev *dev)
+{
+	int i;
+	struct pmd_internals *internal = dev->data->dev_private;
+
+	for (i = 0; i < internal->nb_queues; i++) {
+		internal->rxq[i].stats.ipackets = 0;
+		internal->rxq[i].stats.ibytes = 0;
+	}
+
+	for (i = 0; i < internal->nb_queues; i++) {
+		internal->txq[i].stats.opackets = 0;
+		internal->txq[i].stats.errs = 0;
+		internal->txq[i].stats.obytes = 0;
+	}
+}
+
+static void
+tap_dev_close(struct rte_eth_dev *dev __rte_unused)
+{
+}
+
+static void
+tap_rx_queue_release(void *queue)
+{
+	struct rx_queue *rxq = queue;
+
+	if (rxq && (rxq->fd > 0)) {
+		close(rxq->fd);
+		rxq->fd = -1;
+	}
+}
+
+static void
+tap_tx_queue_release(void *queue)
+{
+	struct tx_queue *txq = queue;
+
+	if (txq && (txq->fd > 0)) {
+		close(txq->fd);
+		txq->fd = -1;
+	}
+}
+
+static int
+tap_link_update(struct rte_eth_dev *dev __rte_unused,
+		int wait_to_complete __rte_unused)
+{
+	return 0;
+}
+
+static int
+tap_setup_queue(struct rte_eth_dev *dev,
+		struct pmd_internals *internals,
+		uint16_t qid)
+{
+	struct rx_queue *rx = &internals->rxq[qid];
+	struct tx_queue *tx = &internals->txq[qid];
+	int fd;
+
+	if ((fd = rx->fd) < 0)
+		if ((fd = tx->fd) < 0) {
+			RTE_LOG(INFO, PMD, "Add queue to TAP %s for qid %d\n",
+				dev->data->name, qid);
+			if ((fd = tun_alloc(dev->data->name)) < 0) {
+				RTE_LOG(ERR, PMD, "tun_alloc(%s) failed\n", dev->data->name);
+				return -1;
+			}
+		}
+
+	dev->data->rx_queues[qid] = rx;
+	dev->data->tx_queues[qid] = tx;
+
+	rx->fd = tx->fd = fd;
+
+	return fd;
+}
+
+static int
+tap_rx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t rx_queue_id,
+		   uint16_t nb_rx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_rxconf *rx_conf __rte_unused,
+		   struct rte_mempool *mp)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	uint16_t buf_size;
+	int fd;
+
+	if ((rx_queue_id >= internals->nb_queues) || (mp == NULL)) {
+		RTE_LOG(ERR, PMD, "nb_queues %d mp %p\n", internals->nb_queues, mp);
+		return -1;
+	}
+
+	internals->rxq[rx_queue_id].mp = mp;
+	internals->rxq[rx_queue_id].in_port = dev->data->port_id;
+
+	/* Now get the space available for data in the mbuf */
+	buf_size = (uint16_t) (rte_pktmbuf_data_room_size(mp) -
+			       RTE_PKTMBUF_HEADROOM);
+
+	if (buf_size < ETH_FRAME_LEN) {
+		RTE_LOG(ERR, PMD,
+			"%s: %d bytes will not fit in mbuf (%d bytes)\n",
+			dev->data->name, ETH_FRAME_LEN, buf_size);
+		return -ENOMEM;
+	}
+
+	fd = tap_setup_queue(dev, internals, rx_queue_id);
+	if (fd == -1)
+		return -1;
+
+	internals->fds[rx_queue_id] = fd;
+	RTE_LOG(INFO, PMD, "RX TAP device name %s, qid %d on fd %d\n",
+		dev->data->name, rx_queue_id, internals->rxq[rx_queue_id].fd);
+
+	return 0;
+}
+
+static int
+tap_tx_queue_setup(struct rte_eth_dev *dev,
+		   uint16_t tx_queue_id,
+		   uint16_t nb_tx_desc __rte_unused,
+		   unsigned int socket_id __rte_unused,
+		   const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct pmd_internals *internals = dev->data->dev_private;
+	int ret = -1;
+
+	if (tx_queue_id >= internals->nb_queues)
+		return -1;
+
+	ret = tap_setup_queue(dev, internals, tx_queue_id);
+
+	RTE_LOG(INFO, PMD, "TX TAP device name %s, qid %d on fd %d\n",
+		dev->data->name, tx_queue_id, internals->txq[tx_queue_id].fd);
+
+	return ret;
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_start              = tap_dev_start,
+	.dev_stop               = tap_dev_stop,
+	.dev_close              = tap_dev_close,
+	.dev_configure          = tap_dev_configure,
+	.dev_infos_get          = tap_dev_info,
+	.rx_queue_setup         = tap_rx_queue_setup,
+	.tx_queue_setup         = tap_tx_queue_setup,
+	.rx_queue_release       = tap_rx_queue_release,
+	.tx_queue_release       = tap_tx_queue_release,
+	.link_update            = tap_link_update,
+	.stats_get              = tap_stats_get,
+	.stats_reset            = tap_stats_reset,
+};
+
+#define RTE_USE_GLOBAL_DATA	0x0000
+#define RTE_USE_PRIVATE_DATA	0x0001
+
+static int
+pmd_mac_address(int fd, struct rte_eth_dev *dev, struct ether_addr *addr)
+{
+	struct ifreq ifr;
+
+	if ((fd <= 0) || (dev == NULL) || (addr == NULL))
+		return -1;
+
+	memset(&ifr, 0, sizeof(ifr));
+
+	if (ioctl(fd, SIOCGIFHWADDR, &ifr) == -1) {
+		RTE_LOG(ERR, PMD, "ioctl failed (SIOCGIFHWADDR) (%s)\n",
+			ifr.ifr_name);
+		return -1;
+	}
+
+	/* Set the host based MAC address to this special MAC format */
+	ifr.ifr_hwaddr.sa_data[0] = 'T';
+	ifr.ifr_hwaddr.sa_data[1] = 'a';
+	ifr.ifr_hwaddr.sa_data[2] = 'p';
+	ifr.ifr_hwaddr.sa_data[3] = '-';
+	ifr.ifr_hwaddr.sa_data[4] = dev->data->port_id;
+	ifr.ifr_hwaddr.sa_data[5] = dev->data->numa_node;
+	if (ioctl(fd, SIOCSIFHWADDR, &ifr) == -1) {
+		RTE_LOG(ERR, PMD, "%s: ioctl failed (SIOCSIFHWADDR) (%s)\n",
+			dev->data->name, ifr.ifr_name);
+		return -1;
+	}
+
+	/*
+	 * Set the local application MAC address, needs to be different then
+	 * the host based MAC address.
+	 */
+	ifr.ifr_hwaddr.sa_data[0] = 'd';
+	ifr.ifr_hwaddr.sa_data[1] = 'n';
+	ifr.ifr_hwaddr.sa_data[2] = 'e';
+	ifr.ifr_hwaddr.sa_data[3] = 't';
+	ifr.ifr_hwaddr.sa_data[4] = dev->data->port_id;
+	ifr.ifr_hwaddr.sa_data[5] = dev->data->numa_node;
+	memcpy(addr, ifr.ifr_hwaddr.sa_data, ETH_ALEN);
+
+	return 0;
+}
+
+static int
+rte_eth_dev_create(const char *name,
+		   struct rte_eth_dev **eth_dev,
+		   const struct eth_dev_ops *dev_ops,
+		   void **internals, size_t internal_size,
+		   uint16_t flag)
+{
+	char buff[RTE_ETH_NAME_MAX_LEN];
+	int numa_node = rte_socket_id();
+	struct rte_eth_dev *dev = NULL;
+	struct rte_eth_dev_data *data = NULL;
+	void *priv = NULL;
+
+	if ((name == NULL) || (eth_dev == NULL) || (dev_ops == NULL) ||
+	    (internals == NULL) || (internal_size == 0)) {
+		RTE_PMD_DEBUG_TRACE("Paramters are invalid\n");
+		return -1;
+	}
+
+	dev = rte_eth_dev_allocate(name);
+	if (dev == NULL) {
+		RTE_PMD_DEBUG_TRACE("%s: rte_eth_dev_allocate failed for %s\n",
+				    name, buff);
+		goto error;
+	}
+
+	if (flag & RTE_USE_PRIVATE_DATA) {
+		/*
+		 * now do all data allocation - for eth_dev structure, dummy
+		 * pci driver and internal (private) data
+		 */
+		snprintf(buff, sizeof(buff), "D-%s-%d", name, numa_node);
+		data = rte_zmalloc_socket(buff, sizeof(struct rte_eth_dev_data),
+					  0, numa_node);
+		if (data == NULL) {
+			RTE_PMD_DEBUG_TRACE("%s: Unable to allocate memory\n",
+					    name);
+			goto error;
+		}
+		/* move the current state of the structure to the new one */
+		rte_memcpy(data, dev->data, sizeof(struct rte_eth_dev_data));
+		dev->data = data;	/* Override the current data pointer */
+	} else
+		data = dev->data;
+
+	snprintf(buff, sizeof(buff), "I-%s-%d", name, numa_node);
+	priv = rte_zmalloc_socket(buff, internal_size, 0, numa_node);
+	if (priv == NULL) {
+		RTE_PMD_DEBUG_TRACE("Unable to allocate internal memory %lu\n",
+				    internal_size);
+		goto error;
+	}
+
+	/* Setup some default values */
+	dev->dev_ops = dev_ops;
+	data->dev_private = priv;
+	data->port_id = dev->data->port_id;
+	memmove(data->name, dev->data->name, strlen(dev->data->name));
+
+	dev->driver = NULL;
+	data->dev_flags = RTE_ETH_DEV_DETACHABLE;
+	data->kdrv = RTE_KDRV_NONE;
+	data->numa_node = numa_node;
+
+	*eth_dev = dev;
+	*internals = priv;
+
+	return 0;
+error:
+	rte_free(priv);
+
+	if (flag & RTE_USE_PRIVATE_DATA)
+		rte_free(data);
+
+	rte_eth_dev_release_port(dev);
+
+	return -1;
+}
+
+static int
+pmd_init_internals(const char *name, struct tap_info *tap,
+		   struct pmd_internals **internals,
+		   struct rte_eth_dev **eth_dev)
+{
+	struct rte_eth_dev *dev = NULL;
+	struct pmd_internals *internal = NULL;
+	struct rte_eth_dev_data *data = NULL;
+	int ret, i, fd = -1;
+
+	RTE_LOG(INFO, PMD,
+		"%s: Create TUN/TAP Ethernet device with %d queues on numa %u\n",
+		name, RTE_PMD_TAP_MAX_QUEUES, rte_socket_id());
+
+	pmd_link.link_speed = tap->speed;
+
+	ret = rte_eth_dev_create(tap->name, &dev, &ops,
+				 (void **)&internal, sizeof(struct pmd_internals),
+				 RTE_USE_PRIVATE_DATA);
+	if (ret < 0)
+		return -1;
+
+	strncpy(internal->name, tap->name, sizeof(internal->name));
+
+	internal->nb_queues = RTE_PMD_TAP_MAX_QUEUES;
+
+	/* Create the first Tap device */
+	if ((fd = tun_alloc(dev->data->name)) < 0) {
+		RTE_LOG(ERR, PMD, "tun_alloc(%s) failed\n", dev->data->name);
+		rte_free(internal);
+		rte_eth_dev_release_port(dev);
+		return -1;
+	}
+
+	/* Presetup the fds to -1 as being not working */
+	for(i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
+		internal->fds[i] = -1;
+		internal->rxq[i].fd = -1;
+		internal->txq[i].fd = -1;
+	}
+
+	/* Take the TUN/TAP fd and place in the first location */
+	internal->rxq[0].fd = fd;
+	internal->txq[0].fd = fd;
+	internal->fds[0] = fd;
+
+	if (pmd_mac_address(fd, dev, &internal->eth_addr) < 0) {
+		rte_free(internal);
+		rte_eth_dev_release_port(dev);
+		return -1;
+	}
+
+	data = dev->data;
+
+	data->dev_link = pmd_link;
+	data->mac_addrs = &internal->eth_addr;
+
+	data->nb_rx_queues = (uint16_t)internal->nb_queues;
+	data->nb_tx_queues = (uint16_t)internal->nb_queues;
+	data->drv_name = drivername;
+
+	*eth_dev = dev;
+	*internals = internal;
+
+	return 0;
+}
+
+static int
+eth_dev_tap_create(const char *name, struct tap_info *tap)
+{
+	struct pmd_internals *internals = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+
+	if (pmd_init_internals(name, tap, &internals, &eth_dev) < 0)
+		return -1;
+
+	eth_dev->rx_pkt_burst = pmd_rx_burst;
+	eth_dev->tx_pkt_burst = pmd_tx_burst;
+
+	return 0;
+}
+
+static int
+set_interface_name(const char *key __rte_unused,
+		   const char *value,
+		   void *extra_args)
+{
+	struct tap_info *tap = (struct tap_info *)extra_args;
+
+	if (value)
+		snprintf(tap->name, sizeof(tap->name), "%s", value);
+	else
+		snprintf(tap->name, sizeof(tap->name), "dtap%d", (tap_unit - 1));
+
+	return 0;
+}
+
+static int
+set_interface_speed(const char *key __rte_unused,
+		    const char *value,
+		    void *extra_args __rte_unused)
+{
+	struct tap_info *tap = (struct tap_info *)extra_args;
+
+	pmd_link.link_speed = (value) ? atoi(value) : ETH_SPEED_NUM_10G;
+	tap->speed = pmd_link.link_speed;
+
+	return 0;
+}
+
+/*
+ * Open a TAP interface device.
+ */
+static int
+rte_pmd_tap_devinit(const char *name, const char *params)
+{
+	int ret = 0;
+	struct rte_kvargs *kvlist;
+	struct tap_info tap_info;
+
+	/* Setup default values */
+	memset(&tap_info, 0, sizeof(tap_info));
+
+	tap_info.speed = ETH_SPEED_NUM_10G;
+	snprintf(tap_info.name, sizeof(tap_info.name), "dtap%d", tap_unit++);
+
+	if ((params == NULL) || (params[0] == '\0')) {
+		RTE_LOG(INFO, PMD, "Initializing pmd_tap for %s\n", name);
+
+		ret = eth_dev_tap_create(name, &tap_info);
+		goto leave;
+	}
+
+	RTE_LOG(INFO, PMD, "Initialize %s with params (%s)\n", name, params);
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (!kvlist) {
+		ret = eth_dev_tap_create(name, &tap_info);
+		goto leave;
+	}
+
+	if (rte_kvargs_count(kvlist, ETH_TAP_SPEED_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_TAP_SPEED_ARG,
+					 &set_interface_speed, &tap_info);
+		if (ret < 0)
+			goto leave;
+	} else
+		set_interface_speed(NULL, NULL, &tap_info);
+
+	if (rte_kvargs_count(kvlist, ETH_TAP_IFACE_ARG) == 1) {
+		ret = rte_kvargs_process(kvlist, ETH_TAP_IFACE_ARG,
+					 &set_interface_name, &tap_info);
+		if (ret < 0)
+			goto leave;
+	} else
+		set_interface_name(NULL, NULL, (void *)&tap_info);
+
+	rte_kvargs_free(kvlist);
+
+leave:
+	if (ret == -1)
+		RTE_LOG(INFO, PMD, "Failed to create pmd_tap for %s\n", name);
+
+	return ret;
+}
+
+/*
+ * detach a TAP device.
+ */
+static int
+rte_pmd_tap_devuninit(const char *name)
+{
+	struct rte_eth_dev *eth_dev = NULL;
+	struct pmd_internals *internals;
+	int i;
+
+	RTE_LOG(INFO, PMD, "Closing TUN/TAP Ethernet device on numa %u\n",
+		rte_socket_id());
+
+	if (name == NULL)
+		return 0;
+
+	/* find the ethdev entry */
+	eth_dev = rte_eth_dev_allocated(name);
+	if (eth_dev == NULL)
+		return 0;
+
+	internals = eth_dev->data->dev_private;
+	for (i = 0; i < internals->nb_queues; i++)
+		if (internals->fds[i] != -1)
+			close(internals->fds[i]);
+
+	rte_free(eth_dev->data->dev_private);
+	rte_free(eth_dev->data);
+
+	rte_eth_dev_release_port(eth_dev);
+
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_tap_drv = {
+	.init = rte_pmd_tap_devinit,
+	.uninit = rte_pmd_tap_devuninit,
+};
+
+DRIVER_REGISTER_VDEV(eth_tap, pmd_tap_drv);
+DRIVER_REGISTER_PARAM_STRING(eth_tap,
+			     "iface=<string>,speed=N");
diff --git a/drivers/net/tap/rte_pmd_tap_version.map b/drivers/net/tap/rte_pmd_tap_version.map
new file mode 100644
index 0000000..61463bf
--- /dev/null
+++ b/drivers/net/tap/rte_pmd_tap_version.map
@@ -0,0 +1,4 @@ 
+DPDK_16.11 {
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 1a0095b..bd1d10f 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -129,6 +129,7 @@  ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
 endif # $(CONFIG_RTE_LIBRTE_VHOST)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD)    += -lrte_pmd_vmxnet3_uio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_TAP)        += -lrte_pmd_tap
 
 ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb