[dpdk-dev,v10,03/11] net/failsafe: add fail-safe PMD

Message ID 7390b7f14a1925ece0c55c6b1df8da358c725017.1500130634.git.gaetan.rivet@6wind.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK

Commit Message

Gaëtan Rivet July 15, 2017, 5:57 p.m. UTC
  Introduce the fail-safe poll mode driver initialization and enable its
build infrastructure.

This PMD allows for applications to benefit from true hot-plugging
support without having to implement it.

It intercepts and manages Ethernet device removal events issued by
slave PMDs and re-initializes them transparently when brought back.
It also allows defining a contingency to the removal of a device, by
designating a fail-over device that will take on transmitting operations
if the preferred device is removed.

Applications only see a fail-safe instance, without caring for
underlying activity ensuring their continued operations.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 MAINTAINERS                                       |   5 +
 config/common_base                                |   5 +
 doc/guides/nics/fail_safe.rst                     | 142 +++++
 doc/guides/nics/features/failsafe.ini             |  24 +
 doc/guides/nics/index.rst                         |   1 +
 drivers/net/Makefile                              |   2 +
 drivers/net/failsafe/Makefile                     |  66 +++
 drivers/net/failsafe/failsafe.c                   | 232 ++++++++
 drivers/net/failsafe/failsafe_args.c              | 331 +++++++++++
 drivers/net/failsafe/failsafe_eal.c               | 138 +++++
 drivers/net/failsafe/failsafe_ops.c               | 664 ++++++++++++++++++++++
 drivers/net/failsafe/failsafe_private.h           | 210 +++++++
 drivers/net/failsafe/failsafe_rxtx.c              | 107 ++++
 drivers/net/failsafe/rte_pmd_failsafe_version.map |   4 +
 mk/rte.app.mk                                     |   1 +
 15 files changed, 1932 insertions(+)
 create mode 100644 doc/guides/nics/fail_safe.rst
 create mode 100644 doc/guides/nics/features/failsafe.ini
 create mode 100644 drivers/net/failsafe/Makefile
 create mode 100644 drivers/net/failsafe/failsafe.c
 create mode 100644 drivers/net/failsafe/failsafe_args.c
 create mode 100644 drivers/net/failsafe/failsafe_eal.c
 create mode 100644 drivers/net/failsafe/failsafe_ops.c
 create mode 100644 drivers/net/failsafe/failsafe_private.h
 create mode 100644 drivers/net/failsafe/failsafe_rxtx.c
 create mode 100644 drivers/net/failsafe/rte_pmd_failsafe_version.map
  

Comments

Thomas Monjalon July 16, 2017, 3:58 p.m. UTC | #1
Hi Gaetan,

15/07/2017 19:57, Gaetan Rivet:
> +#. Start testpmd. The slave device should be blacklisted from normal EAL
> +   operations to avoid probing it twice when in PCI blacklist mode.
> +
> +   .. code-block:: console
> +
> +      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
> +         -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)'
> +         -b 84:00.0 -b 00:04.0 -- -i

It is weird to use -w to declare the failsafe device.
And I think it does not work with -w.
Should it be changed to --vdev?
  
Gaëtan Rivet July 16, 2017, 8 p.m. UTC | #2
On Sun, Jul 16, 2017 at 05:58:13PM +0200, Thomas Monjalon wrote:
> Hi Gaetan,
> 
> 15/07/2017 19:57, Gaetan Rivet:
> > +#. Start testpmd. The slave device should be blacklisted from normal EAL
> > +   operations to avoid probing it twice when in PCI blacklist mode.
> > +
> > +   .. code-block:: console
> > +
> > +      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
> > +         -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)'
> > +         -b 84:00.0 -b 00:04.0 -- -i
> 
> It is weird to use -w to declare the failsafe device.
> And I think it does not work with -w.
> Should it be changed to --vdev?

It did work before [1], and it was a way to showcase the new format, but
with [1] applied, it should indeed go back to --vdev.

[1]: http://dpdk.org/ml/archives/dev/2017-July/071361.html
  
Ferruh Yigit July 17, 2017, 1:56 p.m. UTC | #3
On 7/15/2017 6:57 PM, Gaetan Rivet wrote:
> Introduce the fail-safe poll mode driver initialization and enable its
> build infrastructure.
> 
> This PMD allows for applications to benefit from true hot-plugging
> support without having to implement it.
> 
> It intercepts and manages Ethernet device removal events issued by
> slave PMDs and re-initializes them transparently when brought back.
> It also allows defining a contingency to the removal of a device, by
> designating a fail-over device that will take on transmitting operations
> if the preferred device is removed.
> 
> Applications only see a fail-safe instance, without caring for
> underlying activity ensuring their continued operations.

A whole PMD in a single patch is hard to review, and I am sure some details
were missed during the review. Taking into account the history of the PMD, I
accept this as it is, but I will rely on your support to fix issues in
the future.

> 
> Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
> Acked-by: Olga Shern <olgas@mellanox.com>

<...>

> +Usage example
> +~~~~~~~~~~~~~
> +
> +This section shows some example of using **testpmd** with a fail-safe PMD.
> +
> +#. Request huge pages:
> +
> +   .. code-block:: console
> +
> +      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

I think this is superfluous for a usage sample; if you want, there is a
generic guide [1] that you can reference.

[1]
http://dpdk.org/doc/guides/nics/build_and_test.html

> +
> +#. Start testpmd. The slave device should be blacklisted from normal EAL
> +   operations to avoid probing it twice when in PCI blacklist mode.
> +
> +   .. code-block:: console
> +
> +      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
> +         -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)'
> +         -b 84:00.0 -b 00:04.0 -- -i

Do you think it makes sense to stress that the sub-device shouldn't be probed
by EAL? I believe it is not clear above.

> +
> +   Note that PCI blacklist mode is the default PCI operating mode. In this
> +   configuration, the fail-safe cannot proceed with its slaves if they have
> +   been probed beforehand.
> +
> +#. Alternatively, it can be used alongside any other device in whitelist mode.
> +
> +   .. code-block:: console
> +
> +      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
> +         -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)'
> +         -w 81:00.0 -- -i
> +
<...>

> +[Features]
> +Link status          = Y

> +MTU update           = Y
> +Promiscuous mode     = Y
> +Allmulticast mode    = Y

> +VLAN filter          = Y
> +Packet type parsing  = Y

I am not sure how to document some of these features, because they
depend on sub-device capability. I guess if a sub-device doesn't support
packet type parsing, this feature won't be supported?

> +Basic stats          = Y

> +Stats per queue      = Y
> +Unicast MAC filter   = Y
> +Queue start/stop     = Y
> +Jumbo frame          = Y
> +Multicast MAC filter = Y

Are the above ones supported by the PMD? I don't see them, unless I am missing something.

+ "Flow Control" seems supported.

<...>

> +# This lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_eal
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_kvargs
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_mbuf

DEPDIRS-y is no longer used, so these can be removed.

> +
> +# Basic CFLAGS:
> +CFLAGS += -std=gnu99 -Wall -Wextra

-Wall should be coming from $(WERROR_FLAGS), no need to add here.

And are you sure about gnu99? The mlx drivers tend to enforce a standard
and they updated to c11; do you want to do the same here?

> +CFLAGS += -O3
> +CFLAGS += -I.
> +CFLAGS += -D_DEFAULT_SOURCE
> +CFLAGS += -D_XOPEN_SOURCE=700

Is there a reason for these variables, or are these copy-paste?

> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -Wno-strict-prototypes
> +CFLAGS += -pedantic -DPEDANTIC

Again, just a question: is pedantic mode intentional, or copy-paste?

<...>

> +static int
> +fs_eth_dev_create(struct rte_vdev_device *vdev)
> +{
> +	struct rte_eth_dev *dev;
> +	struct ether_addr *mac;
> +	struct fs_priv *priv;
> +	struct sub_device *sdev;
> +	const char *params;
> +	unsigned int socket_id;
> +	uint8_t i;
> +	int ret;
> +
> +	dev = NULL;
> +	priv = NULL;
> +	params = rte_vdev_device_args(vdev);
> +	socket_id = rte_socket_id();
> +	INFO("Creating fail-safe device on NUMA socket %u",
> +	     socket_id);

No line break required.

> +	dev = rte_eth_vdev_allocate(vdev, sizeof(*priv));
> +	if (dev == NULL) {
> +		ERROR("Unable to allocate rte_eth_dev");
> +		return -1;
> +	}
> +	priv = dev->data->dev_private;
> +	PRIV(dev)->dev = dev;

Although this is valid, what about:

priv = PRIV(dev);
priv->dev = dev;

> +	dev->dev_ops = &failsafe_ops;
> +	TAILQ_INIT(&dev->link_intr_cbs);

Not required, rte_eth_dev_allocate() initializes this.

> +	dev->data->dev_flags = 0x0;

No need to set this to zero; dev->data is already memset to zero.

> +	dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0];
> +	dev->data->dev_link = eth_link;
> +	PRIV(dev)->nb_mac_addr = 1;
> +	dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst;
> +	dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst;
> +	if (params == NULL) {

I would prefer input control as first thing in the function, before
allocating the device.

> +		ERROR("This PMD requires sub-devices, none provided");
> +		goto free_dev;
> +	}
> +	ret = fs_sub_device_create(dev, params);

This function looks like it just allocates memory for sub-devices; does it
make sense to rename it to fs_sub_device_alloc()?

> +	if (ret) {
> +		ERROR("Could not allocate sub_devices");
> +		goto free_dev;
> +	}
<...>

> +free_args:
> +	failsafe_args_free(dev);
> +free_subs:
> +	fs_sub_device_free(dev);
> +free_dev:
> +	rte_eth_dev_release_port(dev);

Device private data should be freed.

> +	return -1;
> +}
<...>

> +static int
> +rte_pmd_failsafe_probe(struct rte_vdev_device *vdev)
> +{
> +	const char *name;
> +
> +	name = rte_vdev_device_name(vdev);
> +	if (vdev == NULL)
> +		return -EINVAL;

I think you don't need this check, if name is NULL, probe shouldn't be
called, same for remove().

> +	INFO("Initializing " FAILSAFE_DRIVER_NAME " for %s",
> +			name);

Line break not required.

<...>

> +RTE_PMD_REGISTER_VDEV(net_failsafe, failsafe_drv);
> +RTE_PMD_REGISTER_ALIAS(net_failsafe, eth_failsafe);

I believe the alias is not required for new PMDs; it is there for backward
compatibility with old drivers.


<...>

> +int
> +failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
> +{
> +	struct fs_priv *priv;
> +	char mut_params[DEVARGS_MAXLEN] = "";

Out of curiosity, what does "mut" stand for?

> +	struct rte_kvargs *kvlist = NULL;
> +	unsigned int arg_count;
> +	size_t n;
> +	int ret;
> +
> +	if (dev == NULL || params == NULL)
> +		return -EINVAL;

This check looks redundant.

> +	priv = PRIV(dev);
> +	ret = 0;
> +	priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
> +	/* default parameters */
> +	mac_from_arg = 0;

This is a global variable; I believe it is better to set the default value where
the variable is defined, with a comment. Here it is easy to miss the default value.

> +	n = snprintf(mut_params, sizeof(mut_params), "%s", params);
> +	if (n >= sizeof(mut_params)) {
> +		ERROR("Parameter string too long (>=%zu)",
> +				sizeof(mut_params));
> +		return -ENOMEM;
> +	}
> +	ret = fs_parse_sub_devices(fs_parse_device_param,
> +				   dev, params);

Why is the device argument not defined as dev=xxx, instead of the current
dev(xxx)?

"dev=xxx" would be compatible with the rest of the argument usage, and it
would be possible to use kvargs to parse it, which would make this code
simpler I believe.

What is the reason for using a different syntax?

> +	if (ret < 0)
> +		return ret;
> +	ret = fs_remove_sub_devices_definition(mut_params);
> +	if (ret < 0)
> +		return ret;
> +	if (strnlen(mut_params, sizeof(mut_params)) > 0) {
> +		kvlist = rte_kvargs_parse(mut_params,
> +				pmd_failsafe_init_parameters);
> +		if (kvlist == NULL) {
> +			ERROR("Error parsing parameters, usage:\n"
> +				PMD_FAILSAFE_PARAM_STRING);
> +			return -1;
> +		}
> +		/* MAC addr */
> +		arg_count = rte_kvargs_count(kvlist,
> +				PMD_FAILSAFE_MAC_KVARG);
> +		if (arg_count == 1) {
> +			ret = rte_kvargs_process(kvlist,
> +					PMD_FAILSAFE_MAC_KVARG,
> +					&fs_get_mac_addr_arg,
> +					&dev->data->mac_addrs[0]);
> +			if (ret < 0)
> +				goto free_kvlist;
> +			mac_from_arg = 1;
> +		}

Is ignoring the case where mac is defined more than once intentional?

> +	}
> +free_kvlist:
> +	rte_kvargs_free(kvlist);
> +	return ret;
> +}
> +

<...>

> +static int
> +fs_count_device(struct rte_eth_dev *dev, const char *param,
> +		uint8_t head __rte_unused)
> +{
> +	size_t b = 0;
> +
> +	while  (param[b] != '(' &&
> +		param[b] != '\0')
> +		b++;
> +	if (strncmp(param, "dev", b) &&
> +	    strncmp(param, "exec", b)) {

I believe param "exec" will be introduced in further patches?

> +		ERROR("Unrecognized device type: %.*s", (int)b, param);
> +		return -EINVAL;
> +	}
> +	PRIV(dev)->subs_tail += 1;
> +	return 0;
> +}
> +

<...>

> +static int
> +fs_bus_init(struct rte_eth_dev *dev)
> +{
> +	struct sub_device *sdev;
> +	struct rte_devargs *da;
> +	uint8_t i;
> +	int ret;
> +
> +	FOREACH_SUBDEV(sdev, i, dev) {

Can FOREACH_SUBDEV_ST(..., DEV_PARSED) be used here?

And what do you think about renaming "FOREACH_SUBDEV_ST" to
"FOREACH_SUBDEV_STATE"?

> +		if (sdev->state != DEV_PARSED)
> +			continue;
> +		da = &sdev->devargs;
> +		ret = rte_eal_hotplug_add(da->bus->name,
> +					  da->name,
> +					  da->args);

<...>

> +	/*
> +	 * We only update TX_SUBDEV if we are not started.
> +	 * If a sub_device is emitting, we will switch the TX_SUBDEV to the
> +	 * preferred port only upon starting it, so that the switch is smoother.
> +	 */
> +	if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) 

Can you please document the concept of the "preferred sub-device" in the
documentation?

> +		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
> +		    (TX_SUBDEV(dev) == NULL ||
> +		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
> +			DEBUG("Switching tx_dev to preferred sub_device");
> +			PRIV(dev)->subs_tx = 0;
> +		}

<...>

> +static struct rte_eth_dev_info default_infos = {
> +	.driver_name = pmd_failsafe_driver_name,

This should be dev->device->driver->name, but it is already overwritten by
rte_eth_dev_info_get(), so you can drop this.

> +	/* Max possible number of elements */

<...>

> +	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +		DEBUG("Closing sub_device %d", i);
> +		rte_eth_dev_close(PORT_ID(sdev));
> +		sdev->state = DEV_ACTIVE - 1;

Would it be better to set the state to DEV_PROBED instead of using a calculation?

> +	}
> +	fs_dev_free_queues(dev);
> +}
> +

<...>

> +static void
> +fs_stats_get(struct rte_eth_dev *dev,
> +	     struct rte_eth_stats *stats)
> +{
> +	memset(stats, 0, sizeof(*stats));

memset not required, done by API

> +	if (TX_SUBDEV(dev) == NULL)
> +		return;
> +	rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats);
> +}
> +

<...>

> +		sdev = TX_SUBDEV(dev);
> +		rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos);
> +		PRIV(dev)->infos.rx_offload_capa = rx_offload_capa;

Is the intention &= ?

> +		PRIV(dev)->infos.tx_offload_capa &=
> +					default_infos.tx_offload_capa;
> +		PRIV(dev)->infos.flow_type_rss_offloads &=
> +					default_infos.flow_type_rss_offloads;
> +	}
> +	rte_memcpy(infos, &PRIV(dev)->infos, sizeof(*infos));
> +}

<...>

> +	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +		DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
> +		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
> +		if (ret) {
> +			ERROR("Operation rte_eth_dev_set_mtu failed for sub_device %d"
> +			      " with error %d", i, ret);

You may prefer not to break the log message.

> +			return ret;
> +		}
> +	}
> +	return 0;
> +}

<...>

> +
> +#define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000

Is this related to next patches in the set?

<...>

> +enum dev_state {
> +	DEV_UNDEFINED = 0,

Setting value not required.

> +	DEV_PARSED,
> +	DEV_PROBED,
> +	DEV_ACTIVE,
> +	DEV_STARTED,
> +};

<...>
  
Gaëtan Rivet July 17, 2017, 5:11 p.m. UTC | #4
On Mon, Jul 17, 2017 at 02:56:54PM +0100, Ferruh Yigit wrote:
> On 7/15/2017 6:57 PM, Gaetan Rivet wrote:
> > Introduce the fail-safe poll mode driver initialization and enable its
> > build infrastructure.
> > 
> > This PMD allows for applications to benefit from true hot-plugging
> > support without having to implement it.
> > 
> > It intercepts and manages Ethernet device removal events issued by
> > slave PMDs and re-initializes them transparently when brought back.
> > It also allows defining a contingency to the removal of a device, by
> > designating a fail-over device that will take on transmitting operations
> > if the preferred device is removed.
> > 
> > Applications only see a fail-safe instance, without caring for
> > underlying activity ensuring their continued operations.
> 
> A whole PMD in a single patch is hard to review, and I am sure some details
> were missed during the review. Taking into account the history of the PMD, I
> accept this as it is, but I will rely on your support to fix issues in
> the future.
> 

Sure, sorry for having this one first big patch.
I thought about having a skeleton patch first, but found it made little
sense. I tried to restrict this version to the bare functionalities,
adding the others afterward.

I will fix any issues. From what I've seen I agree with almost all of your
remarks and will send a new version shortly. In the meantime, I will answer
in this email a few clarifying questions.

<...>

> > +VLAN filter          = Y
> > +Packet type parsing  = Y
> 
> I am not sure how to document some of these features, because they
> depend on sub-device capability. I guess if a sub-device doesn't support
> packet type parsing, this feature won't be supported?
> 

Yes, supporting a feature in the fail-safe means that there is some
verification and synchronization code related to this feature. All
sub_devices should have feature parity, and the features of the fail-safe are
limited to those of the sub_devices.

I thought advertising the support made sense as there was some code
related to it in the fail-safe.

> > +int
> > +failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
> > +{
> > +	struct fs_priv *priv;
> > +	char mut_params[DEVARGS_MAXLEN] = "";
> 
> Out of curiosity, what does "mut" stand for?
> 

This is the mutable version of params.

<...>

> > +	n = snprintf(mut_params, sizeof(mut_params), "%s", params);
> > +	if (n >= sizeof(mut_params)) {
> > +		ERROR("Parameter string too long (>=%zu)",
> > +				sizeof(mut_params));
> > +		return -ENOMEM;
> > +	}
> > +	ret = fs_parse_sub_devices(fs_parse_device_param,
> > +				   dev, params);
> 
> Why is the device argument not defined as dev=xxx, instead of the current
> dev(xxx)?
> 
> "dev=xxx" would be compatible with the rest of the argument usage, and it
> would be possible to use kvargs to parse it, which would make this code
> simpler I believe.
> 
> What is the reason for using a different syntax?
> 

Using the dev() syntax allows the user to explicitly set the limits of
the sub_device declaration, clarifying which device each kvarg applies to.

The issue is that the kvargs library does not allow setting state
information in the parser depending on the position in the kvlist. An
alternative would have been, for example, to attach each kvarg to
the last declared dev=. However, this means multi-stage kvargs
parsing, which means pre-processing of the parameter list, etc.

An example:

net_failsafe0,dev=net_tap0,iface=tap0,mac=00:01:02:03:04:05
net_failsafe0,dev(net_tap0,iface=tap0),mac=00:01:02:03:04:05

The second form is much simpler to parse, and much clearer I think
for users.

The kvargs library was not designed with recursive PMDs in mind.

<...>

> > +static int
> > +fs_bus_init(struct rte_eth_dev *dev)
> > +{
> > +	struct sub_device *sdev;
> > +	struct rte_devargs *da;
> > +	uint8_t i;
> > +	int ret;
> > +
> > +	FOREACH_SUBDEV(sdev, i, dev) {
> 
> Can FOREACH_SUBDEV_ST(..., DEV_PARSED) be used here?
> 

I could use it, this would restrict the iteration only to sub_devices
being at least of the state DEV_PARSED. However, in the check just
below:

+		if (sdev->state != DEV_PARSED)
+			continue;

I would have to pass on any device being in a state higher than
DEV_PARSED. Thus, using FOREACH_SUBDEV_ST would not simplify the code
flow. By using FOREACH_SUBDEV() directly, the reader at least has a
simpler parsing to do of my intent:

foreach subdev not "parsed".

> And what do you think about renaming "FOREACH_SUBDEV_ST" to
> "FOREACH_SUBDEV_STATE"?
> 

Sure, I pushed for brevity but it might be easier to read.

<...>

> > +	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> > +		DEBUG("Closing sub_device %d", i);
> > +		rte_eth_dev_close(PORT_ID(sdev));
> > +		sdev->state = DEV_ACTIVE - 1;
> 
> Would it be better to set the state to DEV_PROBED instead of using a calculation?
> 

I wanted to be able to add / remove device states without having to
rewrite each of those state changes (there are a few in several places).
If I insert a new device state between ACTIVE and PROBED, setting to
DEV_PROBED would still be valid (no compile error), but it would be a
bug. It would be very easy to miss a reference to this specific state.

Such state bugs are a little hard to find at runtime; they usually
have subtle side effects.

I can change it if you prefer, but I would probably introduce a helper
in the form of fs_dev_state_prev/next, thus having a single place to
check for any changes.
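
A rough sketch of what such helpers could look like follows. The enum mirrors
the one in the patch; the signatures and the assertions are only assumptions
made for illustration, not the final implementation.

```c
#include <assert.h>

/* Same ordering as the dev_state enum in the patch. */
enum dev_state {
	DEV_UNDEFINED,
	DEV_PARSED,
	DEV_PROBED,
	DEV_ACTIVE,
	DEV_STARTED,
};

/* Assumed signature: return the state directly below `state`. */
static enum dev_state
fs_dev_state_prev(enum dev_state state)
{
	assert(state > DEV_UNDEFINED);
	return (enum dev_state)(state - 1);
}

/* Assumed signature: return the state directly above `state`. */
static enum dev_state
fs_dev_state_next(enum dev_state state)
{
	assert(state < DEV_STARTED);
	return (enum dev_state)(state + 1);
}
```

A close operation would then set sdev->state = fs_dev_state_prev(DEV_ACTIVE),
which stays correct if a new state is later inserted between DEV_PROBED and
DEV_ACTIVE.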
  
Gaëtan Rivet July 17, 2017, 11:17 p.m. UTC | #5
On Mon, Jul 17, 2017 at 02:56:54PM +0100, Ferruh Yigit wrote:

<...>

> > +Stats per queue      = Y
> > +Unicast MAC filter   = Y
> > +Queue start/stop     = Y
> > +Jumbo frame          = Y
> > +Multicast MAC filter = Y
> 
> Are the above ones supported by the PMD? I don't see them, unless I am missing something.
> 

Queue start/stop was an error.
All the others are supported as long as the slaves support them.

<...>

> > +		sdev = TX_SUBDEV(dev);
> > +		rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos);
> > +		PRIV(dev)->infos.rx_offload_capa = rx_offload_capa;
> 
> Is the intention &= ?
> 

rx_offload_capa is already set a little higher, and then an AND is done
on it with all slaves. The "=" is correct here. Thanks for asking
though, it's always useful to check :).
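
For reference, the idea reduces to the following standalone sketch. The
function name common_offload_capa() and the plain array of capability masks
are hypothetical; the driver itself iterates over its sub_devices and their
rte_eth_dev_info instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration: start from the fail-safe default capability
 * mask and keep only the bits supported by every sub-device.
 */
static uint32_t
common_offload_capa(uint32_t default_capa, const uint32_t *sub_capa, size_t n)
{
	uint32_t capa = default_capa;
	size_t i;

	assert(n == 0 || sub_capa != NULL);
	for (i = 0; i < n; i++)
		capa &= sub_capa[i];
	return capa;
}
```

Once the mask has been reduced this way, storing it with a plain "=" is the
right operation, since the intersection has already been computed.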

---

I fixed all other remarks. I will wait a little for any possible
additional changes you might want to point out before sending the new
version.
  
Ferruh Yigit July 18, 2017, 10:13 a.m. UTC | #6
On 7/18/2017 12:17 AM, Gaëtan Rivet wrote:
<...>

> 
> I fixed all other remarks. I will wait a little for any possible
> additional changes you might want to point before sending the new
> version.

OK. There were some patches from Thomas in addition to the failsafe patchset;
are they all clarified, merged or rejected for this patchset?
  
Gaëtan Rivet July 18, 2017, 11:01 a.m. UTC | #7
On Tue, Jul 18, 2017 at 11:13:34AM +0100, Ferruh Yigit wrote:
> On 7/18/2017 12:17 AM, Gaëtan Rivet wrote:
> <...>
> 
> > 
> > I fixed all other remarks. I will wait a little for any possible
> > additional changes you might want to point before sending the new
> > version.
> 
> OK. There were some patches from Thomas in addition to the failsafe patchset;
> are they all clarified, merged or rejected for this patchset?

The patch he sent last week is postponed. It does not work as it is, and
the fix requires additional API / dev. We will look into it next
release.
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 368973a..294b8b0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -341,6 +341,11 @@  F: drivers/net/enic/
 F: doc/guides/nics/enic.rst
 F: doc/guides/nics/features/enic.ini
 
+Fail-safe PMD
+M: Gaetan Rivet <gaetan.rivet@6wind.com>
+F: drivers/net/failsafe/
+F: doc/guides/nics/fail_safe.rst
+
 Intel e1000
 M: Wenzhuo Lu <wenzhuo.lu@intel.com>
 F: drivers/net/e1000/
diff --git a/config/common_base b/config/common_base
index 8ae6e92..7805605 100644
--- a/config/common_base
+++ b/config/common_base
@@ -420,6 +420,11 @@  CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
 CONFIG_RTE_LIBRTE_PMD_NULL=y
 
 #
+# Compile fail-safe PMD
+#
+CONFIG_RTE_LIBRTE_PMD_FAILSAFE=y
+
+#
 # Do prefetch of packet data within PMD driver receive function
 #
 CONFIG_RTE_PMD_PACKET_PREFETCH=y
diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
new file mode 100644
index 0000000..5c8a93b
--- /dev/null
+++ b/doc/guides/nics/fail_safe.rst
@@ -0,0 +1,142 @@ 
+..  BSD LICENSE
+    Copyright 2017 6WIND S.A.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of 6WIND S.A. nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Fail-safe poll mode driver library
+==================================
+
+The Fail-safe poll mode driver library (**librte_pmd_failsafe**) is a virtual
+device that allows using any device supporting hotplug (sudden device removal
+and plugging on its bus), without modifying other components relying on such
+device (application, other PMDs).
+
+Additionally to the Seamless Hotplug feature, the Fail-safe PMD offers the
+ability to redirect operations to secondary devices when the primary has been
+removed from the system.
+
+.. note::
+
+   The library is enabled by default. You can enable it or disable it manually
+   by setting the ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE`` configuration option.
+
+Features
+--------
+
+The Fail-safe PMD only supports a limited set of features. If you plan to use a
+device underneath the Fail-safe PMD with a specific feature, this feature must
+be supported by the Fail-safe PMD to avoid throwing any error.
+
+Check the feature matrix for the complete set of supported features.
+
+Compilation option
+------------------
+
+This option can be modified in the ``$RTE_TARGET/build/.config`` file.
+
+- ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE`` (default **y**)
+
+  Toggle compiling librte_pmd_failsafe.
+
+Using the Fail-safe PMD from the EAL command line
+-------------------------------------------------
+
+The Fail-safe PMD can be used like most other DPDK virtual devices, by passing a
+``--vdev`` parameter to the EAL when starting the application. The device name
+must start with the *net_failsafe* prefix, followed by numbers or letters. This
+name must be unique for each device. Each fail-safe instance must have at least one
+sub-device, up to ``RTE_MAX_ETHPORTS-1``.
+
+A sub-device can be any legal DPDK device, including possibly another fail-safe
+instance.
+
+Fail-safe command line parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- **dev(<iface>)** parameter
+
+  This parameter allows the user to define a sub-device. The ``<iface>`` part of
+  this parameter must be a valid device definition. It could be the argument
+  provided to any ``-w`` device specification or the argument that would be
+  given to a ``--vdev`` parameter (including a fail-safe).
+  Enclosing the device definition within parenthesis here allows using
+  additional sub-device parameters if need be. They will be passed on to the
+  sub-device.
+
+- **mac** parameter [MAC address]
+
+  This parameter allows the user to set a default MAC address to the fail-safe
+  and all of its sub-devices.
+  If no default mac address is provided, the fail-safe PMD will read the MAC
+  address of the first of its sub-device to be successfully probed and use it as
+  its default MAC address, trying to set it to all of its other sub-devices.
+  If no sub-device was successfully probed at initialization, then a random MAC
+  address is generated, that will be subsequently applied to all sub-device once
+  they are probed.
+
+Usage example
+~~~~~~~~~~~~~
+
+This section shows some example of using **testpmd** with a fail-safe PMD.
+
+#. Request huge pages:
+
+   .. code-block:: console
+
+      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Start testpmd. The slave device should be blacklisted from normal EAL
+   operations to avoid probing it twice when in PCI blacklist mode.
+
+   .. code-block:: console
+
+      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
+         -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)'
+         -b 84:00.0 -b 00:04.0 -- -i
+
+   Note that PCI blacklist mode is the default PCI operating mode. In this
+   configuration, the fail-safe cannot proceed with its slaves if they have
+   been probed beforehand.
+
+#. Alternatively, it can be used alongside any other device in whitelist mode.
+
+   .. code-block:: console
+
+      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
+         -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)'
+         -w 81:00.0 -- -i
+
+Using the Fail-safe PMD from an application
+-------------------------------------------
+
+This driver strives to be as seamless as possible to existing applications, in
+order to propose the hotplug functionality in the easiest way possible.
+
+Care must be taken, however, to respect the **ether** API concerning device
+access, and in particular, using the ``RTE_ETH_FOREACH_DEV`` macro to iterate
+over ethernet devices, instead of directly accessing them or by writing one's
+own device iterator.
diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini
new file mode 100644
index 0000000..3c52823
--- /dev/null
+++ b/doc/guides/nics/features/failsafe.ini
@@ -0,0 +1,24 @@ 
+;
+; Supported features of the 'fail-safe' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status          = Y
+Queue start/stop     = Y
+MTU update           = Y
+Jumbo frame          = Y
+Promiscuous mode     = Y
+Allmulticast mode    = Y
+Unicast MAC filter   = Y
+Multicast MAC filter = Y
+VLAN filter          = Y
+Packet type parsing  = Y
+Basic stats          = Y
+Stats per queue      = Y
+ARMv7                = Y
+ARMv8                = Y
+Power8               = Y
+x86-32               = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 240d082..17eaaf4 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -64,6 +64,7 @@  Network Interface Controller Drivers
     vhost
     vmxnet3
     pcap_ring
+    fail_safe
 
 **Figures**
 
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 35ed813..d33c959 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -59,6 +59,8 @@  DIRS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += ena
 DEPDIRS-ena = $(core-libs)
 DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
 DEPDIRS-enic = $(core-libs) librte_hash
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe
+DEPDIRS-failsafe = $(core-libs)
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
 DEPDIRS-fm10k = $(core-libs) librte_hash
 DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
new file mode 100644
index 0000000..2b5e5f8
--- /dev/null
+++ b/drivers/net/failsafe/Makefile
@@ -0,0 +1,66 @@ 
+#   BSD LICENSE
+#
+#   Copyright 2017 6WIND S.A.
+#   Copyright 2017 Mellanox.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of 6WIND S.A. nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Library name
+LIB = librte_pmd_failsafe.a
+
+EXPORT_MAP := rte_pmd_failsafe_version.map
+
+LIBABIVER := 1
+
+# Sources are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_args.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_eal.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_rxtx.c
+
+# No exported include files
+
+# This lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_mbuf
+
+# Basic CFLAGS:
+CFLAGS += -std=gnu99 -Wall -Wextra
+CFLAGS += -O3
+CFLAGS += -I.
+CFLAGS += -D_DEFAULT_SOURCE
+CFLAGS += -D_XOPEN_SOURCE=700
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -Wno-strict-prototypes
+CFLAGS += -pedantic -DPEDANTIC
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
new file mode 100644
index 0000000..0fb09c9
--- /dev/null
+++ b/drivers/net/failsafe/failsafe.c
@@ -0,0 +1,232 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_alarm.h>
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_devargs.h>
+#include <rte_kvargs.h>
+#include <rte_vdev.h>
+
+#include "failsafe_private.h"
+
+const char pmd_failsafe_driver_name[] = FAILSAFE_DRIVER_NAME;
+static const struct rte_eth_link eth_link = {
+	.link_speed = ETH_SPEED_NUM_10G,
+	.link_duplex = ETH_LINK_FULL_DUPLEX,
+	.link_status = ETH_LINK_UP,
+	.link_autoneg = ETH_LINK_SPEED_AUTONEG,
+};
+
+static int
+fs_sub_device_create(struct rte_eth_dev *dev,
+		const char *params)
+{
+	uint8_t nb_subs;
+	int ret;
+
+	ret = failsafe_args_count_subdevice(dev, params);
+	if (ret)
+		return ret;
+	if (PRIV(dev)->subs_tail > FAILSAFE_MAX_ETHPORTS) {
+		ERROR("Cannot allocate more than %d ports",
+			FAILSAFE_MAX_ETHPORTS);
+		return -ENOSPC;
+	}
+	nb_subs = PRIV(dev)->subs_tail;
+	PRIV(dev)->subs = rte_zmalloc(NULL,
+			sizeof(struct sub_device) * nb_subs,
+			RTE_CACHE_LINE_SIZE);
+	if (PRIV(dev)->subs == NULL) {
+		ERROR("Could not allocate sub_devices");
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static void
+fs_sub_device_free(struct rte_eth_dev *dev)
+{
+	rte_free(PRIV(dev)->subs);
+}
+
+static int
+fs_eth_dev_create(struct rte_vdev_device *vdev)
+{
+	struct rte_eth_dev *dev;
+	struct ether_addr *mac;
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	const char *params;
+	unsigned int socket_id;
+	uint8_t i;
+	int ret;
+
+	dev = NULL;
+	priv = NULL;
+	params = rte_vdev_device_args(vdev);
+	socket_id = rte_socket_id();
+	INFO("Creating fail-safe device on NUMA socket %u",
+	     socket_id);
+	dev = rte_eth_vdev_allocate(vdev, sizeof(*priv));
+	if (dev == NULL) {
+		ERROR("Unable to allocate rte_eth_dev");
+		return -1;
+	}
+	priv = dev->data->dev_private;
+	PRIV(dev)->dev = dev;
+	dev->dev_ops = &failsafe_ops;
+	TAILQ_INIT(&dev->link_intr_cbs);
+	dev->data->dev_flags = 0x0;
+	dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0];
+	dev->data->dev_link = eth_link;
+	PRIV(dev)->nb_mac_addr = 1;
+	dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst;
+	dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst;
+	if (params == NULL) {
+		ERROR("This PMD requires sub-devices, none provided");
+		goto free_dev;
+	}
+	ret = fs_sub_device_create(dev, params);
+	if (ret) {
+		ERROR("Could not allocate sub_devices");
+		goto free_dev;
+	}
+	ret = failsafe_args_parse(dev, params);
+	if (ret)
+		goto free_subs;
+	ret = failsafe_eal_init(dev);
+	if (ret)
+		goto free_args;
+	mac = &dev->data->mac_addrs[0];
+	if (mac_from_arg) {
+		/*
+		 * If MAC address was provided as a parameter,
+		 * apply to all probed slaves.
+		 */
+		FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+			ret = rte_eth_dev_default_mac_addr_set(PORT_ID(sdev),
+							       mac);
+			if (ret) {
+				ERROR("Failed to set default MAC address");
+				goto free_args;
+			}
+		}
+	} else {
+		/*
+		 * Use the ether_addr from first probed
+		 * device, either preferred or fallback.
+		 */
+		FOREACH_SUBDEV(sdev, i, dev)
+			if (sdev->state >= DEV_PROBED) {
+				ether_addr_copy(&ETH(sdev)->data->mac_addrs[0],
+						mac);
+				break;
+			}
+		/*
+		 * If no device has been probed and no ether_addr
+		 * has been provided on the command line, use a random
+		 * valid one.
+		 * It will be applied during future slave state syncs to
+		 * probed slaves.
+		 */
+		if (i == priv->subs_tail)
+			eth_random_addr(&mac->addr_bytes[0]);
+	}
+	INFO("MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
+		mac->addr_bytes[0], mac->addr_bytes[1],
+		mac->addr_bytes[2], mac->addr_bytes[3],
+		mac->addr_bytes[4], mac->addr_bytes[5]);
+	return 0;
+free_args:
+	failsafe_args_free(dev);
+free_subs:
+	fs_sub_device_free(dev);
+free_dev:
+	rte_eth_dev_release_port(dev);
+	return -1;
+}
+
+static int
+fs_rte_eth_free(const char *name)
+{
+	struct rte_eth_dev *dev;
+	int ret;
+
+	dev = rte_eth_dev_allocated(name);
+	if (dev == NULL)
+		return -ENODEV;
+	ret = failsafe_eal_uninit(dev);
+	if (ret)
+		ERROR("Error while uninitializing sub-EAL");
+	failsafe_args_free(dev);
+	fs_sub_device_free(dev);
+	rte_free(PRIV(dev));
+	rte_eth_dev_release_port(dev);
+	return ret;
+}
+
+static int
+rte_pmd_failsafe_probe(struct rte_vdev_device *vdev)
+{
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Initializing " FAILSAFE_DRIVER_NAME " for %s",
+			name);
+	return fs_eth_dev_create(vdev);
+}
+
+static int
+rte_pmd_failsafe_remove(struct rte_vdev_device *vdev)
+{
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	INFO("Uninitializing " FAILSAFE_DRIVER_NAME " for %s", name);
+	return fs_rte_eth_free(name);
+}
+
+static struct rte_vdev_driver failsafe_drv = {
+	.probe = rte_pmd_failsafe_probe,
+	.remove = rte_pmd_failsafe_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_failsafe, failsafe_drv);
+RTE_PMD_REGISTER_ALIAS(net_failsafe, eth_failsafe);
+RTE_PMD_REGISTER_PARAM_STRING(net_failsafe, PMD_FAILSAFE_PARAM_STRING);
diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
new file mode 100644
index 0000000..79e5bfa
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -0,0 +1,331 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+
+#include <rte_devargs.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+
+#include "failsafe_private.h"
+
+#define DEVARGS_MAXLEN 4096
+
+/* Callback used when a new device is found in devargs */
+typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params,
+		uint8_t head);
+
+int mac_from_arg;
+
+const char *pmd_failsafe_init_parameters[] = {
+	PMD_FAILSAFE_MAC_KVARG,
+	NULL,
+};
+
+/*
+ * input: text.
+ * output: 0: if text[0] != '(',
+ *         0: if there is no corresponding ')'
+ *         n: distance to corresponding ')' otherwise
+ */
+static size_t
+closing_paren(const char *text)
+{
+	int nb_open = 0;
+	size_t i = 0;
+
+	while (text[i] != '\0') {
+		if (text[i] == '(')
+			nb_open++;
+		if (text[i] == ')')
+			nb_open--;
+		if (nb_open == 0)
+			return i;
+		i++;
+	}
+	return 0;
+}
+
+static int
+fs_parse_device(struct sub_device *sdev, char *args)
+{
+	struct rte_devargs *d;
+	int ret;
+
+	d = &sdev->devargs;
+	DEBUG("%s", args);
+	ret = rte_eal_devargs_parse(args, d);
+	if (ret) {
+		DEBUG("devargs parsing failed with code %d", ret);
+		return ret;
+	}
+	sdev->bus = d->bus;
+	sdev->state = DEV_PARSED;
+	return 0;
+}
+
+static int
+fs_parse_device_param(struct rte_eth_dev *dev, const char *param,
+		uint8_t head)
+{
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	char *args = NULL;
+	size_t a, b;
+	int ret;
+
+	priv = PRIV(dev);
+	a = 0;
+	b = 0;
+	ret = 0;
+	while  (param[b] != '(' &&
+		param[b] != '\0')
+		b++;
+	a = b;
+	b += closing_paren(&param[b]);
+	if (a == b) {
+		ERROR("Dangling parenthesis");
+		return -EINVAL;
+	}
+	a += 1;
+	args = strndup(&param[a], b - a);
+	if (args == NULL) {
+		ERROR("Not enough memory for parameter parsing");
+		return -ENOMEM;
+	}
+	sdev = &priv->subs[head];
+	if (strncmp(param, "dev", 3) == 0) {
+		ret = fs_parse_device(sdev, args);
+		if (ret)
+			goto free_args;
+	} else {
+		ERROR("Unrecognized device type: %.*s", (int)b, param);
+		ret = -EINVAL;
+	}
+free_args:
+	free(args);
+	return ret;
+}
+
+static int
+fs_parse_sub_devices(parse_cb *cb,
+		struct rte_eth_dev *dev, const char *params)
+{
+	size_t a, b;
+	uint8_t head;
+	int ret;
+
+	a = 0;
+	head = 0;
+	ret = 0;
+	while (params[a] != '\0') {
+		b = a;
+		while (params[b] != '(' &&
+		       params[b] != ',' &&
+		       params[b] != '\0')
+			b++;
+		if (b == a) {
+			ERROR("Invalid parameter");
+			return -EINVAL;
+		}
+		if (params[b] == ',') {
+			a = b + 1;
+			continue;
+		}
+		if (params[b] == '(') {
+			size_t start = b;
+
+			b += closing_paren(&params[b]);
+			if (b == start) {
+				ERROR("Dangling parenthesis");
+				return -EINVAL;
+			}
+			ret = (*cb)(dev, &params[a], head);
+			if (ret)
+				return ret;
+			head += 1;
+			b += 1;
+			if (params[b] == '\0')
+				return 0;
+		}
+		a = b + 1;
+	}
+	return 0;
+}
+
+static int
+fs_remove_sub_devices_definition(char params[DEVARGS_MAXLEN])
+{
+	char buffer[DEVARGS_MAXLEN] = {0};
+	size_t a, b;
+	int i;
+
+	a = 0;
+	i = 0;
+	while (params[a] != '\0') {
+		b = a;
+		while (params[b] != '(' &&
+		       params[b] != ',' &&
+		       params[b] != '\0')
+			b++;
+		if (b == a) {
+			ERROR("Invalid parameter");
+			return -EINVAL;
+		}
+		if (params[b] == ',' || params[b] == '\0')
+			i += snprintf(&buffer[i], sizeof(buffer) - i, "%s%.*s",
+				      i ? "," : "", (int)(b - a), &params[a]);
+		if (params[b] == '(') {
+			size_t start = b;
+			b += closing_paren(&params[b]);
+			if (b == start)
+				return -EINVAL;
+			b += 1;
+			if (params[b] == '\0')
+				goto out;
+		}
+		a = b + 1;
+	}
+out:
+	snprintf(params, DEVARGS_MAXLEN, "%s", buffer);
+	return 0;
+}
+
+static int
+fs_get_mac_addr_arg(const char *key __rte_unused,
+		const char *value, void *out)
+{
+	struct ether_addr *ea = out;
+	int ret;
+
+	if ((value == NULL) || (out == NULL))
+		return -EINVAL;
+	ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+		&ea->addr_bytes[0], &ea->addr_bytes[1],
+		&ea->addr_bytes[2], &ea->addr_bytes[3],
+		&ea->addr_bytes[4], &ea->addr_bytes[5]);
+	return ret != ETHER_ADDR_LEN;
+}
+
+int
+failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
+{
+	struct fs_priv *priv;
+	char mut_params[DEVARGS_MAXLEN] = "";
+	struct rte_kvargs *kvlist = NULL;
+	unsigned int arg_count;
+	size_t n;
+	int ret;
+
+	if (dev == NULL || params == NULL)
+		return -EINVAL;
+	priv = PRIV(dev);
+	ret = 0;
+	priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
+	/* default parameters */
+	mac_from_arg = 0;
+	n = snprintf(mut_params, sizeof(mut_params), "%s", params);
+	if (n >= sizeof(mut_params)) {
+		ERROR("Parameter string too long (>=%zu)",
+				sizeof(mut_params));
+		return -ENOMEM;
+	}
+	ret = fs_parse_sub_devices(fs_parse_device_param,
+				   dev, params);
+	if (ret < 0)
+		return ret;
+	ret = fs_remove_sub_devices_definition(mut_params);
+	if (ret < 0)
+		return ret;
+	if (strnlen(mut_params, sizeof(mut_params)) > 0) {
+		kvlist = rte_kvargs_parse(mut_params,
+				pmd_failsafe_init_parameters);
+		if (kvlist == NULL) {
+			ERROR("Error parsing parameters, usage:\n"
+				PMD_FAILSAFE_PARAM_STRING);
+			return -1;
+		}
+		/* MAC addr */
+		arg_count = rte_kvargs_count(kvlist,
+				PMD_FAILSAFE_MAC_KVARG);
+		if (arg_count == 1) {
+			ret = rte_kvargs_process(kvlist,
+					PMD_FAILSAFE_MAC_KVARG,
+					&fs_get_mac_addr_arg,
+					&dev->data->mac_addrs[0]);
+			if (ret < 0)
+				goto free_kvlist;
+			mac_from_arg = 1;
+		}
+	}
+free_kvlist:
+	rte_kvargs_free(kvlist);
+	return ret;
+}
+
+void
+failsafe_args_free(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		free(sdev->devargs.args);
+		sdev->devargs.args = NULL;
+	}
+}
+
+static int
+fs_count_device(struct rte_eth_dev *dev, const char *param,
+		uint8_t head __rte_unused)
+{
+	size_t b = 0;
+
+	while  (param[b] != '(' &&
+		param[b] != '\0')
+		b++;
+	if (strncmp(param, "dev", b) &&
+	    strncmp(param, "exec", b)) {
+		ERROR("Unrecognized device type: %.*s", (int)b, param);
+		return -EINVAL;
+	}
+	PRIV(dev)->subs_tail += 1;
+	return 0;
+}
+
+int
+failsafe_args_count_subdevice(struct rte_eth_dev *dev,
+			const char *params)
+{
+	return fs_parse_sub_devices(fs_count_device,
+				    dev, params);
+}
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
new file mode 100644
index 0000000..f4bd777
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -0,0 +1,138 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_malloc.h>
+
+#include "failsafe_private.h"
+
+static int
+fs_bus_init(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	struct rte_devargs *da;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PARSED)
+			continue;
+		da = &sdev->devargs;
+		ret = rte_eal_hotplug_add(da->bus->name,
+					  da->name,
+					  da->args);
+		if (ret) {
+			ERROR("sub_device %d probe failed %s%s%s", i,
+			      rte_errno ? "(" : "",
+			      rte_errno ? strerror(rte_errno) : "",
+			      rte_errno ? ")" : "");
+			continue;
+		}
+		ETH(sdev) = rte_eth_dev_allocated(da->name);
+		if (ETH(sdev) == NULL) {
+			ERROR("sub_device %d init went wrong", i);
+			return -ENODEV;
+		}
+		sdev->dev = ETH(sdev)->device;
+		ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
+		sdev->state = DEV_PROBED;
+	}
+	return 0;
+}
+
+int
+failsafe_eal_init(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	ret = fs_bus_init(dev);
+	if (ret)
+		return ret;
+	/*
+	 * We only update TX_SUBDEV if we are not started.
+	 * If a sub_device is emitting, we will switch the TX_SUBDEV to the
+	 * preferred port only upon starting it, so that the switch is smoother.
+	 */
+	if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
+		    (TX_SUBDEV(dev) == NULL ||
+		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else {
+		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_PROBED) ||
+		    TX_SUBDEV(dev) == NULL) {
+			/* Using first probed device */
+			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+				DEBUG("Switching tx_dev to sub_device %d",
+				      i);
+				PRIV(dev)->subs_tx = i;
+				break;
+			}
+		}
+	}
+	return 0;
+}
+
+static int
+fs_bus_uninit(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev = NULL;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+		ret = rte_eal_hotplug_remove(sdev->bus->name,
+					     sdev->dev->name);
+		if (ret) {
+			ERROR("Failed to remove requested device %s",
+			      sdev->dev->name);
+			continue;
+		}
+		sdev->state = DEV_PROBED - 1;
+	}
+	return 0;
+}
+
+int
+failsafe_eal_uninit(struct rte_eth_dev *dev)
+{
+	int ret;
+
+	ret = fs_bus_uninit(dev);
+	if (ret)
+		return ret;
+	return 0;
+}
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
new file mode 100644
index 0000000..91e2193
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -0,0 +1,664 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_debug.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+
+#include "failsafe_private.h"
+
+static struct rte_eth_dev_info default_infos = {
+	.driver_name = pmd_failsafe_driver_name,
+	/* Max possible number of elements */
+	.max_rx_pktlen = UINT32_MAX,
+	.max_rx_queues = RTE_MAX_QUEUES_PER_PORT,
+	.max_tx_queues = RTE_MAX_QUEUES_PER_PORT,
+	.max_mac_addrs = FAILSAFE_MAX_ETHADDR,
+	.max_hash_mac_addrs = UINT32_MAX,
+	.max_vfs = UINT16_MAX,
+	.max_vmdq_pools = UINT16_MAX,
+	.rx_desc_lim = {
+		.nb_max = UINT16_MAX,
+		.nb_min = 0,
+		.nb_align = 1,
+		.nb_seg_max = UINT16_MAX,
+		.nb_mtu_seg_max = UINT16_MAX,
+	},
+	.tx_desc_lim = {
+		.nb_max = UINT16_MAX,
+		.nb_min = 0,
+		.nb_align = 1,
+		.nb_seg_max = UINT16_MAX,
+		.nb_mtu_seg_max = UINT16_MAX,
+	},
+	/* Set of understood capabilities */
+	.rx_offload_capa = 0x0,
+	.tx_offload_capa = 0x0,
+	.flow_type_rss_offloads = 0x0,
+};
+
+static int
+fs_dev_configure(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_PROBED)
+			continue;
+		DEBUG("Configuring sub-device %d", i);
+		ret = rte_eth_dev_configure(PORT_ID(sdev),
+					dev->data->nb_rx_queues,
+					dev->data->nb_tx_queues,
+					&dev->data->dev_conf);
+		if (ret) {
+			ERROR("Could not configure sub_device %d", i);
+			return ret;
+		}
+		sdev->state = DEV_ACTIVE;
+	}
+	return 0;
+}
+
+static int
+fs_dev_start(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (sdev->state != DEV_ACTIVE)
+			continue;
+		DEBUG("Starting sub_device %d", i);
+		ret = rte_eth_dev_start(PORT_ID(sdev));
+		if (ret)
+			return ret;
+		sdev->state = DEV_STARTED;
+	}
+	if (PREFERRED_SUBDEV(dev)->state == DEV_STARTED) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev)) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else {
+		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED) ||
+		    TX_SUBDEV(dev) == NULL) {
+			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
+				DEBUG("Switching tx_dev to sub_device %d", i);
+				PRIV(dev)->subs_tx = i;
+				break;
+			}
+		}
+	}
+	return 0;
+}
+
+static void
+fs_dev_stop(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
+		rte_eth_dev_stop(PORT_ID(sdev));
+		sdev->state = DEV_STARTED - 1;
+	}
+}
+
+static int
+fs_dev_set_link_up(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d", i);
+		ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_link_up failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_dev_set_link_down(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_link_down on sub_device %d", i);
+		ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_link_down failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static void fs_dev_free_queues(struct rte_eth_dev *dev);
+static void
+fs_dev_close(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Closing sub_device %d", i);
+		rte_eth_dev_close(PORT_ID(sdev));
+		sdev->state = DEV_ACTIVE - 1;
+	}
+	fs_dev_free_queues(dev);
+}
+
+static void
+fs_rx_queue_release(void *queue)
+{
+	struct rte_eth_dev *dev;
+	struct sub_device *sdev;
+	uint8_t i;
+	struct rxq *rxq;
+
+	if (queue == NULL)
+		return;
+	rxq = queue;
+	dev = rxq->priv->dev;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		SUBOPS(sdev, rx_queue_release)
+			(ETH(sdev)->data->rx_queues[rxq->qid]);
+	dev->data->rx_queues[rxq->qid] = NULL;
+	rte_free(rxq);
+}
+
+static int
+fs_rx_queue_setup(struct rte_eth_dev *dev,
+		uint16_t rx_queue_id,
+		uint16_t nb_rx_desc,
+		unsigned int socket_id,
+		const struct rte_eth_rxconf *rx_conf,
+		struct rte_mempool *mb_pool)
+{
+	struct sub_device *sdev;
+	struct rxq *rxq;
+	uint8_t i;
+	int ret;
+
+	rxq = dev->data->rx_queues[rx_queue_id];
+	if (rxq != NULL) {
+		fs_rx_queue_release(rxq);
+		dev->data->rx_queues[rx_queue_id] = NULL;
+	}
+	rxq = rte_zmalloc(NULL, sizeof(*rxq),
+			  RTE_CACHE_LINE_SIZE);
+	if (rxq == NULL)
+		return -ENOMEM;
+	rxq->qid = rx_queue_id;
+	rxq->socket_id = socket_id;
+	rxq->info.mp = mb_pool;
+	rxq->info.conf = *rx_conf;
+	rxq->info.nb_desc = nb_rx_desc;
+	rxq->priv = PRIV(dev);
+	dev->data->rx_queues[rx_queue_id] = rxq;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		ret = rte_eth_rx_queue_setup(PORT_ID(sdev),
+				rx_queue_id,
+				nb_rx_desc, socket_id,
+				rx_conf, mb_pool);
+		if (ret) {
+			ERROR("RX queue setup failed for sub_device %d", i);
+			goto free_rxq;
+		}
+	}
+	return 0;
+free_rxq:
+	fs_rx_queue_release(rxq);
+	return ret;
+}
+
+static void
+fs_tx_queue_release(void *queue)
+{
+	struct rte_eth_dev *dev;
+	struct sub_device *sdev;
+	uint8_t i;
+	struct txq *txq;
+
+	if (queue == NULL)
+		return;
+	txq = queue;
+	dev = txq->priv->dev;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		SUBOPS(sdev, tx_queue_release)
+			(ETH(sdev)->data->tx_queues[txq->qid]);
+	dev->data->tx_queues[txq->qid] = NULL;
+	rte_free(txq);
+}
+
+static int
+fs_tx_queue_setup(struct rte_eth_dev *dev,
+		uint16_t tx_queue_id,
+		uint16_t nb_tx_desc,
+		unsigned int socket_id,
+		const struct rte_eth_txconf *tx_conf)
+{
+	struct sub_device *sdev;
+	struct txq *txq;
+	uint8_t i;
+	int ret;
+
+	txq = dev->data->tx_queues[tx_queue_id];
+	if (txq != NULL) {
+		fs_tx_queue_release(txq);
+		dev->data->tx_queues[tx_queue_id] = NULL;
+	}
+	txq = rte_zmalloc("ethdev TX queue", sizeof(*txq),
+			  RTE_CACHE_LINE_SIZE);
+	if (txq == NULL)
+		return -ENOMEM;
+	txq->qid = tx_queue_id;
+	txq->socket_id = socket_id;
+	txq->info.conf = *tx_conf;
+	txq->info.nb_desc = nb_tx_desc;
+	txq->priv = PRIV(dev);
+	dev->data->tx_queues[tx_queue_id] = txq;
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		ret = rte_eth_tx_queue_setup(PORT_ID(sdev),
+				tx_queue_id,
+				nb_tx_desc, socket_id,
+				tx_conf);
+		if (ret) {
+			ERROR("TX queue setup failed for sub_device %d", i);
+			goto free_txq;
+		}
+	}
+	return 0;
+free_txq:
+	fs_tx_queue_release(txq);
+	return ret;
+}
+
+static void
+fs_dev_free_queues(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		fs_rx_queue_release(dev->data->rx_queues[i]);
+		dev->data->rx_queues[i] = NULL;
+	}
+	dev->data->nb_rx_queues = 0;
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		fs_tx_queue_release(dev->data->tx_queues[i]);
+		dev->data->tx_queues[i] = NULL;
+	}
+	dev->data->nb_tx_queues = 0;
+}
+
+static void
+fs_promiscuous_enable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_promiscuous_enable(PORT_ID(sdev));
+}
+
+static void
+fs_promiscuous_disable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_promiscuous_disable(PORT_ID(sdev));
+}
+
+static void
+fs_allmulticast_enable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_allmulticast_enable(PORT_ID(sdev));
+}
+
+static void
+fs_allmulticast_disable(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_allmulticast_disable(PORT_ID(sdev));
+}
+
+static int
+fs_link_update(struct rte_eth_dev *dev,
+		int wait_to_complete)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling link_update on sub_device %d", i);
+		ret = (SUBOPS(sdev, link_update))(ETH(sdev), wait_to_complete);
+		if (ret && ret != -1) {
+			ERROR("Link update failed for sub_device %d with error %d",
+			      i, ret);
+			return ret;
+		}
+	}
+	if (TX_SUBDEV(dev)) {
+		struct rte_eth_link *l1;
+		struct rte_eth_link *l2;
+
+		l1 = &dev->data->dev_link;
+		l2 = &ETH(TX_SUBDEV(dev))->data->dev_link;
+		if (memcmp(l1, l2, sizeof(*l1))) {
+			*l1 = *l2;
+			return 0;
+		}
+	}
+	return -1;
+}
+
+static void
+fs_stats_get(struct rte_eth_dev *dev,
+	     struct rte_eth_stats *stats)
+{
+	memset(stats, 0, sizeof(*stats));
+	if (TX_SUBDEV(dev) == NULL)
+		return;
+	rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats);
+}
+
+static void
+fs_stats_reset(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_stats_reset(PORT_ID(sdev));
+}
+
+/**
+ * Fail-safe dev_infos_get rules:
+ *
+ * No sub_device:
+ *   Numerables:
+ *      Use the maximum possible values for any field, so as not
+ *      to impede any further configuration effort.
+ *   Capabilities:
+ *      Limits capabilities to those that are understood by the
+ *      fail-safe PMD. This understanding stems from the fail-safe
+ *      being capable of verifying that the related capability is
+ *      expressed within the device configuration (struct rte_eth_conf).
+ *
+ * At least one probed sub_device:
+ *   Numerables:
+ *      Uses values from the active probed sub_device.
+ *      The rationale here is that if any sub_device is less capable
+ *      (for example concerning the number of queues) than the active
+ *      sub_device, then its subsequent configuration will fail.
+ *      It is impossible to foresee this failure when the failing sub_device
+ *      is supposed to be plugged-in later on, so the configuration process
+ *      is the single point of failure and error reporting.
+ *   Capabilities:
+ *      Uses a logical AND of RX capabilities among
+ *      all sub_devices and the default capabilities.
+ *      Uses a logical AND of TX capabilities among
+ *      the active probed sub_device and the default capabilities.
+ *
+ */
+static void
+fs_dev_infos_get(struct rte_eth_dev *dev,
+		  struct rte_eth_dev_info *infos)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL) {
+		DEBUG("No probed device, using default infos");
+		rte_memcpy(&PRIV(dev)->infos, &default_infos,
+			   sizeof(default_infos));
+	} else {
+		uint32_t rx_offload_capa;
+
+		rx_offload_capa = default_infos.rx_offload_capa;
+		FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
+			rte_eth_dev_info_get(PORT_ID(sdev),
+					&PRIV(dev)->infos);
+			rx_offload_capa &= PRIV(dev)->infos.rx_offload_capa;
+		}
+		sdev = TX_SUBDEV(dev);
+		rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos);
+		PRIV(dev)->infos.rx_offload_capa = rx_offload_capa;
+		PRIV(dev)->infos.tx_offload_capa &=
+					default_infos.tx_offload_capa;
+		PRIV(dev)->infos.flow_type_rss_offloads &=
+					default_infos.flow_type_rss_offloads;
+	}
+	rte_memcpy(infos, &PRIV(dev)->infos, sizeof(*infos));
+}
+
+static const uint32_t *
+fs_dev_supported_ptypes_get(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	struct rte_eth_dev *edev;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL)
+		return NULL;
+	edev = ETH(sdev);
+	/* ENOTSUP: counts as no supported ptypes */
+	if (SUBOPS(sdev, dev_supported_ptypes_get) == NULL)
+		return NULL;
+	/*
+	 * The API does not allow a clean AND of all ptypes.
+	 * It is also incomplete by design, and the best possible
+	 * value is not needed in this context.
+	 * Simply return the ptypes of the highest-priority
+	 * device, usually the PREFERRED device.
+	 */
+	return SUBOPS(sdev, dev_supported_ptypes_get)(edev);
+}
+
+static int
+fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
+		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_set_mtu failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_vlan_filter on sub_device %d", i);
+		ret = rte_eth_dev_vlan_filter(PORT_ID(sdev), vlan_id, on);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_vlan_filter failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static int
+fs_flow_ctrl_get(struct rte_eth_dev *dev,
+		struct rte_eth_fc_conf *fc_conf)
+{
+	struct sub_device *sdev;
+
+	sdev = TX_SUBDEV(dev);
+	if (sdev == NULL)
+		return 0;
+	if (SUBOPS(sdev, flow_ctrl_get) == NULL)
+		return -ENOTSUP;
+	return SUBOPS(sdev, flow_ctrl_get)(ETH(sdev), fc_conf);
+}
+
+static int
+fs_flow_ctrl_set(struct rte_eth_dev *dev,
+		struct rte_eth_fc_conf *fc_conf)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+	int ret;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		DEBUG("Calling rte_eth_dev_flow_ctrl_set on sub_device %d", i);
+		ret = rte_eth_dev_flow_ctrl_set(PORT_ID(sdev), fc_conf);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_flow_ctrl_set failed for sub_device %d"
+			      " with error %d", i, ret);
+			return ret;
+		}
+	}
+	return 0;
+}
+
+static void
+fs_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	/* No check: already done within the rte_eth_dev_mac_addr_remove
+	 * call for the fail-safe device.
+	 */
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_mac_addr_remove(PORT_ID(sdev),
+				&dev->data->mac_addrs[index]);
+	PRIV(dev)->mac_addr_pool[index] = 0;
+}
+
+static int
+fs_mac_addr_add(struct rte_eth_dev *dev,
+		struct ether_addr *mac_addr,
+		uint32_t index,
+		uint32_t vmdq)
+{
+	struct sub_device *sdev;
+	int ret;
+	uint8_t i;
+
+	RTE_ASSERT(index < FAILSAFE_MAX_ETHADDR);
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
+		ret = rte_eth_dev_mac_addr_add(PORT_ID(sdev), mac_addr, vmdq);
+		if (ret) {
+			ERROR("Operation rte_eth_dev_mac_addr_add failed for sub_device %"
+			      PRIu8 " with error %d", i, ret);
+			return ret;
+		}
+	}
+	if (index >= PRIV(dev)->nb_mac_addr) {
+		DEBUG("Growing mac_addrs array");
+		PRIV(dev)->nb_mac_addr = index + 1;
+	}
+	PRIV(dev)->mac_addr_pool[index] = vmdq;
+	return 0;
+}
+
+static void
+fs_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+	struct sub_device *sdev;
+	uint8_t i;
+
+	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
+		rte_eth_dev_default_mac_addr_set(PORT_ID(sdev), mac_addr);
+}
+
+const struct eth_dev_ops failsafe_ops = {
+	.dev_configure = fs_dev_configure,
+	.dev_start = fs_dev_start,
+	.dev_stop = fs_dev_stop,
+	.dev_set_link_down = fs_dev_set_link_down,
+	.dev_set_link_up = fs_dev_set_link_up,
+	.dev_close = fs_dev_close,
+	.promiscuous_enable = fs_promiscuous_enable,
+	.promiscuous_disable = fs_promiscuous_disable,
+	.allmulticast_enable = fs_allmulticast_enable,
+	.allmulticast_disable = fs_allmulticast_disable,
+	.link_update = fs_link_update,
+	.stats_get = fs_stats_get,
+	.stats_reset = fs_stats_reset,
+	.dev_infos_get = fs_dev_infos_get,
+	.dev_supported_ptypes_get = fs_dev_supported_ptypes_get,
+	.mtu_set = fs_mtu_set,
+	.vlan_filter_set = fs_vlan_filter_set,
+	.rx_queue_setup = fs_rx_queue_setup,
+	.tx_queue_setup = fs_tx_queue_setup,
+	.rx_queue_release = fs_rx_queue_release,
+	.tx_queue_release = fs_tx_queue_release,
+	.flow_ctrl_get = fs_flow_ctrl_get,
+	.flow_ctrl_set = fs_flow_ctrl_set,
+	.mac_addr_remove = fs_mac_addr_remove,
+	.mac_addr_add = fs_mac_addr_add,
+	.mac_addr_set = fs_mac_addr_set,
+};
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
new file mode 100644
index 0000000..d0ec4f8
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -0,0 +1,210 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_ETH_FAILSAFE_PRIVATE_H_
+#define _RTE_ETH_FAILSAFE_PRIVATE_H_
+
+#include <rte_dev.h>
+#include <rte_ethdev.h>
+#include <rte_devargs.h>
+
+#define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
+
+#define PMD_FAILSAFE_MAC_KVARG "mac"
+#define PMD_FAILSAFE_PARAM_STRING	\
+	"dev(<ifc>),"			\
+	"mac=mac_addr"			\
+	""
+
+#define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000
+
+#define FAILSAFE_MAX_ETHPORTS 2
+#define FAILSAFE_MAX_ETHADDR 128
+
+/* TYPES */
+
+struct rxq {
+	struct fs_priv *priv;
+	uint16_t qid;
+	/* id of last sub_device polled */
+	uint8_t last_polled;
+	unsigned int socket_id;
+	struct rte_eth_rxq_info info;
+};
+
+struct txq {
+	struct fs_priv *priv;
+	uint16_t qid;
+	unsigned int socket_id;
+	struct rte_eth_txq_info info;
+};
+
+enum dev_state {
+	DEV_UNDEFINED = 0,
+	DEV_PARSED,
+	DEV_PROBED,
+	DEV_ACTIVE,
+	DEV_STARTED,
+};
+
+struct sub_device {
+	/* Exhaustive DPDK device description */
+	struct rte_devargs devargs;
+	struct rte_bus *bus;
+	struct rte_device *dev;
+	struct rte_eth_dev *edev;
+	/* Device state machine */
+	enum dev_state state;
+};
+
+struct fs_priv {
+	struct rte_eth_dev *dev;
+	/*
+	 * Set of sub_devices.
+	 * subs[0] is the preferred device;
+	 * every other entry is a fallback slave.
+	 */
+	struct sub_device *subs;
+	uint8_t subs_head; /* if head == tail, no subs */
+	uint8_t subs_tail; /* first invalid */
+	uint8_t subs_tx; /* current emitting device */
+	uint8_t current_probed;
+	/* current number of mac_addr slots allocated. */
+	uint32_t nb_mac_addr;
+	struct ether_addr mac_addrs[FAILSAFE_MAX_ETHADDR];
+	uint32_t mac_addr_pool[FAILSAFE_MAX_ETHADDR];
+	/* current capabilities */
+	struct rte_eth_dev_info infos;
+};
+
+/* RX / TX */
+
+uint16_t failsafe_rx_burst(void *rxq,
+		struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
+uint16_t failsafe_tx_burst(void *txq,
+		struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
+
+/* ARGS */
+
+int failsafe_args_parse(struct rte_eth_dev *dev, const char *params);
+void failsafe_args_free(struct rte_eth_dev *dev);
+int failsafe_args_count_subdevice(struct rte_eth_dev *dev, const char *params);
+
+/* EAL */
+
+int failsafe_eal_init(struct rte_eth_dev *dev);
+int failsafe_eal_uninit(struct rte_eth_dev *dev);
+
+/* GLOBALS */
+
+extern const char pmd_failsafe_driver_name[];
+extern const struct eth_dev_ops failsafe_ops;
+extern int mac_from_arg;
+
+/* HELPERS */
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define PRIV(dev) \
+	((struct fs_priv *)(dev)->data->dev_private)
+
+/* sdev: (struct sub_device *) */
+#define ETH(sdev) \
+	((sdev)->edev)
+
+/* sdev: (struct sub_device *) */
+#define PORT_ID(sdev) \
+	(ETH(sdev)->data->port_id)
+
+/**
+ * Stateful iterator construct over fail-safe sub-devices:
+ * s:     (struct sub_device *), iterator
+ * i:     (uint8_t), index of the current sub_device
+ * dev:   (struct rte_eth_dev *), fail-safe ethdev
+ * state: (enum dev_state), minimum acceptable device state
+ */
+#define FOREACH_SUBDEV_ST(s, i, dev, state)				\
+	for (i = fs_find_next((dev), 0, state);				\
+	     i < PRIV(dev)->subs_tail && (s = &PRIV(dev)->subs[i]);	\
+	     i = fs_find_next((dev), i + 1, state))
+
+/**
+ * Iterator construct over fail-safe sub-devices:
+ * s:   (struct sub_device *), iterator
+ * i:   (uint8_t), index of the current sub_device
+ * dev: (struct rte_eth_dev *), fail-safe ethdev
+ */
+#define FOREACH_SUBDEV(s, i, dev)			\
+	FOREACH_SUBDEV_ST(s, i, dev, DEV_UNDEFINED)
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define PREFERRED_SUBDEV(dev) \
+	(&PRIV(dev)->subs[0])
+
+/* dev: (struct rte_eth_dev *) fail-safe device */
+#define TX_SUBDEV(dev)							  \
+	(PRIV(dev)->subs_tx >= PRIV(dev)->subs_tail		   ? NULL \
+	 : (PRIV(dev)->subs[PRIV(dev)->subs_tx].state < DEV_PROBED ? NULL \
+	 : &PRIV(dev)->subs[PRIV(dev)->subs_tx]))
+
+/**
+ * s:   (struct sub_device *)
+ * ops: (struct eth_dev_ops) member
+ */
+#define SUBOPS(s, ops) \
+	(ETH(s)->dev_ops->ops)
+
+#define LOG__(level, m, ...) \
+	RTE_LOG(level, PMD, "net_failsafe: " m "%c", __VA_ARGS__)
+#define LOG_(level, ...) LOG__(level, __VA_ARGS__, '\n')
+#define DEBUG(...) LOG_(DEBUG, __VA_ARGS__)
+#define INFO(...) LOG_(INFO, __VA_ARGS__)
+#define WARN(...) LOG_(WARNING, __VA_ARGS__)
+#define ERROR(...) LOG_(ERR, __VA_ARGS__)
+
+/* inlined functions */
+
+static inline uint8_t
+fs_find_next(struct rte_eth_dev *dev, uint8_t sid,
+		enum dev_state min_state)
+{
+	while (sid < PRIV(dev)->subs_tail) {
+		if (PRIV(dev)->subs[sid].state >= min_state)
+			break;
+		sid++;
+	}
+	if (sid >= PRIV(dev)->subs_tail)
+		return PRIV(dev)->subs_tail;
+	return sid;
+}
+
+#endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */
diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
new file mode 100644
index 0000000..a45b4e5
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_rxtx.c
@@ -0,0 +1,107 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+
+#include "failsafe_private.h"
+
+/*
+ * TODO: write fast version,
+ * without additional checks, to be activated once
+ * everything has been verified to comply.
+ */
+uint16_t
+failsafe_rx_burst(void *queue,
+		  struct rte_mbuf **rx_pkts,
+		  uint16_t nb_pkts)
+{
+	struct fs_priv *priv;
+	struct sub_device *sdev;
+	struct rxq *rxq;
+	void *sub_rxq;
+	uint16_t nb_rx;
+	uint8_t nb_polled, nb_subs;
+	uint8_t i;
+
+	rxq = queue;
+	priv = rxq->priv;
+	nb_subs = priv->subs_tail - priv->subs_head;
+	nb_polled = 0;
+	for (i = rxq->last_polled; nb_polled < nb_subs; nb_polled++) {
+		i++;
+		if (i == priv->subs_tail)
+			i = priv->subs_head;
+		sdev = &priv->subs[i];
+		if (unlikely(ETH(sdev) == NULL))
+			continue;
+		if (unlikely(ETH(sdev)->rx_pkt_burst == NULL))
+			continue;
+		if (unlikely(sdev->state != DEV_STARTED))
+			continue;
+		sub_rxq = ETH(sdev)->data->rx_queues[rxq->qid];
+		nb_rx = ETH(sdev)->
+			rx_pkt_burst(sub_rxq, rx_pkts, nb_pkts);
+		if (nb_rx) {
+			rxq->last_polled = i;
+			return nb_rx;
+		}
+	}
+	return 0;
+}
+
+/*
+ * TODO: write fast version,
+ * without additional checks, to be activated once
+ * everything has been verified to comply.
+ */
+uint16_t
+failsafe_tx_burst(void *queue,
+		  struct rte_mbuf **tx_pkts,
+		  uint16_t nb_pkts)
+{
+	struct sub_device *sdev;
+	struct txq *txq;
+	void *sub_txq;
+
+	txq = queue;
+	sdev = TX_SUBDEV(txq->priv->dev);
+	if (unlikely(sdev == NULL))
+		return 0;
+	if (unlikely(ETH(sdev) == NULL))
+		return 0;
+	if (unlikely(ETH(sdev)->tx_pkt_burst == NULL))
+		return 0;
+	sub_txq = ETH(sdev)->data->tx_queues[txq->qid];
+	return ETH(sdev)->tx_pkt_burst(sub_txq, tx_pkts, nb_pkts);
+}
diff --git a/drivers/net/failsafe/rte_pmd_failsafe_version.map b/drivers/net/failsafe/rte_pmd_failsafe_version.map
new file mode 100644
index 0000000..b6d2840
--- /dev/null
+++ b/drivers/net/failsafe/rte_pmd_failsafe_version.map
@@ -0,0 +1,4 @@ 
+DPDK_17.08 {
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 5bb4290..c25fdd9 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -121,6 +121,7 @@  _LDLIBS-$(CONFIG_RTE_LIBRTE_E1000_PMD)      += -lrte_pmd_e1000
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ENA_PMD)        += -lrte_pmd_ena
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ENIC_PMD)       += -lrte_pmd_enic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_FM10K_PMD)      += -lrte_pmd_fm10k
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE)   += -lrte_pmd_failsafe
 _LDLIBS-$(CONFIG_RTE_LIBRTE_I40E_PMD)       += -lrte_pmd_i40e
 _LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)      += -lrte_pmd_ixgbe
 ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)