[dpdk-dev,v4,05/12] net/failsafe: add plug-in support

Message ID: 6b96432155ffd2d5fc6f6011a0b229c2224116f8.1496065002.git.gaetan.rivet@6wind.com
State: Superseded, archived
Delegated to: Ferruh Yigit
Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Gaëtan Rivet May 29, 2017, 1:42 p.m. UTC
  Periodically check for the existence of a device.
If a device has not been initialized and exists on the system, then it
is probed and configured.

The configuration process strives to synchronize the states between the
plugged-in sub-device and the fail-safe device.

Signed-off-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Acked-by: Olga Shern <olgas@mellanox.com>
---
 doc/guides/nics/fail_safe.rst           |  19 +++
 drivers/net/failsafe/Makefile           |   1 +
 drivers/net/failsafe/failsafe.c         |  71 ++++++++++
 drivers/net/failsafe/failsafe_args.c    |  32 +++++
 drivers/net/failsafe/failsafe_eal.c     |  30 +----
 drivers/net/failsafe/failsafe_ether.c   | 228 ++++++++++++++++++++++++++++++++
 drivers/net/failsafe/failsafe_ops.c     |  25 ++--
 drivers/net/failsafe/failsafe_private.h |  60 ++++++++-
 8 files changed, 423 insertions(+), 43 deletions(-)
 create mode 100644 drivers/net/failsafe/failsafe_ether.c
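
Note: the state synchronization described in the commit message can be pictured with a reduced model. The sketch below is not code from the patch. struct sub_dev, its present flag and sync_sub_dev() are hypothetical, and only the state names are borrowed from the driver. It assumes the states are ordered, so that a sub-device showing up late can be stepped up until it matches the state requested for the fail-safe device.

```c
#include <assert.h>
#include <stddef.h>

/* Ordered lifecycle states; names mirror the patch, values are illustrative. */
enum dev_state {
	DEV_UNDEFINED = 0,
	DEV_PARSED,
	DEV_PROBED,
	DEV_ACTIVE,
	DEV_STARTED,
};

/* Hypothetical, reduced stand-in for a sub-device. */
struct sub_dev {
	enum dev_state state;
	int present; /* non-zero once the hardware exists on the system */
};

/*
 * Raise one sub-device step by step until it matches the state requested
 * for the parent device. Returns the number of transitions performed.
 */
static int
sync_sub_dev(struct sub_dev *sd, enum dev_state target)
{
	int steps = 0;

	assert(sd != NULL);
	if (!sd->present)
		return 0; /* still missing: retry on the next poll */
	while (sd->state < target) {
		/* In the driver each step would probe, configure or start. */
		sd->state = (enum dev_state)(sd->state + 1);
		steps++;
	}
	return steps;
}
```

A missing device is left untouched and simply retried on the next round, which is why the fail-safe device can be configured and started before all of its sub-devices exist.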
  

Comments

Stephen Hemminger May 31, 2017, 3:15 p.m. UTC | #1
On Mon, 29 May 2017 15:42:17 +0200
Gaetan Rivet <gaetan.rivet@6wind.com> wrote:

> Periodically check for the existence of a device.
> If a device has not been initialized and exists on the system, then it
> is probed and configured.
> 
> The configuration process strives to synchronize the states between the
> plugged-in sub-device and the fail-safe device.

There are existing event models (udev and netlink) that could be used to
do plug-in support without polling. Polling relies on application doing
rte_alarms and many don't.
  
Gaëtan Rivet June 1, 2017, 2:12 p.m. UTC | #2
On Wed, May 31, 2017 at 08:15:26AM -0700, Stephen Hemminger wrote:
> There are existing event models (udev and netlink) that could be used to
> do plug-in support without polling. Polling relies on application doing
> rte_alarms and many don't.

Indeed. This possibility arose during development.

The main issue with it however is that it introduces an asynchronous
design, which the DPDK and PMDs underneath are not well-suited to
interact with. It goes against the grain in a way.

The polling is simple. It can work with all models of device and is
independent of event models specific to any architecture.

It also simplifies the contexts in which probing and removal are done.
Currently there is only one, the interrupt thread.
This avoids a few possible race conditions without having to resort to
critical sections.

The only dependency is on another DPDK subsystem, rte_alarm.
I used alarms here because rte_timers need regular rte_timer_manage()
calls and there is little way to guarantee the frequency of the calls.

rte_alarms do not force any externalities on applications, thus allowing a
seamless use of the fail-safe.
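
Note: the polling scheme argued for above boils down to a one-shot alarm whose callback re-arms itself. The following is a minimal sketch of that pattern, not the driver code: fake_alarm_set() and fake_alarm_fire() are hypothetical stand-ins for the alarm facility (no DPDK is needed to build it), and struct poller is invented for the example. The only detail taken from the real API is that rte_eal_alarm_set() takes its delay in microseconds, hence the conversion from milliseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*alarm_cb)(void *arg);

/* Hypothetical one-slot alarm facility standing in for rte_alarm. */
struct fake_alarm {
	alarm_cb cb;
	void *arg;
	uint64_t delay_us;
};

static struct fake_alarm slot;

static int
fake_alarm_set(uint64_t delay_us, alarm_cb cb, void *arg)
{
	if (cb == NULL)
		return -1;
	slot.cb = cb;
	slot.arg = arg;
	slot.delay_us = delay_us;
	return 0;
}

/* Fire the pending alarm once, as the interrupt thread would on expiry. */
static int
fake_alarm_fire(void)
{
	struct fake_alarm cur = slot;

	if (cur.cb == NULL)
		return 0;
	slot.cb = NULL; /* one-shot: consumed before the callback runs */
	cur.cb(cur.arg);
	return 1;
}

struct poller {
	uint64_t period_ms;
	unsigned int pending:1; /* an alarm is armed */
	unsigned int polls;     /* number of completed rounds */
};

static void poll_cb(void *arg);

static int
poller_install(struct poller *p)
{
	assert(p != NULL);
	if (p->pending)
		return 0; /* already armed */
	if (fake_alarm_set(p->period_ms * 1000, poll_cb, p) != 0)
		return -1;
	p->pending = 1;
	return 0;
}

static void
poll_cb(void *arg)
{
	struct poller *p = arg;

	if (!p->pending)
		return; /* cancelled in the meantime */
	p->pending = 0;
	p->polls++; /* the device existence check would happen here */
	(void)poller_install(p); /* re-arm for the next round */
}

static void
poller_cancel(struct poller *p)
{
	p->pending = 0;
	slot.cb = NULL;
}
```

Because every round runs from the same callback, probing and re-arming are serialized in one context, which is the property the reply above relies on.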
  
Stephen Hemminger June 1, 2017, 6 p.m. UTC | #3
On Thu, 1 Jun 2017 16:12:41 +0200
Gaëtan Rivet <gaetan.rivet@6wind.com> wrote:

> Indeed. This possibility arose during development.
> 
> The main issue with it however is that it introduces an asynchronous
> design, which the DPDK and PMDs underneath are not well-suited to
> interact with. It goes against the grain in a way.
> 
> The polling is simple. It can work with all models of device and is
> independent of event models specific to any architecture.
> 
> It also simplifies the contexts in which probing and removal are done.
> Currently there is only one, the interrupt thread.
> This avoids a few possible race conditions without having to resort to
> critical sections.
> 
> The only dependency is on another DPDK subsystem, rte_alarm.
> I used alarms here because rte_timers need regular rte_timer_manage()
> calls and there is little way to guarantee the frequency of the calls.
> 
> rte_alarms do not force any externalities on applications, thus allowing a
> seamless use of the fail-safe.
> 


The issue with rte_alarm and also with LSC interrupt callbacks is that
they don't run on a normal DPDK EAL application thread. These callbacks
run on a DPDK internal pthread. I remember having to do some application
hacks like having the callback generate an internal event on a pipe.
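
Note: the workaround mentioned above is not shown in the thread. One possible shape of it, given here purely as an assumption, is a pipe: the callback running on the internal thread only writes a small event, and the application thread picks it up from its own loop. All names below (struct event_chan and the event_chan_*() helpers) are hypothetical; only POSIX calls are used.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical event channel: fds[0] is read by the application,
 * fds[1] is written by the callback. */
struct event_chan {
	int fds[2];
};

static int
event_chan_open(struct event_chan *ch)
{
	int flags;

	assert(ch != NULL);
	if (pipe(ch->fds) != 0)
		return -errno;
	/* Non-blocking read end so the application loop never stalls. */
	flags = fcntl(ch->fds[0], F_GETFL);
	if (flags == -1 ||
	    fcntl(ch->fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
		int err = errno;

		close(ch->fds[0]);
		close(ch->fds[1]);
		return -err;
	}
	return 0;
}

/* Called from the alarm or interrupt context: only posts the port id. */
static int
event_chan_post(struct event_chan *ch, uint8_t port_id)
{
	ssize_t n = write(ch->fds[1], &port_id, sizeof(port_id));

	return n == (ssize_t)sizeof(port_id) ? 0 : -1;
}

/*
 * Called from the application thread.
 * Returns 1 and fills *port_id when an event is pending,
 * 0 when there is none, -1 on error.
 */
static int
event_chan_poll(struct event_chan *ch, uint8_t *port_id)
{
	ssize_t n = read(ch->fds[0], port_id, sizeof(*port_id));

	if (n == (ssize_t)sizeof(*port_id))
		return 1;
	if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;
	return -1;
}
```

In this sketch the write end stays blocking, so a callback posting into a full pipe would wait; an application that cannot tolerate that would have to make the write end non-blocking as well and decide what to do with dropped events.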
  
Gaëtan Rivet June 4, 2017, 11:09 p.m. UTC | #4
On Thu, Jun 01, 2017 at 11:00:10AM -0700, Stephen Hemminger wrote:
> The issue with rte_alarm and also with LSC interrupt callbacks is that
> they don't run on a normal DPDK EAL application thread. These callbacks
> run on a DPDK internal pthread. I remember having to do some application
> hacks like having the callback generate an internal event on a pipe.
> 

On the other hand, not all applications would make use of those hacks,
and adding those would impose architecture elements on users. While
convenient, this goes somewhat against the tool-box ethos of DPDK.

In the end, I had to leverage the existing tools. Interrupts in DPDK are
a known weak point, but they are at least working and not too heavy
conceptually on applications (clean threading model, no need for signal
masks, etc). A better implementation might crop up at some point, if those
hurdles prove too much and are shared by many.
  
Stephen Hemminger June 5, 2017, 3:25 p.m. UTC | #5
On Mon, 5 Jun 2017 01:09:19 +0200
Gaëtan Rivet <gaetan.rivet@6wind.com> wrote:

> On the other hand, not all applications would make use of those hacks,
> and adding those would impose architecture elements on users. While
> convenient, this goes somewhat against the tool-box ethos of DPDK.
> 
> In the end, I had to leverage the existing tools. Interrupts in DPDK are
> a known weak point, but they are at least working and not too heavy
> conceptually on applications (clean threading model, no need for signal
> masks, etc). A better implementation might crop up at some point, if those
> hurdles prove too much and are shared by many.
> 

The alarm solution is a good intermediate step. But eventually, in the spirit
of DPDK, there should be an option to have an event-driven model. Maybe the
event library will help.

For me the litmus test is: can the known heavyweight open source DPDK
applications like VPP work with it?
  

Patch
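
Note: the hotplug_poll value is parsed in the patch with strtoull() and a check that at least one character was consumed. As written, that accepts trailing text ("10ms" yields 10) and negative input (strtoull() negates it into a large unsigned value). A stricter variant is sketched below as a suggestion only; parse_u64_strict() is a hypothetical name and is not part of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse an unsigned 64-bit value (decimal, octal or hex, as with base 0).
 * Returns 0 on success, a negative errno value otherwise.
 */
static int
parse_u64_strict(const char *value, uint64_t *out)
{
	char *end = NULL;
	unsigned long long v;

	if (value == NULL || out == NULL)
		return -EINVAL;
	/* Reject empty input, signs and leading blanks up front. */
	if (!isdigit((unsigned char)value[0]))
		return -EINVAL;
	errno = 0;
	v = strtoull(value, &end, 0);
	if (errno != 0)
		return -errno; /* typically -ERANGE on overflow */
	if (*end != '\0')
		return -EINVAL; /* trailing characters such as "10ms" */
	*out = (uint64_t)v;
	return 0;
}
```

Whether such strictness is wanted for a devargs parameter is a design choice; the lenient behaviour may be acceptable if the documentation states that the value is a plain integer in milliseconds.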

diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
index 056f85f..c04891a 100644
--- a/doc/guides/nics/fail_safe.rst
+++ b/doc/guides/nics/fail_safe.rst
@@ -102,6 +102,11 @@  Fail-safe command line parameters
   address is generated, that will be subsequently applied to all sub-device once
   they are probed.
 
+- **hotplug_poll** parameter [UINT64] (default **2000**)
+
+  This parameter allows the user to configure the amount of time in milliseconds
+  between two slave upkeep rounds.
+
 Usage example
 ~~~~~~~~~~~~~
 
@@ -131,3 +136,17 @@  Care must be taken, however, to respect the **ether** API concerning device
 access, and in particular, using the ``RTE_ETH_FOREACH_DEV`` macro to iterate
 over ethernet devices, instead of directly accessing them or by writing one's
 own device iterator.
+
+Plug-in feature
+---------------
+
+A sub-device can be defined without existing on the system when the fail-safe
+PMD is initialized. Upon probing this device, the fail-safe PMD will detect its
+absence and postpone its use. It will then register for a periodic check on any
+missing sub-device.
+
+During this time, the fail-safe PMD can be used normally, configured and told to
+emit and receive packets. It will store any applied configuration, and try to
+apply it upon the probing of its missing sub-device. After this configuration
+pass, the new sub-device will be synchronized with other sub-devices, i.e. be
+started if the fail-safe PMD has been started by the user before.
diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
index 06199ad..4567961 100644
--- a/drivers/net/failsafe/Makefile
+++ b/drivers/net/failsafe/Makefile
@@ -40,6 +40,7 @@  SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_args.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_eal.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ops.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ether.c
 
 # No exported include files
 
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
index 7cf33e8..888f07b 100644
--- a/drivers/net/failsafe/failsafe.c
+++ b/drivers/net/failsafe/failsafe.c
@@ -80,6 +80,72 @@  fs_sub_device_free(struct rte_eth_dev *dev)
 	rte_free(PRIV(dev)->subs);
 }
 
+static void fs_hotplug_alarm(void *arg);
+
+int
+failsafe_hotplug_alarm_install(struct rte_eth_dev *dev)
+{
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+	if (PRIV(dev)->pending_alarm)
+		return 0;
+	ret = rte_eal_alarm_set(hotplug_poll * 1000,
+				fs_hotplug_alarm,
+				dev);
+	if (ret) {
+		ERROR("Could not set up plug-in event detection");
+		return ret;
+	}
+	PRIV(dev)->pending_alarm = 1;
+	return 0;
+}
+
+int
+failsafe_hotplug_alarm_cancel(struct rte_eth_dev *dev)
+{
+	int ret = 0;
+
+	if (PRIV(dev)->pending_alarm) {
+		rte_errno = 0;
+		rte_eal_alarm_cancel(fs_hotplug_alarm, dev);
+		if (rte_errno) {
+			ERROR("rte_eal_alarm_cancel failed (errno: %s)",
+			      strerror(rte_errno));
+			ret = -rte_errno;
+		} else {
+			PRIV(dev)->pending_alarm = 0;
+		}
+	}
+	return ret;
+}
+
+static void
+fs_hotplug_alarm(void *arg)
+{
+	struct rte_eth_dev *dev = arg;
+	struct sub_device *sdev;
+	int ret;
+	uint8_t i;
+
+	if (!PRIV(dev)->pending_alarm)
+		return;
+	PRIV(dev)->pending_alarm = 0;
+	FOREACH_SUBDEV(sdev, i, dev)
+		if (sdev->state != PRIV(dev)->state)
+			break;
+	/* if we have non-probed device */
+	if (i != PRIV(dev)->subs_tail) {
+		ret = failsafe_eth_dev_state_sync(dev);
+		if (ret)
+			ERROR("Unable to synchronize sub_device state");
+	}
+	ret = failsafe_hotplug_alarm_install(dev);
+	if (ret)
+		ERROR("Unable to set up next alarm");
+}
+
 static int
 fs_eth_dev_create(struct rte_vdev_device *vdev)
 {
@@ -128,6 +194,11 @@  fs_eth_dev_create(struct rte_vdev_device *vdev)
 	ret = failsafe_eal_init(dev);
 	if (ret)
 		goto free_args;
+	ret = failsafe_hotplug_alarm_install(dev);
+	if (ret) {
+		ERROR("Could not set up plug-in event detection");
+		goto free_args;
+	}
 	mac = &dev->data->mac_addrs[0];
 	if (mac_from_arg) {
 		/*
diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
index f07d26e..8f334aa 100644
--- a/drivers/net/failsafe/failsafe_args.c
+++ b/drivers/net/failsafe/failsafe_args.c
@@ -45,9 +45,11 @@ 
 typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params,
 		uint8_t head);
 
+uint64_t hotplug_poll;
 int mac_from_arg;
 
 const char *pmd_failsafe_init_parameters[] = {
+	PMD_FAILSAFE_PLUG_IN_POLL_KVARG,
 	PMD_FAILSAFE_MAC_KVARG,
 	NULL,
 };
@@ -221,6 +223,24 @@  fs_remove_sub_devices_definition(char params[DEVARGS_MAXLEN])
 }
 
 static int
+fs_get_u64_arg(const char *key __rte_unused,
+		const char *value, void *out)
+{
+	uint64_t *u64 = out;
+	char *endptr = NULL;
+
+	if ((value == NULL) || (out == NULL))
+		return -EINVAL;
+	errno = 0;
+	*u64 = strtoull(value, &endptr, 0);
+	if (errno != 0)
+		return -errno;
+	if (endptr == value)
+		return -1;
+	return 0;
+}
+
+static int
 fs_get_mac_addr_arg(const char *key __rte_unused,
 		const char *value, void *out)
 {
@@ -252,6 +272,7 @@  failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
 	ret = 0;
 	priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
 	/* default parameters */
+	hotplug_poll = FAILSAFE_HOTPLUG_DEFAULT_TIMEOUT_MS;
 	mac_from_arg = 0;
 	n = snprintf(mut_params, sizeof(mut_params), "%s", params);
 	if (n >= sizeof(mut_params)) {
@@ -274,6 +295,16 @@  failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
 				PMD_FAILSAFE_PARAM_STRING);
 			return -1;
 		}
+		/* PLUG_IN event poll timer */
+		arg_count = rte_kvargs_count(kvlist,
+				PMD_FAILSAFE_PLUG_IN_POLL_KVARG);
+		if (arg_count == 1) {
+			ret = rte_kvargs_process(kvlist,
+					PMD_FAILSAFE_PLUG_IN_POLL_KVARG,
+					&fs_get_u64_arg, &hotplug_poll);
+			if (ret < 0)
+				goto free_kvlist;
+		}
 		/* MAC addr */
 		arg_count = rte_kvargs_count(kvlist,
 				PMD_FAILSAFE_MAC_KVARG);
@@ -287,6 +318,7 @@  failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
 			mac_from_arg = 1;
 		}
 	}
+	PRIV(dev)->state = DEV_PARSED;
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
index 087d4f3..d7b12e2 100644
--- a/drivers/net/failsafe/failsafe_eal.c
+++ b/drivers/net/failsafe/failsafe_eal.c
@@ -90,37 +90,14 @@  fs_bus_init(struct rte_eth_dev *dev)
 int
 failsafe_eal_init(struct rte_eth_dev *dev)
 {
-	struct sub_device *sdev;
-	uint8_t i;
 	int ret;
 
 	ret = fs_bus_init(dev);
 	if (ret)
 		return ret;
-	/*
-	 * We only update TX_SUBDEV if we are not started.
-	 * If a sub_device is emitting, we will switch the TX_SUBDEV to the
-	 * preferred port only upon starting it, so that the switch is smoother.
-	 */
-	if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) {
-		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
-		    (TX_SUBDEV(dev) == NULL ||
-		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
-			DEBUG("Switching tx_dev to preferred sub_device");
-			PRIV(dev)->subs_tx = 0;
-		}
-	} else {
-		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_PROBED) ||
-		    TX_SUBDEV(dev) == NULL) {
-			/* Using first probed device */
-			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
-				DEBUG("Switching tx_dev to sub_device %d",
-				      i);
-				PRIV(dev)->subs_tx = i;
-				break;
-			}
-		}
-	}
+	if (PRIV(dev)->state < DEV_PROBED)
+		PRIV(dev)->state = DEV_PROBED;
+	fs_switch_dev(dev);
 	return 0;
 }
 
@@ -156,5 +133,6 @@  failsafe_eal_uninit(struct rte_eth_dev *dev)
 	ret = fs_bus_uninit(dev);
 	if (ret)
 		return ret;
+	PRIV(dev)->state = DEV_PROBED - 1;
 	return 0;
 }
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
new file mode 100644
index 0000000..7910952
--- /dev/null
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -0,0 +1,228 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "failsafe_private.h"
+
+static int
+fs_eth_dev_conf_apply(struct rte_eth_dev *dev,
+		struct sub_device *sdev)
+{
+	struct rte_eth_dev *edev;
+	struct rte_vlan_filter_conf *vfc1;
+	struct rte_vlan_filter_conf *vfc2;
+	uint32_t i;
+	int ret;
+
+	edev = ETH(sdev);
+	/* RX queue setup */
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		struct rxq *rxq;
+
+		rxq = dev->data->rx_queues[i];
+		ret = rte_eth_rx_queue_setup(PORT_ID(sdev), i,
+				rxq->info.nb_desc, rxq->socket_id,
+				&rxq->info.conf, rxq->info.mp);
+		if (ret) {
+			ERROR("rx_queue_setup failed");
+			return ret;
+		}
+	}
+	/* TX queue setup */
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		struct txq *txq;
+
+		txq = dev->data->tx_queues[i];
+		ret = rte_eth_tx_queue_setup(PORT_ID(sdev), i,
+				txq->info.nb_desc, txq->socket_id,
+				&txq->info.conf);
+		if (ret) {
+			ERROR("tx_queue_setup failed");
+			return ret;
+		}
+	}
+	/* dev_link.link_status */
+	if (dev->data->dev_link.link_status !=
+	    edev->data->dev_link.link_status) {
+		DEBUG("Configuring link_status");
+		if (dev->data->dev_link.link_status)
+			ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
+		else
+			ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
+		if (ret) {
+			ERROR("Failed to apply link_status");
+			return ret;
+		}
+	} else {
+		DEBUG("link_status already set");
+	}
+	/* promiscuous */
+	if (dev->data->promiscuous != edev->data->promiscuous) {
+		DEBUG("Configuring promiscuous");
+		if (dev->data->promiscuous)
+			rte_eth_promiscuous_enable(PORT_ID(sdev));
+		else
+			rte_eth_promiscuous_disable(PORT_ID(sdev));
+	} else {
+		DEBUG("promiscuous already set");
+	}
+	/* all_multicast */
+	if (dev->data->all_multicast != edev->data->all_multicast) {
+		DEBUG("Configuring all_multicast");
+		if (dev->data->all_multicast)
+			rte_eth_allmulticast_enable(PORT_ID(sdev));
+		else
+			rte_eth_allmulticast_disable(PORT_ID(sdev));
+	} else {
+		DEBUG("all_multicast already set");
+	}
+	/* MTU */
+	if (dev->data->mtu != edev->data->mtu) {
+		DEBUG("Configuring MTU");
+		ret = rte_eth_dev_set_mtu(PORT_ID(sdev), dev->data->mtu);
+		if (ret) {
+			ERROR("Failed to apply MTU");
+			return ret;
+		}
+	} else {
+		DEBUG("MTU already set");
+	}
+	/* default MAC */
+	DEBUG("Configuring default MAC address");
+	ret = rte_eth_dev_default_mac_addr_set(PORT_ID(sdev),
+			&dev->data->mac_addrs[0]);
+	if (ret) {
+		ERROR("Setting default MAC address failed");
+		return ret;
+	}
+	/* additional MAC */
+	if (PRIV(dev)->nb_mac_addr > 1)
+		DEBUG("Configure additional MAC address%s",
+			(PRIV(dev)->nb_mac_addr > 2 ? "es" : ""));
+	for (i = 1; i < PRIV(dev)->nb_mac_addr; i++) {
+		struct ether_addr *ea;
+
+		ea = &dev->data->mac_addrs[i];
+		ret = rte_eth_dev_mac_addr_add(PORT_ID(sdev), ea,
+				PRIV(dev)->mac_addr_pool[i]);
+		if (ret) {
+			char ea_fmt[ETHER_ADDR_FMT_SIZE];
+
+			ether_format_addr(ea_fmt, ETHER_ADDR_FMT_SIZE, ea);
+			ERROR("Adding MAC address %s failed", ea_fmt);
+		}
+	}
+	/* VLAN filter */
+	vfc1 = &dev->data->vlan_filter_conf;
+	vfc2 = &edev->data->vlan_filter_conf;
+	if (memcmp(vfc1, vfc2, sizeof(struct rte_vlan_filter_conf))) {
+		uint64_t vbit;
+		uint64_t ids;
+		size_t i;
+		uint16_t vlan_id;
+
+		DEBUG("Configuring VLAN filter");
+		for (i = 0; i < RTE_DIM(vfc1->ids); i++) {
+			if (vfc1->ids[i] == 0)
+				continue;
+			ids = vfc1->ids[i];
+			while (ids) {
+				vlan_id = 64 * i;
+				/* mask of the trailing zeroes */
+				vbit = ~ids & (ids - 1);
+				/* clear least significant bit set */
+				ids ^= (ids ^ (ids - 1)) ^ vbit;
+				for (; vbit; vlan_id++)
+					vbit >>= 1;
+				ret = rte_eth_dev_vlan_filter(
+					PORT_ID(sdev), vlan_id, 1);
+				if (ret) {
+					ERROR("Failed to apply VLAN filter %hu",
+						vlan_id);
+					return ret;
+				}
+			}
+		}
+	} else {
+		DEBUG("VLAN filter already set");
+	}
+	return 0;
+}
+
+int
+failsafe_eth_dev_state_sync(struct rte_eth_dev *dev)
+{
+	struct sub_device *sdev;
+	uint32_t inactive;
+	int ret;
+	uint8_t i;
+
+	if (PRIV(dev)->state < DEV_PROBED)
+		return 0;
+	ret = failsafe_eal_init(dev);
+	if (ret)
+		return ret;
+	if (PRIV(dev)->state < DEV_ACTIVE)
+		return 0;
+	inactive = 0;
+	FOREACH_SUBDEV(sdev, i, dev)
+		if (sdev->state == DEV_PROBED)
+			inactive |= UINT32_C(1) << i;
+	ret = dev->dev_ops->dev_configure(dev);
+	if (ret)
+		return ret;
+	FOREACH_SUBDEV(sdev, i, dev) {
+		if (inactive & (UINT32_C(1) << i)) {
+			ret = fs_eth_dev_conf_apply(dev, sdev);
+			if (ret) {
+				ERROR("Could not apply configuration to sub_device %d",
+				      i);
+				/* TODO: disable device */
+				return ret;
+			}
+		}
+	}
+	/*
+	 * If new devices have been configured, check if
+	 * the link state has changed.
+	 */
+	if (inactive)
+		dev->dev_ops->link_update(dev, 1);
+	if (PRIV(dev)->state < DEV_STARTED)
+		return 0;
+	ret = dev->dev_ops->dev_start(dev);
+	if (ret)
+		return ret;
+	return 0;
+}
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index 693162e..4044473 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -89,6 +89,8 @@  fs_dev_configure(struct rte_eth_dev *dev)
 		}
 		sdev->state = DEV_ACTIVE;
 	}
+	if (PRIV(dev)->state < DEV_ACTIVE)
+		PRIV(dev)->state = DEV_ACTIVE;
 	return 0;
 }
 
@@ -108,21 +110,9 @@  fs_dev_start(struct rte_eth_dev *dev)
 			return ret;
 		sdev->state = DEV_STARTED;
 	}
-	if (PREFERRED_SUBDEV(dev)->state == DEV_STARTED) {
-		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev)) {
-			DEBUG("Switching tx_dev to preferred sub_device");
-			PRIV(dev)->subs_tx = 0;
-		}
-	} else {
-		if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED) ||
-		    TX_SUBDEV(dev) == NULL) {
-			FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
-				DEBUG("Switching tx_dev to sub_device %d", i);
-				PRIV(dev)->subs_tx = i;
-				break;
-			}
-		}
-	}
+	if (PRIV(dev)->state < DEV_STARTED)
+		PRIV(dev)->state = DEV_STARTED;
+	fs_switch_dev(dev);
 	return 0;
 }
 
@@ -132,6 +122,7 @@  fs_dev_stop(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
+	PRIV(dev)->state = DEV_STARTED - 1;
 	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
 		rte_eth_dev_stop(PORT_ID(sdev));
 		sdev->state = DEV_STARTED - 1;
@@ -183,6 +174,10 @@  fs_dev_close(struct rte_eth_dev *dev)
 	struct sub_device *sdev;
 	uint8_t i;
 
+	failsafe_hotplug_alarm_cancel(dev);
+	if (PRIV(dev)->state == DEV_STARTED)
+		dev->dev_ops->dev_stop(dev);
+	PRIV(dev)->state = DEV_ACTIVE - 1;
 	FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
 		DEBUG("Closing sub_device %d", i);
 		rte_eth_dev_close(PORT_ID(sdev));
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index e7a7592..8fb72fe 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -41,12 +41,14 @@ 
 #define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
 
 #define PMD_FAILSAFE_MAC_KVARG "mac"
+#define PMD_FAILSAFE_PLUG_IN_POLL_KVARG "hotplug_poll"
 #define PMD_FAILSAFE_PARAM_STRING	\
 	"dev(<ifc>),"			\
-	"mac=mac_addr"			\
+	"mac=mac_addr,"			\
+	"hotplug_poll=u64"		\
 	""
 
-#define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000
+#define FAILSAFE_HOTPLUG_DEFAULT_TIMEOUT_MS 2000
 
 #define FAILSAFE_MAX_ETHPORTS 2
 #define FAILSAFE_MAX_ETHADDR 128
@@ -105,8 +107,22 @@  struct fs_priv {
 	uint32_t mac_addr_pool[FAILSAFE_MAX_ETHADDR];
 	/* current capabilities */
 	struct rte_eth_dev_info infos;
+	/*
+	 * Fail-safe state machine.
+	 * This level will be tracking state of the EAL and eth
+	 * layer at large as defined by the user application.
+	 * It will then steer the sub_devices toward the same
+	 * synchronized state.
+	 */
+	enum dev_state state;
+	unsigned int pending_alarm:1; /* An alarm is pending */
 };
 
+/* MISC */
+
+int failsafe_hotplug_alarm_install(struct rte_eth_dev *dev);
+int failsafe_hotplug_alarm_cancel(struct rte_eth_dev *dev);
+
 /* RX / TX */
 
 uint16_t failsafe_rx_burst(void *rxq,
@@ -125,10 +141,15 @@  int failsafe_args_count_subdevice(struct rte_eth_dev *dev, const char *params);
 int failsafe_eal_init(struct rte_eth_dev *dev);
 int failsafe_eal_uninit(struct rte_eth_dev *dev);
 
+/* ETH_DEV */
+
+int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);
+
 /* GLOBALS */
 
 extern const char pmd_failsafe_driver_name[];
 extern const struct eth_dev_ops failsafe_ops;
+extern uint64_t hotplug_poll;
 extern int mac_from_arg;
 
 /* HELPERS */
@@ -224,4 +245,39 @@  fs_find_next(struct rte_eth_dev *dev, uint8_t sid,
 	return sid;
 }
 
+static inline void
+fs_switch_dev(struct rte_eth_dev *dev)
+{
+	enum dev_state req_state;
+
+	req_state = PRIV(dev)->state;
+	if (PREFERRED_SUBDEV(dev)->state >= req_state) {
+		if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
+		    (TX_SUBDEV(dev) == NULL ||
+		     (req_state == DEV_STARTED) ||
+		     (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
+			DEBUG("Switching tx_dev to preferred sub_device");
+			PRIV(dev)->subs_tx = 0;
+		}
+	} else if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < req_state) ||
+		   TX_SUBDEV(dev) == NULL) {
+		struct sub_device *sdev;
+		uint8_t i;
+
+		/* Using acceptable device */
+		FOREACH_SUBDEV_ST(sdev, i, dev, req_state) {
+			DEBUG("Switching tx_dev to sub_device %d", i);
+			PRIV(dev)->subs_tx = i;
+			break;
+		}
+		if (i >= PRIV(dev)->subs_tail) {
+			DEBUG("No device ready, deactivating tx_dev");
+			PRIV(dev)->subs_tx = PRIV(dev)->subs_tail;
+		}
+	} else {
+		return;
+	}
+	rte_wmb();
+}
+
 #endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */
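
Note: the VLAN filter replay in fs_eth_dev_conf_apply() walks a bitmap of 64-bit words and turns every set bit into a VLAN id. The same walk is sketched below in isolation, assuming the layout used there (bit b of word w stands for id 64 * w + b). bitmap_ids() is a hypothetical helper, and it clears the lowest set bit with the shorter `bits &= bits - 1` idiom rather than the XOR sequence used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the index of every set bit in a bitmap made of 64-bit words.
 * Returns the number of ids found; at most `max` of them are stored.
 */
static size_t
bitmap_ids(const uint64_t *words, size_t nb_words, uint16_t *ids, size_t max)
{
	size_t n = 0;
	size_t w;

	assert(words != NULL && ids != NULL);
	for (w = 0; w < nb_words; w++) {
		uint64_t bits = words[w];

		while (bits != 0) {
			/* Mask covering the zeroes below the lowest set bit. */
			uint64_t below = ~bits & (bits - 1);
			uint16_t id = (uint16_t)(64 * w);

			for (; below != 0; below >>= 1)
				id++; /* one increment per trailing zero */
			if (n < max)
				ids[n] = id;
			n++;
			bits &= bits - 1; /* clear the lowest set bit */
		}
	}
	return n;
}
```

In the driver each id found this way would be handed to rte_eth_dev_vlan_filter() for the newly plugged sub-device instead of being stored in an array.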