[v2] kni: restrict bifurcated device support

Message ID 20211123164618.3585878-1-ferruh.yigit@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers
Series [v2] kni: restrict bifurcated device support |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS

Commit Message

Ferruh Yigit Nov. 23, 2021, 4:46 p.m. UTC
  To enable bifurcated device support, rtnl_lock is released before calling
userspace callbacks and asynchronous requests are enabled.

But these changes caused more issues, like bug #809, #816. To reduce the
scope of the problems, the bifurcated device support related changes are
only enabled when it is requested explicitly with new 'enable_bifurcated'
module parameter.
And bifurcated device support is disabled by default.

So the bifurcated device related problems are isolated and they can be
fixed without impacting all use cases.

Bugzilla ID: 816
Fixes: 631217c76135 ("kni: fix kernel deadlock with bifurcated device")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Igor Ryzhov <iryzhov@nfware.com>
---
Cc: Igor Ryzhov <iryzhov@nfware.com>
Cc: Elad Nachman <eladv6@gmail.com>
Cc: Eric Christian <erclists@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Sahithi Singam <sahithi.singam@oracle.com>

v2:
* Updated release note
* Verified the fix with two KNI interface setting MAC address in endless
  loop, and confirmed deadlock is not observed (which was observed
  without this patch).
---
 .../prog_guide/kernel_nic_interface.rst       | 28 +++++++++++++
 doc/guides/rel_notes/release_21_11.rst        |  6 +++
 kernel/linux/kni/kni_dev.h                    |  3 ++
 kernel/linux/kni/kni_misc.c                   | 36 ++++++++++++++++
 kernel/linux/kni/kni_net.c                    | 42 +++++++++++--------
 5 files changed, 98 insertions(+), 17 deletions(-)
  

Comments

Thomas Monjalon Nov. 24, 2021, 1:51 p.m. UTC | #1
23/11/2021 17:46, Ferruh Yigit:
> To enable bifurcated device support, rtnl_lock is released before calling
> userspace callbacks and asynchronous requests are enabled.
> 
> But these changes caused more issues, like bug #809, #816. To reduce the
> scope of the problems, the bifurcated device support related changes are
> only enabled when it is requested explicitly with new 'enable_bifurcated'
> module parameter.
> And bifurcated device support is disabled by default.
> 
> So the bifurcated device related problems are isolated and they can be
> fixed without impacting all use cases.
> 
> Bugzilla ID: 816
> Fixes: 631217c76135 ("kni: fix kernel deadlock with bifurcated device")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Acked-by: Igor Ryzhov <iryzhov@nfware.com>

This approach looks pragmatic.
BTW, I'm not sure there is a strong need for KNI with bifurcated device.

Applied, thanks.
  

Patch

diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 1ce03ec1a374..771c7d7fdac4 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -56,6 +56,12 @@  can be specified when the module is loaded to control its behavior:
                     off   Interfaces will be created with carrier state set to off.
                     on    Interfaces will be created with carrier state set to on.
                      (charp)
+    parm:           enable_bifurcated: Enable request processing support for
+                    bifurcated drivers, which means releasing rtnl_lock before calling
+                    userspace callback and supporting async requests (default=off):
+                    on    Enable request processing support for bifurcated drivers.
+                     (charp)
+
 
 Loading the ``rte_kni`` kernel module without any optional parameters is
 the typical way a DPDK application gets packets into and out of the kernel
@@ -174,6 +180,28 @@  To set the default carrier state to *off*:
 If the ``carrier`` parameter is not specified, the default carrier state
 of KNI interfaces will be set to *off*.
 
+.. _kni_bifurcated_device_support:
+
+Bifurcated Device Support
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+User callbacks are executed while kernel module holds the ``rtnl`` lock, this
+causes a deadlock when callbacks run control commands on another Linux kernel
+network interface.
+
+Bifurcated devices has kernel network driver part and to prevent deadlock for
+them ``enable_bifurcated`` is used.
+
+To enable bifurcated device support:
+
+.. code-block:: console
+
+    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko enable_bifurcated=on
+
+Enabling bifurcated device support releases ``rtnl`` lock before calling
+callback and locks it back after callback. Also enables asynchronous request to
+support callbacks that requires rtnl lock to work (interface down).
+
 KNI Creation and Deletion
 -------------------------
 
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 4d8c59472af1..fa2ce760d806 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -75,6 +75,12 @@  New Features
     operations.
   * Added multi-process support.
 
+* **Updated default KNI behavior on net devices control callbacks.**
+
+  Updated KNI net devices control callbacks to run with ``rtnl`` kernel lock
+  held by default. A newly added ``enable_bifurcated`` KNI kernel module
+  parameter can be used to run callbacks with ``rtnl`` lock released.
+
 * **Added HiSilicon DMA driver.**
 
   The HiSilicon DMA driver provides device drivers for the Kunpeng's DMA devices.
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index c15da311ba25..e8633486eeb8 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -34,6 +34,9 @@ 
 /* Default carrier state for created KNI network interfaces */
 extern uint32_t kni_dflt_carrier;
 
+/* Request processing support for bifurcated drivers. */
+extern uint32_t bifurcated_support;
+
 /**
  * A structure describing the private information for a kni device.
  */
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index f4944e1ddf33..f10dcd069d00 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -41,6 +41,10 @@  static uint32_t multiple_kthread_on;
 static char *carrier;
 uint32_t kni_dflt_carrier;
 
+/* Request processing support for bifurcated drivers. */
+static char *enable_bifurcated;
+uint32_t bifurcated_support;
+
 #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
 
 static int kni_net_id;
@@ -565,6 +569,22 @@  kni_parse_carrier_state(void)
 	return 0;
 }
 
+static int __init
+kni_parse_bifurcated_support(void)
+{
+	if (!enable_bifurcated) {
+		bifurcated_support = 0;
+		return 0;
+	}
+
+	if (strcmp(enable_bifurcated, "on") == 0)
+		bifurcated_support = 1;
+	else
+		return -1;
+
+	return 0;
+}
+
 static int __init
 kni_init(void)
 {
@@ -590,6 +610,13 @@  kni_init(void)
 	else
 		pr_debug("Default carrier state set to on.\n");
 
+	if (kni_parse_bifurcated_support() < 0) {
+		pr_err("Invalid parameter for bifurcated support\n");
+		return -EINVAL;
+	}
+	if (bifurcated_support == 1)
+		pr_debug("bifurcated support is enabled.\n");
+
 #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
 	rc = register_pernet_subsys(&kni_net_ops);
 #else
@@ -656,3 +683,12 @@  MODULE_PARM_DESC(carrier,
 "\t\ton    Interfaces will be created with carrier state set to on.\n"
 "\t\t"
 );
+
+module_param(enable_bifurcated, charp, 0644);
+MODULE_PARM_DESC(enable_bifurcated,
+"Enable request processing support for bifurcated drivers, "
+"which means releasing rtnl_lock before calling userspace callback and "
+"supporting async requests (default=off):\n"
+"\t\ton    Enable request processing support for bifurcated drivers.\n"
+"\t\t"
+);
diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 611719b5ee27..29e5b9e21f9e 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -113,12 +113,14 @@  kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 
 	ASSERT_RTNL();
 
-	/* If we need to wait and RTNL mutex is held
-	 * drop the mutex and hold reference to keep device
-	 */
-	if (req->async == 0) {
-		dev_hold(dev);
-		rtnl_unlock();
+	if (bifurcated_support) {
+		/* If we need to wait and RTNL mutex is held
+		 * drop the mutex and hold reference to keep device
+		 */
+		if (req->async == 0) {
+			dev_hold(dev);
+			rtnl_unlock();
+		}
 	}
 
 	mutex_lock(&kni->sync_lock);
@@ -132,12 +134,14 @@  kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 		goto fail;
 	}
 
-	/* No result available since request is handled
-	 * asynchronously. set response to success.
-	 */
-	if (req->async != 0) {
-		req->result = 0;
-		goto async;
+	if (bifurcated_support) {
+		/* No result available since request is handled
+		 * asynchronously. set response to success.
+		 */
+		if (req->async != 0) {
+			req->result = 0;
+			goto async;
+		}
 	}
 
 	ret_val = wait_event_interruptible_timeout(kni->wq,
@@ -160,9 +164,11 @@  kni_net_process_request(struct net_device *dev, struct rte_kni_request *req)
 
 fail:
 	mutex_unlock(&kni->sync_lock);
-	if (req->async == 0) {
-		rtnl_lock();
-		dev_put(dev);
+	if (bifurcated_support) {
+		if (req->async == 0) {
+			rtnl_lock();
+			dev_put(dev);
+		}
 	}
 	return ret;
 }
@@ -207,8 +213,10 @@  kni_net_release(struct net_device *dev)
 	/* Setting if_up to 0 means down */
 	req.if_up = 0;
 
-	/* request async because of the deadlock problem */
-	req.async = 1;
+	if (bifurcated_support) {
+		/* request async because of the deadlock problem */
+		req.async = 1;
+	}
 
 	ret = kni_net_process_request(dev, &req);