usertools: fix bind failure from dpdk to kernel

Message ID 20220805031022.9795-1-lihuisong@huawei.com (mailing list archive)
State Rejected, archived
Delegated to: Thomas Monjalon
Headers
Series usertools: fix bind failure from dpdk to kernel |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/github-robot: build success github build: passed
ci/intel-Testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional fail Functional Testing issues
ci/iol-x86_64-unit-testing fail Testing issues

Commit Message

lihuisong (C) Aug. 5, 2022, 3:10 a.m. UTC
  Currently, the steps for binding device from dpdk driver to kernel
driver is as follows:
echo $BDF > /sys/bus/pci/drivers/vfio-pci/unbind
echo $BDF > /sys/bus/pci/drivers/$kernel_driver/bind

This steps cannot bind device from dpdk driver to kernel driver on
platform with kernel 5.19. The 'driver_override' must be specify
kernel driver before binding device to kernel driver.

Fixes: 720b7a058260 ("usertools: fix device binding with kernel tools")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
---
 usertools/dpdk-devbind.py | 52 ++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 33 deletions(-)
  

Comments

Stephen Hemminger Aug. 5, 2022, 3:35 p.m. UTC | #1
On Fri, 5 Aug 2022 11:10:22 +0800
Huisong Li <lihuisong@huawei.com> wrote:

> Currently, the steps for binding device from dpdk driver to kernel
> driver is as follows:
> echo $BDF > /sys/bus/pci/drivers/vfio-pci/unbind
> echo $BDF > /sys/bus/pci/drivers/$kernel_driver/bind
> 
> This steps cannot bind device from dpdk driver to kernel driver on
> platform with kernel 5.19. The 'driver_override' must be specify
> kernel driver before binding device to kernel driver.
> 
> Fixes: 720b7a058260 ("usertools: fix device binding with kernel tools")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Huisong Li <lihuisong@huawei.com>

Not sure exactly what you did and why.
The patch seems to just remove the check that the driver
is in the set of dpdk_drivers.
  
lihuisong (C) Aug. 9, 2022, 11:44 a.m. UTC | #2
在 2022/8/5 23:35, Stephen Hemminger 写道:
> On Fri, 5 Aug 2022 11:10:22 +0800
> Huisong Li <lihuisong@huawei.com> wrote:
>
>> Currently, the steps for binding device from dpdk driver to kernel
>> driver is as follows:
>> echo $BDF > /sys/bus/pci/drivers/vfio-pci/unbind
>> echo $BDF > /sys/bus/pci/drivers/$kernel_driver/bind
>>
>> This steps cannot bind device from dpdk driver to kernel driver on
>> platform with kernel 5.19. The 'driver_override' must be specify
>> kernel driver before binding device to kernel driver.
>>
>> Fixes: 720b7a058260 ("usertools: fix device binding with kernel tools")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> Not sure exactly what you did and why.
> The patch seems to just remove the check that the driver
> is in the set of dpdk_drivers.
> .
Currently, the end of the operation binding device from kernel driver to
dpdk driver write '\00' to driver_override file so as to this device can
be bound to any other driver. And perform following steps to
bind device dpdk driver to kernel driver:
echo $BDF > /sys/bus/pci/drivers/vfio-pci/unbind
echo $BDF > /sys/bus/pci/drivers/$kernel_driver/bind

However, due to the patch[1] merged into 5.19 kernel, 'driver_override'
in the pci_dev is no longer NULL by writing '\00' to driver_override file.
This causes PCI match device failure and the device will never be bound to
their kernel driver.

In 5.19 kernel, I found that dpdk-devbind.py need to write '\n' to
driver_override file if we want to bind divce to any other driver.
But I think it is not necessary to write empty to driver_override
file. After all, the device has only one kernel driver, and binding
to dpdk driver(like, vfio-pci) must specify driver_override.

[1] 23d99baf9d72 ("PCI: Use driver_set_override() instead of open-coding")
  
Stephen Hemminger Aug. 9, 2022, 5:58 p.m. UTC | #3
On Tue, 9 Aug 2022 19:44:41 +0800
"lihuisong (C)" <lihuisong@huawei.com> wrote:

> 在 2022/8/5 23:35, Stephen Hemminger 写道:
> > On Fri, 5 Aug 2022 11:10:22 +0800
> > Huisong Li <lihuisong@huawei.com> wrote:
> >  
> >> Currently, the steps for binding device from dpdk driver to kernel
> >> driver is as follows:
> >> echo $BDF > /sys/bus/pci/drivers/vfio-pci/unbind
> >> echo $BDF > /sys/bus/pci/drivers/$kernel_driver/bind
> >>
> >> This steps cannot bind device from dpdk driver to kernel driver on
> >> platform with kernel 5.19. The 'driver_override' must be specify
> >> kernel driver before binding device to kernel driver.
> >>
> >> Fixes: 720b7a058260 ("usertools: fix device binding with kernel tools")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Huisong Li <lihuisong@huawei.com>  
> > Not sure exactly what you did and why.
> > The patch seems to just remove the check that the driver
> > is in the set of dpdk_drivers.
> > .  
> Currently, the end of the operation binding device from kernel driver to
> dpdk driver write '\00' to driver_override file so as to this device can
> be bound to any other driver. And perform following steps to
> bind device dpdk driver to kernel driver:
> echo $BDF > /sys/bus/pci/drivers/vfio-pci/unbind
> echo $BDF > /sys/bus/pci/drivers/$kernel_driver/bind


> 
> However, due to the patch[1] merged into 5.19 kernel, 'driver_override'
> in the pci_dev is no longer NULL by writing '\00' to driver_override file.
> This causes PCI match device failure and the device will never be bound to
> their kernel driver.


Linux kernel does not look favorably on API changes and that looks like
the kernel changed behavior. That should be reported and fixed there.

> In 5.19 kernel, I found that dpdk-devbind.py need to write '\n' to
> driver_override file if we want to bind divce to any other driver.
> But I think it is not necessary to write empty to driver_override
> file. After all, the device has only one kernel driver, and binding
> to dpdk driver(like, vfio-pci) must specify driver_override.
> 
> [1] 23d99baf9d72 ("PCI: Use driver_set_override() instead of open-coding")

The method of using driver override is based off of what the
upstream driverctl package https://gitlab.com/driverctl/driverctl does.
In fact, I would recommend the driverctl package to users over using
our python script.
  
Krzysztof Kozlowski Aug. 10, 2022, 5:59 a.m. UTC | #4
On 09/08/2022 14:44, lihuisong (C) wrote:
> 
> 在 2022/8/5 23:35, Stephen Hemminger 写道:
>> On Fri, 5 Aug 2022 11:10:22 +0800
>> Huisong Li <lihuisong@huawei.com> wrote:
>>
>>> Currently, the steps for binding device from dpdk driver to kernel
>>> driver is as follows:
>>> echo $BDF > /sys/bus/pci/drivers/vfio-pci/unbind
>>> echo $BDF > /sys/bus/pci/drivers/$kernel_driver/bind
>>>
>>> This steps cannot bind device from dpdk driver to kernel driver on
>>> platform with kernel 5.19. The 'driver_override' must be specify
>>> kernel driver before binding device to kernel driver.
>>>
>>> Fixes: 720b7a058260 ("usertools: fix device binding with kernel tools")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>> Not sure exactly what you did and why.
>> The patch seems to just remove the check that the driver
>> is in the set of dpdk_drivers.
>> .
> Currently, the end of the operation binding device from kernel driver to
> dpdk driver write '\00' to driver_override file so as to this device can
> be bound to any other driver. 

This could have work but this was not the way to use the
driver_override. The kernel ABI document clearly states:
"and  may be cleared with an empty string (echo > driver_override)."
Documentation/ABI/testing/sysfs-bus-pci

Please use the kernel ABI how it is described. Using it in wrong way
might sometimes work, sometimes not.


> And perform following steps to
> bind device dpdk driver to kernel driver:
> echo $BDF > /sys/bus/pci/drivers/vfio-pci/unbind
> echo $BDF > /sys/bus/pci/drivers/$kernel_driver/bind
> 
> However, due to the patch[1] merged into 5.19 kernel, 'driver_override'
> in the pci_dev is no longer NULL by writing '\00' to driver_override file.
> This causes PCI match device failure and the device will never be bound to
> their kernel driver.
> 
> In 5.19 kernel, I found that dpdk-devbind.py need to write '\n' to
> driver_override file if we want to bind divce to any other driver.
> But I think it is not necessary to write empty to driver_override
> file. 

It is necessary because in 2014 it was described that PCI
driver_override works like that. What you are implying here is that "it
is not necessary to follow the API and we can do it differently"...

> After all, the device has only one kernel driver, and binding
> to dpdk driver(like, vfio-pci) must specify driver_override.
> 
> [1] 23d99baf9d72 ("PCI: Use driver_set_override() instead of open-coding")


Best regards,
Krzysztof
  
Krzysztof Kozlowski Aug. 10, 2022, 6:02 a.m. UTC | #5
On 09/08/2022 20:58, Stephen Hemminger wrote:
>>
>> However, due to the patch[1] merged into 5.19 kernel, 'driver_override'
>> in the pci_dev is no longer NULL by writing '\00' to driver_override file.
>> This causes PCI match device failure and the device will never be bound to
>> their kernel driver.
> 
> 
> Linux kernel does not look favorably on API changes and that looks like
> the kernel changed behavior. That should be reported and fixed there.

To clarify around this issue:

There were no API changes. Linux kernel follows the API exactly how it
is described in the API document since 2014:
Documentation/ABI/testing/sysfs-bus-pci

There was no change in kernel API.

There was a change in undocumented, unsupported and wrong usage of
driver_override API.

Best regards,
Krzysztof
  
Stephen Hemminger Aug. 10, 2022, 1:49 p.m. UTC | #6
On Wed, 10 Aug 2022 08:59:15 +0300
Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote:

> On 09/08/2022 14:44, lihuisong (C) wrote:
> > 
> > 在 2022/8/5 23:35, Stephen Hemminger 写道:  
> >> On Fri, 5 Aug 2022 11:10:22 +0800
> >> Huisong Li <lihuisong@huawei.com> wrote:
> >>  
> >>> Currently, the steps for binding device from dpdk driver to kernel
> >>> driver is as follows:
> >>> echo $BDF > /sys/bus/pci/drivers/vfio-pci/unbind
> >>> echo $BDF > /sys/bus/pci/drivers/$kernel_driver/bind
> >>>
> >>> This steps cannot bind device from dpdk driver to kernel driver on
> >>> platform with kernel 5.19. The 'driver_override' must be specify
> >>> kernel driver before binding device to kernel driver.
> >>>
> >>> Fixes: 720b7a058260 ("usertools: fix device binding with kernel tools")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Huisong Li <lihuisong@huawei.com>  
> >> Not sure exactly what you did and why.
> >> The patch seems to just remove the check that the driver
> >> is in the set of dpdk_drivers.
> >> .  
> > Currently, the end of the operation binding device from kernel driver to
> > dpdk driver write '\00' to driver_override file so as to this device can
> > be bound to any other driver.   
> 
> This could have work but this was not the way to use the
> driver_override. The kernel ABI document clearly states:
> "and  may be cleared with an empty string (echo > driver_override)."
> Documentation/ABI/testing/sysfs-bus-pci
> 
> Please use the kernel ABI how it is described. Using it in wrong way
> might sometimes work, sometimes not.

No, the kernel ABI is what ever worked before.
The documentation is not the definitive standard.
  
Stephen Hemminger Aug. 10, 2022, 1:50 p.m. UTC | #7
On Wed, 10 Aug 2022 09:02:57 +0300
Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote:

> On 09/08/2022 20:58, Stephen Hemminger wrote:
> >>
> >> However, due to the patch[1] merged into 5.19 kernel, 'driver_override'
> >> in the pci_dev is no longer NULL by writing '\00' to driver_override file.
> >> This causes PCI match device failure and the device will never be bound to
> >> their kernel driver.  
> > 
> > 
> > Linux kernel does not look favorably on API changes and that looks like
> > the kernel changed behavior. That should be reported and fixed there.  
> 
> To clarify around this issue:
> 
> There were no API changes. Linux kernel follows the API exactly how it
> is described in the API document since 2014:
> Documentation/ABI/testing/sysfs-bus-pci
> 
> There was no change in kernel API.
> 
> There was a change in undocumented, unsupported and wrong usage of
> driver_override API.
> 
> Best regards,
> Krzysztof

Linux documentation is not the standard, the code is.
  
lihuisong (C) Aug. 11, 2022, 2:10 a.m. UTC | #8
在 2022/8/10 21:49, Stephen Hemminger 写道:
> On Wed, 10 Aug 2022 08:59:15 +0300
> Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote:
>
>> On 09/08/2022 14:44, lihuisong (C) wrote:
>>> 在 2022/8/5 23:35, Stephen Hemminger 写道:
>>>> On Fri, 5 Aug 2022 11:10:22 +0800
>>>> Huisong Li <lihuisong@huawei.com> wrote:
>>>>   
>>>>> Currently, the steps for binding device from dpdk driver to kernel
>>>>> driver is as follows:
>>>>> echo $BDF > /sys/bus/pci/drivers/vfio-pci/unbind
>>>>> echo $BDF > /sys/bus/pci/drivers/$kernel_driver/bind
>>>>>
>>>>> This steps cannot bind device from dpdk driver to kernel driver on
>>>>> platform with kernel 5.19. The 'driver_override' must be specify
>>>>> kernel driver before binding device to kernel driver.
>>>>>
>>>>> Fixes: 720b7a058260 ("usertools: fix device binding with kernel tools")
>>>>> Cc: stable@dpdk.org
>>>>>
>>>>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>>>> Not sure exactly what you did and why.
>>>> The patch seems to just remove the check that the driver
>>>> is in the set of dpdk_drivers.
>>>> .
>>> Currently, the end of the operation binding device from kernel driver to
>>> dpdk driver write '\00' to driver_override file so as to this device can
>>> be bound to any other driver.
>> This could have work but this was not the way to use the
>> driver_override. The kernel ABI document clearly states:
>> "and  may be cleared with an empty string (echo > driver_override)."
>> Documentation/ABI/testing/sysfs-bus-pci
>>
>> Please use the kernel ABI how it is described. Using it in wrong way
>> might sometimes work, sometimes not.
> No, the kernel ABI is what ever worked before.
> The documentation is not the definitive standard.
> .

Before:
Writing "\0" or "\n" to driver_override file can clear
'driver_override' in the pci_dev.

Now(5.19 kernel):
Only "\n" can clear 'driver_override'.
Note: for 'echo > driver_override', the 'buf'
in driver_override_store() is equal to "\n".
  

Patch

diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 4d9c1be666..0e21ab3543 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -365,6 +365,24 @@  def bind_one(dev_id, driver, force):
         unbind_one(dev_id, force)
         dev["Driver_str"] = ""  # clear driver string
 
+    # For kernels >= 5.19 driver_override must be written, or device
+    # can not be bind from dpdk driver to other specified driver.
+    filename = "/sys/bus/pci/devices/%s/driver_override" % dev_id
+    if exists(filename):
+        try:
+            f = open(filename, "w")
+        except OSError as err:
+            print("Error: bind failed for %s - Cannot open %s: %s"
+                  % (dev_id, filename, err), file=sys.stderr)
+            return
+        try:
+            f.write("%s" % driver)
+            f.close()
+        except OSError as err:
+            print("Error: bind failed for %s - Cannot write driver %s to "
+                  "PCI ID: %s" % (dev_id, driver, err), file=sys.stderr)
+            return
+
     # For kernels >= 3.15 driver_override can be used to specify the driver
     # for a device rather than relying on the driver to provide a positive
     # match of the device.  The existing process of looking up
@@ -372,23 +390,8 @@  def bind_one(dev_id, driver, force):
     # will erroneously bind other devices too which has the additional burden
     # of unbinding those devices
     if driver in dpdk_drivers:
-        filename = "/sys/bus/pci/devices/%s/driver_override" % dev_id
-        if exists(filename):
-            try:
-                f = open(filename, "w")
-            except OSError as err:
-                print("Error: bind failed for %s - Cannot open %s: %s"
-                      % (dev_id, filename, err), file=sys.stderr)
-                return
-            try:
-                f.write("%s" % driver)
-                f.close()
-            except OSError as err:
-                print("Error: bind failed for %s - Cannot write driver %s to "
-                      "PCI ID: %s" % (dev_id, driver, err), file=sys.stderr)
-                return
         # For kernels < 3.15 use new_id to add PCI id's to the driver
-        else:
+        if not exists(filename):
             filename = "/sys/bus/pci/drivers/%s/new_id" % driver
             try:
                 f = open(filename, "w")
@@ -432,23 +435,6 @@  def bind_one(dev_id, driver, force):
             bind_one(dev_id, saved_driver, force)
         return
 
-    # For kernels > 3.15 driver_override is used to bind a device to a driver.
-    # Before unbinding it, overwrite driver_override with empty string so that
-    # the device can be bound to any other driver
-    filename = "/sys/bus/pci/devices/%s/driver_override" % dev_id
-    if exists(filename):
-        try:
-            f = open(filename, "w")
-        except OSError as err:
-            sys.exit("Error: unbind failed for %s - Cannot open %s: %s"
-                     % (dev_id, filename, err))
-        try:
-            f.write("\00")
-            f.close()
-        except OSError as err:
-            sys.exit("Error: unbind failed for %s - Cannot write %s: %s"
-                     % (dev_id, filename, err))
-
 
 def unbind_all(dev_list, force=False):
     """Unbind method, takes a list of device locations"""