[Bug,957] IXGBE LSC IRQ configured state is lost on certain link down events

Message ID bug-957-3@http.bugs.dpdk.org/ (mailing list archive)
State Rejected, archived
Delegated to: Qi Zhang

Checks

Context Check Description
ci/Intel-compilation warning apply issues

Commit Message

bugzilla@dpdk.org March 14, 2022, 9:50 p.m. UTC
  https://bugs.dpdk.org/show_bug.cgi?id=957

            Bug ID: 957
           Summary: IXGBE LSC IRQ configured state is lost on certain link
                    down events
           Product: DPDK
           Version: 20.11
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: bly454@gmail.com
  Target Milestone: ---

Hello,

We recently ran into an issue with DPDK 20.11 for the IXGBE driver operating in
10G BASE-T mode. We have been able to replicate this behavior using
dpdk-testpmd and do not see any recent/pertinent updates, so we are hopeful
someone may be able to advise based on the information provided below. From our
investigation, it appears that the current link-down transition logic does not
correctly preserve the IRQ mask configuration, specifically LSC, when a link
partner causes a slow or bounced link-down event.

Background:
We recently started using a new 3rd party traffic generator card for testing
our application. We found when using this card in 10G BASE-T mode and toggling
link up/down, it would correctly cause our application to detect the port to be
down in our DPDK design. However, the link down event handling by the DPDK
IXGBE driver appears to permanently disable its LSC IRQ detection on the first
port down event such that any subsequent link up or down events from the
external test card on this port would no longer be detected. The only way to
restore link up was to restart the DPDK port in our design (stop/start). Having
looked at this a bit, we switched over to the classic testpmd application and
observed the exact same behavior.

Here is the data we believe you would find interesting:

NIC in question:

# lspci -D -nn | grep -F [0200] | grep 552
0000:03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection
X552/X557-AT 10GBASE-T [8086:15ad]
0000:03:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Connection
X552/X557-AT 10GBASE-T [8086:15ad]
# dpdk-devbind.py -s | grep 552
0000:03:00.0 'Ethernet Connection X552/X557-AT 10GBASE-T 15ad' drv=vfio-pci
unused=uio_pci_generic
0000:03:00.1 'Ethernet Connection X552/X557-AT 10GBASE-T 15ad' drv=vfio-pci
unused=uio_pci_generic

We made the following debug logging changes to try to capture interesting data
to share:

     return 0;
@@ -4648,7 +4647,9 @@ ixgbe_dev_interrupt_delayed_handler(void *param)

     ixgbe_disable_intr(hw);

-    eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
+   eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
+   PMD_DRV_LOG(ERR, "in delay func: eicr 0x%08x", eicr);
+   PMD_DRV_LOG(ERR, "enable intr delayed, mask: 0x%08x, orig: 0x%08x, flags: 0x%08x", intr->mask, intr->mask_original, intr->flags);
    if (eicr & IXGBE_EICR_MAILBOX)
          ixgbe_pf_mbx_process(dev);

With the above “log-err” additions, we have provided the following results. The
first set of data below was generated using an older 3rd party traffic
generator card to provide “good” results that show the IXGBE driver working
correctly. Following that are the non-working (bad) logging results for the new
traffic generator card. Both 3rd party cards correctly transition between down
and up states.


######################################################################
# good sequence, both down detection and then up detection
######################################################################
# port transition from up to down
<27>1 2022-03-05T00:12:11.415436+00:00 - -  ixgbe_dev_interrupt_get_status():
eicr 100000
<27>1 2022-03-05T00:12:11.415489+00:00 - -  ixgbe_dev_interrupt_action():
enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001
<27>1 2022-03-05T00:12:11.425448+00:00 - -  ixgbe_dev_interrupt_get_status():
eicr 2000000
<27>1 2022-03-05T00:12:11.446191+00:00 - -  ixgbe_dev_interrupt_action():
enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000000
<27>1 2022-03-05T00:12:15.415600+00:00 - - 
ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00000000
<27>1 2022-03-05T00:12:15.415655+00:00 - - 
ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000,
orig: 0x02300000, flags: 0x00000000

# port transition from down to up
<27>1 2022-03-05T00:12:33.856734+00:00 - -  ixgbe_dev_interrupt_get_status():
eicr 2000000
<27>1 2022-03-05T00:12:33.877463+00:00 - -  ixgbe_dev_interrupt_action():
enable intr immediately, mask: 0x02300000, orig: 0x00000000, flags: 0x00000000
<27>1 2022-03-05T00:12:34.203274+00:00 - -  ixgbe_dev_interrupt_get_status():
eicr 100000
<27>1 2022-03-05T00:12:34.207905+00:00 - -  ixgbe_dev_interrupt_action():
enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001
<27>1 2022-03-05T00:12:35.207994+00:00 - - 
ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00100000
<27>1 2022-03-05T00:12:35.208027+00:00 - - 
ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000,
orig: 0x02300000, flags: 0x00000001

######################################################################
# bad sequence, detects down event, but does not see the up event
######################################################################
# port transition from up to down
<27>1 2022-03-05T00:13:00.377072+00:00 - -  ixgbe_dev_interrupt_get_status():
eicr 100000
<27>1 2022-03-05T00:13:00.377127+00:00 - -  ixgbe_dev_interrupt_action():
enable intr immediately, mask: 0x02200000, orig: 0x02300000, flags: 0x00000001
<27>1 2022-03-05T00:13:00.643788+00:00 - -  ixgbe_dev_interrupt_get_status():
eicr 2100000
<27>1 2022-03-05T00:13:00.664603+00:00 - -  ixgbe_dev_interrupt_action():
enable intr immediately, mask: 0x02200000, orig: 0x02200000, flags: 0x00000001
<27>1 2022-03-05T00:13:01.664703+00:00 - - 
ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00000000
<27>1 2022-03-05T00:13:01.664738+00:00 - - 
ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000,
orig: 0x02200000, flags: 0x00000001
<27>1 2022-03-05T00:13:04.377237+00:00 - - 
ixgbe_dev_interrupt_delayed_handler(): in delay func: eicr 0x00000000
<27>1 2022-03-05T00:13:04.377269+00:00 - - 
ixgbe_dev_interrupt_delayed_handler(): enable intr delayed, mask: 0x02200000,
orig: 0x00000000, flags: 0x00000000

# port transition from down to up
<nothing happens as LSC IRQ is not enabled due to above link-down sequence>
  

Patch

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 5a30c39593..75a9f9163b 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -4497,7 +4497,7 @@  ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev)

     /* read-on-clear nic registers here */
    eicr = IXGBE_READ_REG(hw, IXGBE_EICR);
-    PMD_DRV_LOG(DEBUG, "eicr %x", eicr);
+    PMD_DRV_LOG(ERR, "eicr %x", eicr);

     intr->flags = 0;

@@ -4614,7 +4613,7 @@  ixgbe_dev_interrupt_action(struct rte_eth_dev *dev)
          }
    }

-    PMD_DRV_LOG(DEBUG, "enable intr immediately");
+    PMD_DRV_LOG(ERR, "enable intr immediately, mask: 0x%08x, orig: 0x%08x, flags: 0x%08x", intr->mask, intr->mask_original, intr->flags);
    ixgbe_enable_intr(dev);