[v2,2/8] net/ice: enhance debug when HW fails to transmit
Checks
Commit Message
At the moment, if the driver sets an incorrect Tx descriptor, the HW
will raise a MDD event reported as:
ice_interrupt_handler(): OICR: MDD event
Add some debug info for this case and the VF index in all logs.
Signed-off-by: David Marchand <david.marchand@redhat.com>
---
drivers/net/ice/ice_ethdev.c | 29 +++++++++++++++++++++++++----
1 file changed, 25 insertions(+), 4 deletions(-)
Comments
On Fri, Apr 05, 2024 at 04:45:56PM +0200, David Marchand wrote:
> At the moment, if the driver sets an incorrect Tx descriptor, the HW
> will raise a MDD event reported as:
> ice_interrupt_handler(): OICR: MDD event
>
> Add some debug info for this case and the VF index in all logs.
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> drivers/net/ice/ice_ethdev.c | 29 +++++++++++++++++++++++++----
> 1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
> index 87385d2649..fd494e6b3b 100644
> --- a/drivers/net/ice/ice_ethdev.c
> +++ b/drivers/net/ice/ice_ethdev.c
> @@ -1389,6 +1389,7 @@ ice_interrupt_handler(void *param)
> uint32_t oicr;
> uint32_t reg;
> uint8_t pf_num;
> + uint16_t vf_num;
> uint8_t event;
> uint16_t queue;
> int ret;
> @@ -1432,28 +1433,48 @@ ice_interrupt_handler(void *param)
> if (reg & GL_MDET_TX_PQM_VALID_M) {
> pf_num = (reg & GL_MDET_TX_PQM_PF_NUM_M) >>
> GL_MDET_TX_PQM_PF_NUM_S;
> + vf_num = (reg & GL_MDET_TX_PQM_VF_NUM_M) >>
> + GL_MDET_TX_PQM_VF_NUM_S;
> event = (reg & GL_MDET_TX_PQM_MAL_TYPE_M) >>
> GL_MDET_TX_PQM_MAL_TYPE_S;
> queue = (reg & GL_MDET_TX_PQM_QNUM_M) >>
> GL_MDET_TX_PQM_QNUM_S;
>
> PMD_DRV_LOG(WARNING, "Malicious Driver Detection event "
> - "%d by PQM on TX queue %d PF# %d",
> - event, queue, pf_num);
> + "%d by PQM on TX queue %d PF# %d VF# %d",
> + event, queue, pf_num, vf_num);
> }
>
Would this output be misleading in the case where there is no VF involved
and the actual MDD error comes from the PF?
/Bruce
On Mon, Apr 8, 2024 at 5:23 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Fri, Apr 05, 2024 at 04:45:56PM +0200, David Marchand wrote:
> > At the moment, if the driver sets an incorrect Tx descriptor, the HW
> > will raise a MDD event reported as:
> > ice_interrupt_handler(): OICR: MDD event
> >
> > Add some debug info for this case and the VF index in all logs.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > ---
> > drivers/net/ice/ice_ethdev.c | 29 +++++++++++++++++++++++++----
> > 1 file changed, 25 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
> > index 87385d2649..fd494e6b3b 100644
> > --- a/drivers/net/ice/ice_ethdev.c
> > +++ b/drivers/net/ice/ice_ethdev.c
> > @@ -1389,6 +1389,7 @@ ice_interrupt_handler(void *param)
> > uint32_t oicr;
> > uint32_t reg;
> > uint8_t pf_num;
> > + uint16_t vf_num;
> > uint8_t event;
> > uint16_t queue;
> > int ret;
> > @@ -1432,28 +1433,48 @@ ice_interrupt_handler(void *param)
> > if (reg & GL_MDET_TX_PQM_VALID_M) {
> > pf_num = (reg & GL_MDET_TX_PQM_PF_NUM_M) >>
> > GL_MDET_TX_PQM_PF_NUM_S;
> > + vf_num = (reg & GL_MDET_TX_PQM_VF_NUM_M) >>
> > + GL_MDET_TX_PQM_VF_NUM_S;
> > event = (reg & GL_MDET_TX_PQM_MAL_TYPE_M) >>
> > GL_MDET_TX_PQM_MAL_TYPE_S;
> > queue = (reg & GL_MDET_TX_PQM_QNUM_M) >>
> > GL_MDET_TX_PQM_QNUM_S;
> >
> > PMD_DRV_LOG(WARNING, "Malicious Driver Detection event "
> > - "%d by PQM on TX queue %d PF# %d",
> > - event, queue, pf_num);
> > + "%d by PQM on TX queue %d PF# %d VF# %d",
> > + event, queue, pf_num, vf_num);
> > }
> >
> Would this output be misleading in the case where there is no VF involved
> and the actual MDD error comes from the PF?
I will check, but IIRC, the queue here is an "absolute" queue number
that should help figure out whether it is the PF or a VF in the case
vf_num == 0.
On Thu, Apr 11, 2024 at 10:30:19AM +0200, David Marchand wrote:
> On Mon, Apr 8, 2024 at 5:23 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > On Fri, Apr 05, 2024 at 04:45:56PM +0200, David Marchand wrote:
> > > At the moment, if the driver sets an incorrect Tx descriptor, the HW
> > > will raise a MDD event reported as:
> > > ice_interrupt_handler(): OICR: MDD event
> > >
> > > Add some debug info for this case and the VF index in all logs.
> > >
> > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > ---
> > > drivers/net/ice/ice_ethdev.c | 29 +++++++++++++++++++++++++----
> > > 1 file changed, 25 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
> > > index 87385d2649..fd494e6b3b 100644
> > > --- a/drivers/net/ice/ice_ethdev.c
> > > +++ b/drivers/net/ice/ice_ethdev.c
> > > @@ -1389,6 +1389,7 @@ ice_interrupt_handler(void *param)
> > > uint32_t oicr;
> > > uint32_t reg;
> > > uint8_t pf_num;
> > > + uint16_t vf_num;
> > > uint8_t event;
> > > uint16_t queue;
> > > int ret;
> > > @@ -1432,28 +1433,48 @@ ice_interrupt_handler(void *param)
> > > if (reg & GL_MDET_TX_PQM_VALID_M) {
> > > pf_num = (reg & GL_MDET_TX_PQM_PF_NUM_M) >>
> > > GL_MDET_TX_PQM_PF_NUM_S;
> > > + vf_num = (reg & GL_MDET_TX_PQM_VF_NUM_M) >>
> > > + GL_MDET_TX_PQM_VF_NUM_S;
> > > event = (reg & GL_MDET_TX_PQM_MAL_TYPE_M) >>
> > > GL_MDET_TX_PQM_MAL_TYPE_S;
> > > queue = (reg & GL_MDET_TX_PQM_QNUM_M) >>
> > > GL_MDET_TX_PQM_QNUM_S;
> > >
> > > PMD_DRV_LOG(WARNING, "Malicious Driver Detection event "
> > > - "%d by PQM on TX queue %d PF# %d",
> > > - event, queue, pf_num);
> > > + "%d by PQM on TX queue %d PF# %d VF# %d",
> > > + event, queue, pf_num, vf_num);
> > > }
> > >
> > Would this output be misleading in the case where there is no VF involved
> > and the actual MDD error comes from the PF?
>
> I will check, but IIRC, the queue here is an "absolute" queue number
> that should help figure out whether it is the PF or a VF in the case
> vf_num == 0.
>
Yes, that is my understanding too. Maybe in future we can use the queue
number to identify if it's a VF of not. If the PF is the error cause, I
think the VF details should be omitted.
/Bruce
On Thu, Apr 11, 2024 at 10:42 AM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Thu, Apr 11, 2024 at 10:30:19AM +0200, David Marchand wrote:
> > On Mon, Apr 8, 2024 at 5:23 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > >
> > > On Fri, Apr 05, 2024 at 04:45:56PM +0200, David Marchand wrote:
> > > > At the moment, if the driver sets an incorrect Tx descriptor, the HW
> > > > will raise a MDD event reported as:
> > > > ice_interrupt_handler(): OICR: MDD event
> > > >
> > > > Add some debug info for this case and the VF index in all logs.
> > > >
> > > > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > > > ---
> > > > drivers/net/ice/ice_ethdev.c | 29 +++++++++++++++++++++++++----
> > > > 1 file changed, 25 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
> > > > index 87385d2649..fd494e6b3b 100644
> > > > --- a/drivers/net/ice/ice_ethdev.c
> > > > +++ b/drivers/net/ice/ice_ethdev.c
> > > > @@ -1389,6 +1389,7 @@ ice_interrupt_handler(void *param)
> > > > uint32_t oicr;
> > > > uint32_t reg;
> > > > uint8_t pf_num;
> > > > + uint16_t vf_num;
> > > > uint8_t event;
> > > > uint16_t queue;
> > > > int ret;
> > > > @@ -1432,28 +1433,48 @@ ice_interrupt_handler(void *param)
> > > > if (reg & GL_MDET_TX_PQM_VALID_M) {
> > > > pf_num = (reg & GL_MDET_TX_PQM_PF_NUM_M) >>
> > > > GL_MDET_TX_PQM_PF_NUM_S;
> > > > + vf_num = (reg & GL_MDET_TX_PQM_VF_NUM_M) >>
> > > > + GL_MDET_TX_PQM_VF_NUM_S;
> > > > event = (reg & GL_MDET_TX_PQM_MAL_TYPE_M) >>
> > > > GL_MDET_TX_PQM_MAL_TYPE_S;
> > > > queue = (reg & GL_MDET_TX_PQM_QNUM_M) >>
> > > > GL_MDET_TX_PQM_QNUM_S;
> > > >
> > > > PMD_DRV_LOG(WARNING, "Malicious Driver Detection event "
> > > > - "%d by PQM on TX queue %d PF# %d",
> > > > - event, queue, pf_num);
> > > > + "%d by PQM on TX queue %d PF# %d VF# %d",
> > > > + event, queue, pf_num, vf_num);
> > > > }
> > > >
> > > Would this output be misleading in the case where there is no VF involved
> > > and the actual MDD error comes from the PF?
> >
> > I will check, but IIRC, the queue here is an "absolute" queue number
> > that should help figure out whether it is the PF or a VF in the case
> > vf_num == 0.
> >
> Yes, that is my understanding too. Maybe in future we can use the queue
> number to identify if it's a VF of not. If the PF is the error cause, I
> think the VF details should be omitted.
IIUC there is a set of other registers (per PF and per VF) that would
make it easier to point and blame.
On the other hand, such a change seems more risky to me and I would
prefer someone from Intel owns it :-).
I'll go back with simply adding the TXPDU check which already helped
me understand where to look when debugging.
@@ -1389,6 +1389,7 @@ ice_interrupt_handler(void *param)
uint32_t oicr;
uint32_t reg;
uint8_t pf_num;
+ uint16_t vf_num;
uint8_t event;
uint16_t queue;
int ret;
@@ -1432,28 +1433,48 @@ ice_interrupt_handler(void *param)
if (reg & GL_MDET_TX_PQM_VALID_M) {
pf_num = (reg & GL_MDET_TX_PQM_PF_NUM_M) >>
GL_MDET_TX_PQM_PF_NUM_S;
+ vf_num = (reg & GL_MDET_TX_PQM_VF_NUM_M) >>
+ GL_MDET_TX_PQM_VF_NUM_S;
event = (reg & GL_MDET_TX_PQM_MAL_TYPE_M) >>
GL_MDET_TX_PQM_MAL_TYPE_S;
queue = (reg & GL_MDET_TX_PQM_QNUM_M) >>
GL_MDET_TX_PQM_QNUM_S;
PMD_DRV_LOG(WARNING, "Malicious Driver Detection event "
- "%d by PQM on TX queue %d PF# %d",
- event, queue, pf_num);
+ "%d by PQM on TX queue %d PF# %d VF# %d",
+ event, queue, pf_num, vf_num);
}
reg = ICE_READ_REG(hw, GL_MDET_TX_TCLAN);
if (reg & GL_MDET_TX_TCLAN_VALID_M) {
pf_num = (reg & GL_MDET_TX_TCLAN_PF_NUM_M) >>
GL_MDET_TX_TCLAN_PF_NUM_S;
+ vf_num = (reg & GL_MDET_TX_TCLAN_VF_NUM_M) >>
+ GL_MDET_TX_TCLAN_VF_NUM_S;
event = (reg & GL_MDET_TX_TCLAN_MAL_TYPE_M) >>
GL_MDET_TX_TCLAN_MAL_TYPE_S;
queue = (reg & GL_MDET_TX_TCLAN_QNUM_M) >>
GL_MDET_TX_TCLAN_QNUM_S;
PMD_DRV_LOG(WARNING, "Malicious Driver Detection event "
- "%d by TCLAN on TX queue %d PF# %d",
- event, queue, pf_num);
+ "%d by TCLAN on TX queue %d PF# %d VF# %d",
+ event, queue, pf_num, vf_num);
+ }
+
+ reg = ICE_READ_REG(hw, GL_MDET_TX_TDPU);
+ if (reg & GL_MDET_TX_TDPU_VALID_M) {
+ pf_num = (reg & GL_MDET_TX_TDPU_PF_NUM_M) >>
+ GL_MDET_TX_TDPU_PF_NUM_S;
+ vf_num = (reg & GL_MDET_TX_TDPU_VF_NUM_M) >>
+ GL_MDET_TX_TDPU_VF_NUM_S;
+ event = (reg & GL_MDET_TX_TDPU_MAL_TYPE_M) >>
+ GL_MDET_TX_TDPU_MAL_TYPE_S;
+ queue = (reg & GL_MDET_TX_TDPU_QNUM_M) >>
+ GL_MDET_TX_TDPU_QNUM_S;
+
+ PMD_DRV_LOG(WARNING, "Malicious Driver Detection event "
+ "%d by TDPU on TX queue %d PF# %d VF# %d",
+ event, queue, pf_num, vf_num);
}
}
done: