net/gve: Change ERR to DEBUG to prevent flooding of logs for Tx-Dqo.

Message ID 20240219024436.1010010-1-rushilg@google.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Headers
Series net/gve: Change ERR to DEBUG to prevent flooding of logs for Tx-Dqo. |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS

Commit Message

Rushil Gupta Feb. 19, 2024, 2:44 a.m. UTC
  This was causing failure for testpmd runs (for queues >=15)
presumably due to flooding of logs due to descriptor ring being
overwritten.

Fixes: a01854 ("net/gve: fix dqo bug for chained descriptors")
Cc: stable@dpdk.org

Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Joshua Washington <joshwash@google.com>
---
 drivers/net/gve/gve_tx_dqo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Ferruh Yigit Feb. 19, 2024, 9:33 a.m. UTC | #1
On 2/19/2024 2:44 AM, Rushil Gupta wrote:
> This was causing failure for testpmd runs (for queues >=15)
> presumably due to flooding of logs due to descriptor ring being
> overwritten.
> 
> Fixes: a01854 ("net/gve: fix dqo bug for chained descriptors")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Rushil Gupta <rushilg@google.com>
> Reviewed-by: Joshua Washington <joshwash@google.com>
> ---
>  drivers/net/gve/gve_tx_dqo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/gve/gve_tx_dqo.c b/drivers/net/gve/gve_tx_dqo.c
> index 1a8eb96ea9..30a1455b20 100644
> --- a/drivers/net/gve/gve_tx_dqo.c
> +++ b/drivers/net/gve/gve_tx_dqo.c
> @@ -116,7 +116,7 @@ gve_tx_burst_dqo(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>  		first_sw_id = sw_id;
>  		do {
>  			if (sw_ring[sw_id] != NULL)
> -				PMD_DRV_LOG(ERR, "Overwriting an entry in sw_ring");
> +				PMD_DRV_LOG(DEBUG, "Overwriting an entry in sw_ring");
>  
>  			txd = &txr[tx_id];
>  			sw_ring[sw_id] = tx_pkt;

Hi Rushil,

I will continue with this patch, BUT
logging in the datapath has potential problems like this, also it may
have performance impact even log is not printed, because of additional
check in the log function.

For datapath, you can prefer:
- Add log macros controlled by RTE_ETHDEV_DEBUG_RX & RTE_ETHDEV_DEBUG_TX
flags
- Or use RTE_LOG_DP() macro which is compiled out if default log level
is less than the log uses


Also another perspective for the logs, when end-user observes this bloat
of logs, what she can do?
Can she change some driver arguments or environment variables to fix the
issue, if not what is the point of the log?
If this is a condition that can occur dynamically based on received
traffic and user doesn't have control on it, maybe driver should handle
the error without log?
And if this is a log for driver developer, perhaps it should be assert
or similar that is disabled in the release build?
  
Ferruh Yigit Feb. 19, 2024, 10:14 a.m. UTC | #2
On 2/19/2024 2:44 AM, Rushil Gupta wrote:
> This was causing failure for testpmd runs (for queues >=15)
> presumably due to flooding of logs due to descriptor ring being
> overwritten.
> 
> Fixes: a01854 ("net/gve: fix dqo bug for chained descriptors")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Rushil Gupta <rushilg@google.com>
> Reviewed-by: Joshua Washington <joshwash@google.com>
>

Squashed into relevant commit in next-net, thanks.
  
Stephen Hemminger Feb. 19, 2024, 5:26 p.m. UTC | #3
On Mon, 19 Feb 2024 02:44:35 +0000
Rushil Gupta <rushilg@google.com> wrote:

> This was causing failure for testpmd runs (for queues >=15)
> presumably due to flooding of logs due to descriptor ring being
> overwritten.
> 
> Fixes: a01854 ("net/gve: fix dqo bug for chained descriptors")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Rushil Gupta <rushilg@google.com>
> Reviewed-by: Joshua Washington <joshwash@google.com>

Isn't this still an error. What about the descriptor overwritten is there an mbuf leak?
Maybe a statistic would be better than a message?
  
Rushil Gupta Feb. 20, 2024, 3:38 a.m. UTC | #4
I agree.
This bug has manifested for a while before I fixed it partially in "[PATCH]
net/gve: fix dqo bug for chained descriptors"
However, for higher queue counts (> 13); we still see this behavior. I'll
add a statistic.

On Mon, Feb 19, 2024, 10:56 PM Stephen Hemminger <stephen@networkplumber.org>
wrote:

> On Mon, 19 Feb 2024 02:44:35 +0000
> Rushil Gupta <rushilg@google.com> wrote:
>
> > This was causing failure for testpmd runs (for queues >=15)
> > presumably due to flooding of logs due to descriptor ring being
> > overwritten.
> >
> > Fixes: a01854 ("net/gve: fix dqo bug for chained descriptors")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Rushil Gupta <rushilg@google.com>
> > Reviewed-by: Joshua Washington <joshwash@google.com>
>
> Isn't this still an error. What about the descriptor overwritten is there
> an mbuf leak?
> Maybe a statistic would be better than a message?
>
  
Rushil Gupta Feb. 20, 2024, 9:40 a.m. UTC | #5
These are very useful insights Ferruh. I think RTE_LOG_DP() is something
that would have been more suitable here.
Also, the DEBUG statement combined with a statistic will be more useful
than ERR from developer perspective if they see a potential memory leak in
their program.

On Mon, Feb 19, 2024, 3:03 PM Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 2/19/2024 2:44 AM, Rushil Gupta wrote:
> > This was causing failure for testpmd runs (for queues >=15)
> > presumably due to flooding of logs due to descriptor ring being
> > overwritten.
> >
> > Fixes: a01854 ("net/gve: fix dqo bug for chained descriptors")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Rushil Gupta <rushilg@google.com>
> > Reviewed-by: Joshua Washington <joshwash@google.com>
> > ---
> >  drivers/net/gve/gve_tx_dqo.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/gve/gve_tx_dqo.c b/drivers/net/gve/gve_tx_dqo.c
> > index 1a8eb96ea9..30a1455b20 100644
> > --- a/drivers/net/gve/gve_tx_dqo.c
> > +++ b/drivers/net/gve/gve_tx_dqo.c
> > @@ -116,7 +116,7 @@ gve_tx_burst_dqo(void *tx_queue, struct rte_mbuf
> **tx_pkts, uint16_t nb_pkts)
> >               first_sw_id = sw_id;
> >               do {
> >                       if (sw_ring[sw_id] != NULL)
> > -                             PMD_DRV_LOG(ERR, "Overwriting an entry in
> sw_ring");
> > +                             PMD_DRV_LOG(DEBUG, "Overwriting an entry
> in sw_ring");
> >
> >                       txd = &txr[tx_id];
> >                       sw_ring[sw_id] = tx_pkt;
>
> Hi Rushil,
>
> I will continue with this patch, BUT
> logging in the datapath has potential problems like this, also it may
> have performance impact even log is not printed, because of additional
> check in the log function.
>
> For datapath, you can prefer:
> - Add log macros controlled by RTE_ETHDEV_DEBUG_RX & RTE_ETHDEV_DEBUG_TX
> flags
> - Or use RTE_LOG_DP() macro which is compiled out if default log level
> is less than the log uses
>
>
> Also another perspective for the logs, when end-user observes this bloat
> of logs, what she can do?
> Can she change some driver arguments or environment variables to fix the
> issue, if not what is the point of the log?
> If this is a condition that can occur dynamically based on received
> traffic and user doesn't have control on it, maybe driver should handle
> the error without log?
> And if this is a log for driver developer, perhaps it should be assert
> or similar that is disabled in the release build?
>
  

Patch

diff --git a/drivers/net/gve/gve_tx_dqo.c b/drivers/net/gve/gve_tx_dqo.c
index 1a8eb96ea9..30a1455b20 100644
--- a/drivers/net/gve/gve_tx_dqo.c
+++ b/drivers/net/gve/gve_tx_dqo.c
@@ -116,7 +116,7 @@  gve_tx_burst_dqo(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 		first_sw_id = sw_id;
 		do {
 			if (sw_ring[sw_id] != NULL)
-				PMD_DRV_LOG(ERR, "Overwriting an entry in sw_ring");
+				PMD_DRV_LOG(DEBUG, "Overwriting an entry in sw_ring");
 
 			txd = &txr[tx_id];
 			sw_ring[sw_id] = tx_pkt;