lib/telemetry:fix telemetry conns leak in case of socket write fail

Message ID tencent_69E4B1D2B6C0865DA223940C173EC4904506@qq.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series lib/telemetry:fix telemetry conns leak in case of socket write fail

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/loongarch-compilation        success  Compilation OK
ci/loongarch-unit-testing       success  Unit Testing PASS
ci/github-robot: build          success  github build: passed
ci/Intel-compilation            success  Compilation OK
ci/intel-Testing                success  Testing PASS
ci/intel-Functional             success  Functional PASS
ci/iol-abi-testing              success  Testing PASS
ci/iol-broadcom-Performance     success  Performance Testing PASS
ci/iol-broadcom-Functional      success  Functional Testing PASS
ci/iol-unit-arm64-testing       success  Testing PASS
ci/iol-unit-amd64-testing       success  Testing PASS
ci/iol-compile-amd64-testing    success  Testing PASS
ci/iol-mellanox-Performance     success  Performance Testing PASS
ci/iol-compile-arm64-testing    success  Testing PASS
ci/iol-sample-apps-testing      success  Testing PASS
ci/iol-intel-Performance        success  Performance Testing PASS
ci/iol-intel-Functional         success  Functional Testing PASS

Commit Message

ShaoWei Sun Jan. 19, 2024, 11:40 a.m. UTC
By default, telemetry allows at most 10 connections, each of which is
handled by its own thread.

When a thread fails to write to its socket, it exits immediately without
decrementing the connection count.

On a long-running machine these leaked counts accumulate: after 10 such
failures, telemetry refuses all new connections and becomes unavailable.

Signed-off-by: sunshaowei01 <1819846787@qq.com>
---
 lib/telemetry/telemetry.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
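
The leak described in the commit message can be reproduced in miniature. The following is a minimal sketch, not the DPDK code: `active_clients`, `try_accept`, `leaky_handler` and `accepted_when_all_writes_fail` are hypothetical names, and the limit of 10 is taken from the commit message. It uses C11 atomics rather than the `rte_atomic_*` wrappers so that it builds without DPDK.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CONNECTIONS 10 /* assumed default limit, per the commit message */

/* Hypothetical stand-in for the telemetry connection counter. */
static atomic_int active_clients;

/* Listener side: claim a slot, or refuse once the limit is reached. */
static bool try_accept(void)
{
	if (atomic_load_explicit(&active_clients, memory_order_relaxed)
			>= MAX_CONNECTIONS)
		return false;
	atomic_fetch_add_explicit(&active_clients, 1, memory_order_relaxed);
	return true;
}

/* Handler side, pre-patch shape: the failure path never releases the slot. */
static void leaky_handler(bool write_fails)
{
	int prev;

	if (write_fails)
		return; /* early return: the counter is not decremented */

	prev = atomic_fetch_sub_explicit(&active_clients, 1,
			memory_order_relaxed);
	assert(prev > 0);
	(void)prev;
}

/* Count how many of `attempts` connections get in when every write fails. */
static int accepted_when_all_writes_fail(int attempts)
{
	int accepted = 0;

	for (int i = 0; i < attempts; i++) {
		if (try_accept()) {
			accepted++;
			leaky_handler(true);
		}
	}
	return accepted;
}
```

In this sketch, every failed write permanently consumes one slot, so once `MAX_CONNECTIONS` failures have occurred `try_accept()` refuses every later client even though no handler is still running.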
  

Comments

Bruce Richardson Jan. 19, 2024, 11:54 a.m. UTC | #1
On Fri, Jan 19, 2024 at 07:40:00PM +0800, sunshaowei01 wrote:
> Telemetry can only create 10 conns by default, each of which is processed
> by a thread.
> 
> When a thread fails to write using socket, the thread will end directly
> without reducing the total number of conns.
> 
> This will result in the machine running for a long time, and if there are
> 10 failures, the telemetry will be unavailable
> 
> Signed-off-by: sunshaowei01 <1819846787@qq.com>
> ---

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
Power, Ciara Jan. 19, 2024, 3:42 p.m. UTC | #2
> -----Original Message-----
> From: sunshaowei01 <1819846787@qq.com>
> Sent: Friday, January 19, 2024 11:40 AM
> To: dev@dpdk.org
> Cc: Power, Ciara <ciara.power@intel.com>
> Subject: [PATCH] lib/telemetry:fix telemetry conns leak in case of socket write
> fail
> 
> Telemetry can only create 10 conns by default, each of which is processed by a
> thread.
> 
> When a thread fails to write using socket, the thread will end directly without
> reducing the total number of conns.
> 
> This will result in the machine running for a long time, and if there are
> 10 failures, the telemetry will be unavailable
> 
> Signed-off-by: sunshaowei01 <1819846787@qq.com>
> ---
>  lib/telemetry/telemetry.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
> index 31e2391867..0b00c04090 100644
> --- a/lib/telemetry/telemetry.c
> +++ b/lib/telemetry/telemetry.c
> @@ -378,8 +378,8 @@ client_handler(void *sock_id)
>  			"{\"version\":\"%s\",\"pid\":%d,\"max_output_len\":%d}",
>  			telemetry_version, getpid(), MAX_OUTPUT_LEN);
>  	if (write(s, info_str, strlen(info_str)) < 0) {
> -		close(s);
> -		return NULL;
> +		TMTY_LOG_LINE(ERR, "Socket write base info to client failed");
> +		goto exit;
>  	}
>
>  	/* receive data is not null terminated */
> @@ -404,6 +404,7 @@ client_handler(void *sock_id)
>
>  		bytes = read(s, buffer, sizeof(buffer) - 1);
>  	}
> +exit:
>  	close(s);
>  	rte_atomic_fetch_sub_explicit(&v2_clients, 1, rte_memory_order_relaxed);
>  	return NULL;
> --
> 2.37.1 (Apple Git-137.1)

Thanks for fixing this.

Acked-by: Ciara Power <ciara.power@intel.com>
  
David Marchand Jan. 19, 2024, 3:54 p.m. UTC | #3
Hello,

On Fri, Jan 19, 2024 at 12:40 PM sunshaowei01 <1819846787@qq.com> wrote:
>
> Telemetry can only create 10 conns by default, each of which is processed
> by a thread.
>
> When a thread fails to write using socket, the thread will end directly
> without reducing the total number of conns.
>
> This will result in the machine running for a long time, and if there are
> 10 failures, the telemetry will be unavailable
>

Thanks for the patch, do you know which commit first triggered the issue?
This is needed so we add a Fixes: tag in the commitlog for backporting
the fix in stable releases.
See https://doc.dpdk.org/guides/contributing/patches.html#patch-fix-related-issues

> Signed-off-by: sunshaowei01 <1819846787@qq.com>

We need your full name in the SoB tag.
Like, for example in my case, it would be David Marchand
<david.marchand@redhat.com>.
  
ShaoWei Sun Jan. 20, 2024, 4:18 a.m. UTC | #4
I have modified my commit log and resubmitted the patch, but I seem to have forgotten to add a [v2] flag to the patch. Do I need to resubmit the patch again?

------------------ Original Message ------------------
From: "David Marchand" <david.marchand@redhat.com>
Sent: Friday, January 19, 2024, 11:54 PM
To: <1819846787@qq.com>
Cc: "dev" <dev@dpdk.org>; "ciara.power" <ciara.power@intel.com>; "Bruce Richardson" <bruce.richardson@intel.com>
Subject: Re: [PATCH] lib/telemetry:fix telemetry conns leak in case of socket write fail

On Fri, Jan 19, 2024 at 12:40 PM sunshaowei01 <1819846787@qq.com> wrote:
>
> Telemetry can only create 10 conns by default, each of which is processed
> by a thread.
>
> When a thread fails to write using socket, the thread will end directly
> without reducing the total number of conns.
>
> This will result in the machine running for a long time, and if there are
> 10 failures, the telemetry will be unavailable
>

Thanks for the patch, do you know which commit first triggered the issue?
This is needed so we add a Fixes: tag in the commitlog for backporting
the fix in stable releases.
See https://doc.dpdk.org/guides/contributing/patches.html#patch-fix-related-issues

> Signed-off-by: sunshaowei01 <1819846787@qq.com>

We need your full name in the SoB tag.
Like, for example in my case, it would be David Marchand
<david.marchand@redhat.com>.


-- 
David Marchand
  
Bruce Richardson Jan. 22, 2024, 9:05 a.m. UTC | #5
On Sat, Jan 20, 2024 at 12:18:38PM +0800, 1819846787 wrote:
>    I have modified my  commitlog and resubmitted the patch, but I seem to
>    have forgotten to add a [v2] flag to the patch. Do I need to resubmit
>    the patch again?
> 

It's better if the v2 is added, but it's probably not necessary in this
case. However, I see now you have resubmitted as a v2 anyway, which is
fine. Please mark older versions of the patch as "superseded" in patchwork
(patches.dpdk.org site - you'll need an account created with the email
address used to submit your patches. That will allow you to update the
status of your own patches yourself)

Also, one other tip, when submitting a new version of a previously reviewed
patch, unless there are major changes, you can keep any previously added
acks from reviewers. For your v2, for example, after your signed-off-by you
could have also added the "Acked-by: Ciara Power ... " line.

Hope these tips help in the future! Thanks for the contribution.

/Bruce

>    ------------------ Original Message ------------------
>
>    From: "David Marchand" <david.marchand@redhat.com>
>    Sent: Friday, January 19, 2024, 11:54 PM
>    To: <1819846787@qq.com>
>    Cc: "dev" <dev@dpdk.org>; "ciara.power" <ciara.power@intel.com>; "Bruce
>    Richardson" <bruce.richardson@intel.com>
>    Subject: Re: [PATCH] lib/telemetry:fix telemetry conns leak in case of
>    socket write fail
>
>    Hello,
>    On Fri, Jan 19, 2024 at 12:40 PM sunshaowei01 <1819846787@qq.com> wrote:
>    >
>    > Telemetry can only create 10 conns by default, each of which is processed
>    > by a thread.
>    >
>    > When a thread fails to write using socket, the thread will end directly
>    > without reducing the total number of conns.
>    >
>    > This will result in the machine running for a long time, and if there are
>    > 10 failures, the telemetry will be unavailable
>    >
>    Thanks for the patch, do you know which commit first triggered the issue?
>    This is needed so we add a Fixes: tag in the commitlog for backporting
>    the fix in stable releases.
>    See https://doc.dpdk.org/guides/contributing/patches.html#patch-fix-related-issues
>    > Signed-off-by: sunshaowei01 <1819846787@qq.com>
>    We need your full name in the SoB tag.
>    Like, for example in my case, it would be David Marchand
>    <david.marchand@redhat.com>.
>    --
>    David Marchand
  

Patch

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 31e2391867..0b00c04090 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -378,8 +378,8 @@  client_handler(void *sock_id)
 			"{\"version\":\"%s\",\"pid\":%d,\"max_output_len\":%d}",
 			telemetry_version, getpid(), MAX_OUTPUT_LEN);
 	if (write(s, info_str, strlen(info_str)) < 0) {
-		close(s);
-		return NULL;
+		TMTY_LOG_LINE(ERR, "Socket write base info to client failed");
+		goto exit;
 	}
 
 	/* receive data is not null terminated */
@@ -404,6 +404,7 @@  client_handler(void *sock_id)
 
 		bytes = read(s, buffer, sizeof(buffer) - 1);
 	}
+exit:
 	close(s);
 	rte_atomic_fetch_sub_explicit(&v2_clients, 1, rte_memory_order_relaxed);
 	return NULL;
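
The patch routes the write-failure path through the same cleanup as the normal path. As a rough, standalone illustration of that single-exit pattern (not the DPDK implementation), the sketch below uses a pipe whose read end has been closed to force `write()` to fail. `clients`, `handle_client` and `run_with_dead_peer` are hypothetical names, and C11 atomics stand in for `rte_atomic_fetch_sub_explicit`.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical counter, standing in for v2_clients. */
static atomic_int clients;

/* Handle one client on fd `s`; every path leaves through the `exit` label. */
static int handle_client(int s)
{
	const char *greeting = "{\"version\":\"sketch\"}";
	int ret = 0;

	if (write(s, greeting, strlen(greeting)) < 0) {
		ret = -1;
		goto exit; /* failure path shares the cleanup below */
	}

	/* the read/dispatch loop of a real handler would go here */

exit:
	close(s);
	atomic_fetch_sub_explicit(&clients, 1, memory_order_relaxed);
	return ret;
}

/* Simulate a vanished client: write to a pipe whose read end is closed. */
static int run_with_dead_peer(void)
{
	int fds[2];

	/* Ignore SIGPIPE so write() reports EPIPE instead of killing us. */
	signal(SIGPIPE, SIG_IGN);
	if (pipe(fds) != 0)
		return -2;
	close(fds[0]);

	/* The listener would normally take the slot before starting the handler. */
	atomic_fetch_add_explicit(&clients, 1, memory_order_relaxed);
	return handle_client(fds[1]);
}
```

Calling `run_with_dead_peer()` exercises the failing write; because the decrement sits after the label, the counter returns to its previous value whether or not the write succeeded.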