[v3] telemetry: fix "in-memory" process socket conflicts

Message ID 20210929135438.3091033-1-bruce.richardson@intel.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series [v3] telemetry: fix "in-memory" process socket conflicts

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/github-robot: build          fail     github build: failed
ci/iol-aarch64-compile-testing  success  Testing PASS
ci/iol-x86_64-unit-testing      fail     Testing issues
ci/iol-x86_64-compile-testing   fail     Testing issues

Commit Message

Bruce Richardson Sept. 29, 2021, 1:54 p.m. UTC
  When DPDK is run with --in-memory mode, multiple processes can run
simultaneously using the same runtime dir. This leads to each process,
as it starts up, removing the telemetry socket of another process,
giving unexpected behaviour.

This patch changes that behaviour to first check if the existing socket
is active. If not, it's an old socket to be cleaned up and can be
removed. If it is active, telemetry initialization fails and an error
message is printed out giving instructions on how to remove the error;
either by using file-prefix to have a different runtime dir (and
therefore socket path) or by disabling telemetry if it is not needed.
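
As an illustration only, the check described above can be sketched as a
standalone helper. This is not the code from the patch: the function name
bind_unless_active() is hypothetical, SOCK_STREAM is assumed for portability,
and a separate probe socket is used for the connect() test.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a DPDK function: create a listening UNIX socket
 * at 'path'. A stale socket file is removed; a socket file that another
 * process is still listening on is left alone.
 * Returns the listening fd, or -1 with errno set (EADDRINUSE when the path
 * is actively in use).
 */
static int
bind_unless_active(const char *path)
{
	struct sockaddr_un sun = {.sun_family = AF_UNIX};
	int sock, probe, active, saved;

	if (strlen(path) >= sizeof(sun.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	strcpy(sun.sun_path, path);

	sock = socket(AF_UNIX, SOCK_STREAM, 0);
	if (sock < 0)
		return -1;

	/* common case: nothing exists at the path yet */
	if (bind(sock, (struct sockaddr *)&sun, sizeof(sun)) == 0)
		goto do_listen;
	if (errno != EADDRINUSE)
		goto fail;

	/* something exists at the path: check whether anyone is listening */
	probe = socket(AF_UNIX, SOCK_STREAM, 0);
	if (probe < 0)
		goto fail;
	active = connect(probe, (struct sockaddr *)&sun, sizeof(sun)) == 0;
	close(probe);
	if (active) {
		errno = EADDRINUSE;
		goto fail;
	}

	/* nobody answered: treat the file as stale, delete it and rebind */
	unlink(path);
	if (bind(sock, (struct sockaddr *)&sun, sizeof(sun)) < 0)
		goto fail;

do_listen:
	if (listen(sock, 1) < 0)
		goto fail;
	return sock;

fail:
	saved = errno;
	close(sock);
	errno = saved;
	return -1;
}
```

Note that a window remains between the probe and the unlink() in which another
process could bind the same path, so this narrows the conflict rather than
eliminating it.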

Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Power <ciara.power@intel.com>
---
V3: Drop CC stable, as there will be a separate backport patch which does not
error out, to avoid causing problems for currently running applications

V2: fix build error on FreeBSD
---
 lib/telemetry/telemetry.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

--
2.30.2
  

Comments

Ferruh Yigit Oct. 5, 2021, 11:47 a.m. UTC | #1
On 9/29/2021 2:54 PM, Bruce Richardson wrote:
> When DPDK is run with --in-memory mode, multiple processes can run
> simultaneously using the same runtime dir. This leads to each process,
> as it starts up, removing the telemetry socket of another process,
> giving unexpected behaviour.
> 
> This patch changes that behaviour to first check if the existing socket
> is active. If not, it's an old socket to be cleaned up and can be
> removed. If it is active, telemetry initialization fails and an error
> message is printed out giving instructions on how to remove the error;
> either by using file-prefix to have a different runtime dir (and
> therefore socket path) or by disabling telemetry if it is not needed.
> 
> Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")
> 
> Reported-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Ciara Power <ciara.power@intel.com>

Off topic.

This is patch 100,000 in patchwork! Wow, that is huge.

This may be something to celebrate at the next face-to-face meeting.
And I assume Bruce will be buying as the owner of patch 100000 :)

https://patches.dpdk.org/api/1.2/patches/100000/

The historical numbers from DPDK patchwork:
100000 - Sept. 29, 2021 (184 days) [ 6 months / 26 weeks and 2 days ]
  90000 - March 29, 2021 (172 days)
  80000 - Oct.   8, 2020 (153 days)
  70000 - May    8, 2020 (224 days)
  60000 - Sept. 27, 2019 (248 days)
  50000 - Jan.  22, 2019 (253 days)
  40000 - May   14, 2018 (217 days)
  30000 - Oct.   9, 2017 (258 days)
  20000 - Jan.  25, 2017 (372 days)
  10000 - Jan.  20, 2016 (645 days)
  00001 - April 16, 2014

Over the last two years the average is 10K patches in ~185 days, which is more
patches per day than in previous years.
Not sure if this is a Covid-19 effect or simply DPDK getting more popular.

Thanks again to all contributors.
  

Patch

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index 8304fbf6e9..78508c1a1d 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -457,15 +457,30 @@  create_socket(char *path)

 	struct sockaddr_un sun = {.sun_family = AF_UNIX};
 	strlcpy(sun.sun_path, path, sizeof(sun.sun_path));
-	unlink(sun.sun_path);
+
 	if (bind(sock, (void *) &sun, sizeof(sun)) < 0) {
 		struct stat st;

-		TMTY_LOG(ERR, "Error binding socket: %s\n", strerror(errno));
-		if (stat(socket_dir, &st) < 0 || !S_ISDIR(st.st_mode))
+		/* first check if we have a runtime dir */
+		if (stat(socket_dir, &st) < 0 || !S_ISDIR(st.st_mode)) {
 			TMTY_LOG(ERR, "Cannot access DPDK runtime directory: %s\n", socket_dir);
-		sun.sun_path[0] = 0;
-		goto error;
+			goto error;
+		}
+
+		/* check if current socket is active */
+		if (connect(sock, (void *)&sun, sizeof(sun)) == 0) {
+			TMTY_LOG(ERR, "Error binding telemetry socket, path already in use\n");
+			TMTY_LOG(ERR, "Use '--file-prefix' to select a different socket path, or '--no-telemetry' to disable\n");
+			path[0] = 0;
+			goto error;
+		}
+
+		/* socket is not active, delete and attempt rebind */
+		unlink(sun.sun_path);
+		if (bind(sock, (void *) &sun, sizeof(sun)) < 0) {
+			TMTY_LOG(ERR, "Error binding socket: %s\n", strerror(errno));
+			goto error;
+		}
 	}

 	if (listen(sock, 1) < 0) {