[1/2] net/mlx5: improve socket file path

Message ID 20241213092444.2987-1-ming.1.yang@nokia-sbell.com (mailing list archive)
State New
Delegated to: Raslan Darawsheh
Headers
Series [1/2] net/mlx5: improve socket file path |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Yang Ming Dec. 13, 2024, 9:24 a.m. UTC
1. /var/tmp is hard code which is not a good style
2. /var/tmp may be not allowed to be written via container's
read only mode.

Signed-off-by: Yang Ming <ming.1.yang@nokia-sbell.com>
---
 drivers/net/mlx5/linux/mlx5_socket.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
  

Comments

Stephen Hemminger Dec. 13, 2024, 5:12 p.m. UTC | #1
On Fri, 13 Dec 2024 17:24:42 +0800
Yang Ming <ming.1.yang@nokia-sbell.com> wrote:

> 1. /var/tmp is hard code which is not a good style
> 2. /var/tmp may be not allowed to be written via container's
> read only mode.
> 
> Signed-off-by: Yang Ming <ming.1.yang@nokia-sbell.com>

Since this is a unix domain socket, why not use abstract socket
that doesn't have to be associated with filesystem?
  
Bruce Richardson Dec. 13, 2024, 5:16 p.m. UTC | #2
On Fri, Dec 13, 2024 at 09:12:39AM -0800, Stephen Hemminger wrote:
> On Fri, 13 Dec 2024 17:24:42 +0800
> Yang Ming <ming.1.yang@nokia-sbell.com> wrote:
> 
> > 1. /var/tmp is hard code which is not a good style
> > 2. /var/tmp may be not allowed to be written via container's
> > read only mode.
> > 
> > Signed-off-by: Yang Ming <ming.1.yang@nokia-sbell.com>
> 
> Since this is a unix domain socket, why not use abstract socket
> that doesn't have to be associated with filesystem?

In general, I think we should avoid abstract sockets in DPDK. Primary
reason is that they are linux-specific. Last time I checked other unixes,
like BSD, don't support them. A secondary concern is that having a
filesystem path allows permission checks, so for e.g. telemetry sockets,
only users with appropriate permissions can connect. With an abstract socket
we'd have to open up the area of user authentication.

/Bruce
  
Yang Ming Jan. 3, 2025, 2:51 a.m. UTC | #3
On 2024/12/14 01:16, Bruce Richardson wrote:
> On Fri, Dec 13, 2024 at 09:12:39AM -0800, Stephen Hemminger wrote:
>> On Fri, 13 Dec 2024 17:24:42 +0800
>> Yang Ming<ming.1.yang@nokia-sbell.com>  wrote:
>>
>>> 1. /var/tmp is hard code which is not a good style
>>> 2. /var/tmp may be not allowed to be written via container's
>>> read only mode.
>>>
>>> Signed-off-by: Yang Ming<ming.1.yang@nokia-sbell.com>
>> Since this is a unix domain socket, why not use abstract socket
>> that doesn't have to be associated with filesystem?
> In general, I think we should avoid abstract sockets in DPDK. Primary
> reason is that they are linux-specific. Last time I checked other unixes,
> like BSD, don't support them. A secondary concern is that having a
> filesystem path allows permission checks, so for e.g. telemetry sockets,
> only users with appropriate permissions can connect. With an abstract socket
> we'd have to open up the area of user authentication.
>
> /Bruce
>
Hi Stephen & Bruce,

I'm not sure whether abstract socket is a good idea. Maybe it can be improved further or step by step. But we don't need to discuss it for this commit.
We do this improvement because "/var/tmp" and "/var/log" can't be write in Readonly mode of container except that we add /var/ specfic for DPDK application in container's setting. But nearly all DPDK modules have already used common runtime path returned from `rte_eal_get_runtime_dir()`. Why not we apply this common path for Mellanox NIC?
  
Yang Ming March 12, 2025, 2:55 a.m. UTC | #4
On 2025/1/3 10:51, Ming 1. Yang (NSB) wrote:
>
>
> On 2024/12/14 01:16, Bruce Richardson wrote:
>> On Fri, Dec 13, 2024 at 09:12:39AM -0800, Stephen Hemminger wrote:
>>> On Fri, 13 Dec 2024 17:24:42 +0800
>>> Yang Ming<ming.1.yang@nokia-sbell.com>  wrote:
>>>
>>>> 1. /var/tmp is hard code which is not a good style
>>>> 2. /var/tmp may be not allowed to be written via container's
>>>> read only mode.
>>>>
>>>> Signed-off-by: Yang Ming<ming.1.yang@nokia-sbell.com>
>>> Since this is a unix domain socket, why not use abstract socket
>>> that doesn't have to be associated with filesystem?
>> In general, I think we should avoid abstract sockets in DPDK. Primary
>> reason is that they are linux-specific. Last time I checked other unixes,
>> like BSD, don't support them. A secondary concern is that having a
>> filesystem path allows permission checks, so for e.g. telemetry sockets,
>> only users with appropriate permissions can connect. With an abstract socket
>> we'd have to open up the area of user authentication.
>>
>> /Bruce
>>
> Hi Stephen & Bruce,
>
> I'm not sure whether abstract socket is a good idea. Maybe it can be improved further or step by step. But we don't need to discuss it for this commit.
> We do this improvement because "/var/tmp" and "/var/log" can't be write in Readonly mode of container except that we add /var/ specfic for DPDK application in container's setting. But nearly all DPDK modules have already used common runtime path returned from `rte_eal_get_runtime_dir()`. Why not we apply this common path for Mellanox NIC?
>
>
>
Hi Stephen,

I'm not entirely sure whether using an abstract socket is the best 
approach. It might be possible to improve it further or incrementally. 
However, we don't need to discuss this for the current commit.
We made this improvement because the directories "/var/tmp" and 
"/var/log" cannot be written to in a container with read-only mode, 
unless we specifically configure the /var/ directory for the DPDK 
application in the container's settings. Nearly all DPDK modules already 
use the common runtime path returned by rte_eal_get_runtime_dir(). 
Therefore, it makes sense to apply this common path for the Mellanox NIC 
as well.
Actually, the objective of this patch series is to prevent the DPDK 
Mellanox driver from crashing when attempting to access the read-only 
directories "/var/" in a container.

Brs,
Yang Ming
  
Dariusz Sosnowski March 14, 2025, 11:48 a.m. UTC | #5
Hi,

> From: Yang Ming <ming.1.yang@nokia-sbell.com> 
> Sent: Wednesday, March 12, 2025 3:56 AM
> To: Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger <stephen@networkplumber.org>
> Cc: Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam <orika@nvidia.com>; > Suanming Mou <suanmingm@nvidia.com>; Matan Azrad <matan@nvidia.com>; dev@dpdk.org
> Subject: Re: [PATCH 1/2] net/mlx5: improve socket file path
> 
> External email: Use caution opening links or attachments 
> 
> 
> On 2025/1/3 10:51, Ming 1. Yang (NSB) wrote:
> 
> On 2024/12/14 01:16, Bruce Richardson wrote: 
> On Fri, Dec 13, 2024 at 09:12:39AM -0800, Stephen Hemminger wrote:
> On Fri, 13 Dec 2024 17:24:42 +0800
> Yang Ming mailto:ming.1.yang@nokia-sbell.com wrote:
> 
> 1. /var/tmp is hard code which is not a good style
> 2. /var/tmp may be not allowed to be written via container's
> read only mode.
> 
> Signed-off-by: Yang Ming mailto:ming.1.yang@nokia-sbell.com
> Since this is a unix domain socket, why not use abstract socket
> that doesn't have to be associated with filesystem?
> In general, I think we should avoid abstract sockets in DPDK. Primary
> reason is that they are linux-specific. Last time I checked other unixes,
> like BSD, don't support them. A secondary concern is that having a
> filesystem path allows permission checks, so for e.g. telemetry sockets,
> only users with appropriate permissions can connect. With an abstract socket
> we'd have to open up the area of user authentication.
> 
> /Bruce
> 
> Hi Stephen & Bruce,
> I'm not sure whether abstract socket is a good idea. Maybe it can be improved further or step by step. But we don't need to discuss it for this commit. 
> We do this improvement because "/var/tmp" and "/var/log" can't be write in Readonly mode of container except that we add /var/ specfic for DPDK > application in container's setting. But nearly all DPDK modules have already used common runtime path returned from `rte_eal_get_runtime_dir()`. Why > not we apply this common path for Mellanox NIC?
> 
> 
> 
> Hi Stephen,
> 
> I'm not entirely sure whether using an abstract socket is the best approach. It might be possible to improve it further or incrementally. However, we > don't need to discuss this for the current commit.
> We made this improvement because the directories "/var/tmp" and "/var/log" cannot be written to in a container with read-only mode, unless we > specifically configure the /var/ directory for the DPDK application in the container's settings. Nearly all DPDK modules already use the common runtime > path returned by rte_eal_get_runtime_dir(). Therefore, it makes sense to apply this common path for the Mellanox NIC as well.
> Actually, the objective of this patch series is to prevent the DPDK Mellanox driver from crashing when attempting to access the read-only directories "/> var/" in a container.
> 
> Brs,
> Yang Ming

Let me provide the context for the functionality in question here.

mlx5 PMD has an ability to dump HW flow rules using mlx_steering_dump tool
(documented in https://doc.dpdk.org/guides/nics/mlx5.html#how-to-dump-flows and in https://github.com/Mellanox/mlx_steering_dump/tree/master/hws).
Dumping itself is supported only on Linux.
There are 2 ways to use that tool:

1. Application calls rte_flow_dev_dump(), providing FILE* handle.
  This saves the flow rules metadata (e.g., IDs of HW objects) in the file.
  mlx_steering_dump tool is then used to parse it and dump HW rules.
2. mlx_steering_dump communicates through the Unix socket with mlx5 PMD, sharing destination file descriptor of output metadata file with DPDK process through SCM_RIGHTS mechanism.
  mlx5 PMD internally calls rte_flow_dev_dump(), passing the provided file.
  After dumping is done, the tool parses the metadata file and extracts rules from the HW.

In practice, 2nd option is the more frequently used one, because of its convenience since it does not require modification of application code.
It has also the benefit that the tool itself owns the output dump file e.g., tool could be called from outside of the container and debug dump will be generated in host's context, not container context (assuming socket path is mounted on the host).
This is the case for example when containers use VFs of mlx5 NICs.

Changing the filesystem path of the Unix socket would require the update of all the tooling - there would be a need to copy the logic of runtime directory discovery (like it is done in dpdk-telemetry.py).
Until it is done, this change would introduce a breakage for existing users.

What do you think about fallback mechanism here? If "/var/tmp" is read-only or creation of the socket on this path fails, then socket will be created in EAL runtime directory.

Best regards,
Dariusz Sosnowski
  

Patch

diff --git a/drivers/net/mlx5/linux/mlx5_socket.c b/drivers/net/mlx5/linux/mlx5_socket.c
index 6ce0e59643..7fa6e5345c 100644
--- a/drivers/net/mlx5/linux/mlx5_socket.c
+++ b/drivers/net/mlx5/linux/mlx5_socket.c
@@ -20,7 +20,7 @@ 
 
 /* PMD socket service for tools. */
 
-#define MLX5_SOCKET_PATH "/var/tmp/dpdk_net_mlx5_%d"
+#define MLX5_SOCKET_FNAME "dpdk_net_mlx5"
 #define MLX5_ALL_PORT_IDS 0xffff
 
 int server_socket = -1; /* Unix socket for primary process. */
@@ -177,8 +177,8 @@  mlx5_pmd_socket_init(void)
 	ret = fcntl(server_socket, F_SETFL, flags | O_NONBLOCK);
 	if (ret < 0)
 		goto close;
-	snprintf(sun.sun_path, sizeof(sun.sun_path), MLX5_SOCKET_PATH,
-		 getpid());
+	snprintf(sun.sun_path, sizeof(sun.sun_path), "%s/%s_%d",
+		 rte_eal_get_runtime_dir(), MLX5_SOCKET_FNAME, getpid());
 	remove(sun.sun_path);
 	ret = bind(server_socket, (const struct sockaddr *)&sun, sizeof(sun));
 	if (ret < 0) {
@@ -223,6 +223,7 @@  mlx5_pmd_socket_uninit(void)
 					  mlx5_pmd_socket_handle, NULL);
 	claim_zero(close(server_socket));
 	server_socket = -1;
-	MKSTR(path, MLX5_SOCKET_PATH, getpid());
+	MKSTR(path, "%s/%s_%d", rte_eal_get_runtime_dir(), MLX5_SOCKET_FNAME,
+	      getpid());
 	claim_zero(remove(path));
 }